Pysyft: GREAT FIRST PROJECT: Make PySyft FAST!!!!

Created on 8 Oct 2018 · 23 Comments · Source: OpenMined/PySyft

This issue is probably the BEST beginner issue because it doesn't require any knowledge outside of basic python. The goal is just to take this demo

https://colab.sandbox.google.com/drive/17upxCYJmJ6Zoxv0KjiJ1ZbchlJybsfhs#scrollTo=BXp7uO9wi1qo

and make it as fast as possible by modifying the PySyft codebase (READ: it's not necessarily about modifying the demo code itself... more about making the library generically faster so that this demo runs faster)

This is also a group project - meaning you can feel free to jump on at any time. Just comment with your interest below!!!

Also - make sure you join the #team_pysyft channel in our Slack: slack.openmined.org

Edit: If you're new to optimizing python code - here's a good tutorial

All 23 comments

I am new here but interested.

This is the perfect issue for newcomers!

Step 1) see if you can get that colab notebook running :)
Step 2) see if you can get the same demo running on your local machine (install PySyft and such)
Step 3) start trying to make the code faster!

I am interested!

The Slack link is not working.

Fixed the link!

The main pieces I expect to be the bottlenecks are:

1) anything involving Python string computation / comparisons (many of these we can replace with ENUMs)
2) sending/receiving the same message multiple times (such as calls to my_var.get_shape())
3) message size (this is a big one)

But this is certainly far from exhaustive.
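To make item 2 a bit more concrete, here is a minimal sketch of a pointer that remembers a remote value after the first request, so repeated calls don't go over the wire again. `FakeWorker` and `CachedPointer` are hypothetical stand-ins written for this example, not PySyft classes, and the sketch assumes the remote shape never changes after the first call.

```python
class FakeWorker:
    """Hypothetical stand-in for a remote worker; counts the messages it receives."""

    def __init__(self):
        self.objects = {}
        self.messages_received = 0

    def request_shape(self, obj_id):
        # Each call models one network round trip.
        self.messages_received += 1
        return len(self.objects[obj_id])


class CachedPointer:
    """Hypothetical pointer that caches the remote shape after the first request."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id
        self._shape = None  # filled in lazily on first use

    def get_shape(self):
        if self._shape is None:
            self._shape = self.worker.request_shape(self.obj_id)
        return self._shape


bob = FakeWorker()
bob.objects["x"] = [1, 2, 3, 4, 5]
ptr = CachedPointer(bob, "x")

# Three local calls, but only the first one reaches the worker.
shapes = [ptr.get_shape() for _ in range(3)]
print(shapes, bob.messages_received)
```

A real version would also need to invalidate the cached value whenever an in-place operation changes the remote tensor.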

Another big one is "garbage collection" - at present for remote tensors, we never delete them (even if you delete the pointer to them locally). This means that PyTorch doesn't get to re-use memory objects (which is a big part of its optimization) and instead has to create new objects all the time (this also blows up RAM).

Here's the thing that doesn't work.

x = syft.FloatTensor([1,2,3,4,5])

x.send(bob)

del x

at this point... bob._objects will STILL have the tensor "x" despite the fact that the pointer to it has been deleted!!!
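One possible direction, sketched with hypothetical stand-in classes rather than the real PySyft ones: give the pointer a `__del__` method that asks the worker to drop the remote object. The `FakeWorker`, `Pointer`, and `send` names below exist only for this example, and a real implementation would have to send the delete request as a message.

```python
import gc


class FakeWorker:
    """Hypothetical stand-in for a remote worker that stores objects by id."""

    def __init__(self):
        self._objects = {}

    def receive(self, obj_id, obj):
        self._objects[obj_id] = obj

    def delete(self, obj_id):
        # Dropping the reference lets the worker's framework reuse the memory.
        self._objects.pop(obj_id, None)


class Pointer:
    """Hypothetical local pointer to an object living on a worker."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __del__(self):
        # Runs when the local pointer is garbage collected.
        self.worker.delete(self.obj_id)


def send(obj, worker, obj_id):
    """Store obj on the worker and return a pointer to it."""
    worker.receive(obj_id, obj)
    return Pointer(worker, obj_id)


bob = FakeWorker()
x = send([1, 2, 3, 4, 5], bob, obj_id="x")
stored_before = "x" in bob._objects

del x
gc.collect()  # not needed on CPython, but harmless on other interpreters
stored_after = "x" in bob._objects
print(stored_before, stored_after)
```

Note that `__del__` is only called once the last reference to the pointer goes away, so copies of the pointer would keep the remote object alive.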

Another optimization is when we repeatedly send the same commands to a worker over and over again... For example

for epoch in range(n_epochs):

    # all my neural network code

We should explore ways to avoid resending the same operations when most of our training code is highly iterative (as above)
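One way this might look, purely as a sketch: register the list of commands with the worker once under an id, then send only that id each epoch. Everything here (`FakeWorker`, `run_command`, `register_batch`, `run_batch`, the command names) is hypothetical and made up for the illustration; it is not an existing PySyft API.

```python
class FakeWorker:
    """Hypothetical stand-in for a remote worker; counts the messages it receives."""

    def __init__(self):
        self.batches = {}
        self.messages_received = 0

    def run_command(self, command):
        # One message per command.
        self.messages_received += 1
        return f"ran {command}"

    def register_batch(self, batch_id, commands):
        # One message that ships the whole command list ahead of time.
        self.messages_received += 1
        self.batches[batch_id] = list(commands)

    def run_batch(self, batch_id):
        # One message that replays every stored command on the worker side.
        self.messages_received += 1
        return [f"ran {command}" for command in self.batches[batch_id]]


commands = ["forward", "loss", "backward", "step"]
n_epochs = 10

# Naive approach: every command is sent again in every epoch.
naive = FakeWorker()
for epoch in range(n_epochs):
    for command in commands:
        naive.run_command(command)

# Batched approach: send the commands once, then only the batch id per epoch.
batched = FakeWorker()
batched.register_batch("train_step", commands)
for epoch in range(n_epochs):
    batched.run_batch("train_step")

print(naive.messages_received, batched.messages_received)
```

In this toy setup the message count drops from `n_epochs * len(commands)` to `n_epochs + 1`. Whether the same saving carries over to real training code depends on how much of each epoch is truly identical.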

Great! Thanks for the ideas. I will start working on this one asap.. :D

The SIMPLEST project to start is to replace our use of "strings" with ENUMS. @LaRiffle has already gotten started with this wonderful PR merged earlier this week (which you can use as an example for reference) https://github.com/OpenMined/PySyft/pull/1593/files
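For anyone who hasn't used enums before, here is a minimal sketch of the idea using the standard library's `IntEnum`. The `MessageType` members and the message layout are invented for this example and are not taken from the PR above; the point is only that an integer code is shorter on the wire than a descriptive string while the code stays readable.

```python
import json
from enum import IntEnum


class MessageType(IntEnum):
    """Hypothetical message codes; names and values are made up for this sketch."""

    CMD = 1
    OBJ = 2
    OBJ_REQ = 3
    OBJ_DEL = 4


# The same message, once tagged with a string and once with an enum value.
string_msg = json.dumps({"type": "object_delete", "id": 42})
enum_msg = json.dumps({"type": MessageType.OBJ_DEL.value, "id": 42})

# On the receiving side the integer maps straight back to the enum member.
decoded_type = MessageType(json.loads(enum_msg)["type"])

print(len(string_msg), len(enum_msg), decoded_type.name)
```

Whether the comparisons themselves get faster is worth measuring with a profiler rather than assuming.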

@iamtrask I am interested in contributing.

Excellent! Welcome aboard @avnish98!!

@iamtrask I would like to contribute too.

Excellent @bksahu !!! I've put a few ideas for optimization in the thread above - although feel free to come up with your own by poking around the codebase! (and using a Python profiler)

I am in as well

For those of you interested - you may find this blogpost tutorial on profiling python code helpful!

http://mbatchkarov.github.io/2014/07/14/profiling-python/

Especially %lprun can prove useful for profiling on notebooks!
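If you'd rather start outside a notebook, the standard library's `cProfile` and `pstats` modules give function-level timings without installing anything (unlike `%lprun`, which comes from the third-party line_profiler package). A minimal sketch, where `slow_concat` and `demo` are placeholder functions standing in for whatever part of the demo you want to measure:

```python
import cProfile
import io
import pstats


def slow_concat(n):
    """Deliberately wasteful function so the profiler has something to find."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def demo():
    return len(slow_concat(20000))


profiler = cProfile.Profile()
profiler.enable()
result = demo()
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time is usually a good first pass, since it shows which high-level calls dominate before you drill into individual lines.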

I am in!!!

I'll be helping with this too!

I'd love to help as well!

I think one of the next big things we can do (this is a bigger project) is to replace all of our "if/else" logic with a dictionary of functions. This is particularly important in serialization. (https://github.com/OpenMined/PySyft/blob/master/syft/core/frameworks/encode.py)

You'll notice there are a ton of really long if/else lists which would likely be quite a bit faster with a simple dictionary lookup to the correct function.
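To illustrate the pattern (not the actual code in encode.py), here is a minimal sketch that compares an if/else chain with a dictionary of functions keyed by type. All function names and the output format are hypothetical and chosen just for this example.

```python
def encode_int(obj):
    return {"type": "int", "value": obj}


def encode_float(obj):
    return {"type": "float", "value": obj}


def encode_str(obj):
    return {"type": "str", "value": obj}


def encode_list(obj):
    return {"type": "list", "value": [encode(item) for item in obj]}


def encode_if_else(obj):
    """The style being replaced: each check runs in order until one matches."""
    if type(obj) is int:
        return encode_int(obj)
    elif type(obj) is float:
        return encode_float(obj)
    elif type(obj) is str:
        return encode_str(obj)
    elif type(obj) is list:
        return {"type": "list", "value": [encode_if_else(item) for item in obj]}
    else:
        raise TypeError(f"cannot encode {type(obj).__name__}")


# Dispatch table: a single dictionary lookup selects the encoder.
ENCODERS = {
    int: encode_int,
    float: encode_float,
    str: encode_str,
    list: encode_list,
}


def encode(obj):
    try:
        encoder = ENCODERS[type(obj)]
    except KeyError:
        raise TypeError(f"cannot encode {type(obj).__name__}") from None
    return encoder(obj)


sample = [1, 2.5, "a", [3]]
encoded = encode(sample)
print(encoded)
```

One caveat with this design: looking up `type(obj)` matches exact types only, so subclasses need their own entries (or a fallback), which an `isinstance` chain would have handled implicitly.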

I'll help with the if-else issue
