Pysyft: PyTorch 1.0 Refactor Plan

Created on 3 Oct 2018 · 4Comments · Source: OpenMined/PySyft

Very excited to say that PyTorch 1.0 has been released.

In this issue (discussion) - we will refactor our current code to reach parity with our current implementation which is compatible with PyTorch 0.3.1. We are working in the torch_1 branch.

Roadmap

Part 1: Serde

[x] Serde: https://github.com/OpenMined/PySyft/issues/1656
[x] Codebase Cleanup: https://github.com/OpenMined/PySyft/issues/1783
[x] Tensor Abstractions: https://github.com/OpenMined/PySyft/issues/1657
[x] Garbage Collection: https://github.com/OpenMined/PySyft/issues/1658
[ ] Differential Privacy: https://github.com/OpenMined/PySyft/issues/1824
[ ] Batching/Async Messaging: https://github.com/OpenMined/PySyft/issues/1659
[ ] Message Caching: https://github.com/OpenMined/PySyft/issues/1660
[ ] Test Mode/Decorator for Encrypted Tensors: https://github.com/OpenMined/PySyft/issues/1823

Code Style/Design Choices

Use of Strings - use of string based IDs is strictly avoided for any performance essential operations - "name" features can be added for convenience but must not be used for default identification.
If/Else Control Logic - use of long if/else lists based on typing information is strictly avoided. Dictionaries and type enums must be used instead - keys are the type enums and the values are individual functions for how to handle each type. This is particularly important for serialization
Strong Inline Documentation Enforced - we won't be merging code unless it has detailed explanations for its functionality. This is the standard.. You should be able to learn everything you need to know by reading the comments in the code.
Type Annotations Enforced - enforcing good typing annotations from the start
All Commands Use Same SerDe Logic - instead of multiple separate ways of going from args/kwargs of various types to the serialized format, we will use the same serialize() and deserialize() methods.
Combined Message Logic For Send/Reply - right now we have different control logic for sending / responding to a message. This should instead use the same logic.
Code Generate Unit Tests - we should be able to iterate through every method and have a set of functionality tests that we then re-use for (more or less) all tensor types. This would allow us to have (more or less) full test coverage for every new tensor type right out of the gate.
Clean commit message - The list of commits in a PR should be sufficient to understand what the PR intends to do. This means: Put explicit keywords, sqaush redundant commits before pushing, etc. This is the standard.
All code in the codebase must meet the current style requirements (black+flake8)
All new pull requests must be automatically kept in compliance (Through precommit and travis hooks)

Deprecated Objects/Features

LocalTensor - no more LocalTensor, instead we'll just run the unhooked pytorch code (native Tensor) which should have equal performance with unhooked pytorch. Basically, if .child==None then it will run PyTorch like normal (this will also work for end-of-chain tensors holding data)
No More Loops in Tensor Chains - instead of re-using the data tensor for the wrapper (creating a loop, which really complicated our logic), we'll just take the overhead of always initializing special "wrapper" tensors which will be empty.
No More Variable - PyTorch 1.0 doesn't have the Variable class.

New Client API Features

Async Message Support async messages will be first class citizens - which will return handles which you can force a thread to wait for later if necessary (this should even work for VirtualWorkers)
pointer.child.child interface - you should be able to direct commands to specific children on remote tensors using the same interface you would if they were local (.child.child). However, this should simply give an iterator to the command to tell it which child id (id == placement not unique id... like ptr.child.child.child means "call this command on the 3rd child of the tensor on the remote machine"). It should also work for arguments passed to methods.
Remote Error Propagation - one "message type" to send over the wire should be Exception so that remote exceptions get try/catch and passed to the local worker. Aka - the remote worker should NEVER go down/fail/stop the thread but you should get stack traces locally.
Message Grouping API - you should be able to designate to wait on sending a message by calling .begin_msg_group() on a variable, at which point it blocks all messages until it gets to .release_msg_group() which sends the message as a bulk message. Note that this should combine with the Message Caching (in the previous bullet) to make it so that when the group is sent repeatedly it ends up just sending a hash (single int) instead of the entire group of messages (big performance win!).
Backward Overriding - there will be 3 logical options when calling .backward(). 1) If .child is a pointer then by default it forwards .backward() to the remote which does all backpropagating in C++ there. 2) If .child is None, then it runs .backward() locally (like PyTorch normally does). 3) If .child is something other than None then it writes a custom python-only .backward() which we will write. Note that these can be combined, particularly 1 and 3. You might call .backward() on a pointer locally which calls .backward(2) on the remote machine (say... because .child is a fixedprecisiontensor).
CUDA Support Enforced From the Start - we didn't get around to supporting CUDA and it would be a big pain to add to 0.3.1 now. We should do this from the start in the 1.0 refactor.

Optimization Focused Improvements

Garbage Collection Enforced From the Start - we have tons of memory leaks in the 0.3.1 version which is so complex it's more or less irrecoverable. We need to start from scratch and enforce __del__ working correctly for all types with unit tests at all times (this will require using the "weakref" package to prevent circular dependencies). Note this includes garbage collection of remote objects. However, we won't be supporting multiple pointers to the same remote object yet (this can be an upgrade). Aka - if there are multiple pointers to the same object and one calls .__del__ it should go ahead and delete the object.
Default Tensor Serialization - we're going to serialize Tensors by default using pickle - as torch has a custom pickle handler that is very performant.
Tuple/Enum based serialization instead of JSON/string - instead of a big hierarchy of nested dictionaries with string based keys, we're going to use simple nested tuples where the size/order of the tuple determines its interpretation.
LZ Message Compression By Default - add support for fast message compression - evaluate which contexts it adds to performance.
Message Caching By Default - because model training is highly iterative, most of the commands are the same for every iteration. We should have a way to automatically check to see if we've sent a (series of) message(s) before serializing it. If the message has already been sent, then just send an index to it in the cache (the receiving VirtualWorker should also have a cache of messages sent recently).
One Tensor Diff Message Caching - most messages will be "off by one" in an iterative loop because the source variable will be unique (such as the tensor for activations in a neural network, but the weights are the same). We should provide caching for these partial messages.
Set Destination Tensor / Tensor IDs - in our existing API we have the client prescribe what the ID of the final tensor should be for remote operations. 1) We should bake this into the async support for PointerTensors so that when you call long series of complex pointer logic it executes it as fast as it can (some operations happening in parallel) with things only blocking to make sure the pointer has actually been created. Furthermore, we should be able to use this technique of sending "destination pointers" to tell PyTorch to re-use Tensors it had previously (this will help activate message caching as well).
Message Ints and Hashing - using Enums, messages should be exclusively nested tuples of Integers such that hashing can be very efficient. (the one exception being when data is transferred as a series of bytes in which case the data's ID is used in the hash)
Experimental: content/hash based tensor IDs IPFS identifies their objects based on IDs. This seems like it might be quite convenient in some cases, but more expensive in others (generating a random Int is a lot easier than hashing a large matrix). Possibly we only hash random parts of the tensor? Not sure.
Use C++ When Possible - for anything that we can, let's use the C++ api to minimize Python object overhead.

Testing

Every line of the codebase must be run at some point during testing
This will be verified automatically with coverage.py in travis

Documentation

Every user facing method must have documentation in the form of a docstring
These docstrings will be automatically compiled into a web viewable form using sphinx/gitbook

In particular these documentation resources seem very helpful!
https://pytorch.org/tutorials/advanced/cpp_extension.html
https://pytorch.org/tutorials/intermediate/dist_tuto.html
https://www.geeksforgeeks.org/enum-in-python/
https://www.aeracode.org/2018/02/19/python-async-simplified/

Help Wanted Status Type Type

Source

iamtrask

Most helpful comment

Sadly subclassing is not yet a feature of PyTorch (despite it being included as a possible feature of 1.0) However, the codebase should become _much_ less complicated without the circular references (which is where most of the complexity comes from)

iamtrask on 16 Oct 2018

👍2

All 4 comments

I'm going to kick off this project by creating a new branch for PySyft (torch_1) which is more or less empty. We can then pull in unit tests and functionality from master as needed.

iamtrask on 3 Oct 2018

For clarification, is there a way to extend tensors without hooking/ do we still need a torch tensor at the top of the chain?

robert-wagner on 14 Oct 2018

iamtrask on 16 Oct 2018

👍2

This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.