I created this issue as a discussion board about design. So please chip in you views.
A lot of great code has come in recently, but I think we lack a solid design philosophy about what code should go where. And that has resulted in inconsistent code organization, and could get worse if not addressed. For example:
I think of some things that have bearing on how we define our conventions are:
I think after we come to a consensus about design, we should:
Please share your thoughts.
I agree that this will generally be the case in the short run. The only exception will be for a thing called "packing" where instead of encrypting each scalar individually, and entire vector is encrypted as a unit. While this makes it more challenging to index into specific values, it can increase performance significantly in the right setting.
This is a great question. In many cases, yes they should. This is particularly true for cases wherein there are different ways to accomplish the same thing. For example, when multiplying a vector by a scalar, one can either encrypt the scalar and perform multiplication or perform multiplication with the scalar decrypted. The latter is much faster. This kind of logic we want to bake into the Encrypted tensor (and is likely to be slightly different for each variant).
Whether that means it should justify its own encrypted_math.py is less certain. I don't personally feel that this is clear yet.
we probably need to extend the torch.Tensor() interface. While our function names should already be quite similar, I expect that some of the ways we're storing data might need some changing (not 100% on that yet). That being said, PyTorch itself is designed to interop freely with Numpy, so this is probably more about interface design than changing things under the hood (hopefully)
agreed.
A question regarding our pysyft codebase.
So, if you look at some functions(many) in _math.py_ or _tensor.py_, such as dot/sigmoid/tanh/relu/transpose (check _math.py_) etc.
The inputs they accept are either _numpy_ object or a _TensorBase_ object.
But the tickets associated with those clearly mentioned they should accept a tensor as an input, which was based on pytorch conventions.
In pytorch, the same functions do not accept a _numpy_ object. pytorch throws a TypeError. but our functions accept the _numpy_ object and convert it to a TensorBase object within the function and then do whatever the function was supposed to do and output a tensor.
_ensure_tensorbase(input) vs input = _ensure_tensorbase(input) the usage is what bothers me.
@iamtrask @siarez @aiorla @baldassarreFe what do you guys feel about this?
I see the inconsistency and agree that if we want to stick 100% to PyTorch we should raise an error if the value passed is not in the TensorBase hierarchy. On the other hand, I can't think of a situation where being less strict with the requirements on a parameter would break the code. In the end, we are providing more functionality than what we are "promising" in the docstring, so that should be fine.
I'm with @baldassarreFe in that we're "just" being less strict about some parameters and that can't be breaking. However I think that it might be very confusing for a user to have functions with a more or less "constrained" usage in an arbitrary way (as you have pointed that we have now).
I prefer in general strict documentation, what you get is what it's written. The user of the library doesn't have to depend on "tricks" to do certain computations, and if they're really necessary/convenient they should be "officially" in the docs. I think it also helps a bit in debugging.
And about allowing or not implicit conversion of Tensor-like objects I'm not completely sure. If we want to really encourage using always our Tensors (like PyTorch does) it might make sense to avoid it.
I prefer in general strict documentation, what you get is what it's written. The user of the library doesn't have to depend on "tricks" to do certain computations, and if they're really necessary/convenient they should be "officially" in the docs. I think it also helps a bit in debugging.
Agree with @aiorla especially since this is a pure python library it should naturally respect the Zen Of Python
And about allowing or not implicit conversion of Tensor-like objects I'm not completely sure. If we want to really encourage using always our Tensors (like PyTorch does) it might make sense to avoid it.
It would also be helpful for new contributors like me to understand things like:
Most helpful comment
I'm with @baldassarreFe in that we're "just" being less strict about some parameters and that can't be breaking. However I think that it might be very confusing for a user to have functions with a more or less "constrained" usage in an arbitrary way (as you have pointed that we have now).
I prefer in general strict documentation, what you get is what it's written. The user of the library doesn't have to depend on "tricks" to do certain computations, and if they're really necessary/convenient they should be "officially" in the docs. I think it also helps a bit in debugging.
And about allowing or not implicit conversion of Tensor-like objects I'm not completely sure. If we want to really encourage using always our Tensors (like PyTorch does) it might make sense to avoid it.