Describe the bug
When used with n>=3 workers with SecureNN
wks = [ sy.VirtualWorker(hook, id="w#%d" % i) for i in range(3) ]
crypto_prov = sy.VirtualWorker(hook, id="crypto_prov")
t = torch.zeros(3,3)
t = t.fix_prec().share(*wks, crypto_provider=crypto_prov)
t = t * 2
t = t.get()
t = t.float_prec()
print(t)
This doesn't work correctly:
tensor([[-1.0000e-03, 0.0000e+00, 1.0000e-03],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[-1.0000e-03, 1.0000e-03, 1.8447e+13]])
In this particular case where we multiply by a scalar, I think we do something useless:
we first multiply the multiplier by base ** precision_fractional and then truncate. When I remove this, the results are ok.
But I agree that there are still some problems when multiplying 2 ASTs shared between more than 2 workers
EDIT not enough
Use a crypto provider and that should be sufficient
Hmm @LaRiffle, this doesn't seem sufficient to fix the problem of >2 workers:
wks = [ sy.VirtualWorker(hook, id="w#%d" % i) for i in range(3) ]
crypto_prov = sy.VirtualWorker(hook, id="crypto_prov")
t = torch.zeros(3,3)
t = t.fix_prec().share(*wks, crypto_provider=crypto_prov)
t = t * 2
t = t.get()
t = t.float_prec()
print(t)
tensor([[-1.8447e+13, 1.8447e+13, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00],
[-1.0000e-03, -1.8447e+13, 1.8447e+13]])
@LaRiffle, a quick fix using the in-place operation somehow works:
wks = [ sy.VirtualWorker(hook, id="w#%d" % i) for i in range(3) ]
t = torch.zeros(3,3)
t = t.fix_prec().share(*wks)
t *= 2
t = t.get()
t.float_prec()
For the very specific problem of multiplying by an integer, I submitted a pull request and it passed all tests.
Here's another example - calculating avg of multiple zero tensors shared to multiple workers.
Works fine with num = 2.
num = 10
wks = [ sy.VirtualWorker(hook, id="wk#%d" % i) for i in range(num) ]
tensors = [ torch.zeros(5, 5).fix_prec(precision_fractional=16).share(*wks) for _ in range(num) ]
avg = tensors[0]
for i in range(num-1):
avg += tensors[i+1]
avg /= len(tensors)
avg = avg.get().float_prec()
print(avg)
tensor([[ 1.0000e-16, -1.8447e+02, -1.8447e+02, -1.8447e+02, -1.8447e+02],
[ 1.8447e+02, 9.2234e+01, 1.8447e+02, -1.8447e+02, 9.2234e+01],
[ 1.8447e+02, 1.8447e+02, 0.0000e+00, -1.8447e+02, 1.8447e+02],
[ 1.0000e-16, -1.8447e+02, -1.8447e+02, 1.8447e+02, -1.0000e-16],
[ 1.8447e+02, 1.0000e-16, 0.0000e+00, 1.8447e+02, -1.8447e+02]])
Expected is tensor of zeros.
@vvmnnnkv, Division in FixedPrecisionTensor is not even implemented. @Jasopaum, maybe division by integer is a nice feature to be implemented soon.
@Jasopaum A big problem with operations between more than 2 ast's is that when recombining each of them there is a greater likelyhood in losing precision in the least significant bit. An intuitive way to think about this problem is to think of the last digit as a float which adds up to 1 but is rounded. For the n=2 case, unless you split the number such that both are exactly equal to 0.5, one of the numbers will round to 1 and the other to 0. However, when n>2, it is likely that all will round to 0 leading to a loss in precision
We actually used to have test for this which was super flakey due to the above instability
@joseilberto, I think we left out the division for FixedPrecisionTensor on purpose, to avoid a loss of precision when division isn't exact... But I'm not sure why we couldn't have something like a floor division, you're right. Maybe @LaRiffle knows a bit more about FixedPrecisionTensors?
@joseilberto we'd love to have division with fixed_precision! We haven't had a use case yet where division by a FixedPrecision divisor was needed, we only needed integer division, that's why you don't see an implementation so far.
One thing to notice, is that for example if you don't see a method (like __truediv__) in precision.py it doesn't mean it's not implemented, it means that the basic PyTorch behaviour will be used. For integer division for example you don't need to specify a specific behaviour different from PyTorch.
@LaRiffle, I do understand that what it does is exactly use whatever comes from the inheritance (that is why I suggested using the in-place multiplication operator and it works properly). In the division by integer case, we are constrained by the precision_fractional we have set and it performs poorly if we need more precision just as pointed by @vvmnnnkv.
Hey yes you're right, but this is linked to the additive sharing part not the fixed_precision: take the same example from @vvmnnnkv and remove .share() and .get() it will work
So as a recap we have issues with additive shared tensor with mul or div with integers with n>2 workers
Is there any update on this issue? Multiplying and dividing tensors by integers is a pretty common operation, and it took me a while to understand why torch.mean returned an absurd value when working with n>=3 workers.
We just had a great PR merged (#2982) which will help for this issue which is still open!
https://github.com/OpenMined/PySyft/pull/3148 also adresses secure comparison for more workers, but @artix41 if yoy're willing to do the checks for multiplication & addition I'd be very graeful!
This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.
This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.