[Related to #1909]
I tracked down the error to
'optimizer = optim.Adam(model.parameters(), lr=1e-3)' in my code, 'optimizer = optim.SGD(model.parameters(), lr=1e-3)' works.
My guess is that this is related to me using PyTorch 1.0 .
The code can be found in https://github.com/2fasc/Distributed_Malaria/blob/master/src/federated_training.py
Traceback (most recent call last):
File "/home/fasc/Documents/Distributed_Malaria/src/federated_training.py", line 143, in <module>
simple_federated_model()
File "/home/fasc/Documents/Distributed_Malaria/src/federated_training.py", line 66, in simple_federated_model
federated=True,
File "/home/fasc/Documents/Distributed_Malaria/src/auxiliaries.py", line 111, in train
optimizer.step()
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/torch/optim/adam.py", line 94, in step
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/frameworks/torch/hook.py", line 650, in overloaded_native_method
response = method(*new_args, **new_kwargs)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/frameworks/torch/hook.py", line 486, in overloaded_pointer_method
response = owner.send_command(location, command)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/base.py", line 364, in send_command
_ = self.send_msg(MSGTYPE.CMD, message, location=recipient)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/base.py", line 198, in send_msg
bin_response = self._send_msg(bin_message, location)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/virtual.py", line 6, in _send_msg
return location._recv_msg(message)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/virtual.py", line 9, in _recv_msg
return self.recv_msg(message)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/base.py", line 229, in recv_msg
response = self._message_router[msg_type](contents)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/workers/base.py", line 316, in execute_command
getattr(_self, command_name)(*args, **kwargs)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/frameworks/torch/hook.py", line 636, in overloaded_native_method
raise route_method_exception(e, self, args, kwargs)
File "/home/fasc/miniconda3/envs/malaria/lib/python3.6/site-packages/syft/frameworks/torch/hook.py", line 630, in overloaded_native_method
response = method(*args, **kwargs)
TypeError: addcmul_() takes 2 positional arguments but 3 were given
Hmm - it's not totally clear to me why this would be the case. Will require a deeper look. I don't think we have any unit tests around Adam so this might just require new functionality to get working.
I'm looking at it right now
First observation: Adam optim works in the setting of https://github.com/OpenMined/PySyft/blob/dev/examples/tutorials/Part%208%20-%20Federated%20Learning%20on%20MNIST%20using%20a%20CNN.ipynb if you replace the SGD optim with Adam.
Oh actually it fails, but after a certain number of batch iterations
Train Epoch: 1 [0/60032 (0%)] Loss: 2.303720
Train Epoch: 1 [1920/60032 (3%)] Loss: 3.148396
Train Epoch: 1 [3840/60032 (6%)] Loss: 4.953410
Train Epoch: 1 [5760/60032 (10%)] Loss: 6.552952
Train Epoch: 1 [7680/60032 (13%)] Loss: 8.274794
Train Epoch: 1 [9600/60032 (16%)] Loss: 9.585523
Train Epoch: 1 [11520/60032 (19%)] Loss: 11.709550
Train Epoch: 1 [13440/60032 (22%)] Loss: 14.805451
Train Epoch: 1 [15360/60032 (26%)] Loss: 14.508088
Train Epoch: 1 [17280/60032 (29%)] Loss: 18.298759
Train Epoch: 1 [19200/60032 (32%)] Loss: 16.543806
Train Epoch: 1 [21120/60032 (35%)] Loss: 19.365683
Train Epoch: 1 [23040/60032 (38%)] Loss: 23.546900
Train Epoch: 1 [24960/60032 (42%)] Loss: 26.782951
Train Epoch: 1 [26880/60032 (45%)] Loss: 28.923700
Train Epoch: 1 [28800/60032 (48%)] Loss: 28.413818
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<timed exec> in <module>
<ipython-input-6-72ddef6ce5b6> in train(args, model, device, federated_train_loader, optimizer, epoch)
8 loss = F.nll_loss(output, target)
9 loss.backward()
---> 10 optimizer.step()
11 model.get() # <-- NEW: get the model back
12 if batch_idx % args.log_interval == 0:
~/code/env/pysyft/lib/python3.7/site-packages/torch/optim/adam.py in step(self, closure)
92 # Decay the first and second moment running average coefficient
93 exp_avg.mul_(beta1).add_(1 - beta1, grad)
---> 94 exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
95 if amsgrad:
96 # Maintains the maximum of all 2nd moment running avg. till now
~/code/PySyft/syft/frameworks/torch/hook.py in overloaded_native_method(self, *args, **kwargs)
650 # Send the new command to the appropriate class and get the response
651 method = getattr(new_self, method_name)
I suspect there is a bug in the hooking as the loss is increasing
Ok here is the trouble: Adam uses momentum (actually: second moments of the gradients), which means it stores the gradients in a list and for each batch produces a correction of the current gradient based on the old ones. When changing of batch owner (so in Part 8 of tutorial at the middle of the epoch), you have now gradients from alice which you want to correct with moments of old gradients owned by bob: this is not possible as the data needs to be at the same location and it raises an error, which is here a bit tricky to find.
This is also why momentum is not supported so far on SGD.
The fix for this would probably imply to rewrite the optimizers. This is an important project and maybe could be correlated to the notions of aggregator needed for Federated or Secure Averaging.
Thank you for reporting the error!
Is there a way for us to perhaps clear out the momentum as needed? (reset to None)
Hey @2fasc - seems like the solution here is to have different optimizers for different machines (one for bob, another for alice, etc.) Eventually we'll write custom Federated optimizers but for now this is the solution :)
Fastest way to do this is probably to have a dict of optimizers, the key being the data.location, this would keep the code simple
Just wondering, is this related to #1909 ?
@mari-linhares exactly!
Yes, according @iamtrask , you can write a optimizer list like Part4 or Par10, then you can use Adam, Adadelta or other optimizer with momentum. @2fasc
@LaRiffle I've experienced the increasing loss issues with my Adam as well.
But, since it wasn't raising errors, I tried to see where things went numerically wrong.
Somehow, in Adam's step() method, changing this line
exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)
to this line
exp_avg_sq.mul_(beta2).add_(grad.pow(2).mul_(1 - beta2))
and this line
p.data.addcdiv_(-step_size, exp_avg, denom)
to this line
p.data.add_(exp_avg.div(denom).mul_(-step_size))
made it work and the loss decreased with no problem.
So, it seems like addcmul_() and addcdiv_() functions have issues working with PySyft, but doing what those functions do step by step somehow fixes them.
@LaRiffle I've experienced the increasing loss issues with my Adam as well.
But, since it wasn't raising errors, I tried to see where things went numerically wrong.
Somehow, in Adam's
step()method, changing this lineexp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)to this line
exp_avg_sq.mul_(beta2).add_(grad.pow(2).mul_(1 - beta2))and this line
p.data.addcdiv_(-step_size, exp_avg, denom)to this line
p.data.add_(exp_avg.div(denom).mul_(-step_size))made it work and the loss decreased with no problem.
So, it seems like
addcmul_()andaddcdiv_()functions have issues working with PySyft, but doing what those functions do step by step somehow fixes them.
cool
When this is fixed remember to remove the TODO at https://github.com/OpenMined/PySyft/blob/319d59c9aa3ab9b8dfda95c42f7dd5978b3bc48a/examples/tutorials/advanced/websockets-example-MNIST/run_websocket_client.py#L55
Just a small report, I'm running into a similar error, but it appears a few lines earlier:
exp_avg.mul_(beta1).add_(1 - beta1, grad)
The following error appears:
TypeError: add_() takes 1 positional argument but 2 were given
So it's to be a more general problem than only specific functions like addcmul_() and addcdiv_()
Just a small report, I'm running into a similar error, but it appears a few lines earlier:
exp_avg.mul_(beta1).add_(1 - beta1, grad)The following error appears:
TypeError: add_() takes 1 positional argument but 2 were givenSo it's to be a more general problem than only specific functions like
addcmul_()andaddcdiv_()
I encountered an error in the same line, but it raises
syft.exceptions.PureFrameworkTensorFoundError
Syft 0.1.26a1
Torch 1.1.0
Any solutions yet?
Yes - the solution was the FL Optimizer project. We can close this issue.
On Tue, Mar 17, 2020 at 12:49 PM Aditya Malte notifications@github.com
wrote:
Any solutions yet?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/OpenMined/PySyft/issues/2070#issuecomment-600052348,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABBAZEQSAB62MQJMRJT2MQTRH5WW5ANCNFSM4HF52S2Q
.
Could you give a link please?
Yes - the solution was the FL Optimizer project. We can close this issue.
…
On Tue, Mar 17, 2020 at 12:49 PM Aditya Malte @.*> wrote: Any solutions yet? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2070 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBAZEQSAB62MQJMRJT2MQTRH5WW5ANCNFSM4HF52S2Q .
Alright, I'll close the issue then
Most helpful comment
@LaRiffle I've experienced the increasing loss issues with my Adam as well.
But, since it wasn't raising errors, I tried to see where things went numerically wrong.
Somehow, in Adam's
step()method, changing this lineto this line
and this line
to this line
made it work and the loss decreased with no problem.
So, it seems like
addcmul_()andaddcdiv_()functions have issues working with PySyft, but doing what those functions do step by step somehow fixes them.