Describe the bug
I was following the great tutorial put out by @DanyEle and ran into an issue on the very last step of the Jupyter notebook tutorial with the predict comparison.
I am getting the following error after it prints the line "Qing":
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I am trying to understand what the above error means, but I can't figure it out. I suspect it is caused by a communication problem with the two Raspberry Pis that I have set up for the federated training. I am double-checking all my installed packages to make sure they match between the Pis and the control machine (my laptop).
I am going to try training again and make sure my packages are up to date. Will add error message if the issue persists.
UPDATE: Error is still persisting - I have added my error log below.
Thanks!
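For anyone doing the same package check, one way to make it concrete is to compare the `pip freeze` output from the laptop against the output from each Pi. The sketch below is only an illustration: the helper names `parse_freeze` and `version_mismatches` and the sample pins are hypothetical and do not come from the tutorial.

```python
def parse_freeze(text):
    """Parse `pip freeze` style lines into a {package: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not plain `name==version` pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def version_mismatches(freeze_a, freeze_b):
    """Return {package: (version_a, version_b)} for packages pinned differently."""
    a, b = parse_freeze(freeze_a), parse_freeze(freeze_b)
    shared = sorted(a.keys() & b.keys())
    return {name: (a[name], b[name]) for name in shared if a[name] != b[name]}


# Hypothetical sample data; in practice, paste the real `pip freeze` output
# captured on each machine.
laptop_freeze = "pkg-a==1.0\npkg-b==2.0\n"
pi_freeze = "pkg-a==1.0\npkg-b==2.1\n"

mismatches = version_mismatches(laptop_freeze, pi_freeze)
print(mismatches)
```

An empty result only means the shared pins agree; packages installed on one machine but not the other are ignored by this sketch.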
Expected behavior
Should output something such as:
Qing
(-0.18) Chinese
(-2.85) German
(-3.16) Italian
Qing
(-0.73) Vietnamese
(-1.23) Korean
(-2.90) Arabic
Daniele
(-1.32) Japanese
(-1.47) German
(-1.69) Greek
Daniele
(-1.95) French
(-2.04) Scottish
(-2.13) Irish
Screenshots
None
Additional context
Error traceback
> Qing
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-23-6650cbe85de0> in <module>
1
----> 2 predict(model_pointers["alice"], "Qing", alice)
3 predict(model_pointers["bob"], "Qing", bob)
4
5 predict(model_pointers["alice"], "Daniele", alice)
<ipython-input-22-3a84bf13c0eb> in predict(model, input_line, worker, n_predictions)
3 print('\n> %s' % input_line)
4 with torch.no_grad():
----> 5 model_remote = model.send(worker)
6 line_tensor = lineToTensor(input_line)
7 line_remote = line_tensor.copy().send(worker)
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in module_send_(nn_self, dest)
950
951 if module_is_missing_grad(nn_self):
--> 952 create_grad_objects(nn_self)
953
954 for p in nn_self.parameters():
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in create_grad_objects(model)
943 for p in model.parameters():
944 o = p.sum()
--> 945 o.backward()
946 p.grad -= p.grad
947
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
651 except BaseException as e:
652 # we can make some errors more descriptive with this method
--> 653 raise route_method_exception(e, self, args, kwargs)
654
655 else: # means that there is a wrapper to remove
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
645 try:
646 if isinstance(args, tuple):
--> 647 response = method(*args, **kwargs)
648 else:
649 response = method(args, **kwargs)
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
105 products. Defaults to ``False``.
106 """
--> 107 torch.autograd.backward(self, gradient, retain_graph, create_graph)
108
109 def register_hook(self, hook):
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
91 Variable._execution_engine.run_backward(
92 tensors, grad_tensors, retain_graph, create_graph,
---> 93 allow_unreachable=True) # allow_unreachable flag
94
95
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Hi, I'm glad to know you're actually trying my code for Raspberry Pis and that it all seems to run up to the last step. I have two questions:
1. Which version of PySyft are you using on the Raspberry Pis and on your laptop? How did you install it?
2. Did you try using the model locally, by getting it with model.copy().get() everywhere it is used in the predict phase? It may be an issue with the remote execution, in which case running it locally with native PyTorch should work around it.
Hey @DanyEle in response to your questions:
1a. Which version of PySyft are you using on the Raspberry Pis? How did you install it?
syft==0.1.14a1, built from source:
git clone https://github.com/OpenMined/PySyft.git
cd PySyft
pip3 install -r requirements.txt
python3 setup.py build
sudo -E python3 setup.py install
1b. Which version of PySyft are you using on your laptop? How did you install it?
syft==0.1.14a1, installed with pip install syft==0.1.14a1 in a conda Python environment (the version was pinned to match the PySyft version on the Raspberry Pis).
Hey @DanyEle, I tried your suggestion from the second question but still get the same error as before. Here is the predict function I used, per your suggestion:
def predict(model, input_line, worker, n_predictions=3):
    model = model.copy().get()
    print('\n> %s' % input_line)
    with torch.no_grad():
        model_remote = model.send(worker).copy().get()
        line_tensor = lineToTensor(input_line)
        line_remote = line_tensor.copy().send(worker)
        #line_tensor = lineToTensor(input_line)
        #output = evaluate(model, line_remote)
        # Get top N categories
        hidden = model_remote.initHidden()
        hidden_remote = hidden.copy().send(worker)
        for i in range(line_remote.shape[0]):
            output, hidden_remote = model_remote(line_remote[i], hidden_remote)
        topv, topi = output.copy().get().topk(n_predictions, 1, True)
        predictions = []
        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])
P.S. For reference, here is the error log:
> Qing
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-23-6650cbe85de0> in <module>
1
----> 2 predict(model_pointers["alice"], "Qing", alice)
3 predict(model_pointers["bob"], "Qing", bob)
4
5 predict(model_pointers["alice"], "Daniele", alice)
<ipython-input-22-7031e812578d> in predict(model, input_line, worker, n_predictions)
3 print('\n> %s' % input_line)
4 with torch.no_grad():
----> 5 model_remote = model.send(worker).copy().get()
6 line_tensor = lineToTensor(input_line)
7 line_remote = line_tensor.copy().send(worker)
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in module_send_(nn_self, dest)
950
951 if module_is_missing_grad(nn_self):
--> 952 create_grad_objects(nn_self)
953
954 for p in nn_self.parameters():
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in create_grad_objects(model)
943 for p in model.parameters():
944 o = p.sum()
--> 945 o.backward()
946 p.grad -= p.grad
947
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
651 except BaseException as e:
652 # we can make some errors more descriptive with this method
--> 653 raise route_method_exception(e, self, args, kwargs)
654
655 else: # means that there is a wrapper to remove
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
645 try:
646 if isinstance(args, tuple):
--> 647 response = method(*args, **kwargs)
648 else:
649 response = method(args, **kwargs)
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
105 products. Defaults to ``False``.
106 """
--> 107 torch.autograd.backward(self, gradient, retain_graph, create_graph)
108
109 def register_hook(self, hook):
/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
91 Variable._execution_engine.run_backward(
92 tensors, grad_tensors, retain_graph, create_graph,
---> 93 allow_unreachable=True) # allow_unreachable flag
94
95
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
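Alright, I looked into it: the error was caused by the misplaced with torch.no_grad(): line, which wrapped a larger block than it should have. That block included model.send(worker), whose internal backward() call (see create_grad_objects in the traceback) cannot run while gradients are disabled. Here is the fixed code for the predict method: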
def predict(model, input_line, worker, n_predictions=3):
    model = model.copy().get()
    print('\n> %s' % input_line)
    model_remote = model.send(worker)
    line_tensor = lineToTensor(input_line)
    line_remote = line_tensor.copy().send(worker)
    #line_tensor = lineToTensor(input_line)
    #output = evaluate(model, line_remote)
    # Get top N categories
    hidden = model_remote.initHidden()
    hidden_remote = hidden.copy().send(worker)
    with torch.no_grad():
        for i in range(line_remote.shape[0]):
            output, hidden_remote = model_remote(line_remote[i], hidden_remote)
    topv, topi = output.copy().get().topk(n_predictions, 1, True)
    predictions = []
    for i in range(n_predictions):
        value = topv[0][i].item()
        category_index = topi[0][i].item()
        print('(%.2f) %s' % (value, all_categories[category_index]))
        predictions.append([value, all_categories[category_index]])
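The failure can also be reproduced with plain PyTorch, without PySyft or any remote worker. The sketch below mimics the p.sum() followed by o.backward() pattern that appears in create_grad_objects in the traceback. The parameter `p` and the helper `try_backward_on_sum` are hypothetical stand-ins for illustration, not code from PySyft or the tutorial.

```python
import torch

# A stand-in parameter; in the thread this would be one of the model's weights.
p = torch.nn.Parameter(torch.ones(3))


def try_backward_on_sum(param):
    """Mimic the `o = p.sum(); o.backward()` pattern seen in the traceback."""
    o = param.sum()
    try:
        o.backward()
        return "ok"
    except RuntimeError as err:
        return str(err)


# Inside no_grad, sum() is not recorded by autograd, so the result has no
# grad_fn and backward() raises a RuntimeError.
with torch.no_grad():
    inside = try_backward_on_sum(p)

# Outside no_grad, the same call builds a graph and backward() succeeds.
outside = try_backward_on_sum(p)

print("inside no_grad: ", inside)
print("outside no_grad:", outside)
```

This suggests why moving model.send(worker) out of the with torch.no_grad(): block helps: any backward() call made during the send needs gradient tracking to be enabled, while the forward pass used for prediction does not.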
Hey @DanyEle and @iamtrask
pyyaml==5.1
sudo apt-get update && sudo reboot after installing PyTorch on the RPi, as I had some issues moving forward with installs
requirements.txt for PySyft (when you are trying to build and install it on the RPi): you need to remove the following packages for it to compile/build correctly:
python run_websocket_server.py --host 10.42.0.55 --port 8777 --id alice
python run_websocket_server.py --host 10.42.0.56 --port 8778 --id bob
alias python=python3.6 while following your instructions, so I would suggest adding that as a way of setting your default RPi Python to Python 3.6
I also think it would be useful to demonstrate how to assign a static IP to the Raspberry Pis; I stubbed out a basic outline of how to do that here (these steps are to be followed on the RPis):
Run netstat -nr and note the Gateway address.
Run sudo nano /etc/dhcpcd.conf and, at the top of the file, write:
static ip_address=[the Gateway address with its last number replaced by a number of your choice]/24 (example: if my gateway address is 10.42.0.1, I would write something like static ip_address=10.42.0.77/24)
static routers=[the Gateway address]
static domain_name_servers=[the Gateway address]
Reboot the RPi and SSH into it using the ip_address you just specified.
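The address arithmetic in these steps can be sketched in a few lines of Python using the standard ipaddress module. This is a minimal illustration only: the function name `static_ip_lines` and its parameters are hypothetical, and it assumes a /24 network by default, as in the example above. Note that dhcpcd.conf expects each of these settings to carry the `static` prefix.

```python
import ipaddress


def static_ip_lines(gateway, host_number, prefix_len=24):
    """Build dhcpcd.conf static-address lines from a gateway and a host number."""
    gw = ipaddress.IPv4Address(gateway)
    # strict=False lets us derive the network from a host address like 10.42.0.1.
    network = ipaddress.IPv4Network(f"{gateway}/{prefix_len}", strict=False)
    address = network.network_address + host_number
    reserved = (network.network_address, network.broadcast_address, gw)
    if address not in network or address in reserved:
        raise ValueError(f"{address} is not a usable host address in {network}")
    return [
        f"static ip_address={address}/{prefix_len}",
        f"static routers={gw}",
        f"static domain_name_servers={gw}",
    ]


# Example from the steps above: gateway 10.42.0.1, chosen host number 77.
lines = static_ip_lines("10.42.0.1", 77)
print("\n".join(lines))
```

The check against the network, broadcast, and gateway addresses is there so that an unusable host number fails early instead of producing a config that takes the Pi off the network.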
Thank you very much @TheCedarPrince for the useful fixes to the tutorial I wrote, and good job setting it all up on the two Raspberry Pis! I hope the tutorial at least gave you some guidance on the whole procedure for setting up PySyft and PyTorch. :)
I'm surely going to apply your fixes to my tutorial. Note, though, that a static IP address, while advisable if you run the experiments multiple times, is not strictly required to run the RNN example.
Well since everything is fixed here, I am going to close the issue!
This was more than helpful; it was an amazing piece of work, and I am looking forward to exploring other work on PySyft and PyTorch. : )