Pysyft: Federated Recurrent Neural Network Tutorial Issue with Predict Method (Using Two Raspberry Pi's)

Created on 1 Jun 2019  路  7Comments  路  Source: OpenMined/PySyft

Describe the bug
I was following the great tutorial put out by @DanyEle and ran into an issue on the very last step of the Jupyter notebook tutorial with the predict comparison.

I am getting the following error after it prints the line "Qing":

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I am trying to understand what the above error is but I can't figure it out. I have a suspicion that this error is occurring due to some error in communication with my two Raspberry Pi's that I have set up for the federated training. I am double checking all my installed packages to make sure everything is the same between the Pi's and control (my laptop).

I am going to try training again and make sure my packages are up to date. Will add error message if the issue persists.

UPDATE: Error is still persisting - I have added my error log below.

Thanks!

To Reproduce
Steps to reproduce the behavior:

  1. Follow directions from great tutorial and set up the two Pi's
  2. Follow the directions in the Jupyter notebook tutorial
  3. Check output of cell 24

Expected behavior
Should output something such as:

Qing
(-0.18) Chinese
(-2.85) German
(-3.16) Italian

Qing
(-0.73) Vietnamese
(-1.23) Korean
(-2.90) Arabic

Daniele
(-1.32) Japanese
(-1.47) German
(-1.69) Greek

Daniele
(-1.95) French
(-2.04) Scottish
(-2.13) Irish

Screenshots
None

Desktop (please complete the following information):

  • OS: Ubuntu
  • Version 16.04

Additional context

Error traceback

> Qing
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-6650cbe85de0> in <module>
      1 
----> 2 predict(model_pointers["alice"], "Qing", alice)
      3 predict(model_pointers["bob"], "Qing", bob)
      4 
      5 predict(model_pointers["alice"], "Daniele", alice)

<ipython-input-22-3a84bf13c0eb> in predict(model, input_line, worker, n_predictions)
      3     print('\n> %s' % input_line)
      4     with torch.no_grad():
----> 5         model_remote = model.send(worker)
      6         line_tensor = lineToTensor(input_line)
      7         line_remote = line_tensor.copy().send(worker)

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in module_send_(nn_self, dest)
    950 
    951             if module_is_missing_grad(nn_self):
--> 952                 create_grad_objects(nn_self)
    953 
    954             for p in nn_self.parameters():

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in create_grad_objects(model)
    943             for p in model.parameters():
    944                 o = p.sum()
--> 945                 o.backward()
    946                 p.grad -= p.grad
    947 

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    651                 except BaseException as e:
    652                     # we can make some errors more descriptive with this method
--> 653                     raise route_method_exception(e, self, args, kwargs)
    654 
    655             else:  # means that there is a wrapper to remove

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    645                 try:
    646                     if isinstance(args, tuple):
--> 647                         response = method(*args, **kwargs)
    648                     else:
    649                         response = method(args, **kwargs)

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to ``False``.
    106         """
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    108 
    109     def register_hook(self, hook):

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Type

All 7 comments

Hi, I'm glad to know you're actually trying my code for Raspberry Pis and that it all seems to be running up to the last point. I have two questions:

  1. Which version of PySyft are you using on the Raspberry PIs and on your laptop? How did you install it?

  2. Did you try using the model locally by getting it with model.copy().get() at all places where it is used in the predict phase? It may be an issue with the remote execution, so running it locally with native PyTorch would solve it.

Hey @DanyEle in response to your questions:

1a. Which version of PySyft are you using on the Raspberry Pi's? How did you install it?

  • PySyft on Raspberry Pi: syft==0.1.14a1

    • Installed following your directions in the article you wrote using the commands:

git clone https://github.com/OpenMined/PySyft.git
cd PySyft
pip3 install -r requirements.txt
python3 setup.py build
sudo -E python3 setup.py install

1b. Which version of PySyft are you using on your laptop? How did you install it?

  • PySyft on laptop running Ubuntu 16.04: syft==0.1.14a1

    • Installed via running pip install syft==0.1.14a1 in a conda python environment (the specific versioning was to match the Raspberry Pi's version of PySyft).

  1. Did you try using the model locally by getting it with model.copy().get() at all places where it is used in the predict phase? It may be an issue with the remote execution, so running it locally with native PyTorch would solve it.

    • Trying this now - will update soon.

Hey @DanyEle - tried your suggestions on part two but still the same error as before; here is the predict function that I was trying to use per your suggestion:

def predict(model, input_line, worker, n_predictions=3):
    model = model.copy().get()
    print('\n> %s' % input_line)
    with torch.no_grad():
        model_remote = model.send(worker).copy().get()
        line_tensor = lineToTensor(input_line)
        line_remote = line_tensor.copy().send(worker)
        #line_tensor = lineToTensor(input_line)
        #output = evaluate(model, line_remote)
        # Get top N categories
        hidden = model_remote.initHidden()
        hidden_remote = hidden.copy().send(worker)

        for i in range(line_remote.shape[0]):
            output, hidden_remote = model_remote(line_remote[i], hidden_remote)

        topv, topi = output.copy().get().topk(n_predictions, 1, True)
        predictions = []

        for i in range(n_predictions):
            value = topv[0][i].item()
            category_index = topi[0][i].item()
            print('(%.2f) %s' % (value, all_categories[category_index]))
            predictions.append([value, all_categories[category_index]])


P.S. For reference, here is the error log:



> Qing
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-23-6650cbe85de0> in <module>
      1 
----> 2 predict(model_pointers["alice"], "Qing", alice)
      3 predict(model_pointers["bob"], "Qing", bob)
      4 
      5 predict(model_pointers["alice"], "Daniele", alice)

<ipython-input-22-7031e812578d> in predict(model, input_line, worker, n_predictions)
      3     print('\n> %s' % input_line)
      4     with torch.no_grad():
----> 5         model_remote = model.send(worker).copy().get()
      6         line_tensor = lineToTensor(input_line)
      7         line_remote = line_tensor.copy().send(worker)

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in module_send_(nn_self, dest)
    950 
    951             if module_is_missing_grad(nn_self):
--> 952                 create_grad_objects(nn_self)
    953 
    954             for p in nn_self.parameters():

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in create_grad_objects(model)
    943             for p in model.parameters():
    944                 o = p.sum()
--> 945                 o.backward()
    946                 p.grad -= p.grad
    947 

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    651                 except BaseException as e:
    652                     # we can make some errors more descriptive with this method
--> 653                     raise route_method_exception(e, self, args, kwargs)
    654 
    655             else:  # means that there is a wrapper to remove

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    645                 try:
    646                     if isinstance(args, tuple):
--> 647                         response = method(*args, **kwargs)
    648                     else:
    649                         response = method(args, **kwargs)

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to ``False``.
    106         """
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    108 
    109     def register_hook(self, hook):

/home/src/Programs/miniconda3/envs/federated/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Alright, I looked into it and it was because of the misplaced line with torch.no_grad():, which was referring to a larger block. Here is the fixed code for the predict method

def predict(model, input_line, worker, n_predictions=3):
    model = model.copy().get()
    print('\n> %s' % input_line)
    model_remote = model.send(worker)
    line_tensor = lineToTensor(input_line)
    line_remote = line_tensor.copy().send(worker)
    #line_tensor = lineToTensor(input_line)
    #output = evaluate(model, line_remote)
    # Get top N categories
    hidden = model_remote.initHidden()
    hidden_remote = hidden.copy().send(worker)

    with torch.no_grad():
        for i in range(line_remote.shape[0]):
            output, hidden_remote = model_remote(line_remote[i], hidden_remote)

    topv, topi = output.copy().get().topk(n_predictions, 1, True)
    predictions = []

    for i in range(n_predictions):
        value = topv[0][i].item()
        category_index = topi[0][i].item()
        print('(%.2f) %s' % (value, all_categories[category_index]))
        predictions.append([value, all_categories[category_index]])

Hey @DanyEle and @iamtrask

Can confirm, this bug is now fixed! :tada: :tada: :tada:

Also, @DanyEle since I have your attention here, I wanted to also make you aware of some errors and thoughts I encountered while following your article. Here they are:

  • When you are trying to install PyTorch on the RPi, you must first have installed pyyaml==5.1
  • I had to run sudo apt-get update && sudo reboot after installing PyTorch on the RPi as I had some issues moving forward with installs
  • In the requirements.txt for PySyft (when you are trying to build and install it on the RPi) you need to remove the following packages for it to compile/build correctly:

    • torch

  • On Part 2 in Section 1 of "Start the worker servers on the Raspberry PIs" - I was going crazy until I figured out there was an error in the command you specified - it should be as follows:

    • Raspberry Pi 1: python run_websocket_server.py --host 10.42.0.55 --port 8777 --id alice

    • Raspberry Pi 2: python run_websocket_server.py --host 10.42.0.56 --port 8778 --id bob

  • It is worth making a note in the instructions that training on the RPi using the federated method often takes a very long time - for me, it took about 2 hours for my RPi 3B+'s
  • I found the instructions about using screen confusing so you may want to clear that up just a little bit or include pictures about how to use this with respect to connecting to a raspberry pi
  • I found setting the alias alias python=python3.6 while following your instructions so I would suggest adding that as a way of setting your default RPi Python to Python3.6

I also think it may be useful to have actually demonstrated how to assign a static IP to the Raspberry Pi's; I stubbed out the basic outline of how to do that here (these steps are to be followed on the RPi's):

  • Run netstat -nr

    • Write down the Gateway Address associated with Destination 0.0.0.0
  • Run sudo nano /etc/dhcpcd.conf and at the top of the file write,

    • ip_address=[Write in the Gateway address but with the last numbers specified to a number you want]/24 (example: if my gateway address is 10.42.0.1, then I would write in something like: ip_address=10.42.0.77/24)
    • static routers=[Write down the Gateway address]
    • static domain_name_servers=[Write down the Gateway address from earlier]
  • Reboot the RPi and SSH into the RPi using the ip_address you just specified earlier.

I hope this was helpful - thanks for the help with setting this up properly; this was awesome! Keep up the great work and good luck with grad school @DanyEle!

Thank you very much @TheCedarPrince for the useful fixes to the tutorial I wrote and good job in setting it all up on the two Raspberry PIs! I hope the tutorial helped you with some guidance on how to go through the whole procedure to setup PySyft and Pytorch, at least. :)

I'm surely going to apply your fixes to my tutorial, but a static IP address, even if suggested to carry out experiments multiple times, is not actually a necessary condition to run the RNN example.

Well since everything is fixed here, I am going to close the issue!

This was more than helpful - it is was an amazing piece of work and I am looking forward to looking at other work on PySyft and PyTorch. : )

Was this page helpful?
0 / 5 - 0 ratings

Related issues

akirahirohito picture akirahirohito  路  3Comments

iamtrask picture iamtrask  路  3Comments

aristizabal95 picture aristizabal95  路  3Comments

mgale694 picture mgale694  路  3Comments

tblazina picture tblazina  路  3Comments