PySyft: Moving a module to CUDA throws errors

Created on 10 Feb 2019 · 12 Comments · Source: OpenMined/PySyft

```
import pdb

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
from torchvision import datasets, transforms
from torchvision.datasets.mnist import MNIST

import syft as sy


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        # 4*4*50 matches the flattened size used in forward()
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


# Hook torch first, then try to move the model to the GPU
hook = sy.TorchHook(torch)
device = torch.device("cuda")
model = Net().to(device)  # raises the RuntimeError reported in this issue
print(model)
```

Good first issue Priority Type

All 12 comments

Can you add the stacktrace?
One good-first-issue would be the case where you only have a CPU and call .to(device): the error there is only due to us not serializing devices, and it could be easily handled.
_The part with the GPU and CUDA is not part of the good-first-issue._

Traceback (most recent call last):
  File "test.py", line 37, in <module>
    model = Net().to(device)
  File "/network/home/maloneyj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/network/home/maloneyj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/network/home/maloneyj/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/network/home/maloneyj/PySyft/syft/frameworks/torch/hook.py", line 339, in data
    self.native_param_data.set_(new_data)  # .wrap()
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'source'

In data(), new_data is on CUDA whereas native_param_data is on the CPU.
The fix should be to move new_data to the CPU when setting native_param_data.
Please correct me if I'm wrong.

FYI, for the code snippet above I'm able to fix the error by adding this line after importing torch:

torch.set_default_tensor_type(torch.cuda.FloatTensor)

Not sure if this is ideal for every case...

@mari-linhares nice finding. This changes the default tensor type and should be considered a workaround for now. What we are truly looking for is to do the data transfer over CUDA.

@bhushan23 do you know why changing the default tensor type to torch.cuda.FloatTensor works? I'm not using my work computer right now, but I assume that since there's no error, the data is transferred over to CUDA. I'll check when I get home.

@mari-linhares
Try replacing torch.set_default_tensor_type(torch.cuda.FloatTensor) with torch.set_default_tensor_type(torch.FloatTensor) and you will be able to reproduce the original error.

set_default_tensor_type sets the type used for newly created tensors. I think setting it to torch.cuda.FloatTensor ensures the hook also uses torch.cuda.FloatTensor, hence no error, since both new_data and native_param_data are then on CUDA.

Ref: https://pytorch.org/docs/stable/torch.html#torch.set_default_tensor_type
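
The effect of the default tensor type on newly created tensors can be checked in isolation, without PySyft. This is only a minimal sketch: the helper `default_device_under` is a hypothetical name introduced for illustration, it assumes the original default was torch.FloatTensor when restoring it, and the CUDA branch only runs when a GPU is available. Recent PyTorch releases may emit a deprecation warning for set_default_tensor_type.

```python
import torch


def default_device_under(tensor_type):
    """Return the device type of a fresh tensor while `tensor_type` is the default.

    Hypothetical helper for illustration; it restores torch.FloatTensor
    afterwards, assuming that was the default before the call.
    """
    torch.set_default_tensor_type(tensor_type)
    try:
        return torch.tensor([1.0]).device.type
    finally:
        torch.set_default_tensor_type(torch.FloatTensor)


# With the CPU default, new tensors are created on the CPU.
cpu_default = default_device_under(torch.FloatTensor)

# With the CUDA default, new tensors are created on the GPU (if there is one).
if torch.cuda.is_available():
    cuda_default = default_device_under(torch.cuda.FloatTensor)
```

If the hook creates its internal tensors this way, they would follow whichever default is active at that moment, which is consistent with the explanation above.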

This issue is breaking tutorial 8 at the moment when used with CUDA.

[Python 3.7, PySyft from master branch]

I've tried to apply @mari-linhares's workaround, without success so far:

  • when set after distributing the dataset, the instruction crashes with IndexError: list index out of range
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook_args.py in hook_function_args(attr, args, kwargs, return_args_type)
    135         # TODO rename registry or use another one than for methods
--> 136         hook_args = hook_method_args_functions[attr]
    137         get_tensor_type_function = get_tensor_type_functions[attr]

KeyError: 'torch.set_default_tensor_type'

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-6-fb50db14b868> in <module>
      3 if device.type == "cuda":
      4   os.environ["CUDA_VISIBLE_DEVICES"] = "0"
----> 5   torch.set_default_tensor_type(torch.cuda.FloatTensor)

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook.py in overloaded_func(*args, **kwargs)
    691             cmd_name = f"{attr.__module__}.{attr.__name__}"
    692             command = (cmd_name, None, args, kwargs)
--> 693             response = TorchTensor.handle_func_command(command)
    694             return response
    695 

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    191             # Note that we return also args_type which helps handling case 3 in the docstring
    192             new_args, new_kwargs, new_type, args_type = syft.frameworks.torch.hook_args.hook_function_args(
--> 193                 cmd, args, kwargs, return_args_type=True
    194             )
    195             # This handles case 3: it redirects the command to the appropriate class depending

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook_args.py in hook_function_args(attr, args, kwargs, return_args_type)
    141     except (IndexError, KeyError, AssertionError):  # Update the function in case of an error
    142         args_hook_function, get_tensor_type_function = build_hook_args_function(
--> 143             args, return_tuple=True
    144         )
    145         # Store the utility functions in registries

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook_args.py in build_hook_args_function(args, return_tuple)
    171     # Build a function with this rule to efficiently the child type of the
    172     # tensor found in the args
--> 173     get_tensor_type_function = build_get_tensor_type(rule)
    174     return args_hook_function, get_tensor_type_function
    175 

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook_args.py in build_get_tensor_type(rules, layer)
    392 
    393     if first_layer:
--> 394         return lambdas[0]
    395     else:
    396         return lambdas

IndexError: list index out of range
  • when set right after import torch, data distribution fails with RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
RuntimeError                              Traceback (most recent call last)
<ipython-input-8-4085cd6569bc> in <module>
      5                        transforms.Normalize((0.1307,), (0.3081,))
      6                    ]))
----> 7     .federate((bob, alice)), # <-- NEW: we distribute the dataset across all the workers, it's now a FederatedDataset
      8     batch_size=args.batch_size, shuffle=True, **kwargs)
      9 

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/federated/dataset.py in dataset_federate(dataset, workers)
     89     datasets = []
     90     data_loader = torch.utils.data.DataLoader(dataset, batch_size=data_size)
---> 91     for dataset_idx, (data, targets) in enumerate(data_loader):
     92         worker = workers[dataset_idx % len(workers)]
     93         logger.debug("Sending data to worker %s", worker.id)

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    613         if self.num_workers == 0:  # same-process loading
    614             indices = next(self.sample_iter)  # may raise StopIteration
--> 615             batch = self.collate_fn([self.dataset[i] for i in indices])
    616             if self.pin_memory:
    617                 batch = pin_memory_batch(batch)

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    613         if self.num_workers == 0:  # same-process loading
    614             indices = next(self.sample_iter)  # may raise StopIteration
--> 615             batch = self.collate_fn([self.dataset[i] for i in indices])
    616             if self.pin_memory:
    617                 batch = pin_memory_batch(batch)

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torchvision/datasets/mnist.py in __getitem__(self, index)
     93 
     94         if self.transform is not None:
---> 95             img = self.transform(img)
     96 
     97         if self.target_transform is not None:

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
     58     def __call__(self, img):
     59         for t in self.transforms:
---> 60             img = t(img)
     61         return img
     62 

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torchvision/transforms/transforms.py in __call__(self, tensor)
    161             Tensor: Normalized Tensor image.
    162         """
--> 163         return F.normalize(tensor, self.mean, self.std, self.inplace)
    164 
    165     def __repr__(self):

/home/xxx/PySyft/venv/lib/python3.7/site-packages/torchvision/transforms/functional.py in normalize(tensor, mean, std, inplace)
    206     mean = torch.tensor(mean, dtype=torch.float32)
    207     std = torch.tensor(std, dtype=torch.float32)
--> 208     tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
    209     return tensor
    210 

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    637                 except BaseException as e:
    638                     # we can make some errors more descriptive with this method
--> 639                     raise route_method_exception(e, self, args, kwargs)
    640 
    641             else:  # means that there is a wrapper to remove

/home/xxx/PySyft/venv/lib/python3.7/site-packages/syft-0.1.14a1-py3.7.egg/syft/frameworks/torch/hook/hook.py in overloaded_native_method(self, *args, **kwargs)
    631                 try:
    632                     if isinstance(args, tuple):
--> 633                         response = method(*args, **kwargs)
    634                     else:
    635                         response = method(args, **kwargs)

RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

This might also break other tutorials when using GPUs; I have not tried yet.

_Disclaimer: I needed this running (asap) to train a large-ish dataset on GPU, so it's more like a hacky workaround than an actual solution, but I'll be happy if it helps anyone with the same problem. Or can be used to build up a proper fix._

So I was experiencing the same problem as @jopasserat. I think the problem was that the function that handles function commands in hooks.py needed to convert a _non-tensor_ to a _torch.tensor_, but this case was not handled. So basically I am catching the IndexError that happens in those cases.

Apart from that, I'm using torch.set_default_tensor_type(torch.cuda.FloatTensor). I tried to convert the non-CUDA tensors to CUDA, but for some reason the .to() method doesn't seem to work on wrapped tensors for me. So unless I set it as the default, I get the problem @bhushan23 was mentioning. Plus, I need to set num_workers=0 (I think the method has been overwritten, so it does not run anymore) and pin_memory=False (as pinning only works with dense CPU tensors).
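
The DataLoader settings mentioned above would look roughly like this. It is a minimal sketch that does not involve PySyft or CUDA: the synthetic dataset, its shapes, and the batch size are arbitrary assumptions standing in for the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the real dataset: 8 MNIST-shaped images with labels.
images = torch.zeros(8, 1, 28, 28)
labels = torch.zeros(8, dtype=torch.long)

loader = DataLoader(
    TensorDataset(images, labels),
    batch_size=4,
    num_workers=0,     # load batches in the main process, no worker subprocesses
    pin_memory=False,  # skip pinning, which only applies to dense CPU tensors
)

batches = list(loader)
first_images, first_labels = batches[0]
```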

I am so glad to see the torch.set_default_tensor_type(torch.cuda.FloatTensor) workaround from @mari-linhares; it solved a problem I had been worrying about all afternoon.

I found a workaround that worked for my needs. The trick is to hook pysyft after you move your model to cuda.

What do you mean? Hook the worker? But if you hook after moving the model, how can you make a federated dataset? For example, in the tutorial at example>advanced>CIFAR10. Can you post your code?
