After a worker trains the model locally, I call model.get() to retrieve it and get the following runtime error: "Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_set_".
I am training on a GPU (the same code runs perfectly on CPU) and I am using a resnet50 model.
optimizer = optim.Adam(model.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
model.train()
model.send(worker)
for batch_idx, (data, target) in enumerate(batches):
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    loss = loss.get()
model.get() # <-- This get causes the error
Hey, thanks for reporting!
Can you also provide your stack trace please? :)
Hi @Hjeljeli @LaRiffle, I encountered the same problem a few months ago, please refer to the comments in #3848 for further explanation.
@LaRiffle
Hi Theo, please find my stack trace below. Thanks for your help!
<ipython-input-16-1a450dc3c5d4> in <module>
3 logging.basicConfig(format=FORMAT, level=LOG_LEVEL)
4
----> 5 main()
<ipython-input-15-443eb06bbc7c> in main()
208 for epoch in range(1, epochs + 1):
209 logger.warning("Starting epoch %s/%s", epoch, epochs)
--> 210 model = train(model, device, federated_train_loader, test_loader, lr, federate_after_n_batches)
211 test(model, device, test_loader)
<ipython-input-15-443eb06bbc7c> in train(model, device, federated_train_loader, test_loader, lr, federate_after_n_batches, abort_after_one)
112 curr_batches = batches[worker]
113 if curr_batches:
--> 114 local_models[worker] = train_on_batches(worker, curr_batches, model, device, test_loader, lr)
115
116 else:
<ipython-input-15-443eb06bbc7c> in train_on_batches(worker, batches, model_in, device, test_loader, lr)
42 t1 = time.time()
43 # We measure accurancy of worker's model
---> 44 model.get()
45 accuracy = test(model, device, test_loader)
46 accuracies[worker].append(accuracy)
/usr/local/lib/python3.7/dist-packages/syft-0.2.7-py3.7.egg/syft/frameworks/torch/hook/hook.py in module_get_(nn_self)
669 for element_iter in tensor_iterator(nn_self):
670 for p in element_iter():
--> 671 p.get_()
672
673 if isinstance(nn_self.forward, Plan):
/usr/local/lib/python3.7/dist-packages/syft-0.2.7-py3.7.egg/syft/frameworks/torch/tensors/interpreters/native.py in get_(self, *args, **kwargs)
685 Calls get() with inplace option set to True
686 """
--> 687 return self.get(*args, inplace=True, **kwargs)
688
689 def allow(self, user=None) -> bool:
/usr/local/lib/python3.7/dist-packages/syft-0.2.7-py3.7.egg/syft/frameworks/torch/tensors/interpreters/native.py in get(self, inplace, user, reason, *args, **kwargs)
672
673 if inplace:
--> 674 self.set_(tensor)
675 if hasattr(tensor, "child"):
676 self.child = tensor.child
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_set_
+1, I also ran into this issue with code very similar to the above, except using mobilenet instead of resnet. The RuntimeError message is the same. The issue disappears when I use a neural net I define myself, so I think it could be an interop problem with the torch model zoo models?
(For reference, I had this same issue back in ~May on syft ~0.2.4 but didn't report it; unfortunately some other projects pulled me away from this one.)
Thank you for reporting it! The problem might be that the tensors need to be moved to the same device (in this case the GPU) before set_ is called.
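To illustrate the suggestion above, here is a minimal sketch in plain PyTorch, not a PySyft patch. It assumes the failure comes from calling set_ on a tensor that lives on a different device than the incoming one; the helper name set_on_matching_device is hypothetical and exists only for this example.

```python
import torch


def set_on_matching_device(dst: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
    """Point dst at src's data, moving src to dst's device first.

    Hypothetical helper for illustration; not part of PySyft or PyTorch.
    """
    # Tensor.set_ needs both tensors on the same device type, so move the
    # incoming tensor to wherever the destination currently lives.
    src = src.to(dst.device)
    with torch.no_grad():
        dst.set_(src)
    return dst


if __name__ == "__main__":
    # Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    local = torch.zeros(3)                   # placeholder kept on the CPU
    received = torch.ones(3, device=device)  # stands in for the retrieved tensor
    set_on_matching_device(local, received)
    print(local.device, local)
```

Whether the retrieved model should end up on the CPU or on the GPU is a separate choice; the sketch only shows that aligning the devices before set_ avoids the mismatch.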
This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.
Updating so this issue is active.
Hello! Just letting you know that we are no longer planning on supporting anything on the 0.2.x product line and that all work should be ported over to 0.3.x, which is considered a complete rebuild of PySyft. Because of that, I'll be closing this issue. If you feel this is a mistake, or if the issue actually applies to 0.3.x as well, please feel free to ping me on Slack and I'll reopen the issue.
@Hjeljeli Is this issue fixed in 0.3.x? I am currently stuck with this bug in 0.2.9 :(.
@naveenggmu Hi Naveen, I could not test on 0.3.x. As you know, the 0.3.x releases do not support all the privacy-preserving techniques that 0.2.x used to support, and I am using PySyft for Federated Learning.