I am trying generic code to load a model's state dict on CPU or GPU. It works fine for other vision models but fails for VGG models on GPU machines.
Here is a sample of the code I am trying:
import torch
map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
model_pt_path = 'vgg13-c768596a.pth'
state_dict = torch.load(model_pt_path, map_location=map_location)
The above code fails with the following error:
Traceback (most recent call last):
File "test_vgg.py", line 4, in <module>
state_dict = torch.load(model_pt_path, map_location=map_location)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 147, in __setstate__
self.set_(*state)
RuntimeError: Expected object of device type cpu but got device type cuda for argument #2 'source'
I tried the same code with DenseNet/AlexNet/SqueezeNet models, and it works fine on both GPU and CPU machines.
Environment information:
OS: Ubuntu 18.04
PyTorch Version: 1.5.1
TorchVision Version: 0.6.1
To narrow this down a little: the error only happens for the vgg11 and vgg13 weights.
This relates to https://github.com/pytorch/vision/issues/2068: although the error message is different, the root issue is the same (a very old serialization format).
The workaround on your side for now could be to avoid using map_location for CUDA and to convert the tensors to GPU manually if needed (or even just pass the CPU weights to load_state_dict while the model is on CUDA; I think that will also work).
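A minimal sketch of that workaround, using a small stand-in nn.Linear module instead of torchvision's vgg13 (the temporary file name tmp_weights.pth is made up for illustration): load the checkpoint onto CPU, feed it to load_state_dict, and only then move the model to CUDA.

```python
import torch
import torch.nn as nn

# Tiny stand-in model; in the real case this would be torchvision.models.vgg13()
model = nn.Linear(4, 2)

# Simulate a checkpoint on disk (stands in for vgg13-c768596a.pth)
torch.save(model.state_dict(), 'tmp_weights.pth')

# Workaround: always deserialize onto CPU instead of mapping straight to CUDA,
# which avoids the legacy-format error on GPU machines
state_dict = torch.load('tmp_weights.pth', map_location='cpu')

# load_state_dict copies the CPU tensors into the model's parameters,
# so this works even when the model will live on a GPU
model.load_state_dict(state_dict)

# Move the model to GPU afterwards, if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# All tensors in the loaded dict are on CPU
print(all(t.device.type == 'cpu' for t in state_dict.values()))
```

Since load_state_dict copies values rather than aliasing them, passing CPU tensors is safe regardless of where the model's parameters reside.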