I just upgraded CUDA from 10.1 to 10.2 because the apex installation kept failing for no apparent reason. But I cannot figure out where torch.hub.load is calling libcudart.so.10.1 and raising the error. Any insight into reinstalling or rebuilding the dependencies would be appreciated.
import torch
import fairseq  # needed for the isinstance check below

torch.hub.list('pytorch/fairseq')  # [..., 'lightconv.glu.wmt17.zh-en', ... ]
zh2en = torch.hub.load('pytorch/fairseq', 'lightconv.glu.wmt17.zh-en', tokenizer='moses', bpe='subword_nmt')
assert isinstance(zh2en.models[0], fairseq.models.lightconv.LightConvModel)
zh2en.translate('你好 世界')
$ nvcc --version  # reports 10.2
>>> torch.version.cuda  # '10.2'
Can you reproduce this in pytorch alone? Something like:
import torch
x = torch.rand(5, 5).cuda()
torch.mm(x, x)
Or is this only happening when trying to use fairseq? How did you install pytorch? If you want to use CUDA 10.2, I think you need to specify it explicitly when installing: conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
Thank you for the quick reply. Yes, that definitely works. The error only occurs with some fairseq entry points (fairseq-generate and torch.hub.load); surprisingly, even fairseq-train works fine. I'm wondering whether some fairseq component was compiled against libcudart.so.10.1 during installation, since CUDA 10.1 no longer exists on my system.
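To test that suspicion, one way is to walk the installed package and run ldd on each compiled extension to see which CUDA runtime it links against. This is just a sketch: the package directory lookup and the assumption that stale extensions would show a libcudart.so.10.1 line are mine, and it requires a Linux system with ldd available.

```python
# Sketch: find compiled extension modules (.so) inside an installed package
# and report which CUDA runtime library each one links against.
import os
import subprocess
import sysconfig


def cudart_links(package_dir):
    """Return {extension_path: ldd output lines mentioning cudart}."""
    links = {}
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if not name.endswith(".so"):
                continue
            path = os.path.join(root, name)
            out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
            hits = [line.strip() for line in out.splitlines() if "cudart" in line]
            if hits:
                links[path] = hits
    return links


if __name__ == "__main__":
    # Assumed location of the fairseq install; adjust for your environment.
    site = sysconfig.get_paths()["purelib"]
    print(cudart_links(os.path.join(site, "fairseq")))
```

An extension still printing `libcudart.so.10.1 => not found` would confirm it was built against the old toolkit and needs rebuilding.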
Probably some of the fairseq components need to be recompiled. Try torch.hub.load(..., force_reload=True). Alternatively, you may need to clone the fairseq source and run pip install --editable .
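If force_reload does not pick up the stale build, deleting the cached checkout by hand also forces a fresh clone and recompile on the next load. A minimal sketch, assuming the default torch.hub cache layout of `<hub_dir>/<owner>_<repo>_<branch>` (you can confirm the directory with `torch.hub.get_dir()`):

```python
# Sketch: remove cached torch.hub checkouts matching a repo slug so the
# next torch.hub.load() re-clones and rebuilds from scratch.
import shutil
from pathlib import Path


def purge_hub_repo(hub_dir, repo_slug="pytorch_fairseq"):
    """Delete cached checkouts under hub_dir whose name starts with repo_slug;
    return the paths that were removed."""
    removed = []
    for entry in Path(hub_dir).glob(repo_slug + "*"):
        if entry.is_dir():
            shutil.rmtree(entry)
            removed.append(str(entry))
    return removed
```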
Thanks, I'll have a try.
Fixed, thanks.