The problem arises when using my own modified script (details below):
import torch
import transformers

def main(local_rank):
    torch.cuda.set_device(local_rank)
    device = torch.device('cuda', local_rank)

if __name__ == "__main__":
    print(torch.__version__)
    print(transformers.__version__)
    print(torch.cuda.is_available())
    main(0)
1.4.0
2.11.0
True
Segmentation fault (core dumped)
If I comment out the line import transformers, everything works fine.
transformers version: 2.11.0
Hey @jcyk,
when I try to reproduce this error with PyTorch 1.5.0, there is no problem.
However, when I run your code with PyTorch 1.4.0 (as you did), I get the following error:
1.4.0
2.11.0
False
Traceback (most recent call last):
  File "./bug_4993.py", line 16, in <module>
    main(0)
  File "./bug_4993.py", line 8, in main
    torch.cuda.set_device(local_rank)
  File "/home/patrick/anaconda3/envs/pytorch_1_4/lib/python3.8/site-packages/torch/cuda/__init__.py", line 292, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
This error is not related to transformers.
Also, the PyTorch 1.4 documentation (https://pytorch.org/docs/1.4.0/cuda.html#torch.cuda.set_device) notes that set_device is discouraged and that you should use https://pytorch.org/docs/1.4.0/cuda.html#torch.cuda.device instead.
Could you try that function instead and see what happens? Also, Segmentation fault (core dumped) errors are very hard to trace back. Can you try to get a more explicit error message?
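To get more than just "Segmentation fault (core dumped)", one option is the standard-library faulthandler module, which dumps a Python-level traceback at the moment of a native crash; a minimal sketch:

```python
import faulthandler

# Enable the handler early, before importing the suspect library, so that
# a native crash (SIGSEGV) prints the Python stack instead of dying silently.
faulthandler.enable()

print(faulthandler.is_enabled())
```

Equivalently, run the script with python -X faulthandler bug_4993.py; the dumped stack usually points at the C extension that crashed.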
hi @patrickvonplaten , thanks for your reply.
Please notice that if I remove the line import transformers, the problem disappears. That is why I suspect there is a problem with transformers. Please see the following two examples.
code0
import torch
#import transformers

def main(local_rank):
    device = torch.device('cuda', local_rank)
    x = torch.tensor([1, 2, 3], device=device)

if __name__ == "__main__":
    print(torch.__version__)
    #print(transformers.__version__)
    print(torch.cuda.is_available())
    main(0)
output0
1.4.0
True
code1
import torch
import transformers

def main(local_rank):
    device = torch.device('cuda', local_rank)
    x = torch.tensor([1, 2, 3], device=device)

if __name__ == "__main__":
    print(torch.__version__)
    print(transformers.__version__)
    print(torch.cuda.is_available())
    main(0)
output1
1.4.0
2.11.0
True
Segmentation fault (core dumped)
I am experiencing this exact same problem, and updating to PyTorch 1.5 is not an option. Did you have any success figuring this out?
EDIT: This problem is caused by the sentencepiece dependency. It goes away if I comment out all references to this dependency. This will break xlnet, xlm_roberta, marian, t5, albert, reformer, and camembert, but if you are using any of the non-sentencepiece models, this should solve your problem.
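If you want to confirm whether sentencepiece is even present on your machine before importing transformers, you can probe for it without importing it; a small sketch (the has_module helper is just an illustrative name of mine, not part of either library):

```python
import importlib.util

def has_module(name):
    # find_spec locates the module without importing it, so a broken
    # native extension cannot crash the interpreter during the check.
    return importlib.util.find_spec(name) is not None

print("sentencepiece installed:", has_module("sentencepiece"))
```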
@daphnei thanks for pointing this out!
The solution for me was to upgrade torch to 1.5.1+cu92 and downgrade transformers to 2.6.0.
Quite a weird problem!
That seems like a better fix than my hack! Unfortunately, I'm using a managed machine which doesn't have the CUDA version to support Torch 1.5.
I have met exactly the same problem as you; did you find the root cause? Is this issue the same as the 'fix segmentation fault' report #2207?
The following were my test results with different versions:
BTW, is there a torch release of ‘1.5.1+cuda10.0’?
Thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
1.5.1+cu10.2 works, but my device is a Tesla K40m, which does not support higher PyTorch versions; I get: RuntimeError: CUDA error: no kernel image is available for execution on the device.
Here is my test:
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
Segmentation fault (core dumped)
import torch; a = torch.Tensor(5, 3); a = a.cuda(); a
RuntimeError: CUDA error: no kernel image is available for execution on the device
My env is:
NVIDIA-SMI 450.51.05  Driver Version: 450.51.05  CUDA Version: 11.0
I solved this issue!
There are two solutions:
1. Upgrade torch, e.g. from 1.2.0 to 1.7.
2. Downgrade the associated package: sentencepiece from 0.1.94 to 0.1.91, and remove dataclasses.
I tried both solutions above and they both work! Because I cannot upgrade my CUDA version to support torch 1.7, I use the second solution : )
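When reporting version combinations like these, it helps to print the exact installed versions in one go; a sketch using the standard library (importlib.metadata, Python 3.8+; the version_or_none helper is my own name, not an existing API):

```python
from importlib import metadata

def version_or_none(dist):
    # Return the installed version string, or None if the
    # distribution is not installed in this environment.
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

for dist in ("torch", "transformers", "sentencepiece"):
    print(dist, "->", version_or_none(dist))
```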
My Env is :
transformers 3.5.0
python 3.7
CUDA 10.1
pytorch 1.2.0