Transformers: Importing transformers causes segmentation fault when setting cuda device

Created on 15 Jun 2020  ·  10 Comments  ·  Source: huggingface/transformers

🐛 Bug

Information

The problem arises when using:
my own modified scripts (details below)

To reproduce

import torch
import transformers

def main(local_rank):
    torch.cuda.set_device(local_rank)
    device = torch.device('cuda', local_rank)

if __name__ == "__main__":
    print (torch.__version__)
    print (transformers.__version__)
    print (torch.cuda.is_available())
    main(0)

Expected behavior

1.4.0
2.11.0
True
Segmentation fault (core dumped)

If the line import transformers is commented out, everything works fine.

Environment info

  • transformers version: 2.11.0
  • Platform: linux
  • Python version: 3.6.8
  • PyTorch version (GPU?): 1.4.0 GPU
  • Tensorflow version (GPU?): None
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No
wontfix


All 10 comments

Hey @jcyk,

when trying to reproduce this error with PyTorch 1.5.0, there is no problem.

However, when I run your code with PyTorch 1.4.0 (as you did), I get the following error:

1.4.0
2.11.0
False
Traceback (most recent call last):
  File "./bug_4993.py", line 16, in <module>
    main(0)
  File "./bug_4993.py", line 8, in main
    torch.cuda.set_device(local_rank)
  File "/home/patrick/anaconda3/envs/pytorch_1_4/lib/python3.8/site-packages/torch/cuda/__init__.py", line 292, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

This error, however, is not related to transformers.

Also, looking at the PyTorch 1.4 documentation: https://pytorch.org/docs/1.4.0/cuda.html#torch.cuda.set_device

you can see that set_device is not recommended, and that you should use the context manager https://pytorch.org/docs/1.4.0/cuda.html#torch.cuda.device instead.
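For reference, the context-manager form from those docs looks roughly like this (a minimal sketch, untested in the reporter's environment; it assumes a CUDA-enabled PyTorch build):

```python
import torch

def main(local_rank):
    # torch.cuda.device is a context manager that temporarily makes
    # `local_rank` the current device for the enclosed block.
    with torch.cuda.device(local_rank):
        x = torch.tensor([1, 2, 3], device='cuda')
        print(x.device)

if __name__ == "__main__":
    if torch.cuda.is_available():
        main(0)
```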

Could you try using this context manager instead and see what happens? Also, Segmentation fault (core dumped) errors are very hard to trace back. Can you try to get a more explicit error message?
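One standard-library way to get a more explicit message is Python's faulthandler module, which prints a Python-level traceback when the process receives SIGSEGV. A sketch (the suspect imports are shown commented out; they would go after the enable() call):

```python
import faulthandler

# Install the handler before any import suspected of crashing, so a
# segfault prints the Python stack instead of only
# "Segmentation fault (core dumped)".
faulthandler.enable()

# The suspect imports would then follow:
# import torch
# import transformers
```

Equivalently, the unmodified script can be run with python -X faulthandler to enable the same handler.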

Hi @patrickvonplaten, thanks for your reply.

Please note that if I remove the line import transformers, the problem disappears. That is why I suspect transformers is involved. Please see the following two examples.

code0

import torch
#import transformers

def main(local_rank):
    device = torch.device('cuda', local_rank)
    x = torch.tensor([1,2,3], device=device)

if __name__ == "__main__":
    print (torch.__version__)
    #print (transformers.__version__)
    print (torch.cuda.is_available())
    main(0)

output0

1.4.0
True

code1

import torch
import transformers

def main(local_rank):
    device = torch.device('cuda', local_rank)
    x = torch.tensor([1,2,3], device=device)

if __name__ == "__main__":
    print (torch.__version__)
    print (transformers.__version__)
    print (torch.cuda.is_available())
    main(0)

output1

1.4.0
2.11.0
True
Segmentation fault (core dumped)

I am experiencing this exact same problem, and updating to PyTorch 1.5 is not an option. Did you have any success figuring this out?

EDIT: This problem is caused by the sentencepiece dependency. It goes away if I comment out all references to this dependency. This will break xlnet, xlm_roberta, marian, t5, albert, reformer, and camembert, but if you are using any of the non-sentencepiece models, this should solve your problem.
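To narrow down which import combination triggers the crash, one illustrative approach (not from the thread) is to run each candidate in a throwaway subprocess and check for an exit caused by SIGSEGV, which shows up as return code -11 on Linux. The module combinations below are taken from the discussion; a missing package simply exits with an ImportError:

```python
import subprocess
import sys

# Each candidate runs in its own interpreter so a crash cannot take
# down the probe script; a child killed by SIGSEGV exits with
# return code -11 on Linux.
snippets = {
    "torch only": "import torch; torch.cuda.set_device(0)",
    "torch + transformers": "import transformers, torch; torch.cuda.set_device(0)",
    "torch + sentencepiece": "import sentencepiece, torch; torch.cuda.set_device(0)",
}

for name, code in snippets.items():
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    verdict = "SEGFAULT" if proc.returncode == -11 else "exit %d" % proc.returncode
    print("%s: %s" % (name, verdict))
```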

@daphnei thanks for pointing this out!
The solution for me was to upgrade torch to 1.5.1+cu92 and downgrade transformers to 2.6.0.
Quite a weird problem!

That seems like a better fix than my hack! Unfortunately, I'm using a managed machine which doesn't have the CUDA version to support Torch 1.5.

@daphnei thanks for pointing this out!
The solution for me was to upgrade torch to 1.5.1+cu92 and downgrade transformers to 2.6.0.
Quite a weird problem!

I have met exactly the same problem as you. Did you find the root cause? Is this the same issue as 'fix segmentation fault' #2207?
The following were my test results with different versions:

  1. Segmentation fault (core dumped): torch 1.2.0; transformers 2.11.0; CUDA 10.0
  2. Segmentation fault (core dumped): torch 1.2.0; transformers 2.6.0; CUDA 10.0
  3. Segmentation fault (core dumped): torch 1.2.0; transformers 2.5.1; CUDA 10.0
  4. Success: torch 1.1.0; transformers 2.5.1; CUDA 10.0

BTW, is there a torch release of ‘1.5.1+cuda10.0’?
Thanks.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

1.5.1+cu10.2 is fine, but my device is a Tesla K40m, which is not supported by higher PyTorch versions; I get RuntimeError: CUDA error: no kernel image is available for execution on the device. Here are my tests:

  • pytorch==1.2.0~1.3.0
    Works on the Tesla K40m itself, but running python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))" responds with Segmentation fault (core dumped)
  • pytorch==1.5.0
    import torch; a = torch.Tensor(5,3); a = a.cuda(); a fails with RuntimeError: CUDA error: no kernel image is available for execution on the device

My Env is :

NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0

I solved this problem!
I've outlined two solutions:

  • Upgrade your torch, e.g. from 1.2.0 to 1.7

  • Downgrade the associated packages: sentencepiece from 0.1.94 to 0.1.91, and remove dataclasses

I tried both solutions above and they both work! Because for some reason I cannot upgrade the CUDA version to suit torch 1.7, I went with the second solution : )

My Env is:
transformers 3.5.0, python 3.7, CUDA 10.1, pytorch 1.2.0
