Transformers: RuntimeError: CUDA error: device-side assert triggered

Created on 31 May 2019 · 6 Comments · Source: huggingface/transformers

I got this error when using simple_lm_finetuning.py to continue training a BERT model. Could anyone help? Thanks a lot.
Here is the CUDA and Python trace. I have confirmed that my input max_length does not exceed max_position_embeddings.

/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Loading Train Dataset input_lm.txt
Traceback (most recent call last):
  File "simple_lm_finetuning.py", line 646, in <module>
    main()
  File "simple_lm_finetuning.py", line 592, in main
    loss = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 783, in forward
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 714, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 261, in forward
    position_embeddings = self.position_embeddings(position_ids)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
wontfix

All 6 comments

Rerun with the environment variable CUDA_LAUNCH_BLOCKING=1 set and see which line it crashes on.
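For reference, a minimal sketch of setting the variable from inside a Python script rather than on the command line. The assumption here is that it runs before anything initializes CUDA; the commented-out torch import is only a placeholder for wherever your script first touches the GPU. Prefixing the usual launch command with CUDA_LAUNCH_BLOCKING=1 in the shell should have the same effect.

```python
import os

# Set this before the first CUDA call (safest: before importing torch),
# because the runtime may not pick it up once CUDA is already initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set

print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With blocking launches, the Python traceback should point at the operation that actually failed instead of a later, unrelated line.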

This is almost always an out-of-bounds index in an embedding lookup. Usually it is the positional embeddings, but it could also be the word embeddings or the segment embeddings.
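One way to narrow this down is to check the raw indices on the CPU before they reach the model. The sketch below is only an illustration: the helper name out_of_range, the batch contents, and the table sizes are all made up for the example, and the real sizes should be read from your own model config. With tensors, calling .tolist() first would give the nested lists used here.

```python
def out_of_range(ids, num_embeddings):
    """Return (row, col, value) for every id outside [0, num_embeddings)."""
    bad = []
    for row, seq in enumerate(ids):
        for col, value in enumerate(seq):
            if not 0 <= value < num_embeddings:
                bad.append((row, col, value))
    return bad


# Hypothetical batch and table sizes, chosen only for illustration.
batch = {
    "input_ids": [[101, 2023, 30522, 102]],  # 30522 is one past the last valid row
    "segment_ids": [[0, 0, 1, 2]],           # 2 is invalid for a 2-row table
}
table_sizes = {"input_ids": 30522, "segment_ids": 2}

report = {name: out_of_range(batch[name], table_sizes[name]) for name in batch}
for name, bad in report.items():
    print(name, bad)
```

Any non-empty list in the report names the exact position and value that would trip the srcIndex < srcSelectDimSize assertion for that embedding table.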

Hi @stephenroller, I did set the environment variable CUDA_LAUNCH_BLOCKING=1, and the log above is what I got. I will check my word embeddings and segment embeddings.

Then it's definitely that you've got a bad index into the positional embeddings.
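A quick way to test that, assuming position ids simply run from 0 to sequence length minus 1 (typical for BERT-style models, but worth confirming in your copy of modeling.py), is to compare each sequence length against the size of the position table of the model that is actually loaded. The helper name and the numbers below are hypothetical.

```python
def sequences_too_long(batch_input_ids, max_position_embeddings):
    """Return (row, length) for every sequence longer than the position table."""
    return [
        (row, len(seq))
        for row, seq in enumerate(batch_input_ids)
        if len(seq) > max_position_embeddings
    ]


# Hypothetical values: a 512-row position table and one over-long sequence.
max_position_embeddings = 512
batch_input_ids = [[0] * 128, [0] * 513]

too_long = sequences_too_long(batch_input_ids, max_position_embeddings)
print(too_long)
```

It is worth reading max_position_embeddings from the config of the checkpoint being fine-tuned rather than assuming a default, since a smaller table would fail on sequences that look short enough.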

But when I removed the positional embeddings, it still raised the error.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

But when I removed the positional embeddings, it still raised the error.

I ran into the same problem. Did you find a way to solve it?
