Model I am using: BERT (the reproduction command below uses roberta-base)
Language I am using the model on: English
The task I am working on is:
Steps to reproduce the behavior:
With CUDA_LAUNCH_BLOCKING=1 set, the error is: RuntimeError: CUDA error: out of memory

Command used:
CUDA_VISIBLE_DEVICES=2 python run_lm_finetuning.py --output_dir=output --model_type=roberta --model_name_or_path=roberta-base --do_train --train_data_file=$TRAIN_FILE --do_eval --eval_data_file=$TEST_FILE --mlm --per_gpu_train_batch_size 1 --per_gpu_eval_batch_size 1
Traceback (most recent call last):
File "run_lm_finetuning.py", line 497, in <module>
main()
File "run_lm_finetuning.py", line 451, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
File "run_lm_finetuning.py", line 189, in train
outputs = model(inputs, masked_lm_labels=labels) if args.mlm else model(inputs, labels=labels)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 237, in forward
head_mask=head_mask)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 177, in forward
head_mask=head_mask)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_bert.py", line 625, in forward
head_mask=head_mask)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_bert.py", line 346, in forward
layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_bert.py", line 324, in forward
attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_bert.py", line 281, in forward
self_outputs = self.self(input_tensor, attention_mask, head_mask)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/transformers/modeling_bert.py", line 200, in forward
mixed_query_layer = self.query(hidden_states)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/media/user1/storage-1/Ashok_AI/mask_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /pytorch/aten/src/THC/THCGeneral.cpp:216
Epoch: 0%| | 0/1 [00:00<?, ?it/s]
Iteration: 0%|
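A side note for anyone debugging a trace like this: CUDA kernel launches are asynchronous, so the op named in the traceback (here F.linear/matmul) is not necessarily the one that actually failed. Forcing synchronous launches makes the error surface at the real call site. You can export CUDA_LAUNCH_BLOCKING=1 in the shell as shown above, or (a minimal sketch) set it in Python before torch touches the GPU:

```python
import os

# Must be set before the CUDA context is initialized, i.e. before any
# torch.cuda call, so put it at the very top of the training script.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is in place
```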
What GPU do you have?
Thanks for your reply and support, sir :)
NVIDIA TITAN RTX: 4 × 24 GB GPUs
Looks like your batch size may be too big?
Thank you so much for your support, sir.
I set the batch size to 1. Maybe there is an issue in the latest branch; I will check out a previous master commit and try again, sir.
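If it helps anyone narrow this down: before blaming the batch size, it is worth confirming how much memory is actually free on the card the process sees. A minimal sketch (it assumes a PyTorch version that provides torch.cuda.mem_get_info; otherwise nvidia-smi reports the same numbers):

```python
import torch

# Report free/total memory on every GPU visible to this process
# (with CUDA_VISIBLE_DEVICES=2 there will be exactly one, shown as cuda:0).
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): "
          f"{free / 1024**3:.1f} GiB free / {total / 1024**3:.1f} GiB total")
```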
Hi, I have the same error. Did you get this problem resolved?
I have the same error too
It may be because of this nn.Embedding issue in PyTorch. I had the same error. Check whether you have padded correctly, or whether you have included an invalid token (an index outside the embedding table).
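A rough way to rule that out (a sketch, not part of run_lm_finetuning.py; it assumes a transformers model exposing get_input_embeddings() and a batch of input_ids from your tokenizer):

```python
import torch

def check_input_ids(input_ids: torch.Tensor, model) -> None:
    """Fail early if any token id falls outside the embedding table,
    which otherwise surfaces later as a confusing CUDA/cuBLAS error."""
    vocab_size = model.get_input_embeddings().num_embeddings
    bad = (input_ids < 0) | (input_ids >= vocab_size)
    if bad.any():
        raise ValueError(
            f"{int(bad.sum())} token id(s) outside [0, {vocab_size}), "
            f"e.g. {input_ids[bad][:5].tolist()}"
        )
```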
Very similar issue with roberta-base (but not bert-base-cased/uncased):
RuntimeError: cublas runtime error : library not initialized at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/THC/THCGeneral.cpp:216
I have checked and it isn't a problem with nn.Embedding, nor a memory issue.
Very similar issue when using the CamemBERT model, which is based on RoBERTa.
Could you solve the issue? Any thoughts about it, please?
@YDYordanov Same issue here when using roberta-base, have you resolved it?
@YDYordanov @Hadjer13 I found the solution. In my case, my input example has two sentences, so I passed token_type_ids the way I do for BERT, but it turns out I was passing the wrong token_type_ids to RobertaModel. According to the transformers docs, RoBERTa does not make use of token type ids, so using [0, 0, ..., 0, 1, 1, ..., 1, 0, 0, ...] as token_type_ids for RoBERTa is wrong. After I changed it to all zeros, i.e. [0, 0, ..., 0, 0], the error was fixed. Hope it helps someone!
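For anyone who lands here, the difference looks roughly like this (a sketch assuming a recent transformers version where the tokenizer is callable; with RoBERTa you can simply leave token_type_ids out):

```python
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

enc = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")

# Wrong (BERT-style): RoBERTa's token-type embedding has a single entry,
# so segment id 1 indexes out of range and triggers a CUDA/cuBLAS error.
# token_type_ids = torch.tensor([[0, 0, 0, 1, 1, 1, 1, 0]])

# Right for RoBERTa: all zeros, or omit the argument entirely.
token_type_ids = torch.zeros_like(enc["input_ids"])
outputs = model(input_ids=enc["input_ids"],
                attention_mask=enc["attention_mask"],
                token_type_ids=token_type_ids)
```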
Thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.