Transformers: RuntimeError: expected device cpu but got device cuda:0

Created on 8 May 2020 · 8 Comments · Source: huggingface/transformers

I am training a RoBERTa model by running the script examples/run_language_modeling.py.
The following error occurs when I try to resume training:
Traceback (most recent call last):
  File "examples/run_language_modeling.py", line 284, in <module>
    main()
  File "examples/run_language_modeling.py", line 254, in main
    trainer.train(model_path=model_path)
  File "/home/socian-pc1/anaconda3/envs/XformerEnv/lib/python3.6/site-packages/transformers/trainer.py", line 326, in train
    optimizer.step()
  File "/home/socian-pc1/anaconda3/envs/XformerEnv/lib/python3.6/site-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/socian-pc1/anaconda3/envs/XformerEnv/lib/python3.6/site-packages/transformers/optimization.py", line 155, in step
    exp_avg.mul_(beta1).add_(1.0 - beta1, grad)
RuntimeError: expected device cpu but got device cuda:0

My config
python examples/run_language_modeling.py \
--train_data_file $TRAIN_FILE \
--eval_data_file $TEST_FILE \
--output_dir ./MyRobertaOutput \
--model_name_or_path ./MyRoBERTa/checkpoint-570000 \
--config_name ../xformer_output \
--tokenizer_name ../xformer_output \
--mlm \
--do_train \
--do_eval \
--line_by_line \
--learning_rate 1e-5 \
--num_train_epochs 2 \
--save_total_limit 20 \
--save_steps 5000 \
--per_gpu_train_batch_size 6 \
--warmup_steps=10000 \
--logging_steps=100 \
--gradient_accumulation_steps=4 \
--seed 666 --block_size=512

zaowad

All 8 comments

Try initializing the model in the transformers trainer script with self.model = model.cuda()

tebandesade on 9 May 2020

👍2

I am getting the same error. Is there a workaround?
What @tebandesade suggested didn't work for me.

octalpixel on 15 May 2020

I faced the same problem with RoBERTa pretraining. Inserting the line model = model.cuda() before the trainer in run_language_modeling.py fixed it for me.
@tebandesade, thank you!

bokertof on 15 May 2020

👍2 😄1
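
A device-agnostic variant of the workaround above can be sketched as follows. This is only an illustration, not code from run_language_modeling.py: the tiny nn.Linear model stands in for the RoBERTa model, and pick_device is a hypothetical helper name. The point is that the model is moved before the trainer (and hence the optimizer) is built.

```python
import torch
from torch import nn


def pick_device():
    """Hypothetical helper: use the GPU when one is visible, else the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Stand-in for the model that the script loads from the checkpoint directory.
model = nn.Linear(8, 2)

# Equivalent to model = model.cuda() on a GPU machine, but it also runs on CPU.
model = model.to(pick_device())

# Build the trainer / optimizer only after this point, so that everything
# derived from the parameters sees them on their final device.
param_devices = {p.device.type for p in model.parameters()}
```

On a machine without a GPU this leaves the model on the CPU, so the same script still runs there.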

Hello! I'm having trouble reproducing that on master. Do you mind installing from source and letting me know if you still have the issue? Thank you

LysandreJik on 15 May 2020

Hi @LysandreJik, installing from source doesn't fix the issue, though @tebandesade's suggestion works fine.

@octalpixel, try editing this line; it should work:
https://github.com/huggingface/transformers/blob/62427d0815825436fa55b43725f44776e94abb65/src/transformers/trainer.py#L145

pranavpawar3 on 15 May 2020

👍1

I am getting this error too.

I think the issue is that the optimizers are set up from self.model while it is still on the CPU, but the model is moved to the device afterwards. That is why self.model = model.cuda() fixes the error.

shaoyent on 16 May 2020

👍1
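
The ordering described above can be illustrated with plain PyTorch. When an optimizer checkpoint is restored, load_state_dict places the restored buffers (exp_avg and friends) on the device of the parameters the optimizer holds at that moment, so the model should already be on its final device. This is a minimal sketch under assumptions: torch.optim.AdamW stands in for the transformers optimizer, a tiny nn.Linear stands in for the model, and optimizer_state_devices is a hypothetical helper, not part of any library.

```python
import torch
from torch import nn


def optimizer_state_devices(optimizer):
    """Hypothetical helper: device types holding the optimizer's moment buffers."""
    return {
        value.device.type
        for state in optimizer.state.values()
        for key, value in state.items()
        # The step counter is skipped: some PyTorch versions keep it on the CPU.
        if key != "step" and torch.is_tensor(value)
    }


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# First run: take one step so the optimizer has state worth checkpointing.
model = nn.Linear(4, 2).to(device)  # tiny stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model(torch.randn(3, 4, device=device)).sum().backward()
optimizer.step()
checkpoint = optimizer.state_dict()

# Resume: move the model first, then build the optimizer and load its state.
resumed_model = nn.Linear(4, 2).to(device)
resumed_optimizer = torch.optim.AdamW(resumed_model.parameters(), lr=1e-5)
resumed_optimizer.load_state_dict(checkpoint)

# The restored buffers now sit on the same device as the parameters.
state_devices = optimizer_state_devices(resumed_optimizer)

# A resumed step therefore mixes no CPU and GPU tensors.
resumed_model(torch.randn(3, 4, device=device)).sum().backward()
resumed_optimizer.step()
```

If the state were loaded while the parameters were still on the CPU and the model moved to the GPU afterwards, the buffers would stay behind on the CPU, which matches the device mismatch in the traceback.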

This should be fixed on master thanks to @shaoyent. Give it a try and let us know :)

julien-c on 19 May 2020

Regarding @shaoyent's explanation above that self.model = model.cuda() fixes the error: this works as-is if you have not installed the transformers package; otherwise you will need to modify the installed package file.