Transformers: model.to(args.device) in run_glue.py taking around 10 minutes. Is this normal?

Created on 11 Oct 2019 · 7 comments · Source: huggingface/transformers

โ“ Questions & Help

Currently, line 484 of run_glue.py, `model.to(args.device)`, is taking close to 10 minutes to complete when loading the bert-base pretrained model. This seems like a long time compared to what I was seeing in pytorch-transformers.

My configuration:
Tesla V100 - Driver 418.87.00
CUDA toolkit 10.1
PyTorch 1.3.0

The code I am running is:
```bash
python examples/run_glue.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --task_name $(MY_TASK) \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $(MY_DIR) \
  --max_seq_length 128 \
  --per_gpu_eval_batch_size=64 \
  --per_gpu_train_batch_size=64 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir $(MY_OUTDIR) \
  --overwrite_output_dir \
  --fp16
```

Is this behavior expected or am I doing something wrong? Thanks!
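
To isolate the transfer from the rest of the script, a minimal standalone sketch along these lines (assuming bert-base-uncased and a CUDA device; the `torch.cuda.synchronize()` call matters because CUDA copies are asynchronous) times just the `model.to` step:

```python
import time

import torch
from transformers import BertModel

# Load the pretrained weights onto the CPU first
model = BertModel.from_pretrained("bert-base-uncased")

# Time only the host-to-device transfer; synchronize so the
# asynchronous CUDA work has actually finished before stopping the clock
start = time.time()
model.to("cuda")
torch.cuda.synchronize()
print(f"model.to('cuda') took {time.time() - start:.1f}s")
```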

All 7 comments

This seems weird; I'm looking into it.

Running the run_glue.py script as-is with your exact parameters, I timed `model.to` and it took 6.4 seconds.

Ok, thanks for looking into that! I'm using my own dataset so I made adjustments to the processor, but I don't think that should be causing the issue when transferring the model to the GPU. I'll run a few more tests and see if I can pinpoint what is going on. It's super helpful to know that you are seeing it take only 6.4 seconds. Thank you!

I just tested again using the SST-2 data, keeping the run_glue.py code as-is, and I'm still having the same issue. My guess is that something in my VM setup is causing the hang, but I'm having a hard time pinpointing the exact cause.

Hmm, do you think you can reproduce it on another VM? Are you running into the same issue if you simply put the model on the device in a standalone script?

Ok, it's definitely an issue with my setup. I have the same issue when running the following:
```python
from torchvision import models

# Loading the pretrained weights is fast; it's the transfer to the GPU that hangs
model = models.densenet121(pretrained=True)
model.to('cuda')
```
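
One thing worth noting when timing a repro like this: the very first CUDA operation in a process also pays the one-time context-initialization (and possibly kernel JIT-compilation) cost, which is exactly where toolkit/driver mismatches tend to show up. A sketch that separates the two, assuming a CUDA device:

```python
import time

import torch
from torchvision import models

model = models.densenet121(pretrained=True)

# Warm up CUDA with a tiny transfer so context creation and any
# JIT compilation are paid here, not inside the timed section
torch.ones(1).to("cuda")
torch.cuda.synchronize()

start = time.time()
model.to("cuda")
torch.cuda.synchronize()
print(f"model transfer alone took {time.time() - start:.2f}s")
```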

I'll close the issue and keep troubleshooting on my end. Thanks!

Reopening because I found the issue and hopefully it can help someone else. I was comparing model loading times to what I was seeing on the hosted runtimes in Google Colab notebooks.

Even though `!nvidia-smi` shows CUDA 10.1, `torch.version.cuda` reports 10.0.130 — that is the toolkit PyTorch was actually built against. Colab is also running PyTorch 1.2.0.
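
A quick way to check what an environment is actually running — `torch.version.cuda` reports the toolkit PyTorch was compiled against, which need not match what nvidia-smi shows:

```python
import torch

print(torch.__version__)          # PyTorch build, e.g. 1.2.0
print(torch.version.cuda)         # CUDA toolkit PyTorch was built with, e.g. 10.0.130
print(torch.cuda.is_available())  # whether the driver/toolkit combo works at all
```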

I downgraded my environment to match, and the model from models.densenet121(pretrained=True) loaded in 4.9 seconds.

Thanks for the help!
