Allennlp: RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

Created on 16 Apr 2018 · 7 Comments · Source: allenai/allennlp

If I use the following:

ElmoEmbedder(self.config.optfile, self.config.pretrained, cuda_device=0)

then ElmoEmbedder works. But if I set cuda_device to any other value, then it throws the following error.

RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

I have tried this on a server with 3 GPUs. Only setting cuda_device to 0 works; 1 and 2 do not. What is the problem?


wasiahmad

👍3

Most helpful comment

If CUDA_VISIBLE_DEVICES is set to a single GPU id, that GPU is the only device visible to the process and it is numbered 0, so you should just use self.config.gpu_id = 0, and it will use the right GPU. You should be able to verify this by using something like nvidia-smi.
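The behaviour described above can be sketched in plain Python. This is only a simplified model of how CUDA numbers the visible devices: it handles integer indices (not GPU UUIDs), and the helper name `visible_device_map` is hypothetical, not part of AllenNLP or PyTorch.

```python
def visible_device_map(env_value, physical_count):
    """Map logical CUDA ordinals to physical GPU indices.

    env_value is the value of CUDA_VISIBLE_DEVICES (None if unset).
    Simplified model: integer indices only, no UUIDs.
    """
    if env_value is None:
        # Variable unset: every physical GPU is visible under its own index.
        return {i: i for i in range(physical_count)}
    mapping = {}
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            # Entries from the first invalid one onward are ignored.
            break
        # Visible devices are renumbered 0, 1, 2, ... in the order listed.
        mapping[len(mapping)] = int(token)
    return mapping


# On a 3-GPU server with CUDA_VISIBLE_DEVICES=2, the only valid ordinal is 0,
# and it refers to physical GPU 2.
print(visible_device_map("2", 3))   # {0: 2}
print(visible_device_map(None, 3))  # {0: 0, 1: 1, 2: 2}
```

With a single visible device, ordinals 1 and 2 do not exist inside the process, which is consistent with the "invalid device ordinal" error reported here.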

matt-gardner on 16 Apr 2018

👍30

All 7 comments

I had a couple of messages I put on here that I deleted, because I posted before really understanding your problem. I'm not sure why devices 1 and 2 don't work, but maybe @DeNeutoy can help.

matt-gardner on 16 Apr 2018

Please post the full stack trace, so we can help you more easily.

DeNeutoy on 16 Apr 2018

@DeNeutoy, is this something that you typically fix with CUDA_VISIBLE_DEVICES?

matt-gardner on 16 Apr 2018

👍5 👎2 😕1

The full stack trace does not contain much information. In my program, the error is raised on this line:

self.elmo_embedder = ElmoEmbedder(self.config.optfile, self.config.pretrained, self.config.gpu_id)

If self.config.gpu_id is set to 0, it works just fine, but if I set it to 1 or 2, it throws the error. I am running the program with CUDA_VISIBLE_DEVICES set to the corresponding GPU id.

More clarification: my NN model has a model class that extends nn.Module, and I am using ElmoEmbedder and Elmo in that class.

self.elmo_embedder = ElmoEmbedder(self.config.optfile, self.config.pretrained, self.config.gpu_id)
self.elmo_encoder = Elmo(self.config.optfile, self.config.pretrained, 1)

And in the forward method, I am using them as follows.

embedded_x = self.elmo_embedder.batch_to_ids(sentence)
encoded_x = self.elmo_encoder(embedded_x)['elmo_representations'][0]

But the problem occurs in the __init__ method of my model class, because of ElmoEmbedder.

wasiahmad on 16 Apr 2018

matt-gardner on 16 Apr 2018

👍30

Oh, I get your point. Since only one GPU id is visible, I should use gpu_id 0, not 1 or 2. I think I have realized my mistake. Thanks!
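One way to catch this mistake earlier is to validate the configured id before constructing the embedder. The following is a minimal sketch under stated assumptions: `resolve_cuda_device` is a hypothetical helper, and the visible device count is passed in as an argument so the snippet runs without a GPU (in a real program it could come from `torch.cuda.device_count()`).

```python
def resolve_cuda_device(requested, visible_count):
    """Validate a logical CUDA ordinal against the number of visible devices.

    Hypothetical helper. A negative id is passed through as -1, which
    AllenNLP treats as "run on CPU".
    """
    if requested < 0:
        return -1
    if requested >= visible_count:
        raise ValueError(
            f"cuda_device={requested} is not valid: only {visible_count} "
            "device(s) are visible to this process. Ordinals start at 0 and "
            "are relative to CUDA_VISIBLE_DEVICES."
        )
    return requested


# Possible usage (assumes torch is installed; left as comments so the
# sketch runs anywhere):
#   import torch
#   gpu_id = resolve_cuda_device(config.gpu_id, torch.cuda.device_count())
#   embedder = ElmoEmbedder(config.optfile, config.pretrained, gpu_id)
print(resolve_cuda_device(0, 1))  # 0
```

Failing with a descriptive ValueError is a design choice for readability; the underlying CUDA error would still be raised without this check.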

wasiahmad on 16 Apr 2018

👍7

unset CUDA_VISIBLE_DEVICES

serser on 11 Dec 2020
