Allennlp: RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

Created on 16 Apr 2018 · 7 Comments · Source: allenai/allennlp

If I use the following:

ElmoEmbedder(self.config.optfile, self.config.pretrained, cuda_device=0)

then ElmoEmbedder works. But if I set cuda_device to any other value, then it throws the following error.

RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:88

I tried this on a server with 3 GPUs. Only cuda_device=0 works; 1 and 2 do not. What is the problem?

All 7 comments

I had a couple of messages I put on here that I deleted, because I posted before really understanding your problem. I'm not sure why devices 1 and 2 don't work, but maybe @DeNeutoy can help.

Please post the full stack trace so we can help you more easily.

@DeNeutoy, is this something that you typically fix with CUDA_VISIBLE_DEVICES?

The full stack trace does not contain much information. In the program, the error occurs on this line:

self.elmo_embedder = ElmoEmbedder(self.config.optfile, self.config.pretrained, self.config.gpu_id)

If self.config.gpu_id is set to 0, it works just fine, but if I set it to 1 or 2, it throws the error. I am running the program with CUDA_VISIBLE_DEVICES set to the corresponding GPU id.

More clarification: in my NN model, the model class extends nn.Module, and I use ElmoEmbedder and Elmo in that class.

self.elmo_embedder = ElmoEmbedder(self.config.optfile, self.config.pretrained, self.config.gpu_id)
self.elmo_encoder = Elmo(self.config.optfile, self.config.pretrained, 1)

And in the forward method, I am using them as follows.

embedded_x = self.elmo_embedder.batch_to_ids(sentence)
encoded_x = self.elmo_encoder(embedded_x)['elmo_representations'][0]

But the problem occurs in the init function of my model class because of ElmoEmbedder.

If CUDA_VISIBLE_DEVICES is set to a single value (the id of the GPU you want), then that GPU is the only one the process can see, and CUDA renumbers the visible devices starting from 0. So you should just use self.config.gpu_id = 0, and it will use the right GPU. You should be able to verify this with something like nvidia-smi.

Oh, I see your point. Since only one GPU id is visible, I should use gpu_id 0, not 1 or 2. I realize my mistake now. Thanks!
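
For anyone else hitting this, the renumbering can be sketched in plain Python. This is only a simplified model of how CUDA_VISIBLE_DEVICES maps the index a program passes (such as cuda_device) to a physical GPU. The helper name visible_to_physical is hypothetical and is not part of PyTorch or AllenNLP, and the sketch assumes the variable holds a comma-separated list of integer ids (real CUDA also accepts GPU UUIDs, which are ignored here).

```python
def visible_to_physical(cuda_visible_devices: str, logical_index: int) -> int:
    """Map the index a program passes to CUDA onto a physical GPU id.

    Hypothetical helper for illustration only; it assumes a comma-separated
    list of integer ids in CUDA_VISIBLE_DEVICES.
    """
    # The visible GPUs, in the order the environment variable lists them.
    physical_ids = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]

    # Inside the process, devices are renumbered 0..len(physical_ids)-1.
    # Any other index is what CUDA reports as "invalid device ordinal".
    if not 0 <= logical_index < len(physical_ids):
        raise ValueError(
            f"invalid device ordinal {logical_index}: "
            f"only {len(physical_ids)} device(s) visible"
        )
    return physical_ids[logical_index]


# With CUDA_VISIBLE_DEVICES=2, index 0 refers to physical GPU 2.
print(visible_to_physical("2", 0))
```

Under this model, running with CUDA_VISIBLE_DEVICES=2 and cuda_device=2 fails because only index 0 exists inside the process, which matches the behaviour reported above.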

unset CUDA_VISIBLE_DEVICES
