Pytorch3d: Unknown error with CUDA

Created on 15 Feb 2020 · 10 Comments · Source: facebookresearch/pytorch3d

First of all, thank you for your great work!
When I try to run the sphere-to-dolphin deformation tutorial, I get an unexpected error when moving the vertices of the mesh to the device, which is set to cuda:0. Here is the error log:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCGeneral.cpp line=50 error=30 : unknown error
Traceback (most recent call last):
File "dolphin.py", line 33, in
faces_idx = faces.verts_idx.to(device)
File "/home/jormungandr/anaconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/cuda/__init__.py", line 197, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/THC/THCGeneral.cpp:50

I tried rmmod nvidia and rmmod nvidia-uvm, but each command fails with one of these errors:
rmmod: ERROR: Module nvidia_uvm is not currently loaded
rmmod: ERROR: Module nvidia is in use by: nvidia_modeset
Rebooting once did not change anything either.
My environment is as follows:
PyTorch : 1.4
Python : 3.6.10
CUDA : 10.0 by nvcc(runtime)
cuDNN : 7.0
OS : Ubuntu 18.04

zzhat0706

All 10 comments

@zzhat0706 I am assuming you built PyTorch3D from a local clone? Are you able to run the tutorial if you change the device to cpu? Can you run other PyTorch code on the GPU, i.e. without PyTorch3D? This looks like an issue with PyTorch, not PyTorch3D.

There's an issue on the PyTorch repo which is referencing this problem - did you check this? https://github.com/pytorch/pytorch/issues/17108

nikhilaravi on 16 Feb 2020
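
The isolation check suggested above (does plain PyTorch, without PyTorch3D, work on the CPU and on the GPU?) can be sketched as follows. This is a minimal sketch, not something posted in the thread: `probe_device` is a hypothetical helper name, and it relies only on `torch.ones`, `torch.device`, `torch.__version__` and `torch.version.cuda`.

```python
import importlib


def probe_device(name):
    """Try a tiny tensor operation on `name`; return (ok, message).

    `probe_device` is a hypothetical helper, not part of PyTorch or PyTorch3D.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError as exc:
        return False, f"torch is not importable: {exc}"
    try:
        # Allocating on a CUDA device forces CUDA initialisation, which is the
        # step that fails in the traceback above (torch._C._cuda_init()).
        x = torch.ones(2, 2, device=torch.device(name))
        total = x.sum().item()  # .item() copies the result back to the host
    except Exception as exc:  # error type depends on the build and the failure
        return False, f"{name}: failed with {type(exc).__name__}: {exc}"
    return True, (
        f"{name}: ok (sum={total}, torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda})"
    )


if __name__ == "__main__":
    for device_name in ("cpu", "cuda:0"):
        print(probe_device(device_name))
```

If the cpu line succeeds and the cuda:0 line fails with the same unknown error, the problem most likely lies in the PyTorch or driver setup rather than in PyTorch3D.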

@nikhilaravi Thanks for your answers!
Before PyTorch3D, I had run other PyTorch code such as CycleGAN and W-GAN. I installed PyTorch3D from Anaconda Cloud, not from a local clone.
I haven't tried the CPU yet; I'll check it later.

zzhat0706 on 17 Feb 2020

Hello,

I did all the things mentioned above but I still get the error:

RuntimeError Traceback (most recent call last)
in
23
24 # Create a textures object
---> 25 tex = Textures(verts_uvs=verts_uvs, faces_uvs=faces_uvs, maps=texture_image)
26
27 # Create a meshes object with textures

~/Disk/Software/Anaconda3/envs/pytorch3d/lib/python3.7/site-packages/pytorch3d/structures/textures.py in __init__(self, maps, faces_uvs, verts_uvs, verts_rgb)
118
119 if self._faces_uvs_padded is not None:
--> 120 self._num_faces_per_mesh = faces_uvs.gt(-1).all(-1).sum(-1).tolist()
121
122 def clone(self):

RuntimeError: CUDA error: no kernel image is available for execution on the device

shersoni610 on 19 Feb 2020

I downgraded the NVIDIA driver, but the error persists:

shersoni610 on 19 Feb 2020

Surprisingly, all the tests in the test folder pass, but the notebook tutorials still fail with the following error:
RuntimeError: CUDA error: no kernel image is available for execution on the device

shersoni610 on 19 Feb 2020

I can't see the name of the GPU you are using because it is truncated ("GeForce GTX TIT..."). Is it a TITAN X? If it is one of the other TITANs, then I think it has compute capability 3.5, so you will need a local build of PyTorch.

bottler on 19 Feb 2020
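
To see the full device name and the compute capability that PyTorch reports for a card, one can query `torch.cuda.get_device_name` and `torch.cuda.get_device_capability`. The sketch below is illustrative only: `needs_source_build` and `report_gpu` are hypothetical helpers, and the threshold of (3, 7) is an assumption about the lowest capability the prebuilt binaries target, not a figure stated in this thread. Check the release notes of the PyTorch version in use before relying on it.

```python
# Assumption: lowest compute capability covered by the installed binaries.
# This value is NOT taken from the thread; adjust it for your PyTorch build.
ASSUMED_MIN_CAPABILITY = (3, 7)


def needs_source_build(capability, min_supported=ASSUMED_MIN_CAPABILITY):
    """Return True if a (major, minor) capability is below `min_supported`."""
    return tuple(capability) < tuple(min_supported)


def report_gpu(index=0):
    """Describe GPU `index` as seen by PyTorch (torch is imported lazily)."""
    import torch

    name = torch.cuda.get_device_name(index)
    major, minor = torch.cuda.get_device_capability(index)
    if needs_source_build((major, minor)):
        verdict = "a source build of PyTorch is probably needed"
    else:
        verdict = "the prebuilt binaries should cover it"
    return f"{name}: compute capability {major}.{minor}, {verdict}"
```

Under that assumption, `needs_source_build((3, 5))` returns `True` for a capability-3.5 card such as the one discussed here, and `print(report_gpu(0))` shows the untruncated device name.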

Hello,

It is a GTX TITAN Black card.

Ubuntu 18.04 now offers several NVIDIA drivers (440, 435, 410). Which driver version is required to build PyTorch3D?

shersoni610 on 19 Feb 2020

I don't think the driver matters, but you might as well use the latest driver that works with your GPU. If nvidia-smi works, then you probably have a working driver (although I am not sure about this).

The problem is that you cannot install PyTorch from conda for your GPU (except old versions, which we don't support). You will need to set up a new conda environment and follow the instructions at https://github.com/pytorch/pytorch#from-source for your GPU. I suggest you check out the v1.4 branch. I think you will then need to install torchvision from source as well, and then install PyTorch3D from source.

(If all this sounds too hard, and you just want to get a feel for the tutorials and don't expect to use PyTorch3D much on your computer, maybe you can run them on Colab instead. Alternatively, if you install just PyTorch3D from GitHub, you may be able to run the tutorial entirely on the CPU: just change device = torch.device("cuda:0") to device = torch.device("cpu") in the tutorial.)

bottler on 20 Feb 2020
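
The CPU fallback described above can also be automated instead of edited by hand. The following is a minimal sketch under stated assumptions: `first_working` and `torch_probe` are hypothetical helper names that do not exist in PyTorch or PyTorch3D, and the probe simply runs one small operation on each candidate device to see whether it raises.

```python
def first_working(candidates, probe):
    """Return the first device name for which `probe(name)` does not raise."""
    errors = []
    for name in candidates:
        try:
            probe(name)
        except Exception as exc:  # keep the reason so the final error is useful
            errors.append(f"{name}: {exc}")
            continue
        return name
    raise RuntimeError("no usable device; " + "; ".join(errors))


def torch_probe(name):
    """Run one small kernel on `name`; raises if the device is unusable."""
    import torch  # imported lazily so `first_working` stays torch-free

    x = torch.zeros(8, device=torch.device(name)) + 1
    x.sum().item()  # copying the result back makes pending CUDA errors surface
```

In the tutorial, the line device = torch.device("cuda:0") could then become device = torch.device(first_working(["cuda:0", "cpu"], torch_probe)). Whether a process recovers cleanly after a failed CUDA call depends on the error, so hard-coding cpu as suggested above remains the simpler route.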

@zzhat0706, @shersoni610 were you able to resolve this installation issue? If so, please share what you did here for others to replicate!

nikhilaravi on 24 Feb 2020

@nikhilaravi Sorry for the late reply. I eventually chose to run inside nvidia-docker, and then everything worked perfectly!
Thanks again for your great work!

zzhat0706 on 9 Mar 2020
