Hi. First of all, thanks for developing this long-desired tool. Now, coming to the bug.
I just started working with PyTorch3D and was trying the tutorial from here: https://github.com/facebookresearch/pytorch3d/blob/master/docs/tutorials/deform_source_mesh_to_target_mesh.ipynb
I reproduced the code in my own Jupyter notebook. However, when I tried to visualize the meshes by calling the tutorial's plot_pointcloud() function, I got the following error:
plot_pointcloud(trg_mesh, "Target mesh")
RuntimeError Traceback (most recent call last)
<ipython-input-39-1e1d27f1793b> in <module>
3 # print(trg_mesh._N)
4 # trg_mesh.valid
----> 5 plot_pointcloud(trg_mesh, "Target mesh")
6 # plot_pointcloud(src_mesh, "Source mesh")
<ipython-input-30-fa31b9ded440> in plot_pointcloud(mesh, title)
2 # Sample points uniformly from the surface of the mesh
3 print(mesh)
----> 4 points = sample_points_from_meshes(mesh, 5000)
5 x, y, z = points.clone().detach().cpu().squeeze().unbind(1)
6 fig = plt.figure(figsize=(5, 5))
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/pytorch3d/ops/sample_points_from_meshes.py in sample_points_from_meshes(meshes, num_samples, return_normals)
39 be filled with 0.
40 """
---> 41 if meshes.isempty():
42 raise ValueError("Meshes are empty.")
43
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/pytorch3d/structures/meshes.py in isempty(self)
430 bool indicating whether there is any data.
431 """
--> 432 return self._N == 0 or self.valid.eq(False).all()
433
434 def verts_list(self):
RuntimeError: CUDA error: device-side assert triggered
I noticed that the error originates from the member mesh.valid. When I accessed that member directly from the script, I got a similar error:
trg_mesh.valid
RuntimeError Traceback (most recent call last)
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
392 if cls is not object \
393 and callable(cls.__dict__.get('__repr__')):
--> 394 return _repr_pprint(obj, self, cycle)
395
396 return _default_pprint(obj, self, cycle)
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
682 """A pprint that just redirects to the normal repr function."""
683 # Find newlines and replace them with p.break_()
--> 684 output = repr(obj)
685 lines = output.splitlines()
686 with p.group():
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/tensor.py in __repr__(self)
157 # characters to replace unicode characters with.
158 if sys.version_info > (3,):
--> 159 return torch._tensor_str._str(self)
160 else:
161 if hasattr(sys.stdout, 'encoding'):
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in _str(self)
309 tensor_str = _tensor_str(self.to_dense(), indent)
310 else:
--> 311 tensor_str = _tensor_str(self, indent)
312
313 if self.layout != torch.strided:
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in _tensor_str(self, indent)
207 if self.dtype is torch.float16 or self.dtype is torch.bfloat16:
208 self = self.float()
--> 209 formatter = _Formatter(get_summarized_data(self) if summarize else self)
210 return _tensor_str_with_formatter(self, indent, formatter, summarize)
211
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/_tensor_str.py in __init__(self, tensor)
81 if not self.floating_dtype:
82 for value in tensor_view:
---> 83 value_str = '{}'.format(value)
84 self.max_width = max(self.max_width, len(value_str))
85
~/miniconda3/envs/pytorch3d/lib/python3.6/site-packages/torch/tensor.py in __format__(self, format_spec)
407 def __format__(self, format_spec):
408 if self.dim() == 0:
--> 409 return self.item().__format__(format_spec)
410 return object.__format__(self, format_spec)
411
RuntimeError: CUDA error: device-side assert triggered
My configuration is:
Ubuntu: 18.04
Python: 3.6.10
Pytorch: 1.4.0
Pytorch3D: 0.1.1
CUDA: 10.1
Thanks!
Hi @rahuldey91! Thank you for your kind words.
This issue has been reported before (see https://github.com/facebookresearch/pytorch3d/issues/82 and https://github.com/facebookresearch/pytorch3d/issues/63) and is likely due to NaNs in your meshes. Could you print out the mesh data or check it for NaNs before you run the sampling?
In the meantime, I will add a check at the beginning of mesh sampling which will raise a better error message!
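For reference, a minimal sketch of such a NaN check, assuming the vertices are available as a float tensor. The helper name report_non_finite and the stand-in data are hypothetical and not part of PyTorch3D; with a real mesh the tensor could come from trg_mesh.verts_packed().

```python
import torch


def report_non_finite(name, tensor):
    """Return the number of NaN/Inf entries in `tensor` and print a summary."""
    # Copy to the CPU so the count is computed on the host.
    values = tensor.detach().cpu()
    bad = ~torch.isfinite(values)
    count = int(bad.sum())
    print(f"{name}: {count} non-finite value(s) out of {values.numel()}")
    return count


# Stand-in vertex data (hypothetical); with a real mesh you could pass
# trg_mesh.verts_packed() instead.
verts = torch.tensor([[0.0, 1.0, 2.0], [float("nan"), 0.5, float("inf")]])
num_bad = report_non_finite("verts", verts)
```

A count of zero suggests the vertex data itself is not the cause, at least as far as NaN and Inf values are concerned.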
I added a check that raises an error if non-finite values are passed (see https://github.com/facebookresearch/pytorch3d/commit/6c48ff6ad9005cfc03704c77531a4a25d1c8d843).
Hi @gkioxari! Thanks for your quick response and for pointing out the related issues. I was trying to check for the presence of NaNs in the mesh, but I was getting the same error even when calling trg_mesh.verts_list(). Then I noticed that my mesh was on device "cuda:7". I reran the code after changing the device to "cuda:0" and got the desired output without any errors. Could you help me understand why the data being on a device other than cuda:0 would produce an error?
This shouldn't create a problem. Note that we use these ops to train on multiple GPUs, e.g. when training Mesh R-CNN models with distributed training on 8 gpus. Is it possible that your data was living on different devices, or that your GPU is corrupt in any way? I can't think of other reasons why it would fail.
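One way to rule out the "data living on different devices" case is to group the tensors by the device they are stored on. This is only a sketch: the helper devices_of and the stand-in tensors are hypothetical; for a real mesh the inputs could be the results of verts_packed() and faces_packed().

```python
import torch


def devices_of(named_tensors):
    """Map each device (as a string) to the names of the tensors stored on it."""
    by_device = {}
    for name, tensor in named_tensors.items():
        by_device.setdefault(str(tensor.device), []).append(name)
    return by_device


# Stand-in tensors (hypothetical); replace with the tensors of your own mesh.
tensors = {
    "verts": torch.zeros(4, 3),
    "faces": torch.zeros(2, 3, dtype=torch.int64),
}
placement = devices_of(tensors)
print(placement)
if len(placement) > 1:
    print("Tensors are spread over several devices:", placement)
```

If the resulting dictionary has more than one key, some tensors were created on or moved to a different device than the rest.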
Here is my ipynb file to reproduce the error. If you change the device to device = torch.device("cuda:0"), it runs without errors. For any other GPU, it raises this error.
sphere_to_dolphin.zip
@rahuldey91 are you using one GPU or multiple GPUs? If you are using a GPU other than the default (cuda:0), you may need to set it explicitly:
device = torch.device("cuda:7")
torch.cuda.set_device(device)
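A slightly more defensive version of the two lines above, as a sketch: the helper select_device is hypothetical, and it falls back to the CPU when the requested GPU index does not exist, so the same notebook also runs on machines with fewer GPUs.

```python
import torch


def select_device(index):
    """Return cuda:<index> and make it the current device, or fall back to the CPU."""
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        device = torch.device(f"cuda:{index}")
        # Make this GPU the current CUDA device, as suggested above.
        torch.cuda.set_device(device)
        return device
    return torch.device("cpu")


device = select_device(7)
# Tensors created with device=device end up on the selected device.
points = torch.rand(5, 3, device=device)
print(device, points.device)
```

Calling select_device once near the top of the notebook, before any meshes are created, keeps the device choice in a single place.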
Oh I see. That resolves the issue. You can go ahead and close it. Thanks.