Hello,
The sample_points_from_meshes() function fails with the error below when my mesh is loaded onto my second GPU, 'cuda:1', but works when the mesh is loaded onto my first GPU, 'cuda:0'.
import os
import torch
from pytorch3d.io import load_obj, save_obj
from pytorch3d.structures import Meshes
from pytorch3d.utils import ico_sphere
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import (
    chamfer_distance,
    mesh_edge_loss,
    mesh_laplacian_smoothing,
    mesh_normal_consistency,
)
import numpy as np
from tqdm import tqdm_notebook
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib as mpl
def plot_pointcloud(mesh, title=""):
    # Sample points uniformly from the surface of the mesh.
    points = sample_points_from_meshes(mesh, 5000)
    x, y, z = points.clone().detach().cpu().squeeze().unbind(1)
    fig = plt.figure(figsize=(5, 5))
    ax = Axes3D(fig)
    ax.scatter3D(x, z, -y)
    ax.set_xlabel('x')
    ax.set_ylabel('z')
    ax.set_zlabel('y')
    ax.set_title(title)
    ax.view_init(190, 30)
    plt.show()
path = 'cube768.obj'
# Change this to 'cuda:0' and the code works
device = "cuda:1"
verts, faces, aux = load_obj(path)
# faces contains the index tensors verts_idx, normals_idx and textures_idx; only verts_idx is used below.
faces_idx = faces.verts_idx.to(device)
verts = verts.to(device)
center = verts.mean(0)
verts = verts - center
scale = max(verts.abs().max(0)[0])
verts = verts / scale
trg_mesh = Meshes(verts=[verts], faces=[faces_idx])
plot_pointcloud(trg_mesh, "Target mesh")
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/MultinomialKernel.cu:87: int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t) [with scalar_t = float]: block: [0,0,0], thread: [0,1,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/MultinomialKernel.cu:87: int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t) [with scalar_t = float]: block: [0,0,0], thread: [0,2,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/MultinomialKernel.cu:87: int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t) [with scalar_t = float]: block: [0,0,0], thread: [0,3,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/MultinomialKernel.cu:87: int at::native::<unnamed>::binarySearchForMultinomial(scalar_t *, scalar_t *, int, scalar_t) [with scalar_t = float]: block: [0,0,0], thread: [0,0,0] Assertion `cumdist[size - 1] > static_cast<scalar_t>(0)` failed.
Traceback (most recent call last):
  File "/home/albert/anaconda3/envs/testConda/lib/python3.7/site-packages/pytorch3d/ops/sample_points_from_meshes.py", line 67, in sample_points_from_meshes
    sample_face_idxs += mesh_to_face[meshes.valid].view(num_valid_meshes, 1)
RuntimeError: copy_if failed to synchronize: device-side assert triggered
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=59 : device-side assert triggered
Thank you.
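For context on the log above: the assertion `cumdist[size - 1] > 0` comes from torch.multinomial and fires when the sampling weights do not sum to a positive value. sample_points_from_meshes draws faces in proportion to their area, so one thing worth ruling out is a mesh whose face areas are all zero. The following is a rough sketch of such a check in plain PyTorch. The names face_areas and check_sampling_weights are hypothetical helpers, not part of PyTorch3D.

```python
import torch


def face_areas(verts, faces_idx):
    """Area of each triangle: half the norm of the cross product of two edges."""
    v0, v1, v2 = (verts[faces_idx[:, i]] for i in range(3))
    return 0.5 * torch.cross(v1 - v0, v2 - v0, dim=1).norm(dim=1)


def check_sampling_weights(verts, faces_idx):
    """Return the total face area, raising if it is zero or NaN."""
    total = face_areas(verts, faces_idx).sum().item()
    # `not total > 0` is also true for NaN, which would break sampling too.
    if not total > 0:
        raise ValueError("face areas do not sum to a positive value")
    return total
```

If this check passes for the tensors on 'cuda:1' and sampling still asserts, the mesh itself is probably not the cause. Run it in a fresh process, because a device-side assert leaves the CUDA context unusable.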
Just realized this has been asked before, sorry about that. I'll see if the closed issues have a solution.
The code runs if I first set the current CUDA device to GPU 1:
dev_1 = torch.device("cuda:1")
torch.cuda.set_device(dev_1)
but this doesn't solve the issue if I want to load two meshes on different GPUs.
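One possible way to extend this workaround to several meshes is to switch the current device only around each sampling call, using the torch.cuda.device context manager. This is a minimal sketch, assuming the failure depends on the current device not matching the mesh's device, which this thread suggests but does not confirm. call_on_device is a hypothetical helper name.

```python
import contextlib

import torch


def call_on_device(fn, device, *args, **kwargs):
    """Call fn with `device` as the current CUDA device, then restore the old one."""
    device = torch.device(device)
    if device.type == "cuda":
        ctx = torch.cuda.device(device)  # temporarily changes the current device
    else:
        ctx = contextlib.nullcontext()  # nothing to switch for CPU
    with ctx:
        return fn(*args, **kwargs)


# Hypothetical usage with a mesh that lives on 'cuda:1':
# points = call_on_device(sample_points_from_meshes, "cuda:1", trg_mesh, 5000)
```

Unlike a global torch.cuda.set_device call, the context manager restores the previous current device on exit, so meshes on other GPUs can be handled the same way one after another.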
@awreed do both meshes not fit on the same device?
They do, but they run through functions that I want to run on separate GPUs before calling backward().
@awreed Can you move them to the same device before calling sample_points_from_meshes?
Yeah, that should work. I could then split the sampled points between GPUs afterwards. Thanks for the help!
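The plan described above could look roughly like this: sample on one device, then hand a chunk of the points to each GPU. This is a sketch under the assumption that the points have shape (N, P, 3), as in the plotting code earlier in the thread. split_points is a hypothetical helper name.

```python
import torch


def split_points(points, devices):
    """Split points of shape (N, P, 3) along P and move one chunk to each device."""
    # torch.chunk can return fewer chunks than requested when P is very small.
    chunks = torch.chunk(points, len(devices), dim=1)
    return [chunk.to(device) for chunk, device in zip(chunks, devices)]


# Hypothetical usage, sampling on 'cuda:0' first:
# points = sample_points_from_meshes(trg_mesh.to("cuda:0"), 5000)
# pts0, pts1 = split_points(points, ["cuda:0", "cuda:1"])
```

Cross-device copies made with .to() are differentiable, so gradients from work done on each GPU should flow back to the sampled points when backward() is called.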