PyTorch3D: knn_points unexpected error using CUDA

Created on 7 Jul 2020 · 6 Comments · Source: facebookresearch/pytorch3d

When using knn_points on CUDA with a large K (e.g. 4000), the code fails with the following error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1587428111115/work/aten/src/THC/THCReduceAll.cuh line=327 error=6 : the launch timed out and was terminated
Traceback (most recent call last):
  File "testknn.py", line 11, in <module>
    d = knn_points(g,p,None,None,K).dists
  File "/home/eduardo/Documents/Experiments/pytorch3d/pytorch3d/ops/knn.py", line 161, in knn_points
    p1_dists, p1_idx = _knn_points.apply(p1, p2, lengths1, lengths2, K, version)
  File "/home/eduardo/Documents/Experiments/pytorch3d/pytorch3d/ops/knn.py", line 56, in forward
    if lengths2.min() < K:
RuntimeError: cuda runtime error (6) : the launch timed out and was terminated at /opt/conda/conda-bld/pytorch_1587428111115/work/aten/src/THC/THCReduceAll.cuh:327

I tried all versions of the KNN kernel (0 to 3) and they all produce the same error. A smaller K (e.g. 2000) works fine, and so does switching the device to CPU. It does not seem to be a memory problem, as GPU memory usage stays well below the card's 8 GB.

Using pytorch3d compiled from source on commit 7f1e63aed1252ba8145d4a.

To Reproduce the Issue:

import torch
from pytorch3d.ops import knn_points

K = 4000
device = torch.device('cuda')
p = torch.rand(16, 200000, 3).to(device)
g = torch.rand(16, 8, 3).to(device)
d = knn_points(g, p, None, None, K).dists
print(d.shape)

question

eduardohenriquearnold

Most helpful comment

Our current KNN implementation will be catastrophically slow for K=4000 -- in that regime you will probably be better off using FAISS (https://github.com/facebookresearch/faiss) for KNN instead of PyTorch3D.

jcjohnson on 8 Jul 2020

All 6 comments

The PyTorch3D KNN is optimized for K < 32 and D < 4 (where D is the feature size). What is your use case for needing to use K = 4000?

nikhilaravi on 8 Jul 2020

Maybe the calculation is just taking a long time in your case. If you are using the GPU for display as well as computation, the GPU can automatically kill long-running functions. See e.g. here. Can you split your calculation along the batch dimension so that each one will be quicker?

bottler on 8 Jul 2020
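The batch-splitting suggestion above can be sketched as follows. This is a minimal illustration, not PyTorch3D code: `knn_in_batch_chunks` and `brute_force_knn` are hypothetical helper names, and the brute-force function is only a pure-PyTorch stand-in so that the example runs without PyTorch3D installed. Whether each chunk finishes before the display watchdog limit still depends on K and the hardware.

```python
import torch


def brute_force_knn(p1, p2, K):
    """Stand-in kNN (an assumption for this sketch, not the PyTorch3D kernel).

    p1: (N, P1, D), p2: (N, P2, D).
    Returns squared L2 distances and indices, each of shape (N, P1, K).
    """
    d2 = torch.cdist(p1, p2) ** 2  # (N, P1, P2) squared distances
    dists, idx = torch.topk(d2, K, dim=-1, largest=False)  # K smallest, ascending
    return dists, idx


def knn_in_batch_chunks(knn_fn, p1, p2, K, chunk_size=1):
    """Call knn_fn on slices of the batch dimension and concatenate the results.

    Each call launches less GPU work than one call over the full batch.
    """
    all_dists, all_idx = [], []
    for start in range(0, p1.shape[0], chunk_size):
        stop = start + chunk_size
        dists, idx = knn_fn(p1[start:stop], p2[start:stop], K)
        all_dists.append(dists)
        all_idx.append(idx)
    return torch.cat(all_dists, dim=0), torch.cat(all_idx, dim=0)


if __name__ == "__main__":
    # Small sizes so the sketch runs quickly on CPU.
    p = torch.rand(4, 500, 3)
    g = torch.rand(4, 8, 3)
    d, i = knn_in_batch_chunks(brute_force_knn, g, p, K=20, chunk_size=1)
    print(d.shape, i.shape)
```

With PyTorch3D, the stand-in could presumably be swapped for a small wrapper such as `lambda a, b, k: knn_points(a, b, K=k)[:2]`, assuming `knn_points` returns distances and indices as its first two fields; that wrapper is untested here.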

Thanks for the prompt responses.

> The PyTorch3D KNN is optimized for K < 32 and D < 4 (where D is the feature size). What is your use case for needing to use K = 4000?

A very large K may not seem to make much sense, so let me explain my case. I'm optimising the sensor pose so that each object I want to observe has a very large number of points on its surface. More precisely, I want to maximise the average number of points over all objects' surfaces. Since that average is not differentiable (counting is not a differentiable operation), I am instead optimising the L2 distance between the centre of each object and the K nearest points in its neighbourhood. However, I need K >> 32, e.g. 1k or 4k. KNN with large K seems to work fine on the CPU, but if I move the whole pipeline to the CPU my execution time increases drastically. Perhaps there is a simpler loss function to maximise the number of points on the surface of objects? I tried using the Chamfer loss between all points and the centres of the objects, but the sensors tend to just collapse into some of the objects, so I decided to limit the distance to the K nearest points only.

> Maybe the calculation is just taking a long time in your case. If you are using the GPU for display as well as computation, the GPU can automatically kill long-running functions. See e.g. here. Can you split your calculation along the batch dimension so that each one will be quicker?

Thanks for the suggestion. Unfortunately I cannot disable my X server since I need to generate a visualisation of the optimisation process. Splitting the calculation along the batch dimension did help improve the speed when K=1000, but the kernel was still killed for K=4000.

eduardohenriquearnold on 8 Jul 2020

jcjohnson on 8 Jul 2020

Thanks, I believe this solves my problem!

eduardohenriquearnold on 10 Jul 2020

@jcjohnson do you happen to know if the FAISS integration with PyTorch provides a differentiable KNN implementation? Thank you!
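The thread does not answer this question. One common workaround, sketched below under stated assumptions, is to treat the neighbour search as a non-differentiable step that only returns indices, and then recompute the distances in PyTorch so that autograd can track them. The names `search_indices` and `knn_dists_from_indices` are hypothetical, and the search here is a pure-PyTorch stand-in for whatever external index (for example FAISS) supplies the indices.

```python
import torch


def search_indices(query, points, K):
    """Stand-in for an external, non-differentiable nearest-neighbour search.

    query: (Q, D), points: (P, D). Returns integer indices of shape (Q, K).
    In practice the indices could come from any search library.
    """
    with torch.no_grad():
        d2 = torch.cdist(query, points) ** 2
        return torch.topk(d2, K, dim=-1, largest=False).indices


def knn_dists_from_indices(query, points, idx):
    """Recompute squared L2 distances with ordinary tensor ops.

    Gradients flow to both `query` and the selected rows of `points`.
    """
    neighbours = points[idx]                # (Q, K, D) gathered neighbours
    diff = query.unsqueeze(1) - neighbours  # (Q, K, D)
    return (diff ** 2).sum(dim=-1)          # (Q, K)


if __name__ == "__main__":
    points = torch.rand(1000, 3, requires_grad=True)   # e.g. sensor points
    centres = torch.rand(8, 3)                         # e.g. object centres
    idx = search_indices(centres, points, K=100)
    loss = knn_dists_from_indices(centres, points, idx).mean()
    loss.backward()
    print(loss.item(), points.grad.shape)
```

The gradient treats the neighbour assignment as fixed for the current step, so the indices would need to be recomputed each iteration as the points move.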