Pytorch_geometric: PointNet2 classification Failure Too Many Resources

Created on 15 May 2020  ·  7 Comments  ·  Source: rusty1s/pytorch_geometric

❓ Questions & Help

Hi,

I implemented the program with reference to examples/pointnet2_classification.py and used Google Colaboratory's GPU to train the model.

I was able to save the trained model in Colab and load it in Colab to run inference.

However, when I save the model trained in Colab and try to load it on a Jetson Nano to run inference, I get the following error.

What is the reason for this? Do you have any solutions?

Thank you very much.

Traceback (most recent call last):
  File "pointnet.py", line 274, in <module>
    test_acc = test(test_loader)
  File "pointnet.py", line 149, in test
    pred = model(data).max(1)[1]
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "pointnet.py", line 99, in forward
    sa1_out = self.sa1_module(*sa0_out)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "pointnet.py", line 38, in forward
    row, col = radius(pos, pos[idx], self.r, batch, batch[idx], max_num_neighbors=64)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch_geometric/nn/pool/__init__.py", line 159, in radius
    return torch_cluster.radius(x, y, r, batch_x, batch_y, max_num_neighbors)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch_cluster/radius.py", line 76, in radius
    max_num_neighbors)
RuntimeError: CUDA error: too many resources requested for launch (launch_kernel at /media/nvidia/WD_BLUE_2.5_1TB/pytorch/20200116/pytorch-v1.4.0/aten/src/ATen/native/cuda/Loops.cuh:103)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f80044258 in /home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libc10.so)


⋮


frame #33: <unknown function> + 0x2b06a34 (0x7f82bafa34 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #34: <unknown function> + 0x48e2098 (0x7f8498b098 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #35: <unknown function> + 0x68b034 (0x7faa04a034 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #36: <unknown function> + 0x652d28 (0x7faa011d28 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #37: <unknown function> + 0x296ce4 (0x7fa9c55ce4 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python() [0x52ba70]
frame #41: python() [0x52b108]
frame #42: python() [0x52b69c]
frame #43: python() [0x52b8f4]
frame #45: python() [0x52b108]
frame #46: python() [0x52b69c]
frame #47: python() [0x52b8f4]
frame #49: python() [0x529978]
frame #51: python() [0x5f4d34]
frame #54: python() [0x52b108]
frame #57: python() [0x5f4d34]
frame #59: python() [0x597fb4]
frame #62: python() [0x529978]

All 7 comments

Can you tell me something about your input? Like batch size or number of points? Does it work when reducing either of those?

An example of a test dataset is shown below.

  • Test Dataset

    • Number of datasets: 210
    • batch_size: 10
    • Number of input points: 100
    • The number of input points is 100 for all 210 datasets.
  • DataLoader

    • DataLoader:
    • len(DataLoader): 21
  • Batch

    • Batch: Batch(batch=[1000], pos=[1000, 3], y=[10])
  • Data

    • Data: Data(pos=[100, 3], y=[1])
    • type(Data.pos)
    • Data.pos.dtype torch.float32
    • type(Data.y)
    • Data.y.dtype torch.int64

I experimented with different batch sizes and number of input points.

|device |batch size |number of input points |result |
|---|---|---|---|
|Colab |10 |100 |◯ |
|Jetson Nano |10 |100 |× |
|Colab |1 |100 |◯ |
|Jetson Nano |1 |100 |× |
|Colab |1 |50 |◯ |
|Jetson Nano |1 |50 |× |

I see, so it does not work for any input. This suggests that radius might be broken on this device. Can you do me a favor and see if running the radius method on the CPU fixes the issue?

How do I run the radius method on the CPU?

Should I, for example, load the model I trained in Colab on my Mac and run inference there?

Just add cpu() calls to all input arguments, and move the output back to the GPU.

I've changed the code to the following.

row, col = radius(pos.cpu(), pos[idx].cpu(), self.r, batch.cpu(), batch[idx].cpu(), max_num_neighbors=64)
row = row.to(device)
col = col.to(device)

As a result, I was able to run the model trained in Colab on the Jetson Nano.

|device |batch size |number of input points |result |
|---|---|---|---|
|Jetson Nano |10 |100 |◯ |
|Jetson Nano |1 |100 |◯ |
|Jetson Nano |1 |50 |◯ |
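
The workaround above (move the inputs to the CPU, call the function, move the result back) can be wrapped in a small helper so the forward pass stays readable. The following is only a minimal sketch: `call_on_cpu` is a hypothetical name, not part of torch_geometric or torch_cluster, and it assumes that tensor arguments expose `.cpu()` and that the result (or each element of a returned tuple) exposes `.to(device)`, as PyTorch tensors do.

```python
def _to_cpu(arg):
    """Return a CPU copy of tensor-like arguments; pass everything else through."""
    return arg.cpu() if hasattr(arg, "cpu") else arg


def call_on_cpu(fn, *args, device, **kwargs):
    """Hypothetical helper: run `fn` on CPU copies of its tensor arguments,
    then move the result back to `device`.

    Non-tensor arguments (for example a float radius or an int neighbor
    limit) are passed through unchanged.
    """
    cpu_args = [_to_cpu(a) for a in args]
    cpu_kwargs = {k: _to_cpu(v) for k, v in kwargs.items()}
    out = fn(*cpu_args, **cpu_kwargs)
    # Some functions return several tensors; move each one back.
    if isinstance(out, (tuple, list)):
        return tuple(o.to(device) for o in out)
    return out.to(device)


# Usage inside the module's forward(), mirroring the call from this thread
# (assumes `radius`, `pos`, `idx`, `batch` and `self.r` as in pointnet.py):
#
#   row, col = call_on_cpu(radius, pos, pos[idx], self.r, batch, batch[idx],
#                          max_num_neighbors=64, device=pos.device)
```

Note that this copies data between devices on every call, so it trades some speed for working around the failing GPU kernel.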

I deeply appreciate your courteous response! 🙇‍♂️

Glad that it is working, although I am still unsure what may cause this behavior on GPU.
