Pytorch_geometric: PointNet2 classification failure: too many resources

Created on 15 May 2020 · 7 Comments · Source: rusty1s/pytorch_geometric

❓ Questions & Help

Hi,

I implemented my program based on examples/pointnet2_classification.py and trained the model on a Google Colaboratory GPU.

I was able to save the trained model in Colab and load it in Colab to run inference.

However, when I save the model trained in Colab and load it on a Jetson Nano to run inference, I get the following error.

What is the reason for this? Do you have any solutions?

Thank you very much.

Traceback (most recent call last):
  File "pointnet.py", line 274, in <module>
    test_acc = test(test_loader)
  File "pointnet.py", line 149, in test
    pred = model(data).max(1)[1]
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "pointnet.py", line 99, in forward
    sa1_out = self.sa1_module(*sa0_out)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "pointnet.py", line 38, in forward
    row, col = radius(pos, pos[idx], self.r, batch, batch[idx], max_num_neighbors=64)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch_geometric/nn/pool/__init__.py", line 159, in radius
    return torch_cluster.radius(x, y, r, batch_x, batch_y, max_num_neighbors)
  File "/home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch_cluster/radius.py", line 76, in radius
    max_num_neighbors)
RuntimeError: CUDA error: too many resources requested for launch (launch_kernel at /media/nvidia/WD_BLUE_2.5_1TB/pytorch/20200116/pytorch-v1.4.0/aten/src/ATen/native/cuda/Loops.cuh:103)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x78 (0x7f80044258 in /home/jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libc10.so)


⋮


frame #33: <unknown function> + 0x2b06a34 (0x7f82bafa34 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #34: <unknown function> + 0x48e2098 (0x7f8498b098 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #35: <unknown function> + 0x68b034 (0x7faa04a034 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #36: <unknown function> + 0x652d28 (0x7faa011d28 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #37: <unknown function> + 0x296ce4 (0x7fa9c55ce4 in /home/kohei-jetson/.virtualenvs/py33d/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python() [0x52ba70]
frame #41: python() [0x52b108]
frame #42: python() [0x52b69c]
frame #43: python() [0x52b8f4]
frame #45: python() [0x52b108]
frame #46: python() [0x52b69c]
frame #47: python() [0x52b8f4]
frame #49: python() [0x529978]
frame #51: python() [0x5f4d34]
frame #54: python() [0x52b108]
frame #57: python() [0x5f4d34]
frame #59: python() [0x597fb4]
frame #62: python() [0x529978]

crea397

All 7 comments

Can you tell me something about your input? Like batch size or number of points? Does it work when reducing either of those?

rusty1s on 15 May 2020

👍1

An example of a test dataset is shown below.

Test Dataset
- Number of datasets: 210
- batch_size: 10
- Number of input points: 100
- The number of input points is 100 for all 210 datasets.
DataLoader
- DataLoader:
- len(DataLoader): 21
Batch
- Batch: Batch(batch=[1000], pos=[1000, 3], y=[10])
Data
- Data: Data(pos=[100, 3], y=[1])
- type(Data.pos)
- Data.pos.dtype torch.float32
- type(Data.y)
- Data.y.dtype torch.int64

I experimented with different batch sizes and numbers of input points.

|device |batch size |number of input points |result |
|---|---|---|---|
|Colab |10 |100 |◯ |
|Jetson Nano |10 |100 |× |
|Colab |1 |100 |◯ |
|Jetson Nano |1 |100 |× |
|Colab |1 |50 |◯ |
|Jetson Nano |1 |50 |× |
crea397 on 15 May 2020

I see, so it does not work for any input on the Jetson Nano. This suggests that radius might be broken on this device. Can you do me a favor and check whether running the radius method on the CPU fixes the issue?

rusty1s on 15 May 2020

👍1

How do I run the radius method on the CPU?

Should I, for example, load the model I trained in Colab on my Mac and run inference there?

crea397 on 15 May 2020

Just add cpu() calls to all tensor input arguments of radius, and move the output back to the GPU.

rusty1s on 15 May 2020

🎉1

I've changed the code to the following.

row, col = radius(pos.cpu(), pos[idx].cpu(), self.r, batch.cpu(), batch[idx].cpu(), max_num_neighbors=64)
row = row.to(device)
col = col.to(device)

As a result, I was able to run the model trained in Colab on the Jetson Nano.

|device |batch size |number of input points |result |
|---|---|---|---|
|Jetson Nano |10 |100 |◯ |
|Jetson Nano |1 |100 |◯ |
|Jetson Nano |1 |50 |◯ |
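
The same workaround can be written as a small reusable helper. The following is only a minimal sketch, not code from this thread: `run_on_cpu` and `brute_force_radius` are hypothetical names, and `brute_force_radius` is a simplified stand-in for `torch_cluster.radius` (it ignores `max_num_neighbors`) so that the example runs without torch_cluster installed.

```python
import torch


def run_on_cpu(fn, *args, **kwargs):
    """Call fn with every tensor argument moved to the CPU, then move
    tensor results back to the device of the first tensor argument."""
    tensors = [a for a in args if torch.is_tensor(a)]
    device = tensors[0].device if tensors else torch.device("cpu")
    cpu_args = [a.cpu() if torch.is_tensor(a) else a for a in args]
    out = fn(*cpu_args, **kwargs)
    if torch.is_tensor(out):
        return out.to(device)
    return tuple(o.to(device) if torch.is_tensor(o) else o for o in out)


def brute_force_radius(x, y, r, batch_x, batch_y, max_num_neighbors=32):
    """Hypothetical stand-in for a radius search: for each point in y,
    find all points in x within distance r that belong to the same batch.
    Assumption: max_num_neighbors is accepted but ignored here."""
    dist = torch.cdist(y, x)                              # [len(y), len(x)]
    same_batch = batch_y.view(-1, 1) == batch_x.view(1, -1)
    row, col = torch.nonzero((dist <= r) & same_batch, as_tuple=True)
    return row, col                                       # indices into y and x


if __name__ == "__main__":
    pos = torch.rand(100, 3)
    batch = torch.zeros(100, dtype=torch.long)
    idx = torch.arange(0, 100, 4)  # hypothetical subset of sampled points
    row, col = run_on_cpu(brute_force_radius, pos, pos[idx], 0.2, batch, batch[idx])
    print(row.shape, col.shape, row.device)
```

In the model, the call would presumably become `run_on_cpu(radius, pos, pos[idx], self.r, batch, batch[idx], max_num_neighbors=64)`. The extra host/device copies cost some time on every forward pass, so this is a workaround rather than a fix.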

Thank you very much for your courteous response! 🙇‍♂️

crea397 on 15 May 2020

Glad that it is working, although I am still unsure what may cause this behavior on GPU.

rusty1s on 15 May 2020

👍1
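
The thread leaves the root cause open. In general, CUDA reports "too many resources requested for launch" when a kernel launch asks for more per-block resources (for example registers for the chosen number of threads) than the device provides, so the same kernel can work on one GPU and fail on another. Whether that is what happened here is not confirmed. When reporting such a problem, it can help to record the environment on both machines. Here is a minimal sketch, where `describe_cuda_env` is a hypothetical helper name:

```python
import torch


def describe_cuda_env():
    """Collect basic version and device facts for a bug report."""
    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        info["device_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = f"{major}.{minor}"
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

Running this on both Colab and the Jetson Nano and comparing the output would show whether the two devices differ in compute capability or in the CUDA version PyTorch was built against.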
