Hi all,
I am facing an issue during rendering: CUDA runs out of memory while rendering. I am using Colab, and nvidia-smi reports an 11 GB GPU.
I have tried the suggestions from several discussions, including the PyTorch FAQ entry on CUDA out-of-memory errors.
Here is the runtime error:
RuntimeError Traceback (most recent call last)
9 print(i)
10 optimizer.zero_grad()
---> 11 loss, _ = model()
12 loss.backward()
13 optimizer.step()
6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
34 T = -torch.bmm(R.transpose(1, 2), self.camera_position[None, :, None])[:, :, 0] # (1, 3)
35
---> 36 image = self.renderer(meshes_world=self.meshes.clone(), R=R, T=T)
37
38 # Calculate the silhouette loss
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/renderer.py in forward(self, meshes_world, **kwargs)
65 pix_to_face=fragments.pix_to_face,
66 )
---> 67 images = self.shader(fragments, meshes_world, **kwargs)
68
69 return images
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/shader.py in forward(self, fragments, meshes, **kwargs)
226
227 def forward(self, fragments, meshes, **kwargs) -> torch.Tensor:
--> 228 texels = interpolate_texture_map(fragments, meshes)
229 cameras = kwargs.get("cameras", self.cameras)
230 lights = kwargs.get("lights", self.lights)
/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/texturing.py in interpolate_texture_map(fragments, meshes)
75
76 pixel_uvs = pixel_uvs * 2.0 - 1.0
---> 77 texture_maps = torch.flip(texture_maps, [2]) # flip y axis of the texture map
78 if texture_maps.device != pixel_uvs.device:
79 texture_maps = texture_maps.to(pixel_uvs.device)
RuntimeError: CUDA out of memory. Tried to allocate 4.69 GiB (GPU 0; 11.17 GiB total capacity; 5.98 GiB already allocated; 629.88 MiB free; 10.15 GiB reserved in total by PyTorch)
Sun Jul 5 13:21:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 73C P0 75W / 149W | 10811MiB / 11441MiB | 0% Default |
| | | ERR! |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
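The numbers in the error message above can be read off mechanically. The sketch below is plain Python that parses the message text quoted in the traceback; the regular expressions assume exactly this message layout, which may differ between PyTorch versions. It shows that the single 4.69 GiB request is larger than the 629.88 MiB reported free, and that 5.98 GiB already allocated plus the request would leave only about 0.5 GiB of the 11.17 GiB capacity.

```python
import re

# Multipliers for the (binary) units PyTorch prints in its OOM message.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
SIZE = r"(\d+(?:\.\d+)? (?:B|KiB|MiB|GiB))"

# Assumed message layout, taken from the traceback in this thread.
FIELDS = {
    "requested": r"Tried to allocate " + SIZE,
    "total": SIZE + r" total capacity",
    "allocated": SIZE + r" already allocated",
    "free": SIZE + r" free",
    "reserved": SIZE + r" reserved in total by PyTorch",
}


def to_bytes(size: str) -> int:
    """Convert a string such as '4.69 GiB' to a byte count."""
    value, unit = size.split()
    return round(float(value) * UNITS[unit])


def summarize_oom(message: str) -> dict:
    """Extract the sizes from a CUDA OOM message, in bytes."""
    info = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        info[name] = to_bytes(match.group(1))
    # Memory held by PyTorch's caching allocator but not handed out to tensors.
    info["cached_unused"] = info["reserved"] - info["allocated"]
    # What would remain of the card's capacity if the request had succeeded.
    info["headroom"] = info["total"] - info["allocated"] - info["requested"]
    return info


MSG = (
    "CUDA out of memory. Tried to allocate 4.69 GiB (GPU 0; 11.17 GiB total "
    "capacity; 5.98 GiB already allocated; 629.88 MiB free; 10.15 GiB "
    "reserved in total by PyTorch)"
)

for name, value in summarize_oom(MSG).items():
    print(f"{name:>13}: {value / 2**30:6.2f} GiB")
```

The reserved-but-unallocated part of the cache (about 4.17 GiB) is also smaller than the request on its own, so the failure comes down to one very large allocation on an already busy card.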
This does not sound right; it should not be the case. I don't have access to your code, but from the snippets you provided it seems you are doing some unnecessary cloning, for example image = self.renderer(meshes_world=self.meshes.clone(), R=R, T=T). Every time you call clone in PyTorch you create copies of tensors, and in the case of meshes there are many tensors being stored.
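To put the cost of an extra clone in numbers, here is a rough sketch that adds up the bytes duplicated when one textured mesh is copied. It assumes float32 vertices of shape (V, 3), int64 faces of shape (F, 3) and a float32 RGB texture map of shape (H, W, 3); it ignores UV tensors and any cached packed/padded representations. The mesh sizes in the example are hypothetical, since the thread does not give them.

```python
FLOAT32_BYTES = 4
INT64_BYTES = 8


def mesh_copy_bytes(num_verts: int, num_faces: int, tex_h: int, tex_w: int) -> int:
    """Approximate bytes duplicated by copying one textured mesh.

    Assumes float32 verts (V, 3), int64 faces (F, 3) and a float32
    RGB texture map (H, W, 3); UVs and cached representations are ignored.
    """
    verts = num_verts * 3 * FLOAT32_BYTES
    faces = num_faces * 3 * INT64_BYTES
    texture = tex_h * tex_w * 3 * FLOAT32_BYTES
    return verts + faces + texture


# Hypothetical mesh: 50k vertices, 100k faces, 1024 x 1024 texture.
nbytes = mesh_copy_bytes(50_000, 100_000, 1024, 1024)
print(f"{nbytes / 2**20:.2f} MiB per copy")
```

With these example sizes the texture map accounts for roughly 80% of each copy, so the texture resolution matters more than the mesh topology.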
I see your point.
I removed the clone operation, but the issue is still the same.
One workaround I found is to reduce faces_per_pixel from 100 to a lower value and to render at a reduced image size.
This coming weekend I will sit down with it again and check whether I am using memory unnecessarily.
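Why faces_per_pixel matters so much here can be checked with some arithmetic. The traceback fails inside interpolate_texture_map at a torch.flip of texture_maps, and torch.flip always returns a copy. Assuming (not confirmed in this thread) that in this PyTorch3D version the texture map is replicated once per image and per face-per-pixel slot before that flip, i.e. N * K copies of an (H_tex, W_tex, 3) float32 map, the allocation is N * K * H_tex * W_tex * 3 * 4 bytes. With N = 1, K = 100 and a hypothetical 2048 x 2048 texture this is 4.6875 GiB, which would print as the 4.69 GiB in the error; the texture size is a guess, not something the poster reported.

```python
def replicated_texture_bytes(n_images, faces_per_pixel, tex_h, tex_w,
                             channels=3, bytes_per_value=4):
    """Bytes for N * K float32 copies of an (H, W, C) texture map.

    Assumes the texture is replicated once per image and per
    face-per-pixel slot before sampling (an assumption, see text).
    """
    return n_images * faces_per_pixel * tex_h * tex_w * channels * bytes_per_value


# Hypothetical 2048 x 2048 RGB texture, batch of one image.
for k in (100, 10, 1):
    nbytes = replicated_texture_bytes(1, k, 2048, 2048)
    print(f"faces_per_pixel={k:>3}: {nbytes / 2**30:.4f} GiB")
```

Under this assumption the cost grows linearly with faces_per_pixel and with the texture resolution, which is consistent with lowering faces_per_pixel helping here.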
Reducing faces_per_pixel is indeed a way to reduce memory. However, with a single 256x256 image I don't think we would expect to see OOM issues.
In this note we provide a formula to compute memory usage for the forward and backward passes: https://github.com/facebookresearch/pytorch3d/blob/master/docs/notes/renderer.md. Can you verify that the usage predicted by this formula matches what you observe?
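For comparison with the formula in the linked note, here is an independent back-of-the-envelope estimate covering the rasterizer outputs only (not the shading or texturing step). It assumes the Fragments tensors are pix_to_face (int64, N x H x W x K), zbuf and dists (float32, N x H x W x K) and bary_coords (float32, N x H x W x K x 3), and that the backward pass adds one float32 gradient for each floating-point output; the note itself remains the reference.

```python
# Bytes per (pixel, face-slot) entry, assuming these Fragments dtypes/shapes.
FORWARD_BYTES = {
    "pix_to_face": 8,      # int64
    "zbuf": 4,             # float32
    "bary_coords": 3 * 4,  # float32, three barycentric weights
    "dists": 4,            # float32
}
# Assumed: only the floating-point outputs receive gradients.
BACKWARD_BYTES = {"zbuf": 4, "bary_coords": 3 * 4, "dists": 4}


def fragments_bytes(n: int, h: int, w: int, k: int, backward: bool = True) -> int:
    """Estimated bytes for rasterizer outputs: N images of H x W, K faces per pixel."""
    per_entry = sum(FORWARD_BYTES.values())
    if backward:
        per_entry += sum(BACKWARD_BYTES.values())
    return n * h * w * k * per_entry


for k in (100, 10, 1):
    print(f"K={k:>3}: {fragments_bytes(1, 256, 256, k) / 2**20:.1f} MiB")
```

For one 256x256 image this comes to about 300 MiB with K = 100 (gradients included) and about 3 MiB with K = 1, so on an 11 GB GPU the rasterizer outputs by themselves are unlikely to be the cause.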
@rohitdavas did you manage to resolve this issue?
@nikhilaravi Sorry, I got busy. I have not found time to start the project again. But as soon as I start, I will first look into this.
@rohitdavas any updates on this issue? If not, please close it!
Sorry, I have not been able to start work on this. I will reopen the issue if I find something useful. Thanks for your patience.
Lowering the image resolution and faces_per_pixel helps, as commented above.
I have a very similar problem with a GPU of the same size, batch size 1, image size 256x256. Similar code worked on a larger GPU.
self.shader = SoftPhongShader(device=device, cameras=cameras, lights=lights)
img = self.shader(fragments, meshes_world, **kwargs)
The GPU runs out of memory when generating the image with this shader.
Reducing faces_per_pixel to 1 did not help.
Hello, is there any way I can get help on the above? @nikhilaravi
Thank you :)