PyTorch3D: CUDA out of memory during rendering and optimisation of a textured mesh on an 11 GB GPU

Created on 5 Jul 2020  ·  9 Comments  ·  Source: facebookresearch/pytorch3d

Hi all,

I am facing an issue during rendering: CUDA runs out of memory. I am using Colab, and nvidia-smi reports an 11 GB GPU.

I have gone through several discussions, including the CUDA out-of-memory entry in the PyTorch FAQ.

  1. The batch size is 1.
  2. The rendered image size is 256.
  3. The model has only 3 parameters, the same as in your tutorial on camera position optimisation, but I am using a textured mesh.

Here is the runtime error.

--------------------------------------------------------------------------------------------------------------------------

RuntimeError Traceback (most recent call last)
in ()
9 print(i)
10 optimizer.zero_grad()
---> 11 loss, _ = model()
12 loss.backward()
13 optimizer.step()

6 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)

in forward(self)
34 T = -torch.bmm(R.transpose(1, 2), self.camera_position[None, :, None])[:, :, 0] # (1, 3)
35
---> 36 image = self.renderer(meshes_world=self.meshes.clone(), R=R, T=T)
37
38 # Calculate the silhouette loss

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/renderer.py in forward(self, meshes_world, **kwargs)
65 pix_to_face=fragments.pix_to_face,
66 )
---> 67 images = self.shader(fragments, meshes_world, **kwargs)
68
69 return images

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)

/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/shader.py in forward(self, fragments, meshes, **kwargs)
226
227 def forward(self, fragments, meshes, **kwargs) -> torch.Tensor:
--> 228 texels = interpolate_texture_map(fragments, meshes)
229 cameras = kwargs.get("cameras", self.cameras)
230 lights = kwargs.get("lights", self.lights)

/usr/local/lib/python3.6/dist-packages/pytorch3d/renderer/mesh/texturing.py in interpolate_texture_map(fragments, meshes)
75
76 pixel_uvs = pixel_uvs * 2.0 - 1.0
---> 77 texture_maps = torch.flip(texture_maps, [2]) # flip y axis of the texture map
78 if texture_maps.device != pixel_uvs.device:
79 texture_maps = texture_maps.to(pixel_uvs.device)

RuntimeError: CUDA out of memory. Tried to allocate 4.69 GiB (GPU 0; 11.17 GiB total capacity; 5.98 GiB already allocated; 629.88 MiB free; 10.15 GiB reserved in total by PyTorch)

----------------- on running nvidia-smi ------------------------

Sun Jul 5 13:21:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.36.06    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P0    75W / 149W |  10811MiB / 11441MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
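As a reading aid for the error message above, the sizes it quotes can be pulled out and put on one scale. This is only a sketch: the helper name parse_oom_message is hypothetical, and the interpretation in the comments (the gap between reserved and allocated being memory held by PyTorch's caching allocator, which may be too fragmented to serve one large request) is a general reading of such messages, not something confirmed for this particular run.

```python
import re

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes quoted in a PyTorch CUDA OOM message, in MiB.

    Hypothetical helper written for this thread; it only understands the
    message layout quoted in the traceback above.
    """
    sizes = {}
    requested = re.search(r"Tried to allocate ([\d.]+) (MiB|GiB)", message)
    sizes["requested"] = float(requested.group(1)) * UNIT_TO_MIB[requested.group(2)]
    pattern = r"([\d.]+) (MiB|GiB) (total capacity|already allocated|free|reserved)"
    for value, unit, label in re.findall(pattern, message):
        sizes[label] = float(value) * UNIT_TO_MIB[unit]
    return sizes


message = (
    "CUDA out of memory. Tried to allocate 4.69 GiB (GPU 0; 11.17 GiB total "
    "capacity; 5.98 GiB already allocated; 629.88 MiB free; 10.15 GiB reserved "
    "in total by PyTorch)"
)

sizes = parse_oom_message(message)

# Memory PyTorch has reserved from the driver but that is not currently
# occupied by live tensors (assumed interpretation, see the note above).
cached_unused = sizes["reserved"] - sizes["already allocated"]

for label, mib in sizes.items():
    print(f"{label:>18}: {mib:9.2f} MiB")
print(f"{'reserved - allocated':>18}: {cached_unused:9.2f} MiB")
```

Under that reading, a single request of roughly 4.7 GiB is being made while only about 0.6 GiB is free outside PyTorch's reserved pool, so the failing allocation is large in its own right and is not a slow leak across iterations.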



All 9 comments

This does not sound right; it should not happen. I don't have access to your code, but from the snippets you provide it seems you are doing some unnecessary cloning, for example image = self.renderer(meshes_world=self.meshes.clone(), R=R, T=T). Every time you clone in PyTorch you create copies of tensors, and a Meshes object stores many tensors.

I got your point.

I removed the clone operation, but the issue remains.

One fix I found is to reduce faces_per_pixel from 100 to a lower value and to reduce the rendered image size.

This coming weekend I will sit down again and check whether I am using memory unnecessarily.
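A possible explanation for why lowering faces_per_pixel helps, offered as a hedged sketch rather than a confirmed diagnosis: the traceback fails at torch.flip(texture_maps, [2]) inside interpolate_texture_map. If that function, in the installed version, first expands the texture map to one copy per face-per-pixel slot (shape N*K x C x H x W, an assumption about the library internals), then the flip has to materialise all K copies at once. The thread never states the texture size, so the 2048 x 2048 RGB float32 map below is purely hypothetical, and the function name expanded_texture_bytes is made up for illustration.

```python
GIB = 1024 ** 3


def expanded_texture_bytes(n_meshes, faces_per_pixel, height, width,
                           channels=3, bytes_per_value=4):
    """Bytes needed to hold one texture copy per (mesh, face-per-pixel) slot.

    Assumes float32 values and that the texture is duplicated K times,
    which is an assumption about the library, not a documented fact.
    """
    return (n_meshes * faces_per_pixel * channels
            * height * width * bytes_per_value)


# Hypothetical 2048 x 2048 RGB texture, batch size 1, varying faces_per_pixel.
for k in (100, 10, 1):
    size = expanded_texture_bytes(1, k, 2048, 2048)
    print(f"faces_per_pixel={k:>3}: {size / GIB:.2f} GiB")

size_k100 = expanded_texture_bytes(1, 100, 2048, 2048)
```

With these assumed inputs, faces_per_pixel=100 gives 4.69 GiB, which happens to equal the size of the failed allocation in the error message. If this reading is right, the cost grows with faces_per_pixel times the texture resolution and does not depend on the 256 x 256 output size, so downsampling the texture map would be another thing to try.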

Reducing the faces_per_pixel is indeed a way to reduce memory. However, with one image at 256x256 I don't think we expect to see OOM issues.
In this note we provide a formula to compute memory usage for forward and backward: https://github.com/facebookresearch/pytorch3d/blob/master/docs/notes/renderer.md. Can you verify that this formula and what you see are the same?
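To check the rasterizer estimate against the numbers in this thread, the arithmetic can be sketched as below. The byte counts are written from memory of the linked renderer.md note (per face-per-pixel slot: two plus three float32 values and one int64 index in the forward pass, and the same float32 values again in the backward pass), so treat them as an assumption and compare them with the document itself. The function name rasterizer_memory_bytes is hypothetical.

```python
MIB = 1024 ** 2


def rasterizer_memory_bytes(n, h, w, k):
    """Estimated rasterizer memory for N images of H x W with K faces per pixel.

    Constants are recalled from the PyTorch3D renderer note and should be
    checked against it; 4 bytes per float, 8 bytes per long.
    """
    slots = n * h * w * k
    forward = (slots * 2 + slots * 3) * 4 + slots * 8   # floats + int64 index
    backward = (slots * 2 + slots * 3) * 4              # gradients for the floats
    return forward + backward                           # 48 bytes per slot


# Settings reported in this thread: batch 1, 256 x 256, faces_per_pixel=100.
estimate = rasterizer_memory_bytes(1, 256, 256, 100)
print(f"estimated rasterizer memory: {estimate / MIB:.0f} MiB")
```

Under these assumptions the estimate is about 300 MiB, well below the 11 GB available, which fits the expectation that a single 256x256 image should not run out of memory from rasterization alone and suggests the large allocation comes from somewhere else, such as the texture sampling step shown in the traceback.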

@rohitdavas did you manage to resolve this issue?

@nikhilaravi Sorry, I got busy. I have not found time to start the project again. But as soon as I start, I will first look into this.

@rohitdavas any updates on this issue? If not please close it!

Sorry, I am not able to start work on this. I will reopen if I find something useful. Thanks for your patience.

Lowering the image resolution and faces_per_pixel helps, as commented above.

I have a very similar problem with a GPU of the same size, batch size 1, image size 256x256. Similar code worked on a larger GPU.

self.shader = SoftPhongShader(device=device, cameras=cameras, lights=lights)
img = self.shader(fragments, meshes_world, **kwargs)

The GPU runs out of memory when trying to generate the image with this shader.

The fix of reducing the faces_per_pixel to 1 did not help.

Hello, is there any way I can get help on the above? @nikhilaravi

Thank you :)
