Pytorch-lightning: Multi-GPU Training GPU Usage

Created on 25 Jul 2020 · 4 Comments · Source: PyTorchLightning/pytorch-lightning

โ“ Multi-GPU Training GPU Usage

Hi, I'm using Lightning with ddp as the backend for multi-GPU training, with Apex amp (amp_level = 'O1'). I'm training on 8 GPUs. I noticed that during training, GPU0's utilization is 0% most of the time, while the others are at almost 100%. Their memory usage is the same, though. Is this normal? I use OpenPAI and have attached the utilization and memory usage below. Thanks.

[Image: amp_O1_gpu_usage]

DDP question

All 4 comments

Check the CPU usage to make sure data loading is not a bottleneck.

Hi, thanks for the reply. The full set of metrics (including CPU and GPU) is as follows:

[Image: Screenshot from 2020-07-26 00-58-22]

From these images, I can't tell whether data loading is a bottleneck.
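
When dashboard graphs are inconclusive, one way to check for a data loading bottleneck is to time how long each step waits for the next batch compared with how long the step itself takes. The following is a minimal sketch using only the standard library. `measure_data_wait`, `slow_loader`, and `fast_step` are hypothetical names for illustration; in a real run the loader would be the training DataLoader and the step function would run the forward and backward pass.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Tuple


def measure_data_wait(
    loader: Iterable, step_fn: Callable[[Any], Any], max_steps: int = 50
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent in step_fn)."""
    data_time = 0.0
    step_time = 0.0
    batches = iter(loader)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # blocks until the loader yields a batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        # Note: GPU work is asynchronous, so a real step_fn should synchronize
        # with the device before returning, or step_time will be understated.
        step_fn(batch)
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    return data_time, step_time


def slow_loader(num_batches: int) -> Iterator[int]:
    """Stand-in for a loader that is slow to produce each batch."""
    for i in range(num_batches):
        time.sleep(0.01)  # simulated decoding / augmentation cost
        yield i


def fast_step(batch: int) -> None:
    """Stand-in for a training step that is cheap compared with loading."""
    _ = batch * 2


data_s, step_s = measure_data_wait(slow_loader(5), fast_step)
print(f"waiting for data: {data_s:.3f}s, compute: {step_s:.3f}s")
```

If the time spent waiting for data is a large share of the total, the GPUs are likely being starved by the input pipeline rather than limited by the model.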

I also ran the profilers, as mentioned in this issue. The data loading time is constant, but there is a strange call to "'torch._C._TensorBase' objects" that takes a lot of time.
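
For context on that profiler entry: Python's cProfile labels methods implemented in C as {method 'name' of 'type' objects}, so a line ending in 'torch._C._TensorBase' objects most likely refers to a built-in tensor method. The sketch below reproduces the label format using only the standard library, with a list standing in for a tensor. `profile_to_text` and `sort_many` are hypothetical names chosen for illustration.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_to_text(func: Callable[..., Any], *args: Any, top: int = 10) -> str:
    """Profile func(*args) and return the pstats report as a string."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent inside each function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()


def sort_many(n: int) -> None:
    """Repeatedly call C-implemented list methods so they show up in the profile."""
    data = list(range(n, 0, -1))
    for _ in range(20):
        data.copy().sort()


report = profile_to_text(sort_many, 10_000)
# The report contains entries such as: {method 'sort' of 'list' objects}
print(report)
```

If the expensive entry turns out to be a tensor method that copies data to the host or reads a value back, the time reported there may include waiting for the GPU to finish queued work, since CUDA calls are asynchronous. That is an assumption about this case, not something the thread confirms.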

Thanks!

Your CPU usage seems high. The CPU could be the bottleneck here. Try fewer GPUs and then observe the GPU utilization.
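
A rough way to reason about this suggestion: with ddp, each GPU gets its own training process, and each process starts its own data loading workers, so CPU demand grows with the number of GPUs. The sketch below compares that process count with the available cores. The function names and the one-process-per-core rule of thumb are assumptions for illustration, not a Lightning API or an exact model of CPU load.

```python
import os
from typing import Optional, Tuple


def cpu_process_budget(
    num_gpus: int, workers_per_process: int, cpu_cores: Optional[int] = None
) -> Tuple[int, int, bool]:
    """Return (total processes, available cores, oversubscribed?).

    Assumes one training process per GPU plus that process's loader workers.
    """
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    total = num_gpus * (1 + workers_per_process)
    return total, cpu_cores, total > cpu_cores


def max_workers_per_process(num_gpus: int, cpu_cores: int) -> int:
    """Largest worker count per GPU process that keeps total processes within cores."""
    return max(0, cpu_cores // num_gpus - 1)


# Hypothetical machine: 8 GPUs, 8 loader workers per process, 32 cores.
print(cpu_process_budget(8, 8, cpu_cores=32))   # (72, 32, True)
print(max_workers_per_process(8, cpu_cores=32))  # 3
```

Running with fewer GPUs, as suggested above, lowers the total process count in the same way as reducing the number of workers per process.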

Yes. I have found it out. Thank you so much!
