Pytorch-lightning: CUDA out of memory

Created on 2 May 2020 · 6 comments · Source: PyTorchLightning/pytorch-lightning

I set CUDA to GPU 5:
INFO:lightning:CUDA_VISIBLE_DEVICES: [5]

However, a small amount of memory is always allocated on GPU 0, and when GPU 0 does not have enough free memory, an error is raised.

[screenshot: nvidia-smi output]

In the screenshot you can see two processes running on GPUs 5 and 6, plus 2 × 577 MiB allocated on GPU 0.
I want the processes to use only the GPUs I specify.
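
For what it's worth, CUDA_VISIBLE_DEVICES generally takes effect only if it is set before the process first initializes CUDA, so a common pattern (not stated in this thread, but standard CUDA behavior) is to set it at the very top of the script, before importing torch. A minimal sketch:

    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = '5'  # must happen before the first CUDA call

    import torch                    # noqa: E402  (imported after the env var on purpose)
    import pytorch_lightning as pl  # noqa: E402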

Labels: bug / fix, help wanted

All 6 comments

[attachment: m24.py, with a screenshot]

I have the same issue as well; it happens regardless of memory pinning.

I am using this as a temporary fix:

import os
import pytorch_lightning as pl

if __name__ == "__main__":
    args = parse_arguments()
    # Restrict visibility before anything initializes CUDA.
    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(map(str, args.gpus))
    # The visible devices are re-indexed from 0, so remap the requested ids.
    args.gpus = list(range(len(args.gpus)))
    ...
    trainer = pl.Trainer(gpus=args.gpus)

However, if I call torch.cuda.device_count() beforehand, as below, the issue still occurs, so something must happen when device_count() is called.

    # Note: if device_count() has already been called, setting
    # CUDA_VISIBLE_DEVICES here no longer has any effect.
    if len(args.gpus) != torch.cuda.device_count():
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(map(str, args.gpus))
        args.gpus = list(range(len(args.gpus)))
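
This matches how CUDA initialization works, as far as I know: the first CUDA call (which includes torch.cuda.device_count() in PyTorch versions of this era) initializes the driver, and CUDA_VISIBLE_DEVICES is read only at that moment; later changes to the environment variable are ignored by the process. A small sketch of the pitfall (device counts are illustrative):

    import os
    import torch

    print(torch.cuda.device_count())          # e.g. 8 -- this first call initializes CUDA

    os.environ['CUDA_VISIBLE_DEVICES'] = '5'  # too late: the driver is already initialized
    print(torch.cuda.device_count())          # still reports 8, not 1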

You could use the batch size finder.
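
For context, a rough sketch of what that looks like, assuming the auto_scale_batch_size flag and trainer.tune() from Lightning 1.x (MyModel is a placeholder LightningModule with a batch_size attribute that the finder adjusts):

    import pytorch_lightning as pl

    model = MyModel(batch_size=64)  # placeholder module; the finder tunes batch_size
    trainer = pl.Trainer(gpus=[5], auto_scale_batch_size='binsearch')
    trainer.tune(model)             # runs the batch size finder before training
    trainer.fit(model)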

Can you try running on master, and also decreasing your batch size?

It has nothing to do with the batch size; I solved the problem with torch.cuda.set_device(6).
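
For reference, a minimal sketch of that workaround (GPU index 6 as in the comment above; adjust to your machine). Making the target GPU current before any allocation keeps the incidental CUDA context off GPU 0:

    import torch
    import pytorch_lightning as pl

    torch.cuda.set_device(6)        # make GPU 6 current before anything allocates
    trainer = pl.Trainer(gpus=[6])  # then train only on GPU 6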

