Darknet: How do I use 2 gpus to train the same yolov3 model?

Created on 30 Sep 2019 · 6 comments · Source: pjreddie/darknet

I'm having trouble training YOLOv3 on my PC, which has two 8 GB GPUs. I can only train with batch=64 and subdivisions=64; any smaller subdivisions value (32, 16, 8) produces a CUDA out-of-memory error. Even when I pass the -gpus 0,1 flag and compile with CUDA, the program doesn't spill the excess memory onto the second GPU. I want to lower subdivisions to get better accuracy. How do I use both GPUs to train the same YOLOv3 model?
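For reference, multi-GPU training in darknet is selected on the command line with the -gpus flag, which runs one data-parallel replica per listed GPU. A sketch of the invocation; the .data, .cfg, and pre-trained weights paths are placeholders for whatever your project uses:

```
# Data-parallel training on GPUs 0 and 1 (paths are illustrative).
./darknet detector train cfg/obj.data cfg/yolov3.cfg darknet53.conv.74 -gpus 0,1
```

Note this replicates the whole model on each GPU, so it speeds training up rather than pooling the two cards' memory.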


All 6 comments

You can use 2-4 GPUs to train 2x-4x faster; the GPUs are synchronized after each iteration.

No, you can't use lower subdivisions, because the GPU<->GPU PCI Express interconnect (~16 GB/s) is much slower than GPU<->VRAM bandwidth (~500 GB/s), so splitting the model across GPUs would make training 10x-100x slower.
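A back-of-the-envelope sketch of that bandwidth argument, using only the two figures quoted above; the 1 GiB transfer size is a made-up illustrative number, not a measured YOLOv3 activation size:

```python
# Compare moving the same data over PCIe (GPU<->GPU) vs. reading it
# from local VRAM, using the bandwidths quoted in the comment above.
transfer_bytes = 1 * 2**30          # assumed ~1 GiB per step (illustrative)
pcie_bw = 16 * 2**30                # ~16 GB/s GPU<->GPU over PCI Express
vram_bw = 500 * 2**30               # ~500 GB/s GPU<->VRAM

t_pcie = transfer_bytes / pcie_bw   # seconds to cross PCIe
t_vram = transfer_bytes / vram_bw   # seconds to read from VRAM
print(f"PCIe is ~{t_pcie / t_vram:.0f}x slower than VRAM for the same bytes")
```

The ratio depends only on the two bandwidths, so every cross-GPU transfer in a model-parallel split pays roughly that penalty.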

But I don't want to train faster. I just want to allocate more memory.

Do you want to increase mini_batch size 2x and decrease performance 100x?

I don't want to decrease performance, I just want more memory so I can train with fewer subdivisions. But CUDA always runs out of memory with subdivisions < 64.

So is it possible to split the network between two GPUs, without worrying about speed, if I just want a larger mini-batch? Currently I can only fit one image in a mini-batch, and only at a very low resolution; the objects I want to detect are small, so they lose features at lower resolutions.
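The mini-batch being discussed follows from the darknet cfg values, where mini_batch = batch / subdivisions is the number of images resident on the GPU per forward pass. A small sketch of that relation for the subdivisions values tried above:

```python
# darknet cfg semantics: mini_batch = batch / subdivisions.
# With batch=64 and subdivisions=64, only one image sits on the
# GPU at a time, which is why lower subdivisions runs out of memory.
batch = 64
for subdivisions in (64, 32, 16, 8):
    mini_batch = batch // subdivisions
    print(f"subdivisions={subdivisions:2d} -> {mini_batch} image(s) per GPU pass")
```

Halving subdivisions doubles the images (and activation memory) per pass, which is the memory increase the 8 GB cards can't absorb individually.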
