Darknet: Cuda error: out of memory

Created on 31 Mar 2018 · 13 Comments · Source: pjreddie/darknet

I have successfully trained with VOC data, and now I'm trying to train YOLOv3 with my own data.
I'm using 1280x960 images and changed the yolov3.cfg file (the number of filters in the conv layer that comes before each yolo layer and the number of classes in each yolo layer, with batch = 64 and subdivisions = 8), and also changed voc.names and voc.data accordingly.

But about 30 seconds in, I get a CUDA error saying it is out of memory.
I'm using a 1080 Ti GPU.

Any ideas?

abeyang00

All 13 comments

Did you get NaNs while training v3 on VOC? I am currently training on VOC but am getting some NaNs.

Thanks

yasserkhalil93 on 2 Apr 2018

@abeyang00
Change subdivisions to 16 and try again; if it still runs out of memory, make it 32.

ahsan856jalal on 3 Apr 2018

👍5 🎉2
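
For context on why raising subdivisions helps: as far as I understand darknet's cfg semantics, `batch` is the number of images per weight update, and `subdivisions` splits that batch into chunks that go through the GPU one at a time, so only batch / subdivisions images are resident at once. A minimal sketch of that arithmetic (the helper name is made up for illustration):

```python
def images_on_gpu(batch: int, subdivisions: int) -> int:
    """Images processed per forward/backward pass (hypothetical helper)."""
    if subdivisions < 1 or batch < subdivisions:
        raise ValueError("expected 1 <= subdivisions <= batch")
    # darknet divides batch by subdivisions using integer division
    return batch // subdivisions


# The original post used batch=64, subdivisions=8; compare the suggested values.
for subdivisions in (8, 16, 32, 64):
    print(subdivisions, images_on_gpu(64, subdivisions))
```

Going from subdivisions=8 to 16 halves the number of images held on the GPU per pass (8 down to 4) while keeping the effective batch at 64.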

@abeyang00 do you get this?

subdivisions: Using default '1'
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32  0.299 BFLOPs
    1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64  1.595 BFLOPs
    2 conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32  0.177 BFLOPs
    3 CUDA Error: out of memory
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.

Do you know why it says subdivisions: Using default '1'?

zzj94 on 25 Jun 2018
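
The message `subdivisions: Using default '1'` is, to my knowledge, what darknet prints when it looks up a key and does not find it in the parsed cfg, so the [net] section that was actually loaded probably has no `subdivisions=` line (or a different cfg file was passed than intended). With subdivisions=1 the whole batch goes to the GPU at once. A rough way to check a cfg, sketched in Python; the function name and sample text are illustrative, and the parsing is simplified compared with darknet's own reader:

```python
def net_options(cfg_text: str) -> dict:
    """Collect key=value pairs from the [net] section of a darknet-style cfg."""
    options, section = {}, None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        if line.startswith("["):
            section = line.strip("[]")
            continue
        if section == "net" and "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


# Made-up cfg fragment with no subdivisions line.
sample = """[net]
batch=64
width=416
height=416
"""
opts = net_options(sample)
print("subdivisions" in opts)  # False for this sample
```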

I've tried every combination of batch and subdivisions, but I always get the out of memory error.

darknet.exe detector train -dont_show data/obj.data yolo-obj.cfg darknet53.conv.74
yolo-obj
layer     filters    size              input                output
   0 conv     32  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  32 0.299 BF
   1 conv     64  3 x 3 / 2   416 x 416 x  32   ->   208 x 208 x  64 1.595 BF
   2 conv     32  1 x 1 / 1   208 x 208 x  64   ->   208 x 208 x  32 0.177 BF
   3 conv     64  3 x 3 / 1   208 x 208 x  32   ->   208 x 208 x  64 1.595 BF
   4 Shortcut Layer: 1
   5 conv    128  3 x 3 / 2   208 x 208 x  64   ->   104 x 104 x 128 1.595 BF
   6 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64 0.177 BF
   7 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128 1.595 BF
   8 Shortcut Layer: 5
   9 conv     64  1 x 1 / 1   104 x 104 x 128   ->   104 x 104 x  64 0.177 BF
  10 conv    128  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x 128 1.595 BF
  11 Shortcut Layer: 8
  12 conv    256  3 x 3 / 2   104 x 104 x 128   ->    52 x  52 x 256 1.595 BF
  13 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  14 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  15 Shortcut Layer: 12
  16 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  17 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  18 Shortcut Layer: 15
  19 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  20 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  21 Shortcut Layer: 18
  22 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  23 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  24 Shortcut Layer: 21
  25 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  26 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  27 Shortcut Layer: 24
  28 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  29 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  30 Shortcut Layer: 27
  31 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  32 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  33 Shortcut Layer: 30
  34 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
  35 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
  36 Shortcut Layer: 33
  37 conv    512  3 x 3 / 2    52 x  52 x 256   ->    26 x  26 x 512 1.595 BF
  38 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  39 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  40 Shortcut Layer: 37
  41 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  42 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  43 Shortcut Layer: 40
  44 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  45 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  46 Shortcut Layer: 43
  47 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  48 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  49 Shortcut Layer: 46
  50 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  51 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  52 Shortcut Layer: 49
  53 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  54 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  55 Shortcut Layer: 52
  56 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  57 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  58 Shortcut Layer: 55
  59 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  60 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  61 Shortcut Layer: 58
  62 conv   1024  3 x 3 / 2    26 x  26 x 512   ->    13 x  13 x1024 1.595 BF
  63 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  64 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  65 Shortcut Layer: 62
  66 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  67 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  68 Shortcut Layer: 65
  69 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  70 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  71 Shortcut Layer: 68
  72 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  73 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  74 Shortcut Layer: 71
  75 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  76 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  77 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  78 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  79 conv    512  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 512 0.177 BF
  80 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024 1.595 BF
  81 conv     18  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x  18 0.006 BF
  82 yolo
  83 route  79
  84 conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256 0.044 BF
  85 upsample            2x    13 x  13 x 256   ->    26 x  26 x 256
  86 route  85 61
  87 conv    256  1 x 1 / 1    26 x  26 x 768   ->    26 x  26 x 256 0.266 BF
  88 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  89 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  90 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  91 conv    256  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x 256 0.177 BF
  92 conv    512  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 512 1.595 BF
  93 conv     18  1 x 1 / 1    26 x  26 x 512   ->    26 x  26 x  18 0.012 BF
  94 yolo
  95 route  91
  96 conv    128  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 128 0.044 BF
  97 upsample            2x    26 x  26 x 128   ->    52 x  52 x 128
  98 route  97 36
  99 conv    128  1 x 1 / 1    52 x  52 x 384   ->    52 x  52 x 128 0.266 BF
 100 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 101 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
 102 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 103 conv    128  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x 128 0.177 BF
 104 conv    256  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 256 1.595 BF
 105 conv     18  1 x 1 / 1    52 x  52 x 256   ->    52 x  52 x  18 0.025 BF
 106 yolo
Total BFLOPS 65.290
Allocate additional workspace_size = 49.84 MB
Loading weights from darknet53.conv.74...
seen 64
Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
608 x 608
Try to set subdivisions=64 in your cfg-file.
CUDA status Error: file: C:\Work\Yolo\darknet\src\cuda.c : cuda_make_array() : line: 213 : build time: Mar 11 2019 - 15:10:50
CUDA Error: out of memory
CUDA Error: out of memory: No error

any hints?

cagnulein on 12 Mar 2019

@cagnulein did you solve it? I have the same problem.

appoliana on 2 May 2019

@appoliana no, i gave up :(

cagnulein on 2 May 2019

@cagnulein hi, do you still want hints?
I had the same problem, but I solved it. It seems you have relatively little GPU memory, so you can try setting width=352 height=352 or width=320 height=320 in your .cfg file.

IvanZheleznyakov on 14 May 2019

👍3
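
Lowering width and height helps because the per-layer feature maps scale with the input area. As a rough rule of thumb (an approximation, not a measurement of darknet's actual allocations), activation memory grows in proportion to width x height. A small sketch, with an illustrative helper name:

```python
def relative_activation_memory(width: int, height: int, base: int = 416) -> float:
    """Approximate activation memory relative to a base x base input."""
    return (width * height) / (base * base)


# Compare the sizes mentioned in this thread against the 416 x 416 default.
for side in (320, 352, 416, 608):
    print(side, round(relative_activation_memory(side, side), 2))
```

By this estimate a 608 x 608 pass needs a bit over twice the activation memory of a 416 x 416 pass, which may be why the log above fails right after `Resizing 608 x 608`, while 320 x 320 needs roughly 60% of it.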

Hi,
I am using a GeForce GTX 750 with 1 GB of dedicated memory
and am trying to train with 50 x 50 images and 4 classes.
Is it possible to train with this GPU?
Right now I still get out of memory. Please help me.

fspider on 16 Oct 2019

Hi, I have the same problem too,
but my GPU is a 2060 Super 8 GB.

yoonharryp1 on 23 Oct 2019

I solved this problem by increasing subdivisions from 2 to 64.
I found the value by doubling it until the error went away, while watching GPU memory in Task Manager > Performance > GPU.

[net]
# Testing
batch=24
subdivisions=64
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

On your computer it may not need to go as high as 64.

fspider on 25 Oct 2019

👍2
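
The doubling procedure described above can be sketched as follows. `fits_in_memory` is a hypothetical callback standing in for "edit the cfg, start training, and see whether CUDA reports out of memory"; nothing here calls darknet itself.

```python
from typing import Callable, Optional


def smallest_working_subdivisions(
    batch: int,
    fits_in_memory: Callable[[int], bool],
    start: int = 2,
) -> Optional[int]:
    """Double subdivisions from `start` until training fits or `batch` is exceeded."""
    subdivisions = start
    while subdivisions <= batch:
        if fits_in_memory(subdivisions):
            return subdivisions
        subdivisions *= 2
    return None  # no tried value up to `batch` fit in memory


# Pretend the GPU can hold at most 4 images per pass (a made-up limit).
print(smallest_working_subdivisions(64, lambda s: 64 // s <= 4))
```

Stopping at the first value that fits keeps the per-pass chunk as large as the GPU allows, which is usually preferable to jumping straight to the maximum.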

Thank you. I increased it from 2 and it worked at 16

yoonharryp1 on 25 Oct 2019

I have the same problem.

With batch=32, subdivisions=16, and width and height both 768, I can train on a single GPU (the logs scroll by very quickly).

But training with "--gpus 0,1,2,3" fails with:

CUDA status Error: file: ./src/dark_cuda.c : () : line: 373 : build time: May 20 2020 - 11:56:45

CUDA Error: out of memory
CUDA Error: out of memory: File exists
darknet: ./src/utils.c:326: error: Assertion `0' failed.

Could anyone explain the relationship between "batch size" and the "batch" setting here?

KosukeHao on 20 May 2020

When training, keep batch=64 and subdivisions=16 or 32. If you still get "out of memory", decrease width and height to smaller multiples of 32 (320, 288, ...). After a lot of trial and error, this worked for me.
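
The multiple-of-32 constraint matches the network's overall downsampling: the layer listing earlier in this thread goes from 416 x 416 at the input to 13 x 13 at the coarsest layer, a factor of 32. A small sketch for snapping a desired size down to a valid one (the helper name is illustrative):

```python
def snap_to_stride(size: int, stride: int = 32) -> int:
    """Round size down to the nearest positive multiple of stride."""
    if size < stride:
        raise ValueError(f"size must be at least {stride}")
    return (size // stride) * stride


# Try a few candidate sizes, some of which are not multiples of 32.
print([snap_to_stride(s) for s in (300, 320, 350, 416)])
```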