I am training Darknet YOLO-V3 on a cat-dog dataset. During the training step, the following error occurs. Can someone help me? The error is:
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 21 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 21 0.004 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 21 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 21 0.007 BFLOPs
23 yolo
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
576
Floating point exception (core dumped)
How can I solve this?
I am having the exact same problem, but I cannot even get to the first Learning Rate line.
I have gone through my .cfg file line by line too.
Same problem here, trying to train darknet on CPU.
Same here. I migrated from CPU to a GPU GCloud instance but am still seeing the floating point issue. Wondering if the annotation text file conversion from BBox to YOLO format got messed up somewhere.
I have the same problem and I followed this.
I believe the floating point exception occurs because batch/subdivisions in the .cfg file does not divide to an integer. I changed the values so the division yields an integer and it started working.
@ashnaeldho could you verify this, if this helps?
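For anyone who wants to check this quickly: a rough Python sketch (the function name is mine, not part of Darknet) that parses batch and subdivisions from a .cfg in the standard key=value format and tests whether they divide evenly:

```python
import re

def check_batch_subdivisions(cfg_text):
    """Parse batch and subdivisions from a Darknet .cfg and report
    whether batch divides evenly by subdivisions."""
    values = {}
    for line in cfg_text.splitlines():
        m = re.match(r"\s*(batch|subdivisions)\s*=\s*(\d+)", line)
        if m:
            # keep only the first occurrence of each key
            values.setdefault(m.group(1), int(m.group(2)))
    batch, subs = values["batch"], values["subdivisions"]
    return batch, subs, batch % subs == 0

cfg = """[net]
batch=64
subdivisions=16
width=416
height=416
"""
print(check_batch_subdivisions(cfg))  # (64, 16, True)
```

If the last value printed is False, the floating point exception described above is a plausible cause.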
@harshthakkar01 that is the tutorial I followed too.
After fixing an incorrect path in my training set per my comment here, I was still having issues with Darknet defaulting to the CPU instead of the GPU.
I ended up needing to stop using CMake because it was improperly configuring my Makefile, which needs to include:
GPU=1
CUDNN=1
OPENCV=1
DEBUG=1
I've also seen this comment, which has helped other people, suggesting that you add
PATH=/usr/local/cuda-<YOUR_VERSION>/bin${PATH:+:${PATH}}
LD_LIBRARY_PATH=/usr/local/cuda-<YOUR_VERSION>/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
NVCC = /usr/local/cuda/bin/nvcc
to your .bashrc.
I came across the same problem.
It was caused by a small mistake in the .data file.
It was supposed to point to the train.txt file as shown below, but mine didn't:
classes = 20
train = <path-to-voc>/train.txt
valid = <path-to-voc>/2007_test.txt
Hope it helps.
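If it helps, here's a small sketch (my own helper, not part of Darknet) that flags entries in a .data file whose referenced files don't exist, assuming the standard key = value format shown above:

```python
import os

def check_data_file(path):
    """Return a list of (key, value) pairs from a Darknet .data file
    whose referenced files are missing on disk."""
    missing = []
    with open(path) as f:
        for line in f:
            if "=" not in line:
                continue
            key, value = (s.strip() for s in line.split("=", 1))
            # only these keys are expected to name existing files
            if key in ("train", "valid", "names") and not os.path.isfile(value):
                missing.append((key, value))
    return missing
```

Running it before training would have caught my mistake immediately.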
In my case the problem occurred when compiling Darknet with CMake; I switched to plain make and then it worked.
In my case, it worked after setting the subdivisions to a ~lower~ bigger number (4)
In my case, it worked after setting the subdivisions to a lower number (4)
If batch_size is also a low number, 4 for example, it still doesn't work. However, a higher batch_size with lower subdivisions often leads to the error "GPU out of memory".
@cloudy-sfu subdivisions is basically how many mini-batches the batch is split into before being passed to the model. Meaning, a batch_size of 64 and subdivisions of 64 would mean only 1 image is passed per forward pass. I have corrected my previous message: what I meant is increase the number of subdivisions, not decrease it.
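To make the arithmetic above concrete, a tiny sketch (the helper name is mine) of the mini-batch size Darknet loads per forward pass:

```python
def minibatch_size(batch, subdivisions):
    """Images processed per forward pass in Darknet: the batch is
    split into `subdivisions` mini-batches of equal size."""
    if batch % subdivisions != 0:
        raise ValueError("batch must be divisible by subdivisions")
    return batch // subdivisions

print(minibatch_size(64, 16))  # 4 images per pass
print(minibatch_size(64, 64))  # 1 image per pass
```

Raising subdivisions shrinks the per-pass memory footprint, which is why it helps with "GPU out of memory" while keeping the effective batch size at 64.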
The error occurred due to an empty train.txt file created by an external script. Since I have encountered this script several times on the web, I'll add the workaround here. While populating the test and train files, the script searches the current directory, but the path to the data isn't added to the glob pattern:
for pathAndFilename in glob.iglob(os.path.join(current_dir, "*.jpg")):
=>
for pathAndFilename in glob.iglob(os.path.join(current_dir, path_data, "*.jpg")):
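For reference, a self-contained sketch of that train/test split with the path fix applied; the function and parameter names are mine and the split ratio is illustrative, not from the original script:

```python
import glob
import os

def write_train_test(image_dir, out_dir, test_ratio=0.1):
    """Write absolute .jpg paths into train.txt / test.txt.
    Joining image_dir into the glob pattern is the fix: without it,
    only the script's own directory is searched and train.txt ends
    up empty, which triggers the floating point exception."""
    images = sorted(glob.glob(os.path.join(image_dir, "*.jpg")))
    split = int(len(images) * (1 - test_ratio))
    with open(os.path.join(out_dir, "train.txt"), "w") as f:
        f.write("\n".join(images[:split]) + "\n")
    with open(os.path.join(out_dir, "test.txt"), "w") as f:
        f.write("\n".join(images[split:]) + "\n")
```

After running it, a non-empty train.txt is a quick sanity check before launching training.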