Darknet: Segmentation fault (core dumped) when training on own data with YOLOv2

Created on 7 Nov 2017 · 20 comments · Source: pjreddie/darknet

I have prepared my data according to the instructions given in this link:

https://pjreddie.com/darknet/yolo/

I have downloaded the weights as well.
I have a 12 GB Titan X GPU. When I run darknet for training, it gives the error shown below.
I use the following line for training:
./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

Please, how can I solve this problem, and why is it happening?

[screenshot: segmentation fault (core dumped) error output]


All 20 comments

And when I try to run with Tiny YOLO, it returns this:

[screenshot: Tiny YOLO error output]

Did you run the examples? Do they work correctly for you? If the examples run okay, can you share your cfg/voc.data and cfg/yolo-voc.cfg?

The same thing happens when I run the examples; the screenshots above are what I get from them as well.

Enable the debug option in the Makefile and compile the source code again. Then run darknet in gdb to trace the segmentation fault. The 'run', 'backtrace', and 'where' commands will probably point to the line that raises the fault.
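In case it helps, a minimal sketch of that workflow, assuming the stock Makefile with its DEBUG flag and the training command from the original post:

```
# Rebuild with debug symbols so gdb can resolve source lines
sed -i 's/^DEBUG=0/DEBUG=1/' Makefile
make clean && make

# Run the crashing command under gdb
gdb --args ./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

# At the (gdb) prompt:
#   run         # start the program
#   backtrace   # after the crash, print the call stack
#   where       # synonym for backtrace
```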

Aminullah6264, show your train list file.

Did you solve this problem? I have the same problem in v3. I can't train on my data now, so I hope you can help me solve it. Thanks.

It would be great if you could show us your cfg file.

```
[net]
# Testing
batch=24
subdivisions=8
# Training
batch=64
subdivisions=8

width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky

[reorg]
stride=2

[route]
layers=-1,-4

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[region]
anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=3
coords=4
num=5
softmax=1
jitter=.3
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
```

I am facing the same issue of a seg fault. This cfg is for 3 classes. I have a GTX 860 (4 GB). Help, please.
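For reference, in a YOLOv2 cfg the filters value of the last convolutional layer before [region] must equal num × (classes + coords + 1); with classes=3, coords=4, num=5 that gives 5 × (3 + 4 + 1) = 40, which matches the cfg above, so the layer dimensions themselves look consistent.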

I was facing the same segmentation fault issue and tried many solutions, but the problem was not solved. Finally, I changed my label coordinates that were zero to a tiny float, and it works. I hope this helps somebody.

@Jerry3062 can you explain a bit more clearly what to change?
What do you mean by label coordinate, and in which file is it present?
Thanks!

Hi saivineethkumar, in the label .txt files I found some coordinates that were zero (0.0).
I removed those references from the train list.
It has been running without any error so far.

Change the random flag in the last line of the cfg file to 0. The core is getting dumped because the image is being resized to a very high dimension (608 in your case) after some iterations, taking too much memory. If you want random dimensions to increase precision, maybe run the model on the CPU instead of the GPU.
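A one-line sketch of that change, assuming the cfg path from the original post:

```
# Turn off multi-scale training so images are never resized beyond the
# configured 416x416, which caps GPU memory use
sed -i 's/^random=1/random=0/' cfg/yolo-voc.cfg
```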

@saivineethkumar The annotation file. Coordinate means (x, y, w, h) or (x1, y1, x2, y2); I forget YOLOv3's format. In my dataset, some x or y values were zero.
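A minimal sketch of that fix, assuming YOLO-format label files (one "class x y w h" line per box, values normalized to 0..1) in a hypothetical labels/ directory:

```
# Nudge exact-zero coordinates to a tiny positive float, as suggested above
for f in labels/*.txt; do
    awk '{ for (i = 2; i <= 5; i++) if ($i == 0) $i = "0.000001"; print }' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```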

Where is label.txt located?

Check out my comment below, in case it helps: https://github.com/pjreddie/darknet/issues/174#issuecomment-445203621

  1. The issue with 'cannot load images', 'segmentation fault (core dump)', 'cannot fopen', and 'cannot open label file' occurs when files edited on Windows, or any operating system that does not use Unix-style '\n' line endings, are transferred to Unix boxes (Ubuntu 16 in my case).
  2. I tried the dos2unix and "tr -d '\r' < file > file" tools on Ubuntu, on the txt files as well as the JPG files, but even that did not work.
    Solution
    Do all editing and saving of image files, txt files, and any other files, including the marking of objects (with the yolo_mark tool), on Ubuntu or a similar desktop, not on Windows or another non-Unix operating system. A cleanup sketch follows below.
    Cheers!
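One note on why the tr attempt above fails: "tr -d '\r' < file > file" truncates the file before tr reads it, so the output ends up empty. A sketch that avoids that trap, assuming a bash shell and a hypothetical data/ directory (and touching only the .txt files, never the JPGs):

```
# List text files that still carry Windows \r\n line endings
grep -rl $'\r' --include='*.txt' data/

# Strip the trailing \r in place
find data/ -name '*.txt' -exec sed -i 's/\r$//' {} +
```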


This problem was solved after I changed yolov3.weights to yolov3-tiny.weights and yolov3.cfg to yolov3-tiny.cfg, because my GPU has only 1 GB of memory while YOLOv3 needs about 4 GB. So if your GPU's memory is lower than 4 GB, you can try yolov3-tiny.
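For what it's worth, the tiny-model commands from the official YOLO page look like this (the weights URL is the one pjreddie publishes):

```
# Fetch the tiny weights and run a test detection
wget https://pjreddie.com/media/files/yolov3-tiny.weights
./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
```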

Hey guys,
I am training on a small dataset with large images (3520×4280) using yolov3-tiny and darknet19_448.conv.23, and I am facing the same issue (segmentation fault, core dumped). I already made changes in the configuration file (random from 1 to 0, plus batch and subdivisions). Can somebody help me resolve this?

In my case my training data was the culprit, so make sure your training data is correct.
Specifically, I had removed a class after using it on a few of the images, which raised this issue.

Use the AlexeyAB repo for better exception handling. Some of the data in your annotation files might be going out of bounds (x or y < 0 or > 1).
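A sanity-check sketch along those lines, assuming train.txt lists one image path per line, each image's label file sits next to it with a .txt extension, and 3 classes (adjust CLASSES to your dataset):

```
# Flag label lines with a class id outside 0..CLASSES-1 or a coordinate
# outside (0, 1]
CLASSES=3
while read -r img; do
    lbl="${img%.*}.txt"
    awk -v c="$CLASSES" -v f="$lbl" '
        $1 < 0 || $1 >= c ||
        $2 <= 0 || $2 > 1 || $3 <= 0 || $3 > 1 ||
        $4 <= 0 || $4 > 1 || $5 <= 0 || $5 > 1 { print f ": " $0 }' "$lbl"
done < train.txt
```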

For future reference...

Corrupted images can also cause a segmentation fault (core dumped) during training (and probably also during detection!).

In my case, after a few iterations (with no clear pattern), training would just halt and output "segmentation fault (core dumped)".
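A quick way to hunt for such images, assuming ImageMagick's identify is installed and train.txt lists one image path per line:

```
# Flag images that do not decode cleanly
while read -r img; do
    identify "$img" > /dev/null 2>&1 || echo "corrupt or unreadable: $img"
done < train.txt
```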
Hope it helps!

Best regards,
André
