Darknet: Segmentation fault (core dumped) when training on own data with YOLOv2

Created on 7 Nov 2017 · 20 comments · Source: pjreddie/darknet

I have prepared my data according to the instructions given in this link:

https://pjreddie.com/darknet/yolo/

I have downloaded the weights as well.
I have a 12 GB Titan X GPU. When I run darknet for training, it gives the error shown below.
I use the following line for training:
./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

Please, how can I solve this problem, and why is it happening?

[screenshot: segmentation fault (core dumped) error output]


All 20 comments

And when I try to run with Tiny YOLO, it returns this:

[screenshot: Tiny YOLO error output]

Did you run the examples? Do they work correctly for you? If the examples run okay, can you share your cfg/voc.data and cfg/yolo-voc.cfg?

The same thing happens when I run the examples; the screenshots above are what I get from them as well.

Enable the debug option in the Makefile and compile the source code again. Then run darknet in gdb to trace the segmentation fault. The 'run', 'backtrace', and 'where' commands will probably point to the line that raises the fault.
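In case it helps, a minimal sketch of that workflow, assuming the stock Makefile with its DEBUG flag and the training command from the original post:

```
# Rebuild with debug symbols so gdb can resolve source lines
sed -i 's/^DEBUG=0/DEBUG=1/' Makefile
make clean && make

# Run the crashing command under gdb
gdb --args ./darknet detector train cfg/voc.data cfg/yolo-voc.cfg darknet19_448.conv.23

# At the (gdb) prompt:
#   run         # start the program
#   backtrace   # after the crash, print the call stack
#   where       # synonym for backtrace
```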

Aminullah6264, show your train list file.

Did you solve this problem? I have the same problem in v3. I can't train on my data now, so I hope you can help me solve it. Thanks.

It would be great if you could show us your cfg file.

```
[net]
# Testing
batch=24
subdivisions=8
# Training
batch=64
subdivisions=8

width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky

[reorg]
stride=2

[route]
layers=-1,-4

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[region]
anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=3
coords=4
num=5
softmax=1
jitter=.3
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
```

I am facing the same issue of a seg fault. This cfg is for 3 classes. I have a GTX 860 (4 GB). Help, please.
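For reference, in a YOLOv2 cfg the filters value of the last convolutional layer before [region] must equal num × (classes + coords + 1); with classes=3, coords=4, num=5 that gives 5 × (3 + 4 + 1) = 40, which matches the cfg above, so the layer dimensions themselves look consistent.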

I was facing the same segmentation fault issue and tried many solutions, but the problem was not solved. Finally, I changed my label coordinates that were zero to a tiny float, and it works. I hope this helps somebody.

@Jerry3062 can you explain a bit more clearly what to change?
What do you mean by label coordinate, and in which file is it present?
Thanks!

Hi saivineethkumar, in the label .txt files I found some coordinates that were zero (0.0).
I removed those references from the train list.
It has been running without any error so far.

Change the random flag in the last line of the cfg file to 0. The core is getting dumped because the image is being resized to a very high dimension (608 in your case) after some iterations, taking too much memory. If you want random dimensions to increase precision, maybe run the model on the CPU instead of the GPU.
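A one-line sketch of that change, assuming the cfg path from the original post:

```
# Turn off multi-scale training so images are never resized beyond the
# configured 416x416, which caps GPU memory use
sed -i 's/^random=1/random=0/' cfg/yolo-voc.cfg
```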

@saivineethkumar The annotation file. Coordinate means (x, y, w, h) or (x1, y1, x2, y2); I forget YOLOv3's format. In my dataset, some x or y values were zero.
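A minimal sketch of that fix, assuming YOLO-format label files (one "class x y w h" line per box, values normalized to 0..1) in a hypothetical labels/ directory:

```
# Nudge exact-zero coordinates to a tiny positive float, as suggested above
for f in labels/*.txt; do
    awk '{ for (i = 2; i <= 5; i++) if ($i == 0) $i = "0.000001"; print }' \
        "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
```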

Where is label.txt located?

Check out my comment below, in case it helps: https://github.com/pjreddie/darknet/issues/174#issuecomment-445203621

  1. The issue with 'cannot load images', 'segmentation fault (core dump)', 'cannot fopen', and 'cannot open label file' occurs when files edited on Windows, or any operating system that does not use Unix-style '\n' line endings, are transferred to Unix boxes (Ubuntu 16 in my case).
  2. I tried the dos2unix and "tr -d '\r' < file > file" tools on Ubuntu, on the txt files as well as the JPG files, but even that did not work.
    Solution
    Do all editing and saving of image files, txt files, and any other files, including the marking of objects (with the yolo_mark tool), on Ubuntu or a similar desktop, not on Windows or another non-Unix operating system. A cleanup sketch follows below.
    Cheers!
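One note on why the tr attempt above fails: "tr -d '\r' < file > file" truncates the file before tr reads it, so the output ends up empty. A sketch that avoids that trap, assuming a bash shell and a hypothetical data/ directory (and touching only the .txt files, never the JPGs):

```
# List text files that still carry Windows \r\n line endings
grep -rl $'\r' --include='*.txt' data/

# Strip the trailing \r in place
find data/ -name '*.txt' -exec sed -i 's/\r$//' {} +
```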


This problem was solved after I changed yolov3.weights to yolov3-tiny.weights and yolov3.cfg to yolov3-tiny.cfg, because my GPU has only 1 GB of memory while YOLOv3 needs about 4 GB. So if your GPU's memory is lower than 4 GB, you can try yolov3-tiny.
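For what it's worth, the tiny-model commands from the official YOLO page look like this (the weights URL is the one pjreddie publishes):

```
# Fetch the tiny weights and run a test detection
wget https://pjreddie.com/media/files/yolov3-tiny.weights
./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
```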

Hey guys,
I am training on a small dataset with large images (3520×4280) using yolov3-tiny and darknet19_448.conv.23, and I am facing the same issue (segmentation fault, core dumped). I already made changes in the configuration file (random from 1 to 0, plus batch and subdivisions). Can somebody help me resolve this?

In my case my training data was the culprit, so make sure your training data is correct.
Specifically, I had removed a class after using it on a few of the images, which raised this issue.

Use the AlexeyAB repo for better exception handling. Some of the data in your annotation files might be going out of bounds (x or y < 0 or > 1).
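A sanity-check sketch along those lines, assuming train.txt lists one image path per line, each image's label file sits next to it with a .txt extension, and 3 classes (adjust CLASSES to your dataset):

```
# Flag label lines with a class id outside 0..CLASSES-1 or a coordinate
# outside (0, 1]
CLASSES=3
while read -r img; do
    lbl="${img%.*}.txt"
    awk -v c="$CLASSES" -v f="$lbl" '
        $1 < 0 || $1 >= c ||
        $2 <= 0 || $2 > 1 || $3 <= 0 || $3 > 1 ||
        $4 <= 0 || $4 > 1 || $5 <= 0 || $5 > 1 { print f ": " $0 }' "$lbl"
done < train.txt
```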

For future reference...

Corrupted images can also cause a segmentation fault (core dumped) during training (and probably also during detection!).

In my case, after a few iterations (with no clear pattern), training would just halt and output "segmentation fault (core dumped)".
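A quick way to hunt for such images, assuming ImageMagick's identify is installed and train.txt lists one image path per line:

```
# Flag images that do not decode cleanly
while read -r img; do
    identify "$img" > /dev/null 2>&1 || echo "corrupt or unreadable: $img"
done < train.txt
```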
Hope it helps!

Best regards,
André
