Darknet: Segmentation fault during detector validation

Created on 31 Jan 2019 · 8 Comments · Source: AlexeyAB/darknet

Greetings.

Problem:
When I use the command darknet detector valid [data] [cfg] [weights] I get a segmentation fault.

My configuration

  • GPU: 2x Geforce 1080 Ti
  • GPU=1, CUDNN=1, OPENCV=1, AVX=1, OPENMP=1
  • CUDA 9.0, CUDNN v7
  • cfg:
batch=64
subdivisions=16

width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0005
burn_in=2000
max_batches = 15000
policy=steps
steps=400000,450000
scales=.1,.1

Why I think it is a bug:

  1. The commands darknet detector map and darknet detector recall work fine.
  2. All text data is correct and was checked for empty lines with sed -n '/^$/=' [file.txt]

Screenshot of the problem
image

Labels: Solved, want enhancement

All 8 comments

@8greg8 Hi,

Try to change this line: https://github.com/AlexeyAB/darknet/blob/bd91d0a908fd0cd9364e16a00c395500f59cbf58/src/detector.c#L478
to these 2 lines:

    int nthreads = 4;
    if (m < 4) nthreads = m;
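
The two lines above cap the number of loader threads at the number of images, so that no thread is started for an image that does not exist. A minimal sketch of the same idea, assuming m is the number of validation images; clamp_threads is a hypothetical helper name used only for illustration, not a darknet function:

```c
#include <assert.h>

/* Hypothetical helper (not part of darknet): choose how many loader
   threads to start for m images, never more than max_threads. */
int clamp_threads(int m, int max_threads)
{
    assert(max_threads > 0);      /* caller must allow at least one thread */
    if (m < 0) m = 0;             /* treat a negative count as "no images" */
    return (m < max_threads) ? m : max_threads;
}
```

With the 43915 images mentioned later in this thread, clamp_threads(43915, 4) stays at 4, so this clamp only matters for very small validation lists.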

@AlexeyAB sorry but it is still the same.

p.s. I was 9 commits behind and updated the repo. Now I have the latest commit but still, the problem is the same.

@8greg8

  • Can you show content of bad.list and bad_label.list?
  • Can you show your obj.data file?
  • How many images are in the valid.txt?
  • Does this error occur only for one dataset or for any dataset?
  • Does this error occur for
    GPU=1, CUDNN=1, OPENCV=0, AVX=0, OPENMP=0
    and for
    GPU=0, CUDNN=0, OPENCV=0, AVX=0, OPENMP=0

@AlexeyAB I will send you the data on Monday.

Can you show content of bad.list and bad_label.list?

bad.list
q

no bad_label.list file

Can you show your obj.data file?

classes=36
train = /home/gregork/dataset/yolo/train.txt
valid = /home/gregork/dataset/yolo/val.txt
backup = /home/gregork/dataset/yolo/backup
names = /home/gregork/dataset/yolo/data/fridge.names

At first I had 35 classes in obj.data and 36 classes in the cfg file, but I changed it and it still didn't work. As far as I can see from the detector.c code, the classes attribute in the obj.data file is only used for demo mode.

How many images are in the valid.txt?

43915

Does this error occur only for one dataset or for any dataset?

I tried on two similar datasets. The datasets were generated in the same way but have different sizes and bounding boxes.
Didn't try it on Pascal though.

Does this error occur for GPU=1, CUDNN=1, OPENCV=0, AVX=0, OPENMP=0

Yes

Does this error occur for GPU=0, CUDNN=0, OPENCV=0, AVX=0, OPENMP=0

Yes

Debug results

OK. I was able to configure VSCode and debug the validation.
The segmentation fault is raised at this line in detector.c:
https://github.com/AlexeyAB/darknet/blob/f7cb538b32d95e99a6751bf312f7c846419389d7/src/detector.c#L387
Variables at this line:

i=0
j=32
dets[i].prob[j] = 0.638847947
xmin = 493.575775
xmax = 560.430603
ymin = 200.865601
ymax = 329.710144
id = "img_5411"
fps[j] = 0x0

For all previous values of j the probability was 0.

I found out that something is wrong with the file pointers. If I replace the line https://github.com/AlexeyAB/darknet/blob/f7cb538b32d95e99a6751bf312f7c846419389d7/src/detector.c#L387 with fprintf(stdout, "%s %f %f %f %f %f\n", id, dets[i].prob[j], xmin, ymin, xmax, ymax); everything works.

Note that adding -prefix to the command does not help either. I have gcc/g++ 5.5.
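
The value fps[j] = 0x0 in the debugger output means the FILE pointer for class 32 is NULL. Passing a NULL stream to fprintf is undefined behavior in C, which is consistent with the crash. A minimal sketch of a guarded write, using the same format string as the stdout workaround; write_detection is a hypothetical helper, not a darknet function:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of darknet): write one detection line to fp.
   Returns 0 on success and -1 if fp is NULL or the write fails, instead of
   handing a NULL stream to fprintf. */
int write_detection(FILE *fp, const char *id, float prob,
                    float xmin, float ymin, float xmax, float ymax)
{
    assert(id != NULL);           /* the image id must be a valid string */
    if (fp == NULL) return -1;    /* the file was never opened */
    int n = fprintf(fp, "%s %f %f %f %f %f\n",
                    id, prob, xmin, ymin, xmax, ymax);
    return (n < 0) ? -1 : 0;      /* fprintf returns a negative value on error */
}
```

A guard like this turns the crash into an error code, but it does not explain why the file could not be opened in the first place.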

@8greg8 If you add this check:

for (j = 0; j < classes; ++j) {
    if(fps[j] == NULL) {
        printf(" Can't create file! \n");
        exit(-1);
    }
}

then what output do you get?

Can you rename your cfg-file to txt file and drag-n-drop it to your message?

yolov2_fridge_train_2gpu.txt

Can you show content of your obj.data file?

fridge.txt

If you don't use default datasets (MS COCO, Pascal VOC, ImageNet, ...) why do you want to use detector valid instead of detector map ?

When I started using this repo I thought this was the main command for validating detectors. If you type darknet detector in a terminal you get usage: darknet detector [train/test/valid/demo/map] [data] [cfg] [weights (optional)]. So I thought valid was used to get all the metrics (precision, recall, etc.) and map only for mAP. Unfortunately there is no classic darknet -h help command, and it is somewhat hard to figure out what each command is supposed to do. Don't get me wrong, though. This repo is great and I love it now that I am more familiar with it.

then what output do you get?

I also printed buff and now we got it.

I didn't have the results folder in my yolo directory.

I forgot to tell you that I'm not using the repo folder. I want to keep it clean.

image

After adding the folder the program is working without segmentation fault.

image

Thank you for helping me :).
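
The root cause was a missing output directory: fopen cannot create parent directories, so it returns NULL when the target folder does not exist. A minimal POSIX-only sketch of creating the folder before opening a per-class file. It assumes output paths of the form dir/name.txt; ensure_dir and open_result_file are hypothetical helpers, not darknet functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: make sure path is an existing directory.
   Returns 0 if it was created or already exists, -1 otherwise. */
int ensure_dir(const char *path)
{
    struct stat st;
    assert(path != NULL);
    if (mkdir(path, 0755) == 0) return 0;          /* created just now */
    if (errno == EEXIST && stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;                                  /* was already there */
    return -1;
}

/* Hypothetical helper: open dir/name.txt for writing, creating dir first.
   Returns NULL (after printing a message) on any failure. */
FILE *open_result_file(const char *dir, const char *name)
{
    char path[1024];
    assert(dir != NULL && name != NULL);
    if (ensure_dir(dir) != 0) {
        fprintf(stderr, "Can't create directory %s\n", dir);
        return NULL;
    }
    int n = snprintf(path, sizeof path, "%s/%s.txt", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return NULL;   /* path too long */
    FILE *fp = fopen(path, "w");
    if (fp == NULL) perror(path);                  /* report why fopen failed */
    return fp;
}
```

Creating the directory by hand, as done in this thread, has the same effect; the sketch only shows how a program could avoid depending on it.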

@8greg8

After adding the folder the program is working without segmentation fault.

This part of the Makefile should automatically create the results/ folder:

https://github.com/AlexeyAB/darknet/blob/ce2e0eff009653e207df34f83fa6259a39f6fe84/Makefile#L151-L152


I will think about adding ./darknet -h and ./darknet -help
