Darknet: free(): invalid next size (fast) when using custom trained weights

Created on 16 May 2018 · 8 Comments · Source: pjreddie/darknet

Hello, I keep receiving this output when I used my custom weights:

Loading weights from backup/yolov3-logo_50000.weights...Done!
/home/ryan/Projects/data_synthesization/output/dataset/01a0ad79-94d3-4923-b210-d87b3cfd4f82.mov-0001.jpg: Predicted in 9.369853 seconds.
hyundai: 100%
free(): invalid next size (fast)

The command is:
./darknet detector test cfg/voc.data cfg/yolov3-logo.cfg backup/yolov3-logo_50000.weights /home/ryan/Projects/data_synthesization/output/dataset/01a0ad79-94d3-4923-b210-d87b3cfd4f82.mov-0001.jpg

voc.data

classes= 1
train  = /home/pjreddie/data/voc/train.txt
valid  = /home/pjreddie/data/voc/2007_test.txt
names = data/voc.names
backup = backup

yolov3-logo.cfg

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
...
[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=1
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Weights file: https://drive.google.com/open?id=1X69M6Qb_pSvoM0Dr2lN2Q04eUzVek1V4

Most of the time this error occurs when darknet detects something; if darknet fails to detect anything, the error does not occur.


All 8 comments

There are 3 places where the filters/classes pair appears (in your case filters=18 and classes=1). Did you replace all of them?

3 total? I thought there was only one, like in YOLOv2; I'll check it again. It's kinda weird though, since I had no problem with training.

@ryanaleksander there is no problem during training, but the error occurs when you run testing.

I get the same error. On my laptop using CPU only everything seems fine. On a GPU machine I get

*** Error in `./darknet': free(): invalid next size (fast): 0x0000000001f13b40 ***

The command I use is
./darknet detector test -thresh 0.1 voc.data cfg/yolov3.cfg backups/yolov3-weights.backup image.jpg

In the debugger I see that it happens in ./src/network.c

#0  0x00007fdcf2e12c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fdcf2e16028 in __GI_abort () at abort.c:89
#2  0x00007fdcf2e4f2a4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7fdcf2f61350 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007fdcf2e5b82e in malloc_printerr (ptr=<optimized out>, str=0x7fdcf2f614f0 "free(): invalid next size (fast)", action=1) at malloc.c:4998
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3842
#5  0x0000000000461fc9 in free_detections (dets=0x7fdca2ee73a0, n=3) at ./src/network.c:573
#6  0x000000000042238b in test_detector (datacfg=0x7fff7d11ab52 "/volume/share/public/SUN2012pascalformat/sun2012-voc.data", cfgfile=0x7fff7d11ab8c "/volume/share/public/SUN2012pascalformat/yolov3-sun10-learning-rate.cfg", 
    weightfile=0x7fff7d11abd4 "/volume/share/public/SUN2012pascalformat/weight_backups/yolov3-sun10-learning-rate.backup", filename=0x0, thresh=0.100000001, hier_thresh=0.5, outfile=0x0, fullscreen=0) at ./examples/detector.c:605
#7  0x0000000000422957 in run_detector (argc=8, argv=0x7fff7d119f48) at ./examples/detector.c:845
#8  0x0000000000426a0b in main (argc=8, argv=0x7fff7d119f48) at ./examples/darknet.c:434
(gdb) frame 5
#5  0x0000000000461fc9 in free_detections (dets=0x7fdca2ee73a0, n=3) at ./src/network.c:573
573             free(dets[i].prob);
(gdb) list
568
569     void free_detections(detection *dets, int n)
570     {
571         int i;
572         for(i = 0; i < n; ++i){
573             free(dets[i].prob);
574             if(dets[i].mask) free(dets[i].mask);
575         }
576         free(dets);
577     }

I set a watchpoint on dets[0].prob and it is modified here:
Hardware watchpoint 3: *0xfd88318

Old value = 23029904
New value = 262246768
__memcpy_sse2 () at ../sysdeps/x86_64/multiarch/../memcpy.S:206
206     ../sysdeps/x86_64/multiarch/../memcpy.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2 () at ../sysdeps/x86_64/multiarch/../memcpy.S:206
#1  0x00007ffff03075ce in __GI_qsort_r (b=0xfd88300, n=3, s=48, cmp=<optimized out>, arg=<optimized out>) at msort.c:271
#2  0x00000000004703b0 in do_nms_sort (dets=0xfd88300, total=3, classes=10, thresh=0.449999988) at ./src/box.c:77
#3  0x0000000000422313 in test_detector (datacfg=0x7fffffffeb36 "/volume/share/public/SUN2012pascalformat/sun2012-voc.data", cfgfile=0x7fffffffeb70 "/volume/share/public/SUN2012pascalformat/yolov3-sun10-learning-rate.cfg", 
    weightfile=0x7fffffffebb8 "/volume/share/public/SUN2012pascalformat/weight_backups/yolov3-sun10-learning-rate.backup", filename=0x0, thresh=0.100000001, hier_thresh=0.5, outfile=0x0, fullscreen=0) at ./examples/detector.c:603
#4  0x0000000000422957 in run_detector (argc=8, argv=0x7fffffffe828) at ./examples/detector.c:845
#5  0x0000000000426a0b in main (argc=8, argv=0x7fffffffe828) at ./examples/darknet.c:434

Unfortunately I have no idea whether do_nms_sort is supposed to change it or not.

I have found the actual source of the error: in my .cfg file, one of the three yolo layers had a different number of classes than the other two. 🤦
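A quick way to catch that kind of mismatch before running darknet is to extract every classes= line from the cfg and check that they all agree. A minimal sketch follows; the heredoc writes a deliberately broken stand-in cfg for illustration, so point sed at your own file instead:

```shell
# Stand-in cfg reproducing the bug described above: one [yolo] layer
# disagrees with the other two.
cat > /tmp/example.cfg <<'EOF'
[yolo]
classes=6
[yolo]
classes=6
[yolo]
classes=7
EOF

# Count the distinct classes= values; more than one means a mismatch.
distinct=$(sed -n 's/^classes=//p' /tmp/example.cfg | sort -u | wc -l | tr -d ' ')
if [ "$distinct" -ne 1 ]; then
  echo "MISMATCH: $distinct distinct classes values in cfg"
fi
```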

I have the same problem with my custom trained weights.
Will I need to retrain my model, or does fixing the .cfg work with the existing weights? Sorry for my poor English.

Fixed!
In YOLOv3 the .cfg file has 3 occurrences of classes and 3 matching occurrences of filters (technically filters=xxx appears more often, but you only want to change the ones that default to 75); in YOLOv2 there is only one.
Changing those numbers to the correct value (filters = (num/3) * (5+classes), where num = 9 by default) fixed both darknet: ./src/parser.c:312: parse_yolo: Assertion `l.outputs == params.inputs' failed. and free(): invalid next size (fast)
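Evaluated for the two configurations in this thread, the formula gives exactly the filters values the cfgs are supposed to contain. Plain arithmetic, shown with shell just for illustration:

```shell
# filters = (num / 3) * (5 + classes), with num = 9 as in both cfgs above.
num=9
for classes in 1 6; do
  filters=$(( (num / 3) * (5 + classes) ))
  echo "classes=$classes -> filters=$filters"
done
# classes=1 -> filters=18 (first cfg), classes=6 -> filters=33 (second cfg)
```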

I am having the same problem: *** Error in `python3': free(): invalid next size (fast): darknet...
I thought my .cfg file had the same error @felixendres described:

I have found the actual source of the error: in my .cfg file, one of the three yolo layers had a different number of classes than the other two. 🤦

But I can't see where the error is:

Relevant parts of yolov3.cfg

[net]
# Testing
batch=1
subdivisions=1
# Training
#batch=32
#subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

...

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear


[yolo]
mask = 6,7,8
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1


[route]
layers = -4

...

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear


[yolo]
mask = 3,4,5
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1



[route]
layers = -4

...

[convolutional]
size=1
stride=1
pad=1
filters=33
activation=linear


[yolo]
mask = 0,1,2
anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
classes=6
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Edit:
The error was in the .data file. I trained 6 classes but forgot to update the number of classes in the .data file:

classes= 7
train  = train.txt  
valid  = test.txt  
names = 
backup = backup
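The same kind of pre-flight check works for the .data file: its classes= value should equal the number of lines in the names file. A sketch with stand-in files created inline (substitute your own paths):

```shell
# Stand-in names and .data files; replace with your real paths.
cat > /tmp/example.names <<'EOF'
hyundai
kia
EOF
cat > /tmp/example.data <<'EOF'
classes= 2
names = /tmp/example.names
EOF

data_classes=$(sed -n 's/^classes *= *//p' /tmp/example.data)
name_count=$(grep -c . /tmp/example.names)
if [ "$data_classes" -eq "$name_count" ]; then
  echo "OK: $data_classes classes"
else
  echo "MISMATCH: .data says $data_classes, names file lists $name_count"
fi
```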