
Hi all,
I am trying to use yolov3-tiny_3l.cfg for my custom dataset with 2 classes.
I changed the classes and filters in my .cfg file, and also my obj.data file. I generated anchors for my custom dataset and put them into the .cfg file.
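As a quick check, the filters value before each [yolo] layer follows the (classes + 5) * <number of mask> formula from the darknet README; a small sanity-check snippet (illustrative only, not part of the training setup):

# filters before each [yolo] layer = (classes + 5) * number of anchors in its mask
classes = 2
masks_per_yolo_layer = 3                     # each [yolo] layer below uses 3 of the 9 anchors
print((classes + 5) * masks_per_yolo_layer)  # -> 21, matching filters=21 in the cfg below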
I can see the loss going down, but the mAP fluctuates a lot (see the attached mAP graph).
How can I solve this problem? Any suggestions?
Thanks
@aditbhrgv Hi,
@AlexeyAB Thanks for your reply!
Attached is the .cfg file:
[net]
batch=64
subdivisions=32
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.0005
burn_in=2000
max_batches = 35000
policy=steps
steps=360000,380000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 6,7,8
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 3,4,5
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -3
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 6
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 0,1,2
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
classes=2
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
The training and validation datasets are separate. There are no intersections between them.
Did you split it uniformly at random, or not?
Did you check your dataset by using Yolo_mark?
Can you show the cloud.png image after this command?
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show
No
I checked it using Yolo_mark; it shows correct bounding boxes on the images.
Attached is the cloud.png.

@aditbhrgv
Try to train using these mask and filters values from the beginning:
filters=7
[yolo]
mask = 8
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
.....
filters=14
[yolo]
mask = 6,7
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
...
filters=42
[yolo]
mask = 0,1,2,3,4,5
anchors = 8, 10, 11, 12, 14, 11, 18, 14, 25, 15, 36, 18, 49, 23, 71, 25, 93, 42
Thanks! I'll try that.
Can you please tell me the reasoning behind doing this?
It would be really helpful!
Thanks
After training - show your Loss & mAP chart
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
recalculate anchors for your dataset for the width and height from the cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of the anchor masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change filters=(classes + 5)*<number of mask> before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
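As a rough illustration of that rule applied to the 9 anchors from this thread (the thresholds and the filters formula come from the quoted text; the script itself is only a sketch, not part of darknet):

classes = 2
anchors = [(8, 10), (11, 12), (14, 11), (18, 14), (25, 15),
           (36, 18), (49, 23), (71, 25), (93, 42)]

# 1st [yolo] layer: anchors larger than 60x60; 2nd: larger than 30x30; 3rd: the rest
large  = [i for i, (w, h) in enumerate(anchors) if w > 60 and h > 60]
medium = [i for i, (w, h) in enumerate(anchors) if w > 30 and h > 30 and i not in large]
small  = [i for i in range(len(anchors)) if i not in large + medium]

for name, mask in (("1st [yolo]", large), ("2nd [yolo]", medium), ("3rd [yolo]", small)):
    print(name, "mask =", mask, "filters =", (classes + 5) * len(mask))

Almost none of these anchors pass the strict thresholds (large comes out empty), which is why the looser hand-picked masks above (8 / 6,7 / 0,1,2,3,4,5) were suggested, and why the quote says to fall back to the default anchors when the calculated ones do not fit.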
@AlexeyAB Can you please let me know the possible reasons for this fluctuating mAP?
I have currently set random=0 in the .cfg file and started training. This led to less fluctuating behavior (than in the previously attached graph).
I have started training with the changed anchors you described before and will share the results once it's done.
Also, could you please give me a bit more interpretation of cloud.png?
And I tried to train the same dataset with a PyTorch implementation, and the mAP converged after 23 epochs. My initial LR was 0.01, decreased by a factor of 10 after 20, 50, and 100 epochs.
Can I set the same LR schedule in the .cfg file?
Thanks
Can you please let me know the possible reasons for this fluctuating mAP?
There can be many reasons.
And I tried to train the same dataset with a PyTorch implementation, and the mAP converged after 23 epochs. My initial LR was 0.01, decreased by a factor of 10 after 20, 50, and 100 epochs.
Can I set the same LR schedule in the .cfg file?
If you have 5400 training images and set batch=64, then one epoch = 5400/64 ≈ 84 iterations.
So
20 epochs = 1680 iterations
50 epochs = 4200 iterations
100 epochs = 8400 iterations
Set
steps=1680, 4200, 8400
scales=0.1, 0.1, 0.1
instead of
https://github.com/AlexeyAB/darknet/blob/099b71d1de6b992ce8f9d7ff585c84efd0d4bf94/cfg/yolov3.cfg#L22-L23
and learning_rate=0.01 instead of https://github.com/AlexeyAB/darknet/blob/099b71d1de6b992ce8f9d7ff585c84efd0d4bf94/cfg/yolov3.cfg#L18
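The arithmetic behind those numbers, as a small script (values from the answer above):

# one epoch = number of training images / batch size
iters_per_epoch = 5400 // 64                 # = 84 iterations (rounded down)
for epochs in (20, 50, 100):
    print(epochs, "epochs =", epochs * iters_per_epoch, "iterations")
# -> 1680, 4200, 8400, i.e. steps=1680,4200,8400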
Hi @AlexeyAB ,
I got the below result after following the above LR schedule.
learning_rate=0.01
steps=1680, 4200, 8400
scales=0.1, 0.1, 0.1

But I trained this without the random option in the .cfg file. I can try training with the random option again and obtain the results once more.
Looking at the mAP graph, I think I reduced the LR too quickly, as it finally converged to 75% mAP, which could have been better, around 82% (as seen in the graph).
I will try setting "scales=0.05, 0.05, 0.05" in the .cfg file and see the results. Do you have any other suggestions?
Also, can I generate a video of the predictions on the validation set using my trained model? I can use the "./build/darknet detector test" option to see the visualizations, but it gives one image at a time. I want to feed in the whole validation set and save the output.
Also, can I generate a video of the predictions on the validation set using my trained model? I can use the "./build/darknet detector test" option to see the visualizations, but it gives one image at a time. I want to feed in the whole validation set and save the output.
Are your validation images frames from a video?
Just run detection on that video.
Also, you can download http://mplayerwin.sourceforge.net/downloads.html and run this command in a folder that contains only the validation images:
mencoder mf://*.jpg -mf w=1280:h=720:fps=15:type=jpg -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=4000:mbd=2:trell -oac copy -o conveyor_valid.avi
so the video file conveyor_valid.avi will be generated.
Then run:
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
Also you can try
./darknet detector test data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights < data/conveyor_valid.txt
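On Linux, the frames-to-video step can also be done with a short OpenCV script instead of mencoder; a minimal sketch, assuming opencv-python is installed, the validation .jpg frames are in the current directory, and all frames have the same size:

import glob
import cv2

frames = sorted(glob.glob("*.jpg"))            # frame order = sorted filename order
h, w = cv2.imread(frames[0]).shape[:2]
writer = cv2.VideoWriter("conveyor_valid.avi",
                         cv2.VideoWriter_fourcc(*"MJPG"), 15.0, (w, h))
for f in frames:
    writer.write(cv2.imread(f))
writer.release()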
Are your validation images frames from a video?
No, they are .jpg files located in a folder.
Also, you can download http://mplayerwin.sourceforge.net/downloads.html and run this command in a folder that contains only the validation images:
Is there a similar tool for Ubuntu?
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
@AlexeyAB I used this command to draw the bounding boxes on the .avi, but I see a bit of an offset on the detected objects. What could be the problem?
Maybe wrong annotations; check your dataset by using https://github.com/AlexeyAB/Yolo_mark
annotations
I tested on a single image and the bounding box is perfectly overlaid on the image using the "./darknet detector test" command.
It seems to be a problem only with an input .avi video. I see the offsets when the objects are relatively close, but not when they are some distance away.
Maybe I can try ./darknet detector test data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights < data/conveyor_valid.txt
instead of
./darknet detector demo data/conveyor.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_last.weights conveyor_valid.avi -out_filename out_conveyor_valid.avi
@AlexeyAB How can I reduce the fps of the generated output video? It's too fast right now.
Change the 1st line and comment out the 2nd: https://github.com/AlexeyAB/darknet/blob/099b71d1de6b992ce8f9d7ff585c84efd0d4bf94/src/demo.c#L186-L187
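If rebuilding darknet is inconvenient, one workaround (a sketch, not from this thread) is to rewrite the generated video with a lower fps using OpenCV, which keeps every frame but slows playback:

import cv2

cap = cv2.VideoCapture("out_conveyor_valid.avi")
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("out_conveyor_valid_slow.avi",
                         cv2.VideoWriter_fourcc(*"MJPG"), 5.0, (w, h))  # 5 fps playback
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(frame)
writer.release()
cap.release()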

@AlexeyAB Now I get a new mAP, which converged around 81%: precision = 84%, recall = 71%, F1 = 77%. However, I got these results without using the "random" flag. I think the results could be better with the multi-scale option.
Yes, try to train with random=1

@AlexeyAB I tried the random=1 option, but mAP, precision, recall, and F1 decreased instead of increasing.
Could you please suggest something?
Thanks

@AlexeyAB I have a new dataset, for which the cloud.png is shown. How can I set the anchor masks according to this distribution? Is there a link where I can better understand how to interpret cloud.png?
Hi @aditbhrgv - I found this explanation helpful for determining custom anchors.
Hi @DarylWM,
Thank you!
Can you please explain the significance of cloud.png?
I can see the anchors and the training data points distributed around them. Is my understanding correct?
If yes, how will the training samples that lie outside these anchors be detected?
Thanks again!
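For reference, cloud.png plots the training boxes' (width, height) points together with the computed anchors, as described above; a rough way to reproduce a similar plot (the data/obj/ label path is an assumption; standard YOLO "class cx cy w h" normalized labels and matplotlib are assumed):

import glob
import matplotlib.pyplot as plt

net_w = net_h = 608                            # network input size from the cfg
widths, heights = [], []
for label_file in glob.glob("data/obj/*.txt"): # hypothetical label folder
    with open(label_file) as f:
        for line in f:
            _, _, _, w, h = map(float, line.split())
            widths.append(w * net_w)           # de-normalize to network pixels
            heights.append(h * net_h)

anchors = [(8, 10), (11, 12), (14, 11), (18, 14), (25, 15),
           (36, 18), (49, 23), (71, 25), (93, 42)]
plt.scatter(widths, heights, s=2, alpha=0.3, label="training boxes (w, h)")
plt.scatter(*zip(*anchors), c="red", marker="x", label="anchors")
plt.xlabel("box width (px)")
plt.ylabel("box height (px)")
plt.legend()
plt.savefig("cloud_repro.png")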