Darknet: Results of the latest commit (4892071) and latest cfg files of YOLO v2

Created on 13 Aug 2017 · 79 comments · Source: AlexeyAB/darknet

Hello,

The YOLO authors have changed the YOLO-VOC cfg files on their original website. I mean here:

(screenshot: 2017-08-13_10-51-46)

Training with these new cfg files (with either thresh = 0.2 or 0.001) does not yield the same results as when I trained with the older repo and the older YOLO-VOC cfg file. I can say the results are much worse than before.

Besides, training with YOLO-VOC-2.0 yields slightly better results, but it still cannot compete with the older results (with either thresh = 0.2 or 0.001).

I have used the latest commit of this repo (4892071). What is the problem: the new cfg files or the repo code?

Most helpful comment

  1. In general the IoU value is more important than Recall.
  2. Also, Yolo calculates IoU more correctly than Recall: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

     • Yolo calculates the best IoU (with a low threshold) instead of the average IoU - so it's not quite correct
     • Yolo calculates True Positives instead of Recall - so this is not correct at all
     • IoU - the bigger, the better (speaks to accuracy) - better to use
     • Recall - the bigger, the better (speaks to accuracy) - actually Yolo calculates true positives, so it shouldn't be used
  3. The best IoU is an important metric for improving the quality of some parts of the neural network, but not the best metric for the whole detection algorithm; it is better to use the average IoU.

All 79 comments

Training with these new cfg files (with either thresh = 0.2 or 0.001) does not yield the same results as when I trained with the older repo and the older YOLO-VOC cfg file. I can say the results are much worse than before.

I must say that training results with this cfg file were zero for IoU and Recall, with no detections. It only worked with yolo-voc-2.0.cfg, but the calculated results were still not close to the older commits. With either thresh = 0.001 or thresh = 0.2, I could not get better than IoU = 71% on my own dataset, which is far below the IoU = 75% I got with the older commits and the older cfg file.

@VanitarNordic The main differences are in both the cfg file and the way training works in the original Linux repo.
This repo doesn't support training yolo-voc.cfg; for example, it doesn't have the burn_in=1000 parameter.
Also, for comparison, you should use 100% identical cfg files, which have repeatedly changed a little bit.
And you have to compare the old commit and the new commit side by side right now, not rely on your memory. Even a small change of params (random, width, height, subdivisions, batch) can have an effect.

All parameters are identical: batch, random, resolution and the others. I have both commits on my hard drive and I test based on validation IoU and Recall results.

If nothing has changed in the code except the threshold, then training with yolo-voc-2.0.cfg and threshold = 0.001 should NOT yield worse validation IoU and Recall results - if the intention of the new cfg files is to improve the model, not make it worse.

So what remains suspicious here? Maybe the code.

@VanitarNordic

Also, earlier the valid dataset was taken from data/voc.2007.test, but now it is taken from valid= in the data file, and if that is absent it is taken from data/train.txt: https://github.com/AlexeyAB/darknet/commit/97ed11ca1503953199495e9b4386974ceba44687#diff-d77fa1db75cc45114696de9b1c005b26L371

So you can just copy the function validate_detector_recall() from the old commit into the new commit, to be sure the validation is identical, and call it as validate_detector_recall(cfg, weights); here: https://github.com/AlexeyAB/darknet/blame/master/src/detector.c#L552 (see the sketch after the table below).

Then show such a table with the same threshold = 0.001 and with the same cfg file:

| IoU | Trained on old commit | Trained on new commit |
|---|---|---|
| Tested on old commit |  |  |
| Tested on new commit |  |  |
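(A self-contained sketch of the call-site idea described above - routing the recall sub-command to the validate_detector_recall(cfg, weights) copied from the old commit. The dispatcher below is a mock for illustration, not darknet's actual run_detector().)

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the function copied from the old commit; the real
   one loads the network, runs over voc.2007.test and prints IoU / Recall. */
static void validate_detector_recall(const char *cfgfile, const char *weightfile)
{
    printf("recall validation: cfg=%s weights=%s\n", cfgfile, weightfile);
}

/* Minimal mock of how run_detector() in src/detector.c dispatches the
   "recall" sub-command; in the real file the branch would just call the
   copied validate_detector_recall(cfg, weights) with no extra arguments. */
int main(int argc, char **argv)
{
    if (argc < 5) {
        fprintf(stderr, "usage: %s detector recall <cfg> <weights>\n", argv[0]);
        return 1;
    }
    if (0 == strcmp(argv[2], "recall")) validate_detector_recall(argv[3], argv[4]);
    return 0;
}
```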

I replaced the function validate_detector_recall(). It compiles successfully, but it crashes when I try to validate the model. I have also copied voc.2007.test into place.

@AlexeyAB

You did not reply about the crashing problem, so in the meantime I trained and tested the old commit using yolo-voc-2.0.cfg. The result is a new record and very good: 76% IoU and good detection.

I think you have made some modifications in the code, but have not tested whether these modifications actually improve the results.

(screenshot: 2017-08-13_23-17-19)

I replaced the function validate_detector_recall(). It compiles successfully, but it crashes when I try to validate the model. I have also copied voc.2007.test into place.

I don't know why it crashes in your case; it should work and give the same IoU.
Only after you get such a table can I say something:

| IoU | Trained on old commit | Trained on new commit |
|---|---|---|
| Tested on old commit |  |  |
| Tested on new commit |  |  |

The cfg file in all training and testing experiments was yolo-voc-2.0.cfg.

Trained and tested on the old commit: 76.20%
Trained and tested on the last commit: 75.60% (identical recall function as in the old commit - the crash was caused by a leftover extra argument in the function call a few lines below, which had to be removed)

Recall of the last commit = Recall of the old commit = 90.38%

Tested on the last commit, but with weights from the old commit: 76.22%
Tested on the old commit, but with weights from the last commit: 75.34%

Therefore we can say the weights from the old commit have trained slightly better (around 1%). The recall function has no issue.

Now you have your table.

So now we can say that training on the old commit is slightly better:

| IoU | Trained on old commit | Trained on new commit |
|---|---|---|
| Tested on old commit | 76.20% | 75.34% |
| Tested on new commit | 76.22% | 75.60% |

Do you use the same CUDA in both, and did you compile both the new and the old commit with cuDNN?


The files that were changed are data.c and blas_kernels.cu.

So you can get these files from the old commit by this URL, put them into the new commit, and then train and test again:

By replacing these files one by one and training & testing again, you can find out which of them has an effect on the IoU.


I'll test this and tell you the results, but before that I wanted to tell you something else.

During training I sometimes randomly hit this -nan issue, in both the old and the new commits.

(screenshot: 2017-08-13_21-22-04)

Results:

  • Replacing data.c from the older commit: IoU = 74.78%, Recall = 88.46%
  • Replacing blas_kernels.cu from the older commit (data.c had already been replaced in the previous step): IoU = 75.87%, Recall = 90.38%

Do you use the same CUDA in both, and did you compile both the new and the old commit with cuDNN?

Yes, I use the same CUDA and both were compiled with cuDNN. Actually cuDNN boosts training speed by at least 50%. But I saw no difference between 5.1 and 6.

Now I will keep data.c unchanged in the last commit and train again, replacing only blas_kernels.cu. This will be the final test.

I did the final test: IoU= 75.24% Recall=90.38%

I'm confused. If you have any more suggestions, they are welcome.

So, no ideas. Theoretically, a 1 percent difference can be a random fluctuation.

You know, modern models fight for first place over differences of less than 1 percent.

YOLO has many unknown sides, such as the accuracy calculation, mAP, and anchor calculation, which we just have to test by trial and error.

What do you think of SSD-300 (7++12+COCO)? is it better than YOLO?


_Also, I want to clarify that these discussions are meant to learn something and make the model better, not a personal war - at least from my side. I have learned many, many things from you and I will never forget that. I hope you don't take these discussions personally; they are scientific debates and are normal between scientists._

SSD-300 (7++12+COCO) is faster and more accurate than Yolo v2 (7++12+COCO).
So if you have a large dataset and small objects, SSD-300 is probably preferable.
Did you manage to train SSD-300/512 successfully?

Why do I use Yolo v2:

  • it has few dependencies and can be implemented on an FPGA (and then in an ASIC)
  • the source code can be easily changed/extended using C/CUDA C++
  • the code can be explained to people who know only C and don't know C++ and Python, unlike Caffe+SSD
  • also there is Yolo 9000

I perceive discussions normally only if several conditions are met:

  • If I have time for them
  • No knowingly false assertions
  • My questions are also answered in full
  • The discussion does not take the form of a polemic

I trained again on the old repo and I could reach IoU = 76.25% as before.

Therefore we can assume that there is a minor issue somewhere. I actually trained on the old commit 3 times and the results were all equal. May I ask you to think again about what could cause this?

Regarding SSD: yes, I fine-tuned it some time ago and it showed good mAP (which it calculates), but it was not as easy to work with as YOLO. Besides, its original repo is the best one and is written only for Linux and Caffe. There is a Windows distribution of Caffe, but the SSD author has modified the original Caffe. I'm not quite sure whether there is another good implementation of SSD in Keras, TensorFlow or anything else.

And actually I prefer to fine-tune a model rather than train from scratch, because somebody with many GPUs has trained it before and we can use those weights. That's why I believe we are not really fine-tuning YOLO: we train from scratch starting from initial classification weights.

But you have made a very, very good repo of YOLO; that's the reason I train and test day and night to make it better. I don't want to just show something for my coursework like many others do. That's why I comment much more than other people :-). Your knowledge is solid and correct and you are very professional in C/C++.

Besides, YOLO is very memory efficient. SSD-300 easily caused memory overflow on my 6 GB GPU and I had to reduce the image size too much.

Also, I have not tested YOLO-9000; maybe it is more accurate than YOLO v2.

Really, I got tired of training and testing the last commit, and you don't want to consider even a little that this might be true; instead you think all comments are false assertions! For what benefit, I don't know.

The last commit does not train as well as this one: https://github.com/AlexeyAB/darknet/tree/a71bdd7a83e33f28d91b88551b291627728ee3e7

The only false assertions were:

  • that thresh in recall is not the same as thresh in the test
  • that we should calculate relative coords by dividing by max(w,h)

Based on your training tests, there is a slight decrease in IoU (~1%) due to some changes in the new commits - this problem is present, I agree.
Do you know how to fix this problem?

No man, I have not changed the coordinates :-). That was a different story. The threshold is also set to the old value, 0.001.

The thing is, when I test the trained model I can also see the difference in detection - not huge, but you can see the effect of those percentages.

Sometimes this difference goes higher, maybe 2 or even 3%, but it is not stable. However, in the commit I linked for you, if you train 10 times with the same settings, all results are equal without even a tiny change.

I have tried many, many things; I actually work on it day and night, but I really have no idea. The thing is that it does not produce the same result each time, while the old commit does; therefore I suspect the issue comes from something that affects training.

Also, I must give you a big thumbs up and all of my credit, because today I also trained it on the original Darknet repo under Linux, but the result was not as good as yours, which means your repo is better in all aspects. Therefore it is a shame about this very small issue, and I'm trying to find a solution.

Thank you. Can you show, with a cfg based on yolo-voc.2.0.cfg:

  • what IoU you get when trained on the original Linux repo and recall is tested on the original Linux repo?
  • what IoU you get when trained on the original Linux repo and recall is tested on this Windows repo (with thresh = 0.001)?

I started training on Ubuntu, but it seems there is a problem with training and the random functions - you know, the count value; see the picture. Do you want me to continue? I used the latest commit (d3577a5).

(screenshot from 2017-08-18 02-48-49)

Yes, I know; I will comment out this srand() again later.

I mean: what IoU do you get if you train using the original Linux repo (https://github.com/pjreddie/darknet) and then test the resulting weights?

And yes, you asked about the original Linux repo; the comment above was about your repo.
With the original Linux repo I could not see the recall, because it was recording the results as a report and IoU was not calculated there, but from the detections I think there is a significant difference in IoU between Linux and your repo - I mean Linux was worse.

And yes, I brought the weights trained on the original Linux repo to Windows to test them, but it showed 0 for both IoU and Recall.

And both on Linux and on Windows I had yolo-voc.2.0.cfg in place for training and testing.

With the original Linux repo I could not see the recall, because it was recording the results as a report and IoU was not calculated there

I.e. the original Linux repo doesn't show IoU when you call ./darknet detector recall ...?

I made fixes 5 minutes ago: https://github.com/AlexeyAB/darknet/commit/4d2fefd75a57dfd6e60680eaf7408c82e15a025d
So you can try to train on Linux using this repo at the last commit: https://github.com/AlexeyAB/darknet/

Also, do you use the same dataset for validation as for training, or does the valid txt differ from the train txt file in obj.data? https://github.com/AlexeyAB/darknet/blob/master/cfg/voc.data#L3

Also, do you use the same dataset for validation as for training, or does the valid txt differ from the train txt file in obj.data? https://github.com/AlexeyAB/darknet/blob/master/cfg/voc.data#L3

Of course the validation dataset is different from the training set; I think that's the rule for testing any model. Besides, train.txt and valid.txt were the same across all experiments. Also, voc.2007.test was identical to valid.txt.

I.e. the original Linux repo doesn't show IoU when you call ./darknet detector recall ...

I think I used the valid parameter. I have the repo; I'll make another test with the recall parameter.

I will train with your Linux repo tomorrow and tell you the results.

Excuse me, the above result was on the Linux repo with thresh = 0.2; I just forgot to change it to match Windows before running. Your Linux repo can achieve an even significantly higher IoU.

Your Linux Repo outperforms the Original Darknet Linux Repo by 2.33%. Excellent.

Therefore I summarize your latest Linux repo (4d2fefd) as follows:

  • Threshold = 0.2 : IoU = 72.38% , Recall = 84.62%
  • Threshold = 0.001: IoU = 76.33% , Recall = 88.46%

Therefore we can suspect the issue is on the Windows side.

Also to have all results in one place:

  • Original Darknet Linux Repo: IoU = 74% , Recall = 88.4%

This is the result of the old windows repo (https://github.com/AlexeyAB/darknet/tree/a71bdd7a83e33f28d91b88551b291627728ee3e7):

  • IoU = 76.22% , Recall = 90.33%

Therefore we can say the results of this old Windows commit and the latest Linux commit are almost identical, except that the old Windows repo is 2% better in terms of Recall.

I think now you have the clue.

Yes, there is a difference.

Also to have all results in one place:

  • Original Darknet Linux Repo: IoU = 74% , Recall = 88.4%

Also, can you test on Windows the weights already trained on the original Darknet Linux repo - will it give the same IoU = 74%, Recall = 88.4%?

Let me gather everything in one place:

Weights trained on the original Darknet repo, tested on Alexey's Windows repo (thresh = 0.001):

Old Windows repo: IoU = 0% , Recall = 0%
Latest Windows Repo: IoU = 74% , Recall = 88.44%

Therefore:

The results of the latest windows repo (weights from Original Darknet) = Results of the original Darknet repo

You are right

Do you have any clue?

No, I haven't.

Tracing from this commit https://github.com/AlexeyAB/darknet/tree/a71bdd7a83e33f28d91b88551b291627728ee3e7 to the latest commit might give you a clue.

Hi Alex, I have good news for you.

I recompiled the old commit using OpenCV 2.4.13 (which was 2.4.9 before), CUDA 8.0.61 + Patch 2 (which was CUDA 8.0.61 before) and cuDNN 6 (which was cuDNN 5.1 before).

You know what? The same results as the latest commit!! (I have not tested 4d2fefd.)

Now I suspect one of these two, CUDA or cuDNN. If OpenCV also influences the training somehow, then that could cause this effect.

The interesting thing is that I had used the same setup on Linux for your repo (CUDA 8.0.61 + Patch 2, cuDNN 6), but got good results.

@VanitarNordic Hi, I think that this is due to CUDA and cuDNN.


OpenCV loads the images, so it can have an effect on image quality: https://github.com/AlexeyAB/darknet/blob/4d2fefd75a57dfd6e60680eaf7408c82e15a025d/src/image.c#L599
You can simply change the line image out = load_image_cv(filename, c); to image out = load_image_stb(filename, c); here to disable the OpenCV effects - but I think OpenCV is not to blame: https://github.com/AlexeyAB/darknet/blob/4d2fefd75a57dfd6e60680eaf7408c82e15a025d/src/image.c#L1312

Should I change both line 599 and line 1312?

Your description refers to line 1312.

You should only change line 1312
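(For context, a hedged sketch of what the relevant part of load_image() in src/image.c looks like around those commits - swapping the call at line ~1312 to load_image_stb() bypasses the OpenCV decode path for training images. The exact code and line numbers may differ between commits; this relies on darknet's own image type and helpers from image.h.)

```c
/* Sketch of load_image() in src/image.c (not verbatim). */
#include "image.h"

image load_image(char *filename, int w, int h, int c)
{
#ifdef OPENCV
    image out = load_image_cv(filename, c);   /* the call to replace (line ~1312) */
#else
    image out = load_image_stb(filename, c);  /* stb_image path, no OpenCV involved */
#endif
    /* Resize to the requested network input size if needed. */
    if ((h && w) && (h != out.h || w != out.w)) {
        image resized = resize_image(out, w, h);
        free_image(out);
        out = resized;
    }
    return out;
}
```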

I tested OpenCV; that does not cause the issue. The next step is to test cuDNN.

Is there anywhere in the code related to cuBLAS?

Actually I tested it with cuDNN 5.1 and the results were the same, so the only thing remaining is the CUDA patch. I should remove CUDA and install it again without applying Patch 2 to see if that causes the issue.

Alright, I tested it, but that did not solve the case. If you know of any other issues that could affect the Visual Studio build, please let me know so I can test them. It happened after I recompiled it.

Could the NVIDIA display driver have an effect? (It is not the same one that comes with CUDA.)

No thoughts on this matter.

Alright. The story is finished.

I downloaded OpenCV 2.4.9, recompiled, and trained with it. The same results appeared: IoU = 76.18% and Recall = 90.38%.

I was not actually surprised, because I have faced these OpenCV issues before from version to version; one function showed different behavior just by changing versions. Besides, it shows OpenCV has more impact than we had imagined.

Besides, the 2% higher Recall in your Windows repo is possibly because the rand_s() function creates better-quality random numbers than rand(), which is true.

Now I'll update CUDA and cuDNN to their latest versions and retest.

I have a question: is the IoU value more important, or Recall?
For example, if a 1% increase in IoU causes a 2% decrease in Recall, which do you prefer?

  1. In general the IoU value is more important than Recall.
  2. Also, Yolo calculates IoU more correctly than Recall: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

     • Yolo calculates the best IoU (with a low threshold) instead of the average IoU - so it's not quite correct
     • Yolo calculates True Positives instead of Recall - so this is not correct at all
     • IoU - the bigger, the better (speaks to accuracy) - better to use
     • Recall - the bigger, the better (speaks to accuracy) - actually Yolo calculates true positives, so it shouldn't be used
  3. The best IoU is an important metric for improving the quality of some parts of the neural network, but not the best metric for the whole detection algorithm; it is better to use the average IoU.
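(For reference, a minimal self-contained sketch of how IoU between two darknet-style boxes - center x, y plus w, h - can be computed. It mirrors the idea of box_iou() in src/box.c, not its exact code. Recall, by contrast, would be TP / (TP + FN) over the whole validation set.)

```c
#include <stdio.h>

/* Darknet-style box: (x, y) is the center, (w, h) the width/height. */
typedef struct { float x, y, w, h; } box;

/* 1-D overlap of two intervals given by center and length. */
static float overlap(float x1, float w1, float x2, float w2)
{
    float l1 = x1 - w1 / 2, l2 = x2 - w2 / 2;
    float left = l1 > l2 ? l1 : l2;
    float r1 = x1 + w1 / 2, r2 = x2 + w2 / 2;
    float right = r1 < r2 ? r1 : r2;
    return right - left;
}

/* IoU = intersection area / union area; 0 when the boxes do not overlap. */
static float box_iou_sketch(box a, box b)
{
    float w = overlap(a.x, a.w, b.x, b.w);
    float h = overlap(a.y, a.h, b.y, b.h);
    if (w <= 0 || h <= 0) return 0;
    float inter = w * h;
    float uni = a.w * a.h + b.w * b.h - inter;
    return inter / uni;
}

int main(void)
{
    box truth = { 0.50f, 0.5f, 0.4f, 0.4f };
    box pred  = { 0.55f, 0.5f, 0.4f, 0.4f };
    printf("IoU = %.3f\n", box_iou_sketch(truth, pred));
    return 0;
}
```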

Hi,
is it possible to change the training snapshot interval?
I mean, after passing 1000 iterations it only saves snapshots every 1000th iteration. I want to change this, for example to every 100th iteration, as it does from the 1st to the 1000th (see the sketch below).
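(For what it's worth, a self-contained illustration of the periodic-snapshot logic in question. In darknet the corresponding condition wraps the save_weights() call inside the training loop of train_detector() in src/detector.c; changing the modulus there changes the save interval. The interval and file name below are hypothetical.)

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical interval; stock darknet uses a larger one after the
       first 1000 iterations. */
    int save_every = 100;

    for (int i = 1; i <= 1000; ++i) {
        /* In darknet this check guards the save_weights() call inside the
           training loop of train_detector(). */
        if (i % save_every == 0) {
            printf("would write backup/yolo-obj_%d.weights\n", i);
        }
    }
    return 0;
}
```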

Hello Alex,

I could fix the problem on the old commit with this sequence:
compiling the repo with OpenCV 2.4.9 and CUDA 8.0.61 (without applying Patch 2) and using cuDNN 5.1; also, the display driver must be the one installed with the CUDA package, not installed separately from NVIDIA. I could reach a new record of IoU = 77.46%, Recall = 94.23%.

I could replicate these results on the old commit, but I have to restart the system before training! I don't know why, but if I don't do so, the same results are not achieved. I should mention that the same results do not necessarily come at the same iteration (for example 2000), but if we train longer, everything finally converges to almost the same number.

But I still cannot get these results with the latest commit. Do you think this is related to memory-release issues?

I must say I have already set threshold = 0.001 on the latest commit and tested the Recall function with the pre-trained weights (from the old commit), and I see exactly the same numbers, so there is no issue with that function.

I see you have modified some parts related to memory after the old commit. Could those modifications produce such effects? I mean, if memory is released at the wrong time or not released in time - something like that.

@VanitarNordic Maybe.

Just let me do my final test today; if it does not help, I'll ask you to have a look at those memory-related issues.

Hello,

I tested the latest commit again, but unfortunately it shows no progress. If you have any suggestion based on this evidence, just tell me and I'll test it, because the results are important to me.


_I should also mention that in the meantime I trained and tested SSD on this dataset, but because SSD ignores background-only images, I could not compare the two models.
The author of SSD has already mentioned where to change the code (which function) to include these background-only images in training, but because it is in C++, nobody has done it so far. If you like, you could make a new repo based on this idea; I think it would be a piece of cake for you._

Hi, thank you; when I have time, I'll look at the SSD source code.

Do you have any suggestion for the latest commit of YOLO?

No ideas.

You can make a test yourself if you have any doubt.

Just help me figure out which commits are related to the DLL-building part and which are related to the main code, because some memory leaks were related to the DLL section.

I have no doubt that some changes in the last commits affect the IoU.

  • But I do not see how I can easily fix it.
  • And because best-IoU is not the best indicator, we should use the more useful indicator, mAP, on a more diverse dataset (Pascal VOC 2007 + Pascal VOC 2012 + MS COCO) to say whether there really is a significant change in accuracy. But even knowing this, I still did not see anything in the commits that could influence accuracy.

The commits related to the DLL are at these links (in the left part):

The DLL code is used only when yolo_cpp_dll.sln is compiled. If you compile darknet.sln, nothing related to the DLL source code is used.

Thank you very much.

May I ask how you noticed the memory leaks in the main code?

Besides, I see several modifications inside image.c, network.c and layer.c.

And one more thing: is there any C function that might work well on Linux but not behave the same on Windows? I mean something like the rand() issue - that one was right in front of our eyes and we caught it, but maybe there are other functions that show this kind of behavior.

  • There were fopen(filename, "w"); and fopen(filename, "r"); calls, which were changed to fopen(filename, "wb"); and fopen(filename, "rb"); to work on Windows (illustrated below).
  • Also, there are still some rand() and srand(time(0)) calls in darknet.
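(A small self-contained illustration of why the "rb"/"wb" change matters on Windows: in text mode the CRT translates '\n' to "\r\n" and treats byte 0x1A as end-of-file, which corrupts binary payloads such as .weights files. The file name below is made up for the demo.)

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Bytes that include 0x0A ('\n') and 0x1A; in text mode on Windows these
       get translated or truncated, so the round trip would not match. */
    const unsigned char payload[] = { 0x00, 0x0A, 0x1A, 0xFF };

    FILE *fp = fopen("demo.bin", "wb");   /* "wb": bytes are written verbatim */
    if (!fp) return 1;
    fwrite(payload, 1, sizeof(payload), fp);
    fclose(fp);

    unsigned char back[sizeof(payload)] = { 0 };
    fp = fopen("demo.bin", "rb");         /* "rb": 0x1A is not treated as EOF */
    if (!fp) return 1;
    size_t n = fread(back, 1, sizeof(back), fp);
    fclose(fp);

    printf("read %zu bytes, match = %d\n", n,
           memcmp(payload, back, sizeof(payload)) == 0);
    return 0;
}
```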

Thank you.

I have played with those rand() functions before, but it did not help. I changed them all to random_gen() except in the function float rand_uniform(float min, float max), where it was not possible to change rand() to random_gen(). So as a result there is no remaining conflict between Windows and Linux.

Now I'm looking through the warnings; there are more than 1600 of them, which might give a clue.

Alright, here are the interesting warnings:

Function: dim3 cuda_gridsize(size_t n); inside cuda.h
Warning:
(screenshot: 2017-08-28_0-59-46)

This code: l.update = update_deconvolutional_layer; inside deconvolutional_layer.c
Warning:
(screenshot: 2017-08-28_1-02-50)

Alright, I was able to solve the case. At least now it behaves like the Linux repo, with 2% more Recall. Again, I should congratulate you on making the complex original Darknet even better.

First, I did the same steps as for the old commit.
Then I replaced all rand() calls with random_gen(), except in float rand_uniform(float min, float max). (I had done this before for the old commit as well; in general it helps the model converge faster and yields better results.)
I also copied opencv_imgproc249.dll next to darknet.exe.

I also suppressed some warnings. The majority of them are just type-conversion warnings and easy to suppress. That may have some effect too, but I'm not quite sure; anyway, suppressing them has no side effects and is good to do.

Regarding SSD, according to the author this is the part of the code that must be modified to include background-only images in training; otherwise it just ignores them.

https://github.com/weiliu89/caffe/blob/ssd/src/caffe/util/bbox_util.cpp#L850

This issue talks about it: https://github.com/weiliu89/caffe/issues/146

When you have time, just publish your repository. I don't think it would take you long, because as far as I can see you write complex C/C++ code as easily as breathing.

I have one more comment regarding this minor issue:

(screenshot: 29259939-a7ef83fe-80db-11e7-92d9-1f4360319be5)

I think it is reproduced randomly when random_gen()%m generates 0, since the range of that expression is 0 to m-1. Here count = 0 has no meaning; it should be at least 1 (a quick check is sketched below).

Do you agree that it should be modified like this, so that it is in the range from 1 to m?
(random_gen()%m) + 1

I think that, apart from this, we don't need zero from the random generation anywhere in the code. This issue also exists in the original Darknet repo, which uses the same approach with rand().
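(A tiny self-contained check of the ranges discussed above, using the standard rand() as a stand-in for random_gen():)

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    srand(42);
    unsigned int m = 4;
    unsigned int min0 = ~0u, max0 = 0, min1 = ~0u, max1 = 0;

    for (int i = 0; i < 100000; ++i) {
        unsigned int a = (unsigned int)rand() % m;        /* 0 .. m-1 */
        unsigned int b = ((unsigned int)rand() % m) + 1;  /* 1 .. m   */
        if (a < min0) min0 = a;
        if (a > max0) max0 = a;
        if (b < min1) min1 = b;
        if (b > max1) max1 = b;
    }
    printf("x %% m      : observed range [%u, %u]\n", min0, max0);  /* expect [0, 3] */
    printf("(x %% m) + 1: observed range [%u, %u]\n", min1, max1);  /* expect [1, 4] */
    return 0;
}
```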

I think the cubin files should also be updated for the newer GPU architectures and CUDA. NVIDIA told me something like this:

From CUDA 8.0 Toolkit, nvcc can generate cubin files native to the Pascal architectures (compute capability 6.0 and 6.1). When using CUDA Toolkit 8.0, to ensure that nvcc will generate cubin files for all recent GPU architectures as well as a PTX version for forward compatibility with future GPU architectures, specify the appropriate -gencode= parameters on the nvcc command line as shown in the examples below.

nvcc.exe -ccbin "C:\vs2010\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT" -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61 --compile -o "Release\mykernel.cu.obj" "mykernel.cu"

Maybe the source files are not compatible with the latest version of CUDA and Pascal GPUs.

Hello,

One question: why don't you use random_gen() everywhere instead of rand()? Is there a reason behind it?

Hi,
To make a replacement everywhere, you need to test all the functionality of darknet (classifier, rnn, go, cifar, ...) - I have not done it yet.

Okay, so you mean it is not as easy as just replacing the functions; in other words, these two are not totally identical, yes?

At least rand_s, used by random_gen, is much slower than rand. So when I replaced many rand calls with random_gen, darknet took much longer to initialize the detector.
The maximum possible value also differs.
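(For context, a hedged sketch of how such a wrapper can be written portably - not necessarily the exact random_gen() in the repo. It also shows the two differences mentioned above: MSVC's rand() tops out at RAND_MAX = 32767, while rand_s() fills a full unsigned int, and rand_s() is much slower because it draws from the OS cryptographic generator.)

```c
/* Portable sketch of a random_gen()-style wrapper (may differ from the repo's). */
#ifdef _WIN32
#define _CRT_RAND_S          /* must be defined before <stdlib.h> to expose rand_s() */
#endif
#include <stdlib.h>
#include <stdio.h>

static unsigned int my_random_gen(void)
{
    unsigned int rnd = 0;
#ifdef _WIN32
    rand_s(&rnd);                 /* 0 .. UINT_MAX, cryptographic, relatively slow */
#else
    rnd = (unsigned int)rand();   /* 0 .. RAND_MAX (at least 32767), fast */
#endif
    return rnd;
}

int main(void)
{
    printf("sample: %u\n", my_random_gen());
    return 0;
}
```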

Yes, you are right. Good to know that. Thank you.

Are the instances of random_gen() currently used in the commit okay with you?

Yes.
