Darknet: Training YOLOv3 with own dataset

Created on 29 Mar 2018  ·  218 Comments  ·  Source: pjreddie/darknet

Hi everyone,
Has anyone had success with training YOLOv3 for their own datasets? If so, could you help sort out some questions for me:

For me, I have a 5-class object detection problem. In the .cfg file, I have changed the number of classes and set the number of filters to 3*(num_classes+5) = 30 in 3 different places. I can initiate the training, but the loss blows up at the start and I am seeing a bunch of nans in the output message (see snippet):
[screenshot: training log with nan values]

Here are my questions:

  1. Did you need to change the anchor box sizes and/or the number of anchors?
  2. Did you need to create the labels differently than for YOLO v2?

Thanks!


All 218 comments

No, you don't need to change your training set.
You need to calculate your anchors as you did for YOLOv2, but multiply by 32 (and round).
Then split the anchors among the [yolo] layers. If you have 9 anchors you can split them 3 ways, but decide based on size. Each anchor contributes 5 + number of classes filters.
I got OK results with the default anchors, but you could recompute. Remember that your anchor calculation should be at the same scale as the network's input size; the default split is illustrated below.
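For illustration, the stock yolov3.cfg does exactly this split: its 9 anchors (in pixels, for a 416x416 input) are assigned to the three [yolo] layers through each layer's mask field, with the largest anchors going to the coarsest 13x13 layer:

```
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
# mask = 6,7,8 -> first  [yolo] layer (13x13 grid, largest anchors)
# mask = 3,4,5 -> second [yolo] layer (26x26 grid)
# mask = 0,1,2 -> third  [yolo] layer (52x52 grid, smallest anchors)
```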

@sonalambwani Just wait about 1000 iterations, and nan will disappear: https://github.com/AlexeyAB/darknet/issues/504#issuecomment-377290060

  1. You can re-calculate anchors, but it is not necessary. You can calculate anchors for Yolo v3 using this fork: https://github.com/AlexeyAB/darknet
    and this command, if in your cfg-file width=416 and height=416:
    darknet.exe detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416

You can use these anchors in your cfg-file (without multiplying by 32)

  2. You can use the same labels as for Yolo v2 (format shown below)
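For reference, that label format (shared by YOLOv2 and YOLOv3) is one .txt file per image, one object per line, with all values except class_id normalized to [0,1] relative to the image width/height; the numbers below are just an illustration:

```
<class_id> <x_center> <y_center> <width> <height>
0 0.716797 0.395833 0.216406 0.147222
```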

@AlexeyAB hello, but after waiting about 1000 iterations, nan still appears:

Hi,
I am trying to train YOLO on VOC.

Below is the command I am using: ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

But the nans keep appearing. Is this normal, or is there some issue?
Loaded: 0.000063 seconds
Region 82 Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
3296: -nan, nan avg, 0.001000 rate, 0.416401 seconds, 3296 images

I have the same issue. The error is shown in the second yolo layer. Did you solve this problem?

Same here.

@ss199302
If only some lines show nan then training is going well, but if all lines are nan then training has gone wrong.

@AlexeyAB
As you suggested, I am now training my new dataset with the default COCO anchor boxes. I am training from scratch, i.e., with no initialization from the pretrained convolutional weights, as you did in https://github.com/AlexeyAB/darknet/issues/504#issuecomment-377290060

For me, I see nans even after 2500 iterations. The loss (after starting off really high) has dropped into a reasonable range, but there is a lot of fluctuation in the loss between mini-batches.
[screenshot: loss plot]

Have you, or anyone else here, noticed similar behavior?

For me, I see nans even after 2500 iterations.

  • Do all lines have nan values, or only some lines?

  • How many classes and images in your dataset? And what tool did you use for labeling?

  • What batch and subdivision do you use?

  • Do you use random=1?

  • Do you train using multi-GPU?

It's just a few lines with nans.
[screenshot: training log]

Used an in-house tool for labeling.

batch=16, subdivisions = 16

Not sure about random=1. Where do I check/set that?

It's a single GPU.

@AlexeyAB
"How many classes and images in your dataset? And what tool did you use for labeling?"

5 classes, ~17k images in the training set.

@sonalambwani Looks like normal output of training.

You have batch and subdivisions of 16. That means one image per forward pass, and depending on the density of objects in your images, it's possible that no object will be matched in a given layer, which leads to nan. It also depends on whether the ground truths are similar to the anchors: if they are all very small or all very large, you may not detect them at the very-large or very-small scales.

So I agree with AlexeyAB that this looks normal. Can you reduce the subdivisions so there are more images per mini-batch? (See the sketch below.)
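Concretely, the cfg semantics are: the network accumulates gradients over batch images per iteration, processing them in batch/subdivisions-sized chunks, for example:

```
batch=64         # images per weight update (one iteration)
subdivisions=16  # 16 chunks -> 64/16 = 4 images per forward pass
```

With batch=16, subdivisions=16, that chunk size is a single image, which makes it much more likely that some [yolo] layer matches no object in a pass and prints nan.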

I have the same issue:
[screenshot: training log]
batch=64, subdivisions = 8
I have followed this instruction.
I really didn't understand whether I should change the anchors in yolo-obj.cfg when I have my own dataset.

@ndg123
Thank you for your suggestions. I am now testing with batch=64 and subdivisions=16. Right off the bat, I see fewer nans. There are a few, but it's looking better.

From my training on a custom dataset: if not all of them are nans, it is fine. Since there are 3 different scales, a nan means no object was matched at some scale. You could try a different input image size, or divide into 2 or 4 scales instead of 3; then the number of nans should change.

@AlexeyAB thanks for your reply, but I can't test anything.

@AlexeyAB regarding darknet.exe detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416: how do I write this command on Ubuntu?

@ss199302 ./darknet detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416

I am trying to run calc_anchors on Linux using what @TheMikeyR says, and it returns to the command line immediately and gives no output. Is it supposed to print the anchors to stdout? I'm new to C. Where can I find the code this command runs?

Also, I'm training on my own data, and the bounding boxes in my training data are all the exact same size, and they are all squares. Do I still need to specify more than one anchor?

Is it possible to detect signatures (or any handwritten area) in printed receipts using YOLO? Which cfg file would be best for this, and do you have any suggestions before I start?

@TheMikeyR Thanks. I was using the pjreddie fork.

@AlexeyAB hello! I use the command ./darknet detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416 to get anchors, but it doesn't return anything.

@ss199302 same for me. Have you found a solution?

@ss199302 @spenceryue97 did you create the label (*.txt) files first?

@ss199302 @spenceryue97 and you're definitely using AlexeyAB's fork?

I never got it working. I didn't want to switch to AlexeyAB's fork because we've modified our fork of pjreddie's repo. I tried copy/pasting the code that does the clustering from AlexeyAB's detector.c into the one I have and re-making, but it still gave no output.

@sonalambwani Yes

@brieh I'm using pjreddie's repo

@spenceryue97 @brieh you can just get AlexeyAB's fork, run calc_anchors, and then take the numbers to your cfg in pjreddie's repo (a minimal sequence is sketched below).
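On Linux, that sequence might look like this (assuming the default Makefile settings build on your machine):

```
git clone https://github.com/AlexeyAB/darknet
cd darknet && make
./darknet detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -height 416
```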

@AlexeyAB Can you tell me why, when I run recall, my IOU appears as nan, and recall and precision are 0.5%? Thanks!

@UgolUgol Have you compared the results between the default anchors and the anchors calculated with the command from https://github.com/AlexeyAB/darknet?

@AlexeyAB I am training my own objects, and am weirdly getting values for all Region 106 results and -nan for everything else:

```
Region 106 Avg IOU: 0.103513, Class: 0.206469, Obj: 0.001993, No Obj: 0.000744, .5R: 0.000000, .75R: 0.000000, count: 4
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001404, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000535, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.183575, Class: 0.167765, Obj: 0.002698, No Obj: 0.000716, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001364, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000517, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.112761, Class: 0.219895, Obj: 0.001320, No Obj: 0.000692, .5R: 0.000000, .75R: 0.000000, count: 4
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001196, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000538, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.518243, Class: 0.616705, Obj: 0.000801, No Obj: 0.000739, .5R: 1.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001336, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000534, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.067241, Class: 0.113757, Obj: 0.002734, No Obj: 0.000756, .5R: 0.000000, .75R: 0.000000, count: 7
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001470, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000540, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.064037, Class: 0.159617, Obj: 0.005763, No Obj: 0.000764, .5R: 0.000000, .75R: 0.000000, count: 5
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001454, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000550, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.092829, Class: 0.161946, Obj: 0.004937, No Obj: 0.000723, .5R: 0.000000, .75R: 0.000000, count: 8

813: 6.122332, 6.256896 avg, 0.000437 rate, 8.432507 seconds, 26016 images
```

It's the consistency that worries me. I've got 16 classes with around 4500 images. The one particularly odd thing about my setup is that I've set the height and width of every labeled object to 0.01 (e.g. 2 0.808552 0.933797 0.01 0.01), as I only care about the position, not the bounds of the object. Hopefully that's not messing things up?

@jfries289

With every object's width and height set to 0.01, you will always get nan for Regions 82 and 94, but it isn't a problem. Training goes well.


But for slightly better accuracy, even if you only need the position, it's better to set the real width and height of the objects, so Yolo will know which of the 3 [yolo] layers (the one with the higher receptive field, or the one with higher resolution and less subsampling) should be used to detect each object.

@AlexeyAB Thanks for the reply. It's clear I need to improve my understanding of the regions, etc.
However, I'm still not sure my training is going successfully:

2612: 3.408263, 1.774134 avg, 0.001000 rate, 12.983424 seconds, 83584 images

My avg seems to be oscillating between 1.2 and 1.7. At this stage, I would have expected my avg to be lower. Is this the system temporarily stuck in a local minimum, or has something possibly gone wrong?

@jfries289

My avg seems to be oscillating between 1.2 and 1.7. At this stage, I would have expected my avg to be lower. Is this the system temporarily stuck in a local minimum, or has something possibly gone wrong?

I think this is because Yolo can't select the optimal [yolo]-layer (1 of 3), so the last [yolo]-layer predicts objects with a big error, which increases the loss; also, the difference between the size Yolo predicts during training and the size you set is very large. Something else may be wrong as well. I think you will be able to detect objects, but with low accuracy.

I recommend you set real sizes for the objects using Yolo_mark, then recalculate anchors, and then start training from the beginning.

In Yolo v3, labels with the correct object sizes help choose the optimal [yolo]-layer, i.e., help train with higher accuracy.

@AlexeyAB

In Yolo v3, labels with the correct object sizes help choose the optimal [yolo]-layer, i.e., help train with higher accuracy.

If full-sized labels are not an option, would it be better for me to use Yolo v2? Or would I have the same issue there?

@jfries289

What is the range of the real sizes of objects in your dataset?

@AlexeyAB
I would guess anywhere from 0.1 to 0.8.

@jfries289

In Yolo v3, labels with the correct object sizes help choose the optimal [yolo]-layer, i.e., help train with higher accuracy.

If full-sized labels are not an option, would it be better for me to use Yolo v2? Or would I have the same issue there?

  • If the real sizes of your objects are big, then it will probably be better to use Yolo v2.
  • If the real sizes of your objects can be both big and small, then you should re-label your objects and train with Yolo v3.

But I have never tested training on a dataset like yours, with constant width and height values.

I have a dataset of 21k face images. I already checked the labelled data using Yolo_mark. I am using yolov3 with batch 64 and subdivisions 16.
I am getting nan everywhere; shall I wait for 1000 iterations?
here is the output:
73: -nan, -nan avg loss, 0.000000 rate, 649.007621 seconds, 4672 images
Loaded: 0.000000 seconds
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R:0.000000, count: 3
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 3
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 6
Region 106 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 82 Avg IOU: -nan, Class: nan, Obj: -nan, No Obj: -nan, .5R: 0.000000, .75R: 0.000000, count: 5
Region 94 Avg IOU: -nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: -nan, .5R: -nan(ind), .75R: -nan(ind), count: 0

@AlexeyAB As you mentioned earlier ("only if nan occurs for avg loss for several dozen consecutive iterations has training gone wrong; otherwise the training goes well"), can you please suggest how to correct the training process if we are getting all nans? My training loss keeps increasing, and after some steps all values become -nan.

I encountered this phenomenon on my 3-class dataset, but after training it performs well.
I think it comes from the scale mismatch between the different output layers.

When no object is matched in a given layer, it gives nan. IOU is simply area of intersection / area of union.
I think that looks normal.

@AlexeyAB how many images do you think I should collect if I want to add a new class to the COCO dataset?

Hello @MizbaMohammed ,

Are you already aware of the default recommendation? I guess this is not specific to COCO, but holds in general:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

It is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, on different backgrounds; you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more.

I got confused when setting up the anchor boxes: how should I arrange the sequence of the clustered anchors? They're not distributed as neatly as we'd like: I might get [1, 1, 2, 2, 5, 6, 30, 32, 42] instead of [1,2,3,4,5,6,7,8,9], and I hesitated to just split them evenly across the 3 scales in yolov3. My own experiments have shown that the arrangement of the anchor boxes does matter.
The output lines Region 82, Region 94, and Region 106 are another source of confusion: what do they mean?

Region 82 Avg IOU: 0.790874, Class: 0.993619, Obj: 0.970194, No Obj: 0.002241, .5R: 1.000000, .75R: 0.666667,  count: 3
Region 94 Avg IOU: 0.665403, Class: 0.775035, Obj: 0.567849, No Obj: 0.000524, .5R: 0.800000, .75R: 0.200000,  count: 5
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000002, .5R: -nan, .75R: -nan,  count: 0

How does it know how many objects the batch has at each layer? And if it decides that each object belongs to one layer,
then wouldn't the distribution of anchor boxes be a big problem?
Could anyone help with this? Thanks a lot.

I have successfully trained a 1-object-detection YOLOv2 model, but I still don't understand what role the anchors in the cfg file play. I changed them, but didn't see any effect.
1) What is the meaning of the anchors?
2) @AlexeyAB: Alexey, what do you mean by _"real sizes of objects"_?
3) Does anybody have a program for choosing the single best weights file from a trained set of weights, based on annotated test images?
4) What is the advantage of YOLOv3 over YOLOv2?

Hi @AlexeyAB, yolov3-spp sounds good; is there any tutorial on how to train it? Thank you.

When I try to calculate the anchors, I get the error "k-means++ can't be used without OpenCV, because the cvKMeans2 implementation is used". How do I resolve this?

Hi @AlexeyAB, I have some questions. One is how to set the number of training iterations in the cfg file. Is it "steps"?

I also want the weights file to be autosaved every 100 iterations, since the weights file is saved only once every 10,000 iterations when iterations > 1000. Is the relevant parameter "burn_in"?

@wentianl20 You should change file detector.c, in the function train_detector(...):
the block with save_weights(net, buff) should be under the condition if(i%100==0). A sketch of that edit follows.
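A minimal sketch of that edit, following the upstream train_detector() naming (the exact surrounding code differs between forks):

```c
/* in src/detector.c, inside the training loop of train_detector():
   save a checkpoint every 100 iterations instead of every 10000 */
if (i % 100 == 0) {
    char buff[256];
    sprintf(buff, "%s/%s_%d.weights", backup_directory, base, i);
    save_weights(net, buff);
}
```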

@wentianl20 steps is for the learning-rate policy, not for the checkpoint frequency (the total number of training iterations is set by max_batches).

@AlexeyAB Sir, when I install AlexeyAB/darknet and run make, darknet.exe does not appear in the directory build/darknet/x64. Please guide me through the procedure. Thanks.

@arsal181

  • on Windows, if you compile using MSVS, build/darknet/x64/darknet.exe will appear

  • on Linux, if you compile using make, darknet will appear in the current directory (next to the Makefile)

Can I use this darknet binary from the build/darknet/x64 directory?


@arsal181
Just copy ./darknet to ./build/darknet/x64/darknet

Sir, when I run it, a "permission denied" error shows.


I ran this command ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74

Are these nans normal after this many iterations? I saw the other posts about nan.

Loaded: 0.000025 seconds
Region 82 Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
14799: -nan, -nan avg, 0.001000 rate, 0.086326 seconds, 14799 images
Loaded: 0.000039 seconds
Region 82 Avg IOU: nan, Class: nan, Obj: nan, No Obj: nan, .5R: 0.000000, .75R: 0.000000, count: 1
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: nan, .5R: -nan, .75R: -nan, count: 0
14800: -nan, -nan avg, 0.001000 rate, 0.085953 seconds, 14800 images
Saving weights to darknet/backup/yolov3-voc.backup


That looks like bad data to me. nans pop up, but they shouldn't be your only result.

@AlexeyAB Sir, I have trained custom data of 1 class with 200 images for 2500 iterations. For this I used the yolov3.cfg file, in which I changed:
width and height = 416
subdivisions=8
batch=64
classes=1
filters=18
When I tried to test an image with the generated weights, I saw a blank prediction window and a "could not load image" error, as below.
What could be the reason?

[screenshot: "Cannot load image" error]

The error message says it can't open 1.JPEG. Is this file a proper JPEG? What happens if you test with the dog-and-bicycle picture from the sample data?

Your image path is not correct; also check the extension of the image, e.g. jpg or png.


I have placed my images (.JPEG) and .txt files in the obj folder (i.e. darknet->data->obj),
and all the images are in .JPEG format only. I have my yolov3.cfg, obj.data, and obj.names in the cfg folder (i.e. darknet->cfg) as well as in darknet.

The error message says it can't open 1.JPEG. Is this file a proper JPEG? What happens if you test with the dog-and-bicycle picture from the sample data?

For the dog-and-bicycle picture, I am able to get the image, but without labels.

Thank you. The error was in the image path. But now I see only the labels, not the bounding boxes, in the test image.

Do you know what causes the "nan" in the source code? I cannot find where "nan" is produced in yolo_layer.c.

In the function `float *parse_fields(char *line, int n)` in utils.c there are lines like `if(p == c) field[count] = nan("");`
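The nan values in the per-layer training lines, though, most likely come from the summary printf at the end of forward_yolo_layer() in yolo_layer.c: the running sums are divided by count, and when a [yolo] layer matched no ground truth in the mini-batch, count is 0 and 0.0f/0 prints as -nan. Abridged from the upstream source:

```c
/* end of forward_yolo_layer() in src/yolo_layer.c (abridged):
   avg_iou/avg_cat/avg_obj are sums over matched ground truths and
   count is how many truths this layer matched in the mini-batch;
   with count == 0 each division is 0.0f/0 -> NaN, printed as -nan */
printf("Region %d Avg IOU: %f, Class: %f, Obj: %f, No Obj: %f, .5R: %f, .75R: %f,  count: %d\n",
       net.index, avg_iou/count, avg_cat/class_count, avg_obj/count,
       avg_anyobj/(l.w*l.h*l.n*l.batch), recall/count, recall75/count, count);
```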

Hi!!

I have some questions:

  • I have my images (in jpg format) in the directory darknet/data/images and the labels (.txt) in the directory darknet/data/labels.
  • When I try to train with my own dataset, I get the following in the first iterations:
    Region 94 Avg IOU: 0.327257, Class: 0.436315, Obj: 0.025532, No Obj: 0.005994, .5R: 0.000000, .75R: 0.000000, count: 4
    Region 106 Avg IOU: 0.218748, Class: 0.485177, Obj: 0.011414, No Obj: 0.003186, .5R: 0.125000, .75R: 0.000000, count: 8
    264: 45.015869, 26.430340 avg, 0.000005 rate, 0.082373 seconds, 264 images
    Loaded: 0.000038 seconds

But after a while I get the following:
Region 94 Avg IOU: nan, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: nan, Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 6
1139: -nan, nan avg, 0.001000 rate, 0.094441 seconds, 1139 images
Loaded: 0.000038 seconds

I tried putting the label files in the same directory as the images (darknet/data/images), but I have the same problem.

What could be the problem?

Regards!

@jessiffmm
Did you calculate the anchor boxes and put them into the 3 places in the .cfg file?

Hi @andyrey

I haven't calculated the anchor boxes. How can I calculate them?

Regards!

@jessiffmm
If you have built darknet, launch a command like this:

darknet3.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416 -showpause
You'll then get an "anchors.txt" file with 9 pairs of numbers; replace the anchors (in the 3 places for yolov3) in your .cfg file.

Ok @andyrey
Thanks!!

I imagine that width and height depend on the image size.

@jessiffmm
All images are resized to the same size (416x416, or some other multiple of 32).
The anchors depend on the individual dataset.

Hi!
I tried to get the pre-trained weights 'yolov3-tiny.conv.15' using the command 'darknet.exe partial cfg/yolov3-tiny.cfg yolov3-tiny.weights yolov3-tiny.conv.15 15', but every time I get this error:
[screenshot: error message]
Can you help me, please?

@Kertrop Try to update your code from GitHub.


Thank you very much, it helped.

Sorry if it's inappropriate to ask here. I've been trying to train yolov3-tiny with my own dataset of 1 class and ~1000 positive samples. After 50 hours of training it shows:

Resizing
512
Loaded: 0.044990 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000001, .5R: -nan, .75R: -nan, count: 0
Region 23 Avg IOU: 0.737925, Class: 0.998520, Obj: 0.991458, No Obj: 0.000621, .5R: 1.000000, .75R: 0.000000, count: 1
54261: 0.292673, 0.218497 avg, 0.001000 rate, 4.347198 seconds, 54261 images
Loaded: 0.000091 seconds

I'm running it on an i5 CPU, in Docker on a Windows host :) Is it possible to tell whether training will complete on that system in the near future (this week)? Or should I stop this misery and run it on a "normal" system?

@dubtar

54261: 0.292673, 0.218497 avg, 0.001000 rate, 4.347198 seconds, 54261 images

Do you train using the yolov3-tiny.weights pretrained weights-file? Otherwise, how did you train 54K iterations on a CPU? )

The loss is already quite low. You can try stopping training and checking whether it can detect anything; if it can't, then you should run it on a normal system with an nVidia GPU )

@AlexeyAB Thanks, man! I was following an article on Medium.com, so I used darknet53.conv.74.

@AlexeyAB
Hi, I have a question.
What is the reason for this?


    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 82 Avg IOU: 0.870595, Class: 0.994771, Obj: 0.700115, No Obj: 0.006700, .5R: 1.000000, .75R: 1.000000, count: 5
    Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 82 Avg IOU: 0.678149, Class: 0.996602, Obj: 0.088626, No Obj: 0.002252, .5R: 1.000000, .75R: 0.500000, count: 2
    Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 82 Avg IOU: 0.884214, Class: 0.999505, Obj: 0.904762, No Obj: 0.009010, .5R: 1.000000, .75R: 1.000000, count: 7
    Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 82 Avg IOU: 0.784557, Class: 0.996936, Obj: 0.690478, No Obj: 0.002225, .5R: 1.000000, .75R: 0.500000, count: 2
    Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 82 Avg IOU: 0.818402, Class: 0.993600, Obj: 0.891443, No Obj: 0.003130, .5R: 1.000000, .75R: 0.750000, count: 4
    Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
    Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0

@AlexeyAB
Hello, I'm training on food pictures crawled from Baidu image search, but I found that the IOU is not high, the Class and Obj values jump around a lot, and the loss stays at 0.3-0.5. What causes this? Is it that the picture backgrounds are too complicated, or that the number of images per class is not the same?

@yxwLearn Hi,

Hi, I have a question.
What is the reason for this?

Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0

This is normal.
Look only at avg loss: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

Note: If during training you see nan values for avg (loss) field - then training goes wrong, but if nan is in some other lines - then training goes well.

and look at the mAP if you train with the -map flag: https://github.com/AlexeyAB/darknet#when-should-i-stop-training


Hello, I'm training on food pictures crawled from Baidu image search, but I found that the IOU is not high, the Class and Obj values jump around a lot, and the loss stays at 0.3-0.5. What causes this?

  • Can you rename your cfg file to a txt file and attach it?
  • How many classes and images do you have?
  • Show the point cloud that you get by using the command
    ./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show
    using this repo: https://github.com/AlexeyAB/darknet

Is it that the picture backgrounds are too complicated, or that the number of images per class is not the same?

What do you mean?

@AlexeyAB

Sorry, something went wrong when I uploaded my cfg file, so I paste it below.
I have 11 classes and 5853 images, but the number of pictures per class differs, some more, some less.
I run this training on Windows.
I mean that my picture quality is not very good; is this the cause of the loss staying at 0.3 to 0.5?

```
[net]
# Testing

batch=1

subdivisions=1

# Training

batch=64
subdivisions=32
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

#

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=48
activation=linear

[yolo]
mask = 6,7,8
anchors = 2.67,3.02, 3.26,4.61, 4.44,5.91, 4.76,8.50, 5.27,3.82, 6.97,7.43, 8.24,10.17, 8.97,5.41, 10.51,8.19, 11.25,11.33
classes=11
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 61

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=48
activation=linear

[yolo]
mask = 3,4,5
anchors = 2.67,3.02, 3.26,4.61, 4.44,5.91, 4.76,8.50, 5.27,3.82, 6.97,7.43, 8.24,10.17, 8.97,5.41, 10.51,8.19, 11.25,11.33
classes=11
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 36

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=48
activation=linear

[yolo]
mask = 0,1,2
anchors = 2.67,3.02, 3.26,4.61, 4.44,5.91, 4.76,8.50, 5.27,3.82, 6.97,7.43, 8.24,10.17, 8.97,5.41, 10.51,8.19, 11.25,11.33
classes=11
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

```

@AlexeyAB
Oh, I trained for 14300 iterations, and the console numbers look quite good, but when testing one of the classes the results are not very good. Sorry, I can't upload a picture, so I paste the log.

Region 82 Avg IOU: 0.872828, Class: 0.998161, Obj: 0.791894, No Obj: 0.002921, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.793445, Class: 0.999805, Obj: 0.914844, No Obj: 0.002492, .5R: 1.000000, .75R: 0.500000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.878821, Class: 0.999332, Obj: 0.741771, No Obj: 0.009482, .5R: 1.000000, .75R: 1.000000, count: 9
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.864686, Class: 0.998293, Obj: 0.986454, No Obj: 0.002985, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.712289, Class: 0.998681, Obj: 0.890383, No Obj: 0.003450, .5R: 1.000000, .75R: 0.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.960917, Class: 0.999933, Obj: 0.758110, No Obj: 0.004128, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0

14548: 0.230654, 0.198709 avg loss, 0.001000 rate, 7.220494 seconds, 931072 images
Loaded: 0.000000 seconds
Region 82 Avg IOU: 0.904501, Class: 0.999886, Obj: 0.683604, No Obj: 0.003879, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.836755, Class: 0.999735, Obj: 0.885447, No Obj: 0.002573, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.915653, Class: 0.998804, Obj: 0.946341, No Obj: 0.003264, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.883266, Class: 0.999722, Obj: 0.985086, No Obj: 0.006194, .5R: 1.000000, .75R: 0.750000, count: 4
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000001, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 82 Avg IOU: 0.951540, Class: 0.998787, Obj: 0.858221, No Obj: 0.003164, .5R: 1.000000, .75R: 1.000000, count: 2
Region 94 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0
Region 106 Avg IOU: -nan(ind), Class: -nan(ind), Obj: -nan(ind), No Obj: 0.000000, .5R: -nan(ind), .75R: -nan(ind), count: 0

@yxwLearn

Your objects are very small. So train with width=832 height=832 in cfg-file.

@AlexeyAB
Thank you! I will try it. Also, I calculated anchors with gen_anchors.py; is this necessary?

@yxwLearn No. Use default anchors.

@AlexeyAB
thank you very much!

@AlexeyAB
What's wrong?
1>nvcc fatal : Unsupported gpu architecture 'compute_75'

@yxwLearn

  • Either install CUDA 10.0
  • or change 75 to 60
    open darknet.sln -> (right click on project) -> properties -> CUDA C/C++ -> Device and change here ;compute_75,sm_75 to ;compute_60,sm_60
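On Linux the analogous change is the ARCH variable in the Makefile; a sketch (the exact ARCH lines vary between Makefile versions):

```
# Makefile: replace the compute_75 entry with your GPU's compute capability
ARCH= -gencode arch=compute_60,code=sm_60
```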

@AlexeyAB
Sorry, I was too careless.

@AlexeyAB
My computer's video card is a 1070,
but when I change the cfg-file to width=832 height=832 and also set subdivisions=64, it always runs out of memory. Should I set random=0?

@jfries289
Sorry?
When it always runs out of memory, shouldn't I increase the value of subdivisions?

@AlexeyAB

I train with the cfg-file at width=832 height=832, subdivisions=64, random=1, but the training is very slow.
My computer's video card is a 1070.

The greater the subdivisions, the slower the training. Fewer subdivisions = faster training, but you may exceed memory. Try to find a trade-off.

@yxwLearn

Use width=416 height=416 random=1 batch=64 subdivisions=64
or
width=832 height=832 random=0 batch=64 subdivisions=64

Can you help me, please
'darknet.exe' is not recognized as an internal or external command, operable program or batch file.

Hi @sarratouil

Are you using Ubuntu or Windows?

If you are using Ubuntu, you have to execute it with ./darknet

@AlexeyAB
Should I modify the parameters in the Makefile?

@AlexeyAB
Hi, excuse me:
the mAP% during training is very high, but Obj has a low value.
Why?

@AlexeyAB

Also: classes is 1, subdivisions is 32, random=1, the loss value is 0.4-0.5, and the iteration count is 1300 steps.

@AlexeyAB
Hi, I have been training 1 class with yolov3-tiny, initially with 50-60 images; it is a small object. Even after 20,000 iterations it is still only giving me an average loss of 0.5-0.7, and it has stopped decreasing. Is there a way to get the loss to drop further? I need about 0.06 average loss. I am training on a Mac with no GPU, because that's the best device I could find.

  1. Would height and width matter? What do you suggest?
    My sample label text data:
    0 0.246094 0.644444 0.088021 0.020370 and 0 0.819792 0.748611 0.241667 0.063889
    My image dimensions are 1920x1080 at 72 dpi.

  2. Do the anchors need to be changed?

  3. Is the learning rate of 0.001 okay?

Thank you so much for the help.

My tiny-yolov3 cfg:
[net]

# Testing

batch=1
subdivisions=1

# Training

batch=64
subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

@elainewu1

Try to train with width=832 height=832 batch=64 subdivisions=8 and with default anchors


Even after 20,000 iterations it is still only giving me an average loss of 0.5-0.7, and it has stopped decreasing. Is there a way to get the loss to drop further? I need about 0.06 average loss. I am training on a Mac with no GPU, because that's the best device I could find.

It can be a normal avg loss. What mAP can you get? See: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

When you see that the average loss 0.xxxxxx avg no longer decreases over many iterations, you should stop training. The final average loss can be from 0.05 (for a small model and an easy dataset) to 3.0 (for a big model and a difficult dataset).

@jessiffmm I use Windows. Thank you, it works, but I have another problem:
[screenshot: error]

@AlexeyAB
Hello AlexeyAB, I tried the suggestion of changing the height and width, and the detection time improved by only a second after 20,000 iterations; it still needs 9 seconds to detect. What else could I do? The training set is about 60 images for 1 class, with 60% of the images used for test and 40% for train.

[screenshot: detection output]

While training (with GPU & CUDA) I am getting the following error.
Can anyone please help me?

CUDA status Error: file: ..\..\src\cuda.c : cuda_set_device() : line: 22 : build time: Mar 10 2019 - 15:38:38
CUDA Error: unknown error

Windows 10
CUDA 10.0
cuDNN 7.4

@AlexeyAB do you have any idea which YOLO weights correspond to the modified GoogLeNet architecture, i.e. the fast version of YOLO based on a neural network with fewer convolutional layers (9 instead of 24) and fewer filters in those layers?

@AlexeyAB I trained yolov3 on my own object class ('iris') with 2638 images: 1500 for training and the rest for testing.
But I have a problem with the result:
[screenshot: result]

@sarratouil There is no problem.

@AlexeyAB precision=1, is that correct?

My config file:

[net]

# Testing

batch=64

subdivisions=64

# Training

batch=64
subdivisions=64
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 1000
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

@sarratouil how did you run darknet on Windows?
@AlexeyAB should I use different anchors for my own dataset, or keep the default anchors?

[screenshot: training log]
Is my training going well? I have too many NaNs.

@cyrineee you can run it by opening darknet.sln from the darknet folder in Visual Studio.
This link will help you: https://github.com/AlexeyAB/darknet

@AlexeyAB Sir, I trained yolov3 to detect Thai license plates along with their digits. I can see the detected output in the terminal when I give a test image, as shown in the image.

[screenshot: detection output in terminal]

Now, I want to print the detected characters in the terminal as
license plate: 11390
To get this, I need help on where to change the code and what to add to extract the detected digits and print them.

Thanks in advance
Manasa.

Dear friends, let us keep the questions here specific to the topic, YOLOv3. Please do not ask questions about general C/C++ programming, like console output or how to use a printer; just take one of the many books on programming or use the extensive Internet resources.

@ManasaNadimpalli Hi,

If you use this repository https://github.com/AlexeyAB/darknet and if you use detection on an image (not on video), then as you can see the detections are already sorted from left to right: https://github.com/AlexeyAB/darknet/blob/9815012a01c13f70806381e0c5d556599117c76b/src/image.c#L333

So in the console output the detections are already sorted from left to right.


If you want to use Darknet as an SO/DLL library from your own C/C++/Python application, then you should sort the detections yourself in your code; a sketch follows.
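A minimal sketch of such a sort, assuming you work with the detection array the library returns (the detection and box types come from darknet.h; the comparator name is my own):

```c
#include <stdlib.h>
#include "darknet.h"

/* compare two detections by the x coordinate of their box centers,
   so qsort orders them left to right */
static int det_cmp_x(const void *a, const void *b) {
    const detection *da = (const detection *)a;
    const detection *db = (const detection *)b;
    if (da->bbox.x < db->bbox.x) return -1;
    if (da->bbox.x > db->bbox.x) return 1;
    return 0;
}

/* usage, with dets/nboxes obtained from get_network_boxes():
   qsort(dets, nboxes, sizeof(detection), det_cmp_x); */
```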


Thank you, Sir. I am able to do it on an image. Now I need to do it on video. Is there a sort function for video to get the detections from left to right? If not, is it possible to add one for video detection?
Regards,
Manasa.

::Resolved this error::
Solution: needed to install the nvidia drivers!

When I try to train on custom images in Windows, I get the following error; can anyone please help?

./darknet detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23 CUDA status Error: file: ..\..\src\cuda.c : cuda_set_device() : line: 22 : build time: Mar 10 2019 - 21:51:45 CUDA Error: unknown error

@AlexeyAB and others who know:
I got 2 strongly overlapping objects of different classes (one object's bounding box (BB) is inside the other object's BB); this should not happen in my case. Changing nms doesn't change the result. I understand this as:
nms influences only objects of the same class, not objects of different classes. Am I right or wrong?

@andyrey

Changing nms doesn't change the result. I understand this as:
nms influences only objects of the same class, not objects of different classes.

Yes.

Hello @AlexeyAB, can I print the detected output on one side of the video frame while running on video?
Thanks in advance

@ndg123

Hi! I have a question: what if we have a different input size for the network?
What values should we take for height, width, and the anchor boxes?

@sonalambwani Just wait about 1000 iterations, and nan will disappear: AlexeyAB#504 (comment)

1. You can re-calculate anchors, but it is not necessary. You can calculate anchors for Yolo v3 using this fork: https://github.com/AlexeyAB/darknet
   and this command if in your cfg-file `width=416` and `height=416`:
   `darknet.exe detector calc_anchors data/voc.data -num_of_clusters 9 -width 416 -heigh 416`

This anchors you can use in your cfg-file (without multiplication by 32)

1. You can use the same labels as for Yolo v2

voc.data is not in data/

@AlexeyAB
do you have any idea how I can convert a .h5 model to a darknet .weights model?

@AlexeyAB I trained yolov3 on my own object, 'iris', with 2638 images: 1500 for training and the rest for testing.
But I have a problem in the result
enjoy

Hi,
I'm sorry, I'm very new to YOLO. Can I know how to calculate mAP performance and get this kind of output using the trained weights file?

Thank you

@sumitra20

Use this repository to calculate mAP: https://github.com/AlexeyAB/darknet

How to calculate mAP on Darknet: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

Read about mAP: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173

@AlexeyAB As I am new to YOLO, I do not know how the training works here. Can YOLO training be stopped in the middle and continued from the same step where it stopped, just like SSD and Faster R-CNN using the TensorFlow API?

@dinesh1218 yes, you can stop and re-launch the training process later, and YOLO takes the last saved weights for resuming training. I have done it many times.

@andyrey Thank you so much. will check it.


@AlexeyAB Thank you so much for the link, it's very helpful. I have just started training the yolov3 model using my own dataset, and my average loss from the beginning of the training until now (43rd training iteration) is 0.0000. Could that indicate any error?
Screenshot from 2019-06-20 11-45-35
Screenshot from 2019-06-20 11-48-52

@andyrey Can you please guide me on restarting YOLO training from saved weights? When I tried to use the command below, it starts from the beginning.

./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74

I have the following weights saved inside my backup folder: yolo-obj_1000.weights and yolo-obj_last.weights. Should I use these weights to restart my training, and if so, how?

Yes, instead of _darknet53.conv.74_ you should pass your last weights file.
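For example, assuming the backup path in obj.data is backup/ and the newest checkpoint is yolo-obj_last.weights, the resume command would look like:

./darknet detector train data/obj.data yolo-obj.cfg backup/yolo-obj_last.weights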


@AlexeyAB

When I use the repository you shared to calculate mAP, I find "TP + FP" is not equal to "detections_count". It confuses me a lot, and I wonder if something is wrong.

Looking forward to your reply, thanks!

I have the same issue; should I wait, or is something going wrong?
training_yolo

@AlexeyAB @pjreddie @CThuw is there any way I can access the loss and average loss values during training, so that training stops automatically once the loss falls below a certain range?

@hghimanshu Hi, I don't have a programmatic answer, but here is an idea:
Darknet saves weights as it goes, so you have .weights files available even though the learning isn't done.
You can check your log file in real time with tail -f log and, when you are satisfied with your loss and average loss, simply use the latest weights file obtained.
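If you do want to automate the stop, a rough sketch (assuming you launch training yourself, redirect its output to a hypothetical train.log, and that iteration lines look like "3296: 0.85, 0.90 avg, 0.001000 rate, ..." as in darknet's console output) could poll the log and terminate the process once the average loss drops below a target:

# Rough sketch: stop darknet once the average loss drops below a target.
# train.log and TARGET_AVG_LOSS are assumptions, not part of darknet itself.
import re
import subprocess
import time

TARGET_AVG_LOSS = 0.6
AVG_RE = re.compile(r":\s*[\d.]+,\s*([\d.]+)\s+avg")  # matches "3296: 0.85, 0.90 avg, ..."

proc = subprocess.Popen(
    ["./darknet", "detector", "train", "data/obj.data",
     "yolo-obj.cfg", "darknet53.conv.74"],
    stdout=open("train.log", "w"), stderr=subprocess.STDOUT)

while proc.poll() is None:
    time.sleep(30)
    with open("train.log") as f:
        tail = f.readlines()[-50:]  # look at the most recent lines only
    losses = [float(m.group(1)) for line in tail
              for m in [AVG_RE.search(line)] if m]
    if losses and losses[-1] < TARGET_AVG_LOSS:
        proc.terminate()  # weights saved so far remain in the backup folder
        break

Since darknet checkpoints periodically, the weights file saved just before the stop is the one to use.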

@AlexeyAB or anyone else, could you please advise me about training?
Suppose I have 3 classes of objects [cat, dog, fish] and each class contains 450 labelled images for training.
After finishing the training with a pretrained weights file (such as yolov3-tiny.conv.15), I later want to expand to 6 classes [cat, dog, fish, bird, horse, bear]. What should I do?

A. Use the previously trained weights file (from 3 classes) and train with an image dataset of only the 3 additional classes [bird, horse, bear]. The obj.names file contains 6 classes: [cat, dog, fish, bird, horse, bear].

B. Use the previously trained weights file (from 3 classes) and train with an image dataset containing all 6 classes [cat, dog, fish, bird, horse, bear]: a mix of the [cat, dog, fish] images from the previous training and the new images for [bird, horse, bear]. The obj.names file contains 6 classes: [cat, dog, fish, bird, horse, bear].

C. Discard the previously trained weights file and start training from the beginning using the yolov3-tiny.conv.15 weights file.

I also have no idea how to edit the cfg file for those methods. Could you please advise me?

To BankPC. My experience: you cannot reuse weights trained on 3 classes for expanding to 6 classes.
You will get a failure in the training process if you try to use them.
Your "C" is the most adequate variant. You can only reuse your previous dataset, but even here you have to renew your 3-class annotations to 6 classes.

Hello, when I use this command:
./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74
I get "Couldn't open file: data/obj.data".
Please help

Go to the cfg directory, copy coco.data, rename the file to obj.data, and paste it into the data directory.


image
I'm trying to train a yolov3 model using darknet53 on my own custom dataset. Can anyone help me with the issue I have?
My training stops at the 100th iteration even though the max batches in yolov3.cfg is 12000, and the backup file is not getting saved at the location given in the input.cfg file. I'm attaching snippets of input.cfg and yolov3.cfg for your clear understanding. Thank you in advance.
yolov3.cfg
image
input.cfg
image

@AlexeyAB @pjreddie
Dear sir!
I use this command to display the graph:
./darknet detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -map
but it didn't show any graph.
Please guide me through it.

You should use the last saved weights rather than the darknet53 weights.


@arsal181
Thank you for your reply.
I want to ask you: can I see the loss graph at runtime?

@Raoabdulrehman771
In my case, showing the plot of the loss function doesn't depend on the _-map_ flag on the command line; it is built into the program. Can you check the console output to confirm that the training process has been launched at all? If yes, you should find the place in the main() code responsible for plotting and rebuild the program.

@andyrey
can you specify a file name?

@Raoabdulrehman771
In the file detector.c, check that these lines in the function void run_detector(int argc, char **argv) (from line 1377) are being executed:

else if(0==strcmp(argv[2], "train")) train_detector(datacfg, cfg, weights, gpus, ngpus, clear, dont_show, calc_map);
else if(0==strcmp(argv[2], "valid")) validate_detector(datacfg, cfg, weights, outfile);
else if(0==strcmp(argv[2], "recall")) validate_detector_recall(datacfg, cfg, weights);
else if(0==strcmp(argv[2], "map")) validate_detector_map(datacfg, cfg, weights, thresh, iou_thresh, NULL);

Hi, @andyrey!
My situation:
I need to train to identify 7 objects. After the first training I had identified 3 objects; the second time I took the previously trained weights to train 3 more objects, and I still succeeded. The third time, I took the weights file that was successfully trained before (the second time) to continue training, and the error happens as shown in the picture.
I have studied it for a long time but still cannot fix it.
Hope you can help me!
May I ask: how many classes can I train with yolov3?
In case I want to train more objects on top of a previous weights file, what do I have to do?
Currently, I am using Google Colab for training because I cannot afford a good computer to train on.
image

@ThuanK42
Hello, Thuan, I don't understand your first phrase: _"I need training to identify 7 objects"_. Did you mean training to detect all objects of 7 different classes? Different images can contain various numbers of objects, but we can detect and distinguish only those belonging to trained classes.
If you first train the model for 7 classes, then, after adding newly labeled samples to your dataset, you can prolong training and use the weights from the first training; in this case you keep the same cfg file and the same number of classes (7). Of course, at test time you can detect any number of objects (though some limit exists) in the picture, each belonging to one of the 7 trained classes.

First, thank you for your feedback.
I want to detect objects in my own dataset: computer mice, motorbikes, buses, airplanes, laptops, phones, TVs. During the first 2 training sessions I trained: computer mouse, motorbike, bus, airplane, laptop, phone. But in the last training session, when I continued from the previously trained weights file, there was the error below:
error -nan
Help me, please.
image

@ThuanK42
First you should check why you can't detect the TV set. Try to test one of the training images from the 1st dataset where a TV set is present, with the best weights from the 1st training. Is it detected?
If you get _-nan_ in further training, does the training process break? If yes, check your newly added dataset; usually the anchor-finding subprogram reports problem samples.
If not, go on training.
And if you prolong training, I recommend you train over the whole summary dataset, not only the newly added part.

Hello, how can I get the mAP from my weights after training, and does YOLOv3 use a focal loss function?

darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights

I think YOLOv3 does not use focal loss because it has some problems


Thank you. Do you know which loss function YOLOv3 uses?

@andyrey thank you

@ThuanK42

You should decrease your learning rate more.

I encountered the same issue when training with learning_rate = 0.001 on my ~800-sample dataset. After I changed it to 0.00005, it converged.

Hope it helps.
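For reference, the learning rate lives in the [net] section at the top of the cfg file, e.g.:

[net]
learning_rate=0.00005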

Hello,

Does anybody know how the Darknet repository computes the best.weights file?
Which criterion is used?

Thank you very much in advance!

Best regards.

problem_train
I am getting a negative iteration number during training in chart.png. The maximum batches are kept at 1000 in the cfg file, with 64 images in each batch.
Even the weights file is getting saved as yolov3-tiny_201326000.weights, where the number "201326000" stands for the iteration number.
Thank you.
Please help.


Sorted it out. The problem was caused by the pretrained weights. Once I started the training from scratch with no weights passed, it got resolved.

Does anyone know why I get this error? "Saving weights to backup/yolov3_custom_train_last.weights
Couldn't open file: backup/yolov3_custom_train_last.weights". Do I have to manually create a backup folder, or why can't it open it? I tried running it with the sudo command, btw.

@patrikszepesi

You need to ensure you create a "backup" folder; the script won't do it for you.
You need to have the following data in the file "obj.data":

classes = 2
train  = data/train.txt
valid  = data/test.txt
names = data/obj.names
backup = backup/

All the weight files will be saved in this folder.
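The folder itself must exist before training starts; from the darknet root you can create it with:

mkdir backup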

I am using a Mac, and despite that, my backup folder is not in the root directory of darknet but in build/darknet/x64/, just like Windows would save it. I had backup=backup in my obj.data. Do you recommend rewriting the path in my obj.data file to build/darknet/x64/, or should I manually create a backup folder in the root directory?

Result
Having a problem using my custom dataset: the chart pops up but it's "not responding". Also, I enabled GPU=1 but it's using the CPU, based on the attachment I've sent. I'm using batch=64 and subdivisions=32. Can anyone help?

@shawntyshawny

Are you sure you downloaded the proper CUDA version?


Hello sir! I already found the solution, and yep, it's the CUDA version. I downloaded CUDA 10.1 instead of 10.0 and everything worked. Also, the reason the chart was not responding is that WITH_CUDA was not checked in the OpenCV CMake; ENABLE_CUDA & ENABLE_CUDNN must be checked as well in the darknet CMake. Now I can properly train on my custom dataset. Thanks for the response!

I am trying:
./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map

there should be a chart here

If you want to generate the graph, please use this repository:
https://github.com/AlexeyAB/darknet


Hi everyone, please, I need help. I am running yolov3 for object detection and I get this error:

PS C:\darknet> darknet.exe detector train data/obj.data cfg/yolov3-custom.cfg darknet53.conv.74
darknet.exe : The term 'darknet.exe' is not recognized as the name of a cmdlet, function, script file,
or operable program. Check the spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:1
+ darknet.exe detector train data/obj.data cfg/yolov3-custom.cfg darkne ...
+ ~~~
+ CategoryInfo : ObjectNotFound: (darknet.exe:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException

It looks like it's searching for a global executable and not inside the folder you're in. Anyhow, check whether you have a darknet.exe and try to execute it with a click.

use ./darknet rather than darknet.exe


@arsal181
Thanks for the reply. I did it with ./darknet but got the same error.

@eltonfernando
Thanks for the reply. But I don't have darknet.exe. My main directory is darknet.

Well, this is a problem: you need darknet.exe. You need to compile it for that.

@eltonfernando
Thanks again for your reply.
But the problem is that when I build darknet in VS, it succeeds, but in the x64 folder it appears as darknet instead of darknet.exe.
I have attached two pictures showing what I mean.
20201016_140044
20201016_140101

Sorry, I don't work with Windows. But what you have looks like an executable. You're trying to run from the darknet folder, but yours is darknet/x64.

Open a cmd inside that x64 folder and try ./darknet

You should receive a return -> usage: ./darknet

Seems like the folder property "show file extensions" is unchecked.

@andyrey
Thanks for your reply. However, what do you mean?

I have set up everything to run yolov3 detection on Windows 10 with no GPU.
After running the following command in cmd:
C:\darknet\build\darknet\x64>darknet_no_gpu.exe detector train data/obj.data cfg/yolov3-custom darknet53.conv.74, I got the following error:

GPU isn't used
Used AVX
Used FMA & AVX2
opencv version: 4.2.0
Couldn't open file: data/obj.data

Any help would be much appreciated.
Thanks in advance

@eltonfernando
I did what you mentioned above but still get the error below:

GPU isn't used
Used AVX
Used FMA & AVX2
opencv version: 4.2.0
Couldn't open file: data/obj.data

Thanks anyway for your reply and time.

@pakson20 this happens when the path is wrong or the file data/obj.data does not exist.

I don't understand why you want to run without a GPU.

@eltonfernando
The reason is that I am new to YOLO and I just want it to run without errors.
My PC does not have a GPU for the moment.

Thank you very much for your reply.

@pakson20: you should check whether you have provided the obj.data file in the "data" subfolder. I don't remember exactly; it contains the list of class names.

@andyrey
@eltonfernando
Thanks a lot for your help. It finally works after I put data/obj.data, cfg/yolov3-custom, and darknet53.conv.74 in the "data" subfolder.
Thanks once more.
best regards

please help

PS C:\darknet> ./darkent.exe detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
./darkent.exe : The term './darkent.exe' is not recognized as the name of a cmdlet, function, script file, or operable
program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ ./darkent.exe detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
+ ~~~~~
+ CategoryInfo : ObjectNotFound: (./darkent.exe:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException

Spelling error: replace ./darkent.exe with ./darknet.exe.

Yes, it works. Now I get a new error:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
CUDA status Error: file: C:\darknet\src\dark_cuda.c : cuda_set_device() : line: 39 : build time: Nov 17 2020 - 18:52:29

I used CUDA version 10.1

image

Dear Malithi, do you have a GPU?

@pakson20, I don't have a GPU.


@Malithi-gif
In that case, try the following:

1. Go to your Makefile and set GPU = 0
2. Run with the command darknet_no_gpu instead of darknet

You should also have built the file darknet_no_gpu the same way you built darknet.

@pakson20
GPU=0
CUDNN=1
CUDNN_HALF=0
OPENCV=1
AVX=0
OPENMP=0
LIBSO=0
ZED_CAMERA=0
ZED_CAMERA_v2_8=0

I changed GPU to 0 and I got another error:

PS C:\darknet> ./darknet_no_gpu detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
./darknet_no_gpu : The term './darknet_no_gpu' is not recognized as the name of a cmdlet, function, script file, or
operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try
again.
At line:1 char:1
+ ./darknet_no_gpu detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
+ ~~~~
+ CategoryInfo : ObjectNotFound: (./darknet_no_gpu:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException

Thanks for the help.

Do I need to rebuild to get darknet_no_gpu?

Yes, you should. Try it.

@Malithi-gif
Do not forget to drop the ./
So you should run darknet_no_gpu.exe

@pakson20 I tried to build darknet with CMake and am getting another error:

PThreads_windows_DLL_DIR: C:/darknet/3rdparty/pthreads/include/../bin
CMake Warning at CMakeLists.txt:110 (find_package):
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenCV", but
CMake did not find one.

Could not find a package configuration file provided by "OpenCV" with any
of the following names:

OpenCVConfig.cmake
opencv-config.cmake

Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
"OpenCV_DIR" to a directory containing one of the above files. If "OpenCV"
provides a separate development package or SDK, be sure it has been
installed.

Thanks for the support.

@Malithi-gif
I am not using CMake; I installed darknet the legacy way.
You may follow these YouTube tutorials to install and run darknet. They are very well explained. Thanks.
Setting up YOLOv3:
https://youtu.be/wBtiGTToVEE
Run YOLOv3 on a custom dataset:
https://youtu.be/zJDUhGL26iU

@pakson20 thank you so much

Does anyone have video-slicing knowledge in Python?

I need to slice the video according to the object; that means tracking the object and getting the video clip that includes that object.

Hello everyone,
I am using yolov3 to train on my dataset for object detection. I have created the obj.names, obj.data, train.txt, and test.txt files to run a custom dataset on yolov3, but I got this error:
thank you

IMG_20201206_221020

@akrepriyanka1992
This happens sometimes; it should change over time. If you get -nan on the last line (avg, rate), then yes, you have problems.

Did you recalculate the anchors? That can help.

Thank you for your reply.
How do I recalculate the anchors? Can you please tell me?

The values defined in your .cfg file:
num=6
width=416
height=416

./darknet detector calc_anchors data/obj.data -num_of_clusters 6 -width 416 -height 416

Change the path data/obj.data if necessary.

After running the command it will calculate the anchors for your dataset. The IoU says how well the anchors managed to adapt to your dataset; if the IoU is too low, it means you need to increase the number of anchors (num_of_clusters).

Then copy the values printed in the terminal into your cfg:
anchors = new values
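For illustration, in a default yolov3-tiny cfg the lines to overwrite in each [yolo] section look like this; paste your recalculated pairs in place of the default anchors and keep num equal to the number of pairs:

anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
num=6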

I changed the width and height but got the same output. After executing this
(./darknet detector calc_anchors data/obj.data -num_of_clusters 6 -width 416 -height 416)
I didn't get any output.


After changing num=6 it shows a segmentation fault.


@akrepriyanka1992
Question: What is "_-num_of_clusters_"?
Answer: It is the number of anchors (the number of pairs of values); it equals 5 for yolov2, 6 for yolov3-tiny, and 9 for yolov3.
The above is for use on the command line when calculating anchors for your dataset. This will yield _num_of_clusters_ pairs, which you
should insert into your cfg file in place of the "_anchors= ..._" values.
In the .cfg the same number should be in the "_num=_" line.

I am using yolov3.


- All lines have nan values.
- How many classes and images in your dataset? Only one class and 120 images for training.
- What tool did you use for labeling? LabelImg with YOLO text format.
- What batch and subdivision do you use? batch=64, subdivision=16.
- Do you use random=1?
- Do you train using multi-GPU?


I am facing the same issue with yolov3 on custom data. Can you please help me solve it?


chart_yolov3-tiny-custom90_01

This is one of my trainings; in those intervals at the beginning where the loss does not fall, it is normal to receive many -nan.

If you are not using pre-trained weights, it may take longer.

You have problems if avg = -nan, but if nan appears only on some other lines, then training is going well.
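To watch just those summary lines while training runs, one option (assuming you redirect the training output to a file, e.g. train.log) is:

tail -f train.log | grep avg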

@eltonfernando
@andyrey
Hi everyone, please, any suggestions on how to label and train images with similar objects?
I am using YOLOv4 to train my custom dataset. I have medical images such as varicella, angioma, etc. (see some medical images attached:
eczema1
varicella4

). I have 4 classes for 330 images. I have set batch = 64, subdivision = 16, width = 608 and height = 608. I am facing the following problems:
1) During training, after 20000 iterations, my avg loss is still around 20.
2) When I stop the training and test, YOLO gets confused, i.e., it sometimes detects a varicella instead of an angioma, for example.
3) What is the best way to draw bounding boxes on such images, since the images are very similar?

@pakson20
As I understand it, there is varicella in the upper picture and angioma in the bottom one? And there are another 2 classes of diseases?
What I don't understand is: can these 4 classes afflict the same person simultaneously, or does each picture represent only one disease?
And BTW, how did you annotate these pictures?

@andyrey
Thanks for your reply.
In fact, you are right: each picture represents only one disease, so I have 4 classes, each containing one disease.
As for the annotation, I am not sure I have done it well, and that is one of my problems. I used LabelImg to create many bounding boxes around the more pronounced red-spot areas of each picture.
The first picture is a disease called eczema and the second one is varicella.
I have attached the two other types of diseases, i.e., angioma and squamous, respectively.
angioma1
Squamous1

@pakson20, sorry for the late answer, I missed your last message. I don't think the exact positioning of the sores matters in your case, so you need classification of the whole picture rather than detection+classification of each sore. Maybe you should use an appropriate classification net, not YOLO, but something like GoogLeNet or EfficientNet (sorry, I don't know exactly; I may be wrong with the names, you should search).

@andyrey
Thanks for your reply. I will investigate those classification nets further.
