Yolov3: mAP and detection not working

Created on 8 Apr 2019 · 40 Comments · Source: ultralytics/yolov3

I think there is something wrong with the scaling of the bounding boxes. My mAP is always at zero even if my training is going well. Also, when I use detect.py, the bounding boxes are at the right places, but are really small.

I didn't touch anything in util.py and my .txt files for the images are right.

bug

All 40 comments

Git clone a clean copy of the repo and run one of the custom tutorials. If your results match ours, then the problem is in your data.

https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data
python3 train.py --data data/coco_10img.data

You should see this. The 10 image example only takes about 5 minutes on a GCP VM V100 instance.
from utils import utils; utils.plot_results()
results

results

I guess I am overfitting. The labels look fine when I open them with the open-source project labelImg.

I had 21 classes, 450 images and ~50 objects per image

Before you do any training, an obvious first step is to run a tutorial and make sure your results match.

If you are overfitting your mAP on the train group should be great, right? Have you checked that at least?

In any case, 99% of the time when people can't get results, they either didn't format their data correctly or they've modified the default repository.

I've been trying to run it, but I can't get all the files right on Windows: I can't run the .sh file that sets everything up.

I double-checked my annotation data and everything matches the YOLO annotation format. I thought my problem was the learning rate or the augmentation, and I tried several things tonight to make it work, without any luck.
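
A minimal sketch of an automated check for one label row, assuming the usual YOLO/Darknet text format of `class x_center y_center width height` with the four box values normalized to [0, 1]. The function name `check_yolo_label_line` and its messages are hypothetical and are not part of the repository.

```python
def check_yolo_label_line(line, num_classes):
    """Return a list of problems found in one label row (empty if it looks fine)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        cls = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["could not parse an integer class followed by four floats"]

    problems = []
    if not 0 <= cls < num_classes:
        problems.append(f"class index {cls} outside 0..{num_classes - 1}")
    for name, value in (("x_center", x), ("y_center", y), ("width", w), ("height", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not normalized to [0, 1]")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    return problems


print(check_yolo_label_line("3 0.5 0.5 0.2 0.1", 21))   # []
print(check_yolo_label_line("25 0.5 0.5 0.2 0.1", 21))  # class index out of range
```

A row written in pixel units instead of normalized units would be flagged on every box field, which is one of the more common formatting mistakes this kind of check catches.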

Would you mind trying my data to see if you can get something out of it? It would be appreciated, and maybe you'll be able to add guidance to the custom tutorial.

After searching the official Darknet repo, I think this may have something to do with the anchors. I probably need to change them for my custom data.

If your target sizes are different enough from the default anchors then yes, you will want to vary the anchor dimensions. We used kmeans to do this with the xView data:
https://github.com/ultralytics/xview-yolov3
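
A rough sketch of how anchor dimensions can be estimated with k-means, assuming the label boxes are available as (width, height) pairs in pixels. This is not the xview-yolov3 implementation: the names `wh_iou` and `kmeans_anchors`, the 1 - IoU distance (the one described in the YOLOv2 paper) and the deterministic initialisation are all assumptions made for illustration.

```python
def wh_iou(box, anchor):
    """IoU of two boxes that share the same centre, each given as (w, h)."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs into k anchors, assigning each box to its highest-IoU anchor."""
    assert len(boxes) >= k, "need at least k boxes"
    # Deterministic start: k boxes evenly spaced through the area-sorted list.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(i * step + step / 2)] for i in range(k)]

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for old, members in zip(anchors, clusters):
            if members:
                new_anchors.append((sum(b[0] for b in members) / len(members),
                                    sum(b[1] for b in members) / len(members)))
            else:
                new_anchors.append(old)  # an empty cluster keeps its previous anchor
        if new_anchors == anchors:
            break  # assignments stopped changing
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


boxes = [(10, 12), (11, 13), (12, 11), (100, 90), (110, 95), (105, 100)]
print(kmeans_anchors(boxes, k=2))  # [(11.0, 12.0), (105.0, 95.0)]
```

The resulting (w, h) pairs would then replace the anchor values in the cfg file, at the image size used for training.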

You can run under Linux using our GCP quickstart:
https://github.com/ultralytics/yolov3/wiki/GCP-Quickstart

Also, to make sure your targets are in the right format, you can plot the training data by using the plotting script in train.py.
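
For reference when inspecting such plots, here is a small sketch of the conversion from a normalized label row to pixel corner coordinates, assuming the YOLO centre format (x_center, y_center, width, height, all in [0, 1]). The function name `yolo_to_pixel_corners` is hypothetical and is not taken from the repository's plotting code.

```python
def yolo_to_pixel_corners(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized centre-format box to pixel (x1, y1, x2, y2) corners."""
    x1 = (x_center - width / 2) * img_w
    y1 = (y_center - height / 2) * img_h
    x2 = (x_center + width / 2) * img_w
    y2 = (y_center + height / 2) * img_h
    return x1, y1, x2, y2


# A box centred in a 416x416 image, a quarter of the width and half of the height.
print(yolo_to_pixel_corners(0.5, 0.5, 0.25, 0.5, 416, 416))  # (156.0, 104.0, 260.0, 312.0)
```

If the drawn boxes are consistently offset or the wrong size, comparing a few rows by hand against this conversion can show whether the labels store corners or pixel units instead of normalized centres.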

About your anchors, I'd be very surprised if the smallest or largest anchors weren't covering part of your training data. They span from 10 to 370 pixels wide in a 416 size image. Changing anchors is done to improve results, not to bring the mAP from zero to something else. I still think there must be a problem elsewhere.

https://github.com/ultralytics/yolov3/blob/11366774e2a821dfcc281ee800b68141d989344f/train.py#L129-L139
batch_0

@Ownmarc good news maybe. I was posting a comment on a different issue when I realized we had inadvertently introduced a bug in the master branch related to wh loss computation. This was fixed in our test branch but not the master. I fixed this and also hardcoded plotting of the first train and test batches. When you train normally now, two files will appear in your yolov3/ directory, train_batch0.jpg and test_batch0.jpg.

You should git pull to incorporate the wh bug fix, and retrain, viewing the two images to make sure the boxes seem correctly aligned. I will add this tidbit to the tutorials as well, this should go far in helping ppl make sure their training and testing data is well formatted.

Just checked your commit. It makes a lot of sense, since my training was getting worse the more I trained and it looked like the loss on the yolo layers wasn't computing correctly. I'll keep you updated!

train batch, everything looks normal:
train_batch0

Yep! Thanks a lot, mAP is showing and increasing! I think we can close this.

image

Hmm, it must have been that wh bug. Phew, we have to be careful here when we adjust the code. Ok, glad to hear it's all working now!! I hope other people aren't running into the same problem. Probably leave open for a few days just in case anyone goes searching.

If anyone has training problems on custom data, please git pull the latest commit and try again, as a bug was present around the first week of April that has now been resolved!

@Ownmarc hey wait a second, your screenshot is showing Recall > 1 for several categories, which is a statistical impossibility. The high recall seems to be feeding to the mAP as well, causing it to increase above 1 for the same categories.

We validated our mAP against pycocotools and darknet very well, and now it matches to 1%. I just recomputed for another issue: https://github.com/ultralytics/yolov3/issues/199#issuecomment-481216891

Do you know what might be causing this?

It seems to take into account 1 object that is predicted with 2 bboxes almost on top of each other as 2 good predictions when there is, in fact, only 1 object!

This is at 0.7 conf threshold, see this cannon having 2 bbox. They are probably counted as 2 good detections.

image

Here we can see the cannon class at 1.01:

image

Hmm. This is surely the finest test.py result I've ever seen.

It's pretty common to get two boxes for one object, that should just give you a P of 0.5 and an R of 1.0 for that instance.
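
The arithmetic behind that statement, as a tiny sketch. The function name `precision_recall` is hypothetical; it only divides counts and is not taken from test.py.

```python
def precision_recall(num_true_positives, num_detections, num_targets):
    """Precision = TP / detections, recall = TP / targets (0.0 when the denominator is 0)."""
    precision = num_true_positives / num_detections if num_detections else 0.0
    recall = num_true_positives / num_targets if num_targets else 0.0
    return precision, recall


# One object, two overlapping boxes: only one box may claim the object.
print(precision_recall(1, 2, 1))  # (0.5, 1.0)

# If both boxes were counted as true positives, recall would exceed 1.
print(precision_recall(2, 2, 1))  # (1.0, 2.0)
```

Recall can only pass 1.0 when the number of true positives is larger than the number of targets, which is why a value above 1 points at the matching step rather than at the model.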

Somehow your list of TPs is greater than the list of target objects, which should not be possible. In any case, it looks like the issue mellowed out eventually. I scanned the test.py code but didn't see anything out of the ordinary. Since this doesn't occur on COCO data I'll just forget about it for now.

@glenn-jocher, let me know if you want my dataset to test it!

@Ownmarc maybe if you put it all in a Google Drive folder I can check it out when I have more free time! It would certainly be interesting to see what's causing the > 1 recalls.

Do you think you could have duplicate rows in your labels file? Is it still there at the default test parameters, i.e. nms_thres 0.5?
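
A quick way to test the duplicate-row hypothesis, sketched under the assumption that each label file is plain text with one box per line. The helper name `find_duplicate_rows` is hypothetical.

```python
from collections import Counter


def find_duplicate_rows(label_text):
    """Return {row: count} for every label row that appears more than once."""
    # Collapse runs of whitespace so "0  0.5" and "0 0.5" compare as equal.
    rows = [" ".join(line.split()) for line in label_text.splitlines() if line.strip()]
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


sample = "0 0.5 0.5 0.2 0.2\n1 0.3 0.3 0.1 0.1\n0  0.5 0.5 0.2 0.2\n"
print(find_duplicate_rows(sample))  # {'0 0.5 0.5 0.2 0.2': 2}
```

This compares rows as text, so two boxes that differ only in the last decimal place would not be reported.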

Yes, I didn't change anything from the master repo other than the __init__.py I need in the util folder (for Windows), the font size of the plotting, and setting the visible GPU in the train script.

No duplicates, they were all made using a script from xml files which were hand annotated and checked using other scripts to make sure there was nothing impossible (like 8 gold_mines since a player can only have 7 maximum)

I have been training using darkflow with the exact same dataset and this was not happening. Maybe this can help you (from the Darkflow repo):

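import numpy as np

class BoundBox:
    """A box in centre format (x, y, w, h) with objectness c and per-class probabilities."""
    def __init__(self, classes):
        self.x, self.y = float(), float()
        self.w, self.h = float(), float()
        self.c = float()
        self.class_num = classes
        self.probs = np.zeros((classes,))

def overlap(x1, w1, x2, w2):
    """Signed 1-D overlap of two intervals given by centre and width (negative if disjoint)."""
    l1 = x1 - w1 / 2.
    l2 = x2 - w2 / 2.
    left = max(l1, l2)
    r1 = x1 + w1 / 2.
    r2 = x2 + w2 / 2.
    right = min(r1, r2)
    return right - left

def box_intersection(a, b):
    """Intersection area of two boxes, 0 if they do not overlap."""
    w = overlap(a.x, a.w, b.x, b.w)
    h = overlap(a.y, a.h, b.y, b.h)
    if w < 0 or h < 0:
        return 0
    return w * h

def box_union(a, b):
    """Union area: sum of both areas minus the intersection."""
    i = box_intersection(a, b)
    return a.w * a.h + b.w * b.h - i

def box_iou(a, b):
    """Intersection over union of a single pair of boxes."""
    return box_intersection(a, b) / box_union(a, b)

def prob_compare(box):
    # NOTE: as quoted, class_num equals len(probs), so this index is out of range.
    return box.probs[box.class_num]

def prob_compare2(boxa, boxb):
    # NOTE: the attribute `pi` is not set in BoundBox.__init__ above; kept as quoted.
    if boxa.pi < boxb.pi:
        return 1
    elif boxa.pi == boxb.pi:
        return 0
    else:
        return -1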

Test loss seems to be good, here is the result.txt if it can help (21 classes) :

image

I was running into the same issue, but after pulling the repo again this morning, I still couldn't get mAP to not be zero. But even weirder is that my wh becomes inf after a while.

image

I am training with transfer learning on a custom dataset with 1 class.

@vivian-wong see https://github.com/ultralytics/yolov3/issues/168 to control divergent width-height (wh) losses.

Working now! Thank you!

Congrats!

@glenn-jocher, was the mAP over 1.0 fixed or should we open a new issue ?

@Ownmarc > 1 recall is likely still an open issue as I have not worked on it due to an inability to rapidly reproduce it. Another user mentioned it as well. The darkflow iou code is nice to see, but their code only operates on one box at a time, whereas ours is vectorized for speed (computes many ious simultaneously). In any case, I don't think IOU is the problem.
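
To illustrate the difference between a one-pair-at-a-time IoU and a vectorized one, here is a sketch that computes the IoU of one box against many boxes in a single NumPy expression. It assumes the same centre format (x, y, w, h) as the Darkflow snippet above; the name `box_iou_many` is hypothetical and this is not the repository's own IoU function.

```python
import numpy as np


def box_iou_many(box, boxes):
    """IoU between one (x, y, w, h) box and an (N, 4) array of boxes, centre format."""
    box = np.asarray(box, dtype=float)
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)

    # Convert centres and sizes to corner coordinates.
    b_x1, b_x2 = box[0] - box[2] / 2, box[0] + box[2] / 2
    b_y1, b_y2 = box[1] - box[3] / 2, box[1] + box[3] / 2
    x1, x2 = boxes[:, 0] - boxes[:, 2] / 2, boxes[:, 0] + boxes[:, 2] / 2
    y1, y2 = boxes[:, 1] - boxes[:, 3] / 2, boxes[:, 1] + boxes[:, 3] / 2

    # Overlap along each axis, clipped at 0 for boxes that do not intersect.
    inter_w = np.clip(np.minimum(b_x2, x2) - np.maximum(b_x1, x1), 0, None)
    inter_h = np.clip(np.minimum(b_y2, y2) - np.maximum(b_y1, y1), 0, None)
    inter = inter_w * inter_h

    union = box[2] * box[3] + boxes[:, 2] * boxes[:, 3] - inter
    return inter / union


ious = box_iou_many((0, 0, 2, 2), [(0, 0, 2, 2), (1, 0, 2, 2), (5, 5, 2, 2)])
print(ious)  # IoU of 1, 1/3 and 0 for the three candidates
```

Both approaches should give the same numbers for the same pair of boxes; the vectorized form only avoids a Python loop over candidates.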

If you raise a new issue specifically about the > 1 recall, make sure you supply all the elements to reproduce the issue, i.e. a google drive folder with the trained model, the *.data and *.cfg files, and the *.txt file pointing to the training images and labels folders, and of course the folders themselves. This would be the most useful.

Good morning everyone,

Hi @Ownmarc, can you share your weights, cfg, .data and .names files from your "Clash of Clans" detector? I would really like to test it, and maybe make a YouTube video.

Best regards,
Antoine

@Ownmarc mAP > 1.0 has been fixed now.

@glenn-jocher cool, what was the issue ? Thanks

@Ownmarc just a bug in the test.py mAP calculation. Non-exclusive target-anchor combinations were being allowed which caused some targets to count as multiple TPs.
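
A simplified sketch of why non-exclusive matching inflates the true-positive count, assuming a single class, boxes as (x1, y1, x2, y2) corners and an IoU threshold of 0.5. The function names and the `exclusive` flag are hypothetical; this is an illustration of the idea described above and not the code from test.py.

```python
def iou_xyxy(a, b):
    """IoU of two corner-format boxes (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def count_true_positives(detections, targets, iou_thres=0.5, exclusive=True):
    """detections: list of (confidence, box); targets: list of boxes."""
    claimed = set()
    true_positives = 0
    # Highest-confidence detections get the first chance to claim a target.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, target in enumerate(targets):
            if exclusive and idx in claimed:
                continue  # this target already belongs to another detection
            iou = iou_xyxy(box, target)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_thres:
            claimed.add(best_idx)
            true_positives += 1
    return true_positives


targets = [(0, 0, 10, 10)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (1, 0, 11, 10))]
print(count_true_positives(detections, targets, exclusive=True))   # 1 -> recall 1.0
print(count_true_positives(detections, targets, exclusive=False))  # 2 -> recall 2.0
```

With exclusive matching the second box becomes a false positive, which lowers precision but keeps recall at or below 1.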

Can anyone help me? I performed transfer learning on yolov3 and it is detecting objects, but not at the right place. I can't upload the result because I am on my company's network.

@JalajK hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone version of this repository we can not debug it. Before going further run this code and ensure your issue persists:
sudo rm -rf yolov3  # remove existing
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO data we can not debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
  • Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM we can not debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.4 etc. You can also use our Google Colab Notebook and our Docker Image to test your code in a working environment.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Which information in the log.txt output relates to mAP?

@Leprechault from utils import utils; utils.plot_results() to plot your mAP.

Thanks, @glenn-jocher!!! It works, but I also have a conceptual question about the way mAP is calculated. In the output file (log.txt), after each iteration I have e.g. "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images". I recognize the six variables as iteration, total loss, loss error, rate, time and number of images, but I don't know where the mAP percentage comes from.
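
A small parsing sketch for a line of the shape quoted above, which appears to be the per-iteration format printed by Darknet. The field names (`loss`, `avg_loss` and so on) are labels chosen here for illustration; the point is that the line carries six values and none of them is a mAP.

```python
import re

# Matches e.g. "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images"
LOG_LINE = re.compile(
    r"^\s*(?P<iteration>\d+):\s*(?P<loss>[\d.]+),\s*(?P<avg_loss>[\d.]+) avg,\s*"
    r"(?P<rate>[\d.]+) rate,\s*(?P<seconds>[\d.]+) seconds,\s*(?P<images>\d+) images"
)


def parse_log_line(line):
    """Return the six fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "iteration": int(fields["iteration"]),
        "loss": float(fields["loss"]),
        "avg_loss": float(fields["avg_loss"]),
        "rate": float(fields["rate"]),
        "seconds": float(fields["seconds"]),
        "images": int(fields["images"]),
    }


line = "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images"
print(parse_log_line(line))
```

Lines whose loss is printed as `nan` would not match this pattern and return `None`.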

@Leprechault I don't know what log file you refer to.

mAP is computed in a standard manner, i.e. area under a PR curve.
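
A sketch of one common way to turn a ranked list of detections into an area under the PR curve. It assumes all-point interpolation (the style used by PASCAL VOC from 2010 onward); other benchmarks such as COCO sample the curve differently, and nothing here claims to match the exact variant in test.py. The function name `average_precision` is hypothetical.

```python
def average_precision(is_true_positive, num_targets):
    """AP for one class.

    is_true_positive: flags for detections already sorted by descending confidence.
    num_targets: number of ground-truth objects of this class.
    """
    if num_targets == 0 or not is_true_positive:
        return 0.0

    recalls, precisions = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += 1 if hit else 0
        recalls.append(tp / num_targets)
        precisions.append(tp / rank)

    # Replace each precision by the best precision at any higher recall (curve envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three detections ranked by confidence: hit, miss, hit; two objects in total.
print(average_precision([True, False, True], num_targets=2))  # about 0.833
```

mAP is then the mean of these per-class AP values, so it stays within [0, 1] as long as no class has more true positives than targets.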

Okay. Starting from the pre-trained weights, I did transfer learning on the CNR parking dataset, which has only one class (car), and formatted the data according to the YOLO model.
In the cfg file, I changed all 3 yolo layers and the number of filters to match this dataset.
After all this, I'm getting this kind of output. Help me out, guys.
IMG-20200316-WA0007

The bounding boxes are all shifted by the same distance, and I don't understand why.

@JalajK your labels look incorrect. You need to check your train_batch0.jpg and test_batch0.jpg produced when training starts.

Thanks @glenn-jocher, I was looking in the wrong file. Now I am trying to get the mAP results in a *.txt file using ./darknet detector map obj.data obj.cfg backup/obj_100.weights -map | tee result_mAP.txt, but it doesn't work (no output *.txt file is created). Any ideas?

@Leprechault suggest you raise the issue on the relevant repo.
