Yolov3: PRECISION-RECALL CURVE

Created on 5 Mar 2020 · 28 Comments · Source: ultralytics/yolov3

🚀 Feature

Precision Recall curves may be plotted by uncommenting code here when running test.py: https://github.com/ultralytics/yolov3/blob/1dc1761f45fe46f077694e1a70472cd7eb788e0c/utils/utils.py#L171

python3 test.py --weights yolov3-spp-ultralytics.pt --cfg yolov3-spp --conf 0.001

For yolov3-spp-ultralytics.pt on COCO, the curves for all 80 classes look like this:
PR_curve

For a single class (class 0, person), the curve looks like this. During testing we evaluate the area under the curve as average precision, AP. The curve should ideally go from P=1, R=0 in the top left towards P=0, R=1 at the bottom right to capture the full AP (area under the curve). By varying conf-thres you can select a single point on the curve to run your model at. Depending on your application, you may prioritize precision over recall, or vice versa.
PR_curve (1)
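
To make the relationship between the curve, AP and conf-thres concrete, here is a minimal NumPy sketch. It is not the repository's code: the function names, the made-up detections and the all-point interpolation used for AP are assumptions chosen for illustration, and the repository may integrate the curve differently.

```python
import numpy as np

def pr_curve(conf, is_tp, n_gt):
    """Precision and recall at every confidence cut-off, highest confidence first."""
    order = np.argsort(-conf)            # sort detections by decreasing confidence
    tp = is_tp[order].astype(float)
    tpc = np.cumsum(tp)                  # cumulative true positives
    fpc = np.cumsum(1.0 - tp)            # cumulative false positives
    recall = tpc / n_gt                  # assumes n_gt > 0
    precision = tpc / (tpc + fpc)
    return conf[order], recall, precision

def average_precision(recall, precision):
    """Area under the precision envelope (all-point interpolation, an assumption here)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))  # make precision non-increasing
    idx = np.where(mrec[1:] != mrec[:-1])[0]              # points where recall changes
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Made-up detections for one class: 5 predictions, 4 ground-truth objects
conf = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
is_tp = np.array([1, 1, 0, 1, 0])
thresholds, recall, precision = pr_curve(conf, is_tp, n_gt=4)
ap = average_precision(recall, precision)

# A conf-thres of 0.75 keeps only the first two detections: one point on the curve
keep = thresholds >= 0.75
print(precision[keep][-1], recall[keep][-1], ap)
```

Each detection contributes one point to the curve, so choosing a conf-thres amounts to picking where along the curve the model operates, while AP summarizes the whole curve.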

Stale tutorial

glenn-jocher

👍3 🚀1

Most helpful comment

@jas-nat I visited this tutorial. It uses accuracy to find the corresponding conf-thres, but there is no accuracy metric in object detection. The article says "a certain binary classification metric" at the beginning, so this method is not suitable.
I don't think the best threshold can be calculated by a formula. It depends on your project. For example, some projects need high recall and precision is less important, while other projects may require the opposite. So the threshold should be chosen to suit your own project.
@glenn-jocher thank you for your reply!

risemeup on 9 Jul 2020

👍4

All 28 comments

@TheophileBlard I'm thinking that perhaps we should plot P, R and mAP at separate --conf-thres values. mAP would naturally be computed near zero (i.e. 0.001), but P and R would perhaps be reported at 0.5 --conf-thres. This would be similar to Google AutoML reported results.
https://cloud.google.com/vision/automl/docs/beginners-guide?authuser=1#how_do_i_interpret_the_precision-recall_curves

0.1 confidence

Screen Shot 2020-03-06 at 1 44 44 PM

0.5 confidence

Screen Shot 2020-03-06 at 1 43 11 PM

0.9 confidence

Screen Shot 2020-03-06 at 1 44 53 PM

glenn-jocher on 6 Mar 2020

@glenn-jocher Sounds great! Current P&R curves are quite misleading, as the 0.001 threshold is defined in the code.

TheophileBlard on 7 Mar 2020

@TheophileBlard all done in feea9c1a65c73475803847c83545b5e7ee6c528c. Thanks for raising the issue, I think this update will help everyone! Here is a before and after run of the coco64img.data tutorial. Let me know if you see any other problems.
results

glenn-jocher on 7 Mar 2020

I may be misunderstanding the precision and recall reported during training. The plot below is what I got when training (using my custom data, which has two classes, stop sign and yield sign, with the default train/val split). You can see that P, R and mAP are very poor (I used the default conf).

results

However, when I run the test code for all the data together, as below:
python3 test.py --data data/stopsigns.data --cfg cfg/yolov3-spp-stopsigns.cfg --weights weights/yolov3-spp-ultralytics-stopsigns.pt
I got results:

           Class    Images   Targets         P         R   mAP@0.5        F1
             all       554       543     0.979     0.947     0.991     0.963
        stopsign       554       276      0.97     0.938      0.99     0.954
       yieldsign       554       267     0.988     0.955     0.992     0.971

Does this mean the model overfits the dataset heavily? When I used the model to predict on some random street pictures downloaded from the internet, the performance seemed okay.

rightly0716 on 3 May 2020

@rightly0716 testing on your training data is only useful as a sanity check. It serves no purpose in terms of checking for generalization, which is what the test set is for. Your P and R don't matter, as you select these yourself.

mAP is the metric that matters. If your training results are not to your liking, then it's time for you to experiment with ways to improve them.

glenn-jocher on 3 May 2020

I see. I have only ~500 labelled samples, and I am wondering whether that could be a reason. I will do a deeper analysis and see.

Thanks!

rightly0716 on 3 May 2020

@rightly0716 definitely more data would help. Also make sure you are training at an appropriate image size, and check your train.jpg and test.jpg images for correct labeling.

glenn-jocher on 3 May 2020

I uncommented the plotting code in ap_per_class in utils, as follows:
# Plot
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.plot(recall, precision)
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.set_xlim(0, 1.01)
ax.set_ylim(0, 1.01)
fig.tight_layout()
fig.savefig('PR_curve.png', dpi=300)
My dataset has two classes, but the PR curve plot shows only one class. How can I fix this?

tinothy22 on 19 May 2020

@tinothy22 ah yes, I see what you mean. The graph is inside the for loop, so it will plot one graph per class and save it (overwriting the previous one). If you want to overlay all of your classes you must modify the plotting code a bit, to create the figure before the loop, plot as is, and then save the figure after the loop.
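
A rough sketch of that restructuring, for illustration only. The helper plot_pr_curves, the curves dictionary and the synthetic curves are hypothetical and not part of utils.py; inside ap_per_class you would instead call ax.plot within the existing per-class loop.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np

def plot_pr_curves(curves, path="PR_curve.png"):
    """curves: dict mapping a class name to its (recall, precision) arrays."""
    fig, ax = plt.subplots(1, 1, figsize=(5, 5))  # create the figure once, before the loop
    for name, (recall, precision) in curves.items():
        ax.plot(recall, precision, label=name)    # one line per class on the same axes
    ax.set_xlabel("Recall")
    ax.set_ylabel("Precision")
    ax.set_xlim(0, 1.01)
    ax.set_ylim(0, 1.01)
    ax.legend()
    fig.tight_layout()
    fig.savefig(path, dpi=300)                    # save once, after the loop
    plt.close(fig)
    return path

# Synthetic curves for two classes, purely for demonstration
r = np.linspace(0, 1, 50)
curves = {"class 0": (r, 1 - r**4), "class 1": (r, 1 - r**6)}
out = plot_pr_curves(curves)
```

Because the figure is saved only after every class has been drawn, no curve overwrites another.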

glenn-jocher on 19 May 2020

Thank you, I will try to change the code.

tinothy22 on 19 May 2020

@tinothy22 we definitely want to add this to tensorboard output in the future, for now unfortunately this is the only way to do it.

glenn-jocher on 19 May 2020

that's great! thank you for your guidance, I have got the PR curve

tinothy22 on 19 May 2020

👍1

Hello thank you for the clear explanation. I just want to clarify my understanding of precision and recall curve threshold, as I have been reading this over and over again.

1. Is it true that the threshold can vary for each label?
2. In feea9c1, why did you change the PR_threshold to 0.5, while the current code has it set to 0.1? Where should we specify the threshold for drawing the precision and recall curve?
3. Is it the same if this line is changed to precision = tpc / n_p ? https://github.com/ultralytics/yolov3/blob/82f653b0f579db97f8908800d45e8f5287f79bd3/utils/utils.py#L177

Thanks and would like to hear your answers!

jas-nat on 4 Jun 2020

@jas-nat the curve has no threshold, it is plotted for all thresholds.

glenn-jocher on 4 Jun 2020

@glenn-jocher Got it. Thanks for answering!

jas-nat on 4 Jun 2020

Hello, thank you for the clear explanation. The curves with the slider seem useful. They could help me select conf-thres. I would like to know how to do this.

risemeup on 6 Jul 2020

@risemeup you'd need to code up an interactive version of the plot above with something like plotly dashboard maybe. Let me know if you come up with a solution!

glenn-jocher on 6 Jul 2020

Hi, I want to ask how we can find the best threshold from the curve. Is it in results.txt, or somewhere else?

jas-nat on 7 Jul 2020

@jas-nat There does not seem to be such information in results.txt. The optimal threshold is near the turning point of the PR curve, where both precision and recall are high. You can add some code to the ap_per_class function to write out the confidence for every point on the PR curve and find the best conf-thres. So it would be very convenient if we could plot the curve with a slider.

risemeup on 7 Jul 2020

@risemeup I am trying to implement it. Can you guide me how to find the best conf-thres?

I followed this tutorial but it applied precision_recall_curve from scikit-learn. A little confused in finding the corresponding variables in utils.py

Can high F1 score indicate the best conf-thres?

jas-nat on 8 Jul 2020

@risemeup @jas-nat there is no "optimal" or "best" threshold. It is up to the user to set this however they like, depending on the compromise they desire between increasing recall and reducing FPs.
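
To illustrate why no single threshold is "best", here is a small sketch with made-up numbers. The two helper functions are hypothetical and not part of this repository; they only show that different criteria (maximum F1 versus a minimum required precision) pick different points on the same curve.

```python
import numpy as np

def threshold_for_max_f1(conf, precision, recall):
    """Confidence at the point of the curve with the highest F1 score."""
    f1 = 2 * precision * recall / (precision + recall + 1e-16)
    i = int(np.argmax(f1))
    return conf[i], f1[i]

def threshold_for_min_precision(conf, precision, recall, min_precision):
    """Confidence with the highest recall among points meeting the precision target."""
    ok = np.where(precision >= min_precision)[0]
    if ok.size == 0:
        return None                     # no point on the curve is precise enough
    return conf[ok[np.argmax(recall[ok])]]

# Made-up curve: one (conf, precision, recall) triple per detection, sorted by confidence
conf = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
precision = np.array([1.0, 1.0, 2 / 3, 0.75, 0.6])
recall = np.array([0.25, 0.5, 0.5, 0.75, 0.75])

t_f1, best_f1 = threshold_for_max_f1(conf, precision, recall)
t_p = threshold_for_min_precision(conf, precision, recall, min_precision=0.95)
print(t_f1, t_p)
```

On this toy data the F1 criterion and the precision criterion select different thresholds, which is the compromise described above: the choice depends on how the application weighs missed detections against false positives.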

glenn-jocher on 8 Jul 2020

risemeup on 9 Jul 2020

👍4

@glenn-jocher @risemeup Thank you for the replies!

jas-nat on 9 Jul 2020

Sorry, I am still trying to understand the code.
https://github.com/ultralytics/yolov3/blob/bdf546150df5aaeacd1eb415b5dc830096079880/utils/utils.py#L188
In that line, as far as I understand, it will create a new interpolation point referring to conf[i] for x axis and precision[:, 0] or recall[:, 0] for y axis, am I right?

I have 2 questions:

1. I don't see where pr_score changes in the code. Doesn't the np.interp() function need the new points as its first argument to perform the interpolation? If I am missing something, let me know.
2. When I try to print p[ci] it only shows 1 value. Is that the interpolated value?

For your information, I only train for 1 label.

jas-nat on 9 Jul 2020

@jas-nat I will try to answer your two questions from my understanding. If I make a mistake, let me know.

1. pr_score is set to a fixed value. We get a set of precision, recall and conf values when drawing the PR curve, but we only need one precision value to describe the current training status, so we take the precision at the point where conf-thres equals pr_score.
https://github.com/ultralytics/yolov3/blob/8241bf67bb0cc1c11634bdb4cc76e06ac072192b/utils/utils.py#L167

2. Yes, p[ci] is generated by interpolation, as explained above.

I hope this helps. If there is anything wrong, please point it out.
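
A small standalone sketch of that interpolation, under my own assumptions: the arrays and the pr_score value below are made up, and the negation trick reflects how I understand np.interp to be used on a descending conf array rather than a verbatim copy of utils.py.

```python
import numpy as np

pr_score = 0.65  # fixed confidence at which a single P value is reported (value assumed)

# Made-up curve for one class, sorted by decreasing confidence
conf = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
precision = np.array([1.0, 1.0, 2 / 3, 0.75, 0.6])

# np.interp(x, xp, fp) expects xp to be increasing. conf is decreasing,
# so both the query point and the x-coordinates are negated.
p_at_score = np.interp(-pr_score, -conf, precision)

print(p_at_score)  # a single scalar: precision interpolated at conf == pr_score
```

The first argument is the point at which to evaluate (here the fixed pr_score), which is why it never changes, and why printing the result for one class shows a single value.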

risemeup on 10 Jul 2020

👍2

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] on 10 Aug 2020

@glenn-jocher

According to what you said above, do the P, R, mAP and F1 obtained from training on your own data have no reference value? Is there no value in getting P, R, mAP and F1 from the test? How should the quality of the trained model be evaluated?

Why is conf-thres set to 0.01 in test.py?
I only use one category. Do I need to set --single-cls?

thank you!

hande6688 on 13 Aug 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] on 8 Nov 2020
