Maskrcnn-benchmark: Why did I get an mAP value of -1?

Created on 27 Oct 2018 · 6 Comments · Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

After training on the coco2017 dataset, I got the following output:

2018-10-26 07:57:49,972 maskrcnn_benchmark.inference INFO: Total inference time: 0:30:09.844855 (0.08900146815007108 s / img per device, on 2 devices)
2018-10-26 07:57:57,928 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2018-10-26 07:57:57,928 maskrcnn_benchmark.inference INFO: Preparing bbox results
2018-10-26 07:58:06,302 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=7.88s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=84.73s).
Accumulating evaluation results...
DONE (t=22.14s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
2018-10-26 08:00:20,193 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', -1.0), ('AP50', -1.0), ('AP75', -1.0), ('APs', -1.0), ('APm', -1.0), ('APl', -1.0)]))])

I used the file e2e_faster_rcnn_R_50_FPN_1x.yaml, and modified the lr to 0.01.

auroua

All 6 comments

Hum, this is weird.

Can you show me the full command that you used for running the experiment?
Also, how did you modify the paths_catalog to load the coco2017 instead of the coco2014?

Thanks!

fmassa on 27 Oct 2018

The paths_catalog.py contained a mistake. I fixed it and got a result, but the mAP is only around 22. I trained on coco_train2017 and tested on coco_val2017, using the default config in e2e_faster_rcnn_R_50_FPN_1x.yaml with the lr changed to 0.005 and MAX_ITER changed to 50000. I trained on two GPUs with 2 images per GPU. How can I get the reported mAP? Thanks~

auroua on 28 Oct 2018
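
For context on the -1 values in the original output: as far as I know, pycocotools' COCOeval fills its precision and recall arrays with -1 and, when summarizing, averages only the entries greater than -1; if there are none, it prints -1. That usually means no ground-truth annotations were found for the evaluated images and categories, which would be consistent with a wrong dataset path. Below is a simplified sketch of that behaviour only; `summarize` here is a hypothetical stand-in, not the library's actual code.

```python
def summarize(values):
    """Average only the valid entries; -1 marks 'no ground truth available'.

    Hypothetical stand-in that mimics the sentinel handling described above.
    """
    valid = [v for v in values if v > -1]
    if not valid:
        # Nothing could be evaluated, so the sentinel is passed through.
        return -1.0
    return sum(valid) / len(valid)


# Every entry is still the -1 placeholder: the summary is -1 as well.
no_ground_truth = summarize([-1.0, -1.0, -1.0, -1.0])

# Some entries were evaluated: only those contribute to the mean.
partly_evaluated = summarize([-1.0, 0.5, 0.3])
```

So a table full of -1.000 points to the evaluation setup (annotation file, image IDs, category IDs) rather than to a badly trained model, which would give values near 0 instead.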

Hi,

To reproduce the results on fewer than 8 GPUs, you do need to change the learning rate (which you did correctly), but the number of iterations should also be increased from the default by a factor of 4, and the learning rate schedule should be scaled to match.
So you should have 90000 * 4 = 360000 iterations, and you need to change the lr schedule steps to [240000, 320000].

Check the single-GPU training section of the README for more information.

I'm closing the issue as it doesn't seem to be a bug, but please let me know if you have other questions.

fmassa on 28 Oct 2018

👍3 👎1
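
The scaling arithmetic in the comment above can be written out as a small helper. This is a minimal sketch, not code from the repository: `scale_schedule` is a hypothetical name, and the defaults it is called with (BASE_LR 0.02, STEPS (60000, 80000), 16 images per batch across 8 GPUs) are my assumption about the 1x config; only the 90000 iterations and the scaled values come from the thread.

```python
def scale_schedule(base_lr, max_iter, steps, base_batch, new_batch):
    """Scale a training schedule to a smaller total batch size.

    The learning rate is divided by the batch-size ratio, while the
    iteration count and the lr decay steps are multiplied by it.
    """
    if base_batch % new_batch != 0:
        raise ValueError("base_batch must be a multiple of new_batch")
    factor = base_batch // new_batch
    return {
        "SOLVER.BASE_LR": base_lr / factor,
        "SOLVER.MAX_ITER": max_iter * factor,
        "SOLVER.STEPS": tuple(step * factor for step in steps),
        "SOLVER.IMS_PER_BATCH": new_batch,
    }


# Assumed 8-GPU defaults, scaled down to 2 GPUs with 2 images each.
scaled = scale_schedule(
    base_lr=0.02,            # assumed default
    max_iter=90000,          # from the thread
    steps=(60000, 80000),    # assumed default (240000 / 4, 320000 / 4)
    base_batch=16,           # assumed default: 8 GPUs x 2 images
    new_batch=4,             # 2 GPUs x 2 images, as in the question
)
```

With these inputs the helper gives a learning rate of 0.005, 360000 iterations, and steps at (240000, 320000), matching the numbers quoted above.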

Thanks for your kind reply.

auroua on 28 Oct 2018

👍1

I followed your advice and got the following results on the coco2017 val dataset:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.587
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.405
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.216
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.483
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.508
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.318
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.542
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.643

Thanks!

auroua on 1 Nov 2018

👍3

I also have the same problem.
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "configs/e2e_faster_rcnn_R_101_FPN_1x.yaml" OUTPUT_DIR "./save_models/8gpus_101/" 2>&1 | tee train_8-101.log &