Maskrcnn-benchmark: Get 0 AP and AR when testing, and the inference result is very bad.

Created on 29 Jan 2019 · 3 Comments · Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

I trained the model on a custom dataset. The logs show the loss decreasing, and everything looked fine during training. However, when I run test_net.py, every AP and AR value is zero.

My train command:

python3 ${HOME}/maskrcnn-benchmark/tools/train_net.py --config-file e2e_faster_rcnn_R_50_C4_1x.yaml

My test command:

python3 ${HOME}/maskrcnn-benchmark/tools/test_net.py --config-file e2e_faster_rcnn_R_50_C4_1x.yaml

I changed SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH to 1, and MODEL.ROI_BOX_HEAD.NUM_CLASSES to 2, in e2e_faster_rcnn_R_50_C4_1x.yaml (the custom dataset has only 1 class). I also divided the learning rate by 8, since I train on 1 GPU instead of the 8 used by the original config.
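
The learning-rate change described above scales the rate in proportion to the total batch size. A minimal sketch of that arithmetic, assuming a reference rate of 0.01 at 8 images per batch (inferred from the logged lr of 0.001250 multiplied by 8, not read from the config file); `scale_lr` is a hypothetical helper, not part of maskrcnn-benchmark:

```python
def scale_lr(ref_lr: float, ref_batch: int, new_batch: int) -> float:
    """Scale a learning rate linearly with the total batch size."""
    if ref_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return ref_lr * new_batch / ref_batch


# Assumed reference values: lr 0.01 at 8 images per batch.
lr = scale_lr(ref_lr=0.01, ref_batch=8, new_batch=1)
print(f"{lr:.6f}")
```

When the batch shrinks by a factor of 8, the same number of passes over the data takes 8 times as many iterations, so the iteration counts in the schedule usually need to grow by the same factor.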

In the test log:

2019-01-26 23:25:49,362 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from ./model_0187500.pth

So it loads the correct model. But I get:

loading annotations into memory...
Done (t=24.70s)
creating index...
index created!
2019-01-26 23:26:17,842 maskrcnn_benchmark.inference INFO: Start evaluation on pbrs_2d_det_val dataset(56880 images).
100%|#############################################| 56880/56880 [3:00:43<00:00,  5.25it/s]
2019-01-27 02:27:02,544 maskrcnn_benchmark.inference INFO: Total inference time: 3:00:44.702537 (0.1906593272999537 s / img per device, on 1 devices)
2019-01-27 02:27:09,042 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-01-27 02:27:09,042 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-01-27 02:27:32,035 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=28.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=7539.21s).
Accumulating evaluation results...
DONE (t=50.25s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
2019-01-27 04:35:31,709 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 7.855486339732153e-06), ('AP50', 1.8437597949739107e-05), ('AP75', 9.5320979098$
6892e-07), ('APs', 0.0), ('APm', 1.3218627722287408e-07), ('APl', 7.860205797780202e-06)]))])
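
The summary table and the final dictionary do not contradict each other: the table shows three decimal places, so values on the order of 1e-06 appear as 0.000. A small sketch of that rounding, using AP values copied from the log above (the three-decimal format is an assumption based on how the table looks):

```python
# AP values copied from the OrderedDict line in the test log.
results = {
    "AP": 7.855486339732153e-06,
    "AP50": 1.8437597949739107e-05,
    "APs": 0.0,
    "APl": 7.860205797780202e-06,
}

# Format each value with three decimals, as the summary table appears to do.
formatted = {name: f"{value:0.3f}" for name, value in results.items()}
for name, text in formatted.items():
    print(f"{name:>4} = {text}")
```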

Here is part of my training log:

2019-01-26 23:19:39,610 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:06:24  iter: 188280  loss_rpn_box_reg: 0.0382 (0.0848)  loss: 0.7238 (0.8822)  data: 0.0044 (0.0051)  loss_objectness: 0.0655 (0.1181)  loss_classifier: 0.3213 (0.3585)  time: 0.9481 (0.9080)  loss_box_reg: 0.2501 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:19:58,007 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:06:06  iter: 188300  loss_rpn_box_reg: 0.0715 (0.0847)  loss: 0.7635 (0.8822)  data: 0.0051 (0.0051)  loss_objectness: 0.0843 (0.1181)  loss_classifier: 0.3098 (0.3585)  time: 0.9139 (0.9080)  loss_box_reg: 0.2684 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:20:16,619 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:50  iter: 188320  loss_rpn_box_reg: 0.0645 (0.0847)  loss: 0.7520 (0.8821)  data: 0.0047 (0.0051)  loss_objectness: 0.0914 (0.1181)  loss_classifier: 0.3230 (0.3585)  time: 0.9267 (0.9080)  loss_box_reg: 0.2662 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:20:34,573 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:31  iter: 188340  loss_rpn_box_reg: 0.0716 (0.0847)  loss: 0.7228 (0.8821)  data: 0.0051 (0.0051)  loss_objectness: 0.0804 (0.1181)  loss_classifier: 0.3247 (0.3585)  time: 0.8853 (0.9080)  loss_box_reg: 0.2430 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:20:52,592 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:12  iter: 188360  loss_rpn_box_reg: 0.0531 (0.0847)  loss: 0.7295 (0.8821)  data: 0.0047 (0.0051)  loss_objectness: 0.0588 (0.1181)  loss_classifier: 0.3179 (0.3585)  time: 0.9197 (0.9080)  loss_box_reg: 0.2451 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:21:10,207 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:53  iter: 188380  loss_rpn_box_reg: 0.0746 (0.0847)  loss: 0.7422 (0.8821)  data: 0.0048 (0.0051)  loss_objectness: 0.0704 (0.1181)  loss_classifier: 0.3112 (0.3585)  time: 0.8765 (0.9080)  loss_box_reg: 0.2885 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:21:27,977 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:33  iter: 188400  loss_rpn_box_reg: 0.0843 (0.0847)  loss: 0.7807 (0.8821)  data: 0.0051 (0.0051)  loss_objectness: 0.0709 (0.1181)  loss_classifier: 0.3101 (0.3585)  time: 0.8796 (0.9080)  loss_box_reg: 0.2669 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:21:46,508 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:16  iter: 188420  loss_rpn_box_reg: 0.0568 (0.0847)  loss: 0.7233 (0.8821)  data: 0.0050 (0.0051)  loss_objectness: 0.0501 (0.1181)  loss_classifier: 0.3229 (0.3585)  time: 0.9467 (0.9080)  loss_box_reg: 0.2582 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:22:05,012 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:59  iter: 188440  loss_rpn_box_reg: 0.0607 (0.0847)  loss: 0.7227 (0.8821)  data: 0.0046 (0.0051)  loss_objectness: 0.0689 (0.1181)  loss_classifier: 0.3114 (0.3585)  time: 0.9376 (0.9080)  loss_box_reg: 0.2639 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:22:23,559 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:42  iter: 188460  loss_rpn_box_reg: 0.0726 (0.0847)  loss: 0.7143 (0.8821)  data: 0.0051 (0.0051)  loss_objectness: 0.0657 (0.1181)  loss_classifier: 0.3023 (0.3585)  time: 0.9253 (0.9080)  loss_box_reg: 0.2568 (0.3208)  lr: 0.001250  max mem: 3807
2019-01-26 23:22:41,895 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:24  iter: 188480  loss_rpn_box_reg: 0.0443 (0.0847)  loss: 0.7598 (0.8820)  data: 0.0050 (0.0051)  loss_objectness: 0.0893 (0.1181)  loss_classifier: 0.3214 (0.3585)  time: 0.9491 (0.9080)  loss_box_reg: 0.2369 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:23:00,549 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:08  iter: 188500  loss_rpn_box_reg: 0.0394 (0.0847)  loss: 0.7044 (0.8820)  data: 0.0043 (0.0051)  loss_objectness: 0.0723 (0.1181)  loss_classifier: 0.3250 (0.3585)  time: 0.9640 (0.9080)  loss_box_reg: 0.2412 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:23:18,507 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:49  iter: 188520  loss_rpn_box_reg: 0.0933 (0.0847)  loss: 0.7587 (0.8820)  data: 0.0049 (0.0051)  loss_objectness: 0.0674 (0.1181)  loss_classifier: 0.3090 (0.3585)  time: 0.9081 (0.9080)  loss_box_reg: 0.2892 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:23:37,546 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:33  iter: 188540  loss_rpn_box_reg: 0.0334 (0.0847)  loss: 0.7382 (0.8820)  data: 0.0050 (0.0051)  loss_objectness: 0.0715 (0.1181)  loss_classifier: 0.3145 (0.3585)  time: 0.9794 (0.9080)  loss_box_reg: 0.2474 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:23:55,078 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:13  iter: 188560  loss_rpn_box_reg: 0.0855 (0.0847)  loss: 0.7598 (0.8820)  data: 0.0049 (0.0051)  loss_objectness: 0.0550 (0.1181)  loss_classifier: 0.3234 (0.3585)  time: 0.8946 (0.9080)  loss_box_reg: 0.2682 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:24:13,263 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:55  iter: 188580  loss_rpn_box_reg: 0.0754 (0.0847)  loss: 0.7294 (0.8820)  data: 0.0050 (0.0051)  loss_objectness: 0.0618 (0.1181)  loss_classifier: 0.3176 (0.3585)  time: 0.8887 (0.9080)  loss_box_reg: 0.2582 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:24:31,342 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:37  iter: 188600  loss_rpn_box_reg: 0.0730 (0.0847)  loss: 0.7447 (0.8820)  data: 0.0051 (0.0051)  loss_objectness: 0.0673 (0.1181)  loss_classifier: 0.3206 (0.3585)  time: 0.8810 (0.9080)  loss_box_reg: 0.2778 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:24:49,717 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:19  iter: 188620  loss_rpn_box_reg: 0.0504 (0.0847)  loss: 0.7608 (0.8820)  data: 0.0052 (0.0051)  loss_objectness: 0.0807 (0.1181)  loss_classifier: 0.3327 (0.3585)  time: 0.9305 (0.9080)  loss_box_reg: 0.2564 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:25:07,588 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:00  iter: 188640  loss_rpn_box_reg: 0.0609 (0.0847)  loss: 0.7882 (0.8820)  data: 0.0051 (0.0051)  loss_objectness: 0.0656 (0.1181)  loss_classifier: 0.3550 (0.3585)  time: 0.8833 (0.9080)  loss_box_reg: 0.2731 (0.3207)  lr: 0.001250  max mem: 3807
2019-01-26 23:25:25,763 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:00:42  iter: 188660  loss_rpn_box_reg: 0.0526 (0.0847)  loss: 0.7696 (0.8819)  data: 0.0046 (0.0051)  loss_objectness: 0.0669 (0.1181)  loss_classifier: 0.3196 (0.3585)  time: 0.9373 (0.9080)  loss_box_reg: 0.2619 (0.3207)  lr: 0.001250  max mem: 3807

I also used this script to get bbox predictions on some validation images, but the results are very bad:

import os

import cv2
from maskrcnn_benchmark.config import cfg
from predictor import COCODemo  # demo/predictor.py from the maskrcnn-benchmark repo

config_file = 'e2e_faster_rcnn_R_50_C4_1x.yaml'

# Load the training config and force CPU inference.
cfg.merge_from_file(config_file)
cfg.merge_from_list(["MODEL.DEVICE", "cpu"])

coco_demo = COCODemo(
    cfg,
    min_image_size=800,
    confidence_threshold=0.9,
)

image_root = "/mnt/disk3/zzz/datasets/pbrs_2d_det/inference"
pred_root = "preds"
os.makedirs(pred_root, exist_ok=True)  # cv2.imwrite does not create directories

for image_name in os.listdir(image_root):
    image_path = os.path.join(image_root, image_name)
    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None for unreadable files
        continue
    # Returns a copy of the image with the predicted boxes drawn on it.
    predictions = coco_demo.run_on_opencv_image(image)
    cv2.imwrite(os.path.join(pred_root, image_name), predictions)

Source

KuribohG

All 3 comments

It's difficult to say what could be wrong. I'd recommend first visualizing the boxes from your dataset, for both training and testing images, to check that they are in the right format. There may be a problem in your custom dataset that puts the boxes in the wrong format.
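
To see why a box-format mix-up alone can push AP toward zero, consider one hypothetical case (the thread does not say which format error occurred): corner coordinates `[x1, y1, x2, y2]` stored in a field that is read as `[x, y, width, height]`. The sketch below compares the intended box with the misread one; all names are illustrative.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


intended = [100, 100, 200, 200]     # corners the annotator meant
misread = xywh_to_xyxy(intended)    # same four numbers read as x, y, w, h
print(misread, iou(intended, misread))
```

In this made-up case the two boxes overlap with an IoU of 0.25, below the lowest threshold in the evaluation table (0.50), so a box that is correct under one interpretation would not count as a match under the other.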

fmassa on 29 Jan 2019

According to your training log, the training loss is not decreasing. For example, loss_objectness should be very small, not around 0.11. I guess your data format is wrong. If you are using COCO format, try visualizing your training set with pycocotools. You may find something there.
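
Alongside visualization, a quick structural check of the annotation file can catch common mistakes. The sketch below assumes a COCO-style dictionary (`images`, `annotations`, `categories`) and flags boxes that are not `[x, y, width, height]` with positive size inside the image, plus unknown image or category ids. `check_coco_boxes` is a hypothetical helper, and these checks are examples, not a diagnosis of what was wrong in this dataset.

```python
def check_coco_boxes(coco):
    """Return a list of (annotation id, problem) pairs for a COCO-style dict."""
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    problems = []
    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        img = images.get(ann.get("image_id"))
        if img is None:
            problems.append((ann_id, "unknown image_id"))
            continue
        if ann.get("category_id") not in category_ids:
            problems.append((ann_id, "unknown category_id"))
        bbox = ann.get("bbox", [])
        if len(bbox) != 4:
            problems.append((ann_id, "bbox must have 4 values"))
            continue
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append((ann_id, "non-positive width or height"))
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append((ann_id, "box extends outside the image"))
    return problems


# Tiny made-up example: annotation 2 stores corners instead of width/height.
sample = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "object"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [400, 300, 600, 450]},
    ],
}
print(check_coco_boxes(sample))
```

To run it on a real annotation file, load the file with `json.load` and pass the resulting dictionary.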

chengyangfu on 29 Jan 2019

👍2

Yes. I checked my dataset and it was in the wrong format. Very sorry for my carelessness. I will close this issue.

KuribohG on 2 Feb 2019

👍2
