I trained the model on a custom dataset. The logs show the loss decreasing, and everything looks fine during training. However, when I run test_net.py, all the metrics come out as zero.
My train command:
python3 ${HOME}/maskrcnn-benchmark/tools/train_net.py --config-file e2e_faster_rcnn_R_50_C4_1x.yaml
My test command:
python3 ${HOME}/maskrcnn-benchmark/tools/test_net.py --config-file e2e_faster_rcnn_R_50_C4_1x.yaml
In e2e_faster_rcnn_R_50_C4_1x.yaml I changed SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH to 1, and MODEL.ROI_BOX_HEAD.NUM_CLASSES to 2 (there is only 1 class in the custom dataset). I also divided the learning rate by 8, since I train on 1 GPU instead of the 8 used in the original config.
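For reference, here is a minimal sketch of the linear scaling rule that this kind of adjustment is usually based on: the learning rate is scaled with the total number of images per batch rather than with the GPU count. The function name and the numbers are illustrative assumptions. 0.01 is simply the logged lr of 0.00125 multiplied back by 8, and the reference batch size of 16 is not stated in this post.

```python
def scale_learning_rate(reference_lr, reference_batch_size, new_batch_size):
    """Linear scaling rule: keep the ratio lr / batch size constant."""
    if reference_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return reference_lr * new_batch_size / reference_batch_size


if __name__ == "__main__":
    # Assumed reference values (not confirmed by the post): lr 0.01 at 16 images per batch.
    for batch_size in (2, 1):
        print(batch_size, scale_learning_rate(0.01, 16, batch_size))
```

Under those assumptions, dividing by 8 matches a batch of 2 images, while a batch of 1 image would call for dividing by 16. That is worth checking, although it would not by itself explain an AP of zero.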
In the test log:
2019-01-26 23:25:49,362 maskrcnn_benchmark.utils.checkpoint INFO: Loading checkpoint from ./model_0187500.pth
So it loads the correct model. But I get:
loading annotations into memory...
Done (t=24.70s)
creating index...
index created!
2019-01-26 23:26:17,842 maskrcnn_benchmark.inference INFO: Start evaluation on pbrs_2d_det_val dataset(56880 images).
100%|#############################################| 56880/56880 [3:00:43<00:00, 5.25it/s]
2019-01-27 02:27:02,544 maskrcnn_benchmark.inference INFO: Total inference time: 3:00:44.702537 (0.1906593272999537 s / img per device, on 1 devices)
2019-01-27 02:27:09,042 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-01-27 02:27:09,042 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-01-27 02:27:32,035 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=28.13s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=7539.21s).
Accumulating evaluation results...
DONE (t=50.25s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
2019-01-27 04:35:31,709 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 7.855486339732153e-06), ('AP50', 1.8437597949739107e-05), ('AP75', 9.53209790986892e-07), ('APs', 0.0), ('APm', 1.3218627722287408e-07), ('APl', 7.860205797780202e-06)]))])
Here is my train log (part):
2019-01-26 23:19:39,610 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:06:24 iter: 188280 loss_rpn_box_reg: 0.0382 (0.0848) loss: 0.7238 (0.8822) data: 0.0044 (0.0051) loss_objectness: 0.0655 (0.1181) loss_classifier: 0.3213 (0.3585) time: 0.9481 (0.9080) loss_box_reg: 0.2501 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:19:58,007 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:06:06 iter: 188300 loss_rpn_box_reg: 0.0715 (0.0847) loss: 0.7635 (0.8822) data: 0.0051 (0.0051) loss_objectness: 0.0843 (0.1181) loss_classifier: 0.3098 (0.3585) time: 0.9139 (0.9080) loss_box_reg: 0.2684 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:20:16,619 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:50 iter: 188320 loss_rpn_box_reg: 0.0645 (0.0847) loss: 0.7520 (0.8821) data: 0.0047 (0.0051) loss_objectness: 0.0914 (0.1181) loss_classifier: 0.3230 (0.3585) time: 0.9267 (0.9080) loss_box_reg: 0.2662 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:20:34,573 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:31 iter: 188340 loss_rpn_box_reg: 0.0716 (0.0847) loss: 0.7228 (0.8821) data: 0.0051 (0.0051) loss_objectness: 0.0804 (0.1181) loss_classifier: 0.3247 (0.3585) time: 0.8853 (0.9080) loss_box_reg: 0.2430 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:20:52,592 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:05:12 iter: 188360 loss_rpn_box_reg: 0.0531 (0.0847) loss: 0.7295 (0.8821) data: 0.0047 (0.0051) loss_objectness: 0.0588 (0.1181) loss_classifier: 0.3179 (0.3585) time: 0.9197 (0.9080) loss_box_reg: 0.2451 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:21:10,207 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:53 iter: 188380 loss_rpn_box_reg: 0.0746 (0.0847) loss: 0.7422 (0.8821) data: 0.0048 (0.0051) loss_objectness: 0.0704 (0.1181) loss_classifier: 0.3112 (0.3585) time: 0.8765 (0.9080) loss_box_reg: 0.2885 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:21:27,977 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:33 iter: 188400 loss_rpn_box_reg: 0.0843 (0.0847) loss: 0.7807 (0.8821) data: 0.0051 (0.0051) loss_objectness: 0.0709 (0.1181) loss_classifier: 0.3101 (0.3585) time: 0.8796 (0.9080) loss_box_reg: 0.2669 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:21:46,508 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:04:16 iter: 188420 loss_rpn_box_reg: 0.0568 (0.0847) loss: 0.7233 (0.8821) data: 0.0050 (0.0051) loss_objectness: 0.0501 (0.1181) loss_classifier: 0.3229 (0.3585) time: 0.9467 (0.9080) loss_box_reg: 0.2582 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:22:05,012 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:59 iter: 188440 loss_rpn_box_reg: 0.0607 (0.0847) loss: 0.7227 (0.8821) data: 0.0046 (0.0051) loss_objectness: 0.0689 (0.1181) loss_classifier: 0.3114 (0.3585) time: 0.9376 (0.9080) loss_box_reg: 0.2639 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:22:23,559 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:42 iter: 188460 loss_rpn_box_reg: 0.0726 (0.0847) loss: 0.7143 (0.8821) data: 0.0051 (0.0051) loss_objectness: 0.0657 (0.1181) loss_classifier: 0.3023 (0.3585) time: 0.9253 (0.9080) loss_box_reg: 0.2568 (0.3208) lr: 0.001250 max mem: 3807
2019-01-26 23:22:41,895 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:24 iter: 188480 loss_rpn_box_reg: 0.0443 (0.0847) loss: 0.7598 (0.8820) data: 0.0050 (0.0051) loss_objectness: 0.0893 (0.1181) loss_classifier: 0.3214 (0.3585) time: 0.9491 (0.9080) loss_box_reg: 0.2369 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:23:00,549 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:03:08 iter: 188500 loss_rpn_box_reg: 0.0394 (0.0847) loss: 0.7044 (0.8820) data: 0.0043 (0.0051) loss_objectness: 0.0723 (0.1181) loss_classifier: 0.3250 (0.3585) time: 0.9640 (0.9080) loss_box_reg: 0.2412 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:23:18,507 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:49 iter: 188520 loss_rpn_box_reg: 0.0933 (0.0847) loss: 0.7587 (0.8820) data: 0.0049 (0.0051) loss_objectness: 0.0674 (0.1181) loss_classifier: 0.3090 (0.3585) time: 0.9081 (0.9080) loss_box_reg: 0.2892 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:23:37,546 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:33 iter: 188540 loss_rpn_box_reg: 0.0334 (0.0847) loss: 0.7382 (0.8820) data: 0.0050 (0.0051) loss_objectness: 0.0715 (0.1181) loss_classifier: 0.3145 (0.3585) time: 0.9794 (0.9080) loss_box_reg: 0.2474 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:23:55,078 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:02:13 iter: 188560 loss_rpn_box_reg: 0.0855 (0.0847) loss: 0.7598 (0.8820) data: 0.0049 (0.0051) loss_objectness: 0.0550 (0.1181) loss_classifier: 0.3234 (0.3585) time: 0.8946 (0.9080) loss_box_reg: 0.2682 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:24:13,263 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:55 iter: 188580 loss_rpn_box_reg: 0.0754 (0.0847) loss: 0.7294 (0.8820) data: 0.0050 (0.0051) loss_objectness: 0.0618 (0.1181) loss_classifier: 0.3176 (0.3585) time: 0.8887 (0.9080) loss_box_reg: 0.2582 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:24:31,342 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:37 iter: 188600 loss_rpn_box_reg: 0.0730 (0.0847) loss: 0.7447 (0.8820) data: 0.0051 (0.0051) loss_objectness: 0.0673 (0.1181) loss_classifier: 0.3206 (0.3585) time: 0.8810 (0.9080) loss_box_reg: 0.2778 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:24:49,717 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:19 iter: 188620 loss_rpn_box_reg: 0.0504 (0.0847) loss: 0.7608 (0.8820) data: 0.0052 (0.0051) loss_objectness: 0.0807 (0.1181) loss_classifier: 0.3327 (0.3585) time: 0.9305 (0.9080) loss_box_reg: 0.2564 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:25:07,588 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:01:00 iter: 188640 loss_rpn_box_reg: 0.0609 (0.0847) loss: 0.7882 (0.8820) data: 0.0051 (0.0051) loss_objectness: 0.0656 (0.1181) loss_classifier: 0.3550 (0.3585) time: 0.8833 (0.9080) loss_box_reg: 0.2731 (0.3207) lr: 0.001250 max mem: 3807
2019-01-26 23:25:25,763 maskrcnn_benchmark.trainer INFO: eta: 5 days, 14:00:42 iter: 188660 loss_rpn_box_reg: 0.0526 (0.0847) loss: 0.7696 (0.8819) data: 0.0046 (0.0051) loss_objectness: 0.0669 (0.1181) loss_classifier: 0.3196 (0.3585) time: 0.9373 (0.9080) loss_box_reg: 0.2619 (0.3207) lr: 0.001250 max mem: 3807
I also used this script to get bbox predictions on some validation images, but the predicted boxes are very bad:
import os

import cv2
from maskrcnn_benchmark.config import cfg
from predictor import COCODemo  # demo/predictor.py in maskrcnn-benchmark

config_file = 'e2e_faster_rcnn_R_50_C4_1x.yaml'
cfg.merge_from_file(config_file)
cfg.merge_from_list(["MODEL.DEVICE", "cpu"])  # run inference on the CPU

coco_demo = COCODemo(
    cfg,
    min_image_size=800,
    confidence_threshold=0.9,
)

image_root = "/mnt/disk3/zzz/datasets/pbrs_2d_det/inference"
pred_root = "preds"
os.makedirs(pred_root, exist_ok=True)  # cv2.imwrite does not create missing directories

for image_name in os.listdir(image_root):
    image_path = os.path.join(image_root, image_name)
    image = cv2.imread(image_path)
    if image is None:  # skip files OpenCV cannot decode
        continue
    # Draw the predicted boxes on the image and save the result.
    predictions = coco_demo.run_on_opencv_image(image)
    cv2.imwrite(os.path.join(pred_root, image_name), predictions)
It's difficult to say what could be wrong. I'd recommend first visualizing the samples your dataset produces, for both training and testing images, to check that the bounding boxes are in the right format. There might be a problem in your custom dataset that puts the boxes in the wrong format.
According to your training log, the training loss is not really decreasing. For example, loss_objectness should be very small, not around 0.11 (its running average in your log is 0.1181). I suspect your data format is wrong. If you are using the COCO format, try visualizing your training set with pycocotools. You may find the problem there.
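Before visualizing anything, a quick structural check of the annotation file can already catch the most common format mistakes. The sketch below is only an illustration, not part of maskrcnn-benchmark or pycocotools: the function name `find_bad_annotations` is hypothetical, and it assumes a COCO-style dictionary in which each `bbox` is `[x, y, width, height]` in pixels. Boxes stored as `[x1, y1, x2, y2]` by mistake will often show up as extending outside the image.

```python
import json


def find_bad_annotations(coco):
    """Return (annotation id, reason) pairs for suspicious COCO annotations."""
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    problems = []
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        img = images.get(ann.get("image_id"))
        if img is None:
            problems.append((ann_id, "unknown image_id"))
            continue
        if ann.get("category_id") not in category_ids:
            problems.append((ann_id, "unknown category_id"))
        bbox = ann.get("bbox")
        if not bbox or len(bbox) != 4:
            problems.append((ann_id, "bbox is not [x, y, width, height]"))
            continue
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append((ann_id, "non-positive width or height"))
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append((ann_id, "box extends outside the image"))
    return problems


def check_file(path):
    """Load a COCO-style JSON file and print every suspicious annotation."""
    with open(path) as f:
        for ann_id, reason in find_bad_annotations(json.load(f)):
            print(ann_id, reason)
```

Calling `check_file` on the training and validation annotation files lists the suspicious entries. An empty result does not prove the data is correct, so drawing a few boxes on their images is still worthwhile.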
Yes, I checked my dataset and it was in the wrong format. Very sorry for my carelessness. I will close this issue.