I'm trying to reproduce the conv5 baseline from the FPN paper on the Cityscapes dataset. According to the FPN paper, using ResNet conv5 as the body for RPN and Fast R-CNN, with a 2-fc head for Fast R-CNN, yields acceptable results, though they may be worse than those of the conv4 baseline.
However, after training for over 10 epochs on the Cityscapes training set, I got 1% mAP50 on the validation set, which is quite different from what the FPN paper describes. I wonder whether there is something wrong with my settings.
I also run into the same problem with Detectron. I'd appreciate it if anybody could share some ideas.
The config I use to reproduce the results is as follows:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  RPN:
    PRE_NMS_TOP_N_TEST: 6000
    POST_NMS_TOP_N_TEST: 1000
  ROI_BOX_HEAD:
    NUM_CLASSES: 9
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 256
    NMS: 0.3
  BACKBONE:
    CONV_BODY: "R-50-C5"
    OUT_CHANNELS: 2048
  ROI_BOX_HEAD:
    NUM_CLASSES: 9
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
DATASETS:
  TRAIN: ("cityscapes_fine_instanceonly_seg_train_cocostyle",)
  TEST: ("cityscapes_fine_instanceonly_seg_val_cocostyle",)
INPUT:
  MIN_SIZE_TRAIN: 600
  MAX_SIZE_TRAIN: 1200
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1200
SOLVER:
  BASE_LR: 0.001
  WEIGHT_DECAY: 0.0005
  STEPS: (50000,)
  MAX_ITER: 70000
  IMS_PER_BATCH: 1
TEST:
  IMS_PER_BATCH: 1
Hi,
I've never trained a model on Cityscapes, so I'm not the best person to give you advice there.
I do know, though, that to obtain the best performance, it is recommended to remove only a few rows of the last classifier of a model pre-trained on COCO, for the classes that are similar to those in Cityscapes.
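One possible reading of this suggestion is to reuse the COCO classifier rows for classes that also exist in Cityscapes and to re-initialise the rest. The sketch below illustrates that idea with plain Python lists standing in for weight tensors. The function `trim_classifier`, the Cityscapes class order, and the handling of "rider" (which has no COCO counterpart) are assumptions made for illustration, not part of any library.

```python
# Hypothetical helper: build a 9-row classifier (background + 8 Cityscapes
# classes) from an 81-row COCO classifier. Lists stand in for tensors.

# Assumed Cityscapes class order; index 0 is background.
CITYSCAPES_CLASSES = [
    "__background__", "person", "rider", "car", "truck",
    "bus", "train", "motorcycle", "bicycle",
]

# Contiguous COCO indices of the classes that have a Cityscapes match.
COCO_INDEX = {
    "__background__": 0, "person": 1, "bicycle": 2, "car": 3,
    "motorcycle": 4, "bus": 6, "train": 7, "truck": 8,
}


def trim_classifier(coco_weight, init_row):
    """Return one row per Cityscapes class.

    Rows are copied from coco_weight where a COCO match exists;
    unmatched classes (here: "rider") get a copy of init_row.
    """
    trimmed = []
    for name in CITYSCAPES_CLASSES:
        if name in COCO_INDEX:
            trimmed.append(list(coco_weight[COCO_INDEX[name]]))
        else:
            trimmed.append(list(init_row))
    return trimmed
```

With real weights, the same row selection would be applied to the classification layer, and four rows per class would be selected for the box regression layer.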
@fmassa Thanks for your suggestions! I think my mistake was that I forgot to change the anchor_size and the pooler scales. Now the results look normal. For anyone interested, the config for the model I finally used is as follows:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-C5"
    OUT_CHANNELS: 2048
  RPN:
    ANCHOR_STRIDE: (32,)
    PRE_NMS_TOP_N_TEST: 6000
    POST_NMS_TOP_N_TEST: 1000
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.03125,)
    POOLER_SAMPLING_RATIO: 2
    NUM_CLASSES: 9
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 256
    NMS: 0.3
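To see why the pooler scale matters, note that 0.03125 is 1/32, the inverse of the anchor stride in this config. The sketch below projects an image-space box onto a stride-32 feature map with a matched and a mismatched scale. It assumes a 600x1200 input (matching MIN_SIZE/MAX_SIZE above) and assumes 1/16 as the mismatched value; the helper names are made up for illustration.

```python
import math


def feature_map_size(height, width, stride):
    """Approximate spatial size (rows, cols) of a feature map at a stride."""
    return math.ceil(height / stride), math.ceil(width / stride)


def project_box(box, scale):
    """Scale an (x1, y1, x2, y2) image-space box into feature-map coordinates."""
    return tuple(coord * scale for coord in box)


def fits(box, fmap_h, fmap_w):
    """True if the projected box lies inside the feature map."""
    x1, y1, x2, y2 = box
    return x1 >= 0 and y1 >= 0 and x2 <= fmap_w and y2 <= fmap_h


STRIDE = 32
fmap_h, fmap_w = feature_map_size(600, 1200, STRIDE)

box = (640.0, 320.0, 960.0, 480.0)          # a box in image coordinates
mismatched = project_box(box, 1.0 / 16)     # assumed stale scale
matched = project_box(box, 1.0 / STRIDE)    # scale consistent with the stride

print(fmap_h, fmap_w)
print(mismatched, fits(mismatched, fmap_h, fmap_w))
print(matched, fits(matched, fmap_h, fmap_w))
```

With the mismatched scale, the projected box lands largely outside the feature map, so the pooled features would not correspond to the object, which is consistent with the near-zero mAP reported earlier.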
Thanks for the info!
Yes, unfortunately many of the config options have interdependencies. It would be better to just infer those values automatically, but that's more involved.
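As a rough illustration of what inferring or checking these values could look like, here is a minimal sketch that derives pooler scales from strides and flags mismatches. It works on plain tuples rather than the real config object, and both function names are hypothetical.

```python
def infer_pooler_scales(strides):
    """Derive one pooler scale per feature-map stride (scale = 1 / stride)."""
    return tuple(1.0 / stride for stride in strides)


def check_stride_consistency(anchor_strides, pooler_scales, tol=1e-9):
    """Return a list of human-readable problems; an empty list means consistent."""
    if len(anchor_strides) != len(pooler_scales):
        return [
            f"{len(anchor_strides)} anchor stride(s) but "
            f"{len(pooler_scales)} pooler scale(s)"
        ]
    problems = []
    for stride, scale in zip(anchor_strides, pooler_scales):
        expected = 1.0 / stride
        if abs(scale - expected) > tol:
            problems.append(
                f"stride {stride} expects pooler scale {expected}, got {scale}"
            )
    return problems


# Values from the working config above: consistent, so no problems reported.
print(check_stride_consistency((32,), (0.03125,)))
# A stride-32 backbone paired with a 1/16 scale is flagged.
print(check_stride_consistency((32,), (0.0625,)))
```

A check like this could run once when the config is loaded, so that a mismatch fails early instead of showing up as a poor mAP after training.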