The training works fine with FREEZE_CONV_BODY_AT: 1. But when I try to fine-tune all layers using FREEZE_CONV_BODY_AT: 0, the loss becomes nan, like this:
2019-04-27 12:58:51,707 maskrcnn_benchmark.trainer INFO: Start training
2019-04-27 12:59:06,432 maskrcnn_benchmark.trainer INFO: eta: 6:07:52 iter: 20 loss: 5.6908 (nan) loss_box_reg: 0.0103 (nan) loss_classifier: 0.0842 (nan) loss_mask: 3.5805 (nan) loss_objectness: 0.6767 (nan) loss_rpn_box_reg: 0.0401 (nan) time: 0.7274 (0.7362) data: 0.0025 (0.0143) lr: 0.001793 max mem: 5827
2019-04-27 12:59:20,762 maskrcnn_benchmark.trainer INFO: eta: 6:02:41 iter: 40 loss: nan (nan) loss_box_reg: nan (nan) loss_classifier: nan (nan) loss_mask: nan (nan) loss_objectness: nan (nan) loss_rpn_box_reg: nan (nan) time: 0.7273 (0.7264) data: 0.0023 (0.0084) lr: 0.001927 max mem: 5827
2019-04-27 12:59:33,545 maskrcnn_benchmark.trainer INFO: eta: 5:47:56 iter: 60 loss: nan (nan) loss_box_reg: nan (nan) loss_classifier: nan (nan) loss_mask: nan (nan) loss_objectness: nan (nan) loss_rpn_box_reg: nan (nan) time: 0.6303 (0.6973) data: 0.0025 (0.0065) lr: 0.002060 max mem: 5827
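For what it's worth, the lr values in the log (0.001793, 0.001927, 0.002060) are consistent with plain linear warmup from the SOLVER settings (BASE_LR 0.005, WARMUP_FACTOR 1/3, WARMUP_ITERS 500), so the schedule itself is behaving and the NaNs show up even at the reduced warmup lr. A minimal sketch reproducing those values, assuming the linear-warmup formula used by maskrcnn_benchmark's WarmupMultiStepLR (the trainer logs at iteration `it` with the scheduler at `it - 1`):

```python
def warmup_lr(iteration, base_lr=0.005, warmup_factor=1.0 / 3, warmup_iters=500):
    """Linear warmup: the lr factor ramps from warmup_factor to 1
    over warmup_iters, then stays at base_lr (ignoring later decay steps)."""
    if iteration >= warmup_iters:
        return base_lr
    alpha = iteration / warmup_iters
    factor = warmup_factor * (1 - alpha) + alpha
    return base_lr * factor

# Trainer logs at iterations 20/40/60 with the scheduler at iter - 1:
for it in (20, 40, 60):
    print(f"iter {it}: lr {warmup_lr(it - 1):.6f}")
# iter 20: lr 0.001793
# iter 40: lr 0.001927
# iter 60: lr 0.002060
```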
Here is the model configuration:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d"
  BACKBONE:
    CONV_BODY: "R-101-FPN"
    FREEZE_CONV_BODY_AT: 0
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    STRIDE_IN_1X1: False
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
  RPN:
    USE_FPN: True
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    PRE_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: True
  ROI_BOX_HEAD:
    NUM_CLASSES: 3
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "FPN2MLPFeatureExtractor"
    PREDICTOR: "FPNPredictor"
  ROI_MASK_HEAD:
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    FEATURE_EXTRACTOR: "MaskRCNNFPNFeatureExtractor"
    PREDICTOR: "MaskRCNNC4Predictor"
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
  MASK_ON: True
DATASETS:
  TRAIN: ("niibridge_2018_train_cocostyle",)
  TEST: ("niibridge_2018_test_cocostyle",)
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.005
  WEIGHT_DECAY: 0.0001
  STEPS: (26000, 29000)
  MAX_ITER: 30000
  IMS_PER_BATCH: 2
TEST:
  IMS_PER_BATCH: 1
INPUT:
  BRIGHTNESS: 0.0
  CONTRAST: 0.0
  SATURATION: 0.0
  HUE: 0.0
2019-04-27 12:58:49,811 maskrcnn_benchmark INFO: Running with config:
AMP_VERBOSE: False
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  NUM_WORKERS: 4
  SIZE_DIVISIBILITY: 32
DATASETS:
  TEST: ('niibridge_2018_test_cocostyle',)
  TRAIN: ('niibridge_2018_train_cocostyle',)
DTYPE: float32
INPUT:
  BRIGHTNESS: 0.0
  CONTRAST: 0.0
  HUE: 0.0
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN: (800,)
  PIXEL_MEAN: [102.9801, 115.9465, 122.7717]
  PIXEL_STD: [1.0, 1.0, 1.0]
  SATURATION: 0.0
  TO_BGR255: True
MODEL:
  BACKBONE:
    CONV_BODY: R-101-FPN
    FREEZE_CONV_BODY_AT: 0
    USE_GN: False
  CLS_AGNOSTIC_BBOX_REG: False
  DEVICE: cuda
  FBNET:
    ARCH: default
    ARCH_DEF:
    BN_TYPE: bn
    DET_HEAD_BLOCKS: []
    DET_HEAD_LAST_SCALE: 1.0
    DET_HEAD_STRIDE: 0
    DW_CONV_SKIP_BN: True
    DW_CONV_SKIP_RELU: True
    KPTS_HEAD_BLOCKS: []
    KPTS_HEAD_LAST_SCALE: 0.0
    KPTS_HEAD_STRIDE: 0
    MASK_HEAD_BLOCKS: []
    MASK_HEAD_LAST_SCALE: 0.0
    MASK_HEAD_STRIDE: 0
    RPN_BN_TYPE:
    RPN_HEAD_BLOCKS: 0
    SCALE_FACTOR: 1.0
    WIDTH_DIVISOR: 1
  FPN:
    USE_GN: False
    USE_RELU: False
  GROUP_NORM:
    DIM_PER_GP: -1
    EPSILON: 1e-05
    NUM_GROUPS: 32
  KEYPOINT_ON: False
  MASK_ON: True
  META_ARCHITECTURE: GeneralizedRCNN
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
    DEFORMABLE_GROUPS: 1
    NUM_GROUPS: 32
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STAGE_WITH_DCN: (False, False, False, False)
    STEM_FUNC: StemWithFixedBatchNorm
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: False
    TRANS_FUNC: BottleneckWithFixedBatchNorm
    WIDTH_PER_GROUP: 8
    WITH_MODULATED_DCN: False
  RETINANET:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDES: (8, 16, 32, 64, 128)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BBOX_REG_BETA: 0.11
    BBOX_REG_WEIGHT: 4.0
    BG_IOU_THRESHOLD: 0.4
    FG_IOU_THRESHOLD: 0.5
    INFERENCE_TH: 0.05
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    NMS_TH: 0.4
    NUM_CLASSES: 81
    NUM_CONVS: 4
    OCTAVE: 2.0
    PRE_NMS_TOP_N: 1000
    PRIOR_PROB: 0.01
    SCALES_PER_OCTAVE: 3
    STRADDLE_THRESH: 0
    USE_C5: True
  RETINANET_ON: False
  ROI_BOX_HEAD:
    CONV_HEAD_DIM: 256
    DILATION: 1
    FEATURE_EXTRACTOR: FPN2MLPFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 3
    NUM_STACKED_CONVS: 4
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    PREDICTOR: FPNPredictor
    USE_GN: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    BG_IOU_THRESHOLD: 0.5
    DETECTIONS_PER_IMG: 100
    FG_IOU_THRESHOLD: 0.5
    NMS: 0.5
    POSITIVE_FRACTION: 0.25
    SCORE_THRESH: 0.05
    USE_FPN: True
  ROI_KEYPOINT_HEAD:
    CONV_LAYERS: (512, 512, 512, 512, 512, 512, 512, 512)
    FEATURE_EXTRACTOR: KeypointRCNNFeatureExtractor
    MLP_HEAD_DIM: 1024
    NUM_CLASSES: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_SCALES: (0.0625,)
    PREDICTOR: KeypointRCNNPredictor
    RESOLUTION: 14
    SHARE_BOX_FEATURE_EXTRACTOR: True
  ROI_MASK_HEAD:
    CONV_LAYERS: (256, 256, 256, 256)
    DILATION: 1
    FEATURE_EXTRACTOR: MaskRCNNFPNFeatureExtractor
    MLP_HEAD_DIM: 1024
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 2
    POOLER_SCALES: (0.25, 0.125, 0.0625, 0.03125)
    POSTPROCESS_MASKS: False
    POSTPROCESS_MASKS_THRESHOLD: 0.5
    PREDICTOR: MaskRCNNC4Predictor
    RESOLUTION: 28
    SHARE_BOX_FEATURE_EXTRACTOR: False
    USE_GN: False
  RPN:
    ANCHOR_SIZES: (32, 64, 128, 256, 512)
    ANCHOR_STRIDE: (4, 8, 16, 32, 64)
    ASPECT_RATIOS: (0.5, 1.0, 2.0)
    BATCH_SIZE_PER_IMAGE: 256
    BG_IOU_THRESHOLD: 0.3
    FG_IOU_THRESHOLD: 0.7
    FPN_POST_NMS_PER_BATCH: True
    FPN_POST_NMS_TOP_N_TEST: 1000
    FPN_POST_NMS_TOP_N_TRAIN: 2000
    MIN_SIZE: 0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOP_N_TEST: 1000
    POST_NMS_TOP_N_TRAIN: 2000
    PRE_NMS_TOP_N_TEST: 1000
    PRE_NMS_TOP_N_TRAIN: 2000
    RPN_HEAD: SingleConvRPNHead
    STRADDLE_THRESH: 0
    USE_FPN: True
  RPN_ONLY: False
  WEIGHT: catalog://ImageNetPretrained/FAIR/20171220/X-101-32x8d
OUTPUT_DIR: .
PATHS_CATALOG: /maskrcnn-benchmark-latest/maskrcnn_benchmark/config/paths_catalog.py
SOLVER:
  BASE_LR: 0.005
  BIAS_LR_FACTOR: 2
  CHECKPOINT_PERIOD: 2500
  GAMMA: 0.1
  IMS_PER_BATCH: 2
  MAX_ITER: 30000
  MOMENTUM: 0.9
  STEPS: (26000, 29000)
  WARMUP_FACTOR: 0.3333333333333333
  WARMUP_ITERS: 500
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0
TEST:
  DETECTIONS_PER_IMG: 100
  EXPECTED_RESULTS: []
  EXPECTED_RESULTS_SIGMA_TOL: 4
  IMS_PER_BATCH: 1
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
The source code version I use is from 25th April 2019 (https://github.com/facebookresearch/maskrcnn-benchmark/commit/eb4d3352be7e968b96260c8999a331e8431da95f).
I would like to know how to train Mask R-CNN without freezing any layers.
https://github.com/facebookresearch/maskrcnn-benchmark/issues/283#issuecomment-448165941
It's really about the learning rate. I reduced the learning rate from 0.005 to 0.0005, and it works now.
For anyone else encountering this, the exact same solution worked for me: I had to bump my lr down from 0.005 to 0.00005 (yeah, two orders of magnitude smaller) before it started working. Before that, I got nan loss as well.
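A sanity check before dropping the lr by trial and error: the reference FPN configs in the repo are tuned for 8 GPUs x 2 images (IMS_PER_BATCH: 16) with BASE_LR: 0.02, and the README recommends scaling the learning rate linearly with total batch size (the linear scaling rule from Goyal et al.). A tiny sketch of that rule; the 16-image / 0.02 reference values are my reading of the e2e FPN 1x configs, so double-check against the config you started from:

```python
def scale_lr(reference_lr, reference_batch, batch):
    """Linear scaling rule: lr is proportional to the total batch size."""
    return reference_lr * batch / reference_batch

# Reference 1x schedule: 16 images per batch at lr 0.02.
# With IMS_PER_BATCH: 2, the linearly scaled lr would be:
print(scale_lr(0.02, 16, 2))  # 0.0025 -- half the 0.005 used above
```

So 0.005 is already 2x the linearly scaled value for a 2-image batch, which makes divergence with a fully unfrozen backbone less surprising.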
There is a different and more elegant solution IMO: