Mmdetection: raise ValueError('All dicts must have the same number of keys') ValueError: All dicts must have the same number of keys

Created on 21 Mar 2020  ·  8 Comments  ·  Source: open-mmlab/mmdetection

I use `tools/train.py configs/cascade_rcnn_x101_32x4d_fpn_1x.py --gpus 4` to train my model, and this error appears: `ValueError: All dicts must have the same number of keys`. I did not modify the config file. If I change the number of GPUs to `--gpus 1`, it works fine.

Error:
root@2925ad04cc86:/home/mmdetection# python3 tools/train.py configs/cascade_rcnn_x101_32x4d_fpn_1x.py --gpus 4

2020-03-20 23:37:19,145 - mmdet - INFO - Environment info:

MMDetection Compiler: GCC 5.4
MMCV: 0.4.0
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
MMDetection: 1.1.0+unknown
CUDA available: True
CUDA_HOME: /usr/local/cuda
GPU 0,1,2,3: GeForce GTX 1080 Ti
TorchVision: 0.3.0
NVCC: Cuda compilation tools, release 9.0, V9.0.176
PyTorch: 1.1.0
MMDetection CUDA Compiler: 9.0
sys.platform: linux
OpenCV: 4.1.0
Python: 3.5.2 (default, Nov 12 2018, 13:43:14) [GCC 5.4.0 20160609]
PyTorch compiling details: PyTorch built with:

  • GCC 4.9
  • Intel(R) Math Kernel Library Version 2018.0.1 Product Build 20171007 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.18.1 (Git Hash 7de7e5d02bf687f971e7668963649728356e0c20)
  • OpenMP 201307 (a.k.a. OpenMP 4.0)
  • NNPACK is enabled
  • CUDA Runtime 9.0
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.5.1
  • Magma 2.5.0
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=True, USE_NNPACK=True, USE_OPENMP=ON,

2020-03-20 23:37:19,145 - mmdet - INFO - Distributed training: False
2020-03-20 23:37:19,145 - mmdet - INFO - Config:
/home/mmdetection/configs/cascade_rcnn_x101_32x4d_fpn_1x.py.py

# model settings

model = dict(
    type='CascadeRCNN',
    num_stages=3,
    # pretrained='open-mmlab://resnext101_32x4d',
    pretrained='./checkpoints/resnext101_32x4d-a5af3160.pth',
    backbone=dict(
        type='ResNeXt',
        depth=101,
        groups=32,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=[
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=11,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=11,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.05, 0.05, 0.1, 0.1],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
        dict(
            type='SharedFCBBoxHead',
            num_fcs=2,
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=11,
            target_means=[0., 0., 0., 0.],
            target_stds=[0.033, 0.033, 0.067, 0.067],
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
    ])

# model training and testing settings

train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ],
    stage_loss_weights=[1, 0.5, 0.25])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100))

# dataset settings

dataset_type = 'VOCDataset'
data_root = '/home/trainDataSet/'  # 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=[
            data_root + 'VOC2007/ImageSets/Main/trainval.txt',
        ],
        img_prefix=[data_root + 'VOC2007/', ],
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/val.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

# optimizer

optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)

# yapf:disable

log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])

# yapf:enable

# runtime settings

total_epochs = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/cascade_rcnn_x101_32x4d_fpn_1x-0320'
load_from = None
resume_from = None
workflow = [('train', 1)]

2020-03-20 23:37:20,815 - mmdet - INFO - load model from: ./checkpoints/resnext101_32x4d-a5af3160.pth
2020-03-20 23:37:24,606 - mmdet - INFO - Start running, host: root@2925ad04cc86, work_dir: /home/mmdetection/work_dirs/cascade_rcnn_x101_32x4d_fpn_1x-0320
2020-03-20 23:37:24,606 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
2020-03-20 23:38:35,106 - mmdet - INFO - Epoch [1][50/1076] lr: 0.00797, eta: 5:02:13, time: 1.410, data_time: 0.071, memory: 7394, s1.loss_cls: 0.0733, s1.loss_bbox: 0.0013, s0.loss_cls: 0.1444, s0.acc: 98.2505, loss_rpn_bbox: 0.0220, s2.acc: 97.3032, s1.acc: 97.4937, s2.loss_cls: 0.0386, s2.loss_bbox: 0.0001, s0.loss_bbox: 0.0045, loss_rpn_cls: 0.1826, loss: 0.4667
2020-03-20 23:39:32,950 - mmdet - INFO - Epoch [1][100/1076] lr: 0.00931, eta: 4:34:02, time: 1.157, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0335, s1.loss_bbox: 0.0091, s0.loss_cls: 0.1112, s0.acc: 98.2754, loss_rpn_bbox: 0.0121, s2.acc: 99.3667, s1.acc: 99.0713, s2.loss_cls: 0.0123, s2.loss_bbox: 0.0014, s0.loss_bbox: 0.0314, loss_rpn_cls: 0.0673, loss: 0.2783
2020-03-20 23:40:29,569 - mmdet - INFO - Epoch [1][150/1076] lr: 0.01064, eta: 4:22:15, time: 1.132, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0425, s1.loss_bbox: 0.0155, s0.loss_cls: 0.1506, s0.acc: 97.4678, loss_rpn_bbox: 0.0106, s2.acc: 99.2236, s1.acc: 98.7192, s2.loss_cls: 0.0144, s2.loss_bbox: 0.0024, s0.loss_bbox: 0.0484, loss_rpn_cls: 0.0516, loss: 0.3361
2020-03-20 23:41:26,572 - mmdet - INFO - Epoch [1][200/1076] lr: 0.01197, eta: 4:16:18, time: 1.140, data_time: 0.022, memory: 7394, s1.loss_cls: 0.0470, s1.loss_bbox: 0.0210, s0.loss_cls: 0.1454, s0.acc: 97.3516, loss_rpn_bbox: 0.0093, s2.acc: 99.1182, s1.acc: 98.4370, s2.loss_cls: 0.0152, s2.loss_bbox: 0.0036, s0.loss_bbox: 0.0458, loss_rpn_cls: 0.0457, loss: 0.3329
2020-03-20 23:42:24,277 - mmdet - INFO - Epoch [1][250/1076] lr: 0.01331, eta: 4:12:57, time: 1.154, data_time: 0.022, memory: 7394, s1.loss_cls: 0.0544, s1.loss_bbox: 0.0278, s0.loss_cls: 0.1412, s0.acc: 97.3765, loss_rpn_bbox: 0.0080, s2.acc: 98.9502, s1.acc: 98.0742, s2.loss_cls: 0.0172, s2.loss_bbox: 0.0058, s0.loss_bbox: 0.0406, loss_rpn_cls: 0.0346, loss: 0.3296
2020-03-20 23:43:21,512 - mmdet - INFO - Epoch [1][300/1076] lr: 0.01464, eta: 4:10:03, time: 1.145, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0698, s1.loss_bbox: 0.0418, s0.loss_cls: 0.1627, s0.acc: 96.5952, loss_rpn_bbox: 0.0099, s2.acc: 98.3979, s1.acc: 97.1577, s2.loss_cls: 0.0225, s2.loss_bbox: 0.0109, s0.loss_bbox: 0.0499, loss_rpn_cls: 0.0392, loss: 0.4069
2020-03-20 23:44:18,465 - mmdet - INFO - Epoch [1][350/1076] lr: 0.01597, eta: 4:07:33, time: 1.139, data_time: 0.022, memory: 7394, s1.loss_cls: 0.0687, s1.loss_bbox: 0.0395, s0.loss_cls: 0.1595, s0.acc: 96.6416, loss_rpn_bbox: 0.0091, s2.acc: 98.4079, s1.acc: 97.2558, s2.loss_cls: 0.0226, s2.loss_bbox: 0.0108, s0.loss_bbox: 0.0488, loss_rpn_cls: 0.0352, loss: 0.3943
2020-03-20 23:45:15,525 - mmdet - INFO - Epoch [1][400/1076] lr: 0.01731, eta: 4:05:29, time: 1.141, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0685, s1.loss_bbox: 0.0396, s0.loss_cls: 0.1500, s0.acc: 96.7744, loss_rpn_bbox: 0.0097, s2.acc: 98.1963, s1.acc: 97.1099, s2.loss_cls: 0.0240, s2.loss_bbox: 0.0129, s0.loss_bbox: 0.0426, loss_rpn_cls: 0.0339, loss: 0.3811
2020-03-20 23:46:13,713 - mmdet - INFO - Epoch [1][450/1076] lr: 0.01864, eta: 4:04:12, time: 1.164, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0675, s1.loss_bbox: 0.0402, s0.loss_cls: 0.1371, s0.acc: 96.8926, loss_rpn_bbox: 0.0093, s2.acc: 98.0420, s1.acc: 97.0156, s2.loss_cls: 0.0250, s2.loss_bbox: 0.0143, s0.loss_bbox: 0.0394, loss_rpn_cls: 0.0302, loss: 0.3630
2020-03-20 23:47:11,164 - mmdet - INFO - Epoch [1][500/1076] lr: 0.01997, eta: 4:02:40, time: 1.149, data_time: 0.022, memory: 7394, s1.loss_cls: 0.0725, s1.loss_bbox: 0.0453, s0.loss_cls: 0.1548, s0.acc: 96.5024, loss_rpn_bbox: 0.0090, s2.acc: 97.8232, s1.acc: 96.6743, s2.loss_cls: 0.0268, s2.loss_bbox: 0.0169, s0.loss_bbox: 0.0456, loss_rpn_cls: 0.0326, loss: 0.4036
2020-03-20 23:48:08,236 - mmdet - INFO - Epoch [1][550/1076] lr: 0.02000, eta: 4:01:06, time: 1.141, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0679, s1.loss_bbox: 0.0388, s0.loss_cls: 0.1426, s0.acc: 96.8599, loss_rpn_bbox: 0.0077, s2.acc: 97.9312, s1.acc: 97.0029, s2.loss_cls: 0.0259, s2.loss_bbox: 0.0148, s0.loss_bbox: 0.0395, loss_rpn_cls: 0.0297, loss: 0.3669
2020-03-20 23:49:04,853 - mmdet - INFO - Epoch [1][600/1076] lr: 0.02000, eta: 3:59:28, time: 1.132, data_time: 0.022, memory: 7394, s1.loss_cls: 0.0755, s1.loss_bbox: 0.0481, s0.loss_cls: 0.1515, s0.acc: 96.3633, loss_rpn_bbox: 0.0080, s2.acc: 97.4531, s1.acc: 96.3058, s2.loss_cls: 0.0294, s2.loss_bbox: 0.0198, s0.loss_bbox: 0.0447, loss_rpn_cls: 0.0289, loss: 0.4057
2020-03-20 23:50:01,604 - mmdet - INFO - Epoch [1][650/1076] lr: 0.02000, eta: 3:58:00, time: 1.135, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0775, s1.loss_bbox: 0.0499, s0.loss_cls: 0.1540, s0.acc: 96.2070, loss_rpn_bbox: 0.0073, s2.acc: 97.2655, s1.acc: 96.1928, s2.loss_cls: 0.0310, s2.loss_bbox: 0.0211, s0.loss_bbox: 0.0463, loss_rpn_cls: 0.0246, loss: 0.4117
2020-03-20 23:50:57,996 - mmdet - INFO - Epoch [1][700/1076] lr: 0.02000, eta: 3:56:29, time: 1.128, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0723, s1.loss_bbox: 0.0447, s0.loss_cls: 0.1430, s0.acc: 96.5825, loss_rpn_bbox: 0.0070, s2.acc: 97.2779, s1.acc: 96.4376, s2.loss_cls: 0.0300, s2.loss_bbox: 0.0201, s0.loss_bbox: 0.0417, loss_rpn_cls: 0.0259, loss: 0.3847
2020-03-20 23:51:54,653 - mmdet - INFO - Epoch [1][750/1076] lr: 0.02000, eta: 3:55:08, time: 1.133, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0795, s1.loss_bbox: 0.0495, s0.loss_cls: 0.1579, s0.acc: 96.0879, loss_rpn_bbox: 0.0075, s2.acc: 96.9344, s1.acc: 95.9855, s2.loss_cls: 0.0331, s2.loss_bbox: 0.0240, s0.loss_bbox: 0.0441, loss_rpn_cls: 0.0269, loss: 0.4224
2020-03-20 23:52:51,650 - mmdet - INFO - Epoch [1][800/1076] lr: 0.02000, eta: 3:53:55, time: 1.140, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0828, s1.loss_bbox: 0.0555, s0.loss_cls: 0.1653, s0.acc: 95.8198, loss_rpn_bbox: 0.0087, s2.acc: 96.6513, s1.acc: 95.6689, s2.loss_cls: 0.0344, s2.loss_bbox: 0.0253, s0.loss_bbox: 0.0497, loss_rpn_cls: 0.0283, loss: 0.4500
2020-03-20 23:53:48,543 - mmdet - INFO - Epoch [1][850/1076] lr: 0.02000, eta: 3:52:42, time: 1.138, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0763, s1.loss_bbox: 0.0509, s0.loss_cls: 0.1458, s0.acc: 96.2705, loss_rpn_bbox: 0.0067, s2.acc: 96.8233, s1.acc: 95.9776, s2.loss_cls: 0.0333, s2.loss_bbox: 0.0245, s0.loss_bbox: 0.0430, loss_rpn_cls: 0.0219, loss: 0.4025
2020-03-20 23:54:45,166 - mmdet - INFO - Epoch [1][900/1076] lr: 0.02000, eta: 3:51:27, time: 1.132, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0719, s1.loss_bbox: 0.0427, s0.loss_cls: 0.1433, s0.acc: 96.5156, loss_rpn_bbox: 0.0076, s2.acc: 97.2332, s1.acc: 96.4431, s2.loss_cls: 0.0309, s2.loss_bbox: 0.0207, s0.loss_bbox: 0.0376, loss_rpn_cls: 0.0357, loss: 0.3904
2020-03-20 23:55:42,432 - mmdet - INFO - Epoch [1][950/1076] lr: 0.02000, eta: 3:50:23, time: 1.145, data_time: 0.023, memory: 7394, s1.loss_cls: 0.0739, s1.loss_bbox: 0.0466, s0.loss_cls: 0.1453, s0.acc: 96.4023, loss_rpn_bbox: 0.0065, s2.acc: 97.0089, s1.acc: 96.2097, s2.loss_cls: 0.0322, s2.loss_bbox: 0.0221, s0.loss_bbox: 0.0409, loss_rpn_cls: 0.0237, loss: 0.3911
2020-03-20 23:56:38,710 - mmdet - INFO - Epoch [1][1000/1076] lr: 0.02000, eta: 3:49:07, time: 1.126, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0735, s1.loss_bbox: 0.0492, s0.loss_cls: 0.1402, s0.acc: 96.2124, loss_rpn_bbox: 0.0067, s2.acc: 96.6843, s1.acc: 95.8805, s2.loss_cls: 0.0322, s2.loss_bbox: 0.0248, s0.loss_bbox: 0.0412, loss_rpn_cls: 0.0213, loss: 0.3891
2020-03-20 23:57:34,789 - mmdet - INFO - Epoch [1][1050/1076] lr: 0.02000, eta: 3:47:51, time: 1.122, data_time: 0.021, memory: 7394, s1.loss_cls: 0.0716, s1.loss_bbox: 0.0460, s0.loss_cls: 0.1343, s0.acc: 96.4614, loss_rpn_bbox: 0.0065, s2.acc: 96.8533, s1.acc: 96.1544, s2.loss_cls: 0.0320, s2.loss_bbox: 0.0232, s0.loss_bbox: 0.0374, loss_rpn_cls: 0.0216, loss: 0.3725
2020-03-20 23:59:07,666 - mmdet - INFO - Epoch [2][50/1076] lr: 0.02000, eta: 3:41:43, time: 1.216, data_time: 0.077, memory: 7394, s1.loss_cls: 0.0763, s1.loss_bbox: 0.0531, s0.loss_cls: 0.1411, s0.acc: 96.1631, loss_rpn_bbox: 0.0071, s2.acc: 96.4252, s1.acc: 95.6669, s2.loss_cls: 0.0335, s2.loss_bbox: 0.0269, s0.loss_bbox: 0.0406, loss_rpn_cls: 0.0231, loss: 0.4016
2020-03-21 00:00:05,275 - mmdet - INFO - Epoch [2][100/1076] lr: 0.02000, eta: 3:40:58, time: 1.152, data_time: 0.024, memory: 7394, s1.loss_cls: 0.0652, s1.loss_bbox: 0.0481, s0.loss_cls: 0.1215, s0.acc: 96.5552, loss_rpn_bbox: 0.0060, s2.acc: 96.7345, s1.acc: 96.2413, s2.loss_cls: 0.0294, s2.loss_bbox: 0.0250, s0.loss_bbox: 0.0395, loss_rpn_cls: 0.0160, loss: 0.3508
2020-03-21 00:01:03,058 - mmdet - INFO - Epoch [2][150/1076] lr: 0.02000, eta: 3:40:14, time: 1.156, data_time: 0.024, memory: 7394, s1.loss_cls: 0.0693, s1.loss_bbox: 0.0483, s0.loss_cls: 0.1287, s0.acc: 96.3203, loss_rpn_bbox: 0.0067, s2.acc: 96.7182, s1.acc: 96.0118, s2.loss_cls: 0.0308, s2.loss_bbox: 0.0244, s0.loss_bbox: 0.0399, loss_rpn_cls: 0.0204, loss: 0.3686
Traceback (most recent call last):
  File "tools/train.py", line 142, in <module>
    main()
  File "tools/train.py", line 138, in main
    meta=meta)
  File "/home/mmdetection/mmdet/apis/train.py", line 111, in train_detector
    meta=meta)
  File "/home/mmdetection/mmdet/apis/train.py", line 235, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.5/dist-packages/mmcv-0.4.0-py3.5-linux-x86_64.egg/mmcv/runner/runner.py", line 359, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/mmcv-0.4.0-py3.5-linux-x86_64.egg/mmcv/runner/runner.py", line 263, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/mmdetection/mmdet/apis/train.py", line 75, in batch_processor
    losses = model(**data)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.gather(outputs, self.output_device)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 67, in gather
    return gather_map(outputs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/scatter_gather.py", line 59, in gather_map
    raise ValueError('All dicts must have the same number of keys')
ValueError: All dicts must have the same number of keys


All 8 comments

@wangwanguo03 Hi, have you solved this problem? I encountered the same error when training with 2 GPUs.

I also ran into this problem with fast_rcnn. Hope it gets solved.

I have the same error. How can it be solved?

I also ran into this problem with fast_rcnn. Setting find_unused_parameters=True does not solve it. Is there a better solution?

In my case, some training images had no annotated bboxes. After filtering those images out of the training set, the problem disappeared!
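A minimal sketch of that workaround (helper names are hypothetical; assumes standard VOC-style XML annotations with one `<object>` element per box):

```python
import xml.etree.ElementTree as ET


def has_objects(xml_source):
    """Return True if a VOC-style annotation contains at least one <object>."""
    root = ET.parse(xml_source).getroot()
    return len(root.findall('object')) > 0


def filter_image_ids(image_ids, ann_dir):
    """Keep only the image ids whose annotation file actually has boxes."""
    return [i for i in image_ids if has_objects(f'{ann_dir}/{i}.xml')]
```

Writing the surviving ids back to trainval.txt ensures every sampled image contributes a bbox regression target.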

This only happens when your mmdetection version is old. In bbox_head.py, the loss function returns a dict. If there are no foreground bounding boxes, the losses dict is never assigned the key 'loss_bbox'. This can break parallel training on multiple GPUs, because the number of keys in the losses dict then differs between GPUs.
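To illustrate why mismatched keys break the gather step, here is a plain-Python sketch of the check (not the actual torch.nn.parallel code, which gathers tensors across devices):

```python
def gather_losses(per_gpu_outputs):
    # DataParallel's gather zips the per-GPU loss dicts key by key,
    # so it first requires every dict to have the same number of keys.
    if len({len(d) for d in per_gpu_outputs}) != 1:
        raise ValueError('All dicts must have the same number of keys')
    # Average each loss term across replicas (simplified stand-in).
    return {k: sum(d[k] for d in per_gpu_outputs) / len(per_gpu_outputs)
            for k in per_gpu_outputs[0]}


# GPU 0 saw foreground boxes, GPU 1 did not -> 'loss_bbox' is missing there
gpu0 = {'loss_cls': 0.14, 'loss_bbox': 0.04}
gpu1 = {'loss_cls': 0.11}
```

Calling `gather_losses([gpu0, gpu1])` raises exactly this ValueError; once both dicts carry 'loss_bbox', the gather succeeds.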

For your case, you can try modifying mmdet/models/roi_heads/bbox_heads/bbox_head.py by adding the following lines in the loss() function:

    if pos_inds.any():
        ...

    # add the following two lines
    else:
        losses['loss_bbox'] = bbox_pred.sum() * 0

I find that in the latest version of mmdetection (2.4), this problem has already been fixed.
See mmdet/models/roi_heads/bbox_heads/bbox_head.py for details.

I updated to the new version, but I still get out-of-memory errors. Personally, I think it is because the RPN has many anchors when computing NMS, and the computation runs on the GPU rather than the CPU.


It seems that @pdpdpd2013 has already solved this issue. Thanks a lot. Therefore, this issue is closed. Please open a new issue if you have any other questions, or if you hit a similar issue in the newest mmdet, with a detailed description following the error template.
