MMDetection: About multiscale training

Created on 25 Oct 2018 · 4 comments · Source: open-mmlab/mmdetection

I found that the repo already supports using different image scales during training.
This seems to fall under data augmentation, so what does multiscale training mean here?
I think multiscale training should have a format similar to an input image pyramid, which this repo does not seem to support yet?

All 4 comments

You only need to modify the data.train.img_scale field in the config file.
For example, you can use a list of scales [(1333, 800), (1666, 1000)], so that each image will be randomly assigned a scale between (1333, 800) and (1666, 1000).

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        img_scale=[(1333, 800), (1666, 1000)],
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
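In effect, supplying img_scale as a list means one of the listed scales is picked at random for each training image. A minimal sketch of that behavior (the helper name is illustrative, not mmdetection's actual implementation):

```python
import random

# Candidate training scales from the config above, as (long edge, short edge).
img_scales = [(1333, 800), (1666, 1000)]

def random_scale(scales):
    """Pick one (long_edge, short_edge) pair uniformly at random."""
    return random.choice(scales)

# Each iteration, the image would be resized to the sampled scale.
scale = random_scale(img_scales)
```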

Thanks, but my question is "what does multiscale training mean?"
I am not sure that simply selecting a different resize scale during training can be called multiscale training. I notice that in your poster almost 2% mAP is gained by multiscale training, but simply setting the training scales to [(1667, 1000), (1333, 800), (1000, 600)] brings no benefit according to my experiments.

The term "multiscale training" is adopted in many papers and indicates resizing images to a different scale at each iteration. In the challenge, we used the setting [(400, 1600), (1400, 1600)], which means the short edge is randomly sampled from 400~1400 and the long edge is fixed at 1600.

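So with two scales interpreted as a range, the short edge is drawn uniformly between the two short-edge values while the long edge stays fixed. A rough sketch of that sampling (the function name is hypothetical, not mmdetection's actual code):

```python
import random

def sample_train_scale(short_range=(400, 1400), long_edge=1600):
    """Sample a (long_edge, short_edge) training scale: the short edge is
    drawn uniformly from short_range, the long edge is kept fixed."""
    short_edge = random.randint(short_range[0], short_range[1])
    return (long_edge, short_edge)

# A fresh scale would be sampled for every training iteration.
scale = sample_train_scale()
```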
Thanks.
