maskrcnn-benchmark: training Cityscapes with a COCO-pretrained model?

Created on 10 Dec 2018  ·  10 comments  ·  Source: facebookresearch/maskrcnn-benchmark

❓ Questions and Help

  • Thanks for the code; I am training a new dataset, Cityscapes, for instance segmentation.
  • First I trained Cityscapes from scratch and the loss converged, but the box AP and seg AP I get (below) are much lower than the numbers reported in the Mask R-CNN paper; I don't know what detail I overlooked.
2018-12-07 18:58:13,471 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.266143220179594), ('AP50', 0.4705279119903588), ('AP75', 0.2664711486678874), ('APs', 0.0742186384761436), ('APm', 0.26418817964465885), ('APl', 0.4618351991771723)])), ('segm', OrderedDict([('AP', 0.2169857479304357), ('AP50', 0.4159623962610022), ('AP75', 0.17807455425402843), ('APs', 0.029122872145021395), ('APm', 0.174442224182182), ('APl', 0.42977448859947454)]))])
  • Experiment setup on a single GTX 1080 Ti:
--config-file "../configs/cityscapes/e2e_mask_rcnn_R_50_FPN_1x_cocostyle.yaml" SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.00125 SOLVER.MAX_ITER 200000 SOLVER.STEPS "(160000, 180000)" TEST.IMS_PER_BATCH 1
  • Second question: using COCO pre-training to train Cityscapes.
  • When I load the pretrained COCO model I hit a problem: the number of classes goes from 81 to 9, so the class-specific FC parameters should be ignored.
  • But the code below, from maskrcnn-benchmark/maskrcnn_benchmark/utils/model_serialization.py, causes a problem because model_state_dict[key] = loaded_state_dict[key_old] overwrites the original value:
def load_state_dict(model, loaded_state_dict):
    model_state_dict = model.state_dict()
    # if the state_dict comes from a model that was wrapped in a
    # DataParallel or DistributedDataParallel during serialization,
    # remove the "module" prefix before performing the matching
    loaded_state_dict = strip_prefix_if_present(loaded_state_dict, prefix="module.")
    align_and_update_state_dicts(model_state_dict, loaded_state_dict)  # model_state_dict[key] = loaded_state_dict[key_old]

    # use strict loading
    model.load_state_dict(model_state_dict)
  • I use the following code instead:
def load_state_dict(model, loaded_state_dict):
    model_state_dict = model.state_dict()
    # if the state_dict comes from a model that was wrapped in a
    # DataParallel or DistributedDataParallel during serialization,
    # remove the "module" prefix before performing the matching
    loaded_state_dict = strip_prefix_if_present(loaded_state_dict, prefix="module.")

    # align_and_update_state_dicts(model_state_dict, loaded_state_dict)
    # finetune: keep only entries whose key and shape match the current model
    loaded_state_dict = {k: v for k, v in loaded_state_dict.items()
                         if k in model_state_dict and model_state_dict[k].size() == v.size()}
    model_state_dict.update(loaded_state_dict)
    # use strict loading
    model.load_state_dict(model_state_dict)
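To see what that filter keeps and drops in isolation, here is a minimal sketch using plain shape tuples in place of real tensors (the keys are illustrative names in maskrcnn-benchmark's layout, not an exact checkpoint listing):

```python
# shape tuples stand in for tensors: parameter name -> tensor shape
coco_ckpt = {
    "backbone.body.conv1.weight": (64, 3, 7, 7),
    "roi_heads.box.predictor.cls_score.weight": (81, 1024),  # 81 COCO classes
}
cityscapes_model = {
    "backbone.body.conv1.weight": (64, 3, 7, 7),
    "roi_heads.box.predictor.cls_score.weight": (9, 1024),   # 9 Cityscapes classes
}

def filter_matching(loaded_state_dict, model_state_dict):
    """Keep only checkpoint entries whose key exists in the model
    and whose shape matches (same test as .size() == v.size() above)."""
    return {k: v for k, v in loaded_state_dict.items()
            if k in model_state_dict and model_state_dict[k] == v}

kept = filter_matching(coco_ckpt, cityscapes_model)
# the backbone weight survives; the 81-class predictor is dropped,
# so the randomly initialized 9-class layer keeps its values
assert "backbone.body.conv1.weight" in kept
assert "roi_heads.box.predictor.cls_score.weight" not in kept
```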
  • But then maskrcnn_benchmark/utils/checkpoint.py raises an error. I don't understand why it should call self.optimizer.load_state_dict and self.scheduler.load_state_dict; the optimizer state contains a 'momentum_buffer' parameter, and I don't see why that should be loaded. Can you explain? And how can I use the COCO pretrained model to fine-tune on Cityscapes? Thanks!
def load(self, f=None):
        if self.has_checkpoint():
            # override argument with existing checkpoint
            f = self.get_checkpoint_file()
        if not f:
            # no checkpoint could be found
            self.logger.info("No checkpoint found. Initializing model from scratch")
            return {}
        self.logger.info("Loading checkpoint from {}".format(f))
        checkpoint = self._load_file(f)
        self._load_model(checkpoint)
        if "optimizer" in checkpoint and self.optimizer:
            self.logger.info("Loading optimizer from {}".format(f))
            self.optimizer.load_state_dict(checkpoint.pop("optimizer"))
        if "scheduler" in checkpoint and self.scheduler:
            self.logger.info("Loading scheduler from {}".format(f))
            self.scheduler.load_state_dict(checkpoint.pop("scheduler"))

        # return any further checkpoint data
        return checkpoint
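Note that the guarded branches above only fire when the checkpoint dict actually contains "optimizer"/"scheduler" entries; the 'momentum_buffer' is SGD's per-parameter momentum state, saved so an interrupted run can resume exactly where it left off. One workaround when fine-tuning is to strip that resume-only state before it reaches the checkpointer; a minimal sketch (strip_training_state is a hypothetical helper, not part of the repo):

```python
def strip_training_state(checkpoint):
    """Drop resume-only state so only the model weights are loaded:
    'optimizer' holds SGD's momentum_buffer tensors (shaped like the old
    81-class parameters), 'scheduler' holds the LR schedule position."""
    for key in ("optimizer", "scheduler", "iteration"):
        checkpoint.pop(key, None)
    return checkpoint

# toy dict standing in for a real torch.load(...) result
ckpt = {
    "model": {"cls_score.weight": "..."},
    "optimizer": {"state": {"momentum_buffer": "..."}},
    "scheduler": {"last_epoch": 90000},
}
assert set(strip_training_state(ckpt)) == {"model"}
```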
Labels: help wanted, question

All 10 comments

Hi,

I believe the best results for Cityscapes are obtained by starting from a model pre-trained on COCO, and then performing some model surgery so that the classes common to COCO and Cityscapes are kept.
See this file for more information.

About your second question, I'm sorry but I couldn't understand the problem you are facing. Can you give a bit more context?

Thank you for the reply @fmassa. I got some help from #15, but I haven't reproduced the Cityscapes instance segmentation result yet. I hope someone can share a Cityscapes model so I can compare the differences.

  • The second question is simple: I want to fine-tune Cityscapes from the pretrained COCO Detectron models, which have a different number of classes.
  • Because the number of classes differs, I modified the load_state_dict function; but the code also loads the optimizer and scheduler parameters, which conflict as well, so I commented out that loading code.
  • Below are my results:

| time | setting | data (val) | seg AP | mAP |
| ------ | ------ | ------ | ------ | ------ |
| | paper | fine | 0.315 | |
| | paper | fine+coco | 0.365 | |
| 2018-12-06 | single GPU | fine | 0.217 | 0.266 |
| 2018-12-11 | multi GPU | fine | 0.238 | 0.278 |
| 2018-12-08 | single GPU | fine+coco | 0.285 | 0.331 |

I haven't trained models on Cityscapes myself, so I might not be the best person to help you with that. Maybe @henrywang1 knows a bit better, as he's the one who originally added Cityscapes support.

Hi @ranjiewwen,
I only tried end-to-end training on Cityscapes.
I followed the steps described by the paper, and the resulting AP [val] is about 0.316.

We train with image scale (shorter side) randomly sampled from [800, 1024], which reduces overfitting; inference is on a single scale of 1024 pixels.

I didn't submit the code because I thought everyone might have their own transformation.
You could refer to the changes below:

In transform.py, add this class

class RandomResize(object):
    # assumes `random` and `torchvision.transforms.functional as F` are
    # already imported at the top of transform.py
    def __init__(self, min_size, max_size):
        self.min_size = min_size
        self.max_size = max_size

    def get_size(self, image_size):
        # sample the target shorter side in [min_size, max_size];
        # assumes height is the shorter side (true for 2048x1024 Cityscapes)
        w, h = image_size
        rand = random.randint(self.min_size, self.max_size)
        return rand, int(w * rand / h)

    def __call__(self, image, target):
        size = self.get_size(image.size)
        image = F.resize(image, size)
        target = target.resize(image.size)
        return image, target
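As a sanity check on the scaling arithmetic in get_size (assuming Cityscapes' 2048×1024 frames, where height is the shorter side), the same computation with the random draw injected as a parameter keeps the 2:1 aspect ratio:

```python
def get_size(image_size, rand):
    # same arithmetic as RandomResize.get_size, with random.randint(...)
    # replaced by an explicit argument so the result is deterministic
    w, h = image_size
    return rand, int(w * rand / h)

# shorter side 1024 -> (1024, 2048); shorter side 800 -> (800, 1600)
assert get_size((2048, 1024), 1024) == (1024, 2048)
assert get_size((2048, 1024), 800) == (800, 1600)
```

Note that torchvision's F.resize takes the size as (h, w), so the first element of the returned pair, the random draw, becomes the new image height, i.e. the shorter side for Cityscapes frames.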

In build.py, modify build_transforms

if "cityscapes" in cfg.DATASETS.TRAIN[0]:
    if is_train:
        transform = T.Compose(
            [
                T.RandomResize(800, 1024),
                T.RandomHorizontalFlip(flip_prob),
                T.ToTensor(),
                normalize_transform,
            ]
        )
    else:
        transform = T.Compose(
            [
                T.ToTensor(),
                normalize_transform,
            ]
        )
else: #...

Thanks @henrywang1, I will try training again and look forward to a good result!

Thank you for the reply @fmassa. I got some help from #15, but I haven't reproduced the Cityscapes instance segmentation result yet. I hope someone can share a Cityscapes model so I can compare the differences.

* The second question is simple: I want to fine-tune Cityscapes from the pretrained COCO Detectron models, which have a different number of classes.

* Because the number of classes differs, I modified the load_state_dict function; but the code also loads the optimizer and scheduler parameters, which conflict as well, so I commented out that loading code.

* Below are my results:

| time | setting | data (val) | seg AP | mAP |
| ------ | ------ | ------ | ------ | ------ |
| | paper | fine | 0.315 | |
| | paper | fine+coco | 0.365 | |
| 2018-12-06 | single GPU | fine | 0.217 | 0.266 |
| 2018-12-11 | multi GPU | fine | 0.238 | 0.278 |
| 2018-12-08 | single GPU | fine+coco | 0.285 | 0.331 |

I am wondering what is the mAP in your result? Is it the bbox mAP?


mAP here is for the bbox. You can read the original Mask R-CNN paper, or read the evaluation code: coco_eval.py

Hi @ranjiewwen,
have you reproduced the results on the Cityscapes dataset? Following the steps in the Mask R-CNN paper, I only get 0.250 using the fine dataset and 0.293 using fine + coco.

After setting both MAX_SIZE_TRAIN and MAX_SIZE_TEST to 2048 and re-training, I get 0.316 using fine and 0.358 using fine + coco.

Hi @henrywang1,
what value did you set for MAX_SIZE_TRAIN and MAX_SIZE_TEST? Is it 2048?

Hi @zimenglan-sysu-512
I followed the setting on the paper, so I hard-coded the MIN/MAX_SIZE_TRAIN (as described in https://github.com/facebookresearch/maskrcnn-benchmark/issues/259#issuecomment-449118259)

And I just noticed that my previous reply was not complete.
For the test, the paper says inference is on a single scale of 1024 pixels.
So we have to make the transform:

            transform = T.Compose(
                [
                    T.Resize(1024, 1024),
                    T.ToTensor(),
                    normalize_transform,
                ]
            )

For other settings or the training log, you could send me an e-mail.
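For context on why MAX_SIZE matters here, the sketch below illustrates the shorter-side/longer-side clamp, assuming Resize(min_size, max_size) follows the usual semantics of scaling the shorter side to min_size while capping the longer side at max_size (resize_shorter_side is a hypothetical standalone helper, not the repo's class):

```python
def resize_shorter_side(w, h, min_size, max_size):
    """Scale so the shorter side becomes min_size, unless the longer side
    would then exceed max_size, in which case the longer side is capped."""
    short, long_ = min(w, h), max(w, h)
    size = min_size
    if long_ / short * size > max_size:
        size = int(round(max_size * short / long_))
    scale = size / short
    return int(round(w * scale)), int(round(h * scale))

# with MAX_SIZE raised to 2048, a 2048x1024 frame keeps its full resolution
assert resize_shorter_side(2048, 1024, 1024, 2048) == (2048, 1024)
# with a cap of 1024, the same frame would be shrunk to 1024x512
assert resize_shorter_side(2048, 1024, 1024, 1024) == (1024, 512)
```

This is consistent with @zimenglan-sysu-512's observation above that raising MAX_SIZE_TRAIN/MAX_SIZE_TEST to 2048 was needed to reach the paper-level numbers.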

