Is there no benefit from adding the decoder and ASPP to MobileNet-v2?
Can I add the decoder and try it?
Adding the decoder and ASPP to MobileNet-v2 still improves performance in our experiments. We do not include them in the released checkpoint since the released MobileNet-v2 checkpoint targets fast inference speed rather than high accuracy.
Hi, if we wanted to try adding the decoder to MobileNet-v2, how would we go about it? Do we simply add DECODER_END_POINTS?
I added the --decoder_output_stride=4 parameter. It seems to train, but the eval mIoU is very low, only about 0.03.
What should I do?
@kushagraagrawal
Please try setting this in feature_extractor.py for mobilenet_v2:

```python
DECODER_END_POINTS: [
    'layer_4/depthwise_output',
],
```
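Concretely, this would be the mobilenet_v2 entry in the networks_to_feature_maps map in deeplab/core/feature_extractor.py, roughly as follows (a sketch; the other entries stay as they are and may differ slightly by version):

```python
# deeplab/core/feature_extractor.py (sketch; other entries unchanged).
networks_to_feature_maps = {
    'mobilenet_v2': {
        # Low-level feature used by the decoder module for MobileNet-v2.
        DECODER_END_POINTS: [
            'layer_4/depthwise_output',
        ],
    },
    # ... existing entries for xception_65, resnet_v1_*, etc. remain as-is.
}
```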
@shijh1975
It seems that your model converges to predicting only background. Make sure you have every flag properly set, and try to use as large a batch size as possible (and fine-tune batch norm as well).
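For example, the relevant flags for deeplab/train.py would look roughly like this (the values and paths are illustrative placeholders, not a recommendation):

```bash
# Illustrative only: use the largest batch size that fits in memory,
# and fine-tune batch norm when the batch size is large enough.
python deeplab/train.py \
  --model_variant="mobilenet_v2" \
  --train_batch_size=16 \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint=/path/to/pretrained/model.ckpt \
  --train_logdir=/path/to/train_logdir \
  --dataset_dir=/path/to/tfrecord
```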
Okay, I will. And should I use ImageNet-pretrained MobileNet-v2 to initialise this?
You could try using the ImageNet-pretrained checkpoint or the provided pretrained checkpoints. Either should be fine.
@aquariusjay I added

```python
DECODER_END_POINTS: [
    'layer_4/depthwise_output',
],
```

in feature_extractor.py, and now the error is:

```
File "/home-ex/shijh/models/research/deeplab/model.py", line 548, in refine_by_decoder
    end_points[feature_name],
KeyError: 'MobilenetV2/layer_4/depthwise_output'
```

It seems that 'layer_4/depthwise_output' is not correct.
@shijh1975 I had the same issue. In model.py, in the refine_by_decoder function, set feature_name = name.
@aquariusjay can you help me?
@kushagraagrawal Have you resolved this issue?
@shijh1975 Just replace the code at line 544, feature_name = '{}/{}...', with feature_name = name. Also, delete line 346, flags.mark_flag_as_required('tf_initial_checkpoint'), and do not include --tf_initial_checkpoint in your training command. It works for me.
Without the "tf_initial_checkpoint" the training is not convergence, however, with "tf_initial_checkpoint", Error reported to Coordinator:
[[Node: save/Assign_423 = Assign[T=DT_FLOAT, _class=["loc:@concat_projection/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](concat_projection/weights/Momentum, save/RestoreV2:423)]] is reported, anybody knows how to fix this issue???
@Shenghsin Thank you very much. I also found that training does not converge with tf_initial_checkpoint=false; even with batch size = 12 it still does not converge.
To use the decoder with MobileNet model variants, we need to change one more line:
In this line, set feature_name = name.
We will provide an update for this soon.
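Roughly, the change inside refine_by_decoder would look like this (a sketch; the exact line numbers and surrounding code may differ by version):

```python
# deeplab/model.py, inside refine_by_decoder() (sketch only).
# MobileNet-v2 end points are exported without the 'MobilenetV2/' scope prefix,
# so look them up by their plain name.
for i, name in enumerate(feature_list):
  decoder_features_list = [decoder_features]
  # was: feature_name = '{}/{}'.format(
  #          feature_extractor.name_scope[model_variant], name)
  feature_name = name
  decoder_features_list.append(
      slim.conv2d(end_points[feature_name], 48, 1,
                  scope='feature_projection' + str(i)))
```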
@aquariusjay Thank you, I have fixed this issue, but now training does not seem to converge with tf_initial_checkpoint=false, batch size = 12, and the PASCAL dataset.
tf_initial_checkpoint=false means you are not using any pretrained checkpoint (not even an ImageNet-pretrained checkpoint), which makes it very hard to train.
You should at least use the ImageNet-pretrained checkpoint provided for MobileNet-v2.
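For example, assuming you have downloaded an ImageNet-pretrained MobileNet-v2 checkpoint from the slim model zoo (the path and filename below are placeholders), the training command would point at it roughly like this:

```bash
# Sketch only: initialize from an ImageNet-pretrained MobileNet-v2 checkpoint.
# DeepLab-specific variables that are missing from this checkpoint are simply
# trained from scratch.
python deeplab/train.py \
  --model_variant="mobilenet_v2" \
  --tf_initial_checkpoint=/path/to/mobilenet_v2_1.0_224.ckpt \
  --train_logdir=/path/to/train_logdir \
  --dataset_dir=/path/to/tfrecord
```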
@Shenghsin I got the same "Assign requires shapes of both tensors to match" error. Were you able to fix it?
@aquariusjay I made a mistake: it is not tf_initial_checkpoint=false but initialize_last_layer=false. With that, training does not seem to converge, but after changing aspp_with_batch_norm=True to false it does converge. However, the mIoU is not very good, only about 0.5.
A few suggestions:
@derinozturk Setting initialize_last_layer=false will fix this error, since the last layers with mismatched shapes are then not restored from the checkpoint.
@aquariusjay I downloaded the PASCAL AUG dataset and merged it with the PASCAL VOC 2012 dataset. I used aspp_with_batch_norm=fine_tune_batch_norm=True, a tf_initial_checkpoint, initialize_last_layer=false, and batch_size=13 to train the MobileNet-v2 + ASPP + decoder model. But after 30000 steps, the loss converges to about 0.5 and the mIoU is about 0.66.
Which hyperparameters can I modify to raise the performance?
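For reference, my setup corresponds roughly to a command like this (paths are placeholders, and flags such as crop size and learning rate are omitted):

```bash
# Rough reconstruction of the setup described above (placeholders for paths).
python deeplab/train.py \
  --model_variant="mobilenet_v2" \
  --decoder_output_stride=4 \
  --aspp_with_batch_norm=true \
  --fine_tune_batch_norm=true \
  --initialize_last_layer=false \
  --train_batch_size=13 \
  --training_number_of_steps=30000 \
  --tf_initial_checkpoint=/path/to/deeplabv3_mnv2_pascal_train_aug/model.ckpt \
  --train_logdir=/path/to/train_logdir \
  --dataset="pascal_voc_seg" \
  --dataset_dir=/path/to/pascal/tfrecord
```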
Good job, shijh1975!
You could further try to pretrain the model on COCO annotations, and then fine-tune on PASCAL.
@aquariusjay You mean I need to pretrain the model on COCO, then not restore the logits layer parameters, and fine-tune on PASCAL?
Was the initial checkpoint (deeplabv3_mnv2_pascal_train_aug) pretrained on the COCO dataset?
It seems I would need to write more code to train on the COCO dataset.
Can I get good performance using only the pascal_AUG dataset, without COCO pretraining?
Using COCO should bring you an extra >5% improvement. You do not need to be very careful about the performance on COCO, since it is only used for pretraining.
Yes, the provided checkpoint has been pretrained on COCO. See model_zoo.md for details. That is why we suggest simply using the provided checkpoints since we have pretrained them on COCO for you.
Hi @aquariusjay, thank you so much for your help. I got an mIoU of 81.2% on the PASCAL VOC val set (no depthwise separable convolution) from using the decoder module with the MobileNet-v2 backbone. I further got an mIoU of 78% by using depthwise separable convolution in the decoder.
@aquariusjay Thanks very much for your work. What is the performance of your provided MobileNet-v2 checkpoints on the Cityscapes validation dataset? And has your provided MobileNet-v2 checkpoint been trained on the Cityscapes train_fine dataset?
@aquariusjay What hyperparameters (such as base_learning_rate, min_scale_factor, and initialize_last_layer ...) should I use for MobileNet-v2 on Cityscapes if I want to reproduce your reported results? Thanks very much!
Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.