Mmdetection: Training HTC with COCO-Stuff 2017 for the semantic head fails with a target-shape error

Created on 26 Sep 2020 · 3 Comments · Source: open-mmlab/mmdetection

Training raised this error: only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: [2, 100, 152, 3]. The problem was not resolved in the few similar issues that were closed earlier, so I am raising it again.
Note
The semantic segmentation dataset I used is stuff_annotations_trainval2017.

The first thing I need to check with you: the config file htc_r50_fpn_1x_coco.py contains:
    train=dict(
        seg_prefix=data_root + 'stuffthingmaps/train2017/',
        pipeline=train_pipeline)
Should "stuffthingmaps/train2017/" contain only the pixelmap PNG images from stuff_annotations_trainval2017/stuff_train2017_pixelmaps? Am I right?

Note that the .png files are indexed images, which means they store only the label indices and are typically displayed as grayscale images. When the semantic prediction loss is computed, the targets loaded from these .png files should have size [#img, height, width]. However, the error reports "targets of size: [2, 100, 152, 3]", where the trailing 3 is the RGB channel dimension. This is why the cross-entropy loss cannot run.
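To make the shape mismatch concrete, here is a minimal sketch of the kind of check the loss effectively performs on its targets. The helper name `check_seg_targets` is hypothetical and is not part of mmdetection or PyTorch; it only assumes NumPy.

```python
import numpy as np

def check_seg_targets(targets):
    """Raise ValueError unless targets look like a batch of label maps [N, H, W]."""
    targets = np.asarray(targets)
    if targets.ndim != 3:
        raise ValueError(
            f"expected targets of shape [N, H, W], got {list(targets.shape)}")
    if not np.issubdtype(targets.dtype, np.integer):
        raise ValueError(f"expected integer class ids, got dtype {targets.dtype}")
    return targets

# Single-channel label maps pass the check.
good = np.zeros((2, 100, 152), dtype=np.int64)
check_seg_targets(good)

# Maps decoded as RGB carry a trailing channel axis and are rejected,
# mirroring the shape reported in the error message.
bad = np.zeros((2, 100, 152, 3), dtype=np.uint8)
try:
    check_seg_targets(bad)
except ValueError as err:
    print(err)
```

Running a check like this on the loaded segmentation maps before training can show early whether the PNGs are being decoded as color images.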

Notes
To be compatible with COCO, COCO-Stuff has 91 thing classes (1-91), 91 stuff classes (92-182) and 1 class "unlabeled" (0)

My question is: how do you convert these 3-channel images into 1-channel (grayscale) images so that the pixel values of the labels equal the class ids 0~182? (I mean the pixel values must not become meaningless after converting from 3 channels to 1.)

Thank you a lot.

reimplementation

tianxinhang

All 3 comments

I know we need to convert the colored pixelmap images to grayscale. But how can we guarantee that each pixel value in the new grayscale image is a thing class (1-91) or a stuff class (92-182)? For example, the middle area of stuffthingmaps/train2017/000000000285.png is a bear, which is indexed as 23 in the COCO label document, but in the grayscale image those pixels are 201.863, which does not correspond to any class.
Also in that image, the background is grass, which is indexed as 124, but in the grayscale image those pixels are 15.276.
Can someone explain this?
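One possible explanation, assuming the grayscale values came from a standard luma-weighted RGB conversion (the ITU-R 601 weights used by many image libraries): the result depends only on the display color of a pixel, not on its class id, so it is usually fractional and unrelated to the label. The colors below are hypothetical and are not taken from the dataset.

```python
import numpy as np

# ITU-R 601 luma weights, a common RGB -> grayscale formula.
LUMA = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(rgb):
    """Weighted sum of the R, G, B values; returns a float brightness."""
    return float(np.asarray(rgb, dtype=np.float64) @ LUMA)

# Hypothetical visualization colors for two regions (assumed for illustration).
bear_color = (250, 190, 140)
grass_color = (10, 20, 5)

# The printed values are brightness levels, not class ids such as 23 or 124.
print(rgb_to_gray(bear_color))
print(rgb_to_gray(grass_color))
```

Under this assumption, no grayscale conversion can recover the ids; the label indices have to be stored directly in a single-channel file.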

tianxinhang on 26 Sep 2020

I found the answer: wrong dataset.
Do not use the stuff dataset downloaded from the official COCO website.
Use this one instead: https://github.com/nightrome/cocostuff

stuffthingmaps_trainval2017.zip
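After switching datasets, a quick sanity check on a few label maps can confirm they are single-channel and hold plausible class ids. The sketch below is only illustrative: `summarize_label_map` is a hypothetical helper, and the defaults `max_class_id=182` and `ignore_id=255` are assumptions that should be adjusted to the label convention of the files you actually downloaded. Loading the PNG into an array (with an image library of your choice) is not shown.

```python
import numpy as np

def summarize_label_map(label_map, max_class_id=182, ignore_id=255):
    """Report shape and the class ids present in one label map."""
    label_map = np.asarray(label_map)
    ids = np.unique(label_map)
    # Ids above the assumed maximum (other than the assumed ignore value)
    # suggest the file is not a plain label-index image.
    unexpected = [int(i) for i in ids if i > max_class_id and i != ignore_id]
    return {
        "shape": label_map.shape,
        "single_channel": label_map.ndim == 2,
        "ids": ids.tolist(),
        "unexpected_ids": unexpected,
    }

# Toy 2x3 map using the ids mentioned in this thread (23 = bear, 124 = grass)
# plus the assumed ignore value.
toy = np.array([[23, 23, 124], [124, 124, 255]], dtype=np.uint8)
print(summarize_label_map(toy))
```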

tianxinhang on 26 Sep 2020

Same problem here, thanks for sharing.