I'm trying to train Mask R-CNN on my custom dataset to perform a segmentation task on new classes that neither COCO nor ImageNet has ever seen.
When I run
python2 tools/train_net.py --cfg configs/encov/copy_maskrcnn_R-101-FPN.yaml OUTPUT_DIR /tmp/detectron-output/
I get the following error (the complete log file is here: output.txt):
At:
/home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(269): _expand_bbox_targets
/home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(181): _sample_rois
/home/encov/Softwares/Detectron/lib/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
/home/encov/Softwares/Detectron/lib/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at pybind_state.h:423] . Exception encountered running PythonOp function: ValueError: could not broadcast input array from shape (4) into shape (0)
This error comes from the expand-box procedure, which widens the bounding box targets and weights to 4 columns per class (see roi_data/fast_rcnn.py). It takes the first element of each row, which represents the class, checks that it is not 0 (the background), and copies the values to the columns starting at class_index x 4. The error happens because the class index is greater than or equal to the NUM_CLASSES parameter that was used to create the output array.
I tried the same training with NUM_CLASSES set to 81, the number of classes used for COCO training (which works on my set-up, by the way).
The error described above does not appear, but very early in the iterations the bounding box areas are zero, which causes divisions by zero.
output2.txt
Has anyone experienced the same issue when training Fast R-CNN or Mask R-CNN on a custom dataset?
I strongly suspect an error in my COCO-like JSON file, because training on the COCO dataset works correctly.
Thank you for your help,
python --version output: Python 2.7.12
How many classes do you have in your custom dataset? If you have N classes, then you should set NUM_CLASSES: N+1 in your yaml config file. For example, for six classes you should set NUM_CLASSES: 7. For the 80 COCO classes you should set it to 81.
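For anyone who wants to double-check the value before training, here is a minimal sketch that counts the categories in a COCO-style annotation dict and adds one for the background. `expected_num_classes` is a hypothetical helper of mine, not part of Detectron, and the four category names are made up.

```python
import json


def expected_num_classes(coco):
    """Return N + 1: the N foreground categories plus one background class."""
    category_ids = set(cat["id"] for cat in coco["categories"])
    return len(category_ids) + 1


# Tiny in-memory stand-in for a custom annotation file with 4 categories
# (the names "a" to "d" are placeholders).
coco = json.loads("""
{"categories": [
  {"id": 1, "name": "a"}, {"id": 2, "name": "b"},
  {"id": 3, "name": "c"}, {"id": 4, "name": "d"}
]}
""")
num_classes = expected_num_classes(coco)
print("NUM_CLASSES: %d" % num_classes)
```

With a real file you would replace the inline string with `json.load(open(path))`, where `path` points to your own annotation file.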
Thank you :+1: . I have 4 classes so I should set NUM_CLASSES to 5.
Now I know I must use this value, but I had already tried it and got the first error (ERROR 1) described above.
The error (from what I understood in lib/roi_data/fast_rcnn.py) comes from the fact that _expand_bbox_targets creates an array whose size is defined by the NUM_CLASSES parameter, but when this array is filled in the for loop, it takes the first element of each box row as the class index, and the error happens when this class index is greater than or equal to NUM_CLASSES. It is weird that I can get a class index that large.
For the record, I put below the lines of code I am talking about (in lib/roi_data/fast_rcnn.py):
l.251 num_bbox_reg_classes = cfg.MODEL.NUM_CLASSES
l.256 bbox_targets = blob_utils.zeros((clss.size, 4 * num_bbox_reg_classes))
ll.260-270
inds = np.where(clss > 0)[0]
# print("DEBUG: inds value is {}".format(inds))
for ind in inds:
    cls = int(clss[ind])
    start = 4 * cls
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = (1.0, 1.0, 1.0, 1.0)
The error occurs when cls is greater than or equal to cfg.MODEL.NUM_CLASSES, because the slice start:end then falls outside the array and is empty.
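To see where "shape (0)" comes from, here is a small sketch using a plain Python list instead of the real NumPy array. It relies on the fact that slicing past the end of a sequence is clamped rather than rejected, which is also how NumPy slices behave, so the 4 target values end up being assigned to a zero-width slice. `selected_columns` is a hypothetical helper written for this illustration only.

```python
def selected_columns(cls, num_classes):
    """Count the columns that row[4*cls : 4*cls + 4] selects in a row of 4 * num_classes entries."""
    row = [0.0] * (4 * num_classes)
    start = 4 * cls
    end = start + 4
    # Out-of-range slices are clamped, so this can silently be empty.
    return len(row[start:end])


NUM_CLASSES = 5  # 4 foreground classes + background
for cls in (1, 4, 5, 6):
    print("cls=%d -> %d columns" % (cls, selected_columns(cls, NUM_CLASSES)))
```

Any class index from NUM_CLASSES upward selects zero columns, which matches the destination shape reported in the error message.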
@francoto I have a question: how did you convert your dataset to COCO format?
Thanks in advance.
@raninbowlalala
Starting from my initial dataset (not a COCO-like dataset), I wrote a Python script to fill every field of the COCO dataset dict.
You can find COCO dataset format here.
I also installed pycocotools and copied coco.py as mycustomdataset.py.
Then you "just" have to redefine the constructor method so that it creates a dataset in a similar format.
Make sure it works by loading your final .json file with the COCO API.
Hope it helps.
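Before loading the file with the COCO API, a rough structural check can catch the obvious mistakes. The sketch below is only my assumption of a reasonable minimum (the three top-level lists and the usual instance-annotation keys); `find_structure_problems` and the file name are hypothetical, and it does not replace actually loading the file with pycocotools.

```python
REQUIRED_TOP_LEVEL = ("images", "annotations", "categories")
REQUIRED_ANNOTATION = ("id", "image_id", "category_id", "bbox",
                       "segmentation", "area", "iscrowd")


def find_structure_problems(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in coco:
            problems.append("missing top-level key: %s" % key)
    image_ids = set(img["id"] for img in coco.get("images", []))
    category_ids = set(cat["id"] for cat in coco.get("categories", []))
    for ann in coco.get("annotations", []):
        missing = [key for key in REQUIRED_ANNOTATION if key not in ann]
        if missing:
            problems.append("annotation %s lacks: %s"
                            % (ann.get("id"), ", ".join(missing)))
        if ann.get("image_id") not in image_ids:
            problems.append("annotation %s points to unknown image_id %s"
                            % (ann.get("id"), ann.get("image_id")))
        if ann.get("category_id") not in category_ids:
            problems.append("annotation %s points to unknown category_id %s"
                            % (ann.get("id"), ann.get("category_id")))
    return problems


# Minimal example dataset; the file name and category name are placeholders.
dataset = {
    "images": [{"id": 1, "file_name": "example.png", "width": 1599, "height": 1903}],
    "categories": [{"id": 1, "name": "class_a"}],
    "annotations": [{
        "id": 6, "image_id": 1, "category_id": 1, "iscrowd": 0, "area": 4674,
        "bbox": [630.0, 482.0, 82.0, 59.0],
        "segmentation": [[650.0, 540.5, 629.5, 540.0, 630.0, 483.5,
                          711.5, 482.0, 711.0, 538.5, 650.0, 540.5]],
    }],
}
problems = find_structure_problems(dataset)
print(problems)
```

An empty list only means the skeleton looks sane; wrong coordinates or labels still need a visual check.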
@francoto Thanks for your help, I converted my dataset to coco format successfully.
I finally made it:
Hi francoto,
I am also training Mask R-CNN on my own data, but I have a problem: the bbox precision is satisfying (mAP 0.5+, mAR 0.6+), but the segmentation (mask) accuracy is poor (mAP 0.2, mAR 0.2). Did you achieve good performance on instance segmentation?
Hello @YanWang2014,
In my case, I got similar performance for bbox and mask (AP ~ 0.8).
My current dataset is quite small (~350 images for test and 40 images for validation), so I don't know if the number I gave is relevant.
Good luck with your task.
I'm sorry, but I'm still struggling with training on a different number of classes. I have 2 classes in my annotation file, so I set the number of classes in my config file to 3. I added some lines to net.py to prevent the class-related layers from loading (after this line):
if (keyname == 'cls_score_w' or keyname == 'cls_score_b' or keyname == 'bbox_pred_w' or keyname == 'bbox_pred_b'):
    logger.info('ignore: ' + keyname)
    continue
That way Detectron should not load the weights for these layers and should leave them with the dimensions configured in the .yaml file.
That's the only code I've changed, but I still get the error: could not broadcast input array from shape (4) into shape (0)
@francoto How did you solve this problem, or did you train from scratch?
I'd be grateful for any help.
Hello @mattifrind !
From my perspective, I'd say that you should let Detectron deal with the configuration you describe in your .yaml file. I reused the model weights used in the getting_started/yaml* examples.
I would say that you should not 'force' Detectron to forget about the weights.
The only issue I had was that the class names displayed in the PDF results remain the 'old' ones: 'person', 'bicycle', etc.
@francoto are you using inference to show your PDF results? I was initially doing that, and infer_simple.py uses a dummy dataset, dummy_coco_dataset = dummy_datasets.get_coco_dataset(), with the COCO dataset labels. Also, when you get your bounding boxes, do they make sense? I get decent masks, but the bounding boxes are not around these masks.
Hey @francoto! Thanks for your help.
I tried this because of a tip from Kaiming He in this issue. I tried to understand the code and found that the model structure defined in the .yaml file is overridden by the weights in the .pkl file. So if I configure 3 classes, the cls_score layer, for example, which should have an output depth of 3, is replaced by the layer from the .pkl file with a dimension of 81. Am I wrong?
Unfortunately, I get errors with or without my code change in net.py.
Hey @gabriellap,
the commands I use are:
to train:
$ python2 tools/train_net.py \
    --cfg configs/<custom_config>.yaml \
    OUTPUT_DIR /tmp/detectron-output
to test:
$ python2 tools/infer_simple.py --cfg configs/<custom_config> \
    --output-dir /tmp/detection-visualizations \
    --image-ext png \
    --wts /tmp/detectron-output/<output_train_directory>/generalized_rcnn/model_final.pkl \
    demo  # location of the images
I can't share my results publicly, but my bounding box locations and masks are quite good (I obviously have some errors, but considering my dataset is only ~350 images, I think it's pretty amazing). As I said, though, I still get the COCO dataset labels. I need to check the infer_simple.py file.
Hey @mattifrind, from what I remember, the error could not broadcast input array from shape (4) into shape (0) happened in my case when the parameter cfg.MODEL.NUM_CLASSES did not match clss in lib/roi_data/fast_rcnn.py. I guess that even when you apply your fix to manually drop the weights for the classes you don't use, your data may still contain a class index that is greater than or equal to your cfg.MODEL.NUM_CLASSES.
For the record, I put below the lines of code I am talking about (in lib/roi_data/fast_rcnn.py):
l.251 num_bbox_reg_classes = cfg.MODEL.NUM_CLASSES
l.256 bbox_targets = blob_utils.zeros((clss.size, 4 * num_bbox_reg_classes))
ll.260-270
inds = np.where(clss > 0)[0]
# print("DEBUG: inds value is {}".format(inds))
for ind in inds:
    cls = int(clss[ind])
    start = 4 * cls
    end = start + 4
    bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
    bbox_inside_weights[ind, start:end] = (1.0, 1.0, 1.0, 1.0)
The error occurs when cls is greater than or equal to cfg.MODEL.NUM_CLASSES, because the slice start:end then falls outside the array and is empty.
Have you tried to train without changing the code for the weights?
Have you added a 'background' label to your dataset? In my case, I tried to add one manually and that messed everything up.
Hope that helps,
Hey, @francoto thanks for your help!
without changing the code I get this error:
Traceback (most recent call last):
  File "tools/train_net.py", line 128, in <module>
    main()
  File "tools/train_net.py", line 110, in main
    checkpoints = utils.train.train_model()
  File "/home/ubuntu/detectron/lib/utils/train.py", line 58, in train_model
    setup_model_for_training(model, weights_file, output_dir)
  File "/home/ubuntu/detectron/lib/utils/train.py", line 161, in setup_model_for_training
    nu.initialize_gpu_from_weights_file(model, weights_file, gpu_id=0)
  File "/home/ubuntu/detectron/lib/utils/net.py", line 119, in initialize_gpu_from_weights_file
    src_blobs[src_name].shape)
AssertionError: Workspace blob cls_score_w with shape (3, 1024) does not match weights file shape (81, 1024)
Didn't you have this problem too when you changed the number of classes and used a pre-trained model?
With my change, I get the broadcast error. My dataset has no background class and my 2 categories have the indices 1 and 2 (I also tried 0 and 1 with the same effect).
Hello @mattifrind, I haven't seen this kind of error, so I can't really help you with this.
Good luck :crossed_fingers:
@mattifrind and @francoto I got that error because I tried a pre-trained model with 81 classes, so to fix it I just used the ImageNet pre-trained model from MODEL_ZOO.
Did you find any solution to train without WEIGHTS? I tried with WEIGHTS: '' (empty) and got AssertionError: Negative areas founds. So, any idea?
Did you solve the problem? I encountered the same one. @mattifrind Thanks in advance.
@ZSSNIKE because I needed to get my task done, I stopped trying to fix that. It works for me with 81 classes as a workaround. Good luck!
@mattifrind how do you set 81 classes? I mean, only changing NUM_CLASSES to 81 is not enough, right? Do you also need to convert the annotations to contain 81 categories?
@chenweisomebody126 yes, the pre-trained models from Detectron have 81 classes and so do the configuration files (.yaml). I wrote a Java program to convert my dataset to the COCO format. After the conversion, the program deletes 2 classes of the original COCO dataset and adds my two. That's how I train.
@francoto I am getting exactly the same error as you:
ValueError: could not broadcast input array from shape (4) into shape (0)
My custom dataset has 4 classes and I have set NUM_CLASSES to 5. I have added the dataset in dataset_catalog.py and generated the JSON for the dataset. A sample annotation in the JSON file looks like the following:
{'id': 6, 'image_id': 1, 'category_id': 1, 'iscrowd': 0, 'area': 4674, 'bbox': [630.0, 482.0, 82.0, 59.0], 'segmentation': [[650.0, 540.5, 629.5, 540.0, 630.0, 483.5, 711.5, 482.0, 711.0, 538.5, 650.0, 540.5]], 'width': 1599, 'height': 1903}
You have written down the steps, but I can't understand them clearly. Can you please elaborate on the steps you took, i.e.:
"bounding box coordinates in my dataset were wrong": how were they wrong and how did you correct them?
"Finally, I misunderstood the part where I need a 'background' class": how did you correct this part?
Thanks in advance
Hello @vsd550, it has been a while since I posted this and I haven't used Detectron since I got my first results, but I will try to explain.
"bounding box coordinates in my dataset were wrong": as I said, I converted my custom dataset into COCO-like form on my own, and I was not using the correct parameters to compute the bounding box from the segmentation polygon (if I remember right, my bounding boxes were only 1 pixel high and wide).
"background": previously I was manually adding a 'background' class to my COCO dataset with id=0 but without any occurrence in the dataset. My problem was solved when I removed this 'background' class from the dataset I built. I think that Detectron actually creates this background class itself at the very beginning of training, when it loads your dataset.
I hope this makes my steps clear (or clearer) for you.
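The two fixes above can be sketched roughly as follows. This is not the script I used; it assumes COCO-style flat polygons `[x0, y0, x1, y1, ...]` and `[x, y, width, height]` boxes, and both helper names are hypothetical. The background check is just a name-based heuristic.

```python
def bbox_from_polygon(polygon):
    """Return the [x, y, width, height] box enclosing a flat [x0, y0, x1, y1, ...] polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def explicit_background_categories(categories):
    """List categories that look like a hand-added background class (heuristic)."""
    suspicious = ("background", "__background__")
    return [cat for cat in categories
            if cat["name"].strip().lower() in suspicious]


# Polygon taken from the sample annotation posted earlier in this thread.
polygon = [650.0, 540.5, 629.5, 540.0, 630.0, 483.5,
           711.5, 482.0, 711.0, 538.5, 650.0, 540.5]
bbox = bbox_from_polygon(polygon)
print(bbox)

# Category names here are placeholders.
categories = [{"id": 0, "name": "background"}, {"id": 1, "name": "class_a"}]
print(explicit_background_categories(categories))
```

A box whose width or height comes out as 0 or 1 pixel for a clearly larger object is a strong hint that the conversion is wrong.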
I hit error 1 during training. I have 150 classes and set 151 in the yaml file. I am sure this is the correct setting, because I had trained successfully on other data before, so I went and checked my data this time.
It turned out that one class name, "Bao_yan_sheng_chou_1750ml", was written in two ways, one of them with an extra leading space, so the dataset actually contained 151 classes and that caused the error. After deleting the extra space it works fine.
So I think that 80% of the time error 1 is caused by a problem in your own data.
(Sorry, my English is poor!)
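A stray space like this is easy to miss by eye, so here is a minimal sketch that groups category names which are identical after stripping whitespace. `names_differing_by_whitespace` is a hypothetical helper and "other_class" is a made-up name for the example.

```python
from collections import defaultdict


def names_differing_by_whitespace(names):
    """Map each stripped name to its raw variants when more than one variant exists."""
    groups = defaultdict(set)
    for name in names:
        groups[name.strip()].add(name)
    return dict((stripped, sorted(variants))
                for stripped, variants in groups.items()
                if len(variants) > 1)


names = ["Bao_yan_sheng_chou_1750ml", " Bao_yan_sheng_chou_1750ml", "other_class"]
duplicates = names_differing_by_whitespace(names)
for stripped, variants in duplicates.items():
    print("%r is written as: %r" % (stripped, variants))
```

Running this over every category name in the annotation file before training should reveal whether the real number of distinct names matches NUM_CLASSES - 1.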
Hello @YanWang2014,
In my case, I got similar performance for bbox and mask (AP ~ 0.8).
My current dataset is quite small (~350 images for test and 40 images for validation), so I don't know if the number I gave is relevant.
Good luck with your task.
Hi! I got the same problem as you when I trained on my custom dataset. The box AP is ~0.6, while the mask AP is ~0.5. Did you find the cause of this? Looking forward to your reply!
@francoto
The only issue I had was that the class names displayed in the PDF results remain the 'old' ones: 'person', 'bicycle', etc.
I have the same problem. Did you fix it? How? Would you please tell me? Thanks.
Hello @maiff,
I actually found out that the category names were written directly in the file detectron/datasets/dummy_datasets.py, in the method get_coco_dataset(), so I just created my own get_custom_dataset() method with the category names I wanted. Then you update the file tools/infer_simple.py to use your new method. It did the trick for me.
(I'm still using an old version from January 2019)
Good luck :)
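Roughly, the method looks like the sketch below. It assumes, as far as I remember from dummy_datasets.py, that get_coco_dataset() returns an object with a `classes` mapping from index to name, with index 0 reserved for the background. The small `AttrDict` here is only a stand-in so the snippet runs on its own (Detectron ships its own), and the class names are placeholders.

```python
class AttrDict(dict):
    """Minimal stand-in for Detectron's AttrDict: a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


def get_custom_dataset():
    """Build a dummy dataset that only carries the custom class names."""
    ds = AttrDict()
    # Index 0 is kept for the background; the other names are placeholders.
    classes = ["__background__", "class_a", "class_b", "class_c", "class_d"]
    ds.classes = dict((i, name) for i, name in enumerate(classes))
    return ds


dummy_custom_dataset = get_custom_dataset()
print(dummy_custom_dataset.classes[1])
```

In tools/infer_simple.py the call to dummy_datasets.get_coco_dataset() is then swapped for the new method, so the visualization picks up the custom names.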
@francoto Thank you very much, I have solved it
@francoto which cloud service did you use, or do you have a GPU on your personal computer?
@vaibhavkumar049 I use my local GPU, which is a GeForce GTX 1080.