In the Getting Started tutorial, we have:
python2 tools/infer_simple.py \
--cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir /tmp/detectron-visualizations \
--image-ext jpg \
--wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
demo
This gives us PDF images that visualize the segmentation. Is it possible to get output in a format from which I can extract the pixels in each object mask? Does Detectron contain functions to evaluate our test set for instance segmentation? Does Detectron output the confidence of each prediction?
Is there a tutorial from which I can learn in more detail how to use Detectron, or is the only way to look at this forum and the code?
OK, I found out how to output segmentation masks in infer_simple.py: the cls_segms variable.
I still have not found out how to output the prediction confidence.
After running "Inference with Pretrained Models/Directory of Image Files" from the Getting Started tutorial, I get the following for the image demo/16004479832_a748d55f21_k.jpg:
cls_boxes[17]
array([[0.0000000e+00, 8.1034927e+01, 2.9909589e+02, 5.6584467e+02, 9.9869162e-01],
[3.2148853e+02, 2.0170921e+02, 9.0500000e+02, 5.9527612e+02, 9.9953604e-01],
[5.7266614e+02, 2.4558983e+02, 9.0472571e+02, 5.5804816e+02, 1.9798927e-01],
[1.3561227e+02, 6.9481277e+01, 4.5119119e+02, 5.1436737e+02, 5.4683097e-02]], dtype=float32)
Class 17 is dog. I would like to check whether I am interpreting this output correctly.
Each row in this array represents a bounding box for one object instance.
In one row, the first and second elements represent the coordinates of the box center, the third and fourth elements represent the height and width (or the width and height?), and the fifth element represents the confidence (i.e. probability) of the box prediction.
Is this correct?
In this image I see 2 dogs, but there are 4 rows (boxes) in cls_boxes. Is this an error in the object detection, or am I misinterpreting something? In the visualization, only 2 boxes for dogs are shown.
This is the continuation of my previous post. I consider the same image, demo/16004479832_a748d55f21_k.jpg.
The output variable cls_segms[17] also has 4 elements, like cls_boxes[17], though there are only 2 dogs (class 17) in the image.
Moreover, I don't understand the output of cls_segms. I suppose 'counts' is a segmentation mask, but in what format is it? How can I convert it into something human-readable?
'counts': 'nm0k4c=n0YO
7h4cHoJc7Q5HgJe7Z5]HbJc7_5 H\\J7f5cHSJ^7P6eHiI[7[6hH^IY7e6jHUIV7n6nHkHT7W7nHfHS7Z7oHcHQ7_7PI^HQ7b7RI[Hn6f7TIVHm6j7YInGh6T8_IbGb6_8j12M3N1O2O1N110O00010O000000O1O1000O010O010O010O0000010O1O002N100O1O1O2N1O2N10000O10O1O1O00100000O2O0O2N2N101N101O1O0O2O001N1O100O1000000000010O0O2O0O2N2M201N2O1O001O1N2O1N2M3N2N101N2O1O1O1N2N2N2L4L4M3N2N2O1N2M3M3M3L4M3M3N2O1N20000O10000O10000O101O001N101O1N2O1O1N3N1N3N2M3M4L5Kd0\O;E9G7I6J4Mg0XOa0@2N2N1N2O1O1N101O000O10000O100O10O0100O1O100O001O100O001O1O01O00001N1O2O0O2O0010O010O010O1O010O001O1O00000000O1O1O1O2O0O101O0100O010O100O10O0100O1O001O1O1O1O001O1O1O001O2N100O2N1O1O1O2N1O1O1N2O001O1O001O0010O0004L3M3L5L4G:BYhX;'
Counts is Run Length Encoded.
https://en.wikipedia.org/wiki/Run-length_encoding
You can use mask utils to get the mask into a NumPy array. I'm assuming you're familiar with OpenCV.
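import cv2
import pycocotools.mask as mask_util
# decode the RLE dict ({"size": ..., "counts": ...}) into a 2D uint8 array of 1s and 0s
mask = mask_util.decode(found_segment)
# scale to 0/255 so the mask is visible, then wait for a key press so the window renders
cv2.imshow("mask", mask * 255)
cv2.waitKey(0)
# find the contours, which can give you polygons outlining the object;
# [-2] picks the contour list under both the OpenCV 3 and OpenCV 4 return signatures
contours = cv2.findContours(mask.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]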
The mask comes out the same way you passed it in when training. The data goes in via the COCO format.
Have a look here for the COCO API:
https://github.com/cocodataset/cocoapi
That package will give you pycocotools.
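To make the run-length idea above concrete, here is a minimal NumPy sketch of the uncompressed form of COCO-style RLE: the counts alternate between runs of 0s and runs of 1s, start with a run of 0s, and walk the image column by column. The function names `rle_encode` and `rle_decode` are made up for this illustration and are not part of pycocotools. The 'counts' string that Detectron prints is additionally packed into ASCII characters, so this sketch cannot read it directly; for real output, use mask_util.decode as shown above.

```python
import numpy as np


def rle_encode(mask):
    """Turn a binary mask into uncompressed run lengths (hypothetical helper)."""
    # COCO walks the pixels column by column, hence order="F".
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    counts = []
    previous = 0  # the first run always counts zeros (it may have length 0)
    run = 0
    for pixel in flat:
        if pixel != previous:
            counts.append(run)
            run = 0
            previous = pixel
        run += 1
    counts.append(run)
    return counts


def rle_decode(counts, size):
    """Expand uncompressed run lengths back into a (height, width) mask."""
    height, width = size
    flat = np.zeros(height * width, dtype=np.uint8)
    position = 0
    value = 0  # runs alternate 0, 1, 0, 1, ...
    for run in counts:
        flat[position:position + run] = value
        position += run
        value = 1 - value
    return flat.reshape((height, width), order="F")


mask = np.array([[0, 1, 1],
                 [0, 1, 0]], dtype=np.uint8)
counts = rle_encode(mask)
restored = rle_decode(counts, mask.shape)
print(counts)
print(np.array_equal(mask, restored))
```

Once a mask is a plain array like `restored`, `np.argwhere(restored)` lists the (row, column) coordinates of the pixels that belong to the object.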
Thank you. The answer was very useful.
I figured out why there are more instances in the cls_segms variable than I see in the visualization. Every instance has a prediction confidence, and the visualization shows only instances with confidence > 0.7.
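As a rough sketch of that filtering, the snippet below applies the 0.7 cutoff to the cls_boxes[17] array printed earlier in this thread, taking the last column as the confidence. The helper name `keep_confident` is invented for this example. As far as I know, the first four columns in Detectron are corner coordinates (x1, y1, x2, y2) rather than a center plus a size, but treat that as an assumption and check it against the visualization code.

```python
import numpy as np

# cls_boxes[17] for demo/16004479832_a748d55f21_k.jpg, copied from the post above.
dog_boxes = np.array(
    [[0.0000000e+00, 8.1034927e+01, 2.9909589e+02, 5.6584467e+02, 9.9869162e-01],
     [3.2148853e+02, 2.0170921e+02, 9.0500000e+02, 5.9527612e+02, 9.9953604e-01],
     [5.7266614e+02, 2.4558983e+02, 9.0472571e+02, 5.5804816e+02, 1.9798927e-01],
     [1.3561227e+02, 6.9481277e+01, 4.5119119e+02, 5.1436737e+02, 5.4683097e-02]],
    dtype=np.float32)


def keep_confident(boxes, thresh=0.7):
    """Return the rows whose score exceeds thresh, plus their indices (hypothetical helper)."""
    scores = boxes[:, -1]  # assumption: the last column is the confidence
    keep = np.where(scores > thresh)[0]
    return boxes[keep], keep


confident_boxes, keep = keep_confident(dog_boxes)
print(keep)                   # indices of the detections that survive the cutoff
print(confident_boxes[:, -1]) # their scores
```

Assuming cls_segms[17] is in the same order as cls_boxes[17], the same `keep` indices select the matching masks, for example `[cls_segms[17][i] for i in keep]`.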
Hi, how can I convert the "counts" to a binary mask? I used the command:
mask = mask_util.decode(found_segment)
and replaced "found_segment" with my "counts", and got an error like:
TypeError: string indices must be integers, not str
Can you tell me what's the correct way to use "decode"?
Thank you!
Oh I have found out how to do that~ Thank you~
@ll884856 How did you do it?
@thecondofitz I used commands like this:
`
import pycocotools.mask as mask_util
import cv2
import numpy as np
mask = mask_util.decode( [{"counts": "l]a?V3an02O001N101O0O10001O0000001O00001O00001O1O001O010O00001O00000000010O01O001O001O00001O00001O000000001O00000000000001O0001O0000000001O01O000000010O001O00001O00001O000000001O000000000000000010O00001O00001O000000001O00000001O0001O0001O00001O001O001O00001O0000001O00000000001O0001O01O01O01000O010O010O01O01O0001O0001O00000000001O0000001O00001O00001O001O000001O01O01O00001O10O01O001O001O0001O01O00000001O00001O001O1O001O00001O0000001O000000001O0000001O00001O001O1O01O01O001O00000001O00001O001O0010O0001O00001O000000001O000001O00001O00001O1O001O001O00001O0000001O000000001O0000010O00001O001O010O000010O00001O000010O0001O00000000001O0000000000010O00000001O01O01O010O1O010O010O01O00010O00000000000001O0001O0000000001O00000000001O01O01O0010O00010O1O0010O01O001O01O0001O00000000000001O000000000000001O00000000001O00001O010O00001O01O0000010O00001O00010O00001O00000000001O00000000000000001O00000000000000000000000000000000000000000000000000010O00001O010O00100O0010O0010O0001O01O0000000001O00000000000000000000000000001O01O0000001O001O001O001O001O00001O0000000000000O101O000O10000O100O100O1O10000O10000O10001N10001N101N2N6Fmkc=", "size": [1080, 1440]}])
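# scale the 0/1 mask to 0/255 for display and wait for a key press so the window renders
cv2.imshow("mask", mask * 255)
cv2.waitKey(0)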
`