```python
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGE_PATHS:
      image = Image.open(image_path)
      # The array-based representation of the image will be used later to
      # prepare the result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # The score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      boxes = np.squeeze(boxes)
      classes = np.squeeze(classes).astype(np.int32)
      scores = np.squeeze(scores)
      num_detections = np.squeeze(num_detections)
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          boxes,
          classes,
          scores,
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      # plt.figure(figsize=IMAGE_SIZE)
      print(image_path)
      # plt.imshow(image_np)
```
In the code above, the model's output consists of boxes, scores, and classes. I want to evaluate my model, but how do I use these outputs?
For example, the outputs look like this:

boxes outputs:

```
boxes: [[ 0.10284555 0.58427072 0.22631724 0.70474339]
[ 0.83406907 0.65002507 0.94289017 0.77105808]
[ 0.78632039 0.79019397 0.90252429 0.91139358]
...]
```
classes outputs:

```
classes: [3 3 3 3 1 3 3 2 3 3 5 4 3 3 2 3 3 3 3 3 1 1 2 1 5 1 5 1 5 1 5 3 3 3 1 1 3
3 3 3 3 3 3 1 3 3 1 3 1 3 3 3 1 1 3 2 3 2 1 3 3 4 3 3 3 3 3 3 3 3 3 3 5 3
3 1 1 1 1 3 1 3 3 2 4 3 2 3 2 1 2 3 4 1 3 2 1 1 3 1]
```
scores outputs:

```
scores: [ 0.95616055 0.89709145 0.85906911 0.06335787 0.02520643 0.02324397
0.01829346 0.01510186 0.01466805 0.01359926 0.01264461 0.00913072
0.00761257 0.00743888 0.00734038 0.00706119 0.00675951 0.00671593
0.00664596 0.00647511 0.00627282 0.00615904 0.0060715 0.00584353
0.00547123 0.00532609 0.00513607 0.0051044 0.0044906 0.00441345
0.00435485 0.00433881 0.00431149 0.00431057 0.00427277 0.00421458
0.00415938 0.00409394 0.00408323 0.003925 0.00386971 0.00380281
0.00373316 0.00372564 0.00368697 0.00359879 0.00359426 0.00358267
0.00353963 0.0033366 0.00330383 0.00314544 0.00308296 0.00308134
0.00308132 0.00305725 0.00298005 0.00294961 0.00292675 0.00286796
0.00282848 0.00282805 0.00277512 0.00269225 0.00267927 0.00264838
0.00264094 0.00262854 0.00260897 0.00258672 0.00255843 0.00255719
0.00254792 0.00248044 0.00244716 0.00243139 0.00241957 0.0024156
0.00239647 0.00237698 0.00236842 0.00235493 0.00229962 0.00229157
0.00228014 0.00227027 0.00226229 0.00225496 0.00224126 0.00222978
0.00222819 0.00215887 0.00211833 0.00211493 0.00210062 0.00205547
0.00205226 0.00205122 0.00203317 0.00201525]
```
num_detections outputs:

```
num_detections: 100.0
```
Hi @wxp0329 - are you asking how to interpret those arrays?
@jch1 Yes, I don't understand what is stored in the boxes array or how to use it. For example, I want to evaluate using the IoU method, which needs two parameters: GT (ground-truth diagonal coordinates) and DR (detection-result diagonal coordinates).
```python
def IOU(Reframe, GTframe):
    """
    Reframe, GTframe: two rectangles, each given by its diagonal
    coordinates (xmin, ymin, xmax, ymax).
    """
    x1 = Reframe[0]
    y1 = Reframe[1]
    width1 = Reframe[2] - Reframe[0]
    height1 = Reframe[3] - Reframe[1]

    x2 = GTframe[0]
    y2 = GTframe[1]
    width2 = GTframe[2] - GTframe[0]
    height2 = GTframe[3] - GTframe[1]

    endx = max(x1 + width1, x2 + width2)
    startx = min(x1, x2)
    width = width1 + width2 - (endx - startx)

    endy = max(y1 + height1, y2 + height2)
    starty = min(y1, y2)
    height = height1 + height2 - (endy - starty)

    if width <= 0 or height <= 0:
        ratio = 0  # the two rectangles do not intersect
    else:
        Area = width * height  # intersection area of the two rectangles
        Area1 = width1 * height1
        Area2 = width2 * height2
        ratio = Area * 1. / (Area1 + Area2 - Area)
    # return the IoU together with the input boxes
    return ratio, Reframe, GTframe
```
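As a quick sanity check of this helper (the boxes here are made-up toy values): two 2x2 squares overlapping in a 1x1 region have intersection 1 and union 4 + 4 - 1 = 7, so the IoU should be 1/7.

```python
ratio, _, _ = IOU((0, 0, 2, 2), (1, 1, 3, 3))
print(ratio)  # 0.142857...
```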
So the boxes are in [ymin, xmin, ymax, xmax] format and always normalized relative to the image size. So for example, if the boxes array is:

```
[[.1, .2, .3, .4],
 [.5, .75, 1, 1]]
```

then it means you have two boxes, the second of which has minimum y coordinate .5, minimum x coordinate .75, and maximum x and y coordinates of 1.
For each box there is a score and a class. So if the score array is
[.2, .6], and the class array is [3, 5],
then the first detection is class 3 with 20% confidence and the second detection is class 5 with 60% confidence. What the class indices mean is determined by the label map.
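To make the conversion back to pixels concrete, here is a minimal sketch; `to_pixel_coords` is a hypothetical helper for illustration, not part of the API:

```python
def to_pixel_coords(box, image_width, image_height):
    # box is a normalized [ymin, xmin, ymax, xmax] detection box
    ymin, xmin, ymax, xmax = box
    return (int(xmin * image_width), int(ymin * image_height),
            int(xmax * image_width), int(ymax * image_height))

# For a 640x480 image, the second box above maps to:
print(to_pixel_coords([.5, .75, 1, 1], 640, 480))  # (480, 240, 640, 480)
```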
Does this help?
@jch1 Thank you very much! Your answer is perfect! It's exactly what I wanted. Salute to the expert!
Closing this issue, since it was clarified by @jch1.
Hi @jch1, @wxp0329, I am still confused. I have an image with one person in it. The algorithm draws a bounding box around the person with the text "person 46%". But my results are the following:
scores:

```
array([[ 0.93932933, 0.07025577, 0.06598418, 0.06165977, 0.0580634 ,
0.05620686, 0.05363677, 0.0527909 , 0.05187344, 0.04901367,
0.04837012, 0.04710015, 0.04552074, 0.04471788, 0.04386492,
0.04365378, 0.04354041, 0.04262244, 0.04186066, 0.04124551,
0.03997024, 0.03989998, 0.03914978, 0.03888754, 0.03858582,
0.03857578, 0.03828432, 0.03654354, 0.0363818 , 0.03629457,
0.03586311, 0.03501716, 0.03483037, 0.03412054, 0.03373158,
0.03365625, 0.03339461, 0.03309092, 0.0328337 , 0.03277906,
0.03218234, 0.03193506, 0.03176387, 0.03132074, 0.03123432,
0.03091392, 0.03077964, 0.03000043, 0.02985074, 0.02973447,
0.02970752, 0.02962561, 0.02953558, 0.02945234, 0.02911837,
0.02904501, 0.02881845, 0.02841894, 0.02828046, 0.02784052,
0.02780956, 0.027736 , 0.02761369, 0.0273649 , 0.027158 ,
0.02705142, 0.02665204, 0.02642948, 0.02625343, 0.02618215,
0.02590469, 0.02589623, 0.02567873, 0.02561364, 0.02560339,
0.02548196, 0.02514659, 0.02514326, 0.02501463, 0.02489797,
0.0246844 , 0.02457524, 0.02453672, 0.02440646, 0.02429095,
0.02427525, 0.02422404, 0.02414469, 0.02403057, 0.02401115,
0.02393561, 0.0238783 , 0.02385183, 0.02379306, 0.02369065,
0.02360767, 0.02350716, 0.02327218, 0.02322651, 0.02319981]], dtype=float32)
```
classes:

```
array([[ 1., 1., 1., 1., 1., 77., 32., 1., 1., 1., 32.,
32., 1., 32., 1., 32., 1., 32., 1., 62., 32., 32.,
31., 1., 1., 31., 32., 32., 1., 1., 1., 32., 1.,
1., 1., 32., 52., 32., 31., 1., 1., 32., 1., 32.,
1., 1., 77., 1., 1., 32., 32., 1., 32., 1., 1.,
1., 31., 1., 1., 1., 32., 62., 32., 1., 32., 32.,
1., 1., 77., 1., 1., 1., 1., 1., 77., 1., 63.,
52., 1., 52., 63., 32., 31., 32., 31., 62., 1., 1.,
31., 77., 1., 1., 1., 1., 62., 1., 32., 52., 32.,
1.]], dtype=float32)
```
From these arrays I can see that class 1 was detected with about 94% confidence, then 7%, ..., and class 77 with about 6%. How is the 46% computed? And why do the classes appear multiple times in the array?
Thank you :)
Hi, I am still confused. The boxes are returned as an ndarray, so how do I know the exact coordinates of each detected box on the image? And how do I know which detected object each box corresponds to?
@UnforgivenZZZ you can check https://github.com/tensorflow/models/blob/master/object_detection/utils/visualization_utils.py
Looking at the code, you can understand more about how the data is processed. The boxes array provides the normalized coordinates of each box, and you can match boxes, classes, and scores by index: the first element of the boxes array is the rectangle whose class is the first element of the classes array and whose score is the first element of the scores array.
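As a minimal sketch of that index matching (assuming the squeezed `boxes`, `classes`, and `scores` arrays and the `image_np` array from the snippet at the top of this thread, and an arbitrary 0.5 confidence cutoff):

```python
min_score = 0.5  # arbitrary confidence cutoff for this illustration
height, width = image_np.shape[:2]
for box, cls, score in zip(boxes, classes, scores):
    if score < min_score:
        continue
    ymin, xmin, ymax, xmax = box
    # Scale the normalized coordinates to pixels.
    left, right = int(xmin * width), int(xmax * width)
    top, bottom = int(ymin * height), int(ymax * height)
    print('class %d, score %.2f, box (left=%d, top=%d, right=%d, bottom=%d)'
          % (cls, score, left, top, right, bottom))
```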
> How is the 46% computed? And why do the classes appear multiple times in the array?
And I have the same question: this only outputs the confidence value for the predicted class. How can I get the probability over all the categories from the given scores for a test batch (say, with batch size one)? Or is there something to do pre- or post-training?
This can be solved by applying non-maximum suppression (NMS) to the outputs.
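For reference, a minimal sketch of that idea using TensorFlow's built-in `tf.image.non_max_suppression`, again assuming the squeezed `boxes`, `scores`, and `classes` arrays from the snippet at the top; the threshold values are arbitrary choices:

```python
import tensorflow as tf

# boxes is [N, 4] in [ymin, xmin, ymax, xmax] (normalized coordinates are
# fine, since IoU is scale-invariant); scores is [N].
keep = tf.image.non_max_suppression(
    boxes, scores, max_output_size=20, iou_threshold=0.5)
with tf.Session() as sess:
    keep_idx = sess.run(keep)

nms_boxes = boxes[keep_idx]
nms_classes = classes[keep_idx]
nms_scores = scores[keep_idx]
```

Note that this is class-agnostic; a class-aware variant would run the suppression separately per class, and the exported detection graphs typically already apply their own NMS in postprocessing.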
This might be a new topic, but does anyone have a source for the calculation of the detection score? I understand how to interpret it, but I want to know how it is derived.