Keras-retinanet: (row, column) or (column, row)?

Created on 22 Dec 2017 · 5 Comments · Source: fizyr/keras-retinanet

I am new to this, so this question may sound dumb.

As far as I know, there are two different conventions for placing a point (1, 100):

0/0---column--->
 |   --------------**.** (1,100)
 |
row
 |
 |
 v
(matrix)

0/0---X--->
 |
 |
 Y
 |
 | **.** (1,100)
 v
 (opencv and plt)

So if I use my custom dataset, I believe I should go with the matrix one and do "xmin, ymin, xmas, ymax", since all the internal operations are on matrices.
But in keras-retinanet/examples/ResNet50RetinaNet - COCO 2017.ipynb, I see
cv2.rectangle(draw, (b[0], b[1]), (b[2], b[3]), ... for both detections and annotations.
Aren't those inverted (should be (b[1], b[0]), (b[3], b[2])?), or is there anything I missed?

Thanks in advance.

xyiaaoo

All 5 comments

"xmin, ymin, xmas, ymax"

I see what you did there ;)

Anyway, for numpy matrices you're right that the first coordinate is the row index. In an image, however, we say that the X axis points to the right and the Y axis points down. I've never seen a different convention. The two conventions don't match, but OpenCV deals with the difference internally.

Long story short: when specifying an opencv point or rectangle, you should use image coordinates, not matrix indices.

de-vri-es on 22 Dec 2017

👍1
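
To make the difference between the two conventions concrete, here is a minimal sketch. It uses a plain nested list as a stand-in for an image array so that it has no dependencies; the 4x6 size and the point (5, 1) are made up for illustration. With a numpy array the same access would be written `image[y, x]`.

```python
# Hypothetical tiny "image": 4 rows (height) by 6 columns (width).
height, width = 4, 6
image = [[0] * width for _ in range(height)]

# A point in image coordinates: x counts columns (to the right),
# y counts rows (downwards).
x, y = 5, 1

# Matrix indexing takes the row first, so the same pixel is image[y][x].
image[y][x] = 1

# Print the grid: the 1 appears in the second row, last column.
for row in image:
    print(row)
```

So a point is written (x, y) when it is passed around as an image coordinate, and it only turns into (row, column) at the moment the matrix is actually indexed.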

I kiiinda got this theoretical part.
But how could opencv take in matrix indices when it actually asks for image coordinates?
Shouldn't it be cv2.rectangle(draw, (b[1], b[0]), (b[3], b[2]), ... instead of cv2.rectangle(draw, (b[0], b[1]), (b[2], b[3]), ...? But it works fine in ResNet50RetinaNet - COCO 2017.ipynb. Those bounding boxes look nice.

xyiaaoo on 22 Dec 2017

The points for cv2.rectangle are given in image coordinates (x, y). The order of the data in the output of the network is (xmin, ymin, xmax, ymax), so the upper left point would be (xmin, ymin) (equivalent to (b[0], b[1]) in the example).

hgaiser on 22 Dec 2017
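
As a rough sketch of the point above, the two helpers below take a box in (xmin, ymin, xmax, ymax) order and turn it into either corner points in image coordinates or row/column slices for indexing a matrix. The helper names and the example box are hypothetical and not part of keras-retinanet.

```python
def box_to_points(box):
    """Return the corners as (x, y) points, the order cv2.rectangle takes."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin), (xmax, ymax)


def box_to_slices(box):
    """Return (row_slice, column_slice): rows come from y, columns from x."""
    xmin, ymin, xmax, ymax = box
    return slice(ymin, ymax), slice(xmin, xmax)


b = (10, 20, 110, 70)  # made-up box: xmin, ymin, xmax, ymax

top_left, bottom_right = box_to_points(b)
rows, cols = box_to_slices(b)

# With OpenCV and a numpy image `draw`, these would be used roughly as:
#   cv2.rectangle(draw, top_left, bottom_right, (0, 255, 0), 2)
#   crop = draw[rows, cols]
```

The swap to (row, column) order only happens in `box_to_slices`, which is why the notebook can pass `(b[0], b[1])` and `(b[2], b[3])` to `cv2.rectangle` unchanged.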

So annotations in my custom dataset are also in image coordinates?

xyiaaoo on 22 Dec 2017

Yeah, that's right. All image coordinates and boxes follow the (xmin, ymin, xmax, ymax) order.

hgaiser on 22 Dec 2017

👍3
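
For a custom dataset, a quick sanity check can catch boxes that were accidentally stored in (row, column) order. This is only a sketch: `check_box` is a hypothetical helper, not something keras-retinanet provides, and it assumes that `xmax` may equal the image width and `ymax` the image height.

```python
def check_box(box, image_width, image_height):
    """Return a list of problems for a box in (xmin, ymin, xmax, ymax) order."""
    xmin, ymin, xmax, ymax = box
    problems = []
    if not xmin < xmax:
        problems.append("xmin must be smaller than xmax")
    if not ymin < ymax:
        problems.append("ymin must be smaller than ymax")
    # Assumption: coordinates on the far edge of the image are allowed.
    if xmin < 0 or xmax > image_width:
        problems.append("x values fall outside the image width")
    if ymin < 0 or ymax > image_height:
        problems.append("y values fall outside the image height")
    return problems


# Made-up example: an image 640 pixels wide and 480 pixels tall.
good = check_box((500, 100, 600, 200), 640, 480)
swapped = check_box((100, 500, 200, 600), 640, 480)  # x and y swapped

print(good)
print(swapped)
```

A swapped box is only caught when it ends up outside the image, so a check like this does not replace drawing a few annotations and looking at them.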
