Mediapipe: How to obtain unnormalized landmark coordinates?

Created on 26 May 2020 · 3 Comments · Source: google/mediapipe

I am currently running the mediapipe_multi_hands_tracking_aar_example.
Inside the MainActivity class I can obtain the normalized x and y coordinates of the landmarks by calling getX() and getY() on the received NormalizedLandmarks. However, I am trying to extract the box of pixels around the fingertips for each frame, and for that I need the unnormalized, precise coordinates of the fingertip landmarks. The incoming frames are 1080 x 1440 pixels, but the normalized landmark coordinates all lie in [0, 1]. Using the normalized landmark coordinates to map out the actual fingertip pixels will not be accurate at all.

How can I obtain the coordinates of the original, unnormalized landmarks and use them in my MainActivity? If this is possible, what do these unnormalized coordinates represent? Are they pixel coordinates? Thank you.

All 3 comments

Why do you think that it will not be accurate at all? The landmarks are at least single-precision floating-point values.
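
To make the precision point concrete, here is a minimal Java sketch (not MediaPipe code; the class and method names are made up for illustration). It stores every pixel index of a 1440-pixel and a 1080-pixel axis as a normalized 32-bit float, scales it back, and counts how many indices fail to come back unchanged.

```java
final class FloatRoundTrip {
    private FloatRoundTrip() {}

    /** Counts pixel indices in [0, size) that do not survive pixel -> normalized float -> pixel. */
    static int countMismatches(int size) {
        int mismatches = 0;
        for (int pixel = 0; pixel < size; pixel++) {
            // Normalized coordinate held in a 32-bit float, as a landmark value would be.
            float normalized = (float) pixel / size;
            // Scale back to pixel space and round to the nearest index.
            int restored = Math.round(normalized * size);
            if (restored != pixel) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("1440-pixel axis: " + countMismatches(1440) + " mismatches");
        System.out.println("1080-pixel axis: " + countMismatches(1080) + " mismatches");
    }
}
```

Both counts should be zero, because the rounding error of a float in [0, 1], even after multiplying by 1440, stays far below half a pixel.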

I am successfully extracting the normalized landmark coordinates and then processing them further for on-frame drawing without any issues.

The beauty of normalized coordinates is that they [generally] do not care what size of output is used.
Using getX()*width & getY()*height should be plenty accurate.

Let's say you have a hand coordinate, in pixel space, of (123, 123) in your 1080 x 1440 image, and it moves 1 pixel in y to (123, 124).
In normalized coordinates the y value goes from
123/1440.0 = 0.08541666666666667
to
124/1440.0 = 0.08611111111111111
meaning there is a difference of
abs(0.08541666666666667 - 0.08611111111111111) = 0.000694444444444442
which floating point can easily handle: a single-precision float resolves steps smaller than 1e-7 anywhere in [0, 1], so normalized coordinates are even more accurate than integer coordinates and allow for sub-pixel accuracy.
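
For the original goal of cutting out a box of pixels around a fingertip, the conversion could look roughly like the sketch below. It is plain Java with no MediaPipe dependency: the class name, the helpers toPixel and fingertipBox, and the halfSize parameter are hypothetical, and in MainActivity the two normalized inputs would come from getX() and getY() on the landmark. The sketch assumes the 1080 x 1440 frame is 1080 wide and 1440 tall, rounds to the nearest pixel, and clamps the result to the frame as a defensive measure.

```java
import java.util.Arrays;

final class LandmarkPixels {
    private LandmarkPixels() {}

    /** Scales a normalized coordinate to a pixel index, clamped to [0, size - 1]. */
    static int toPixel(float normalized, int size) {
        int pixel = Math.round(normalized * size);
        return Math.max(0, Math.min(size - 1, pixel));
    }

    /**
     * Returns {left, top, right, bottom} of a square box centered on the landmark.
     * right and bottom are exclusive, and the box is clipped to the frame.
     */
    static int[] fingertipBox(float normX, float normY, int width, int height, int halfSize) {
        int centerX = toPixel(normX, width);
        int centerY = toPixel(normY, height);
        int left = Math.max(0, centerX - halfSize);
        int top = Math.max(0, centerY - halfSize);
        int right = Math.min(width, centerX + halfSize + 1);
        int bottom = Math.min(height, centerY + halfSize + 1);
        return new int[] {left, top, right, bottom};
    }

    public static void main(String[] args) {
        // Stand-ins for landmark.getX() and landmark.getY().
        float normX = 0.5f;
        float normY = 124f / 1440f;
        int[] box = fingertipBox(normX, normY, 1080, 1440, 16);
        System.out.println(Arrays.toString(box));
    }
}
```

Rounding (rather than truncating) matches the arithmetic above, where pixel 124 is written as 124/1440; if sub-pixel positions matter, keep normalized * size as a float instead of converting to an int.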

I realize my misinterpretation now. Thank you all for the input. I do have a new issue about accessing raw camera frames and would appreciate any help. https://github.com/google/mediapipe/issues/793
