Mediapipe: Hand keypoint extraction

Created on 29 Aug 2019 · 16 Comments · Source: google/mediapipe

Is it possible, using the Hand Tracking (GPU) example, to extract an array of keypoints rather than a video? Perhaps I did not read the documentation or study the example carefully enough; I apologize in advance.

Labels: feature request, question

AndrGolubkov

All 16 comments

@AndrGolubkov Have you tried this?
https://github.com/google/mediapipe/tree/master/mediapipe/models
https://www.tensorflow.org/lite/guide/inference

ajinkyapuar on 29 Aug 2019

@ajinkyapuar Yes, I noticed that the models are available. But my question is precisely about obtaining an array of coordinates instead of the rendered output.

AndrGolubkov on 2 Sep 2019

@AndrGolubkov It can't output precise coordinates! Look carefully at the graph below:
https://mediapipe.readthedocs.io/en/latest/hand_tracking_mobile_gpu.html
You will find that the 2D keypoint locations are output as uv-coordinates.
And in the file /mediapipe/tflite/tflite_tensors_to_landmarks_calculator.proto, you will find that the output z-coordinate is normalized, so it has no real-world scale.

MedlarTea on 3 Sep 2019
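
To illustrate what uv-style output means in practice, here is a minimal sketch of mapping normalized keypoints to pixel positions. It assumes the x and y values lie in the range [0, 1] relative to the image width and height, which the thread does not state explicitly. The class LandmarkPixelMapper and its methods are hypothetical helpers and not part of MediaPipe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MediaPipe.
// Assumption: u and v are normalized to [0, 1] over the image width and height.
final class LandmarkPixelMapper {
    private LandmarkPixelMapper() {}

    /** Converts one normalized (u, v) pair to {xPixel, yPixel}. */
    static int[] toPixel(float u, float v, int imageWidth, int imageHeight) {
        // Clamp to the valid pixel range in case a point falls slightly outside the frame.
        int x = Math.min(imageWidth - 1, Math.max(0, Math.round(u * imageWidth)));
        int y = Math.min(imageHeight - 1, Math.max(0, Math.round(v * imageHeight)));
        return new int[] {x, y};
    }

    /** Converts an array of {u, v} rows to a list of {xPixel, yPixel} pairs. */
    static List<int[]> toPixels(float[][] normalized, int imageWidth, int imageHeight) {
        List<int[]> pixels = new ArrayList<>(normalized.length);
        for (float[] uv : normalized) {
            pixels.add(toPixel(uv[0], uv[1], imageWidth, imageHeight));
        }
        return pixels;
    }
}
```

The z value is left out on purpose: as noted above, it is normalized and has no real-world scale, so it cannot be mapped to pixels or metric units this way.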

> @ajinkyapuar Yes, I noticed that the models are available. But my question is precisely about obtaining an array of coordinates instead of the rendered output.

If you look at the hand tracking graph that @MedlarTea mentioned, you can see that the landmarks are output from the HandLandmarkSubgraph on the stream "LANDMARKS:hand_landmarks".
You can find the definition of a landmark here.
And @MedlarTea is correct about the scale: the output landmarks are in image coordinates.

fanzhanggoogle on 6 Sep 2019
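
Once the landmarks from that stream are available in application code, turning them into the flat array of keypoints asked about in the original question is straightforward. The following is a minimal sketch only: Keypoint is a hypothetical stand-in with x, y and z fields, not the actual MediaPipe landmark type, and how the values are read from the stream is outside its scope.

```java
import java.util.List;

// Hypothetical helper, not part of MediaPipe.
final class KeypointArrays {
    private KeypointArrays() {}

    /** Hypothetical stand-in for a landmark; assumes three float fields x, y, z. */
    static final class Keypoint {
        final float x;
        final float y;
        final float z;

        Keypoint(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Flattens keypoints to [x0, y0, z0, x1, y1, z1, ...] in list order. */
    static float[] flatten(List<Keypoint> keypoints) {
        float[] out = new float[keypoints.size() * 3];
        int i = 0;
        for (Keypoint k : keypoints) {
            out[i++] = k.x;
            out[i++] = k.y;
            out[i++] = k.z;
        }
        return out;
    }
}
```

A flat float array like this is easy to log, serialize, or pass to other code, which is the reason for choosing it here over a list of objects.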

@AndrGolubkov
Are you asking about getting the landmark coordinates in C++ (e.g., to be used in another calculator), or getting them in Android to be consumed in the Android application?

chuoling on 7 Sep 2019

@chuoling
How do I get "LANDMARKS:hand_landmarks" in Android?

I ran the code below, because "LANDMARKS:hand_landmarks" is a vector of protos, but it failed.

processor.getGraph().addPacketCallback("hand_landmarks", new PacketCallback() {
  @Override
  public void process(Packet packet) {
    PacketGetter.getVectorOfPackets(packet);
  }
});

I also think a function to get the type of a packet is necessary.

astinmiura on 8 Sep 2019

👍1

@chuoling I am interested in using and getting key points on iOS.

AndrGolubkov on 9 Sep 2019

I have the same question. But I'm wondering: if the input is a 2D image, isn't it very hard to extract 3D coordinates, unless the input is a depth image containing depth data?

oishi89 on 11 Sep 2019

@MedlarTea
But theoretically, it is possible to extract keypoints precisely by using the hand landmark model file combined with MediaPipe, isn't it?
I mean, if this were not possible, how could the rendered video mark those landmarks so accurately?

metalwhale on 12 Sep 2019

> I have the same question. But I'm wondering: if the input is a 2D image, isn't it very hard to extract 3D coordinates, unless the input is a depth image containing depth data?

Hi @oishi89,
The model takes in RGB only and outputs 3D coordinates. We trained our model jointly with synthetic data that has 3D coordinates, and the model was able to generalize the z-coordinate to real images (although it's not perfect yet, and we are actively working on it). You can read the Hand Landmark Model section in our blog post for more detail.

fanzhanggoogle on 13 Sep 2019

👍3

@AndrGolubkov @astinmiura
We'll look into how to best access such information in the iOS/Java API, and provide an example in the next release.

chuoling on 16 Sep 2019

👍4 ❤3

@chuoling Thank you very much, that would be great

AndrGolubkov on 16 Sep 2019

👍1

@chuoling Thank you, we were hoping for such an API when we first read about this project. We all would appreciate this.

Hemanth715 on 17 Sep 2019

👍3

@Hemanth715 @AndrGolubkov @astinmiura Until such an API is available, we have an intermediate solution in C++. See issue #200, where we have an example using NormalizedLandmark protos.

mgyong on 5 Nov 2019

@AndrGolubkov @astinmiura Fixed in v0.6.6. Please check it out and let us know.

mgyong on 3 Dec 2019

Is there any way to extract the keypoints in Python so that I can use them in a VR project? Thank you :)

faizahmed618 on 6 May 2020

👍1
