Is it possible, using the Hand Tracking (GPU) example, to extract an array of keypoints instead of a video? Perhaps I didn't read the documentation and the example carefully enough; I apologize in advance.
@AndrGolubkov Have you tried this?
https://github.com/google/mediapipe/tree/master/mediapipe/models
https://www.tensorflow.org/lite/guide/inference
@ajinkyapuar Yes, I noticed that the models are available. But here the question is precisely in obtaining an array of coordinates instead of the rendered output
@AndrGolubkov It can't output precise coordinates! Look carefully at the graph below:
https://mediapipe.readthedocs.io/en/latest/hand_tracking_mobile_gpu.html
You will find that the output 2D keypoint locations are in uv-coordinates.
And in the file /mediapipe/tflite/tflite_tensors_to_landmarks_calculator.proto, you will find that the output z-coordinate is normalized, so it has no real-world scale.
If you look at the hand tracking graph as @MedlarTea mentioned, you can find the landmarks are output from the HandLandmarkSubgraph at stream "LANDMARKS:hand_landmarks".
You can find the definition of landmark here.
And @MedlarTea is correct about the scale: the output landmarks are in image coordinates.
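To illustrate what normalized image coordinates mean in practice, here is a minimal sketch that maps a normalized (x, y) pair to pixel coordinates. It assumes x and y lie in [0, 1] relative to the image width and height. The class `LandmarkToPixels` and the method `toPixel` are hypothetical names made up for this example; they are not part of MediaPipe.

```java
// Hypothetical helper for illustration only; not a MediaPipe class.
final class LandmarkToPixels {

    /**
     * Maps normalized coordinates (assumed to be in [0, 1]) to integer
     * pixel coordinates, clamped so the result stays inside the image.
     */
    static int[] toPixel(float normX, float normY, int imageWidth, int imageHeight) {
        int px = Math.round(normX * imageWidth);
        int py = Math.round(normY * imageHeight);
        // Clamp to the valid pixel range [0, size - 1].
        px = Math.max(0, Math.min(imageWidth - 1, px));
        py = Math.max(0, Math.min(imageHeight - 1, py));
        return new int[] {px, py};
    }

    public static void main(String[] args) {
        // A point in the horizontal center, a quarter of the way down a 640x480 frame.
        int[] p = toPixel(0.5f, 0.25f, 640, 480);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```

The z value is left out on purpose: as noted above, it is normalized and has no real-world scale, so it cannot be converted to pixels or metric depth this way.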
@AndrGolubkov
Are you asking about getting the landmark coordinates in C++ (e.g., to be used in another calculator), or getting them in Android to be consumed in the Android application?
@chuoling
How can I get "LANDMARKS:hand_landmarks" in Android?
I ran the code below, because "LANDMARKS:hand_landmarks" is a vector of protos, but it failed.
processor.getGraph().addPacketCallback("hand_landmarks", new PacketCallback() {
    @Override
    public void process(Packet packet) {
        PacketGetter.getVectorOfPackets(packet);
    }
});
I also think a function to get the type of a packet is necessary.
@chuoling I am interested in using and getting key points on iOS.
I have the same question. But I'm wondering: if the input is a 2D image, isn't it very hard to extract a 3D coordinate, unless the input is a depth image containing depth data?
@MedlarTea
But theoretically, it is possible to precisely extract keypoints by using the hand landmark model file combined with MediaPipe, isn't it?
I mean, if this were not possible, how could the rendered video mark those landmarks so exactly?
Hi @oishi89 ,
The model takes in RGB only and outputs 3D coordinates. We trained our model jointly with synthetic data that has 3D coordinates, and the model was able to generalize the z-coordinate to real images (although it's not perfect yet; we are actively working on it). You can read the Hand Landmark Model section in our blog post for more detail.
@AndrGolubkov @astinmiura
We'll look into how to best access such information in the iOS/Java API, and provide an example in the next release.
@chuoling Thank you very much, that would be great
@chuoling Thank you, we were hoping for such an API when we first read about this project. We all would appreciate this.
@Hemanth715 @AndrGolubkov @astinmiura Until such an API is available, we have an intermediate solution in C++. See issue #200, where we have an example of NormalizedLandmark protos.
@AndrGolubkov @astinmiura Fixed in v0.6.6. Please check it out and let us know.
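Once the landmarks are available in application code, turning them into the "array of keypoints" asked about at the top of the thread is a small step. The sketch below is only an illustration under stated assumptions: `Point3` is a hypothetical stand-in for one landmark with x, y, and z fields, not the actual MediaPipe proto class, and `KeypointArray.flatten` is a made-up helper name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a MediaPipe class.
final class KeypointArray {

    /** Hypothetical stand-in for one landmark (x, y, z). */
    static final class Point3 {
        final float x;
        final float y;
        final float z;

        Point3(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    /** Packs landmarks into a flat array laid out as [x0, y0, z0, x1, y1, z1, ...]. */
    static float[] flatten(List<Point3> landmarks) {
        float[] out = new float[landmarks.size() * 3];
        int i = 0;
        for (Point3 p : landmarks) {
            out[i++] = p.x;
            out[i++] = p.y;
            out[i++] = p.z;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Point3> landmarks = Arrays.asList(
                new Point3(0.5f, 0.25f, 0.0f),
                new Point3(0.75f, 0.5f, -0.125f));
        System.out.println(Arrays.toString(flatten(landmarks)));
    }
}
```

A flat layout like this is convenient for passing keypoints to code that expects a plain numeric buffer; a two-dimensional array would work equally well if per-landmark indexing matters more.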
Is there any way to extract the keypoints in Python so that I can use them in a VR project? Thank you :)