Hi,
Thanks for the questions.
1) To crop the correct image region, we first rotate the image so that the vector connecting the wrist and the MCP is vertical. Then we expand the palm square in every direction by a fairly large scale factor, chosen based on metrics from our experiments. In addition, our model is trained with heavy augmentation so that it copes with variation in where the hand lies within the cropped region. You can find the implementation details in the MediaPipe hand tracking graph.
2) You are absolutely correct that fine details of the hand are not predicted well. This is mostly because the model does not yet handle motion blur well enough, and of course the model itself is not perfect yet.
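A minimal Python sketch of the cropping idea in point 1, under stated assumptions: keypoints are `(x, y)` pixel coordinates with y growing downward; "MCP" is whichever MCP keypoint the graph uses; the scale factor `2.6` and the `0.5` center shift toward the fingers are illustrative values, not confirmed settings of the MediaPipe graph. `SquareROI`, `hand_roi_from_palm` and `jitter_roi` are hypothetical helpers, and `jitter_roi` only illustrates the kind of crop augmentation mentioned, with made-up ranges.

```python
import math
import random
from typing import NamedTuple, Tuple

Point = Tuple[float, float]  # (x, y) in pixels, y grows downward


class SquareROI(NamedTuple):
    cx: float     # crop center x (pixels)
    cy: float     # crop center y (pixels)
    side: float   # side length of the square crop (pixels)
    angle: float  # CCW rotation (radians, y-up sense) that makes wrist->MCP point straight up


def normalize_radians(a: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return a - 2 * math.pi * math.floor((a + math.pi) / (2 * math.pi))


def alignment_angle(wrist: Point, mcp: Point) -> float:
    """Rotation needed so the wrist->MCP vector becomes vertical (pointing up)."""
    dx = mcp[0] - wrist[0]
    dy = -(mcp[1] - wrist[1])  # flip y so angles are measured in a y-up frame
    return normalize_radians(math.pi / 2 - math.atan2(dy, dx))


def rotate_ccw(v: Point, angle: float) -> Point:
    """Rotate a y-up vector counterclockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def hand_roi_from_palm(wrist: Point, mcp: Point, palm_center: Point,
                       palm_size: float, scale: float = 2.6,
                       shift: float = 0.5) -> SquareROI:
    """Expand a palm square into a larger, rotated hand crop.

    `scale` and `shift` are illustrative assumptions, not confirmed values.
    """
    dx, dy = mcp[0] - wrist[0], mcp[1] - wrist[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("wrist and MCP keypoints coincide")
    ux, uy = dx / norm, dy / norm
    # Move the center toward the fingers so the expanded square covers them.
    cx = palm_center[0] + ux * shift * palm_size
    cy = palm_center[1] + uy * shift * palm_size
    return SquareROI(cx, cy, palm_size * scale, alignment_angle(wrist, mcp))


def jitter_roi(roi: SquareROI, rng: random.Random, max_shift: float = 0.1,
               scale_range: Tuple[float, float] = (0.9, 1.1),
               max_rot: float = math.radians(15)) -> SquareROI:
    """Randomly perturb a crop for training-time augmentation (hypothetical ranges)."""
    s = roi.side
    return SquareROI(
        roi.cx + rng.uniform(-max_shift, max_shift) * s,
        roi.cy + rng.uniform(-max_shift, max_shift) * s,
        s * rng.uniform(*scale_range),
        normalize_radians(roi.angle + rng.uniform(-max_rot, max_rot)),
    )


# Hand lying sideways: MCP to the right of the wrist, so the crop needs a 90-degree turn.
roi = hand_roi_from_palm(wrist=(100, 200), mcp=(200, 200),
                         palm_center=(150, 200), palm_size=100)
print(roi)
print(rotate_ccw((1.0, 0.0), roi.angle))  # approximately (0, 1): pointing straight up
```

The sketch only computes the rotated square; actually extracting the pixels (for example with an affine warp around the crop center) is left out. Jittering the crop at training time is one way to make a model tolerant of imperfect crops, which is the motivation given above for the augmentation.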
We are continuing to work on improving the model quality in various aspects. Your feedback is much appreciated!
This is very helpful. Looking forward to your future work. Thank you!