Hello,
About "palm_detection.tflite" model,
How is the regressor output decoded into the actual boxes, and how are the anchor boxes generated? I've been trying to understand it for days now.
Thanks in anticipation.
Hi @lamarrr,
I'm also trying to understand the output formatting. Here's what I came up with so far:
Since this is an SSD-based architecture, I started by reading this great tutorial. By visualizing the model in Netron you can also see that it produces 3 prediction maps (unlike the 6 in SSD): 32x32, 16x16 and 8x8. For anchor generation, you can look at this file, which provides the general anchor-generation code.
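A minimal sketch of how anchors could be generated for the three prediction maps mentioned above. The 256x256 input size, the strides (8, 16, 32), the number of anchors per grid cell (2, 2, 6), the 0.5 center offset, and the fixed anchor size of 1.0 are all assumptions that are not confirmed in this thread, so check them against the graph config that ships with your model. `generate_anchors` is a hypothetical helper, not a MediaPipe function.

```python
def generate_anchors(input_size=256,
                     strides=(8, 16, 32),
                     anchors_per_cell=(2, 2, 6),
                     offset=0.5):
    """Return a list of (x_center, y_center, width, height) anchors.

    All coordinates are normalized to [0, 1]. Every default value here is an
    assumption; verify it against the configuration of your model.
    """
    anchors = []
    for stride, count in zip(strides, anchors_per_cell):
        grid = input_size // stride  # 32, 16 and 8 for the assumed defaults
        for y in range(grid):
            for x in range(grid):
                x_center = (x + offset) / grid
                y_center = (y + offset) / grid
                # Assumed fixed anchor size: width and height are always 1.0,
                # so every anchor in a cell shares the same values.
                anchors.extend([(x_center, y_center, 1.0, 1.0)] * count)
    return anchors


anchors = generate_anchors()
print(len(anchors))
```

With these assumed values the function returns 32*32*2 + 16*16*2 + 8*8*6 = 2944 anchors. If that count does not match the number of rows in the regressor output of your model, the assumed parameters are wrong for it.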
If you're interested I've learned how to build a tflite interpreter with support for custom operations used in the model, so you can run inference in Python.
I still have trouble figuring out how rotations are encoded though.
Hopefully someone helps us out soon.
We were assigned to this project where I intern.
I don't have a problem building the tflite interpreter in Python. I'm concerned about the overhead and the potential for errors. I was able to build a custom interpreter in C++, which is far easier. Also, my laptop has little RAM and processing power, so building the whole of TensorFlow makes it hang all the time.
I think it's transformed so that it's aligned with one of the hand landmark features.
I was later able to reverse engineer everything, and it works perfectly for me now, though it's not portable.
The regressors are primarily used to obtain an ROI for palm landmark detection.
I also used Netron to visualize it about 4 days ago. It only showed the classificators and regressors, and the problem is with the regressors. Since the AnchorGenerator is standalone, you can easily get its outputs with some modifications. This gives you the anchors that are used in decoding the output of the regressors.
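A rough sketch of the decoding step described above, given a list of anchors and the raw classificator and regressor outputs. It assumes each regressor row is laid out as `[x_center, y_center, width, height, kp0_x, kp0_y, ...]` in pixel units of a 256x256 input, that offsets are relative to the anchor center, and that scores are logits passed through a sigmoid. None of this is confirmed in the thread, and `decode_detections` is a hypothetical helper. Non-max suppression is left out.

```python
import math


def sigmoid(x):
    # Clip the logit so math.exp cannot overflow.
    x = max(-100.0, min(100.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def decode_detections(raw_boxes, raw_scores, anchors,
                      scale=256.0, min_score=0.5):
    """Decode raw regressor rows into normalized boxes and keypoints.

    raw_boxes:  one row per anchor (assumed layout: cx, cy, w, h, keypoints)
    raw_scores: one logit per anchor
    anchors:    (x_center, y_center, width, height) per anchor, normalized
    """
    detections = []
    for raw, logit, (ax, ay, aw, ah) in zip(raw_boxes, raw_scores, anchors):
        score = sigmoid(logit)
        if score < min_score:
            continue
        # Offsets are scaled to [0, 1] and added to the anchor center.
        cx = raw[0] / scale * aw + ax
        cy = raw[1] / scale * ah + ay
        w = raw[2] / scale * aw
        h = raw[3] / scale * ah
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        # Remaining values are (x, y) keypoint pairs, decoded like the center.
        keypoints = [(raw[i] / scale * aw + ax, raw[i + 1] / scale * ah + ay)
                     for i in range(4, len(raw) - 1, 2)]
        detections.append({"score": score, "box": box, "keypoints": keypoints})
    return detections
```

The returned boxes are `(xmin, ymin, xmax, ymax)` in normalized image coordinates, so multiply by the image width and height to draw them.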
In the end I didn't use MediaPipe itself, but a modified version of their code, and so far it works pretty well.
It's a really good framework but with poor packaging and tooling.
Really hoping to address issues like this over the course of my next internship.
But be sure to be familiar with Protobuf: most of the definitions are provided in the 'formats' directory, and the SSD anchors calculators use these lightweight Protobuf serialization formats. I think that is also what lets it communicate with the Java side through JNI.
Hi @wolterlw, we use the wrist and the MCP joint of the middle finger to compute the rotation. See here.
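As an illustration of the idea, here is a small sketch that derives a rotation angle from two decoded keypoints so that the wrist-to-middle-finger-MCP line ends up pointing straight up. The 90 degree target angle, the image convention (y grows downward), and the normalization to [-pi, pi) are assumptions, and `rotation_from_keypoints` is a hypothetical helper, not the actual MediaPipe code.

```python
import math


def rotation_from_keypoints(wrist, middle_mcp, target_angle=math.pi / 2):
    """Return the rotation (radians) that aligns wrist -> MCP with target_angle.

    wrist, middle_mcp: (x, y) points in image coordinates, y pointing down.
    """
    x0, y0 = wrist
    x1, y1 = middle_mcp
    # Negate dy because image y grows downward while angles are measured
    # counter-clockwise in the usual mathematical sense.
    angle = target_angle - math.atan2(-(y1 - y0), x1 - x0)
    # Wrap the result into [-pi, pi).
    return angle - 2 * math.pi * math.floor((angle + math.pi) / (2 * math.pi))


# A hand pointing straight up in the image needs no rotation.
print(rotation_from_keypoints((0.5, 0.8), (0.5, 0.4)))
```

A hand pointing to the right, for example wrist `(0.5, 0.8)` and MCP `(0.9, 0.8)`, gives a quarter turn (pi / 2) under these assumptions.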
Hi @lamarrr ,
Thank you for trying out MediaPipe. Your feedback is much appreciated! We are actively working on improving the framework, e.g. a better API and tools. We hope you will find it easier to use in the near future :)
Thanks 😉👍
Great job!
I'll be contributing and following up with the project as much as I can.
Hopefully I'll learn a lot from you guys.
@lamarrr I've been trying to make sense of the documentation for days as well. The documentation is not really well thought out, which makes it hard to reuse this great framework.