Azure-kinect-sensor-sdk: Random Forests instead of DNN on CPU Body Tracking

Created on 17 Feb 2020 · 12 Comments · Source: microsoft/Azure-Kinect-Sensor-SDK

CPU body tracking is simply way too slow (see the video below):
https://www.dropbox.com/s/j1cnjephldp95vr/Azure-Kinect-BodyTrackingv1.0-CPU-Test.MOV?dl=0

All cores on my Core i7 are running full tilt, and the body tracking is only going at 2 FPS.

The "random forest" technique that Kinect 2 used was fast, accurate, and reliable, and more importantly not locked to one GPU manufacturer. Random forest should be used for CPU mode, and not DNN.

https://onlinelibrary.wiley.com/doi/full/10.4218/etrij.13.2013.0063

It doesn't make sense to have CPU mode if it's completely unusable - having body tracking available to all systems is important!

Body Tracking Enhancement

fractalfantasy

👍5

Most helpful comment

The random forest 2.5D depth-based approach of the v2 definitely had some advantages (less power hungry, less latency, more precise index map).
I do like the superior detection of joints in scenarios when not facing the sensor and generally more plausible/stable poses of the current deep learning (3D?) model fitting approach though.

Indeed, as you mention, it sounds like a hybrid approach combining old and new methods, and/or switching between them, could be very useful for many scenarios.
And I certainly hope the BT team is open to look at the older v1/v2 research and tracking models to improve upcoming versions.

Maybe you should open a feedback idea, so we can all upvote?
https://feedback.azure.com/forums/920053

Brekel on 18 Feb 2020

👍4

All 12 comments

Yes, I thought of that, but I believe this falls more under the issues/bug category: CPU mode is available, but it's completely unusable. Without random forest the feature is essentially broken.

There are also people (including myself) asking for random forest within the "Body Tracking Without CudNN" topic: https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38129473-body-tracking-without-cudnn

Copying the code over from the Kinect v1 & v2 code base should be relatively simple and would make the Azure Kinect so much more powerful.

fractalfantasy on 18 Feb 2020

The CPU mode was provided as a non-NVIDIA alternative for customers who are interested in non-realtime processing of recordings, i.e., delayed-analysis applications. It was never intended to be used for (very) low frame rate realtime processing. We are working with the ONNX team to provide support for other GPUs.

There has been considerable research into skeleton tracking in the intervening 10+ years since the release of the original Kinect. The team evaluated re-implementing the dynamic forest approach vs a DNN and human model fitting approach. After consulting with existing customers and thinking longer term, the team opted for the improved robustness and accuracy of a DNN approach vs the performance of a dynamic forest approach. The current model fits in the middle of the GPU compute spectrum. We are currently investigating developing a lower-end model that sacrifices some accuracy for more performance (think 30 fps on a 1050-level GPU).

qm13 on 19 Feb 2020

I understand that DNN is ultimately the way forward, but by not providing a legacy tracking option you are closing off so many users from Azure Kinect.

It’s as if Apple had removed the headphone jack from the iPhone before Bluetooth headphones were feasible and widely available.

http://youtu.be/k_3Dm-iYGic

We use body tracking as a musical instrument for our project, and the DNN body tracking is simply too slow and unwieldy for this, even with a GTX 1080. We used our old Kinect 2 system across so many different laptops, but now we have to restrict it to high-powered Nvidia rigs? And even then, it runs worse than the old tracking.

That's great that you're looking to make DNN available on other GPUs. I’m not saying you should stop developing DNN body tracking and replace it with random forest, but at least have a legacy body tracking option for users who need solid performance.

Would it be such a huge undertaking to port the old code over?

fractalfantasy on 24 Feb 2020

👍2

@qm13 To us, the accuracy of Kinect V2 was good enough, and it delivered a solid 30 fps of skeletons. With more modern hardware and similar technology (random forests), I am sure 60 fps would be achievable.

Talking about robustness and accuracy of a DNN compared to random forests: this is really relative to how the tracking is going to be used in the end.

If you're using body tracking for static/individual frame analysis, I'm sure DNN is able to track poses where random forests would fail.

But if you consider dynamic movement, especially fast movements such as running, jumping, and gestures, performance is paramount, far more critical than individual-frame accuracy.

So if you ask me what I prefer, a 60 fps random forest approach or a 15 fps DNN approach, I would vote for the 60 fps random forest.
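To put the frame rates mentioned in this thread in perspective, the short sketch below converts frames per second into the time between consecutive skeleton updates. It is only illustrative arithmetic, not anything from the Body Tracking SDK: the function name `frame_interval_ms` is hypothetical, and the figure ignores exposure, transfer, and processing delay, so real end-to-end latency would be higher.

```python
def frame_interval_ms(fps: float) -> float:
    """Return the time between consecutive frames, in milliseconds.

    Hypothetical helper for illustration; it only inverts the frame rate.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted in the thread: 2 (CPU mode), 15, 30 (Kinect v2), 60.
for fps in (2, 15, 30, 60):
    print(f"{fps:>2} fps -> {frame_interval_ms(fps):.1f} ms between skeleton updates")
```

At 15 fps a new pose arrives roughly every 67 ms, versus about 17 ms at 60 fps, which is the gap being weighed here for fast movements.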

Another reason to use random forests, and so allow low-end devices to be used, is not only hardware cost (a GTX 1070 is way too expensive for most users here) but also energy savings. Our software runs all day, and it's not very green to keep a high-end GPU burning kilowatts as if there were no tomorrow. It's funny to think that special care was taken to make the new Kinect consume as little energy as possible, only to tie it to a GTX 1070 in afterburner mode.

vpenades on 10 Mar 2020

👍2 ❤1

@qm13 There is a customer request in this regard too, and it was marked as Planned three months ago. If the Body Tracking SDK team has decided not to provide an alternative, non-DNN SDK for Azure Kinect, it may be good to open source the body tracking model and SDK used by Kinect-v2. This way the interested parties may try to upgrade it by themselves and reuse it for their own Azure Kinect apps and use cases.

rfilkov on 12 Mar 2020

👍1

Not sure if @qm13 is still reading this thread. Do we need to start another feature request to specifically ask for a random forest?

I’m sure the majority of those who upvoted "Body Tracking Without CudNN" wanted a fast real-time option for non-gaming rigs, not a slow offline DNN option. Looking through the comments, that seems to be the case.

fractalfantasy on 12 Mar 2020

Hi @fractalfantasy, I'm sure all involved parties get e-mail notifications for closed issues as well; whether they would like to reply is a different question.

I'm not sure, though, whether anyone still cares about these feature requests. See the request for a hand state classifier, for instance. It has almost the same story as the non-DNN request. Its initial result was that the BT team added the hands, hand tips and thumbs to the list of tracked joints, but forgot about the hand states. The second hand-state-classifier request only got a modified title, and still hangs in there waiting. But anyway, feel free to add an RF body tracking specific feature request, post the link here, and I'll upvote it right away.

rfilkov on 12 Mar 2020

@rfilkov I think a response may be slow simply because there are limited resources and there are probably more existentially important bugs for their team to work on at the moment. And there's the coronavirus going on, and the vast majority of Microsoft developers are working from home now, so their dogs are going to distract them.

Greendogo on 12 Mar 2020

OK, so I threw together a feature request. I tried to include all your points, @rfilkov @Brekel @vpenades. I wrote this quite quickly, so please correct me if I missed something!

https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/39945454-legacy-body-tracking-like-kinect-v2

fingers crossed - really hope this works out!

fractalfantasy on 16 Mar 2020

👍2

I can concur that the accuracy and latency of Kinect V2 were really good, and whether to invest in a new camera comes down to whether the new Azure Kinect is as good as the K2 or better in this respect.
Using the Kinect as a truly real-time, interactive body tracking device is similar to playing a virtual musical instrument (even for non-musical apps): you need to feel that the latency is small and the response is fluid.
Otherwise it becomes useless, just as playing a keyboard with very high latency is impossible, or as trying to speak into a microphone with delayed speakers interrupts your brain.