MixedRealityToolkit-Unity: [Plan of Record]: controller polling

Created on 1 Mar 2019 · 16 comments · Source: microsoft/MixedRealityToolkit-Unity

Overview

The purpose of this proposal is to decide on the general architecture for polling inputs from controllers. The solution proposed is to implement polling separately for each controller and expose it either via static utility classes or via extensions to IMixedRealityInputSystem.

Examples

In the following pseudo-code examples we'll use articulated hands, but the concept should be applicable to all controller types.

Using utility classes

static class ArticulatedHandsUtils
{
   // Returns true and the pose for a given hand joint, or false if not available
   public static bool TryGetHandJointPose(Handedness handedness, Joint joint, out Pose pose);
}

// Client code must import the utility class
using static ArticulatedHandsUtils;

class Script : MonoBehaviour
{
   void Update()
   {
      Pose tipPose;
      if (ArticulatedHandsUtils.TryGetHandJointPose(Handedness.Right, Joint.IndexTip, out tipPose))
      {
         // ...
      }
   }
}
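
For illustration, here is a minimal sketch of how the utility's body might look, polling the input system's list of detected controllers. The IMixedRealityHand interface, its TryGetJoint method, and the DetectedControllers collection are assumptions made for the sake of the sketch, not a confirmed API.

static class ArticulatedHandsUtils
{
   public static bool TryGetHandJointPose(Handedness handedness, Joint joint, out Pose pose)
   {
      // Iterate the controllers the input system currently tracks and forward
      // the query to the first articulated hand with the requested handedness.
      foreach (IMixedRealityController controller in MixedRealityToolkit.InputSystem.DetectedControllers)
      {
         var hand = controller as IMixedRealityHand;   // hypothetical hand interface
         if (hand != null && controller.ControllerHandedness == handedness)
         {
            return hand.TryGetJoint(joint, out pose);
         }
      }

      pose = default(Pose);
      return false;
   }
}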

Using extensions

namespace Extensions.ArticulatedHands
{
   static class InputSystemExtension
   {
      // Returns true and the pose for a given hand joint, or false if not available
      public static bool TryGetHandJointPose(this IMixedRealityInputSystem inputSystem, Handedness handedness, Joint joint, out Pose pose);
   }
}

// Client code must import the extension's namespace
using Extensions.ArticulatedHands;

class Script : MonoBehaviour
{
   void Update()
   {
      // inputSystem is the registered IMixedRealityInputSystem instance
      Pose tipPose;
      if (inputSystem.TryGetHandJointPose(Handedness.Right, Joint.IndexTip, out tipPose))
      {
         // ...
      }
   }
}
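
The extension body could simply delegate to the same lookup shown in the utility-class sketch above; the extension form mainly buys discoverability, since the method shows up in IntelliSense on the input system instance. A minimal sketch, reusing the hypothetical utility from before:

namespace Extensions.ArticulatedHands
{
   static class InputSystemExtension
   {
      // Delegates to the utility sketched earlier; the inputSystem parameter
      // exists so the method is discoverable on IMixedRealityInputSystem.
      public static bool TryGetHandJointPose(this IMixedRealityInputSystem inputSystem,
                                             Handedness handedness, Joint joint, out Pose pose)
      {
         return ArticulatedHandsUtils.TryGetHandJointPose(handedness, joint, out pose);
      }
   }
}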

Pros and Cons

Pros

  • Better usability of custom APIs for each controller type compared with a generic API for all controllers.

Cons

  • Custom code for each controller type.

  • Discoverability: it may be difficult for users to make the connection between _I want to get a hand joint pose_ and _I need to use utility class/extension X_.

All 16 comments

Sounds like an easy way to put platform-specific code back into the presentation layer, which breaks the platform-agnostic approach we were looking to achieve.

> Sounds like an easy way to put platform-specific code back into the presentation layer, which breaks the platform-agnostic approach we were looking to achieve.

This has been another area of strong customer feedback.

It is acknowledged that there is a risk of developers placing platform specific logic in their code and the team is committed to providing guidance / examples of how to avoid such issues.

Prior to posting his proposal @luis-valverde-ms spoke to the team and specifically mentioned support for polling by interaction mappings as the preferred approach while still enabling "what is the state of the A button".

Rest assured that everyone involved cares deeply about our cross-platform commitment in MRTK.

Not fully understanding the relationships between the components of the MRTK Service/Manager/Profiles, and how these are consumed in a generic manner, makes it very difficult to understand how this proposal would not be in compliance with current standards and requirements.

That said, @StephenHodgson, documentation for XRTK and MRTK could use a detailed example of how to provide data profiles for specific controllers.

Else, I feel that we will continue to misinterpret the intent of the agnostic approach.

I'm also confused, because we ALSO poll the controller inputs in the Data Provider. MR used to be event-based, but we ripped that out when it caused issues, and now it polls.

Why would we then want to add another layer of polling on top of the polling we are already doing?

> This has been another area of strong customer feedback.

Where is this feedback, where is it documented, and what were the issues? If we keep "Re-architecting" the MRTK every time someone doesn't "get it", then all we are going to end up with is the HTK, which means you might as well abandon MRTK v2.

Sorry for sounding harsh, but we never see / hear about this "customer feedback" and know even less about the counter proposals to better educate our community.

> could use a detailed example of how to provide data profiles for specific controllers.

I def hear you there @hridpath. I'm working on that now.

I'm having a hard time trying to figure out exactly what problem this proposal is trying to solve.

If there is a native DLL where developers will be getting articulated hand data, why can't developers just call into that DLL directly in _any_ script? Seems rather pointless to request data from the input system if we're not going to route it through the input system.

> If we keep "Re-architecting" the MRTK...

It's more of a side-step imho. Which I understand, but at that point, there's not much of a reason to use the toolkit.

Don't get me wrong, I think we should def be showing how to do it in both workflows: the traditional approach of just creating a MonoBehaviour then consuming input in the fast and dirty way, and the performant enterprise version that was designed for v2. The important thing to keep in mind is you can't just do it halfway, because there's a lot going on that goes unseen. For example, maybe one service is listening for input for a specific thing that is now being rerouted out and around the framework, instead of being part of it.

> ... then all we are going to end up with is the HTK...

The input system in the old HTK is actually very similar to the MRTK's. The biggest difference is that the event handlers are now more generic as to the data they can receive and we've added input actions that can help developers have more control over which actions will trigger on a specific component.

Hello there! Excuse the lack of detail, I'm new to the open source way of life. I'll try to elaborate more on a few points.

The feedback leading to this proposal comes from early adopters of MRTK v2 who reached out directly to us with the concern that it was difficult to poll the current state of a controller. Their trouble was specifically with articulated hand controllers which, in our implementation to be released soon, do not fire events for finger joint poses. There are more than 25 joints in each hand, all updated every frame, so it seemed like a polling model made more sense there instead of firing 50+ events each frame or gathering all the data in hand pose events.

We did consider a generic approach that would make use of the interaction mappings in all controllers, something along these lines:

class MixedRealityInputSystem
{
    // Returns the value of the pose input identified by inputId (a controller-type-specific ID)
    static bool GetPoseInput<ControllerType>(Handedness handedness, int inputId, out Pose poseOut);
}

class Script : MonoBehaviour
{
    void Update()
    {
        Pose tipPose;
        if (MixedRealityInputSystem.GetPoseInput<IHandController>(Handedness.Right, IHandController.IndexTip, out tipPose))
        {
            // ...
        }
    }
}

// Possible implementation in MixedRealityInputSystem
static bool GetPoseInput<ControllerType>(Handedness handedness, int inputId, out Pose poseOut)
{
    foreach (IMixedRealityController controller in controllers)
    {
        if (controller is ControllerType && controller.handedness == handedness)
        {
            poseOut = controller.interactionMappings[inputId].Pose;
            return true;
        }
    }

    poseOut = default(Pose);
    return false;
}

We ruled it out because:

  • It wasn't deemed as user friendly as having a dedicated API per controller type/category.
  • Required having an interaction mapping for each joint in hand controllers, which is costly as MixedRealityInteractionMapping is currently quite large (160+ bytes × 25+ joints = 4+ KB per hand).
  • Each controller type would have to statically declare IDs for their inputs. At the moment, IDs (i.e. MixedRealityInteractionMapping::id) are assigned at runtime, although in most cases they are hardcoded in .DefaultInteractions.get().

Now, I don't consider the proposed solution to be necessarily platform specific, more like device category specific. I would like to see polling utils implemented in terms of the interface/base class defining a device category, e.g. IArticulatedHands/ArticulatedHandsController. Any specific controller implementation in this category can be polled via the same interface. This is immediately useful for device emulation in the editor.
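
To make that concrete, the device category could be expressed as an interface that every hand implementation (Windows Mixed Reality, Leap Motion, Magic Leap, editor emulation) satisfies, with the polling utilities sketched earlier written once against it. The names below are illustrative only, not a confirmed API:

// Hypothetical device-category interface for articulated hands.
// Any concrete hand controller supplies its own joint data through it,
// so the same polling code works for every implementation.
interface IArticulatedHands
{
   bool TryGetJointPose(Joint joint, out Pose pose);
}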

Let me know any more thoughts, doubts or suggestions. I'm particularly interested in pros and cons of helper classes vs extensions.

Right, that clarification makes more sense, and is more geared to the proper implementation for hand input, not about polling. So their issues were to do with the original hand tracking implementation you gave them.

Once that is in a feature release, we can take this feedback on board and use it as a criterion in its implementation.

In reality, consumers of the MRTK should never have to poll, as the input system's event-based logic should feed data directly to the user. Having multiple facilities polling the same data (especially hand data) is extremely expensive, and we should limit the requirement for that.

That being said, we shouldn't rule it out and we should keep it in mind going forward.

> Their trouble was specifically with articulated hand controllers which, in our implementation to be released soon, do not fire events for finger joint poses.
>
> There are more than 25 joints in each hand, all updated every frame, so it seemed like a polling model made more sense there instead of firing 50+ events each frame or gathering all the data in hand pose events.

Since we're dealing specifically with hands, maybe instead of drilling down into polling we could try abstracting UP to a robust helper class.

The first thing we see a lot of design-oriented devs write is some kind of wrapper that deals with gathering joint data and converts it to something more usable, e.g. transforms or (preferably) structs detailing joint position / orientation / velocity / whatever.

This helper class could pre-screen and relay hand-related events in a way that's more consumable, like detecting fingertip intersections. And it could help reduce a lot of common operations like calculating joint velocity or proximity. We've seen projects where this stuff gets calculated 5 times a frame by different systems using raw joint data.
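
A minimal sketch of such a helper, under the assumption that a TryGetHandJointPose-style polling call (as proposed above) exists underneath; HandJointData and HandDataHelper are invented names:

using System.Collections.Generic;
using UnityEngine;

// Hypothetical struct: raw joint poses converted into something more usable,
// with commonly derived data (velocity) computed once and shared.
struct HandJointData
{
    public Vector3 Position;
    public Quaternion Rotation;
    public Vector3 Velocity;
}

class HandDataHelper : MonoBehaviour
{
    readonly Dictionary<Joint, HandJointData> joints = new Dictionary<Joint, HandJointData>();

    void Update()
    {
        // Gather raw joint data once per frame and derive velocity here, so
        // different systems don't each recompute it from raw joint data.
        foreach (Joint joint in System.Enum.GetValues(typeof(Joint)))
        {
            Pose pose;
            if (ArticulatedHandsUtils.TryGetHandJointPose(Handedness.Right, joint, out pose))
            {
                HandJointData previous;
                joints.TryGetValue(joint, out previous);

                joints[joint] = new HandJointData
                {
                    Position = pose.position,
                    Rotation = pose.rotation,
                    // First-frame velocity will be wrong; a real helper would track validity.
                    Velocity = (pose.position - previous.Position) / Time.deltaTime
                };
            }
        }
    }

    // Consumers read the pre-computed results instead of polling raw joints.
    public bool TryGetJoint(Joint joint, out HandJointData data)
    {
        return joints.TryGetValue(joint, out data);
    }
}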

That is an excellent suggestion @Railboy, and it would cut down the impact to the MRTK. My only concern would be to ensure this helper class works for each hand input controller (Leap Motion / Magic Leap), or is the intent to create device-specific helper classes? (Again, no issue with that, it just ups the complexity.)

> or is the intent to create device-specific helper classes?

No, I feel like that would be a burden on devs. I'd rather deal with a universal helper that shaves a few edges off specific device capabilities on my behalf than with a zoo of hand types.

I think it'd be similar to the raycasting utility we have, correct?

I guess any proximity / collision detection would resemble the raycasting utility, in the sense that it does common work in a performant way, yeah.

Though the raycasting utility is totally passive, so a hand helper class would diverge on that point. It also doesn't do any event screening or aggregation or broadcasting, which is all stuff I'd expect.

In an app using hands you can assume common calculations will need to be done by default, like joint velocity. So a hand helper would take the initiative and do those automatically each frame, then make the results available, as opposed to doing them on demand (and wasting cycles every nth time it's demanded).

Makes sense to me!
Could we be doing all this stuff in a lower level native library and then exposing that data for anyone to consume? (Sounds like the plan is for this to be closed source anyway, so moving some of this there probably isn't a bad thing)

Not really sure. I don't want the discussion to veer too far off track from @luis-valverde-ms's initial proposal - if everyone feels a helper class has merit then maybe we can start a new issue and hash out how it ought to be implemented.

I think it makes total sense to provide commonly used derived data (derived from "raw" input) for articulated hands but, as @Railboy suggested, we should deal with the details in a separate proposal.

About redundant polling, I don't think that's implicit in this proposal. What we're trying to achieve is a way for users to access the current state of a controller. It is up to the controller how to provide that state: if it caches its inputs, which most controllers do, we won't be polling against whatever SDK produced those inputs.

Regarding polling vs. events, when it comes to controller inputs I think it is important to have both or, if you press me, just polling, because eventing can be built on top of it. The way I see it, events should only inform of state changes, e.g. a stationary six-DOF controller (on a table, for example) should not report its pose every frame via events. Polling allows you to obtain the actual state of the controller, so a user script that just started running (i.e. it hasn't been keeping track of the controller pose via events) can know that the controller is on the table.
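
As an illustration, eventing built on top of polling is little more than per-frame edge detection; a minimal sketch, reusing the hypothetical polling API from the proposal above:

using UnityEngine;

// Hypothetical sketch: derive a "pose changed" event from polled state.
// Only actual state changes are broadcast; a controller lying still on a
// table stays silent, yet its current pose can always be polled directly.
class HandPoseEvents : MonoBehaviour
{
    public event System.Action<Pose> PoseChanged;

    Pose lastPose;

    void Update()
    {
        Pose pose;
        if (ArticulatedHandsUtils.TryGetHandJointPose(Handedness.Right, Joint.IndexTip, out pose)
            && pose != lastPose)
        {
            lastPose = pose;
            PoseChanged?.Invoke(pose);
        }
    }
}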

Thoughts?

Like any other data input provider, the Hands component / helper would output data from its API, the MRTK Provider would translate this into the common understanding in the MRTK, which would then be fed to whichever Input system was registered, making it available to all components. The Provider would use whichever is the most valid pattern for gathering data, such as polling (which most other providers do already) as required.
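
A rough sketch of that flow, where NativeHandApi, NativeHandFrame, and RaiseHandJointsUpdated are all placeholders for whatever the device SDK and the input system actually expose:

// Hypothetical provider: polls the device SDK each frame and translates the
// result into the MRTK's common representation, which the registered input
// system then makes available to all components.
class HandDataProvider
{
    IMixedRealityInputSystem inputSystem;   // the registered input system
    IMixedRealityInputSource source;        // the input source representing this hand

    public void Update()
    {
        // Placeholder for the native/SDK polling call.
        NativeHandFrame frame = NativeHandApi.PollLatestFrame();

        foreach (var joint in frame.Joints)
        {
            // Translate SDK-specific joint data into the common MRTK format
            // before handing it to the registered input system.
            inputSystem.RaiseHandJointsUpdated(source, joint.Id, joint.Pose);
        }
    }
}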

There may need to be some slight change to the data structure MRTK will use for this special controller, ensuring it's abstracted enough (like we have done with other services) so that it's common enough to be updated by other providers for hand data, such as ML or Leap Motion.

There might need to be an extra access path, but we should ensure this is the right thing to do, given the performance impact that would likely have.
