MixedRealityToolkit-Unity: Generic Input Action Handler

Created on 17 Apr 2019 · 8 comments · Source: microsoft/MixedRealityToolkit-Unity

Describe the problem

On the surface, Input Actions _appear_ to be a centralized place for defining user intents. For example, the application may have actions for things like 'Fire', 'Save' and 'Add a Note'.

Inside the MRTK configuration settings, actions can be triggered by any number of input sources. For example, the 'Fire' intent can be triggered by speaking the word "Fire" as well as by pressing the 'X' button on a controller.

Conceptually, a developer would then assume that they could handle the Action (the intent) regardless of the source that triggered it. However, in the current design that is not the case.

In the current design, in order to handle the 'Fire' intent the developer must handle each source's event type. In this example they would need to handle the Speech Recognized event and check the event payload for the action, as well as handle the Controller Pressed event and check its payload for the action. This not only requires each intent handler to implement many interfaces, it also means that new sources added via configuration won't trigger until code is added to the handler.
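For illustration, here is a minimal sketch of what that looks like today. The interface and event names (IMixedRealityInputHandler, IMixedRealitySpeechHandler, MixedRealityInputAction) are MRTK v2's, but the exact namespaces, payload members, and action comparison may differ slightly between versions, so treat the details as an approximation rather than the definitive pattern.

```csharp
// Illustrative sketch only: handling the single "Fire" intent today requires
// one handler interface per source type, each with its own payload check.
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

public class FireBehavior : MonoBehaviour,
    IMixedRealityInputHandler,   // needed for the controller button
    IMixedRealitySpeechHandler   // needed for the voice command
{
    // The action ("Fire") configured in the MRTK profile and assigned in the inspector.
    [SerializeField]
    private MixedRealityInputAction fireAction = MixedRealityInputAction.None;

    // Controller path: check the payload to see whether this input maps to "Fire".
    // (Action comparison shown with ==; adjust to your MRTK version if needed.)
    public void OnInputDown(InputEventData eventData)
    {
        if (eventData.MixedRealityInputAction == fireAction) { Fire(); }
    }

    public void OnInputUp(InputEventData eventData) { }

    // Speech path: the same check, duplicated for a different source interface.
    public void OnSpeechKeywordRecognized(SpeechEventData eventData)
    {
        if (eventData.MixedRealityInputAction == fireAction) { Fire(); }
    }

    private void Fire() { /* the actual intent the developer cares about */ }
}
```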

Describe the solution you'd like

We would like a generic way to handle an Action regardless of the source that triggered it. Since many of the current sources have "starting" and "ending" events, this should be reflected too. The generic action handler would have Action_Started and Action_Ended. These would map to source events like Button_Pressed and Button_Released. A strategy would need to be defined for sources that do not provide both events. For example, a voice command would obviously trigger the Started event, but what should be done about Ended?
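Purely as a sketch of the request (none of these names exist in MRTK at the time of writing; the interface, its members, and the payload type used here are hypothetical), the generic handler could look something like this:

```csharp
// Hypothetical sketch only: this interface does not exist in MRTK; it
// illustrates the proposal of handling an action independent of its source.
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine.EventSystems;

public interface IMixedRealityInputActionHandler : IEventSystemHandler
{
    // Raised when any source (button press, voice keyword, gesture, ...)
    // starts the action.
    void OnActionStarted(BaseInputEventData eventData);

    // Raised when the source ends the action. Sources without a natural "end"
    // (e.g. a voice command) would need a defined strategy, such as raising
    // Started and Ended back to back.
    void OnActionEnded(BaseInputEventData eventData);
}
```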

Describe alternatives you've considered

The alternative, as described above, requires implementing each and every source event type interface.

Additional context

Add any other context or screenshots about the feature request here.

Labels: Feature Request, Input System, Urgency-Soon

All 8 comments

I agree on the convenience of having a mechanism to subscribe to input actions directly. We should look into it.

I don't fully get what you mean with:

implementing each and every source event type interface

Following the voice command example above, if you had a digital input action mapped both to a voice command and to binary controller inputs, I think you would have to implement just two interfaces: IMixedRealityInputHandler and IMixedRealitySpeechHandler. Do you have a use case in mind where you would need to implement more?

I also don't get:

new sources added via configuration won't trigger until code is added to the handler

If you have a global listener implementing IMixedRealityInputHandler for example, it will receive callbacks for all input sources raising those events. What did you mean there?
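For reference, a global listener along those lines looks roughly like the following. The RegisterHandler/UnregisterHandler calls shown are the pattern from later MRTK 2.x releases, and the exact registration entry point has moved between versions, so take the registration lines as an assumption rather than the canonical API.

```csharp
// Rough sketch of a global IMixedRealityInputHandler listener.
using Microsoft.MixedReality.Toolkit;
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

public class GlobalInputActionLogger : MonoBehaviour, IMixedRealityInputHandler
{
    private void OnEnable()
    {
        // Registration entry point varies by MRTK version; RegisterHandler<T>
        // is the pattern in later 2.x releases.
        CoreServices.InputSystem?.RegisterHandler<IMixedRealityInputHandler>(this);
    }

    private void OnDisable()
    {
        CoreServices.InputSystem?.UnregisterHandler<IMixedRealityInputHandler>(this);
    }

    // Receives OnInputDown/OnInputUp from every source that raises these
    // events, regardless of focus.
    public void OnInputDown(InputEventData eventData)
    {
        Debug.Log($"Input down for action: {eventData.MixedRealityInputAction.Description}");
    }

    public void OnInputUp(InputEventData eventData) { }
}
```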

I don't fully get what you mean with:

  • implementing each and every source event type interface

In the scenario above, at the end of Day 1 the application has two sources that can trigger the same action:

  1. Voice Command
  2. Controller Input

This requires the application developer to implement two interfaces:

  1. IMixedRealitySpeechHandler
  2. IMixedRealityInputHandler

Already this requires more interface implementations than is ideally necessary for the specific task the developer wishes to perform. This is because the developer doesn't actually care about _what was said_, nor does the developer care about _which button was pressed_. The developer only cares about the action the user wishes to perform. In this case, "The user wishes to fire".

I also don't get:

  • new sources added via configuration won't trigger until code is added to the handler

On Day 2, the UX Designer on the team decides to add a gesture that triggers the same action. However, when the application is run the gesture doesn't result in the expected behavior. This is because the Behavior developer didn't know that the UX Designer had added a new gesture trigger for the same action. In order for the gesture to work, the Behavior developer will need to go back into their code and implement the IMixedRealityGestureHandler interface. This breaks the separation of concerns between Interaction Design and Software Development.
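Concretely, the Day 2 configuration change forces an edit like this to the Day 1 behavior. This is only a sketch; the gesture handler signatures are MRTK v2's, and the Day 1 handler bodies are abbreviated.

```csharp
// The Day 1 behavior must grow a third interface before the new mapping works.
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

public class FireBehavior : MonoBehaviour,
    IMixedRealityInputHandler,
    IMixedRealitySpeechHandler,
    IMixedRealityGestureHandler   // added on Day 2 only because configuration changed
{
    [SerializeField]
    private MixedRealityInputAction fireAction = MixedRealityInputAction.None;

    // Day 1 handlers, abbreviated here; see the earlier sketch.
    public void OnInputDown(InputEventData eventData) { /* Day 1 controller path */ }
    public void OnInputUp(InputEventData eventData) { }
    public void OnSpeechKeywordRecognized(SpeechEventData eventData) { /* Day 1 speech path */ }

    // Day 2: the same action check, duplicated yet again for the gesture source.
    public void OnGestureStarted(InputEventData eventData)
    {
        if (eventData.MixedRealityInputAction == fireAction) { Fire(); }
    }

    public void OnGestureUpdated(InputEventData eventData) { }
    public void OnGestureCompleted(InputEventData eventData) { }
    public void OnGestureCanceled(InputEventData eventData) { }

    private void Fire() { /* the actual intent */ }
}
```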

If you have a global listener implementing IMixedRealityInputHandler for example, it will receive callbacks for all input sources raising those events. What did you mean there?

Not all sources that can trigger actions raise IMixedRealityInputHandler events. Speech and gestures are current examples of sources that trigger actions but surface them through IMixedRealitySpeechHandler and IMixedRealityGestureHandler rather than IMixedRealityInputHandler. Also, because the input system is extensible, any number of future extension services could trigger actions without raising IMixedRealityInputHandler events. Therefore, IMixedRealityInputHandler is not a catch-all handler for actions regardless of their source.

I can definitely see where you're coming from here with thoughts from your work on LUIS.

I can see the possibility of making the actions a lot more generic this way, and it's an idea worth exploring.

The fewer interfaces needed to implement the desired intent, the better.

Thanks @jbienzms, I find that much better explained now.

_For actions to truly abstract inputs, there should be a single interface to subscribe to them._

Would you say this is a fair summary of the issue?

Yes @luis-valverde-ms, I think that's a pretty good statement. Especially since the original name for them is "_Input_ Actions".

A good conceptual model to take a look at is WPF Commanding Overview. Though I will also say that I think the MRTK input system does a fair bit more abstraction than WPF commands (which is a good thing!). WPF commands are still very UI-centric, where the MRTK input system can route input from a variety of sources to a variety of targets, sometimes with no user interface at all.
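For reference, the WPF commanding model boils down to the contract below: every invoker (button, key binding, menu item) funnels into the same two members, which is the kind of single handler surface being asked for here. The interface is reproduced only to illustrate the analogy.

```csharp
// The shape of WPF's System.Windows.Input.ICommand: many invokers,
// one handler surface, analogous to a single handler for an MRTK action.
using System;

public interface ICommand
{
    // Whether the command can currently run (e.g. enables/disables a button).
    bool CanExecute(object parameter);

    // The single entry point every invoker calls to express the intent.
    void Execute(object parameter);

    // Raised when CanExecute may have changed.
    event EventHandler CanExecuteChanged;
}
```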

With a more abstract input system like MRTK, I find it helpful to conceptualize Actions as Intents. I know this was not the original design goal for Input Actions, so I do realize I'm pushing the boundary here a bit. But to get an understanding of what I mean, see Intents in Natural Language for Simulations.

From an Intent perspective it would be more accurate to say:

_To best respond to the user intent, there should be a single interface to receive that intent._

Again, I realize this is taking things a bit further than what was originally designed so your original sentence is perfectly fine. I just wanted to share a few more thoughts on Intent Handling vs Input Handling.

Thanks Luis for taking the time here. We really appreciate the conversation.

That article on LUIS is very interesting and somehow disorienting to read as a person named Luis.

Partially fixed by #4475. Created #4576 to keep track of the remaining changes.

I'm going to close this one and keep #4576 for the remaining work. @jbienzms feel free to reopen if you feel it is better to keep this too.
