Winit: A keyboard input model

Created on 6 Jan 2019  ·  149 Comments  ·  Source: rust-windowing/winit

TLDR: I think that Winit needs more expressive keyboard events and should follow a written specification to keep platform inconsistencies to a minimum. I propose to adapt the JS KeyboardEvent for winit and to follow the UI Events specification for keyboard input.

Winit is used for many applications that need
to handle different kinds of keyboard input.

  • Games: Physical location of keys like
    WASD for movement and actions.
    Text input for names and chat.
  • GUI applications: Text input and keyboard shortcuts.
  • the Servo Browser: Wants to support JS [KeyboardEvent] well.

Currently there are two events for text input in Winit:
[KeyboardInput] and [ReceivedCharacter].

pub struct KeyboardInput {
    pub scancode: ScanCode,
    pub state: ElementState,
    pub virtual_keycode: Option<VirtualKeyCode>,
    pub modifiers: ModifiersState,
}

The KeyboardInput event carries information about keys pressed and released.
scancode is a platform-dependent code identifying the physical key.
virtual_keycode optionally describes the meaning of the key.
It indicates ASCII letters, some punctuation and some function keys.
modifiers tells if the Shift, Control, Alt and Logo keys are currently pressed.

The ReceivedCharacter event sends a single Unicode codepoint. The character can
be pushed to the end of a string and if this is done for all events the user
will see the text they intended to enter.

Shortcomings

This is my personal list in no particular order.

  1. List of VirtualKeyCode is seen as incomplete (#71, #59).
    Without a given list it is hard to decide which keys to include
    and when the list is complete.
    Also it is necessary to define each virtual key code so multiple platforms will
    map keys to the same virtual key codes.
    While it is probably uncontroversial that ASCII keys should be included,
    for non-ASCII single keys found on many keyboards, like é, µ, or ü,
    it is more difficult to decide and to create an exhaustive list.
  2. While VirtualKeyCode should capture the meaning of the key there
    are different codes for e.g. "0": Key0 and Numpad0 or LControl and RControl.
  3. The ScanCode is platform dependent. Therefore apps wanting to use keys like
    WASD for navigation will assume a QWERTY layout instead of
    using the key locations.
  4. It is unclear if a key is repeated or not. Some applications only want to
    act on the first keypress and ignore all following repeated keys. Right
    now these applications need to do extra tracking and are probably not
    correct if the keyboard focus changes while a key is held down. (#310)
  5. A few useful modifiers like AltGraph and NumLock are missing.
  6. There is no relation between ReceivedCharacter and KeyboardInput
    events. While this is not necessary for every application some
    (like browsers) need it and have to use ugly (and incorrect) work-arounds. (#34)
  7. Dead-key handling is unspecified and IMEs (Input Method Editors) are not supported.
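
To illustrate shortcoming 4, here is a minimal sketch of the extra tracking an application has to do today to ignore repeated key presses. `RepeatFilter` and its methods are hypothetical names, not part of winit, and scancodes are modelled as plain `u32` values.

```rust
use std::collections::HashSet;

/// Hypothetical helper: remembers which scancodes are currently held down.
#[derive(Default)]
struct RepeatFilter {
    held: HashSet<u32>,
}

impl RepeatFilter {
    /// Returns true only for the first press of a key; repeats return false.
    fn on_press(&mut self, scancode: u32) -> bool {
        // `insert` returns false if the value was already in the set.
        self.held.insert(scancode)
    }

    fn on_release(&mut self, scancode: u32) {
        self.held.remove(&scancode);
    }

    /// Forget everything when focus is lost. A key that stays held across
    /// the focus change will then look like a fresh press, which is the
    /// kind of inaccuracy the shortcoming describes.
    fn on_focus_lost(&mut self) {
        self.held.clear();
    }
}
```

A `repeat` flag delivered with the event itself would make this bookkeeping unnecessary.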

In general there are many issues that are platform-dependent and where the
correct behavior is unclear or undocumented.
Both [alacritty] and [Servo], just to name two applications, have multiple
issues where people mention that keyboard input does not work as expected.

Proposed Solution

Winit is not the first software that needs to deal with keyboard input on
a variety of platforms. In particular the web platform has a complete
specification of how [keyboard events should behave], which is implemented on
all platforms that Winit aims to support.

While the specification talks about JS objects it can be easily ported
to Rust. Some information is duplicated in KeyboardEvent for
backwards compatibility but this can be omitted in Rust so Winit stays simpler.

See [keyboard-types] for what keyboard events can look like in Rust.

  • (shortcoming 1) VirtualKeyCode is replaced with a Key. This is an enum
    with all the values for functional keys and a variant for Unicode values
    that stores printable characters from the whole Unicode range.
    Specification
  • (shortcoming 2) is also addressed by this. There is just one value for keys
    like "Control" but if necessary one can distinguish left/right or
    keyboard/numpad keys by their location attribute.
  • (shortcoming 3) ScanCode is complemented by Code. Codes describe
    physical key locations in a cross-platform way.
    Specification
  • (shortcoming 4) a repeat attribute is added.
  • (shortcoming 5) All known modifier keys are supported.
    Specification
    Note: W3C decided to include some keys that are usually handled
    in hardware and don't emit keyboard events (like Fn, FnLock)
  • (shortcoming 6) received characters and keyboard events are now one
    (exceptions see below)
  • (shortcoming 7) to handle dead keys and IMEs a composition event
    is introduced. It describes the text that should be added at
    the current cursor position. Specification
    Note: The introduction of composition events makes it a bit harder to
    get "just the text" which is currently emitted by ReceivedCharacter.
    Either ReceivedCharacter is kept around for easier use or a utility
    function is provided that takes keyboard and composition events and
    emits the printable text.
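
As a rough illustration of the list above, a keyboard event modelled on the UI Events attributes could look like the following in Rust. This is only a sketch: all type and variant names are assumptions chosen for the example, only a handful of keys are listed, and modifiers are left out for brevity.

```rust
/// Where on the keyboard a key sits (mirrors the `location` attribute).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location { Standard, Left, Right, Numpad }

/// The meaning of a key (mirrors the `key` attribute).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Key {
    /// Printable text; a String because one key may need several code points.
    Character(String),
    Control,
    Shift,
    Enter,
    Unidentified,
}

/// The physical key position (mirrors the `code` attribute).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Code { KeyW, KeyA, KeyS, KeyD, ControlLeft, ControlRight, Unidentified }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyState { Down, Up }

#[derive(Debug, Clone, PartialEq, Eq)]
struct KeyboardEvent {
    state: KeyState,
    key: Key,
    code: Code,
    location: Location,
    /// True if this event was generated by key repeat.
    repeat: bool,
}

/// Example consumer: reacts to the first press of either Control key
/// without having to look at the location.
fn is_fresh_control_press(ev: &KeyboardEvent) -> bool {
    ev.state == KeyState::Down && ev.key == Key::Control && !ev.repeat
}
```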

Implementation

This is obviously a breaking change so there needs to be a new release of winit and release notes.
While the proposed events are very expressive it is possible to convert Winit to the new
events first and then improve each backend to emit the additional information about key-codes,
locations, repeating keys etc.

Thank you for writing and maintaining Winit! I hope this helps to get a discussion about keyboard input handling started and maybe some ideas or even the whole proposal is implemented in Winit.

needs discussion api enhancement platform parity

Most helpful comment

I still feel like it's important to have consistency across implementations. It's not ideal if client code needs platform-dependent code to implement basic functionality.

All 149 comments

Hi, and thanks for taking the time to put this together! Overall, I like the direction this is going, but there are some specific feedback points that come up for this.

VirtualKeyCode is replaced with a Key. This is an enum with all the values for functional keys and a variant for Unicode values that stores printable characters from the whole Unicode range.

Being more general on this would be a good change. I don't like using a full String for this, though - it introduces various issues that I'm not particularly happy with:

  • A full String is more difficult to match on than an enum, str, or char.
  • Unicode characters have multiple cases, while keyboard keys only have one case. This could introduce some tricky bugs into people's applications.

Unfortunately I can't think of a good replacement that's as flexible as a string while accounting for both of those issues, but it's something that rubs me the wrong way.

There is just one value for keys like "Control" but if necessary one can distinguish left/right or keyboard/numpad keys by their location attribute.

I like the idea of having a left/right enum to distinguish between sided keys. However, the Location enum should be exposed through variants in the Key enum (e.g. Ctrl(Location)), rather than on the main KeyboardEvent struct we expose.

ScanCode is complemented by Code. Codes describe physical key locations in a cross-platform way. Specification

Making scan codes platform-independent is certainly something we should do, although the W3C Code specification relies a bit too much on the layout of the US keyboard for my liking. Perhaps we should use some sort of numeric index for this? I feel we should also remove ScanCode support entirely, since it doesn't seem to provide any real use for cross-platform application programming. I'd be open to a counter-example, though.

Whatever mechanism we decide on, there should be some method for translating between Codes and Keys, for display purposes.

(shortcoming 5) All known modifier keys are supported. Note: W3C decided to include some keys that are usually handled in hardware and don't emit keyboard events (like Fn, FnLock)

I'd like to leave the hardware-handled keys out of our "officially supported" keys, but this would be a good change. We may also want to create a separate ModifiersChanged event, but that needs discussion and I'm not entirely sure it's the right move.

(shortcoming 4) a repeat attribute is added.
(shortcoming 6) received characters and keyboard events are now one (exceptions see below)
(shortcoming 7) to handle dead keys and IMEs a composition event is introduced. It describes the text that should be added at the current cursor position.

I'm down with all of these changes.

Hi, thanks for taking the time to review this!

Being more general on this would be a good change. I don't like using a full String for this, though - it introduces various issues that I'm not particularly happy with:

  • A full String is more difficult to match on than an enum, str, or char.

  • Unicode characters have multiple cases, while keyboard keys only have one case. This could introduce some tricky bugs into people's applications.

Unfortunately I can't think of a good replacement that's as flexible as a string while accounting for both of those issues, but it's something that rubs me the wrong way.

I couldn't agree more and would have preferred to use char instead. Matching is easy and the Key enum can implement Copy. The reason to use a String is that a key string has a base character and 0 or more combining characters because certain languages have keys that can't be represented with a single code point. (The problem with &str is that someone needs to own it and an enum is problematic because someone needs to decide which characters exist ahead of time and extend it for each new Unicode version.)

Because matching strings is so painful I wrote the ShortcutMatcher which is used by Servo. It is a quite convenient way to match keys and shortcuts. (Btw it ignores ASCII case and handles some other quirks)

Unicode characters have multiple cases, while keyboard keys only have one case.

One way to think about keyboard keys is that they have multiple levels. For example the "M" key on my keyboard has four levels that are accessed with different modifier keys: "m", "M", "µ", "º". These values should be a different Key. On the other hand, while the current VirtualKeyCodes can be typed without modifiers on a US-ASCII keyboard (I think) some variants like LBracket can only be accessed with modifier keys (in my case AltGr+8) on other keyboards. For these reasons I think it is preferable to have different Unicode cases in key values.

I like the idea of having a left/right enum to distinguish between sided keys. However, the Location enum should be exposed through variants in the Key enum (e.g. Ctrl(Location)), rather than on the main KeyboardEvent struct we expose.

Is there a specific reason to do it this way?
If there is ever a need to add a location to a key that previously did not have one (e.g. Backspace on num pad) this would be a breaking change.

Making scan codes platform-independent is certainly something we should do, although the W3C Code specification relies a bit too much on the layout of the US keyboard for my liking. Perhaps we should use some sort of numeric index for this?

One upside of using names referring to the US keyboard layout is that this layout is already familiar to a lot of people and there are plenty of diagrams and photos of the layout for quick reference. Classic scancodes are too short (8-bit) and vary between keyboards from different manufacturers. One language independent index used by X11 can be seen below. (search for X11 keycode names)

I'd like to leave the hardware-handled keys out of our "officially supported" keys

Yeah, there should be a list of supported modifiers for each platform in the docs.

We may also want to create a separate ModifiersChanged event, but that needs discussion and I'm not entirely sure it's the right move.

I am not sure when I would use ModifiersChanged event as modifier keys already send keydown and keyup events.

The problem with &str is that someone needs to own it and an enum is problematic because someone needs to decide which characters exist ahead of time and extend it for each new Unicode version.

There's a solution for using &str, actually - we could convert unicode Strings that are constructed at runtime into &'static strs as follows, then we can internally store a cache of keypress strings so that we don't consume additional memory for every keypress:

let string: String = "Hello".to_string();
// Construct a 'static string at runtime.
let x: &'static str = Box::leak(string.into_boxed_str());

That would let us pass &strs through the unicode variant and let people use string matching.
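
The cache mentioned above could be sketched roughly as follows. `KeyStringCache` and `intern` are hypothetical names; the idea is that each distinct key string is leaked at most once and later lookups return the same `&'static str`.

```rust
use std::collections::HashSet;

/// Hypothetical cache that hands out `&'static str` values for key strings.
#[derive(Default)]
struct KeyStringCache {
    interned: HashSet<&'static str>,
}

impl KeyStringCache {
    /// Returns a `'static` copy of `s`. Memory is leaked only the first
    /// time a given string is seen; repeated keypresses reuse the entry.
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&existing) = self.interned.get(s) {
            return existing;
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.interned.insert(leaked);
        leaked
    }
}
```

The leaked memory is bounded by the number of distinct key strings the user ever produces, which is the trade-off this approach accepts.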

For example the "M" key on my keyboard has four levels that are accessed with different modifier keys: "m", "M", "µ", "º". These values should be a different Key. On the other hand, while the current VirtualKeyCodes can be typed without modifiers on a US-ASCII keyboard (I think) some variants like LBracket can only be accessed with modifier keys (in my case AltGr+8) on other keyboards. For these reasons I think it is preferable to have different Unicode cases in key values.

The purpose of having Key-codes is to let the program figure out which keys have been pressed irrespective of any modifier-key presses - we'd want all of those characters to always be exposed under one key, since they're mapped to the same key. If you want to access the character that's outputted, taking into account modifier keys, you check the received character.

Is there a specific reason to do it this way? If there is ever a need to add a location to a key that previously did not have one (e.g. Backspace on num pad) this would be a breaking change.

Mainly, to make matching more ergonomic. If you wanted to match on both location and key with the types being separate, you'd have to do this:

match (key, location) {
    (Key::A, _) => (),
    (Key::B, _) => (),
    (Key::C, _) => (),
    (Key::Alt, _) => (),
    (Key::Ctrl, Location::Left) => (),
    (Key::Ctrl, Location::Right) => (),
    _ => ()
}

With them combined into one type, it looks like this:

match key {
    Key::A => (),
    Key::B => (),
    Key::C => (),
    Key::Alt(_) => (),
    Key::Ctrl(Location::Left) => (),
    Key::Ctrl(Location::Right) => (),
    _ => ()
}

The second version is nicer to read, and it also lets the reader know when a key's specific location is being ignored, versus when a key only has one possible location. The first version doesn't communicate that information.

Regarding adding a location to an existing key being a breaking change - there shouldn't be any reason we ever have to do that! Keyboard layouts are fairly static, and only a limited subset of keys are going to have multiple locations on the keyboard. We should be able to keep track of which ones have multiple locations and structure the enum as necessary.

One upside of using names referring to the US keyboard layout is that this layout is already familiar to a lot of people and there are plenty of diagrams and photos of the layout for quick reference. Classic scancodes are too short (8-bit) and vary between keyboards from different manufacturers. One language independent index used by X11 can be seen below. (search for X11 keycode names)

My main issue is that using the QWERTY keys to specify a layout-independent keymap feels against the spirit of providing such an API. Something in the vein of that X11 index seems like a decent solution, though.

I am not sure when I would use ModifiersChanged event as modifier keys already send keydown and keyup events.

If users could always keep track of which modifier keys have been pressed with keydown and keyup events, we wouldn't need to expose a modifiers parameter at all. The reason we expose them is because if someone presses a modifier key outside of the window then focuses the window, or presses the modifier key inside the window and unfocuses the window, the key-down/key-up events won't be properly delivered.

The reason I was floating a separate ModifiersChanged event was so that we wouldn't have to expose a modifiers variable alongside pretty much every window event, as we do now. However, I realize now that it would be simpler from a user's standpoint to provide stronger guarantees about keypress events so that they can reliably keep track of which keys have been pressed without running into the pitfalls described above (such as, guaranteeing to deliver a KeyUp event for every KeyDown event or automatically sending KeyDown events for all pressed keys when a user focuses the window).
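
A minimal sketch of the stronger guarantees described above, assuming the backend can tell which keys are held when focus returns. `PressedKeys` and `KeyEvent` are hypothetical names, and key codes are plain `u32` values.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
enum KeyEvent {
    Down(u32),
    Up(u32),
}

/// Hypothetical backend-side tracker that keeps Down/Up events balanced.
#[derive(Default)]
struct PressedKeys {
    down: BTreeSet<u32>,
}

impl PressedKeys {
    fn key_down(&mut self, code: u32) {
        self.down.insert(code);
    }

    fn key_up(&mut self, code: u32) {
        self.down.remove(&code);
    }

    /// On focus loss, emit a synthetic Up for every key still held, so
    /// each Down the application saw is matched by an Up.
    fn focus_lost(&mut self) -> Vec<KeyEvent> {
        std::mem::take(&mut self.down)
            .into_iter()
            .map(KeyEvent::Up)
            .collect()
    }

    /// On focus gain, emit a synthetic Down for every key the platform
    /// reports as currently held (`held_now` is an assumed input).
    fn focus_gained(&mut self, held_now: &[u32]) -> Vec<KeyEvent> {
        self.down = held_now.iter().copied().collect();
        self.down.iter().map(|&code| KeyEvent::Down(code)).collect()
    }
}
```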

Actually, regarding device-dependent virtual key-codes - what real purpose do they provide that isn't provided by exposing the received character and the device-independent key code? I can't think of a reason for using the virtual key-codes that isn't better-served by one of the other two methods; keyboard mappings should generally be done with the device-independent keys, and character input is best done with received character events.

The UI Events specification explains how keyboards work. It discusses why each part of the event is useful and how they relate to each other.

The purpose of having Key-codes is to let the program figure out which keys have been pressed irrespective of any modifier-key presses

Looks like we are talking about different things then. You seem to associate the visual markings on the key cap with key codes, while the UI Events specification and I refer to the functional mapping of the key.

If you want to access the character that's outputted, taking into account modifier keys, you check the received character.

What I propose is that the character that's outputted is the key. Received character is then redundant.

To match with separate key and location you can do this:

match event.key {
    Key::Home => ...
    Key::End => ...
    Key::Control if event.location == Left => ...
    Key::Control if event.location == Right => ...
    _ => ...
}

If a user does not care about key locations they don't have to
know they exist at all. On the other hand if key and location are one type
every user needs to know (or be told by the compiler)
which keys have multiple locations to write Key::Control(_i_dont_care).
(I expect this to be the common case.)

Looks like we are talking about different things then. You seem to associate the visual markings on the key cap with key codes. While the UI Events specification and I refer to the functional mapping of the key.

Not quite - if the user has switched their keyboard layout away from what's printed on their keyboard (say, to Dvorak) the key code would correspond to the remapped keybindings. Otherwise that seems fairly accurate.

What I propose is that the character that's outputted is the key. Received character is then redundant.

So, following the UI Events specification would have us mix character input and other keypresses (ctrl, alt, arrow keys, etc.) into a single API, right? I really don't like the idea of doing that. Having that API in addition to the physical key-press and character composition APIs leads to a situation where there's a lot of overlap for what each API does:

  • Functional key-press API (handles unicode characters most of the time and layout-agnostic key-presses for layout-agnostic keys)
  • Physical key-press API (layout-agnostic keypresses)
  • Character composition API (handles unicode characters sometimes, but only when they aren't tied to a single keypress)

The functional key-press API doesn't have its own specific purpose: sometimes it does things the physical keypress API does, and because it handles the majority of unicode input it makes the character composition API easy to ignore.

I'd rather only have two keyboard input APIs:

  • Physical key-press API (layout-agnostic keypresses)
  • Character input API (handles unicode characters and composition events)

Under this design, the purpose of each API is much more clear: the physical keypress API handles mapping each key to a function, and the character input API handles... well, all character input. Skimming through the UI Events spec it seems like it would be possible to map this API onto that, as well.

On the other hand if key and location are one type every user needs to know (or be told by the compiler) which keys have multiple locations to write Key::Control(_i_dont_care).

That's the point of merging those two events - to force users to decide whether they care or not. Whether you like that is up to personal preference, I guess; I like it because it improves the readability of the code (you know when someone's opting out of considering location vs. when there's no location to consider) and the documentation (we don't have to manually specify which keys have locations - if a key has a location, it's inherent to the declaration of the variant).

I'd rather only have two keyboard input APIs:

  • Physical key-press API (layout-agnostic keypresses)
  • Character input API (handles unicode characters and composition events)

Fine. How do you handle keyboard shortcuts like Control+Z (for undo)? Keep in mind that the placement of the Z key varies across common layouts and reasonable people may move the functionality of the Control key to another physical key.

How do you handle keyboard shortcuts like Control+Z (for undo)?

I... hmm.

That's something that crossed my mind briefly when I was first writing that comment, and I'll admit that that design doesn't handle this case well. Ideally, we'd be able to keep the same physical keymap across layouts (which is what you want for things like videogame keymaps), but that also leads to problems when other software developers haven't done that, causing our applications to violate those UX standards!

Something we could do is use the UIEvents-Code keycodes (or an equivalent), and structure keyboard events like this:

struct KeyboardInput {
    /// The pressed key, ignoring keyboard layout.
    ///
    /// Alphanumeric keys always correspond to their location on a QWERTY keyboard,
    /// regardless of whether or not the user is using an alternate keymap. For instance,
    /// pressing the Z key on a QWERTZ keyboard will result in `KeyCode::KeyY` getting
    /// sent. This also ignores any other remappings (e.g. even if the user has bound
    /// Control to Caps Lock, pressing the Caps Lock key will result in `KeyCode::CapsLock`.)
    ///
    /// This is useful for things like videogame keymaps, where the physical location of a
    /// key is more important than the actual key being pressed.
    physical_key: KeyCode,
    /// The pressed key, taking keyboard layout into account.
    ///
    /// If the user is using an alternate keyboard layout or has remapped any of their keys,
    /// their preferred mappings will be sent. Unlike `physical_key`, pressing Z on a QWERTZ
    /// keyboard will output `KeyCode::KeyZ`, and rebound keys as mentioned above will output
    /// the rebound key.
    ///
    /// This is useful for desktop application keymaps, where maintaining keybinding
    /// consistency with other applications is more important than the exact location of the
    /// key pressed.
    logical_key: KeyCode,
    /* other fields intentionally omitted */
}

EDIT: I have physical_key and logical_key using the same type to make it clear that they both have the same underlying variants. We may want to split them into separate types with the same internal layout, as we've done with DPI types, but that decision isn't important for establishing whether or not this general API is a good idea.
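
To make the split concrete, here is a small sketch of how the two fields might be consumed. The `KeyCode` variants, `Move`, `movement` and `is_undo` are assumptions for illustration; only the two field names come from the struct above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyCode { KeyW, KeyA, KeyS, KeyD, KeyY, KeyZ }

#[derive(Debug, Clone, Copy)]
struct KeyboardInput {
    physical_key: KeyCode,
    logical_key: KeyCode,
}

#[derive(Debug, PartialEq, Eq)]
enum Move { Forward, Left, Back, Right }

/// Game-style binding: the position of the key matters, so match on
/// `physical_key` (named after the QWERTY position).
fn movement(input: &KeyboardInput) -> Option<Move> {
    match input.physical_key {
        KeyCode::KeyW => Some(Move::Forward),
        KeyCode::KeyA => Some(Move::Left),
        KeyCode::KeyS => Some(Move::Back),
        KeyCode::KeyD => Some(Move::Right),
        _ => None,
    }
}

/// Desktop-style binding: the user's layout matters, so match on
/// `logical_key`.
fn is_undo(input: &KeyboardInput, control_held: bool) -> bool {
    control_held && input.logical_key == KeyCode::KeyZ
}
```

On a QWERTZ keyboard the key labelled Z would arrive with `physical_key: KeyY` and `logical_key: KeyZ`, so Control+Z still triggers undo while the WASD cluster stays where a player expects it.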

Well, this design is a lot better.

What happens if I want to detect the "Page Up" key on my numpad? If "Num Lock" is on I want to receive the character "9" instead.

There are two ways I can think of to do that:

  • Pass a location parameter alongside all keys that appear both on the numpad and elsewhere on the keyboard.
  • Always deliver the same keys regardless of whether or not numlock is pressed, and let the application handle translating them into special keys.

My feeling is that we should take the first approach for logical_key and the second approach for physical_key. That sacrifices some consistency across the two input methods, but it also matches better with their respective stated goals.

Pass a location parameter alongside all keys that appear both on the numpad and elsewhere on the keyboard.

I understand that if I press the "Page Up" key I will get a logical_key of PageUp(Standard) and if I press "Page Up" on the numpad I get PageUp(Numpad). Is this correct? But if "Num Lock" is active I will instead receive "9". So some logical keys now depend on the modifiers present?

  • What is the logical_key value for keys not found on un-shifted US keyboards?
  • What is the correct way to detect that a user pressed ":" (colon) for vim-style controls?

I understand that if I press the "Page Up" key I will get a logical_key of PageUp(Standard) and if I press "Page Up" on the numpad I get PageUp(Numpad). Is this correct? But if "Num Lock" is active I will instead receive "9". So some logical keys now depend on the modifiers present?

That is correct. I realize that this may be inconsistent with my stance on the alphanumeric keys, but it feels like there's a difference here since enabling/disabling numlock fundamentally changes how those keys interact with applications, rather than just outputting a different variation of a character.
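
The behaviour being agreed on here can be sketched as a small mapping function. All names are hypothetical, and only two numpad keys are covered: on a standard PC numpad, 9 doubles as Page Up and 7 doubles as Home.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Location { Standard, Numpad }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogicalKey {
    PageUp(Location),
    Home(Location),
    Digit(u8, Location),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysicalKey { Numpad7, Numpad9, PageUp, Home }

/// Resolves a physical key to a logical key. Numpad keys change meaning
/// with Num Lock; the dedicated navigation keys do not.
fn logical_key(physical: PhysicalKey, num_lock: bool) -> LogicalKey {
    match (physical, num_lock) {
        (PhysicalKey::Numpad9, true) => LogicalKey::Digit(9, Location::Numpad),
        (PhysicalKey::Numpad9, false) => LogicalKey::PageUp(Location::Numpad),
        (PhysicalKey::Numpad7, true) => LogicalKey::Digit(7, Location::Numpad),
        (PhysicalKey::Numpad7, false) => LogicalKey::Home(Location::Numpad),
        (PhysicalKey::PageUp, _) => LogicalKey::PageUp(Location::Standard),
        (PhysicalKey::Home, _) => LogicalKey::Home(Location::Standard),
    }
}
```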

What is the logical_key value for keys not found on un-shifted US keyboards?

You're talking about these sorts of keys, right?

(image: keyboard diagram with the keys in question marked in red)

For those, I'd use the Intl**** codes from the UIEvents-Code spec. Honestly, I'd lean towards replacing some of the standard US Keyboard values in that block with more international codes, seeing as there's a pretty wide range in what different keyboard locales put on those keys.

What is the correct way to detect that a user pressed ":" (colon) for vim-style controls?

Because vim mainly uses character input for its controls, I'd say to use the character input API.

Yes the keys marked red. But also those found on keyboards for non Latin scripts.

I understand that you want to use codes from the UIEvents-Code spec for the logical_key values. But these codes are almost arbitrary names to describe keys with a shared location but widely varying functions. I don't know when I would want to use those key values.


I don't think we can reach a consensus on keyboard events. You appear to prefer an API with just a physical location value and a separate API for character input. You made some additions to the keyboard API but it feels rather crude now and heavily relies on the assumption that you know every keyboard layout in existence and can predict how it will be used. (fixed number of key values, how does a numpad work, ...) I especially disagree with not providing an API for shifted keyboard symbols. This is available across Windows, Linux, Mac OS, but you prefer to only expose character data.

I would recommend that if winit changes its keyboard API it copies one from an existing system and does not try to have a unique variant.

Something we appear to agree on, is that there should be a code for physical keyboard locations. Maybe we can add this to the existing API?

I don't think we can reach a consensus on keyboard events. You appear to prefer an API with just a physical location value and a separate API for character input.

To be clear: I'd like character input to be delivered alongside the physical_key and logical_key values in the same event, just not expose the key as character input. Ideally, you'd have a keyboard input event structured like this:

struct InputEvent {
    keyboard_event: Option<KeyboardEvent>,
    composition_input: Option<CompositionEvent>,
}

struct KeyboardEvent {
    physical_key: PhysicalKey,
    logical_key: LogicalKey,
    key_state: ElementState,
}

enum CompositionEvent {
    Char(char),
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

That general structure associates character input with keyboard input, but exposes them as two separate things.
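
With that structure, recovering "just the text" could be a small helper along these lines. This is a sketch that assumes `CompositionEnd` carries the final committed string and that `CompositionStart` and `CompositionUpdate` only carry uncommitted preview text; `committed_text` is a hypothetical name.

```rust
enum CompositionEvent {
    Char(char),
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

/// Collects the text the user actually entered from a stream of events.
fn committed_text(events: &[CompositionEvent]) -> String {
    let mut text = String::new();
    for event in events {
        match event {
            // Plain characters are committed immediately.
            CompositionEvent::Char(c) => text.push(*c),
            // Assumed: the end event holds the finished composition.
            CompositionEvent::CompositionEnd(s) => text.push_str(s),
            // Preview text is skipped so it is not counted twice.
            CompositionEvent::CompositionStart(_)
            | CompositionEvent::CompositionUpdate(_) => {}
        }
    }
    text
}
```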

I'm not comfortable with exposing character input events and keyboard input events through the same enumeration (i.e. having enum Key {UnicodeKey(String), /*everything else*/}) for a couple of reasons: one, it creates an unnecessary stumbling block when creating keyboard shortcuts. Two, it hurts internationalization of keybindings.

About keyboard shortcuts: let's say that we exposed a Key enumeration similar to what's shown in the above paragraph, with UnicodeKey exposing shifted unicode values (as far as I understand, that's the structure you proposed initially). If somebody wanted to have control+z be a shortcut for undo, they might write this code:

match (key, modifiers) {
    (Key::UnicodeKey('Z'), Modifiers { control: true, alt: false, shift: false, logo: false }) => {
        /* whatever undo stuff */
    }
    _ => (),
}

The issue there is, because they're matching on Z and not z, that whole undo branch becomes dead code. It's not obviously dead code; there's no way for us to make the compiler warn about it, and it doesn't seem immediately unreasonable, but it's the sort of API design that leads to developers banging their head against our library wondering why code that they'd think should work doesn't.
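
The pitfall, and one way an application could sidestep it, can be sketched as follows. `Key` holds a `char` here purely for brevity, the type and function names are hypothetical, and the sketch assumes the platform reports a lowercase `z` when Shift is not held.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Modifiers {
    control: bool,
    alt: bool,
    shift: bool,
    logo: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    UnicodeKey(char),
    Enter,
}

/// The naive check: never true if an unshifted press is reported as 'z'.
fn is_undo_naive(key: Key, modifiers: Modifiers) -> bool {
    matches!(
        (key, modifiers),
        (
            Key::UnicodeKey('Z'),
            Modifiers { control: true, alt: false, shift: false, logo: false }
        )
    )
}

/// A case-insensitive check that accepts both 'z' and 'Z'.
fn is_undo(key: Key, modifiers: Modifiers) -> bool {
    let control_only = Modifiers { control: true, ..Modifiers::default() };
    match key {
        Key::UnicodeKey(c) => c.eq_ignore_ascii_case(&'z') && modifiers == control_only,
        _ => false,
    }
}
```

Ignoring ASCII case is similar in spirit to what the ShortcutMatcher mentioned earlier in the thread is described as doing.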

Regarding the second point: if a developer with a Latin-script keyboard creates a layout that associates 'a' with an action, and a Russian user (or some other user with a non-Latin keyboard) has a keyboard that doesn't output 'a' without some form of shifting, the non-Latin keyboard will in the best case have keybindings that require extra shifting to function; worst-case, the keybindings won't work at all. Conversely, non-Latin keybindings won't work on Latin keyboards, and an action bound to 'Б' will only work in a select few locales.

Neither of those are API compromises that I'm willing to accept. That's why I don't want to adopt the UI Events API verbatim - I think it's fundamentally flawed in ways that aren't obvious, but concretely harm both users and developers.


One thing that I haven't said but probably should've mentioned sooner: I'm in favor of having a mechanism for translating between our internal key enumeration and the default character output for the keyboard's layout. The intention would be to have a standardized internal structure for keyboard input and then display to the user whatever key value is associated with each particular key for their keyboard layout. I'm sorry I hadn't communicated that before - it's something that was in my head as a given, but seeing as I never wrote it down there's no way you would know that 😅.
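
Such a translation mechanism might look like the sketch below. The hard-coded table is an assumption made to keep the example self-contained; a real implementation would query the operating system for the active layout, and `key_label` is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KeyCode { KeyQ, KeyA, KeyW, KeyZ, KeyY }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layout { Qwerty, Qwertz, Azerty }

/// Returns the character to show the user for a layout-independent key code.
fn key_label(code: KeyCode, layout: Layout) -> char {
    match (layout, code) {
        // QWERTZ swaps Y and Z relative to QWERTY.
        (Layout::Qwertz, KeyCode::KeyY) => 'z',
        (Layout::Qwertz, KeyCode::KeyZ) => 'y',
        // AZERTY swaps A with Q and Z with W.
        (Layout::Azerty, KeyCode::KeyQ) => 'a',
        (Layout::Azerty, KeyCode::KeyA) => 'q',
        (Layout::Azerty, KeyCode::KeyW) => 'z',
        (Layout::Azerty, KeyCode::KeyZ) => 'w',
        // Everything else matches the QWERTY name of the code.
        (_, KeyCode::KeyQ) => 'q',
        (_, KeyCode::KeyA) => 'a',
        (_, KeyCode::KeyW) => 'w',
        (_, KeyCode::KeyZ) => 'z',
        (_, KeyCode::KeyY) => 'y',
    }
}
```

A settings screen could then store a binding as `KeyCode::KeyW` and still display "z" to a user on an AZERTY layout.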

Yes the keys marked red. But also those found on keyboards for non Latin scripts.
I understand that you want to use codes from the UIEvents-Code spec for the logical_key values. But these codes are almost arbitrary names to describe keys with a shared location but widely varying functions. I don't know when I would want to use those key values.

Hey, you've gotta have some sort of arbitrary code. QWERTY just happens to be one that isn't arbitrary for a large portion of the world.

I mentioned possibly using some index-based system above, but I've since changed my mind on that. All the foreign-script keyboards I've seen from googling have also had QWERTY markings alongside their non-Latin characters, and if you're programming in Rust you need to have some amount of familiarity with a Latin keyboard to even start using the language.

You made some additions to the keyboard API but it feels rather crude now and heavily relies on the assumption that you know every keyboard layout in existence and can predict how it will be used. (fixed number of key values, how does a numpad work, ...)

How are those unreasonable assumptions to make? From the research I've done, the only difference in keyboard layouts are:

  • Which characters get bound to the alphanumeric keys.
  • If they add any other, miscellaneous keys appropriate for that locale.

There are a limited number of "other, miscellaneous keys"; certainly few enough that we can expose them through a well-formed enum.

As far as assuming how a numpad works: it's a standard that keyboard manufacturers have settled on, and it seems to be standard across every keyboard that has a numpad. If we're making an abstraction we have to make assumptions somewhere, and there's nothing unreasonable about assuming this.

I especially disagree with not providing an API for shifted keyboard symbols. This is available across Windows, Linux, Mac OS, but you prefer to only expose character data.

What's the difference between exposing character input and shifted symbols? I've been working under the assumption that they're the same thing, but you're saying here that they're not; we may be talking about two different things here.

Something we appear to agree on, is that there should be a code for physical keyboard locations. Maybe we can add this to the existing API?

Yes, but I think we can go further with more comprehensive improvements. Like I've said elsewhere - I think that most of the ideas behind your proposal are good, I just don't agree with some of the specifics of how things should get exposed.

Thanks for the work narrowing down an input specification. @Osspial's latest suggestion appears adequate, except that I share @pyfisch's scepticism about use of "key location" codes to describe logical_key.

An example (using X11 key location names): 1 appears as AE01 on US keyboards, Shift+AE01 on Azerty, and AltGr+AC07 on my keyboard. It should be possible to bind an action to e.g. Ctrl+1 and have it work correctly on all these keyboards. In practice, this means a semantic binding to 1, not where 1 would appear on a US keyboard. (The fact that 1 is commonly duplicated on the numpad perhaps works well with @Osspial's suggestion to use Key1(Location) within an enum.)
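To make the difference between a location binding and a semantic binding concrete, here is a minimal sketch. The `Layout` enum and the `symbol` table are toy data invented for this example (they are not real keymaps and not part of any winit API); only the AE01 placements for US and AZERTY come from the comment above.

```rust
/// Toy layouts for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layout {
    Us,
    Azerty,
}

/// What symbol a (location, shift) pair produces on a given layout.
/// The table is deliberately tiny: it only covers the AE01 key.
fn symbol(layout: Layout, location: &str, shift: bool) -> Option<char> {
    match (layout, location, shift) {
        (Layout::Us, "AE01", false) => Some('1'),
        (Layout::Us, "AE01", true) => Some('!'),
        (Layout::Azerty, "AE01", false) => Some('&'),
        (Layout::Azerty, "AE01", true) => Some('1'),
        _ => None,
    }
}

/// A semantic binding: fires when Ctrl is held and the key produces '1',
/// regardless of where '1' lives or whether Shift was needed to reach it.
fn is_ctrl_1(layout: Layout, location: &str, shift: bool, ctrl: bool) -> bool {
    ctrl && symbol(layout, location, shift) == Some('1')
}
```

A binding on "AE01 without Shift" would work on the US layout but produce `&` on AZERTY, whereas the semantic check accepts both ways of reaching `1`.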

I also agree with @Osspial's point that the semantic value should be independent of case (people don't think of Ctrl+Z as being Shifted), however one has to be careful here: if an app is to react to Ctrl+1, then it should require the Ctrl modifier (left/right or both) but not care about the state of other modifiers...

... except, e.g., some systems use Ctrl+Z for Undo and Ctrl+Shift+Z for Redo. In general I think this can only be solved via the app checking only those modifiers it actually needs to check:

match key {
  Key::Key1 if modifiers.ctrl() => ctrl1_action(),
  Key::Z if modifiers.ctrl() && !modifiers.shift() => undo(),
  Key::Z if modifiers.ctrl() && modifiers.shift() => redo(),
  _ => (),
}

Regarding the symbolic Key / VirtualKeyCode, I believe the only good option is to produce a custom enumeration. Use of a String makes it too easy for users to use invalid codes without linting. Since apps should never match against this enum exhaustively, it can be #[non_exhaustive], making extension a non-breaking change.
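A rough sketch of what the `#[non_exhaustive]` approach could look like. The variant names and the `describe` function are made up for illustration and are not a proposed list.

```rust
/// Hypothetical symbolic key enum. Because it is `#[non_exhaustive]`,
/// code in other crates must include a wildcard arm when matching on it,
/// so adding variants later does not break them.
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Key {
    Enter,
    Escape,
    Tab,
    F1,
}

/// An application only names the keys it cares about.
pub fn describe(key: Key) -> &'static str {
    match key {
        Key::Enter => "confirm",
        Key::Escape => "cancel",
        // Everything else, including variants added in future versions.
        _ => "unhandled",
    }
}
```

Note that the attribute only changes behaviour for downstream crates; inside the defining crate the enum can still be matched exhaustively.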

This leaves the following still needing precise definition:

  1. Key-location codes (enum). The Codes specification is sufficient, or X11 names could be used (although I don't believe these extend to things like TV remotes).
  2. Symbolic key names (enum). The Key values specification could be used as a starting point. Alternatively the X11 key symbols could be used (approx. 1300 entries). This is #1266 (on hold, awaiting a decision here).

There are a few others who reported the duplicate key press event feature. It's annoying; is there anything I can do to help this along?

These #1220 #146 #1184 should all be merged as duplicates.

The concept of consumed modifiers is relevant for how keyboard shortcuts work with national keyboard layouts.

Regarding the symbolic Key / VirtualKeyCode, I believe the only good option is to produce a custom enumeration. Use of a String makes it too easy for users to use invalid codes without linting.

I disagree. An enumeration will need constant updates and will most likely still lack relevant characters. The authors of the X11 keyboard protocol realized this at some point and decided to use unicode codepoints (with a fixed offset) as keysyms for all characters without an existing keysym. Therefore I think a String or char should be used for printable symbols.

For example, on a German keyboard I can press AltGr+Shift+s, which produces ẞ (capital sharp s). This character is not found in the keysym table.
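For reference, the fixed-offset scheme mentioned above can be sketched like this. The function names are mine; the offset `0x01000000` and the valid range are the X11 convention for directly encoded Unicode keysyms as I understand it, so treat the exact bounds as something to double-check against the protocol documentation.

```rust
/// Offset X11 adds to a Unicode code point to form a "Unicode keysym".
const UNICODE_KEYSYM_OFFSET: u32 = 0x0100_0000;

/// Decode a directly encoded Unicode keysym. Named (legacy) keysyms are
/// not handled here and yield `None`.
fn unicode_keysym_to_char(keysym: u32) -> Option<char> {
    if (0x0100_0100..=0x0110_FFFF).contains(&keysym) {
        // `char::from_u32` rejects surrogates and out-of-range values.
        char::from_u32(keysym - UNICODE_KEYSYM_OFFSET)
    } else {
        None
    }
}

/// Encode a character as a Unicode keysym. Code points below U+0100 have
/// legacy keysyms instead, so this sketch returns `None` for them.
fn char_to_unicode_keysym(c: char) -> Option<u32> {
    let cp = c as u32;
    if cp >= 0x100 {
        Some(cp + UNICODE_KEYSYM_OFFSET)
    } else {
        None
    }
}
```

With this scheme the capital sharp s (U+1E9E) needs no table entry: it is simply keysym `0x01001E9E`.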

Regardless of String vs char vs enum, winit will have to map system-specific codes to its own symbolic code internally (excepting maybe on one platform if it copies that platform's symbolic code). Thus I don't see an advantage to a String over a non-exhaustive enum.

Regardless of String vs char vs enum, winit will have to map system-specific codes to its own symbolic code internally

Platforms, such as X11, already provide functions to map keysyms to Unicode strings, take for example xkb_keysym_to_utf8. Internally named keysyms are looked up from a table but the directly encoded Unicode keysyms are calculated.

Is there a decision to code to the lowest common denominator or to extend interfaces that lack some features? I'm interested in an interface to XQueryKeymap(), even if it doesn't exist on every supported platform. I believe the idea of the interface is universal and should be emulated for platforms that don't provide a method to query a key's state.

I still feel like it's important to have consistency across implementations. It's not ideal if client code needs platform-dependent code to implement basic functionality.

I also agree that the functionality should be exposed, and even if it is not emulateable on a platform then the API should be an Option/Result instead in that case to state that it was not performable.

  • I'm waiting on this prior to fixing bugs in my application's input handling, so I'd rather be helpful than argumentative. So take the following with the lens that I just don't understand the Winit project or its goals.

I understand, BUT. Think about how that looks for clients, do they just expect()? If they don't expect() then they will have this boilerplate code that could instead be part of Winit. I'm not aware of Winit's practice or precedent concerning this topic. I feel strongly that Winit should maintain an is key/button down query tool. Obviously there could/should(to me it doesn't matter) be a way to query if the implementation is using some form of emulation or a platforms provided backend.

I'm not opposed to this tool being part of another crate, I already maintain ash-tray that's an abstraction over Winit.

@cheako any chance we can keep this issue focussed on how to identify a key (scancode, symbol etc.) rather than on the state of the key? The issue is complex enough as it is. Also keep in mind that none of the people who commented here within the last year are core contributors to this project; I suspect like most, this project simply lacks funding.

TLDR;
"Is keydown" API should be explicit in the design of key events, not an afterthought. It would be a simple "go away, bugger off" to indicate that there will be no relation between these APIs or that an "is keydown" API will never be exposed. A sad thought, but a valid one.

Until the is keydown API is discussed, it would be incomplete to finish the API at the core of the discussion here. For example "How will clients tie the events to querying key state?" could be important and deserves fleshing out as part of designing a key event system. The consideration to allow or not the same (scancode, symbol etc.) could drive the decisions on how those are expressed, given that the backends have two ABIs and we are investigating relating them on the front end.

  • An example corner case would be to implement an action upon receiving an event, then at some instantaneous point later checking if the state continues.

When talking about "is keydown", currently, we are talking about filtering events into a state machine that tracks the keys. I've found it impossible "for me" to do this given the current implementation. That's my interest in this ticket and given that the first bullet point in the OP is "Games" I believe it's only natural to test if the proposed solution is well suited for that application. I assume the majority of games will use some kind of is keydown approach.
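For readers unfamiliar with the pattern, this is roughly what "filtering events into a state machine that tracks the keys" means. It is a minimal sketch: `ElementState` is redefined locally to mirror the name in winit's `KeyboardInput`, and `KeyStateTracker` is a hypothetical helper, not something winit provides.

```rust
use std::collections::HashSet;

/// Local stand-in mirroring winit's press/release state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ElementState {
    Pressed,
    Released,
}

/// Tracks which scancodes are currently held, based purely on events.
#[derive(Default)]
struct KeyStateTracker {
    down: HashSet<u32>,
}

impl KeyStateTracker {
    /// Feed one key event. Returns `true` if the tracked state changed,
    /// `false` for e.g. auto-repeat presses or releases of unknown keys.
    fn handle(&mut self, scancode: u32, state: ElementState) -> bool {
        match state {
            ElementState::Pressed => self.down.insert(scancode),
            ElementState::Released => self.down.remove(&scancode),
        }
    }

    /// The "is keydown" query, answered from accumulated events.
    fn is_down(&self, scancode: u32) -> bool {
        self.down.contains(&scancode)
    }

    /// Call on focus loss: releases may never arrive for keys held
    /// while the window loses focus.
    fn clear(&mut self) {
        self.down.clear();
    }
}
```

The weak points of this approach are exactly the ones raised here: it depends on press and release events using the same identifier, and on no release being dropped.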

I'm only developing in Linux and maybe KFreeBSD. Notably I'm unable to cross compile for wine. That said I'd be happy to break the API for the other platforms and if a group of like-minded people are thinking the same for the other platforms then it would be possible to write a pull request that passes a test suite on those platforms.

Covid19... Is anyone interested in starting a working group to tackle this?

I'm interested. But I should warn you that this is a really thorny problem and it will be difficult to agree on one design and to convince the core maintainers to merge it. Additionally all supported platforms will need to be changed.

Perhaps there should be one interface for every use case? That should remove the difficulty or at least move the complexity to client design time.

Edit: To be clear. A few interfaces where the best interface to use would be platform dependent, but each interface should be usable across all platforms. Instead of trying to bend backwards to write one interface that generally covers all use cases.

There are several (incomplete) proposals above. Perhaps a good starting point (if someone has the time) would simply be to create a table or short document comparing these.

As a next step, perhaps we should use the RFC model — let someone write up a proposal, and post as an RFC (maybe a new PR, maybe even just a new issue + a gist) — the important parts are to have a document and to have a focussed discussion topic.

But I am not a winit maintainer — could one of the maintainers clarify a choice of discussion model?

I think your proposed RFC strategy comes with too much of a maintenance burden. This can be mostly just implemented and iterated upon.

This can be mostly just implemented and iterated upon.

@chrisduerr In the previous discussion multiple different (and sometimes mutually exclusive) proposals have been made. Which one do you think should be implemented and iterated upon?

I don't have the time to go into too much detail here, but I think the gist of it is that it doesn't really matter. Stuff like hardware position that has been discussed a lot doesn't matter since it can be easily added after the fact based on all existing suggestions, though there's hardly ever a reason to use it if you ask me (even for games you don't want to map to location really).

Ultimately winit is not going to invent some magic new silver bullet here, because there is none. Focusing too much on the Web APIs is probably just going to be confusing, but just picking one existing solution that has somewhat aged well over time and going with that is probably the best solution.

An RFC would just be the same thing as this issue with an "RFC" tag attached to it. People are going to comment on how they don't like one aspect and the author is going to respond that they need that for reason XYZ. There's no advantage to RFCs over the current discussion, it's already a proposal and comments on that.

For games the solution is to allow the user to define key mapping... That's just how most games have worked. All winit needs to do is be consistent.

@chrisduerr

(even for games you don't want to map to location really).

Why wouldn't location be interesting for games?
I've been under the assumption that it's more important that the key for "move back" is below "move forward" and between "move left" and "move right" than to make the user re-configure their keybindings because their keyboard layout moves these keys around (virtually and on the keycaps).

@cheako

For games the solution is to allow the user to define key mapping... That's just how most games have worked.

That's certainly not how, say, CS:GO works. I can change the keyboard layout from QWERTY to DVORAK, but the keys I have to press to move around don't change. I'd assume that CS:GO inherits this behavior from the Source engine, which means that there are some high-profile games which use scancodes rather than virtual keycodes for keybinds.


Support for layout-independent keyboard input should be implemented directly in winit if it's going to be exposed at all. The way it's currently done is insufficient: scancodes are platform-dependent (this isn't documented anywhere), which is a bit of a letdown for a cross-platform windowing library.

In any case: Discussing whether to expose cross-platform layout-independent keyboard input or not seems pointless when @Osspial and @pyfisch seem to be very much in favor of it. It looks to me like it's a question of how and when, not if, it's going to happen.

Why wouldn't location be interesting for games?
I've been under the assumption that it's more important that the key for "move back" is below "move forward" and between "move left" and "move right" than to make the user re-configure their keybindings because their keyboard layout moves these keys around (virtually and on the keycaps).

Because it's basically useless. For a lot of stuff you want mnemonics, so that doesn't help at all. For location the only really relevant thing is WASD maybe, but if that's all you need just use the arrow keys. But even WASD can't be used universally because of different keyboard layouts, since the same "location" isn't always in the same place.

So one way or another people will have to remap their bindings. Focusing on something that doesn't even get fixed by the solution just doesn't make much sense. Especially when such a thing isn't essential to the protocol and can be easily added after the fact.

That's certainly not how, say, CS:GO works.

I've noticed a lot of games get this right (layout-independent codes) and a lot get it wrong. I use Colemak but bear in mind there are a lot of users with non-Qwerty layouts, e.g. Azerty.

Because it's basically useless. For a lot of stuff you want mnemonics,

Partly yes, partly no. Should Ctrl+Z be awkward to press on a Qwertz keyboard? (Okay, that's already too "standard" to fix.) Many games map far more than just WASD; sure, memorising all the keys is a pain, but in many cases location is more important than label.

scancodes are platform-dependent (this isn't documented anywhere)

Nevertheless a layout-independent mapping could be implemented over this via an extension mapping from location to scancode (and ideally also key label). IMO this may be the best route forward simply because it enables divide-and-conquer.

Nevertheless a layout-independent mapping could be implemented over this via an extension mapping from location to scancode (and ideally also key label).

Did you mean scancode -> location rather than location -> scancode here? I can imagine how you'd go about implementing the former (#[cfg(os)]-ed match-ing over the value of the scancode), but I can't say the same for the latter.

I guess it doesn't matter. One can configure location codes, and either map to scancode and match that against input, or map input scancode → location and match that against configured codes. Either way, the mapping function can be added later.
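A sketch of how one table could serve both directions. Everything here is hypothetical naming (`CodeSpace`, `KeyLocation`, the helper functions). The numeric values are the Linux evdev key codes and macOS virtual key codes for W/A/S/D as I remember them, and whether winit's `scancode` carries exactly these raw values on each platform is an assumption to verify. In winit itself the table would presumably be selected with `#[cfg(...)]` rather than a runtime parameter.

```rust
/// Which raw code space a scancode comes from (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CodeSpace {
    LinuxEvdev,
    MacVirtualKey,
}

/// Layout-independent key locations, named after the US QWERTY labels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeyLocation {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
}

/// One table, usable in both directions.
const TABLE: &[(CodeSpace, u32, KeyLocation)] = &[
    (CodeSpace::LinuxEvdev, 17, KeyLocation::KeyW),
    (CodeSpace::LinuxEvdev, 30, KeyLocation::KeyA),
    (CodeSpace::LinuxEvdev, 31, KeyLocation::KeyS),
    (CodeSpace::LinuxEvdev, 32, KeyLocation::KeyD),
    (CodeSpace::MacVirtualKey, 0x0D, KeyLocation::KeyW),
    (CodeSpace::MacVirtualKey, 0x00, KeyLocation::KeyA),
    (CodeSpace::MacVirtualKey, 0x01, KeyLocation::KeyS),
    (CodeSpace::MacVirtualKey, 0x02, KeyLocation::KeyD),
];

/// scancode -> location: translate incoming input.
fn location_from_scancode(space: CodeSpace, scancode: u32) -> Option<KeyLocation> {
    TABLE
        .iter()
        .find(|(s, c, _)| *s == space && *c == scancode)
        .map(|(_, _, l)| *l)
}

/// location -> scancode: translate configured bindings once, up front.
fn scancode_from_location(space: CodeSpace, location: KeyLocation) -> Option<u32> {
    TABLE
        .iter()
        .find(|(s, _, l)| *s == space && *l == location)
        .map(|(_, c, _)| *c)
}
```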

Also desirable is a way to map location → key label so that location-based configuration can show users the key's label — I have no idea whether this is possible, but what I have seen in games suggests it probably is.

Partly yes, partly no. Should Ctrl+Z be awkward to press on a Qwertz keyboard?

Yes, it definitely should be. Because that's the only way to communicate the location of the binding to the user. "Undo is mapped to the button where Z would be on your keyboard assuming you don't have a keyboard that reports buttons at different locations than we think where it's at" just doesn't really have a very "convenient" ring to it.

in many cases location is more important than label

I completely disagree. Especially with how unreliable that is.

Yes, it definitely should be. Because that's the only way to communicate the location of the binding to the user. "Undo is mapped to the button where Z would be on your keyboard assuming you don't have a keyboard that reports buttons at different locations than we think where it's at" just doesn't really have a very "convenient" ring to it.

It doesn't have to be. Say you want to inform a user that they need to use WASD on a US-keyboard to move. At least on Linux (libxkbcommon) you then call a function with a layout independent code for the W key and receive the correct label according to the selected keyboard layout. Repeat for all four keys. I assume that there is a similar interface on Windows and Mac as @dhardy suggests.
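To illustrate the idea (not the libxkbcommon API itself), here is a toy version of such a lookup. The `Layout` enum, the `label` table and `movement_hint` are invented for this sketch; a real implementation would ask the platform for the active layout instead of hard-coding three of them.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layout {
    Qwerty,
    Azerty,
    Dvorak,
}

/// Layout-independent locations, named after their US QWERTY labels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum KeyLocation {
    KeyW,
    KeyA,
    KeyS,
    KeyD,
}

/// The label printed on the key at `location` under `layout`.
fn label(layout: Layout, location: KeyLocation) -> char {
    use KeyLocation::*;
    match (layout, location) {
        (Layout::Qwerty, KeyW) => 'W',
        (Layout::Qwerty, KeyA) => 'A',
        (Layout::Azerty, KeyW) => 'Z',
        (Layout::Azerty, KeyA) => 'Q',
        (Layout::Dvorak, KeyW) => ',',
        (Layout::Dvorak, KeyA) => 'A',
        (Layout::Dvorak, KeyS) => 'O',
        (Layout::Dvorak, KeyD) => 'E',
        // S and D keep their labels on both QWERTY and AZERTY.
        (_, KeyS) => 'S',
        (_, KeyD) => 'D',
    }
}

/// Text for a "use these keys to move" hint, in forward/left/back/right order.
fn movement_hint(layout: Layout) -> String {
    [KeyLocation::KeyW, KeyLocation::KeyA, KeyLocation::KeyS, KeyLocation::KeyD]
        .iter()
        .map(|&loc| label(layout, loc))
        .collect()
}
```

The binding itself stays on the four locations; only the text shown to the user changes with the layout.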

This interface certainly should not be part of the keyboard input MVP but may be necessary to give the best possible experience.

It doesn't have to be. Say you want to inform a user that they need to use WASD on a US-keyboard to move. At least on Linux (libxkbcommon) you then call a function with a layout independent code for the W key and receive the correct label according to the selected keyboard layout. Repeat for all four keys. I assume that there is a similar interface on Windows and Mac as @dhardy suggests.

That assumes the user goes looking for the key combinations inside of the application.

This interface certainly should not be part of the keyboard input MVP but may be necessary to give the best possible experience.

This was the primary point I was trying to make. It doesn't make much sense to start splitting hairs over this.

I assume that there is a similar interface on Windows and Mac as @dhardy suggests.

MapVirtualKeyW with MAPVK_VSC_TO_VK_EX might do the trick on Windows.


Regarding how VirtualKeyCode should be handled WRT Linux:

There seems to be a need for VirtualKeyCode to have a Unicode(&'static str) variant (where &'static str is an intentionally leaked String) since Linux keycodes (+ some set of modifiers) can seemingly map to arbitrary keysyms/Unicode code points.
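A sketch of the "intentionally leaked String" idea, with a cache so each distinct string is leaked at most once. `Interner` is a hypothetical helper and the two-variant `VirtualKeyCode` is a cut-down stand-in for the real enum.

```rust
use std::collections::HashMap;

/// Leaks each distinct string once and hands out `&'static str` for it.
#[derive(Default)]
struct Interner {
    known: HashMap<String, &'static str>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&existing) = self.known.get(s) {
            return existing;
        }
        // Intentional leak: the memory is never freed, which is acceptable
        // only because the set of distinct key strings is small and bounded.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.known.insert(s.to_owned(), leaked);
        leaked
    }
}

/// Cut-down stand-in for the real enum, with the proposed variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum VirtualKeyCode {
    Key3,
    Unicode(&'static str),
}
```

Because the payload is `&'static str`, the enum stays `Copy` and can be matched against string literals, e.g. `VirtualKeyCode::Unicode("ö")`.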

Would it be possible to always return the VirtualKeyCode you'd get without any modifiers? E.g. if I've got a weird keymap which is mostly the US one, except I've replaced the keysym bound to Shift+3 with ö, could winit then give me VirtualKeyCode::Key3?

Clearly, the mapping without modifiers is the proper "meaning" of the key (that's how it looks like to me at least).

Changing winit to fix issues with existing games? If we want all games(maybe using winit) to work in all situations, then promoting sane practices is the only way.

Sane practices like having a robust key to action editor... Promoting games that force key locations onto users does a disservice to users. I find that WASD is located on the far left of my keyboard, leaving no room on that side for hot-keys. However YGHJ is centrally located making more keys not far from your hand while moving.

If we want all games(maybe using winit) to work in all situations, then promoting sane practices is the only way.

Sane practices like having a robust key to action editor...

No one here is saying that games (or other applications) shouldn't have user-configurable keybinds (or that scancodes invalidate the need for such a feature). I'd even go so far as saying that any game which ships without some kind of keybind editor is lesser for it. But the presence of user-configurable keybinds does not invalidate the need for sane (or even just usable) defaults, and that's what using scancodes instead of virtual keycodes should provide for games.

Making an application with user-configurable keybinds straight on top of winit isn't necessarily all that easy, but that's something that can be solved in other crates. The addition of a real API for key locations wouldn't change this much. User-configurable keybinds is either something winit shouldn't tackle or something that should be tackled separately from this issue. My money is on the former being the case.

Promoting games that force key locations onto users does a disservice to users. I find that WASD is located on the far left of my keyboard, leaving no room on that side for hot-keys. However YGHJ is centrally located making more keys not far from your hand while moving.

Again, no one is saying that games must use WASD. WASD is just mentioned because it's the conventional set of movement controls, not because anyone mentioning WASD is convinced that WASD is the best thing ever and wants to force it on others.

Defaults is something that happens once, less than once on startup... Why should we run code on every keystroke for defaults?

Promoting games that force key locations onto users does a disservice to users.

The opposite — having to rebind WASD keys etc. because I use a different keyboard layout and the keys end up in inconvenient locations for one-handed use is a pain. The only one trying to force applications to do a specific thing here is you — both "label" and "location" (default) bindings should be supported.

Would it be possible to always return the VirtualKeyCode you'd get without any modifiers?

I don't know. But I'm also not sure it's desirable. E.g. + is Shift+= on a QWERTY keyboard but a dedicated key on QWERTZ, so a binding to + should not be labelled shift+=. Conversely, a binding to Shift++ might not be usable (or at least not distinct from +) on QWERTY, so one has to be careful with default key-maps. Ultimately in some cases the right thing may be to have multiple defaults and choose one based on locale and/or platform.

Why should we run code on every keystroke for defaults?

That shouldn't be necessary? This is why I suggested mapping from key positions to scancodes. But even if only the opposite map function (scancode → location) is available, apps can invoke this only as needed (potentially filling out a dictionary the first time each key is pressed).
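The "fill out a dictionary the first time each key is pressed" idea could look roughly like this. `LocationCache` is a hypothetical helper, and the `lookup` closure stands in for whatever scancode → location function ends up existing; the `misses` counter is only there to make the caching visible.

```rust
use std::collections::HashMap;

/// Memoises an expensive scancode -> location lookup.
struct LocationCache<F: Fn(u32) -> Option<&'static str>> {
    lookup: F,
    cache: HashMap<u32, Option<&'static str>>,
    /// Number of times the underlying lookup actually ran.
    misses: usize,
}

impl<F: Fn(u32) -> Option<&'static str>> LocationCache<F> {
    fn new(lookup: F) -> Self {
        LocationCache { lookup, cache: HashMap::new(), misses: 0 }
    }

    /// Returns the location for `scancode`, computing it at most once.
    /// Unknown scancodes are cached as `None` so they are not retried.
    fn location(&mut self, scancode: u32) -> Option<&'static str> {
        if let Some(&hit) = self.cache.get(&scancode) {
            return hit;
        }
        self.misses += 1;
        let value = (self.lookup)(scancode);
        self.cache.insert(scancode, value);
        value
    }
}
```

The cache would need to be cleared whenever the platform reports a keymap change, if the mapping can depend on it.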

@dhardy I think I might not have explained my thoughts well enough.

What I was proposing would lead to + on QWERTZ always giving + as a VirtualKeyCode, regardless of modifiers. Now that I've thought about it some more, I can't quite decide on how that would work with Windows or AZERTY.
Windows can't, to my knowledge, quite re-map its virtual keycodes in the same way.
AZERTY swaps numbers and symbols on the number keys, so that would also make it behave very differently on Windows.
Making VirtualKeyCode completely consistent across platforms seems a bit more complicated than I thought. I'm not even sure if it's entirely desirable now.

Which is why I suggested having a huge enum like X11, but I'm not really sure if that's desirable either.

It seems to me that @pyfisch's original proposal suggested defining variants only for non-printable keys. I agree with this and I don't see why printable characters should have an enum variant/const value.

It's much nicer to match on Key::Unicode("ÿ") than on Key::YDiaeresis and it makes application developers' job easier when for example comparing pre-defined characters with the "current" input.
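A small sketch of what matching could look like with that split. The `Key` enum here is a guess at the shape (one string-carrying variant plus named variants for non-printable keys), not the proposal's exact definition.

```rust
/// Printable input carries its text; only non-printable keys get names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Key<'a> {
    Unicode(&'a str),
    Enter,
    Escape,
    ArrowLeft,
}

fn action(key: Key) -> &'static str {
    match key {
        // String literals work directly as patterns.
        Key::Unicode("ÿ") => "insert y-diaeresis",
        Key::Unicode("+") => "zoom in",
        Key::Unicode(_) => "other text",
        Key::Enter => "confirm",
        Key::Escape => "cancel",
        Key::ArrowLeft => "move left",
    }
}
```

Comparing against runtime data is equally direct, e.g. `key == Key::Unicode(configured.as_str())`, which is the "pre-defined characters vs. current input" case.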

This appears to be @pyfisch's proposal.

The key field contains "translated" input, so for example if one wishes to match against + input this is fine; if one wishes to match against shift+= it is not (though arguably this shouldn't be matched anyway). This field does not distinguish between e.g. left and right Control keys or main-area numbers and the numpad, but the location field does (though as with code below, will extra computation be required to calculate this value?).

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

The code field is what we referred to recently as location. As was mentioned recently, it may not be desirable to include this field since (1) it will likely include extra computations per key-stroke, even if unneeded, and (2) it may not be available on all platforms (e.g. it makes little sense for virtual/screen keyboards). But if this field is removed, something else is needed (probably scancode).

Other than that, provided this interface can be implemented for all platforms, I think it is sufficient.

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

X11 keyboard handling sucks. What you want to use on X11 and Wayland is xkbcommon, which does not have this and many other quirks. There are various Rust bindings for it; they are somewhat outdated or incomplete, but with bindgen you can easily make your own binding.

There may be another issue with key: modifiers affect the result (e.g. Shift + c = C). So does Ctrl + c emit c with the "control" modifier? For whatever reason, on X11, the current ReceivedCharacter(char) event emits U+03 instead, so making this interface work "as expected" might not be so easy... I don't know.

X11 keyboard handling sucks. What you want to use on X11 and Wayland is xkbcommon, which does not have this and many other quirks. There are various Rust bindings for it; they are somewhat outdated or incomplete, but with bindgen you can easily make your own binding.

You'll get the same U+03 on Wayland with libxkbcommon too. That's what you should get, since you're asking for control chars. I'm not sure whether you can tell libxkbcommon not to do so, but alacritty, for example, relies on it, and all platforms (at least Windows/macOS/Wayland/X11) send control chars for control keys. It's not a bug or anything; it's just how it is.
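For context, the U+03 comes from the conventional ASCII control mapping, which clears the upper bits of the character. The sketch below is a simplified version of that convention written for illustration; the actual XKB control transformation has additional special cases, and both function names are mine.

```rust
/// Simplified ASCII control mapping: Ctrl + c -> U+0003, Ctrl + [ -> U+001B.
fn control_transform(c: char) -> Option<char> {
    match c {
        // '@'..='_' covers upper-case letters and a few symbols;
        // lower-case letters map to the same control codes.
        '@'..='_' | 'a'..='z' => Some(((c as u8) & 0x1f) as char),
        _ => None,
    }
}

/// Best-effort inverse for the letter range only. It is ambiguous by
/// nature: U+0009 could be Ctrl + i or the Tab key, which is one argument
/// for matching "Control modifier + key" instead of the control char.
fn undo_control_transform(c: char) -> Option<char> {
    match c as u32 {
        0x01..=0x1a => Some(((c as u8) | 0x60) as char),
        _ => None,
    }
}
```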

@kchibisov There are functions to get the symbol without the control transformation applied.

@kchibisov There are functions to get the symbol without the control transformation applied.

yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.

Relevant on control chars: Wikipedia, XKB, xkb transformation without control translations

yeah, but you're expecting control chars most of the time, and I'd be surprised to not get them.

Are these standard enough to be reliably useful? Because the alternative is to use the API the way we're trying to design it: match Control modifier + c instead of U+03.

I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.

Are these standard enough to be reliably useful?

Well, for something like a terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you just want ctrl + c as a hotkey.

I think it should be possible to find out what the pressed key combination was. I'm not saying that it should be what the keyboard input event presents by default but there should at least be a function that allows the user to extract the layout-dependent key that was pressed together with the modifier. If this is too difficult to implement I could accept that this is not something that winit should offer.

I'm not sure what you're talking about; isn't it already possible in winit? You always know the modifier state and the keys you're pressing, and modifiers are a separate event. I'd just mention that you can't map scancode -> position, etc., since you can't assume QWERTY. So the only thing you can do to help downstream is to map the scancode to Physical(A) or something like that, and keep virtual keys like we have right now. If you're planning to match on received chars it could be tricky, and you can't really design for it because of compose keys, which can send strings of text in one press.

I'm not sure what you're talking about, it's already possible in winit?

See #1700 for a more detailed explanation of what I was referring to when I wrote "extract the layout-dependent key that was pressed together with the modifier".

@ArturKovacs this relates to what we recently discussed: Control + Key gets mapped to some control char code, but it is also possible to get the unmapped version (for Linux platforms at least).

At this point it looks like a minimalist workable API would be to return the scancode (XKB keysym) and expose functions to map to unicode (with and without control char mapping) and to the key location.

At this point it looks like a minimalist workable API would be to return the scancode (XKB keysym) and expose functions to map to unicode (with and without control char mapping) and to the key location.

If we want to have something like that, we shouldn't expect clients to do it (it's way more complex than you may think). What we could do is provide a per-window configuration for how modifiers and such input should be handled.

Not to mention that you may have a Control + certain key rule in xkb that transforms to Super + key on the fly, and you can't just ignore such things. That means you should always call into xkb with a proper keymap (which you can't pass to winit users, since it isn't cross-platform) and use that. Someone also has to handle the compose key; right now that's winit's job. If you expose xkb downstream, who should handle composing?

I looked into the documentation of xkbcommon and it doesn't seem that complicated.

Do I misunderstand something or could this really be implemented as shown below?



struct KeyboardInput {
    // All platforms
    pub scancode: Scancode,
    pub modifiers: Modifiers,

    // *nix only
    layouts: Layouts,
}
// *nix implementation
impl KeyboardInput {
    fn character_with_mods(&self) -> Option<&'static str> {
        let dep_mod = get_dep_mod(self.modifiers);
        let latched_mod = ...
        let locked_mod = ...
        let dep_layout = get_dep_layout(self.layouts);
        let latched_layout = ...
        let locked_layout = ...

        let xkb_key: xkb_keycode_t = get_xkb_keycode(self.scancode);

        unsafe { 
            // Note that this thread-local xkb_state is only used for these transformation functions,
            // and a separate xkb_state can be used in the event loop if needed.
            xkb_state_update_mask(thread_local_xkb_state, dep_mod, ...);
            let sym = xkb_state_key_get_one_sym(thread_local_xkb_state, xkb_key);

            // Look up thread local map of already known `&str`s for keysyms

            // If an &str is found, return that, otherwise continue...

            let size = xkb_state_key_get_utf8(thread_local_xkb_state, xkb_key, ptr::null_mut(), 0);
            if size == 0 {
                return None;
            }
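            // The reported size excludes the trailing NUL, so reserve one extra byte for it.
            let mut output = Vec::<u8>::with_capacity(size as usize + 1);
            xkb_state_key_get_utf8(thread_local_xkb_state, xkb_key, output.as_mut_ptr(), output.capacity());
            // Expose only the text bytes, not the NUL terminator.
            output.set_len(size as usize);
            let utf8 = std::str::from_utf8_unchecked(output.into_boxed_slice().leak());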
            // Add `utf8` to thread local map of known keysyms

            Some(utf8)
        }
    }
    fn character_without_mods(&self) -> Option<&'static str> {
        let dep_mod = EMPTY MOD MASK;
        let latched_mod = EMPTY MOD MASK;
        let locked_mod = EMPTY MOD MASK;
        // and the rest is essentially identical
    }
}

I looked into the documentation of xkbcommon and it doesn't seem that complicated.

I'm saying that clients shouldn't do libxkbcommon handling, we're already using it on Wayland https://github.com/Smithay/client-toolkit/blob/aac3c503242c8a2a9f37f4a2231e7b540e3a575c/src/seat/keyboard/mod.rs#L422.

I don't understand what you mean by that. Can you explain why code similar to what I wrote shouldn't be in winit?

I don't understand what you mean by that. Can you explain why code similar to what I wrote shouldn't be in winit?

I mean in winit clients, you're free to do that internally in winit if you want to.

I don't mean expose the xkbcommon functions. I mean make safe wrappers which require only the scancode and the EventLoop or Window as context and return a char. The scancode can be whatever we want to make it and include the keysym, even a layout identifier if necessary.

In an attempt to move this to the implementation phase I tried to gather all suggestions described so far and I compiled an API which seems to represent the most reasonable compromise. Note that each struct may have implementation specific private fields added.

Please let me know if something seems wrong but otherwise I'd like to start implementing this for Windows in a few days.

I'm also willing to make Linux, macOS, and web implementations.

Note that the following code has been updated several times since posting this so some of the following reflections may be obsolete.



/// Key events are not reported at the beginning, during, and at the end of composition.
// -------------------
// Developers' note:
// This is due to the limitation that neither Firefox nor Chrome report keypresses correctly
// during composition on Windows, so in order to maintain consistency, this behavior
// is replicated on other platforms too.
enum KeyboardEvent {
    Key(KeyEvent),
    Composition(CompositionEvent),
}

struct KeyEvent {
    scancode: ScanCode,

    physical_key: PhysicalKey,

    /// This value ignores all modifiers including
    /// but not limited to <kbd>Shift</kbd>, <kbd>Caps Lock</kbd>,
    /// and <kbd>Ctrl</kbd>. In most cases this means that the
    /// unicode character in the `Unicode` variant is lowercase.
    /// 
    /// Note that this is `LogicalKey::Dead` for dead keys.
    /// 
    /// Optimally this wouldn't be the case but unfortunately
    /// this is a limitation of the web API which is applied
    /// for every platform for consistency.
    logical_key: LogicalKey,

    /// This value is affected by all modifiers including but not
    /// limited to <kbd>Shift</kbd>, <kbd>Ctrl</kbd>, and <kbd>Num Lock</kbd>.
    /// 
    /// Use this for text input along with `CompositionEvent`.
    /// 
    /// Note that the `Unicode` variant may contain multiple characters.
    /// For example on Windows when pressing <kbd>^</kbd> using
    /// a US-International layout, this will be `Dead` for the first
    /// keypress and will be `Unicode("^^")` for the second keypress.
    /// Note that this behaviour might be different on
    /// other platforms. For example Linux systems may emit
    /// `Unicode("^")` on the second keypress.
    transformed_key: LogicalKey,
    key_state: ElementState,
    repeat: bool,
}

/// As described at https://www.w3.org/TR/uievents/#events-compositionevents
enum CompositionEvent {
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

/// The layout-dependent key.
/// 
/// This is identical to the label printed on the key when
/// the currently active layout matches the layout of the
/// labels on the keyboard.
///
/// `Fn` and `FnLock` key events are not emitted by `winit`.
/// These keys are usually handled at the hardware or at the OS level.
#[non_exhaustive]
enum LogicalKey {
    Unicode(&'static str),

    Ctrl(Location),
    Alt(Location),
    ...
    LeftArrow,
    RightArrow,
    ...
    F1,
    F2,
    ...
    /// Dead key. See https://www.w3.org/TR/uievents/#dead-key
    Dead,

    /// Reported when the label of the key cannot be determined.
    /// Note that this is distinct from the `Dead` variant.
    Unknown,
}

/// Represents the position of a key independent of the
/// currently active layout.
/// Synonymous with https://www.w3.org/TR/uievents-code/
/// 
/// `Fn` and `FnLock` key events are not emitted by `winit`.
/// These keys are usually handled at the hardware or at the OS level.
#[non_exhaustive]
enum PhysicalKey {
    A,
    B,
    ...
    Digit0,
    Digit1,
    ...

    /// Reported when the position cannot be determined in a
    /// cross-platform manner.
    /// 
    /// For example an on-screen keyboard or a remote control
    /// may not have a layout for which it's sensible to map
    /// the positions to other values of this enum.
    Unknown
}

#[non_exhaustive]
enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// An opaque struct that uniquely identifies a single physical key on the
/// current platform.
/// 
/// This is distinct from `PhysicalKey` because this struct will always
/// be a unique identifier for a specific key, whereas `PhysicalKey` may be
/// `Unknown` for multiple distinct keys.
/// 
/// Furthermore this struct may store a value that cannot be ported
/// to another platform, hence it is opaque. To retrieve the underlying
/// value, use one of the platform-dependent extension traits like
/// `XkbScanCodeExt`.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ScanCode {
    /// platform dependent private fields to uniquely identify a single key
}

/// For X11 (and maybe wayland as well?)
impl XkbScanCodeExt for ScanCode {
    /// returns `xkb_keycode_t`
    fn keycode(&self) -> u32 {
        self.keycode
    }
}

This is a very reasonable proposal.

However there is one particular issue I have a strong opinion about. That
is the decision to ignore all modifier keys for the LogicalKey::Unicode
value. If the KeyboardEvent and CompositionEvent is correctly implemented
on a platform (which is hard) you can reconstruct the input text from these
two events (and don't need to rely on ReceivedChars or similar API). This
is not possible if modifiers are ignored. An additional concern is that
many applications offer single-key-shortcuts such as "[", "]", "?" etc.
that require Shift or other modifier keys depending on the keyboard layout.

I think there should be an API to get the key value without modifiers, but
it must not be the primary API.

I am happy you want to do some implementations. If you have any questions
about Linux or the Web you can ping or email me.

If the KeyboardEvent and CompositionEvent is correctly implemented
on a platform (which is hard) you can reconstruct the input text from these
two events [...] I think there should be an API to get the key value without modifiers, but
it must not be the primary API.

In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.

An additional concern is that
many applications offer single-key-shortcuts such as "[", "]", "?" etc.
that require Shift or other modifier keys depending on the keyboard layout.

I think this depends on how the application wants to handle these cases.

One way to handle this would be to use the PhysicalKey if position is more important than the label.

Another way is to use the composition input API to look for such input.

if let Some(CompositionEvent::Char(ch)) = input_event.composition_input {
    if ch == '?' {
        // ? was pressed either directly or through a modified key
    }
}

If the KeyboardEvent and CompositionEvent is correctly implemented
on a platform (which is hard) you can reconstruct the input text from these
two events (and don't need to rely on ReceivedChars or similar API).

Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?

In my proposal the application may use CompositionEvent::Char for key input with modifiers and use LogicalKey for key input without modifiers. Furthermore every variant of the CompositionEvent is affected by modifier keys, meaning that composition_input is likely the only field one needs for text input.

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

An additional concern is that
many applications offer single-key-shortcuts such as "[", "]", "?" etc.
that require Shift or other modifier keys depending on the keyboard layout.

I think this depends on how the application wants to handle these cases.

One way to handle this would be to use the PhysicalKey if position is more important than the label.

Many applications have mnemonics, so they are about symbol, not location.

If the KeyboardEvent and CompositionEvent is correctly implemented
on a platform (which is hard) you can reconstruct the input text from these
two events (and don't need to rely on ReceivedChars or similar API).

Could you elaborate on why "reconstructing the input text" instead of relying on platform APIs is desirable?

If you are building a web browser. :wink: I proposed this interface originally for use in the servo browser, but since Mozilla discontinued this project it is not as important now. Besides browsers you will need it for high-quality GUI applications especially in the context of IME. How I understand the CompositionEvent API, its primary purpose is to be able to accurately construct the input text in a text field/word processor. If you decide you don't need this an API with just characters, key-down, key-up is likely sufficient for most games and simple applications.
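To make the "reconstruct the input text" idea concrete, here is a minimal sketch of a text field that consumes key and composition events. The types are trimmed-down, hypothetical stand-ins loosely modelled on the proposal above (`TransformedKey`, `KeyPressed` and `TextField` are names invented for this example, not winit API), and it assumes, as the proposal states, that key events are not reported during composition.

```rust
/// Hypothetical, trimmed-down composition events modelled on the proposal.
enum CompositionEvent {
    CompositionStart(String),
    CompositionUpdate(String),
    CompositionEnd(String),
}

/// Hypothetical modifier-transformed key value (only what the sketch needs).
enum TransformedKey {
    Unicode(String),
    Backspace,
    Other,
}

/// Hypothetical event type; key releases are left out for brevity.
enum KeyboardEvent {
    KeyPressed(TransformedKey),
    Composition(CompositionEvent),
}

/// A text field that keeps committed text and in-progress (pre-edit) text apart.
#[derive(Default)]
struct TextField {
    committed: String,
    preedit: String,
}

impl TextField {
    fn handle(&mut self, event: KeyboardEvent) {
        match event {
            // Plain key input: append the transformed text directly.
            KeyboardEvent::KeyPressed(TransformedKey::Unicode(s)) => self.committed.push_str(&s),
            KeyboardEvent::KeyPressed(TransformedKey::Backspace) => {
                self.committed.pop();
            }
            KeyboardEvent::KeyPressed(TransformedKey::Other) => {}
            // A new composition session starts with an empty pre-edit string.
            KeyboardEvent::Composition(CompositionEvent::CompositionStart(_)) => self.preedit.clear(),
            // Updates replace the pre-edit string as a whole.
            KeyboardEvent::Composition(CompositionEvent::CompositionUpdate(s)) => self.preedit = s,
            // The end event carries the final text, which gets committed.
            KeyboardEvent::Composition(CompositionEvent::CompositionEnd(s)) => {
                self.preedit.clear();
                self.committed.push_str(&s);
            }
        }
    }

    /// What the user should see: committed text followed by the pre-edit text.
    fn display(&self) -> String {
        format!("{}{}", self.committed, self.preedit)
    }
}
```

Feeding it a press of `a`, then a composition that updates to an accent and ends with `é`, leaves `aé` in `committed`. A real widget would also need cursor handling and selection replacement, which this sketch ignores.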

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

It took me a moment to realize what you meant by this, but I think you mean that there's no easy way to tell that e.g. ! was released when Shift+1 is required to produce !.

Many applications have mnemonics, so they are about symbol, not location.

I'm beginning to think that a proper keyboard layout query API is required to support the greatest number of use-cases.

pub struct KeyboardLayout {
    layout: platform_impl::KeyboardLayout,
}

impl KeyboardLayout {
    pub fn logical_key(&self, physical_key: PhysicalKey, modifiers: ModifiersState) -> LogicalKey {
        self.layout.logical_key(physical_key, modifiers)
    }
}

impl PhysicalKey {
    pub fn to_logical(self, layout: &KeyboardLayout, modifiers: ModifiersState) -> LogicalKey {
        layout.logical_key(self, modifiers)
    }
}

I think Android, Linux (libxkbcommon) and Windows have good enough APIs to implement the minimal example above, and I think there's an experimental API for this in browser-land but I don't know if it's usable for the above example. I have absolutely no idea what the situation is on either macOS or iOS.

There would probably have to be a KeyboardLayoutChanged(KeyboardLayout) event somewhere, but I'm not sure where to stick it.
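For the mnemonic use case, a layout query API would let an application ask which key and modifiers produce a given symbol. The following is only a sketch under heavy assumptions: the layout is a hard-coded table rather than a platform object, `find`, `us_layout` and `german_layout` are made-up helpers, and `ModifiersState` is reduced to a single Shift flag.

```rust
use std::collections::HashMap;

/// Hypothetical subset of physical key positions (named after the US layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PhysicalKey {
    Slash,
    Minus,
}

/// Hypothetical modifier state, reduced to Shift for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
struct ModifiersState {
    shift: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum LogicalKey {
    Unicode(&'static str),
    Unknown,
}

/// Table-backed stand-in for a platform keyboard layout.
struct KeyboardLayout {
    table: HashMap<(PhysicalKey, ModifiersState), &'static str>,
}

impl KeyboardLayout {
    /// Forward lookup, as in the API sketched above.
    fn logical_key(&self, key: PhysicalKey, mods: ModifiersState) -> LogicalKey {
        self.table
            .get(&(key, mods))
            .map_or(LogicalKey::Unknown, |s| LogicalKey::Unicode(*s))
    }

    /// Reverse lookup: which key and modifiers produce `symbol` on this layout?
    fn find(&self, symbol: &str) -> Option<(PhysicalKey, ModifiersState)> {
        self.table.iter().find(|(_, s)| **s == symbol).map(|(k, _)| *k)
    }
}

fn us_layout() -> KeyboardLayout {
    let plain = ModifiersState { shift: false };
    let shift = ModifiersState { shift: true };
    KeyboardLayout {
        table: HashMap::from([
            ((PhysicalKey::Slash, plain), "/"),
            ((PhysicalKey::Slash, shift), "?"),
            ((PhysicalKey::Minus, plain), "-"),
            ((PhysicalKey::Minus, shift), "_"),
        ]),
    }
}

fn german_layout() -> KeyboardLayout {
    let plain = ModifiersState { shift: false };
    let shift = ModifiersState { shift: true };
    KeyboardLayout {
        table: HashMap::from([
            ((PhysicalKey::Slash, plain), "-"),
            ((PhysicalKey::Slash, shift), "_"),
            ((PhysicalKey::Minus, plain), "ß"),
            ((PhysicalKey::Minus, shift), "?"),
        ]),
    }
}
```

With these tables, `find("?")` reports Shift + `Slash` on the US layout and Shift + `Minus` on the German one, which is the information a shortcut hint such as "press ? for help" would need.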

   /// This value ignores all modifiers like shift and ctrl, and
   /// it is always uppercase.
   Unicode(&'static str),

I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase.
This might be doable if you use what the key would give you if Caps Lock is the only active modifier. Keys affected by Caps Lock have unambiguous mappings between the character emitted without Caps Lock and one emitted with Caps Lock on the layouts I regularly use, but there are a lot of different layouts out there, and one of them might not have this property.
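A small standard-library-only illustration of the round-trip problem (the helper name `case_round_trips` is made up for this example):

```rust
/// Returns true when lowercasing the uppercased form of `s` gives back `s`.
fn case_round_trips(s: &str) -> bool {
    s.to_uppercase().to_lowercase() == s
}

/// Uppercases `s`; a single character may expand into several.
fn upper(s: &str) -> String {
    s.to_uppercase()
}
```

Here `case_round_trips("a")` holds, but `case_round_trips("ß")` does not: `upper("ß")` is `"SS"`, and lowercasing that gives `"ss"`, so the original key label is lost.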

This seems useful. I did miss CompositionEvent::Char. Although I still can't (easily) find out when a given symbol for which modifiers are needed is released.

Excuse me @pyfisch, I completely missed this. Yeah, I think we can give a written guarantee in the documentation that every CompositionEvent::Char will come with a Some(KeyboardEvent) from which you can tell whether it's a press or release. This could maybe be expressed by the types themselves, but I think any such design will result in pattern matching hell in the application.

   /// This value ignores all modifiers like shift and ctrl, and
   /// it is always uppercase.
   Unicode(&'static str),

I'm not sure if converting to uppercase is the right thing to do here. Some characters don't round-trip losslessly through to_uppercase and to_lowercase.

Yeah that uppercase thing I wasn't entirely sure about and I absolutely didn't know that the conversion is not lossless. Thanks for pointing that out! I'll remove the uppercase guarantee and update this to "This value ignores all modifiers including but not limited to shift, caps lock, and ctrl". Does that sound good?

Yeah that uppercase thing I wasn't entirely sure about and I absolutely didn't know that the conversion is not lossless. Thanks for pointing that out! I'll remove the uppercase guarantee and update this to "This value ignores all modifiers including but not limited to shift, caps lock, and ctrl". Does that sound good?

I think you need a couple more commas (and maybe the <kbd> tag), but yes.
"This value ignores all modifiers including, but not limited to, Shift, Caps Lock, and Ctrl"

Thank you, I updated the comment adding the kbd tags as you suggested. I did not include the additional commas however because I'm confident that the current form is grammatically correct, and I think we should not get into a discussion about English grammar or writing style in this thread.

I was thinking more like this. (I'm not certain if we need the layout identifier. I left out composition events which could be included as above.) Both KeySym and KeyboardLayout are probably just a u32 internally or even smaller.

/// Platform specific value identifying a key on a keyboard
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeySym { .. }

/// Platform specific value identifying the keyboard layout (from a list of available ones)
#[derive(Clone, Debug, PartialEq, PartialOrd, Hash)]
struct KeyboardLayout { .. }

struct KeyboardEvent {
    keysym: KeySym,
    layout: KeyboardLayout,
    key_state: ElementState,
    repeat: bool,
}

impl Window {
    fn get_layout_name(&self, layout: KeyboardLayout) -> String;
    fn get_active_layout(&self) -> KeyboardLayout;

    /// Translate the key according to the layout, but disregarding CapsLock and modifier keys
    /// 
    /// If a unicode value is produced, it is usually lower-case.
    fn get_key_label(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Translate the key according to the layout and modifiers
    fn get_transformed_key(&self, keysym: KeySym, layout: KeyboardLayout) -> KeyLabel;

    /// Attempt to find the [`KeySym`] producing this label
    ///
    /// This is not guaranteed to return a result. On some platforms it may
    /// never return a result. In some cases it may arbitrarily choose one of
    /// multiple [`KeySym`]s producing this label.
    fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;

    /// Attempt to find the [`KeyLocation`] corresponding to a [`KeySym`]
    fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;

    /// Attempt to find the [`KeySym`] corresponding to a [`KeyLocation`]
    fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;
}

#[non_exhaustive]
enum KeyLabel {
    // TODO: maybe this can be backed by [u8; LEN]
    Unicode(&'static str),

    Ctrl(Location),
    Alt(Location),
    ...
    LeftArrow,
    RightArrow,
    ...
    F1,
    F2,
    ...
}

enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// Identifies locations on a keyboard (relative to US Qwerty?)
#[non_exhaustive]
enum KeyLocation {
   ...
}

Edit: added KeyLocation

I don't see the reason for introducing the KeyboardLayout. Please describe the use-case for it in simple terms so I can understand it 😄

What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?

I don't see the reason for introducing the KeyboardLayout

In general, translation is dependent on the layout, and I think all major OSs allow convenient layout switching, so there's no guarantee that the active layout is the same one as when the app launched. This means that if we give the app an API for translating KeySym → KeyLabel that can be called later, the value may be wrong if we don't account for this. (As an alternative we could embed the layout within KeySym, though if we ever let apps actively switch layouts this will probably come back to haunt us.)

What is the difference between KeyLocation and KeySym? Why isn't KeyLocation enough?

Two things I guess. One is that KeyLocation is whatever enum we define; there's no guarantee that it will contain a unique value for every key on every keyboard so if we map the OS's identifier to this, then back again for translation to KeyLabel, the result may be lossy. Secondly, that's an extra translation step.

Note: KeySym is what I previously called scancode. This is vaguely modelled on the XKB API.

In general, translation is dependent on the layout [...]

Alright that's fair, but if I'm not mistaken the KeyboardLayout is not needed if we stick with the other proposal.

One is that KeyLocation is whatever enum we define [...] Secondly, that's an extra translation step.

I see.

These points are all completely valid, but only when using the architecture you proposed. However I'm not seeing why your proposal is objectively better than the other one. I see two aspects where it's an improvement over the other one:

  • The translation from native keysyms into a KeyLabel is only done when the application truly needs it, so one can save a bit of processing time. Although I don't think this alone is a good argument because the performance impact is negligible for something that happens as rarely as a keypress.
  • It's cleaner to get text input for non-composition events. This isn't a good enough argument either to switch to the API you are proposing because the other one can be slightly tweaked to match this behaviour. Namely by removing the Char variant of the CompositionEvent and adding a character field to the KeyboardEvent.

At the same time it adds the burden of having to keep track of the KeyboardLayout which of course is not a burden if it has a use that I'm not seeing right now.

Well, this API is certainly less simple to use, but its advantages are a little more extensive:

  • We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them. As @kchibisov said above, you ideally want both (so your API should have two versions of logical_key):

    Well, for something like terminal emulator, yeah, since if you suppress those you'll have to bind everything possible to those control chars yourself, which is nearly impossible to do. You can match Control + c and suppress chars in your app without issues if you want just ctrl + c as a hotkey.

  • Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance (by returning None in various functions). Of course this may be a pain to deal with at the application level, but it avoids having to fudge too much (if e.g. your on-screen keyboard won't give you a physical_key).

We have two desirable translations from native keysyms to labels/values: including modifiers and excluding them.

I completely agree and both proposals solve this problem. Yours has 'label' and 'transformed_key', and the other has 'logical_key' and 'Char'. Yours may be a bit cleaner by using the same type for 'label' and 'transformed_key' but again this is something that can be adopted by the other API.

Translations to/from various formats may not be available on all platforms, or may not be easy to write immediately. My API allows partial compliance [...]

This is the strongest argument in my opinion. In fact this convinced me that your proposal is objectively better. The only thing I'm still a bit worried about is whether an implementation exists for this API for the minimal required featureset on all platforms, but I guess the only way to find out is to try implementing it.

Hold up a minute. After giving this more thought I'm back on the fence. The only functions you proposed returning Options are

fn find_keysym(&self, label: &KeyLabel, layout: KeyboardLayout) -> Option<KeySym>;
fn get_key_location(&self, keysym: KeySym) -> Option<KeyLocation>;
fn find_keysym_by_location(&self, location: KeyLocation) -> Option<KeySym>;

But again, it seems to me that these are not required if we choose the other API. Is there a use-case not covered by the other API where these are needed?

Yet again I updated my proposal to contain more documentation and also changed how text input is reported inspired by @dhardy's latest proposal.

Do we need the transformed (character) output in addition to CompositionEvent? I'm still not really clear on how that API works. Does it return CompositionEnd(text) for all text input?

Your updated proposal looks adequate, I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

e.g. your on-screen keyboard won't give you a physical_key

Do on-screen keyboards ever not imitate real keyboards? Windows' built-in on-screen keyboard looks like it's indistinguishable from a regular keyboard (if you ignore the window focus stuff).

Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows.
I'm not sure if you could "fake" a KeyPosition for these keys since I don't know exactly how special their behaviour is. uievents-code would have you believe that this is reasonable to do, so there might be a way to get this to work on all platforms.


Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

The modifier-independent KeyLabel should probably be a Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).


I guess. I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

I think it's a good idea to do this. Such variants should also contain platform-specific values which allow you to somewhat uniquely identify the keypress, at least for KeyPosition. This would be somewhat in line with some games I've played (can't remember which ones) and Discord (Discord lets me use F20-24 as keybinds, but displays it as "UNK131-135").
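A sketch of what such an `Unknown` variant could look like. The `u32` payload and the `UNK` display format are assumptions made for illustration (the latter borrowed from the Discord example), and the enum is cut down to two ordinary keys.

```rust
use std::fmt;

/// Hypothetical, heavily trimmed version of the proposed `PhysicalKey`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PhysicalKey {
    A,
    F1,
    /// The position could not be determined. Carries the raw platform code
    /// (an assumption of this sketch) so distinct keys stay distinguishable.
    Unknown(u32),
}

impl fmt::Display for PhysicalKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PhysicalKey::A => write!(f, "A"),
            PhysicalKey::F1 => write!(f, "F1"),
            // Show unknown keys by their platform code, e.g. "UNK131".
            PhysicalKey::Unknown(code) => write!(f, "UNK{}", code),
        }
    }
}
```

Because the payload takes part in `Eq` and `Hash`, two different unknown keys can still be bound to different actions in a keybinding table, even though neither has a cross-platform name.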

Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel. It's not that important, but I think those names are more in line with the Physical/Logical split in the dpi module. Keyboard input is different from dpi, but I feel like it's similar enough for the analogy to make some sense.

Do we need the transformed (character) output in addition to CompositionEvent?

Unfortunately we do. I didn't realize this earlier myself but as @maroider pointed out, the transformed input has to be able to represent non-printable keys like Insert and Delete due to NumLock shenanigans.

Furthermore I updated the code so that a documentation comment hopefully clears up when a CompositionEvent is triggered.

I'm not sure whether KeyLabel and KeyPosition will need None/Unknown variants to handle discrepancies between platforms and input devices.

You are right, definitely. I added those too.

Do on-screen keyboards ever not imitate real keyboards?

There is no guarantee they do imitate real keyboards. Even if they do, they might contain keys that cannot be sensibly mapped to real keyboard positions which should report Unknown positions in my opinion.

Additionally, some keys like my keyboard's dedicated ⏹, ⏮️, ⏯️, and ⏭️ keys don't emit a unique scancode and instead give you 0 on Windows.

Exactly. I don't think that winit should somehow try to come up with a position for those keys. It should just be Unknown.

Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

Thanks again for pointing that out. I updated the API according to this.

The modifier-independent KeyLabel should probably be a Option<KeyLabel> for now since the web API this would depend on seems to only be an early draft and is implemented only in Chrome and Chromium-derivatives (and partially at that).

Hmm, that is unfortunate indeed. Although I think this can be handled relatively gracefully until that API gains more widespread support. Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase if it is, call to_lowercase on it, and use that. If it's neither report Unknown. This would at least allow implementing the most common shortcuts which in my view is the primary reason we have the logical_key field.
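The fallback described here could be sketched roughly as follows. `LogicalKey` is a two-variant stand-in that uses `String` instead of `&'static str`, and `logical_key_from_web_key` is a made-up name. A real backend would presumably map named keys such as `Enter` to their own variants before reaching this function.

```rust
/// Hypothetical stand-in for the proposed `LogicalKey`.
#[derive(Debug, PartialEq)]
enum LogicalKey {
    Unicode(String),
    Unknown,
}

/// Derives a modifier-independent key from the `key` value of a keydown event.
fn logical_key_from_web_key(key: &str) -> LogicalKey {
    if key.is_empty() {
        return LogicalKey::Unknown;
    }
    if key.chars().all(char::is_lowercase) {
        // Already lowercase: assume no modifier changed it.
        LogicalKey::Unicode(key.to_owned())
    } else if key.chars().all(char::is_uppercase) {
        // Uppercase: undo the presumed Shift or Caps Lock.
        LogicalKey::Unicode(key.to_lowercase())
    } else {
        // Digits, punctuation and everything else cannot be classified.
        LogicalKey::Unknown
    }
}
```

Taken literally, the rule reports `Unknown` for digits and punctuation such as `1` or `?`, since they are neither lowercase nor uppercase. That limitation is one of the things the documentation would have to spell out.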

Bikeshed: I really prefer PhysicalKey/LogicalKey over KeyPosition/KeyLabel

Not a problem for me. In the updated version I renamed them like this.

Instead of making the logical_key an Option, we could check if the key in the keydown event is lowercase and if it is, use that. Otherwise check if it's uppercase if it is, call to_lowercase on it, and use that. If it's neither report Unknown.

If it's implemented this way, then it should be documented very clearly.

Instead of a transformed Option<&'static str>, there should just be a transformed KeyLabel. This way, you can get at the second layer of the numpad, which would otherwise be inaccessible with the current API.

Thanks again for pointing that out. I updated the API according to this.

Alas, I've led you slightly astray on this one. The layer I was worried about is the base layer, which would still be accessible as logical_key. The second layer (accessible with Num Lock on) is the one with numeric inputs. I usually have Num Lock on, so that's why I got myself confused. I think LogicalKey is more semantically correct (and can handle more peculiar layouts), but Option<&'static str> would have worked with most layouts.

   /// Note that the `Unicode` variant may contain multiple characters.
   /// For example when pressing <kbd>^</kbd> using a US-International
   /// layout, this will be `Dead` for the first keypress and will be
   /// `Unicode("^^")` for the second keypress.

I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^". My layout isn't "US-International", but I doubt that's the issue. In either case, since this is a "dead key thing", it should be handled in CompositionEvent.
You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate which will (hopefully) reveal which dead key was pressed, since each dead key ought to produce a unique combining character. The modifier-independent value will still have to make do with LogicalKey::Dead though.
The way a text editor would have to handle this would be to ignore transformed_key when there's a CompositionEvent, since dead keys ought to fire CompositionEvents.

Other than that, the only thing I have issues with is the shape of the text input part of this API. It feels subtly wrong, but I can't seem to figure out a better way to do it.

It does seem clunky. I'd still like to see a tabulation of what data is available. Something like:

  • scancode (maybe)
  • physical location (or unknown)
  • unicode with modifier transforms (utf-8, may have control chars, we can ignore anything that doesn't map to unicode); may have length 0; max length unknown in general?
  • either:

    • "command" key (arrows, F#, Ctrl, MediaPlay, ...) — this may overlap with above

    • unicode, without modifier transforms (usually lower case)

  • compose buffer

Meanwhile we could categorise input as:

  • press, repeat (held), or release
  • compose start, update or end

Optimisations and redundancies:

  • "'Compose start" is always a press event?
  • Release events don't need to include most of the above data so long as they can be matched against press events; currently this is done via scancode
  • Maybe we can attach a lifetime to WindowEvent and use &'a str instead of String
  • Maybe we could make KeyRelease a separate event type and only include the physical location? No, since this may be Unknown.
  • Any more? I don't see any, despite four simultaneous ways of representing input above.

I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.

Also, my experience with KAS is that one might have complex rules to determine how to handle a press, but handling release is normally a simple dictionary-removal using the scancode as a unique key. Without the scancode there is no unique key. Alternatively we could go with @ArturKovacs's API but add scancode, then (maybe) use a different event type with only the scancode for KeyRelease.
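The press/release asymmetry can be sketched as a map keyed by scancode. `ScanCode` here is a hypothetical newtype over `u32` and `HeldKeys` is a name invented for the example. Storing the symbol produced at press time also answers the earlier question of how to tell that `!` was released, even if Shift was let go first.

```rust
use std::collections::HashMap;

/// Hypothetical opaque per-key identifier, standing in for the proposed `ScanCode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ScanCode(u32);

/// Tracks which keys are held and what each one produced when it was pressed.
#[derive(Default)]
struct HeldKeys {
    held: HashMap<ScanCode, String>,
}

impl HeldKeys {
    /// Records a press. Returns false if the key was already held (a repeat).
    fn press(&mut self, code: ScanCode, symbol: &str) -> bool {
        if self.held.contains_key(&code) {
            return false;
        }
        self.held.insert(code, symbol.to_owned());
        true
    }

    /// Handles a release by removing the entry; returns the symbol that was
    /// recorded at press time, if this key was being tracked.
    fn release(&mut self, code: ScanCode) -> Option<String> {
        self.held.remove(&code)
    }
}
```

The press path can be as complicated as the application likes, while the release path stays a single `remove`, which only works because the scancode is guaranteed to be unique per key.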

It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations. Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only text input and not key input.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout? Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.

This makes it difficult to do better than the current API.

Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.

For convenience here's one more link to my updated proposal.

I'm not sure if this is how dead keys behave on every platform. It's been my experience that pressing ^ twice on Linux will give me a single "^"

I didn't know this. Alright... I think this should match the platform-specific behaviour then, as I would certainly expect all applications to behave similarly to each other on one platform.

In either case, since this is a "dead key thing", it should be handled in CompositionEvent.

That was my thought as well but when I tested it with Firefox on Windows the JavaScript API's key field only contained Dead on the first keypress and the isComposing field was set to false.

You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate

Again, this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea because we want to let applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Although admittedly that aspect is already not perfect in my current proposal because the CompositionEnd may be detached from the physical keypress.

I'd still like to see a tabulation of what data is available.

The tabulation you just made there seems accurate to me. To answer a few questions there: I'm definitely not against exposing the platform specific scancode through, say an Ext trait. Otherwise the physical key is the platform independent representation of the scancode which I think you know I didn't want to second guess.

I don't have a strong opinion about whether the Unicode variant should be allowed to have a length of 0. If you think it's beneficial to provide a guarantee regarding whether it can be empty, I would say that there is no problem with that. We can just convert empty strings coming from the OS to Unknown on the implementation side, given that it's not a dead key input.

The max length of the Unicode variant of the transformed input is unknown as far as I can tell. It's up to the platform's implementation how it wants to present that aspect of text input to applications, so it can be any positive length, yeah.

"Compose start" is always a press event?

Yes (unless we find out during implementation that this cannot be guaranteed).

Release events don't need to include most of the above data so long as they can be matched against press events; currently this is done via scancode

I think there must be a balance between avoiding redundancy and ease of use. In my opinion sending all the information once more together with the key release does not tip this balance. Could you show a specific example or otherwise explain where this redundancy is undesirable?

Maybe we can attach a lifetime to WindowEvent and use &'a str instead of String

I don't see what the argument for using a reference would be here. Even if you have to make an allocation when taking the string from the OS, the performance impact of such allocations is fully negligible. Unless this is proven otherwise, I don't think we should consider introducing non-static lifetimes into this interface.

I don't see any, despite four simultaneous ways of representing input above

I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?

I still think there's reason to consider my scancode/keysym API instead of @ArturKovacs's; it has a smaller message size and avoids having to translate to all types of input for both press and release.

With an estimate favoring your argument, it saves, let's say, 30 bytes of memory when a smartwatch has at least a million times that much memory. It also saves maybe a few microseconds from a function that gets called once every 50,000 microseconds if the user is typing at 20 keystrokes a second, which is faster than the fastest recorded typing speed according to my calculations.

So with differences this small I think that lower memory and better performance are not valid arguments for picking a particular API.
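The arithmetic behind that estimate can be written out as a small sketch. The function names are made up for illustration, and the 5 microsecond cost used in the usage note is an assumed stand-in for "a few microseconds".

```rust
/// Microseconds between key events at a given typing rate.
/// Panics if `keystrokes_per_second` is zero.
pub fn micros_between_keystrokes(keystrokes_per_second: u32) -> u32 {
    1_000_000 / keystrokes_per_second
}

/// Fraction of wall-clock time spent in a per-keystroke handler
/// that costs `cost_micros` microseconds per call.
pub fn busy_fraction(cost_micros: u32, keystrokes_per_second: u32) -> f64 {
    f64::from(cost_micros) / f64::from(micros_between_keystrokes(keystrokes_per_second))
}
```

At 20 keystrokes a second the interval is 50,000 microseconds, so an assumed 5 microsecond saving per event is one ten-thousandth of the available time.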

Without the scancode there is no unique key.

I see. I didn't know about this use case before so I just added the scancode to my proposal.

I'm not sure what you are referring to here. Is it the press, repeat, release, and composition?

Physical location, label (unshifted translation by current layout), translated (unicode + control chars), IME.

Of these, physical location and label may be the same thing (some type of VirtualKeyCode), but with the first using a fixed layout (US) and the latter using the active layout.

Translated input and IME input are roughly the same except that the former may include control chars and the latter may be delayed (and may be passed during multiple edit states).

It appears that I have been terribly naive in assuming we can simply use a scancode and translate to a representation of our choice. Win32 uses a "virtual-key code" and a "scan code" and sometimes requires both plus further state for translations.

Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX and the scancode to get the corresponding vkey. Not sure if this would match some of the "interesting" quirks with the scancode+vkey combinations you get directly.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout).

You have to go out of your way to get this information. My understanding is that the non-alphanumeric keys can't be changed much (if at all) from one layout to another.

Keyboard input structs have been extended for non-keyboard input and so may lack a scancode, however in this case we might choose to deliver only _text_ input and not _key_ input.

Good catch. PhysicalKey::Unknown could also work here.

Also, Win32 doesn't appear to have any way of differentiating "physical location" and "key labels according to the current layout"; it simply has a Virtual-Key Code (whose value presumably depends on the current layout). We might be able to get around this by loading a specific layout such as standard US English and using this for translation in addition to the active layout?

Yeah, there's no native solution for this. You'd have to load the current keyboard layout, say before every Event::NewEvents, since I don't think Windows notifies you that the layout has changed. There's probably also a case to be made for loading the keyboard layout on every keyboard event, since you can change the layout with a keyboard shortcut (Win+Space bar).
You'd then have to check the vkey to see if it's a functional key, control pad key, arrow key, numpad key, function key, media key or backspace. If it's one of those, then I think you don't have to look further. For the other keys, you might have to use ToAsciiEx or ToUnicodeEx to get the value that's meant to be produced by a keypress + some set of modifiers.
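A rough sketch of that first check, assuming the standard Win32 virtual-key code values. `VkeyClass` and `classify_vkey` are hypothetical names, and which ranges really are layout independent is an assumption taken from the description above, not something verified against every layout.

```rust
/// Rough classification of a Win32 virtual-key code.
#[derive(Debug, PartialEq, Eq)]
pub enum VkeyClass {
    /// Keys assumed to mean the same thing in every layout.
    LayoutIndependent,
    /// Keys that need a layout-aware translation (e.g. via `ToUnicodeEx`).
    NeedsTranslation,
}

pub fn classify_vkey(vkey: u32) -> VkeyClass {
    match vkey {
        0x08                // VK_BACK
        | 0x21..=0x28       // VK_PRIOR, VK_NEXT, VK_END, VK_HOME and the arrow keys
        | 0x2D | 0x2E       // VK_INSERT, VK_DELETE
        | 0x60..=0x6F       // VK_NUMPAD0..VK_NUMPAD9 and the numpad operators
        | 0x70..=0x87       // VK_F1..VK_F24
        | 0xAD..=0xB3       // volume and media transport keys
        => VkeyClass::LayoutIndependent,
        // Everything else (letters, digits, OEM keys) goes through translation.
        _ => VkeyClass::NeedsTranslation,
    }
}
```

Only the keys that fall into `NeedsTranslation` would then be passed on to the layout-dependent translation step.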

Or we could attempt translation from scan-codes, though I believe those are device dependent so probably not viable.

From "Keyboard Scan Code Specification":

Under all Microsoft operating systems, all keyboards actually transmit Scan Code Set 2 values down the wire from the keyboard to the keyboard port. These values are translated to Scan Code Set 1 by the i8042 port chip. The rest of the operating system, and all applications that handle scan codes expect the values to be from Scan Code Set 1. Scan Code Set 3 is not used or required for operation of Microsoft operating systems.

While that document is from the year 2000, it still seems to be the case today that the scancodes you get from Windows are (mostly) from "PS/2 Scan Code Set 1". They are also stable enough that several notable games use them for keybinds. Unfortunately, Windows doesn't emit non-zero scancodes for certain keys, so you can't have every physical key be represented by a scancode. Some of these keys shouldn't be able to be re-mapped in any way, though (outside of gaming keyboard shenanigans), so you might just get away with using the vkey to retrieve physical location for some of those keys.

Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.

Now that's something I truly don't know.

In either case, since this is a "dead key thing", it should be handled in CompositionEvent.

That was my thought as well but when I tested it with Firefox on Windows the JavaScript API's key field only contained Dead on the first keypress and the isComposing field was set to false.

You could potentially get away with LogicalKey::Unicode("^") (on the first keypress) here by cheating a little on the Web backend and wait for the first compositionupdate

Again, this is not treated as a composition event, at least on the Web, but even then I don't think this is a good idea because we want to let applications know about at least the physical aspects of keypress events as soon as they happen. But then it becomes difficult to conveniently communicate the relationship between the physical keypress and the composition event. Although admittedly that aspect is already not perfect in my current proposal because the CompositionEnd may be detached from the physical keypress.

Pressing ^ should fire composition events immediately after the keydown event, unless I've misunderstood Example 26 in the uievents specification. It may, however, be challenging to associate the compositionupdate event with the keydown event.

Thanks for the response @maroider. This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).

Although that's only viable if all significant platforms function roughly this way.
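As a sketch of what such a pair of mapping functions could look like: the names `PhysicalLocation`, `physical_location` and `scancode_for` are hypothetical, the table holds only a handful of PS/2 Scan Code Set 1 values, and the assumption that the play/pause media key arrives with scancode 0 is purely illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysicalLocation {
    Escape,
    KeyW,
    KeyA,
    KeyS,
    KeyD,
    MediaPlayPause,
}

/// A few Scan Code Set 1 values; a real table would be much longer.
const SCANCODE_TABLE: &[(u32, PhysicalLocation)] = &[
    (0x01, PhysicalLocation::Escape),
    (0x11, PhysicalLocation::KeyW),
    (0x1E, PhysicalLocation::KeyA),
    (0x1F, PhysicalLocation::KeyS),
    (0x20, PhysicalLocation::KeyD),
];

const VK_MEDIA_PLAY_PAUSE: u32 = 0xB3;

/// Map (scancode, vkey) to a physical location, if one can be determined.
pub fn physical_location(scancode: u32, vkey: u32) -> Option<PhysicalLocation> {
    if scancode != 0 {
        return SCANCODE_TABLE
            .iter()
            .find(|(sc, _)| *sc == scancode)
            .map(|&(_, loc)| loc);
    }
    // Scancode 0: fall back to the vkey for keys assumed not to be remappable.
    match vkey {
        VK_MEDIA_PLAY_PAUSE => Some(PhysicalLocation::MediaPlayPause),
        _ => None,
    }
}

/// Reverse mapping; `None` for locations that have no scancode in the table.
pub fn scancode_for(location: PhysicalLocation) -> Option<u32> {
    SCANCODE_TABLE
        .iter()
        .find(|(_, loc)| *loc == location)
        .map(|&(sc, _)| sc)
}
```

Both directions return an `Option`, so a backend that cannot answer simply returns `None`.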

Eh, you might be able to get away with using MapVirtualKeyW or MapVirtualKeyExW with MAPVK_VSC_TO_VK_EX

Not a good idea since scancode may be 0.

This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).

Although that's only viable if all significant platforms function roughly this way.

It will likely work on Linux and Windows.
The web backend will likely be tricky, challenging or impossible to implement properly.
What I've been able to gather from the macOS documentation suggests that it might be possible to implement this.
iOS seems to have a clear separation between on-screen and physical keyboards, and what you've just described might be possible to implement.
Android also has a clear separation between physical and on-screen keyboards, but Android's documentation also explicitly unifies on-screen keyboards and IMEs. What you've described should also be possible to implement here.

With the above in mind, should mobile on-screen keyboard input be treated as IME input? Fully adapting the mobile APIs for text input will likely require some additions to the IME API later down the road, but implementing whatever is decided upon here would be a huge improvement over the current state (which is essentially unimplemented).

Not a good idea since scancode may be 0.

I can't believe I forgot that.


EDIT: After rereading this comment, I feel like we need a more complete overview of what's available in what form on each platform. I've got a very incomplete document that's kind-of-sort-of that, but it needs more work.

This seems to indicate that we could omit physical_location from the results and use a function to try mapping (scancode, vkey) to physical_location as well as physical_location → scancode (with both functions returning an Option).

I'm assuming that the purpose of this would be that returning a None from the mapping functions could indicate that the current keyboard doesn't support identifying keys by their position. If this is the case why don't we express this with the Unknown variant or add a new variant expressing specifically this and keep the physical location in the event struct?

I think I speak for everyone here when I say that this issue is exhausting to participate in, which I believe is largely because it's very hard to find the best possible API that satisfies all platform limitations. This, however, should not prevent us from at least improving on the current state.

Again here's a link to the updated proposal: https://github.com/rust-windowing/winit/issues/753#issuecomment-695211403

Are we even able to properly associate key, character and IME input? The composition-event branch lists all three as separate events.

Well not on the web at least. In particular the composition start event cannot be associated with keyboard events (and the CompositionUpdate and the CompositionEnd don't emit a keypress event with the correct key when I test on Windows). So as annoying as it is, the keydown and the composition events have to be separated similarly to how it is on the web.

You'd have to load the current keyboard layout, say before every Event::NewEvents, since I don't think Windows notifies you that the layout has changed. There's probably also a case to be made for loading the keyboard layout on every keyboard event, since you can change the layout with a keyboard shortcut (Win+Space bar).

It seems to me that Windows does notify the application about this, by WM_INPUTLANGCHANGEREQUEST.


The proposal that I linked above seems possible to do. It has many compromises for all of us and it turned out to be very similar to the web API due to the limitations there. But as @maroider said, this would be a great improvement so I would love to move to implementation as soon as possible. Even though I know that just like the rest of us I neglected this thread the past month.

@dhardy I know, you've been more or less opposed to this proposal and you wish to take an approach relying on using the minimal amount of data possible and using mapping functions to extract more info. I think the biggest advantage of that approach is that it allows specifying mappings only available on a certain platform using extension traits. But I think that the current set of public fields should be made available on every platform. And adding further extensions like mapping a PhysicalKey to a label can be done later even if we choose the API that @maroider(?) and I are suggesting. To be honest you seem to be more knowledgeable in this keyboard input handling topic than I am, but as far as I can tell you never addressed my arguments against choosing an API based on negligible differences in memory usage and speed. And at this point, after almost two years of this issue being opened, I think that starting to implement an API that's not perfect but is a great improvement is more valuable than spending more time discussing what would be the best possible API. Don't get me wrong, I'm not trying to shut you out of the conversation because, as I said, you seem to be more knowledgeable in this; I'd just like the discussion to be focused on the truly concerning aspects, and once we agree that the stuff is good enough™ we should move on to getting our hands dirty.

I wrote it earlier but I'll just reinforce it that I'm willing to write the implementation for Windows, macOS, the web, Linux, and Android - in this order of preference.

I have a slight concern about Unicode(&'static str): this may require memory leaks with every keypress to implement. Let's just use String since you don't want WindowEvent<'a> and perf. impact is negligible.

I'm not a big fan of the way LogicalKey combines a UTF-8 string with a finite set of values, but practically I don't see a better alternative, so it's still the best option from my point of view. PhysicalKey is possibly worse since extra keys are simply not represented, but it's still the best option we have.

Regarding the other arguments — I agree with you. I'm not 100% sure of the need to include a scancode (one could try matching physical_key instead), but potentially it's the best option to allow matching key-down and -up events. (Maybe see how the implementations pan out.)

And thanks so much for the effort you're putting in here.

@dhardy

I have a slight concern about Unicode(&'static str): this may require memory leaks with every keypress to implement. Let's just use String since you don't want WindowEvent<'a> and perf. impact is negligible.

@Osspial addressed the memory leak much earlier in the discussion (emphasis mine).

There's a solution for using &str, actually - we could convert unicode Strings that are constructed at runtime into &'static strs as follows, then we can __internally store a cache of keypress strings__ so that we don't consume additional memory for every keypress:

let string: String = "Hello".to_string();
// Construct a 'static string at runtime.
let x: &'static str = Box::leak(string.into_boxed_str());

Since the number of unique strings produced by a given keyboard layout should be fairly limited, leaking _some_ (mostly) bounded amount of memory should be fine.
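A minimal sketch of the cache idea, assuming a hypothetical `KeyStringCache` type (not an existing winit type): each distinct string is leaked at most once, and later lookups return the same `&'static str`.

```rust
use std::collections::HashSet;

/// Interns key strings so that each distinct string is leaked only once.
#[derive(Default)]
pub struct KeyStringCache {
    seen: HashSet<&'static str>,
}

impl KeyStringCache {
    /// Return a `'static` copy of `s`, leaking memory only on first sight.
    pub fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&cached) = self.seen.get(s) {
            return cached;
        }
        // First occurrence: leak one boxed copy and remember it.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.seen.insert(leaked);
        leaked
    }
}
```

The leaked memory is bounded by the number of distinct strings a layout can produce, not by the number of keypresses.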

For posterity's sake, I think it's a good idea to state why keyboard-types isn't being chosen as a solution since what we've landed on is fairly similar. It also isn't easy to spot the _why_ since the discussion on this issue has grown quite long.

Key::Character

Key::Character uses a full String. @Osspial seems to prefer something which is easier to match on:

  • A full String is more difficult to match on than an enum, str, or char.

Code

The Code enum includes some keys (Fn and FnLock in particular) which seem to be pretty much exclusively handled in hardware. @Osspial notes:

I'd like to leave the hardware-handled keys out of our "officially supported" keys

F13-F24: keyboard-types does not enumerate these. macOS and Linux possibly also support up to F35.
Physical keyboards rarely (if ever) include those keys these days, but I (and possibly others) assign dedicated macro keys to F13-F24.

Location

keyboard-types sticks the Location enum on the KeyboardEvent struct instead of only sticking it on the keys which have multiple locations. Sticking Location only on the keys which have multiple locations makes it abundantly clear when you need to care about the Location by documenting with the types themselves instead of having to document this elsewhere. @Osspial notes:

I like the idea of having a left/right enum to distinguish between sided keys. However, the Location enum should be exposed through variants in the Key enum (e.g. Ctrl(Location)), rather than on the main KeyboardEvent struct we expose.

There was a concern raised by @pyfisch about having to add a Location field on enum variants, but this shouldn't be an issue since new keyboard layouts aren't really made these days. @Osspial notes:

Regarding adding a location to an existing key being a breaking change - there shouldn't be any reason we ever have to do that! Keyboard layouts are fairly static, and only a limited subset of keys are going to have multiple locations on the keyboard. We should be able to keep track of which ones have multiple locations and structure the enum as necessary.
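For illustration, the "location on the variant" shape could look roughly like this. The enum below is a hypothetical, heavily shortened sketch and not the proposed key list.

```rust
/// Side of the keyboard for keys that exist twice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Location {
    Left,
    Right,
}

/// Only variants for keys with multiple locations carry a `Location`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Ctrl(Location),
    Shift(Location),
    Escape,
    Tab,
}

/// Callers that don't care about the side can ignore it with `_`.
pub fn is_any_ctrl(key: Key) -> bool {
    matches!(key, Key::Ctrl(_))
}
```

The type itself documents that `Escape` and `Tab` have no location to ask about, which is the advantage being argued for here.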

Modifiers

keyboard-types sticks the Modifiers struct on the KeyboardEvent struct, which goes against the current direction in Winit's API, which makes the user track Modifiers through the ModifiersChanged event. The exact reasoning for this escapes me, so I'm not sure how good of an argument this is against adopting keyboard-types directly.

KeyboardEvent

There's a desire to expose platform-specific keyboard event data, but keyboard-types' KeyboardEvent struct doesn't give us this option. It would be possible to expose this data as a separate event, but the data is currently stuck on KeyboardEvent in @ArturKovacs' latest proposal.


I think that's everything that's been discussed that makes adopting keyboard-types undesirable, but it's entirely possible that I've missed something.

@maroider thanks for making this list of differences. Let me treat these items as feature requests for keyboard-types and we maybe avoid having two different APIs.

Key::Character

Key::Character uses a full String.

The current proposal uses &'static str. Since there are usually only a very limited number of key values I think it is reasonable to cache them in winit and use &'static str.

Code

The Code enum includes some keys (Fn and FnLock in particular) which seem to be pretty much exclusively handled in hardware. @Osspial notes:

I'd like to leave the hardware-handled keys out of our "officially supported" keys

It should be sufficient to add a sentence to the documentation that winit does not emit Fn and similar codes because they are handled in hardware.

F13-F24: keyboard-types does not enumerate these. macOS and Linux possibly also support up to F35.
Physical keyboards rarely (if ever) include those keys these days, but I (and possibly others) assign dedicated macro keys to F13-F24.

Fair point. Having a variant F(u8) instead of F1 to F12 should resolve this. The web specification says that these keys are valid, I just didn't implement it at the time. We should also have Soft(u8).
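A rough sketch of the `F(u8)` shape, to show the trade-off. The `Key` enum here is a hypothetical fragment, and the upper bound of 35 is an assumption based on the F35 figure mentioned above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    /// Function key F1, F2, ... carried as a number.
    F(u8),
    /// General purpose virtual function key, by index.
    Soft(u8),
    Escape,
}

impl Key {
    /// Highest function key assumed to exist (an assumption, see above).
    pub const MAX_F: u8 = 35;

    /// Checked constructor: `F(0)` or `F(200)` are representable in the
    /// type, so validation has to happen somewhere.
    pub fn function(n: u8) -> Option<Key> {
        if (1..=Self::MAX_F).contains(&n) {
            Some(Key::F(n))
        } else {
            None
        }
    }
}
```

Matching stays simple (`Key::F(13)`), but unlike an enumerated `F1..F35` the type cannot rule out nonsensical numbers on its own.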

Location

keyboard-types sticks the Location enum on the KeyboardEvent struct instead of only sticking it on the keys which have multiple locations. Sticking Location only on the keys which have multiple locations makes it abundantly clear when you need to care about the Location by documenting with the types themselves instead of having to document this elsewhere.

There is one very practical issue with this (I just thought of): The numpad contains keys for digits and operators. They should emit a Key event with a Unicode character and the location is numpad. This means we would need to stick location to the Unicode variant, which unfortunately mostly negates the advantage of knowing which keys can have different location values.

Modifiers

keyboard-types sticks the Modifiers struct on the KeyboardEvent struct, which goes against the current direction in Winit's API, which makes the user track Modifiers through the ModifiersChanged event. The exact reasoning for this escapes me, so I'm not sure how good of an argument this is against adopting keyboard-types directly.

I don't know either why winit chooses to have a separate modifiers changed event. My guess is that other events like mouse events also need this information (for actions like Control + left click), and winit tried to keep the events small.

KeyboardEvent

There's a desire to expose platform-specific keyboard event data, but keyboard-types' KeyboardEvent struct doesn't give us this option. It would be possible to expose this data as a separate event, but the data is currently stuck on KeyboardEvent in @ArturKovacs' latest proposal.

Easiest solution would be to add an extra field to the keyboard-types KeyboardEvent, which can hold any additional data winit wants to send. If the field is not needed users just set it to ().

Key/Code

Fair point. Having a variant F(u8) instead of F1 to F12 should resolve this. The web specification says that these keys are valid, I just didn't implement it at the time. We should also have Soft(u8).

Could you explain why you believe F(u8) is better than enumerating every F-key? I don't think I've seen any indication that anything above F35 is used anywhere.
I also don't understand the purpose of the Soft keys. The UI Events specification lists each of them as a "General purpose virtual function key, as index x.", but they don't seem to be discussed anywhere else in the specification.

Location

There is one very practical issue with this (I just thought of): The numpad contains keys for digits and operators. They should emit a Key event with a Unicode character and the location is numpad. This means we would need to stick location to the Unicode variant, which unfortunately mostly negates the advantage of knowing which keys can have different location values.

The LogicalKey/Key enumeration isn't supposed to be where Winit exposes character/text input, so I think it might be reasonable to special case the digits and symbols in question and give them their own Key variants.

Modifiers

I don't know either why winit chooses to have a separate modifiers changed event. My guess is that other events like mouse events also need this information (for actions like Control + left click), and winit tried to keep the events small.

Personally, my only real hang-up here is that keyboard-types' KeyboardEvent would make Winit's API surface inconsistent once the deprecated modifiers fields on various events get removed.

KeyboardEvent

Easiest solution would be to add an extra field to the _keyboard-types_ KeyboardEvent, which can hold any additional data _winit_ wants to send. If the field is not needed users just set it to ().

This seems like a good solution to me. It might also be nice to have a without_extra method that always returns a KeyboardEvent<()> so users can throw away the extra information. This would also allow crates which want to interface only with keyboard-types to stick with () instead of sprinkling generics all over their code without giving their users paper cuts.
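A minimal sketch of the generic extra field together with `without_extra`. The struct below is a simplified stand-in and not the real keyboard-types `KeyboardEvent`; `WinitExtra` and `key_label` are hypothetical names.

```rust
/// Simplified stand-in for a keyboard event with a generic extra payload.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyboardEvent<T = ()> {
    pub key: String,
    pub repeat: bool,
    pub extra: T,
}

impl<T> KeyboardEvent<T> {
    /// Drop the extra payload, keeping the platform-independent fields.
    pub fn without_extra(self) -> KeyboardEvent<()> {
        KeyboardEvent {
            key: self.key,
            repeat: self.repeat,
            extra: (),
        }
    }
}

/// Hypothetical extra data a windowing library might attach.
#[derive(Debug, Clone, PartialEq)]
pub struct WinitExtra {
    pub scancode: u32,
}

/// A downstream function that only deals with the plain event type,
/// with no generics in its signature.
pub fn key_label(event: &KeyboardEvent) -> &str {
    &event.key
}
```

With the default type parameter, code that only knows about the plain event can keep writing `KeyboardEvent` and accept the result of `without_extra()`.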

CompositionEvent

I'm not sure what we should do in terms of composition events. The shape and implementation of that particular API is being discussed in #1497 and it seems there is yet to emerge a consensus on that. My gut is telling me that composition event support should be ignored for the purposes of getting closer to closing this issue, but again, I'm not sure.

One more thing I _completely_ forgot:
I've been pushing for the inclusion of a modifier-independent LogicalKey/Key to be delivered alongside the modifier-dependent LogicalKey/Key which is already a part of the UI Events specification. I think that the loss of this part of the API is an acceptable compromise if keyboard-types is able to support Winit's use-case in a satisfactory manner in _all_ other cases. Modifier-independent LogicalKey/Keys can be queried for after the fact once a KeyboardLayout API is added, if I'm not terribly mistaken (although Windows may or may not make that difficult).

Key/Code

Fair point. Having a variant F(u8) instead of F1 to F12 should resolve this. The web specification says that these keys are valid, I just didn't implement it at the time. We should also have Soft(u8).

Could you explain why you believe F(u8) is better than enumerating every F-key? I don't think I've seen any indication that anything above F35 is used anywhere.

Well, personally I have only ever seen F1-F12 in hardware. Some obsolete systems appear to have an F36 key. Meanwhile ncurses has a key enumeration listing F57 as the highest key. I don't know if these keys were used anywhere in the last 20 years, and I am doubtful that they will be used by some winit software.
I can also add F13 through F35 instead since that appears to be the highest number any modern systems we know of support. If it turns out we are wrong about this, it's easy to add more keys.

I also don't understand the purpose of the Soft keys. The UI Events specification lists each of them as a "General purpose virtual function key, as index x.", but they don't seem to be discussed anywhere else in the specification.

Me neither. The W3C spec seems to inherit them from Qt (see Qt.Key); a StackOverflow question, "What key is Keys.Context1 tied to in Qt/QML?", elaborates that they were used on some mobile platform.

Location

There is one very practical issue with this (I just thought of): The numpad contains keys for digits and operators. They should emit a Key event with a Unicode character and the location is numpad. This means we would need to stick location to the Unicode variant, which unfortunately mostly negates the advantage of knowing which keys can have different location values.

The LogicalKey/Key enumeration isn't supposed to be where Winit exposes character/text input, so I think it might be reasonable to special case the digits and symbols in question and give them their own Key variants.

A special case wouldn't be the least surprising option imho. If this separate variant is implemented a hotkey on "," needs different code than a hotkey on "." for example. I bet winit will see a few issue reports from confused users.
(LogicalKey/Key is useful for text input and used in servo for this purpose. But I don't want to debate this further.)

Modifiers

I don't know either why winit chooses to have a separate modifiers changed event. My guess is that other events like mouse events also need this information (for actions like Control + left click), and winit tried to keep the events small.

Personally, my only _real_ hang-up here is that keyboard-types' KeyboardEvent would make Winit's API surface inconsistent once the deprecated modifiers fields on various events get removed.

Yeah, that would be weird. Maybe consider un-deprecating the field and using Modifiers from keyboard-types?

Closing notes

Even if winit decides that the KeyboardEvent from keyboard-types is not appropriate for its use-case, it is still beneficial to reuse the Key, Code and Modifiers types, as this means other crates like servo and druid that use keyboard-types don't need to match and convert whole enums but only the keyboard event itself.
A modifier-independent logical key could be added to the structure in the extra field, if keyboard-types is used. This can be added without problems after the initial implementation.
Similarly composition events can be added later.

Location

The LogicalKey/Key enumeration isn't supposed to be where Winit exposes character/text input

This is not true with our latest API proposal. The LogicalKey provides the Unicode character input as affected by the modifiers. This isn't exposed anywhere else, so this is what users need to use for text input.

I think it might be reasonable to special case the digits and symbols in question and give them their own Key variants.

A special case wouldn't be the least surprising option imho. If this separate variant is implemented a hotkey on "," needs different code than a hotkey on "." for example.

Yeah I completely agree with the latter statement here. Even if it's used for shortcuts, not text this is going to be problematic. I do like the fact that we expose which keys have a location and which don't, but with this issue considered, I'm very much in favour of placing the location in a separate field, just like in keyboard-types.
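To illustrate why the separate field avoids the special case, here is a hypothetical, shortened sketch (these are not the final winit types): a hotkey check on a character works the same for the main row and the numpad, and the location is still available when needed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogicalKey {
    Unicode(&'static str),
    Ctrl,
    Enter,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KeyboardEvent {
    pub logical_key: LogicalKey,
    /// Location lives next to the key rather than inside its variants.
    pub location: Location,
}

/// One code path for character hotkeys, regardless of where the key sits.
pub fn is_hotkey(event: &KeyboardEvent, text: &str) -> bool {
    matches!(event.logical_key, LogicalKey::Unicode(s) if s == text)
}
```

The cost is the one noted above: the types no longer say which keys can actually report more than one location.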

Modifiers

Yeah, that would be weird. Maybe consider un-deprecating the field and using Modifiers from keyboard-types?

The reasoning behind deprecating the modifiers field was to avoid having multiple sources of truth. I remember that @Osspial formulated this argument in a comment on an issue or a PR but I cannot find it anymore.

Omitting the modifier-independent logical key

I originally got involved with this issue for this feature. The use case for which I needed this was shortcut keys; see #1700. Given that the modifier-dependent key ignores Ctrl, I'm fine with removing the modifier-independent key. Ignoring Ctrl is what the web does. To see this, run the following in your browser's console and then press keys in combination with Shift or Ctrl:

window.addEventListener("keydown", e => {
   console.log(e.key); 
});

Earlier in the thread @kchibisov argued that ctrl must be included in the translated char. I don't mind this as long as we have access to the character without the ctrl modification. The problem is that if winit emits the control character it's impossible for the application to find the layout-dependent key for it. The situation is similar with going from ctrl-less-char to ctrl-char. As far as I understand, it's practically not possible to do when using a Russian or an Arabic layout.

With that said, I do agree that implementing the modifier-less key is problematic at least on the web. It does seem to be possible with xkbcommon and I'm guessing it's possible on Windows and macOS as well. So my suggestion is to move it to a private, platform dependent field from which this can be accessed using an extension trait. So something like

struct KeyboardEvent {
    // all the public stuff...

    platform: PlatformKeyboardEvent,
}
impl DesktopKeyboardEventExt for KeyboardEvent  {
    // Trait impl methods take the visibility of the trait, so no `pub` here.
    fn logical_key(&self) -> LogicalKey {
        self.platform.logical_key.clone()
    }
}

Furthermore there's a problem with using the keyboard-types Key type for our "modifier-dependent" key because the keyboard-types documentation states that it's an implementation of the UI Events Specification. However according to that specification a control character is not a valid "key attribute value". Placing a control character in such a type would be just as incorrect as making a String instance which has a non-utf8 character.

The modifier-independent key on the other hand could use the keyboard-types Key but I think it would be a bad idea to do that because that would leave us using two enums which are identical in shape and only differ in their meaning. That would be a fine example of code duplication especially considering the size of those enums.

The "logical key problem" is back to haunt us. :cry: If I am not mistaken three different "logical keys" have been proposed, each useful for a different set of applications:

  • web key, as specified in UI Events KeyboardEvent key Values by W3C. This is either a named key value or a character with all modifiers applied except Ctrl. Useful for graphical applications and necessary to implement the web API on top of winit.
  • desktop key, character with all modifiers applied including Ctrl, contains control characters. Standard on most operating systems and desktop environments. Needed to implement the Ctrl handling in terminal applications.
  • modifier independent key, character or named key, ignoring all pressed modifiers. Usually the same as the printed symbol on a keycap. Requested for keyboard shortcuts.

Did I miss something important? I will add any missing information to the text above.

It is not possible to infer the web key from the desktop key because some information is lost during the Ctrl transformation.
It is also not possible to infer the modifier independent key from the two others because the relation between shifted keys and their unshifted counterparts is dependent on the keyboard layout.
It is probably possible to infer the desktop key from the web key, if it is known whether Ctrl is pressed and not already consumed.
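
To make the information loss concrete, here is a minimal sketch of a Ctrl transformation. It is an assumption-laden simplification: only the ASCII range `@` to `~` and the space character are handled, and `ctrl_transform` is a hypothetical helper name, not something winit or xkbcommon exposes.

```rust
/// Simplified Ctrl transformation for a single character.
/// Assumption: only '@'..='~' and ' ' are handled; real implementations
/// cover a few more cases.
fn ctrl_transform(c: char) -> Option<char> {
    match c {
        // Clearing the upper bits maps letters onto the C0 control range.
        '@'..='~' | ' ' => Some(((c as u8) & 0x1f) as char),
        // Anything else has no control character under this rule.
        _ => None,
    }
}
```

Under this rule `a` and `A` both become U+0001, so the web key cannot be recovered from the desktop key, while a key value such as `я` yields nothing at all.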

With these considerations in mind I propose a slightly different API for KeyboardEvent, based on @ArturKovacs's proposal and the recent discussion. It gives access to all requested kinds of logical keys, uses only a single enum for logical keys (that can be re-used from keyboard-types) and still provides access to all needed values.

struct KeyboardEvent {
    pub state: keyboard_types::State,
    // **web key**
    pub key: keyboard_types::Key,
    pub code: keyboard_types::Code,
    pub location: keyboard_types::Location,
    pub repeat: bool,
    // Modifiers are transmitted using the modifiers changed event.
    // Additional private fields necessary to implement all methods.
    necessary_private_fields: (),
}

impl KeyboardEvent {
    // **desktop key**
    /// Key value but with Ctrl transformation applied.
    /// The returned string is identical to the result of
    /// https://xkbcommon.org/doc/current/group__state.html#ga0774b424063b45c88ec0354c77f9a247
    /// on X11/Wayland.
    pub fn key_with_ctrl_transformation(&self) -> Option<String> {
        // This function can either use a private flag, whether Ctrl transformation applies
        // and then execute the algorithm from https://github.com/xkbcommon/libxkbcommon/blob/6268ba1c77e248ea7ef829ec6d3ffedabe17086e/src/state.c#L899
        // or the return value is stored in a private field.
        // Control characters are valid UTF-8, so they can be returned in a string without issue.
        unimplemented!()
    }
    // **modifier independent key**
    /// Key value without any modifiers applied.
    pub fn key_without_modifers(&self) -> Option<String> {
        // This needs to be stored in a private field.
        // I don't know if this is possible to correctly implement on Linux.
        unimplemented!()
    }
}

As far as I can tell neither of key_with_ctrl_transformation and key_without_modifers is possible on the web (which is fine). And here's the reason for each:

key_with_ctrl_transformation: There's the key and the code fields from which one might try inferring the ctrl char. If you use the key, you might get я instead of z when using a Russian layout, so using the key it's not possible to reliably get the control char. If instead you use the code, you might get KeyZ when the user pressed the y key on a German layout, so that doesn't work either.

key_without_modifers: The issue here is most easily seen for the numbers below the function keys. If I press Shift+1 I get ! in the key field. But for example on the Hungarian layout ! is not on the 1 key, so using the web API it's not possible to tell what the character would be without the Shift key.
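
A small sketch of why the unshifted key cannot be derived from the shifted character alone. The two layout tables are illustrative assumptions (the second one places `!` on the 4 key), and `DigitRow` and `unshifted` are hypothetical names.

```rust
/// Hypothetical layout description: (character without Shift, character with Shift)
/// for the first four keys of the digit row. The entries are illustrative only.
type DigitRow = [(char, char); 4];

const LAYOUT_A: DigitRow = [('1', '!'), ('2', '@'), ('3', '#'), ('4', '$')];
const LAYOUT_B: DigitRow = [('1', '\''), ('2', '"'), ('3', '+'), ('4', '!')];

/// Finds the unshifted character of the key that produces `shifted`.
/// This lookup needs the layout table, which the web API does not provide.
fn unshifted(layout: &DigitRow, shifted: char) -> Option<char> {
    layout
        .iter()
        .find(|(_, s)| *s == shifted)
        .map(|(base, _)| *base)
}
```

The same `!` resolves to `1` on one table and to `4` on the other, so the answer depends entirely on layout data that only the platform has.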

But I believe they both should be doable on all desktop platforms. I do like your suggestion; however, I can still see two points where keyboard-types conflicts with something we did seem to reach a decision about.

1, The usage of Unicode(&'static str).
2, Having the Key, the Code, and the Location be #[non_exhaustive] enums

I'm assuming that adding non_exhaustive wouldn't be an issue for you but I know that making the unicode variant use a reference is not an option. So after giving it some thought I would be willing to sacrifice static references at the altar of convenience for the rest of the ecosystem. If we were to do that, below is what the adjusted proposal could look like.

And one more thing. I removed the composition API from this, as I realized that what we had for it so far is pretty much useless and a correct composition (IME) API is out of scope for this issue.

EDIT: The following code has been edited after it was posted. Some comments below might refer to older versions.



pub struct KeyEvent {
    pub scancode: ScanCode,

    /// Represents the position of a key independent of the
    /// currently active layout.
    /// Conforms to https://www.w3.org/TR/uievents-code/
    /// 
    /// Note that `Fn` and `FnLock` key events are not emitted by `winit`.
    /// These keys are usually handled at the hardware or at the OS level.
    pub physical_key: keyboard_types::Code,

    /// This value is affected by all modifiers except <kbd>Ctrl</kbd>.
    /// 
    /// This is suitable for text input in a GUI application.
    /// 
    /// Note that the `Unicode` variant may contain multiple characters.
    /// For example on Windows when pressing <kbd>^</kbd> using
    /// a US-International layout, this will be `Dead` for the first
    /// keypress and will be `Unicode("^^")` for the second keypress.
    /// It's important that this behaviour might be different on
    /// other platforms. For example Linux systems may emit a  
    /// `Unicode("^")` on the second keypress.
    ///
    /// ## Platform-specific
    /// - **Web:** Dead keys might be reported as the real key instead
    /// of `Dead` depending on the browser/OS.
    pub logical_key: keyboard_types::Key,

    pub location: keyboard_types::Location,
    pub state: keyboard_types::State,
    pub repeat: bool,

    platform_specific: platform::KeyEventExtra,
}

// This would of course only be compiled for desktop OSes
// like Linux, Windows, macOS, BSD, etc
impl KeyEventDesktopExt for KeyEvent {
    /// This value is affected by all modifiers including but not
    /// limited to <kbd>Shift</kbd>, <kbd>Ctrl</kbd>, and <kbd>Num Lock</kbd>.
    /// 
    /// This is suitable for text input in a terminal application.
    /// 
    /// `None` is returned if the input cannot be translated to a string.
    /// For example dead key input as well as <kbd>F1</kbd> and
    /// <kbd>Home</kbd> among others produce `None`.
    /// 
    /// Note that the resulting string may contain multiple characters.
    /// For example on Windows when pressing <kbd>^</kbd> using
    /// a US-International layout, this will be `None` for the first
    /// keypress and will be `Some("^^")` for the second keypress.
    /// It's important that this behaviour might be different on
    /// other platforms. For example Linux systems may emit a  
    /// `Some("^")` on the second keypress.
    fn char_with_all_modifers(&self) -> Option<String> {
        // Clone because the value cannot be moved out of `&self`.
        self.platform_specific.char_with_all_modifers.clone()
    }

    /// This value ignores all modifiers including
    /// but not limited to <kbd>Shift</kbd>, <kbd>Caps Lock</kbd>,
    /// and <kbd>Ctrl</kbd>. In most cases this means that the
    /// unicode character in the resulting string is lowercase.
    /// 
    /// This is useful for shortcut key combinations.
    ///
    /// In case `logical_key` reports `Dead`, this will still report the
    /// real key according to the current keyboard layout. This value
    /// cannot be `Dead`.
    fn key_without_modifers(&self) -> keyboard_types::Key {
        // This is possible on linux when using xkbcommon
        // by passing empty modifier masks to `xkb_state_update_mask()`
        // then calling `xkb_state_key_get_one_sym`
        // (assumes `KeyEventExtra` stores the result in a field of the same name)
        self.platform_specific.key_without_modifers.clone()
    }
}

/// An opaque struct that uniquely identifies a single physical key on the
/// current platform.
/// 
/// This is distinct from `keyboard_types::Code` because this struct
/// will always be a unique identifier for a specific key however
/// `keyboard_types::Code` may be `Unidentified` for multiple distinct
/// keys.
/// 
/// Furthermore this struct may store a value that cannot be ported
/// to another platform, hence it is opaque. To retrieve the underlying
/// value, use one of the platform-dependent extension traits like
/// `XkbScanCodeExt`
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ScanCode {
    // platform dependent private fields to uniquely identify a single key
}

/// For X11 (and maybe wayland as well?)
impl XkbScanCodeExt for ScanCode {
    /// returns `xkb_keycode_t`
    fn keycode(&self) -> u32 {
        self.keycode
    }
}

Having the Key, the Code, and the Location be #[non_exhaustive] enums

I can change this. When I wrote the code, #[non_exhaustive] wasn't stable yet.

    /// This value is affected by all modifiers except <kbd>Ctrl</kbd>.
    /// 
    /// This is suitable for text input in a GUI application.
    /// 
    /// Note that the `Unicode` variant may contain multiple characters.
    /// For example on Windows when pressing <kbd>^</kbd> using
    /// a US-International layout, this will be `Dead` for the first
    /// keypress and will be `Unicode("^^")` for the second keypress.
    /// It's important that this behaviour might be different on
    /// other platforms. For example Linux systems may emit a  
    /// `Unicode("^")` on the second keypress.

This isn't how keys should work. I am aware that some browsers work like that, but the correct behavior would be to use composition events to get the text input for dead keys. (Can we maybe just not specify this right now in detail and consider this again when we implement dead keys and IME?)

Text input

Does the current proposal effectively replace WindowEvent::ReceivedCharacter since you seem to want to expose basic text input inside KeyEvent via logical_key, @ArturKovacs?

Location

Where did it go?

Actually, now that I think about it, I'd like to talk about where to stick Location again.
One use-case I haven't already mentioned is using the Key/Code enums' serialized forms in config files. Amethyst already does this (it uses VirtualKeyCode directly in its Button enum). It would be nice to be able to write (in RON) Code(Ctrl(Right)) to specify the right control key with this scheme. Key(Character(Numpad, "/")) is admittedly of less utility, but it would be more consistent to have Location be on both enums instead of being exposed once on the Code enum and once on the KeyboardEvent struct.
The most consistent API would be one where Location is exposed only on KeyboardEvent.

It would also be nice to have Key::Character(Location::Standard, "w") serialize to and from Character("w").

You can't really say "I don't care about which Key::Ctrl, it just has to be a Key::Ctrl" with this scheme though, so there's still a case for creating your own (de)serialization logic and all that if you really need to represent such a thing. Such a scheme would also be out-of-scope for Winit.

In the end, this isn't necessarily a blocker for me. I'm certainly willing to make my own (de)serialization logic if I need it.

Dead-keys

This isn't how keys should work. I am aware that some browsers work like that, but the correct behavior would be to use composition events to get the text input for dead keys. (Can we maybe just not specify this right now in detail and consider this again when we implement dead keys and IME?)

@pyfisch Example 25 seemingly contradicts this, but I agree with you in that dead keys should be handled through composition events, and that we should hold off on specifying any exact behaviour.

Sacrificing Unicode(&'static str) at the altar of convenience

I'm mostly fine with this for now.

KeyEventDesktopExt

I'm not entirely sold on the names of the methods, but it looks good enough for now.

Text input

Does the current proposal effectively replace WindowEvent::ReceivedCharacter [...]?

Yes it does. Once this is implemented, the WindowEvent::ReceivedCharacter variant should be removed.

Location

Oops I accidentally left it out from the API. I just added it.

It would be nice to be able to write (in RON) Code(Ctrl(Right))

It would indeed be nice. However I don't think there's a better way of exposing the location of the key from winit, and you seem to agree with this based on how you expressed yourself and on the fact that you didn't suggest an alternative.

I'm certainly willing to make my own (de)serialization logic if I need it.

That's reassuring because I would say that this is the right approach to the situation you described. Of course one wouldn't have to write (de)serialization from scratch, you could just define a struct SerializedKey(pub Location, pub Key) which you feed into your choice of (de)serializer.
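
A rough sketch of what such application-side logic could look like, including the "any Ctrl" case mentioned earlier. The `Location` and `Key` enums here are trimmed stand-ins rather than the keyboard-types definitions, and `KeyPattern` is a hypothetical helper.

```rust
/// Trimmed stand-in for the location enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Location {
    Standard,
    Left,
    Right,
    Numpad,
}

/// Trimmed stand-in for the logical key enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Key {
    Character(String),
    Control,
}

/// The pair suggested above, ready to be fed into a (de)serializer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SerializedKey(pub Location, pub Key);

/// Hypothetical binding: `location: None` means "any location".
pub struct KeyPattern {
    pub location: Option<Location>,
    pub key: Key,
}

impl KeyPattern {
    /// True if the pressed key satisfies this binding.
    pub fn matches(&self, pressed: &SerializedKey) -> bool {
        self.key == pressed.1 && self.location.map_or(true, |l| l == pressed.0)
    }
}
```

With this shape a config entry can either pin a key to a location or leave the location open, without winit having to know about either choice.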

Dead-keys

@pyfisch Example 25 seemingly contradicts this, but I agree with you in that dead keys should be handled through composition events

Composition events will not be exposed by this API as a proper IME API is out of scope for this issue, and is being tackled by #1497. I think at least for now we should forward Dead keys the way the target platform reports them to keep consistency across applications on a single platform. I don't think this is impactful enough to discuss this further in this thread.


All in all, this API looks good enough to me to start implementing. I probably won't start this weekend, but, at the risk of jinxing it, I'd like to start sometime next week.

@ArturKovacs I've created a keyboard-types branch for winit: https://github.com/pyfisch/keyboard-types/tree/winit It already contains the F13 to F35 function keys and uses #[non_exhaustive] enums. Just open an issue over there for any other required changes.

Dead Keys

I'd like to back-track a bit on "not specifying how these are handled":

  • KeyboardEvent.logical_key always gives characters without diacritics from dead keys. Keys which produce precomposed characters, such as é on AZERTY, will still be exposed here.
  • Some way of accessing "the character/string the platform gave Winit" (ignoring Ctrl, of course) is added, such as an extra method or field. This is how you'd get composed characters with diacritics. This also has the added benefit of making it clear that getting the characters this way is a stop-gap measure.

I've thought about this for some hours now, and I think this should be implementable even on the Web.

As a note, Fn on Mac keyboards is very different from other keyboards and is actually sent as a separate modifier (which macOS uses for some special keys). Apple makes it impossible for 3rd party keyboard vendors to utilize this Fn key, so it's probably safe to ignore it (you have to spoof an Apple USB VID:PID in order for the macOS HID driver to enable parsing for it). But it is possible to emulate with a custom driver.

Apple includes it in the CGEvents though as maskSecondaryFn (https://developer.apple.com/documentation/coregraphics/cgeventflags).

All other Fn usage is typically done inside the keyboard controller itself (though I think I read some new USB HID proposals from Microsoft that adds a new HID event for Fn keys).

Thank you @pyfisch. I noticed that you didn't add non_exhaustive to Location. After thinking about it a bit, I believe that's fine. I think this is all for now; new feature requests might come up while implementing this but I don't anticipate any additional change required in keyboard-types.

Thanks for the insight @haata, yeah if I'm not mistaken the decision for ignoring Fn and FnLock keypresses was to maximize consistency across platforms and avoid unpleasant surprises.

I'd like to preface the following by stating that I once again updated the proposal.

Dead keys

I realized that we cannot really remove the IME API like I thought we could. The problem is that logical_key cannot replace ReceivedCharacter because ReceivedCharacter also forwarded IME results but for example on the web there's no keypress event for composition end so there wouldn't be any scancode, yet our API requires scancode to always map to a real, physical key.

A large pain point is that there doesn't seem to be an agreement on IME handling at #1497. That's why I was so quick to jump to removing IME: I don't see how this will ever be implemented if we include IME in this issue as well. In order to mitigate the previously mentioned issues with including IME input within the keyboard event, I think we should add a single variant to WindowEvent in the shape of:

pub enum WindowEvent {
    ...

    /// A text composition (IME) ended with the provided text.
    ///
    /// This API is limited on purpose. A more feature-rich IME API is
    /// being discussed at:
    /// https://github.com/rust-windowing/winit/issues/1497
    // TODO remove this once 1497 is complete
    ReceivedImeText(String)
}

Then ReceivedCharacter could indeed be removed, as every key-press related text input would be handled by logical_key and IME generated text input would be handled by ReceivedImeText.
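
On the application side, text handling under this split might look roughly like the sketch below. The `Key` and `Event` types are trimmed stand-ins for the proposed API, and `apply_text` is a hypothetical helper, so treat this as an illustration of the idea rather than the eventual winit interface.

```rust
/// Stand-in for the proposed logical key; only the variants needed here.
enum Key {
    Character(String),
    Enter,
    Dead,
}

/// Stand-in for the two event kinds that would carry text.
enum Event {
    Key { logical_key: Key, pressed: bool },
    ReceivedImeText(String),
}

/// Appends whatever text the event contributes to `buffer`.
fn apply_text(buffer: &mut String, event: &Event) {
    match event {
        Event::Key { logical_key: Key::Character(s), pressed: true } => buffer.push_str(s),
        Event::Key { logical_key: Key::Enter, pressed: true } => buffer.push('\n'),
        // Dead keys, key releases and other named keys contribute no text.
        Event::Key { .. } => {}
        // IME results arrive without an associated physical key.
        Event::ReceivedImeText(s) => buffer.push_str(s),
    }
}
```

Both sources end up in the same buffer, which is the behaviour ReceivedCharacter used to provide on its own.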

KeyboardEvent.logical_key always gives characters without diacritics from dead keys.

logical_key is meant for text input from non-IME keyboard events. As such, it should be affected by dead-key-like composition and it should report the Dead variant for dead keys; otherwise text input would behave incorrectly.

Some way of accessing "the character/string the platform gave Winit" (ignoring Ctrl, of course) is added [...] This is how you'd get composed characters with diacritics

If I understand you correctly this is also what logical_key is for. And getting the character this way wouldn't be a stop-gap measure according to how I interpret the API. It would simply be the way to do it.

@ArturKovacs You know what? You're probably right about logical_key, even though I find using logical_key for text input in this way to be an unfortunate decision.

ReceivedImeText seems like a good idea for now.

New API proposal after some discussion at #1788

NOTE: The following code has been edited to address comments that follow, so some comments may be obsolete.


/// Contains the platform-native physical key identifier (aka scancode)
// `Copy` and `Clone` are needed because `Key` derives them and contains this type.
#[derive(Debug, Eq, PartialEq, Hash, Serialize, Deserialize, Copy, Clone)]
pub enum NativeKeyCode {
    Unidentified,
    Windows(u16),
    MacOS(u16),
    XKB(u32),
}

/// Represents the position of a key independent of the
/// currently active layout.
/// 
/// Conforms to https://www.w3.org/TR/uievents-code/
/// 
/// For most keys the name comes from their US-layout equivalent.
#[derive(Debug, Eq, PartialEq, Hash, Serialize, Deserialize)]
#[non_exhaustive]
pub enum KeyCode {
    A,
    B,
    // ...
    ControlLeft,
    ControlRight,
    // ...
    F1,
    F2,
    // ...

    /// The native scancode is provided (if available) in order
    /// to allow the user to specify keybindings for keys which
    /// are not defined by this API.
    Unidentified(NativeKeyCode)
}

pub trait KeyCodeExtScancode {
    /// The raw value of the platform specific physical key identifier.
    fn to_scancode(self) -> u32;
    /// Constructs a `KeyCode` from a platform specific physical key identifier.
    fn from_scancode(scancode: u32) -> KeyCode;
}
impl KeyCodeExtScancode for KeyCode {
    fn to_scancode(self) -> u32 {
        todo!()
    }
    fn from_scancode(scancode: u32) -> KeyCode {
        todo!()
    }
}

/// Represent a key according to a specific keyboard layout.
#[derive(Debug, Eq, PartialEq, Hash, Serialize, Deserialize, Copy, Clone)]
#[non_exhaustive]
pub enum Key<'a> {
    /// When encoded as UTF-32, consists of one base character and zero or more combining characters
    Character(&'a str),
    Ctrl,
    Shift,
    // ...

    /// Contains the text representation of the dead-key
    /// when available.
    /// 
    /// ## Platform-specific
    /// - **Web:** Always contains `None`
    Dead(Option<char>),

    /// The native scancode is provided (if available) in order
    /// to allow the user to specify keybindings for keys which
    /// are not defined by this API.
    Unidentified(NativeKeyCode)
}

pub enum KeyLocation {
    Standard,
    Left,
    Right,
    Numpad,
}

pub struct KeyEvent {
    pub physical_key: KeyCode,

    /// This value is affected by all modifiers except <kbd>Ctrl</kbd>.
    /// 
    /// This has two use cases:
    /// - Allows querying whether the current input is a Dead key
    /// - Allows handling key-bindings on platforms which don't
    /// support `KeyEventExtModifierSupplement::key_without_modifiers`.
    /// 
    /// ## Platform-specific
    /// - **Web:** Dead keys might be reported as the real key instead
    /// of `Dead` depending on the browser/OS.
    pub logical_key: Key<'static>,

    /// Contains the text produced by this keypress.
    ///
    /// In most cases this is identical to the content
    /// of the `Character` variant of `logical_key`.
    /// However, on Windows when a dead key was pressed earlier
    /// but cannot be combined with the character from this
    /// keypress, the produced text will consist of two characters:
    /// the dead-key-character followed by the character resulting
    /// from this keypress.
    ///
    /// An additional difference from `logical_key` is that
    /// this field stores the text representation of any key
    /// that has such a representation. For example when 
    /// `logical_key` is `Key::Enter`, this field is `Some("\r")`.
    /// 
    /// This is `None` if the current keypress cannot
    /// be interpreted as text.
    /// 
    /// See also: `text_with_all_modifiers()`
    pub text: Option<&'static str>,

    pub location: KeyLocation,
    pub state: ElementState,
    pub repeat: bool,

    pub(crate) platform_specific: platform::KeyEventExtra,
}

/// Additional methods for the `KeyEvent` which cannot be implemented on all
/// platforms.
pub trait KeyEventExtModifierSupplement {
    /// Identical to `KeyEvent::text` but this is affected by <kbd>Ctrl</kbd>.
    ///
    /// For example, pressing <kbd>Ctrl</kbd>+<kbd>a</kbd> produces `Some("\x01")`.
    #[inline]
    fn text_with_all_modifiers(&self) -> &Option<String>;

    /// This value ignores all modifiers including
    /// but not limited to <kbd>Shift</kbd>, <kbd>Caps Lock</kbd>,
    /// and <kbd>Ctrl</kbd>. In most cases this means that the
    /// unicode character in the resulting string is lowercase.
    ///
    /// This is useful for key-bindings / shortcut key combinations.
    ///
    /// In case `logical_key` reports `Dead`, this will still report the
    /// real key according to the current keyboard layout. This value
    /// cannot be `Dead`.
    #[inline]
    fn key_without_modifiers(&self) -> Key<'static>;
}

impl Window {
    /// Reset the dead key state of the keyboard.
    ///
    /// This is useful when a dead key is bound to trigger an action. Then
    /// this function can be called to reset the dead key state so that 
    /// follow-up text input won't be affected by the dead key.
    ///
    /// ## Platform-specific
    /// - **Web:** Does nothing
    // ---------------------------
    // (@ArturKovacs): If this cannot be implemented on every desktop platform
    // at least, then this function should be provided through a platform specific
    // extension trait
    pub fn reset_dead_keys(&self) {
        todo!()
    }
}

Key::Character and KeyEvent.text

It's unclear to me why you're proposing &'static str in one place and String in another when you also write the following:

It was pointed out on several occasions by multiple contributors that having a heap-allocated object in this struct seems undesirable. Heap allocations are notorious for being slow. But slow is relative, and a heap allocation takes about a microsecond (or less) whereas key events are produced at most once about every 50 000 microseconds. That is, a heap allocation here has practically no negative effect.

key_without_modifiers

In case logical_key reports Dead, this will still report the real key according to the current keyboard layout. This value cannot be Dead.

I'm not quite certain why you're specifying this behaviour. The key in the base layer could still be a dead key, although not necessarily the one actively modifying the next keypress.

reset_dead_keys

I agree that it should be removed from the cross-platform API if it can't be reasonably implemented on every desktop platform.
In that case, I'd still like to have it available through a platform-specific extension trait.

// It was pointed out on several occasions by multiple contributors
// that having a heap-allocated object in this struct seems
// undesirable. Heap allocations are notorious for being slow. But
// slow is relative, and a heap allocation takes about a microsecond
// (or less) whereas key events are produced at most once about every
// 50 000 microseconds. That is, a heap allocation here has practically
// no negative effect.

Saying things are fine without actually having tested it seems very dishonest, especially when there's really not much of a reason why it would be necessary to use an allocated string here.


Also why is it not possible to just get the scancode? What if I don't want a KeyCode or NativeKeyCode?


Why is text_with_modifiers a method that does the same thing as the text variable which isn't a method, just adding in another modifier?

fn text_with_all_modifiers(&self) -> &Option<String>;

This API is restrictive: it should return Option<&String> (to allow simply returning None) or Option<&str> or just Option<String> (assuming it's rarely called, allocating on usage may be fine and may optimise out anyway — though this reasoning does fall foul of the previous comment).

Moreover though, I find the KeyEvent struct messy: it's not clear cut which fields should be public vs which should be platform-specific details and it requires quite a bit of translation up-front. I still think it would be better to expose everything through methods with an internal trait (for platform implementation).

Edit: the public methods in the API should have #[inline] since without it cross-crate inlining is not enabled without full LTO. For in-crate APIs it's probably better to just let the compiler figure it out.

It's unclear to me why you're proposing &'static str in one place and String in another

My reasoning for using &'static str is that it's easier to match on than a String. If you have a &'static str you can do the following.

match logical_key {
    Key::Character("a") => do_stuff(),
    _ => (),
}

This does not compile with Character(String). I guess text could also be a &'static str but that field isn't meant to be matched on and using a String there simplifies the implementation as far as I can see. But I do agree that it's a bit odd to see three different character representations in this small API (the third one being Dead(Option<char>)). With all that considered, I don't really have a preference for one or the other, so to satisfy the need to minimize heap allocations I changed the text to be &'static str.
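
The difference in ergonomics can be sketched side by side. Both enums below are trimmed stand-ins with hypothetical names; the point is only that a string literal works as a pattern for `&str` but not for `String`, where a guard is needed.

```rust
/// Stand-in for a key enum that borrows its text.
#[derive(Clone, Copy)]
enum BorrowedKey<'a> {
    Character(&'a str),
    Enter,
}

/// Stand-in for a key enum that owns its text.
enum OwnedKey {
    Character(String),
    Enter,
}

/// A string literal can be used directly as a pattern for `&str`.
fn is_a_borrowed(key: BorrowedKey<'_>) -> bool {
    matches!(key, BorrowedKey::Character("a"))
}

/// With `String`, the literal pattern is rejected, so a guard is required.
fn is_a_owned(key: &OwnedKey) -> bool {
    match key {
        OwnedKey::Character(s) if s.as_str() == "a" => true,
        _ => false,
    }
}
```

The guard version is not much longer for one binding, but it does not compose as nicely once several keys are matched in one `match`.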

The key in the base layer could still be a dead key, although not necessarily the one actively modifying the next keypress.

The US international layout is almost identical to the US layout; one difference is that while ' (apostrophe) is a dead key on the US international layout it's a regular character key on the US one. If I use both layouts and I have a shortcut key combination including that character, it's best if the action gets triggered on both layouts. Of course the applications could add a branch for this manually but I don't think they should have to do that.

reset_dead_keys

Alright I updated the comment in the proposal, reflecting your suggestion.


Saying things are fine without actually having tested it seems very dishonest, especially when there's really not much of a reason why it would be necessary to use an allocated string here.

I'm trying to be as honest and objective as I can. I do make mistakes however. I changed the text field so that it is &'static str.

Also why is it not possible to just get the scancode? What if I don't want a KeyCode or NativeKeyCode?

We've been trying to focus on a minimal viable product and I don't remember this being raised earlier. Such a function can of course be added fairly easily so I added this as KeyCodeExtScancode::to_scancode to the proposal.

Why is text_with_modifiers a method that does the same thing as the text variable which isn't a method, just adding in another modifier?

I don't know of a reasonably simple way to deduce what the control character is, given the text field. For example on a Russian layout Ctrl+Ф has the same location as Ctrl+A on US layout and the two of them have the same effect in most programs on Windows. For example in CMD with the "Enable Ctrl key shortcuts" turned off, both of these produce the ^A character (character code 1).

For example on a Russian layout Ctrl+Ф has the same location as Ctrl+A on US layout and the two of them have the same effect in most programs on Windows. For example in CMD with the "Enable Ctrl key shortcuts" turned off, both of these produce the ^A character (character code 1).

That sounds very wrong to me and I certainly wouldn't use cmd as a good example for anything at all. Though unfortunately I don't have access to any russians at the moment.

This API is restrictive: it should return Option<&String> (to allow simply returning None) or Option<&str> or just Option<String>

Good point. Option<&str> seems best out of these.
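
For reference, returning `Option<&str>` from a stored `Option<String>` needs no allocation. In this sketch `KeyEventExtra` is a stand-in for the platform-specific struct and its field name is an assumption.

```rust
/// Stand-in for the platform-specific part of the event.
struct KeyEventExtra {
    text_with_all_modifiers: Option<String>,
}

impl KeyEventExtra {
    /// Borrows the stored text; `as_deref` turns `&Option<String>` into `Option<&str>`.
    fn text_with_all_modifiers(&self) -> Option<&str> {
        self.text_with_all_modifiers.as_deref()
    }
}
```

Callers can then match on `Some("\u{1}")` directly, and an implementation without stored text can simply return `None`.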

Moreover though, I find the KeyEvent struct messy: it's not clear cut which fields should be public vs which should be platform-specific details

I don't know what you mean by this. Only platform_specific is platform specific. The rest of the fields guarantee a reasonably platform agnostic behaviour.

the public methods in the API should have #[inline] since without it cross-crate inlining is not enabled without full LTO

Yeah, this seems very reasonable. I added it where it seemed reasonable (where the function returns a private field).

That sounds very wrong to me and I certainly wouldn't use cmd as a good example for anything at all

I'm experiencing the same effect on the default Terminal on macOS. Nevertheless we could remove text_with_modifiers and only add it when theres a specific demand for it.

I seemed to remember that someone specifically requested having a way to get the input which includes the ctrl chartarters but I wasn't sure. Now that I checked again, it was @kchibisov in this comment: https://github.com/rust-windowing/winit/issues/753#issuecomment-693280304 . Frankly I don't feel qualified at the moment to argue on either side, so I'd be interested in @kchibisov 's opinion here.

IIRC, I was the one pushing to have all but a few control characters to be relegated to text_with_modifiers's predecessor key_with_all_modifiers (or something like that). After (re-)reading @kchibisov's comment, I think merging text_with_modifiers into text should be fine. The control characters don't seem to interact with dead keys at all (on Windows), which is the only thing I could think of which _may_ have posed some sort of issue.

The following sequence where ~ is a dead key

  1. ~
  2. Ctrl+a
  3. a

should produce:

  1. ""
  2. "\u{1}"
  3. "ã"
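A minimal sketch of that behaviour, assuming a toy composer that only knows the ~ dead key. The `Composer` and `Input` types are hypothetical and only model the sequence above; they are not how any winit backend is implemented.

```rust
/// Hypothetical input for the toy composer.
pub enum Input {
    /// The dead key ~ was pressed.
    DeadTilde,
    /// A character key was pressed, possibly with Ctrl held.
    Char { c: char, ctrl: bool },
}

/// Hypothetical state: remembers whether a ~ dead key is still pending.
#[derive(Default)]
pub struct Composer {
    pending_tilde: bool,
}

impl Composer {
    /// Returns the text produced by one key press.
    pub fn feed(&mut self, input: Input) -> String {
        match input {
            Input::DeadTilde => {
                self.pending_tilde = true;
                String::new()
            }
            // Ctrl+letter yields a control character and leaves the
            // pending dead key untouched.
            Input::Char { c, ctrl: true } => {
                if c.is_ascii_alphabetic() {
                    ((c as u8 & 0x1f) as char).to_string()
                } else {
                    String::new()
                }
            }
            // A plain character consumes the pending dead key, if any.
            Input::Char { c, ctrl: false } => {
                if std::mem::take(&mut self.pending_tilde) {
                    match c {
                        'a' => "ã".to_string(),
                        'n' => "ñ".to_string(),
                        'o' => "õ".to_string(),
                        other => format!("~{}", other),
                    }
                } else {
                    c.to_string()
                }
            }
        }
    }
}
```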

I seemed to remember that someone specifically requested having a way to get the input which includes the ctrl characters but I wasn't sure. Now that I checked again, it was @kchibisov in this comment: #753 (comment) . Frankly I don't feel qualified at the moment to argue on either side, so I'd be interested in @kchibisov 's opinion here.

Yes, of course it's required. Any application dealing with just raw text input will need this, especially terminal emulators. I'm just saying that pressing a key that isn't A with control and having it do the same thing as Ctrl+A seems very wrong to me.

I think merging text_with_modifiers into text should be fine

I like this. Removing unnecessary things is always nice. There is one problem, though, which makes me a bit hesitant about merging the two. As far as I can tell, the web cannot forward control characters from control + key combinations, meaning that the text field would exhibit surprising differences between platforms. Perhaps the benefits outweigh the drawbacks; if they do, the documentation could just include something like

    /// ## Platform-specific
    /// - **Web:** There *must not* be any assumptions made whether a key
    /// pressed together with <kbd>Ctrl</kbd> would produce control characters or not.
    /// It is best to ignore this field while the `ModifiersState::CONTROL` modifier is active.
    pub text: Option<&'static str>,

I tried to make the wording such that once browsers add support for receiving the control characters, this field can be implemented identical to how it would be on the desktop and the extra note can be removed.
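On the application side, following that note could look roughly like this. It is only a sketch: `KeyEvent` here is a stand-in with just the `text` field, and `text_to_insert` and the `ctrl_active` flag are hypothetical names for however the application tracks the Ctrl modifier.

```rust
/// Stand-in for the proposed event, reduced to the field discussed here.
pub struct KeyEvent {
    pub text: Option<&'static str>,
}

/// Hypothetical helper: returns the text an input widget should insert.
/// While Ctrl is held, the field is ignored entirely, so the behaviour is
/// the same whether or not the platform produces control characters.
pub fn text_to_insert(event: &KeyEvent, ctrl_active: bool) -> Option<&'static str> {
    if ctrl_active {
        None
    } else {
        event.text
    }
}
```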

I'm just saying that pressing a key that isn't A with control and having it do the same thing as Ctrl+A seems very wrong to me.

I'm a bit confused and I don't want to ignore you, so please clarify: is this a personal opinion about those terminal emulators or are you arguing for something in relation to the API at hand?

There are two more topics I would like to discuss

Deserializing a Key

Due to the fact that it contains a static lifetime, it can only be deserialized from a static string/object. I believe this has very little utility in real world applications. We could remove the deserialize attribute, forcing applications to use either the KeyCode or a custom object representing Key in configuration files. The same applies to KeyEvent although I don't see any use for being able to (de)serialize such an object anyways.

Platform friendly line breaks in text

Following the discussion at #477 it seems very likely that most developers will expect text to contain a platform native line break, i.e. "\n" on Unix and "\r\n" on Windows. I think this would fit the current API really well because one could still easily find which key was pressed from the physical_key or the logical_key while this change in text would make text input handling easier downstream.

Platform friendly line breaks in text

If we go this route, then the separation between text and text_with_modifiers should stay.

We could remove the deserialize attribute, forcing applications to use either the KeyCode or a custom object representing Key in configuration files.

Winit should definitely be able to provide deserialization facilities for key bindings. One way or another it's going to be needed anyways, winit is the correct place to provide it. KeyCode is obviously not sufficient.

Following the discussion at #477 it seems very likely that most developers will expect text to contain a platform native line break, i.e. "\n" on Unix and "\r\n" on Windows.

I disagree with this. Nothing wrong with always sending \r, in fact it's the correct thing to do, at least on Linux. Sending \n on Linux isn't more correct, but the opposite.

[...] deserialization facilities for key bindings. One way or another it's going to be needed anyways, winit is the correct place to provide it.

I absolutely agree that this is going to be needed at some layer either way. I do think, though, that there's an important distinction between "key bindings" and a Key value. Key bindings can be a combination of Keys and I don't think that winit should provide any representation for key combinations. This is something that could be provided either by an "extension crate" or a higher level framework.

However I'm not fundamentally against allowing Key instances to be deserialized, but if we allow it, it should be done in a way that's practical for most (or all) real use cases. I don't think that only being able to deserialize from a 'static object is practical. So I can think of two solutions here. We either revert back to Character(String) or we make a pub enum GenericKey<T: AsRef<str>>. The second approach could be implemented along these lines:


use serde::{Deserialize, Serialize};

pub type Key = GenericKey<&'static str>;
pub type DeserializedKey = GenericKey<String>;

macro_rules! define_generic_key {
    {$($key_variant: ident),+} => {
        #[derive(Serialize, Deserialize)]
        pub enum GenericKey<T: AsRef<str>> {
            Character(T),
            $($key_variant),+
        }
        impl<T: AsRef<str>> GenericKey<T> {
            fn matches<Q: AsRef<str>>(&self, other: &GenericKey<Q>) -> bool {
                match self {
                    GenericKey::Character(s) => {
                        if let GenericKey::Character(o) = other {
                            s.as_ref() == o.as_ref()
                        } else {
                            false
                        }
                    }
                    $(GenericKey::$key_variant => {
                        matches!(other, GenericKey::$key_variant)
                    })+
                }
            }
        } 
    }
}
define_generic_key! {
    F1,
    F2
    // ...
}


I disagree with this. Nothing wrong with always sending \r, in fact it's the correct thing to do, at least on Linux. Sending \n on Linux isn't _more_ correct, but the opposite

Consider tying your view to something external to you. For example "sending \r is better because that's what the OS sends anyways". With that said, what is more correct for this field, in my opinion, depends on what would be the cleanest and easiest to use. The two use cases for keyboard input are handling key bindings and inserting text. The physical_key and logical_key are designed for the former, while text is designed for the latter. I think it would be better to use platform native line breaks in the text field because that minimizes the post processing needed for text input downstream.

With that said what is more correct for this field, in my opinion depends on what would be the cleanest and easiest to use.

So a unified \r it is then?

I think it would be better to use platform native line breaks in the text field because that minimizes the post processing needed for text input downstream.

What is a "text field"? Text input depends strongly on where it is used. For Alacritty at least \r would cause less work. Why would winit post-process it, just for applications to post-process it again. Unless the post-processing is always necessary, might as well delay it until you actually need to do it so you don't do it 17 times. Also pointlessly sending two characters when one is fully sufficient is again just causing useless trouble.

So a unified \r it is then?

Cheeky :D

For Alacritty at least \r would cause _less_ work. Why would winit post-process it, just for applications to post-process it again.

Hmm I didn't know that. Also I just realized that even text editors might want to use something else than the platform native one, depending on the output file or user preference. Alright let's just use \r and have it be documented. I'll update the proposal in a minute.
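For applications that do want a different line break, the downstream step stays small. A minimal sketch, assuming text arrives with a bare "\r" for Enter as discussed; `normalize_line_breaks` is a hypothetical helper name, and the caller picks the target line ending (platform native, per-file, or user preference).

```rust
/// Hypothetical downstream helper: replaces each "\r" delivered for Enter
/// with the line ending the application wants to store.
pub fn normalize_line_breaks(text: &str, line_ending: &str) -> String {
    text.replace('\r', line_ending)
}
```

A terminal emulator would skip this step and forward "\r" unchanged, while a text editor might call it with "\n" or "\r\n" depending on the file being edited.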

Regarding deserialization, what do you think about using Character(String) or enum GenericKey<T: AsRef<str>>?

Couldn't we have Key simply be a enum Key<'a> { .. }, and define KeyEvent.logical_key as a Key<'static>?

Regarding deserialization, what do you think about using Character(String) or enum GenericKey<T: AsRef<str>>?

For me personally, that's just one more reason to use char really.

For me personally, that's just one more reason to use char really.

Which isn't really an option because of the web backend.

EDIT: Never mind about that other bit.

I updated the proposal: https://github.com/rust-windowing/winit/issues/753#issuecomment-753307584

(Oops I forgot that we already had documented the Enter key translating to \r)

Yeah, using Key<'a> looks much cleaner. It's not as flexible, it cannot be used like this: fn parse_config() -> Key but I think until we get a request to change it, this will suffice.
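A rough sketch of how the Key<'a> variant could be used for configuration, with a reduced variant list. `parse_key` and `binding_matches` are hypothetical helpers and not part of the proposal; real deserialization would presumably go through serde instead. The point is that a key borrowed from a config string can be compared against the Key<'static> carried by an event without leaking anything.

```rust
/// Reduced version of the proposed enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key<'a> {
    Character(&'a str),
    F1,
    F2,
}

/// Hypothetical parser: the returned key borrows from the config string.
pub fn parse_key(s: &str) -> Key<'_> {
    match s {
        "F1" => Key::F1,
        "F2" => Key::F2,
        other => Key::Character(other),
    }
}

/// Compares the key from an event with a key parsed from configuration.
/// `Key<'static>` can be used wherever a shorter-lived `Key<'_>` is
/// expected, so no conversion is needed.
pub fn binding_matches(event_key: Key<'static>, configured: Key<'_>) -> bool {
    event_key == configured
}
```

The limitation mentioned above still applies: the parsed key cannot outlive the string it was parsed from.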

Due to the fact that it contains a static lifetime, it can only be deserialized from a static string/object.

You could always just leak the string. If excessive memory allocations are a concern, you could use a cache or interner.

You could always just leak the string. If excessive memory allocations are a concern, you could use a cache or interner.

It might then make sense to do something like this

#[derive(Deserialize, ...)]
pub enum Key {
    #[serde(deserialize_with = "deserialize_key_character")] 
    Character(&'static str),
    ...
}

and have deserialize_key_character use the HashMap of leaked strings we are going to have to use anyway to use &'static strs.

Another option would be something like the following:

pub enum Key<'a> {
    Character(&'a str),
    ...
}

impl<'a> Key<'a> {
    pub fn to_static(self) -> Key<'static> {
        match self {
            Self::Character(s) => // Use the `HashMap` of leaked strings to get a `'static str`
            k => k,
        }
    }
}

Of the two, I think I prefer the second approach since it reduces the amount of implicit leaking.

I'm not happy with either. The second would be sort-of okay, but you can't use the HashMap that winit uses internally because that's going to look like this HashMap<NativeKeyCode, &'static str> but you don't know what's the keycode (scancode) for the Key you are trying to make static. However you could use a global HashSet<&'static str> to at least avoid having duplicates among deserialized strings.

But even then what you wrote @maroider does not compile because due to k => k, the compiler requires self to have a static lifetime. So if we did this, we would either have to copy paste all variants of the enum or would have to do very similar macro magic to what I proposed with GenericKey. So I think that we should either use the GenericKey approach or we just go with Key<'a> without the to_static conversion.

We can't really use a HashMap<NativeKeyCode, &'static str> because the keyboard layout could change while the application is running. We'd have to use a HashSet<&'static str> in both cases if we want to avoid leaking more than we absolutely have to.

As for the lifetime thing: we could transmute the lifetime in the catch-all case, but a macro would indeed be better.

We can't really use a HashMap<NativeKeyCode, &'static str> because the keyboard layout could change while the application is running

I believe we have to use the map at least on Windows because if I'm not mistaken that's the only reasonable way to get the proper logical_key for key release events. (There are no WM_CHAR messages for key release). And the way you use a map correctly is that you detect if the current keyboard layout is identical to what you have in the map, and if it's not then you iterate through every possible virtual key value and query the scancode with MapVirtualKey and the character with ToUnicode. This is how Firefox does it too IIRC.
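The rebuild-on-layout-change idea can be sketched without any Windows API calls. Everything here is an assumption for illustration: `LayoutCache` is a made-up type, the layout is identified by an opaque `u64`, and the `query` closure stands in for the MapVirtualKey plus ToUnicode lookup, returning a (scancode, text) pair for a virtual key.

```rust
use std::collections::HashMap;

/// Hypothetical cache of scancode -> text for the current keyboard layout.
pub struct LayoutCache {
    layout_id: Option<u64>,
    keys: HashMap<u32, String>,
}

impl LayoutCache {
    pub fn new() -> Self {
        Self { layout_id: None, keys: HashMap::new() }
    }

    /// Looks up the text for `scancode`, rebuilding the whole map first if
    /// the active layout differs from the one the map was built for.
    pub fn lookup(
        &mut self,
        current_layout: u64,
        scancode: u32,
        query: impl Fn(u32) -> Option<(u32, String)>,
    ) -> Option<&str> {
        if self.layout_id != Some(current_layout) {
            self.keys.clear();
            // Walk every possible virtual key value and ask the platform
            // (here: the closure) for its scancode and character.
            for virtual_key in 0..=0xFF_u32 {
                if let Some((code, text)) = query(virtual_key) {
                    self.keys.insert(code, text);
                }
            }
            self.layout_id = Some(current_layout);
        }
        self.keys.get(&scancode).map(String::as_str)
    }
}
```

Since the map is consulted by scancode, key release events can be resolved the same way as presses.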

As for the lifetime thing: we could transmute the lifetime in the catch-all case

How do you know that that's sound?

I believe we have to use the map at least on Windows because if I'm not mistaken that's the only reasonable way to get the proper logical_key for key release events. (There are no WM_CHAR messages for key release).

Ah, you are of course correct here. But I think we'd also want a HashSet<&'static str>, anyway, to avoid repeatedly leaking the same strings after layout changes.
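Such a set could be as small as the following sketch. `Interner` is a hypothetical name, and a real implementation would need to decide where the set lives (a global behind a lock, or per event loop). Each distinct string is leaked exactly once, no matter how often the layout changes.

```rust
use std::collections::HashSet;

/// Hypothetical interner: hands out `&'static str` values while leaking
/// each distinct string only once.
#[derive(Default)]
pub struct Interner {
    strings: HashSet<&'static str>,
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> &'static str {
        // Reuse the previously leaked copy if there is one.
        if let Some(&existing) = self.strings.get(s) {
            return existing;
        }
        // Otherwise leak a fresh copy and remember it.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.strings.insert(leaked);
        leaked
    }
}
```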

How do you know that that's sound?

Well, it's not my favorite trick, but it should be sound since we'd only be transmuting the lifetime on the variants without a reference.

pub enum Key<'a> {
    Character(&'a str),
    Dead(Option<char>),
    F1,
    F2,
    F3,
    ...
}

impl<'a> Key<'a> {
    pub fn to_static(self) -> Key<'static> {
        match self {
            Self::Character(s) => // Use a `HashSet` of leaked strings to get a `'static str`
            // SAFETY: This should be safe since 'a only means something for the `Character` variant.
            k => unsafe { std::mem::transmute::<Key<'a>, Key<'static>>(k) },
        }
    }
}

Using a macro for the conversion would of course be better.
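Something along these lines, as a sketch only. The `define_key!` macro and the reduced variant list are made up for this example. It also leaks a fresh string on every call for brevity, where the HashSet of leaked strings would be used in practice. Because every variant gets its own match arm, neither the transmute nor the `k => k` catch-all is needed.

```rust
/// Hypothetical macro: generates the enum together with a `to_static`
/// conversion that has one explicit arm per variant.
macro_rules! define_key {
    ($($variant:ident),+ $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Key<'a> {
            Character(&'a str),
            $($variant),+
        }

        impl<'a> Key<'a> {
            pub fn to_static(self) -> Key<'static> {
                match self {
                    // Simplification: leaks unconditionally; a shared set
                    // of leaked strings would avoid duplicates.
                    Key::Character(s) => {
                        Key::Character(Box::leak(s.to_owned().into_boxed_str()))
                    }
                    // Variants without a reference are rebuilt as-is.
                    $(Key::$variant => Key::$variant),+
                }
            }
        }
    };
}

define_key!(F1, F2, F3);
```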

I've been thinking about merging text_with_all_modifiers into text. It really doesn't make much sense to provide both on platforms that support text_with_all_modifiers. But merging the two is a bit awkward due to the web not providing the control-modified-character. Maybe the best approach is to provide the text exclusively through platform specific traits:

pub trait KeyEventExtModifierSupplement {
    /// ... affected by all modifiers including Ctrl ...
    fn text(&self) -> Option<&str>;
}

pub trait KeyEventExtWeb {
    /// ... affected by all modifiers except Ctrl ...
    fn web_text(&self) -> Option<&str>;
}