The Microsoft Terminal does not seem to correctly handle dead-keys for a custom keyboard layout made with Microsoft Keyboard Layout Creator.
In fact, it does not correctly handle dead-keys which have a mapping for themselves in the keyboard layout.
|Platform|ServicePack|Version|VersionString|
|---|---|---|---|
|Win32NT||10.0.19018.0|Microsoft Windows NT 10.0.19018.0|
Additionnal software:
This issue happens with many dead-keys. For the purpose of this discussion, let us focus on one particular dead-key.
On a French AZERTY keyboard, the AltGr+2 combination is a dead-key used to input the ~ (TILDE, U+007E) diacritical mark. By itself, this sequence does not produce any output to the Terminal.
Obviously, when this sequence if followed by a supported character, for instance o, the corresponding õ character is typed into the Terminal. A spacing version of the ~ character can be typed into the Terminal by following the dead-key sequence with a Space.
However, when using a custom AZERTY-NF keyboard layout, the corresponding sequence does not work.
In order to reproduce this issue:
~ (COMBINING TILDE, U+0303) has been typed into the Terminal.õ has been typed into the Terminal.
Alternatively, you can reproduce this issue by following this other set of steps:
File|Open Existing Keyboard… menu command, open the French AZERTY keyboard layout (named _Français_ in the list)Add an additionnal mapping for the AltGr+2 dead key like so:
é key (VK_2) and push the All… button in the corresponding dialog.ctrl-alt-<key> dead key whose value is 0x007e and push the … button.U+007e (TILDE) to the composite U+0303 (COMBINING TILDE) as illustrated by the following screenshot.
Project|Build DLL and Setup Package menu command, compile the resulting layout. It should be generated in your %USERPROFILE%\Documents\layout01 folder.Install the custom layout by running Setup.exe.
~ (COMBINING TILDE, U+0303) has been typed into the Terminal.õ has been typed into the Terminal.Similarly to what happens with the regular French keyboard, the AltGr+n combination on an AZERTY-NF keyboard layout is a dead-key and should not produce any output to the Terminal.
Followed by a supported character, for instance o, this dead-key sequence corresponds to the ~ diacritical mark and should type the character õ by itself on the terminal.
When typing AltGr+n, o, the string ~õ is typed into the Terminal, whereas the string õ was expected instead.
The heart of the problem lies in the following conditions:
WM_CHAR message that ultimately calls the ToUnicodeEx function. I believe calling ToUnicodeEx as part of the WM_CHAR handler is incorrect, because the effect of this function depends on the previous internal keyboard state.
Obviously, the expected behaviour is correctly observed in the builtin CMD, PowerShell and WSL terminals on Windows.
By compiling the latest version of the source code for Microsoft Terminal, I can see that the expected behaviour is also correctly observed when running the OpenConsole.exe program. The faulty behaviour only happens in the WindowsTerminal application launched from the CascadiaPackage.
I was able to pinpoint a troublesome behaviour at this location in the source code.
There, the ToUnicodeEx^ function does not return the correct result.
The reason for that is that this function is called inside a WM_CHAR handler where the keyboard internal state already contains the current character. However, the ToUnicodeEx function changes its behaviour depending on the current keyboard state.
For instance, consider the following table, that summarizes the parameters supplied to the ToUnicodeEx function:
|Layout|Sequence|Virtual Key Code|Scan Code|States|
|---|---|---|---|---|
|French AZERTY|AltGr+2|VK_2 (0x32)|0x03|CONTROL+MENU|
|French AZERTY-NF|AltGr+n|VK _N (0x4e)|0x31|CONTROL+MENU|
When called by itself in a a simple program, the ToUnicodeEx returns the correct result for both layouts, which is:
-1 indicating a dead key."~" (TILDE, U+007E).However, when called from within the WM_CHAR handler, the keyboard state contains the current (dead key character), the ToUnicodeEx function tries to combine the character corresponding to the supplied virtual key code, scan code and key states with the current internal keyboard state for the process.
2 indicating that a two-character string is returned."~~" (TILDE, U+007E; TILDE, U+007E)In fact, the ToUnicodeEx function _combines_ the supplied character with the previous keyboard state. In the case of virtually all keyboard layouts on the planet, there is no such valid combination. Therefore, the ToUnicodeEx function emits a string where the corresponding dead key character is emitted twice. That is what happens on Windows in all applications when typing the sequence AltGr+2 twice on a French AZERTY keyboard.
Microsoft Terminal takes advantage of this fact, and when the ToUnicodeEx function returns anything other than 1 (a single character) or -1 (a dead key character), it does not produce any output.
As for the AZERTY-NF layout of the customized layout described in the section for how to reproduce this issue, the ToUnicodeEx function behaves like so:
1 indicating that a single character string is returned."~" (COMBINING TILDE, U+0303).This is consistent because the internal keyboard state already contains the dead key character and when combined with itself again, the keyboard layout is instructed to emit this particular character.
Because this results in a single character string, Microsoft Terminal writes this character to the output.
I understand that this is a freak issue but I believe virtually all other Windows terminal-like apps handle the custom keyboard layout correctly, which seem to indicate that this is a bug in Microsoft Terminal only.
@lhecker, this might be vaguely related to the alt gr stuff...
This is a really awesome issue description @springcomp. Thanks! 🙂
Although there's a small nit from my side: The code you linked is basically called from something like the WM_KEYDOWN event rather than WM_CHAR. 😄 (link)
But yeah, this is a bug I've already noticed to some extent - it's urgency simply hasn't been "triaged" yet.
As I've mentioned in the past apart from us not supporting surrogates etc., our usage of ToUnicodeEx doesn't seem to be quite there yet.
I'm not quite familiar with the Win32 API yet, but I'm pretty sure that the underlying issue stems from us calling ToUnicodeEx within the scope of WinRT's UIElement.KeyDown event instead of an actual WM_KEYDOWN event handler.
@miniksa I've been _really_ packed with tasks in the last few weeks, but since this issue falls into the scope of "my" work on that _CharacterFromKeyEvent method I wouldn't mind taking a stab at this next weekend. (...if no one else fixes it until then of course.)
@springcomp I'm starting to have more time for side projects like this again. 🙂
Today I briefly took a stab at it and - tl;dr - was able to reproduce the behavior you described.
The issue seems to be that you cannot reliably call ToUnicode*() from within a WinRT key event handler. It appears it simply produces unexpected/incorrect results for dead keys.
Our WinRT based project always yields a return value of 1 for ToUnicodeEx() while "old" projects like the one linked above return the expected result of -1, indicating a dead key that we should not print out.
As far as I can see the only thing we can do is to work around this issue somehow by detecting dead key combinations (but how?) or build our own userland ToUnicodeEx(). 😕
I personally have no clue which existing function would be capable of doing that apart from ToUnicodeEx(). I hope I can come up with something in the future.
Luckily this issue should not affect too many people I believe.
Observations while reproducing the behavior
The only other thing that I had to do, was to fully restart my PC after installing your keyboard layout, otherwise Windows refused to view it as a valid one. Alright let's assume I press AltGrnSpace (in that order)... Due to recent changes in the rendering pipeline this looks a bit different now:  To ensure the cause is the AltGr handling code I then slightly modified the mentioned `_CharacterFromKeyEvent` method to hack in some debug output: https://gist.github.com/lhecker/48ad2ebb32471b00a55913072216a4e9#file-terminal-cpp-L17-L37 The output for the above sequence is:
_CharacterFromKeyEvent: 1 U+0303
...while it actually should return `-1` instead, since AltGrn is a dead key & combining character. Damn. 😢
Your observation @springcomp is sligthly incorrect in that regard though as mentioned before:
Our use of `ToUnicodeEx()` is not the one combining the dead key into "\~õ".
Instead `ToUnicodeEx()` is invoked in the key event handler and incorrectly returns 1, which is why we send `U+0303` to the shell appearing to you as a "\~" character.
Now when you press e.g. O Windows still _correctly_ combines the keyboard state and sends us a character event containing "õ" which we then send to the shell.
What we need to stop doing is to send `U+0303`, but to do that we need `ToUnicodeEx()` to start returning -1 [here](https://github.com/microsoft/terminal/blob/9145903e118bfa3b48749694689aec3a191bc4ac/src/cascadia/TerminalCore/Terminal.cpp#L332).
Side note: Yeah [the check right after that](https://github.com/microsoft/terminal/blob/9145903e118bfa3b48749694689aec3a191bc4ac/src/cascadia/TerminalCore/Terminal.cpp#L335) should actually be `result == 1 ? ...` instead, otherwise we would still send the `U+0303` regardless.
Thanks for taking the time to look into this.
Your analysis is consistent with my observations and my belief is that you cannot reliably call ToUnicodeEx __because__ of the global state that is persisted to the keyboard. That is, ToUnicodeEx gives consistent results, but it is wrong to call it as part of a WM_KEYDOWN handler.
I stand by my previous analysis, and what you observe is consistent with the keyboard layout.
Let me explain :
My keyboard layout has, among other mappings, the following behaviour:
õ.COMBINING TILDE (U+0303).This is by design and a recommendation of the recent standard I'm trying to be compliant with. That is as much as possible, use _dead keys_ where possible, but sometimes, you can use _combining characters_ to produce characters with more than one diacritical.
That said, let's analyse what happens when you press a key:
The user presses AltGr+n.
The terminal receives WM_KEYDOWN.
The _CharacterFromVirtualKey gets called eventually.
Then ToUnicodeEx is called with the following arguments:
VK_N0x31CONTROL+MENUTherefore, ToUnicodeEx attempts to find the character corresponding to __the combination__ of the global state and its arguments and find the mapping described in Case b) above. The output buffer is populated with a single character COMBINING TILDE (U+0303) and returns 1.
Thus, Terminal displays this character even though the user does not expect it.
When the user then presses o, Terminal receives WM_CHAR that Windows has synthetized to produce the correct character, which, in turn gets displayed. Thus resulting in the string ~õ described in my original description .
The situation is exactly the same as with the traditional AZERTY layout.
On this layout, repeating a dead key does not produce a valid character combination, therefore, Windows outputs two TILDE (U+007E) characters.
Again, let's analyse what happens when you press a key:
The user presses AltGr+2.
The terminal receives WM_KEYDOWN.
The _CharacterFromVirtualKey gets called eventually.
Then ToUnicodeEx is called with the following arguments:
VK_20x03CONTROL+MENUToUnicodeEx attempts to combine the global keyboard state with its arguments. As explained, pressing TILDE twice is not a valid character. Therefore, the output buffer is populated with the string ~~ and ToUnicodeEx returns 2. Because the return value is neither -1 or 1, Terminal does not output any character, which is consistent with what the user expects.
Several observations:
WM_KEYDOWN, Windows would have sent a WM_DEADCHAR but Terminal does not seem to have code for handling this message.ToUnicodeEx does not seem too difficult. I would do it with a combination of VkScanKeyEx and the current modifier keys (Alt, Ctrl, Shift) as well as the various tables returned by the keyboard layout to return a reliable value that does not depend on the global keyboard state.I'll try and hack a proof of concept userland function in the coming days.
Although, I think the correct and future proof solution would rather be to
WM_KEYDOWNWM_DEADCHAR which, to my knowledge, is not currently handled.WM_CHAR which carries the correct character and therefore does not require Terminal to try and infer the character.I wanted to say thanks for taking the time.
It's true that this issue does not affect too many people... Yet 😉
Thanks for the explanation about ToUnicodeEx!
I'm not entirely clear on one part yet though:
Assuming the "Global state of the keyboard contains AltGr+n".
You're saying that if we now call ToUnicodeEx(VK_N, 0x03, CONTROL+MENU, ...) it combines the global state (AltGr+n) with the method arguments.
Aren't those arguments basically the same as the global state itself? Which mean we're always trying to collapse two identical things (global state and arguments)?
Apart from this:
We absolutely need to handle key events because a lot of interactive terminal applications have bindings on e.g. Alt + arrow keys, PageUp, PageDown etc. which you can't track with character events (obviously).
What we could try and do is to return early here if ToUnicodeEx returns non-zero for the given arguments (= it's a potential character).
Then we could add support for the keyboard state (ControlKeyStates) to the SendCharEvent method just like for SendKeyEvent.
Using this information we could then still generate the e.g. proper escape codes if Ctrl/Alt/Shift is being pressed (e.g. Alt, must still generate and send the 2 output characters \x1b, to the shell).
P.S.: I can tackle this soon-ish. As I said I'm pretty positive that I'll find free time to work on this more often from now on. 🙂
That said I'm extremely terribly afraid of breaking everyone's workflow if we follow my idea (or something similar to it) and cause another "MEGATHREAD" issue. 😄
Aren't those arguments basically the same as the global state itself? Which mean we're always trying to collapse two identical things (global state and arguments)?
Yes! That’s what I’m saying. I’m pretty positive this is what is happening.
To definitely convince yourselves, you could try and hack _CharacterFromKeyEvent and, while the user presses AltGr+n, you could feed ToUnicodeEx with, say, the following arguments:
VK_O (0x4F)0x18I’m pretty sure the output buffer would be populated with the string "õ" (a single character) and that ToUnicodeEx would return 1.
@springcomp @miniksa To test my idea I made a hacky WIPPPP modification which ignores all key events that contain characters (i.e. ToUnicodeEx() returns non-zero).
This causes the character event handler to kick instead which is (hopefully) always given a properly composited unicode character.
And yes, basic functionality seems to work! Regular AltGr combinations seem to still work as well as the AltGrn dead key of this issue!
You can find it here. (And this is the diff.)
Note: I haven't reimplemented support for surrogate sequences yet (and the code is quite hacky).
I'm fairly concerned that I might break some delicate handling though, since I need to perfectly replicate the TerminalInput::HandleKey functionality within the character handler. This is more serious than any relevant XKCD. 😱
But if we _do_ get this to work properly we're probably circumventing a whole lot of problems in the future, since character events are surely quite a bit more robust than key events, aren't they?
I love this!
Sure seems a more robust solution, if we can identify and fix for corner cases.
I'm looking forward to seeing something along these lines make it to the main repo.
I've been slacking off a bit over Christmas regarding this issue. I'm sorry for that. 'Been learning some vue.js - the framework with 'tegridy. 😄
I just wanted to let you guys know that I'm still working on it.
I believe I got my changes to be mostly functional now: "Weird" non-english languages work better than ever as far as I can see! Again, here are the changes.
The only thing missing is the proper handling of surrogates in TerminalInput::HandleKey.)
:tada:This issue was addressed in #4192, which has now been successfully released as Windows Terminal Preview v0.11.1121.0.:tada:
Handy links:
Most helpful comment
I've been slacking off a bit over Christmas regarding this issue. I'm sorry for that. 'Been learning some vue.js - the framework with 'tegridy. 😄
I just wanted to let you guys know that I'm still working on it.
I believe I got my changes to be mostly functional now: "Weird" non-english languages work better than ever as far as I can see! Again, here are the changes.
The only thing missing is the proper handling of surrogates in
TerminalInput::HandleKey.)