Terminal: Korean IME does not work as expected

Created on 15 Jan 2020  ·  24 Comments  ·  Source: microsoft/terminal

Environment

Windows build number: Windows 10 2004 19041.21
Windows Terminal version (if applicable): 0.8.10091.0

Any other software?

Steps to reproduce

  • Open a Windows PowerShell tab
  • Type the text $x = '한글'

image

Expected behavior

  • The resulting text should be $x = '한글'

Actual behavior

  • But the resulting text was $x = '한그ㄱ글'
Area-Input Area-TerminalControl Help Wanted Issue-Bug Priority-1 Product-Terminal Resolution-Fix-Committed v1-Scrubbed

All 24 comments

The bug breaks my daily workflow :(

Thanks for the bug report! @rkttu /@yjh0502 Does this repro in a legacy console (pwsh.exe) window? Or does this only happen in the Windows Terminal?

@zadjii-msft Currently it only happens in WT, but the legacy console's behavior is also weird: it does not display character compositions. For example, to make the character '한', the user types the keys 'ㅎ', 'ㅏ', and 'ㄴ'. Usually this sequence is displayed as 'ㅎ' -> '하' -> '한', but currently it just displays the complete character directly (e.g. '' -> '한').

FYI, I'm using the Dubeolsik (두벌식) Korean IME.
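For readers who don't type Korean, the 'ㅎ' -> '하' -> '한' sequence described above follows the Hangul syllable arithmetic defined in the Unicode Standard. The snippet below is only an illustrative sketch of that arithmetic; the `compose` helper and the table names are made up for this example and are not part of any IME or Windows Terminal API.

```python
# Hangul syllable composition, following the Unicode Standard's arithmetic.
# Helper and table names are illustrative, not from any IME or Terminal code.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
TAILS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]  # 28 finals, "" = none


def compose(lead, vowel, tail=""):
    """Combine compatibility jamo into one precomposed syllable."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    # Syllables start at U+AC00 and are laid out lead-major, then vowel, then tail.
    return chr(0xAC00 + (l * 21 + v) * 28 + t)


# What the user expects to see while pressing ㅎ, ㅏ, ㄴ in order:
print("ㅎ", compose("ㅎ", "ㅏ"), compose("ㅎ", "ㅏ", "ㄴ"))  # ㅎ 하 한
```

Each keypress replaces the character being composed rather than appending a new one, which is why the IME needs a visible in-progress composition.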

@zadjii-msft Same issue here. Windows Terminal with WSL (Debian), pwsh, powershell, and cmd does not work as expected. //'한그ㄱ글'

The legacy console with WSL (Debian), pwsh, powershell, and cmd types properly, //'한글'
but does not show character compositions.

This is another great repro gif from @juhokang in #4311

Windows build number: Microsoft Windows [Version 10.0.18362.592]
Windows Terminal version (if applicable): Windows Terminal (Preview) Version: 0.8.10091.0

e.g. Type 안녕하세요 with the keyboard

koreannotnormal

Expected behavior

shows 안녕하세요 on the terminal

Actual behavior

shows 안녀ㄴ녕하ㅎ하셍세세요요

FYI, expected behavior during composition.
ime

https://github.com/microsoft/vscode/issues/89853 Possibly similar issue happening in VS Code

@zadjii-msft Is there any progress on this issue? It seems like quite a lot of people encountered this problem.

@rkttu Nope, when there _is_ progress, someone will make sure to chime in on this thread. It's been triaged as a P1 bug for 1.0, so we won't be shipping 1.0 without a fix for this, so stay tuned. If anyone is particularly passionate about this bug, we'd be happy to review a PR. Until someone's been _assigned_ to this bug, you can be sure you won't be stepping on our toes ☺️

Possible workaround (not tested): try removing Droid Sans Fallback from the fonts list of the app if it's there.

https://github.com/microsoft/vscode/issues/89853#issuecomment-581495374

Note that this problem did not occur in version 0.7.3451.0.

Possible workaround (not tested): try removing Droid Sans Fallback from the fonts list of the app if it's there.

microsoft/vscode#89853 (comment)

@hatsunearu Unfortunately https://github.com/microsoft/vscode/issues/89853 is caused by wrong glyph rendering, and that workaround doesn't work here.

@guswns0528 discovered that this bug was first introduced by https://github.com/microsoft/terminal/commit/dfa7b4a1. That commit is itself correct, so we can't simply revert it.

Looks like this issue is caused by strange behavior of the Core text APIs. Normally, the composition completed event should only be fired when the composition of a letter is completed. Like this:

  • If you are typing ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ, the composition event should be fired twice.
    한, and 글

But here it's fired before that. Like this:

  • If you type ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ here, the composition event is fired three times.
    한, 그, and 글

It might be a bug in the Core text APIs, but the Core text API is just a wrapper around the Text Services Framework, which is a super stable framework inherited from the Windows XP era. So I need to investigate further.

I'm currently debugging this issue and I'll comment on here if I find something. Please feel free to share any information if you find something. Thanks

@simnalamburt thanks! Note that @leonMSFT is currently working in this area and has a couple pull requests out for this; perhaps you two can coordinate?

Cool! Currently I suspect that a wrong parameter to the NotifyTextChanged function is the cause of the bug. But in fact, since I first saw the Core text API today, I still don't know what's wrong. Any help or information is always welcome!

@simnalamburt thanks a lot for the investigation! I tried to see if I can repro the two vs three composition completed events issue, but I only see two composition completed events. Maybe you could provide a screenshot of the debugging logs you used to find this?

Update: I just found this out, but you're likely seeing multiple composition completed events after the first character is finished because when we call NotifyTextChanged to reset the text server's buffer, it'll fire _another_ composition completed event. However, since we have the if-statement to check if there's anything in _inputBuffer, we don't do anything on our side.

I was also taking a look at this bug earlier, and perhaps my findings could explain why you're seeing the bug you're seeing. Here's the behavior I'm observing and why I _think_ we're messing up Korean IME input:

So, going through your example key sequence, pressing ㅎㅏㄴ would result in three TextUpdated events being received, with the _inputBuffer and the _textBlock having the character 한.

Now the user presses ㄱ, and what will happen is the following:

  1. TextUpdated is received with the character 한.
  2. CompositionCompleted is received, signaling that 한 is the finished composition.
  3. As part of our CompositionCompletedHandler, we send the contents of _inputBuffer (which is 한) to the terminal and reset the _inputBuffer and _textBlock. Then we also notify the text server that they should also make their "buffer" empty as well. (This is the NotifyTextChanged call that you mentioned).

What should happen now is that we should receive a TextUpdated event with the text as 한ㄱ.
However, we're actually not getting any TextUpdated events after our CompositionCompletedHandler is finished _because_ we're telling the text server to reset their text buffer. Since their buffer is empty, it won't tell us that we should update our text to be anything.

This is why you'll run into the weird issue where you'll be pressing ㅎㅏㄴ, which works fine, and you'll see 한 on the screen, but once you make another input, like ㄱ, nothing happens. The ㄱ keypress triggers the CompositionCompleted event for the previous character 한, which tells the text server to clear their buffer. So, you will need to press ㄱ again to make ㄱ show up on the screen.

So, the core of the problem is that we need to send the IME input to the terminal when we believe composition is finished, and we naturally also need to clear our buffer whenever we send some input to the terminal. We also need to keep the text server's buffer and our _inputBuffer in sync, so whenever we clear our buffer, we tell the text server to clear theirs as well.

As a small test, I've tried commenting out the code where we're telling the text server to reset their text buffer, and lo and behold, text comes out as you would expect, without having to double-press any characters. The only problem here is that if we don't reset our text buffer, every CompositionCompleted event will cause us to send the whole _inputBuffer (which includes literally everything you've ever typed while in IME mode) to the terminal, resulting in lots of duplicate input.
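The duplicate-input side effect can be seen in a toy model. The sketch below is not the Terminal's code: the event tuples, the `replay_without_reset` function, and the assumed event order are all hypothetical, chosen only to mirror the description above, where the text server keeps reporting its whole buffer and nothing is ever cleared.

```python
# Toy model only: names and event order are assumptions for illustration,
# not the real TSFInputControl implementation.
def replay_without_reset(events):
    """Replay IME events when the text buffer is never cleared.

    events: list of ("update", full_text) or ("completed",) tuples.
    Returns the list of strings that would be sent to the terminal.
    """
    input_buffer = ""
    sent = []
    for event in events:
        if event[0] == "update":
            # The text server reports its whole buffer, not just the new part.
            input_buffer = event[1]
        elif event[0] == "completed":
            # Nothing was cleared, so the whole buffer goes out again.
            sent.append(input_buffer)
    return sent


# Assumed events for typing ㅎ ㅏ ㄴ ㄱ ㅡ ㄹ and then finishing the last syllable.
events = [
    ("update", "ㅎ"), ("update", "하"), ("update", "한"),
    ("completed",),
    ("update", "한ㄱ"), ("update", "한그"), ("update", "한글"),
    ("completed",),
]
print("".join(replay_without_reset(events)))  # 한한글
```

The first syllable is sent twice, and the repetition grows with every further syllable typed in the same IME session.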

I'm currently trying to think of a way around this, but I'm giving you a summary of my findings so maybe you can also repro and investigate further to see if I've missed something! 😄

@leonMSFT Wow thanks for your detailed explanation! Now I understand what was going on in my development environment.

Currently I'm trying to leave some unfinished characters in the text buffer instead of totally clearing it in CompositionCompletedHandler.
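One way to picture that idea is the minimal sketch below. It is an assumption-laden illustration rather than the actual patch: the `ImeBuffer` class, its method names, and the `finished_len` parameter are hypothetical, and how a real handler would learn where the finished text ends is left open.

```python
class ImeBuffer:
    """Hypothetical buffer that keeps unfinished text after a completion."""

    def __init__(self):
        self.pending = ""  # text still being composed
        self.sent = []     # chunks already forwarded to the terminal

    def update(self, text):
        # Replace the pending text with the latest composition string.
        self.pending = text

    def complete(self, finished_len):
        # Forward only the finished prefix; keep the rest for the next round.
        finished = self.pending[:finished_len]
        self.pending = self.pending[finished_len:]
        if finished:
            self.sent.append(finished)


buf = ImeBuffer()
buf.update("한ㄱ")   # ㄱ arrives while 한 is being finalized
buf.complete(1)      # 한 is finished, ㄱ is not
print(buf.sent, repr(buf.pending))  # ['한'] 'ㄱ'

buf.update("글")     # ㄱ grows into 글 over the next keypresses
buf.complete(1)
print("".join(buf.sent))  # 한글
```

Because the unfinished ㄱ stays in the buffer, it is neither lost nor sent twice.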

Please share any information or updates, and let me know if there is anything I can help with! Actually there are lots of people waiting for this issue to be resolved, since there are not many options for Korean developers on Windows. Anything you share will be helpful and the whole Korean developer community will be grateful to you! 😄

@simnalamburt Yup, not clearing the whole text buffer, but leaving unfinished characters in is key! Luckily, I think I'm close to getting the fix for this out! 🎉 I'm specifically testing out trying to type out this sequence: 안녕하세요, which was provided earlier in this thread.

koreanoutput

It _seems_ to work as expected, but before I have a PR out for this fix, I'll need to make sure I haven't messed up any other IME input modes, so hang tight! 😃

One thing I would like your help on is letting me know of other sample character sequences that might possibly break the way I'm handling Korean IME! I don't know Korean at all, so having the sequence laid out in English characters like it was above with "dkssudgktpdy" (which comes out as 안녕하세요) really helped!
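For anyone else testing without knowing Korean, the standard Dubeolsik layout maps each QWERTY key to one jamo, so a romanized test sequence can be expanded mechanically. The helper below is a small sketch for producing such sequences; `DUBEOLSIK` and `to_jamo` are names invented for this example, and it only lists the jamo per keypress without composing them into syllables.

```python
# Standard Dubeolsik (2-set) layout: QWERTY key -> jamo.
# Helper names are made up for this example.
DUBEOLSIK = dict(zip(
    "qwertyuiopasdfghjklzxcvbnm",
    "ㅂㅈㄷㄱㅅㅛㅕㅑㅐㅔㅁㄴㅇㄹㅎㅗㅓㅏㅣㅋㅌㅊㅍㅠㅜㅡ",
))
# Shifted keys that produce a different jamo.
DUBEOLSIK.update(zip("QWERTOP", "ㅃㅉㄸㄲㅆㅒㅖ"))


def to_jamo(keys):
    """Expand a QWERTY key sequence into the jamo each keypress produces."""
    out = []
    for k in keys:
        # Other shifted letters act like their lowercase key; anything
        # unmapped (space, digits, punctuation) passes through unchanged.
        out.append(DUBEOLSIK.get(k, DUBEOLSIK.get(k.lower(), k)))
    return "".join(out)


print(to_jamo("dkssudgktpdy"))  # ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ
```

Reading the jamo in order gives the keypresses behind 안녕하세요, which makes it easier to line up a debug trace with what was typed.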

Wow, that was very fast! I lost my chance to become a hero lol

Actually there are not many corner cases in the Korean IME. And your sample video (안녕하세요) showed that it perfectly handles one of the very famous corner cases in the Korean IME, called "도깨비불 현상".

If 안녕하세요 works perfectly I expect the other cases to work fine, but I'll share a few more samples with you just in case.

gksrnrdj whgdk (한국어 좋아)
To test whether aborting composition with space works fine

gksrnrdjEnterwhgdk (한국어\n좋아)
To test whether aborting composition with enter works fine

Actually there are a bunch of things to test further, like

  • Test if an alternative Korean IME like 세벌식 works fine
  • Test if switching IME in the middle of composition works fine (it's very common for Korean people)
  • etc

But these cases might be too complicated to ask you to test, so just share your patch or make a draft PR. I have a bunch of Korean and Japanese developer friends interested in this issue and they will battle test it for you!

I tried to reproduce the scenario that you described, but I'm having trouble. I typed ㅎㅏㄴㄱ and got 2 TextUpdate events instead of 0 after typing "한".

# Typed ㅎ
_compositionStartedHandler() called.
_textUpdatingHandler() called.
Text:  ㅎ
Range: [0, 0]

# Typed ㅏ
_textUpdatingHandler() called.
Text:  하
Range: [0, 1]

# Typed ㄴ
_textUpdatingHandler() called.
Text:  한
Range: [0, 1]

# Typed ㄱ
_compositionCompletedHandler() called.
_SendAndClearText() called.
inputBuffer was L"한" and cleared.
_compositionStartedHandler() called.
_textUpdatingHandler() called.
Text:  ㄱ
Range: [0, 0]
_textUpdatingHandler() called.
Text:  ㄱ
Range: [0, 0]
_compositionCompletedHandler() called.
_SendAndClearText() called.
inputBuffer was L"ㄱㄱ" and cleared.

Is there anything that I misunderstood, or did something change with https://github.com/microsoft/terminal/commit/31c9d19a72ef5fce19ab480c0a5e407064e15941#diff-7708ccd4133d008adca4935827f7ddb7?

https://github.com/simnalamburt/terminal/commit/acf74bc8ad947 this is a patch that I used for tracing.

That's really weird! I pulled your branch from your fork (and the branch called patch-4226) and tried to do the same thing you were doing and this is what I'm getting:

testingdebug1

After pressing ㄱ, as you can see, _textUpdatingHandler, _compositionCompletedHandler and _SendAndClearText are called in sequence, and another _compositionCompletedHandler is called afterwards. I'm not seeing the two extra _textUpdatingHandler calls that you're seeing though. 😒

That's strange. My issue (2 text update events) is reproduced consistently on two computers, mine and @guswns0528's PC. I wonder what the difference is.

My Windows specifications:

  • Windows Terminal version: https://github.com/simnalamburt/terminal/commit/acf74bc8ad947
  • Windows Edition: Windows 10 Pro
  • Windows Version: 1909
  • Windows OS build: 18363.657
  • Processor type: 64-bit operating system, x64-based processor
  • Windows display language: English (United States)
  • Default app language, Default input language: 한국어 (Korean)

From the details you've provided, I don't really see a difference 😒. However! I finally have a PR out, so feel free to take a look and play around with it!

Me and @guswns0528 tried https://github.com/microsoft/terminal/pull/4796 and it worked perfectly! Still don't know why the 2 TextUpdate events were fired but looks like your patch fixed it anyway. 👍

A video typing '감사합니다!' in the new Microsoft Terminal
