Terminal: Narrow emoji >=U+10000 make WriteCharsLegacy sad ("wrong character insertion when scrolling to bash history")

Created on 18 Jun 2020  ·  12Comments  ·  Source: microsoft/terminal

_Notes from https://github.com/microsoft/terminal/issues/6555#issuecomment-646206984_

[0x4]   OpenConsoleFullPDB!IsGlyphFullWidth + 0x9   
[0x5]   OpenConsoleFullPDB!WriteCharsLegacy + 0x2a3                  <<<<<<<<<<<<<<<<<
[0x6]   OpenConsoleFullPDB!WriteBuffer::_DefaultStringCase + 0xaf   
[0x7]   OpenConsoleFullPDB!WriteBuffer::PrintString + 0x16   
[0x8]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::AdaptDispatch::PrintString + 0x61   
[0x9]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::OutputStateMachineEngine::ActionPrintString + 0x42   
[0xa]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::StateMachine::ProcessString + 0x197   

We're calling IsGlyphFullWidth once for each half of the surrogate pair. You can well imagine how nicely that works out.


Environment

Windows build number: Win32NT 10.0.19041.0 Microsoft Windows NT 10.0.19041.0
Windows Terminal version (if applicable): 1.0.1402.0

Using WSL Ubuntu LTS version 20.04

Steps to reproduce

  1. Install Microsoft Terminal from Windows Store. Keep the default settings.
  2. Enable WSL and install Ubuntu 20.04 from Windows Store.
  3. Start Ubuntu 20.04 via WSL2.
  4. When bash prompt appears, run exactly this (from the first character until the end):
echo 🔥🌡  

the utf-16 equivalent (just to see the exact characters)

\u0065\u0063\u0068\u006f\u0020\ud83d\udd25\ud83c\udf21\u0020\u0020
  1. Press enter to see the result.
  2. Press key UP
  3. Press key DOWN
  4. Go to step 6.

Expected behavior

switch from

echo 🔥🌡  

to


THIS PROBLEM CANNOT BE REPRODUCED ON OTHER TERMINAL EMULATORS

Actual behavior

character 'e' inserted at the beginning

Area-Output Issue-Bug Priority-2 Product-Conhost

Most helpful comment

@francogp this isn't quite the same issue; that glyph is specified as being "narrow" (only taking up one cell), but it's also considered an emoji. We could render it better, but that's more likely #1472.

All 12 comments

Ho boy, is the thermometer supposed to be wide, and we're treating it as narrow?

This problem repros in conhost as well as Terminal, so it's not necessarily a conpty miscommunication.

I think it's the other way around - it's supposed to be narrow but we're treating it as wide. This could perhaps be because conhost uses different width calculations to WT. My understanding was that this was by design, for backwards compatibility reasons (see issue #4604). But that might result in the conhost cursor being in a different position to whether WT expects it to be. Perhaps we need a way to tell conhost to use the standard width calculations when in a conpty session (assuming that is the problem).

I think this is more likely a width disagreement between _bash and conhost_, actually, coupled with a somewhat “recent” regression where BS can move halfway into a double-width cell.

I think this is more likely a width disagreement between _bash and conhost_

Yeah, but that's to be expected if conhost doesn't use a standard width measurement. You can get the same result with the U+FFFD character from my original bug report - you just need to be using a font that renders it as double-width (e.g. MS Gothic).

widechar

So unless conpty is triggering different behaviour, conhost's view of character widths is assumedly going to be dependent on whatever font is used by default, which is not necessarily in sync with bash or the Windows Terminal renderer.

Here's another example of the problem that doesn't involve bash. Try executing these two statements:

printf ".X\e[AX\n\n"
printf "🌡X\e[AX\n\n"

In both cases WT renders the first X in the second column, because it thinks both the . and the thermometer are single cell width. But in the case of the thermometer, conhost then "corrects" the cursor position when it is moved, so the second X ends up in the fourth column instead of the third.

image

Huh, we did work to make sure conpty and WT treat ambiguous width glyphs as narrow _always_. cf #2066 for more info on that choice (which I believe is the correct one). Perhaps we regressed that and they disagree once more? Hmm

(Conpty was actually circumstantial: the ambiguous width lookup fell through to the VT renderer, which threw up its hands and said “friend you don’t have a font, so it’s narrow.”)

Ah shit.

[0x4]   OpenConsoleFullPDB!IsGlyphFullWidth + 0x9   
[0x5]   OpenConsoleFullPDB!WriteCharsLegacy + 0x2a3                  <<<<<<<<<<<<<<<<<
[0x6]   OpenConsoleFullPDB!WriteBuffer::_DefaultStringCase + 0xaf   
[0x7]   OpenConsoleFullPDB!WriteBuffer::PrintString + 0x16   
[0x8]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::AdaptDispatch::PrintString + 0x61   
[0x9]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::OutputStateMachineEngine::ActionPrintString + 0x42   
[0xa]   OpenConsoleFullPDB!Microsoft::Console::VirtualTerminal::StateMachine::ProcessString + 0x197   

We're calling IsGlyphFullWidth once for each half of the surrogate pair. You can well imagine how nicely that works out.

I was just going to say I thought it might have something to do with surrogate pairs. The fix I was proposing in #4604 didn't help in this case (although it does fix U+FFFD).

_how did we ever expect this to work 😦_

I'm going to make this the master issue for "narrow emoji that are also surrogate pairs make life very sad"

@francogp this isn't quite the same issue; that glyph is specified as being "narrow" (only taking up one cell), but it's also considered an emoji. We could render it better, but that's more likely #1472.

Was this page helpful?
0 / 5 - 0 ratings