Terminal: Consider making ConPTY and Windows Terminal treat all ambiguous-width characters as 1 cell instead of asking the font

Created on 23 Jul 2019 · 15 Comments · Source: microsoft/terminal

> Note that the SCS escape sequence doesn't work in the Linux text console [...]

You're absolutely right here.

I also realized I was wrong about PuTTY. Up to version 0.70 (the one I tested), PuTTY didn't support line drawing in UTF-8 mode (following Markus Kuhn's recommendation that UTF-8 be stateless). You need either a legacy charset, or version 0.71 with "Window -> Translation -> Enable VT100 line drawing even in UTF-8 mode" turned on. I've now tried the latter, and it does indeed convert the underscore to a space.

So it looks like Windows Terminal and VTE are the buggy ones here. I've just filed VTE 157.

Just out of curiosity: are you aware of any application that emits this? Why would an app do so, given that the regular space is also a space? :)

> As for the choice of diamond character, I don't think the width is something that can be "fixed" in the terminal code. I believe the dimensions of an ambiguous width character are decided by the font.

I firmly disagree here. In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas by design they have no idea which font is being used. In many terminals the font can also be changed at runtime, and it's absolutely not feasible to rearrange the cells afterwards. In some other cases there is no font at all (e.g. the libvterm headless terminal emulation library, or a detached screen/tmux), or there are multiple fonts at once (a screen/tmux session attached from multiple graphical emulators).

The only way to do that is via some external agreement on the number of cells, which is typically the Unicode EastAsianWidth property, often accessed via wcwidth(). It's not perfect (it changes between Unicode versions, it has ambiguous characters, etc.), but it's still the best we have.

glibc's wcwidth() reports 1 for ambiguous-width characters, so the de facto standard is that in terminals they are narrow.
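A minimal Python sketch of that agreement, assuming the standard-library `unicodedata` East Asian Width data as a stand-in for glibc's wcwidth() tables (the two can differ between Unicode versions). The names `cell_width` and `advance_cursor` are hypothetical. The point is that the cursor position follows from the text alone, with no font involved.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate wcwidth()-style cell count for a single code point.

    Simplified sketch: control characters and emoji sequences are not handled.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks attach to the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # Wide and Fullwidth characters occupy two cells
    return 1  # Narrow, Halfwidth, Neutral and Ambiguous ("A") take one cell


def advance_cursor(col: int, text: str) -> int:
    """Return the cursor column after printing `text` starting at `col`."""
    for ch in text:
        col += cell_width(ch)
    return col


# U+25C6 BLACK DIAMOND is East Asian Ambiguous, so it counts as one cell.
print(cell_width("\u25c6"))                # 1
print(advance_cursor(0, "a\u25c6\u4e2d"))  # 4
```

A terminal that measured the diamond with the font instead could end up with two cells, and from then on its cursor would disagree with the application's.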

If the glyph is wider, the terminal has to figure out what to do. It could crop it (newer versions of Konsole, as far as I know), overflow to the right (VTE), shrink it (Kitty, I believe), etc.
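The three options can be illustrated with a hypothetical helper (the `Policy` enum and `place_glyph` are inventions for this sketch, not any terminal's actual code). Whichever policy is used, the cursor still advances by the agreed number of cells; only the drawing changes.

```python
from enum import Enum


class Policy(Enum):
    CROP = "crop"          # draw at full size, clip at the cell boundary
    OVERFLOW = "overflow"  # draw at full size, spill into the next cell
    SHRINK = "shrink"      # scale the glyph down until it fits


def place_glyph(glyph_px: float, cells: int, cell_px: float,
                policy: Policy) -> tuple[float, float]:
    """Return (scale, drawn_width_px) for a glyph allotted `cells` cells."""
    slot_px = cells * cell_px
    if glyph_px <= slot_px:
        return 1.0, glyph_px  # the glyph fits: draw it unchanged
    if policy is Policy.CROP:
        return 1.0, slot_px
    if policy is Policy.OVERFLOW:
        return 1.0, glyph_px
    return slot_px / glyph_px, slot_px


# A 16 px wide diamond glyph assigned a single 8 px cell:
for p in Policy:
    print(p.value, place_glyph(16.0, 1, 8.0, p))
```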

_Originally posted by @egmontkob in https://github.com/microsoft/terminal/issues/2049#issuecomment-513588977_

Labels: Area-Rendering, Issue-Task, Needs-Tag-Fix, Product-Conpty, Product-Terminal, Resolution-Fix-Committed

All 15 comments

From @egmontkob's note above, and from seeing how some other terminal emulators do this, it looks like this might be the correct choice. There are some affordances in certain projects for supporting "legacy" ambiguous character widths, but by and large terminals have agreed that these characters should be a single cell wide.

And for what it's worth, here's what I get when I try it:

[screenshots: before | after, two comparisons]

@DHowett-MSFT how does this play with emoji? Aren't they usually ambiguous, but actually double wide?

Nah, emoji are specifically double-width:
[screenshot]

This is a good approach. It seems to solve part of the Unicode rendering problem, which might fix Chinese/double-width character issues and quite a lot of emoji issues. But I wonder whether it only solves some of them, as Unicode 9 will soon be a headache.

VS Code and hyper.js use xterm.js as their terminal engine, and xterm.js is working on a similar Unicode handling solution here. It had a long history of relying only on a wcwidth-ish solution, and now UTS #51 is a big issue, especially the missing support for Unicode 8/9 (up to the latest, 13) and for Unicode modifiers/sequences.

Also, iTerm2, a popular terminal app on macOS, made a lot of changes years back to support Unicode.

Since Terminal/console/WSL are system apps, I hope a more mature and comprehensive solution gets discussed, proposed, reviewed and implemented, leaving room for further extension. The current Unicode support is partial and amounts to little more than bug fixes.

@DHowett-MSFT
Maybe you could add an option to run WT in an “old Far East application mode” to keep CP 932/936/949/950 compatibility:

  • Character width (counted in cells) equals the number of bytes the character uses in these code pages;
  • For CP 932, turn \ and ~ into ¥ and ‾;
  • For CP 949, turn \ into ₩.
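The width rule in the first bullet can be sketched in Python, assuming the standard library's `cp932`/`cp936`/`cp949`/`cp950` codecs are close enough to the Windows code pages for illustration; `legacy_cell_width` is a hypothetical name. Note that the ambiguous black diamond, one cell under the wcwidth() convention, comes out as two cells here.

```python
def legacy_cell_width(text: str, codepage: str = "cp932") -> int:
    """Cells occupied by `text` if width equals its byte count in a DBCS code page.

    Raises UnicodeEncodeError for characters the code page cannot represent.
    """
    return len(text.encode(codepage))


print(legacy_cell_width("A"))                # 1: single-byte ASCII
print(legacy_cell_width("\u3042"))           # 2: HIRAGANA LETTER A is double-byte in CP 932
print(legacy_cell_width("\u25c6"))           # 2: BLACK DIAMOND is double-byte in CP 932
print(legacy_cell_width("\uac00", "cp949"))  # 2: HANGUL SYLLABLE GA in CP 949
```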

No.

@DHowett-MSFT
So keep all the weird CP things in CONHOST (V1)?

Codepages have proven, almost without exception, to be an unmitigable disaster. They complicate the text buffer, they complicate the handling of DBCS characters, they provide little to no value in modern UTF-8-aware applications.

The codepage stuff will stay on the far side of ConPTY and be rendered to the terminal in nice, good, clean UTF-8. :smile:

@DHowett-MSFT Well, what I mean is that some Far East console applications may assume that a character's width follows its code page byte count, so making those characters single-width may break these applications (though... you can still throw them into ConHost V1).

Another possible issue:

  • Characters like : many fonts (AFAIK PragmataPro) make them double-width since they are “complex”.

I get that, but to quote the initial post that spawned this issue:

> I firmly disagree here. In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas they by design have no idea of the font being used.

@DHowett-MSFT
Hmmm, can we make use of OpenType tags?

  • If a text run is considered to contain only 0- or 1-cell characters, we apply hwid to it, so font makers can switch its glyphs to narrower ones.
  • If a text run is considered to contain only 0- or 2-cell characters, we apply fwid instead.

This is somewhat like how UAX #50 works: analyze runs first, then apply vert to upright runs and vrtr to rotated runs.
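The run-splitting step might look roughly like this in Python. It uses a simplified `unicodedata`-based width function (an assumption standing in for whatever width table the terminal actually uses), lets zero-width characters join the current run, and labels each run with the OpenType feature tag the proposal suggests. `feature_runs` is a hypothetical name, and handing the tags to a shaping engine is left out.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Simplified cell width: 0 for combining marks, 2 for Wide/Fullwidth, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def feature_runs(text: str) -> list[tuple[str, str]]:
    """Split `text` into runs tagged "hwid" (0/1-cell) or "fwid" (0/2-cell)."""
    runs: list[list[str]] = []  # each entry is [feature_tag, run_text]
    for ch in text:
        w = cell_width(ch)
        if w == 0 and runs:
            runs[-1][1] += ch  # zero-width characters stay in the current run
            continue
        tag = "fwid" if w == 2 else "hwid"
        if runs and runs[-1][0] == tag:
            runs[-1][1] += ch  # same width class: extend the current run
        else:
            runs.append([tag, ch])  # width class changed: start a new run
    return [(tag, run) for tag, run in runs]


print(feature_runs("ab\u4e2d\u6587c"))
# [('hwid', 'ab'), ('fwid', '中文'), ('hwid', 'c')]
```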

:tada: This issue was addressed in #2928, which has now been successfully released as Windows Terminal Preview v0.6.2951.0. :tada:
