Windows build number: Version 10.0.18362.175
Windows Terminal version (if applicable): 0.2.1831.0
Two characters in the Special Graphics character set are affected:
- The _ character, which should map to a blank.
- The ` character, which should map to a diamond.
The _ character doesn't map to a blank, and the ` character maps to a double-width diamond, which causes subsequent characters to be out of alignment.
Here's an example printf statement that will demonstrate the issue:
printf '\e(0[_]\n[`]\n\e(B'
For reference, the Special Graphics character set is documented here:
https://vt100.net/docs/vt220-rm/table2-4.html
I'd expect to see output looking something like this:
[ ]
[♦]

I know this is a minor nit, but it should be an easy fix, and I'd be happy to provide a PR for it. It's just a matter of replacing two characters in the translation table of the TerminalOutput class, and updating the lower bound of the translation range in the TerminalOutput::TranslateKey method.
For the diamond I'd recommend unicode character U+2666 (Black Diamond Suit). Not only does it render as a single width character, but it has better font support, so will typically still work without font fallback.
Also, the pedant in me would just love to tidy up the case of the hex values in that table so they're all consistently lowercase (or whatever you prefer), and fix some of the comments that are incorrect. Assuming it's appropriate to include cleanup like that in the same PR.
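The fix described above (two table entries plus a wider translation range) can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual TerminalOutput code: the names `FIRST_TRANSLATED`, `TRANSLATION_TABLE`, `translate_char` and `translate` are made up for the example, and the table holds only the two entries under discussion rather than the full set, which runs up to 0x7e.

```python
# Hypothetical sketch of a range-based translation table; the names and
# layout are assumptions and do not mirror the real TerminalOutput class.

# The lower bound has to include '_' (0x5f) for the blank to be translated.
FIRST_TRANSLATED = 0x5F

# Indexed by (character code - FIRST_TRANSLATED). Only the two entries
# discussed in this issue are listed here.
TRANSLATION_TABLE = [
    "\u0020",  # 0x5f '_' -> blank, rendered as a space
    "\u2666",  # 0x60 '`' -> diamond, U+2666 Black Diamond Suit
]


def translate_char(ch):
    """Return the mapped glyph, or ch unchanged if it is outside the table."""
    index = ord(ch) - FIRST_TRANSLATED
    if 0 <= index < len(TRANSLATION_TABLE):
        return TRANSLATION_TABLE[index]
    return ch


def translate(text):
    """Translate every character of text through the table."""
    return "".join(translate_char(ch) for ch in text)
```

With this sketch, `translate("[_]")` gives `[ ]` and ``translate("[`]")`` gives `[♦]`, matching the expected output shown earlier.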
Space/underscore: It's weird how inconsistent other terminals are. E.g. Konsole, st, urxvt, Terminology display a space, whereas VTE, ~PuTTY, and the Linux text console~ (EDIT: see correction below) display an underscore. Xterm is truly interesting (ahem, buggy): it displays a dotted rectangle (glyph missing) for me, but when I copy-paste it, it turns into a space (and is then displayed and copy-pasted as such). The page you linked says "(blank)" whereas for 0x20 it says "SP", so it's not clear to me that "(blank)" necessarily means space rather than "unspecified" or something similar. Does anyone have access to a DEC VTxxx to actually try it out? :)
Diamond: In the emulators I've checked it copy-pastes as U+25C6. Another possibility is to double check why it occupies two cells rather than one. U+25C6 is ambiguous width (as opposed to U+2666, which is neutral width), but it's not the only one in this set, e.g. so is U+2500 (which 0x71 is mapped to), and ambiguous should normally mean narrow.
Note that the SCS escape sequence doesn't work in the Linux text console, so if you want to test the Line Drawing characters, you have to rely on the SO control character to select the G1 character set (which defaults to the Line Drawing set). An equivalent test case for the Linux text console would be:
printf '\016[_]\n[`]\n\017'
And that produces a space (technically A0) for the first character, and probably a backtick for the second character (internally it's U+25C6, but it displays the premapped character if the font can't render that glyph).
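For anyone scripting these checks, the two test cases above differ only in how the Special Graphics set is selected. A minimal sketch that builds both sequences is shown below; the function names `scs_test` and `so_si_test` are made up for this example.

```python
ESC = "\x1b"
SO = "\x0e"  # octal \016, Shift Out: invoke the G1 character set
SI = "\x0f"  # octal \017, Shift In: return to the G0 character set

# The two characters under test, each wrapped in brackets on its own line.
BODY = "[_]\n[`]\n"


def scs_test():
    """SCS variant: ESC ( 0 designates Special Graphics as G0, ESC ( B restores ASCII."""
    return ESC + "(0" + BODY + ESC + "(B"


def so_si_test():
    """SO/SI variant for the Linux text console, where G1 defaults to line drawing."""
    return SO + BODY + SI
```

Writing `scs_test()` or `so_si_test()` to the terminal with `sys.stdout.write` should behave the same as the corresponding printf command.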
As for XTerm, it used to leave 5F as an underscore, but that was fixed in patch #338, with the note "improve display and checksum for DEC Special Graphics by mapping 0x5f to 0". I'm not sure what the intent of the 0 is, but that's not the equivalent of the null control character - it is actually writing something out. And it does render as a space if you have the _Line-Drawing Characters_ option enabled on the _VT Fonts_ menu.
The bottom line is that VTE and PuTTY are the only terminals (at least in your list) that leave 5F as an underscore. I can accept the argument that "blank" might mean some other kind of whitespace, like A0 or maybe a printable null, but given the dictionary definition, and the fact that DEC manuals are known to use the terms space and blank interchangeably, it's hard to believe that the intent was to leave that character as an unmapped underscore.
As for the choice of diamond character, I don't think the width is something that can be "fixed" in the terminal code. I believe the dimensions of an ambiguous width character are decided by the font. So if we want this to work, I think we would have to choose a glyph that is narrow in the majority of console fonts (and U+2666 is the best fit for those requirements). But maybe someone from MS will correct me on this - I'm not positive.
> Note that the SCS escape sequence doesn't work in the Linux text console [...]
You're absolutely right here.
I also realized I was wrong about PuTTY. Up to version 0.70 (which I tested) PuTTY didn't support line drawing in UTF-8 (as per Markus Kuhn's recommendation for UTF-8 being stateless). You either have to have a legacy charset, or version 0.71 with "Window -> Translation -> Enable VT100 line drawing even in UTF-8 mode". I now tried the latter, and it indeed converts the underscore to a space.
So it looks like Windows Terminal and VTE are the buggy ones here. I've just filed VTE 157.
Just out of curiosity: Are you aware of any application which emits this? Why would any app do so, given that the regular space is also a space? :)
> As for the choice of diamond character, I don't think the width is something that can be "fixed" in the terminal code. I believe the dimensions of an ambiguous width character are decided by the font.
I firmly disagree here. In terminal emulation, apps have to be able to print something and keep track of the cursor, whereas they by design have no idea of the font being used. In many terminals the font can also be changed at runtime, and it's absolutely not feasible to then rearrange the cells. In some other cases there is no font at all (e.g. the libvterm headless terminal emulation library, or a detached screen/tmux), or there are multiple fonts at once (a screen/tmux attached from multiple graphical emulators).
The only way to do that is via some external agreement on the number of cells, which is typically the Unicode EastAsianWidth, often accessed via wcwidth(). It's not perfect (changes through Unicode versions, has ambiguous characters, etc.) but is still the best we have.
glibc's wcwidth() reports 1 for ambiguous width characters, so the de facto standard is that in terminals they are narrow.
If the glyph is wider then the terminal has to figure out what to do. It could crop it (newer versions of Konsole, as far as I know), overflow to the right (VTE), shrink it (Kitty I believe does this), etc.
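As a rough illustration of that external agreement, the sketch below derives a cell count from the Unicode East Asian Width property using Python's standard `unicodedata` module, treating ambiguous characters as narrow. The helper name `cell_width` is hypothetical, and the function is a deliberate simplification: unlike a real wcwidth() it ignores combining marks, control characters and other zero-width cases.

```python
import unicodedata


def cell_width(ch):
    """Cells occupied by one character, treating ambiguous width as narrow.

    Simplified sketch: zero-width and control characters are not handled.
    """
    # 'W' (wide) and 'F' (fullwidth) take two cells; everything else,
    # including 'A' (ambiguous) and 'N' (neutral), is counted as one cell.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1
```

Under this rule both diamond candidates occupy a single cell: U+25C6 is classified as ambiguous and U+2666 as neutral, so neither should shift the characters that follow.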
~I'm also wondering, given the looks of the glyphs shown by embedded PNGs in the table you linked to, whether y and z should translate to the slanted ⩽ 2A7D and ⩾ 2A7E, rather than ≤ 2264 and ≥ 2265. This question goes to all the terminal emulators out there :-) And again, to someone who has access to a hardware DEC terminal to check its behavior.~
> As for XTerm, it used to leave 5F as an underscore, but that was fixed in patch #338 [...]
I was using 344, that is, a "fixed" version. If I enable VT Fonts -> Line-Drawing Characters, I indeed get a space straight away. If I leave it disabled, the dotted rectangle that converts to a space on mouse selection can't reasonably be defended; it must be a bug.
You're right that an incoming 0 character is a no-op, so this means that this line-drawing mode would be the only way to create new cells of the 0 character, which sounds like a pretty bad design to me (in the world of UTF-8, line-drawing mode should die).
VTE internally uses the 0 character to denote the "erased" state (as per ECMA-48; I'm not familiar with DEC docs to tell whether they also use this terminology), this erased state means for us not to copy-paste it if it's at the end of a line. I don't know if other terminals (or Xterm in particular) also do so, or have other means of separating the 0 character from the erased state.
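To make the copy-paste consequence concrete, here is a small sketch of the behaviour described above. It is not VTE's actual data structure: the `ERASED` sentinel, the list-of-characters row and the `copy_line` helper are all assumptions made for illustration, as is the choice to copy erased cells in the middle of a line as spaces.

```python
# Hypothetical model: a row is a list of single characters, and the
# 0 character marks a cell in the "erased" state.
ERASED = "\0"


def copy_line(cells):
    """Return the text copied from one row, dropping trailing erased cells."""
    end = len(cells)
    while end > 0 and cells[end - 1] == ERASED:
        end -= 1
    # Assumption: erased cells before the last written cell copy as spaces.
    return "".join(" " if cell == ERASED else cell for cell in cells[:end])
```

In this model a cell written through the line-drawing mapping as a 0 character would be indistinguishable from an erased one, whereas a cell holding a real space is kept even at the end of the line.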
As you also stated in your original report, we're nitpicking on extreme corner cases.
vt100.net has dumped font images: https://vt100.net/dec/vt220/glyphs
> You're right that an incoming 0 character is a no-op,
IIRC xterm uses 0 for cells that have not been written normally. It's an "unset" sentinel in a way.
> vt100.net has dumped font images
Thanks. These aren't slanted, so the currently used glyphs are the best choice.
> IIRC xterm uses 0 for cells that have not been written normally. It's a "unset" sentinel in a way.
This sounds the same as the "erased" concept of ECMA-48.
> The only way to do that is via some external agreement on the number of cells, which is typically the Unicode EastAsianWidth, often accessed via wcwidth(). It's not perfect (changes through Unicode versions, has ambiguous characters, etc.) but is still the best we have.
> glibc's wcwidth() reports 1 for ambiguous width characters, so the de facto standard is that in terminals they are narrow.
> If the glyph is wider then the terminal has to figure out what to do. It could crop it (newer versions of Konsole, as far as I know), overflow to the right (VTE), shrink it (Kitty I believe does this), etc.
That seems like a reasonable approach to me, and it's probably worth raising a separate issue to address that. However, even if that capability were in place, I'd still consider a cropped or an overflowing diamond inferior to one that just fits by default. One that was shrunk to fit would probably be acceptable, but I don't see that happening any time soon.
So I'd just like to get a quick and easy fix in place for now, and if there's really a demand for that diamond being U+25C6, it can always be changed back in the future if/when ambiguous width characters are handled better. But in the meantime, a glyph that doesn't break the layout is surely an improvement.
Since it looks like we may well be getting ambiguous-width characters interpreted as a width of 1, I thought I'd try enabling that myself and seeing what the U+25C6 diamond looks like, and it's not nearly as bad as I expected. It looks to me like it's being shrunk to fit, so there's no cropping or overflowing.
Still, when compared with the U+2666 glyph, the latter is undoubtedly a better fit (at least on Windows), and more closely matches the original VT100 design. Combined with the fact that it also works without font fallback, I still think it's the preferable choice.
@j4james Yeah, I think I'm fine with you opening a PR for this. I'm definitely okay with the (blank) -> space change. I'm conflicted on the diamond: U+2666 does seem like a more correct interpretation, and would work better until #2066 is merged. On the other hand, if other terminal emulators are using U+25C6, then maybe we should just stick with that one too. Still, we could always just leave it as U+2666 with a comment, and if there's someone who _really_ cares, changing it back would be trivial.
:tada:This issue was addressed in #2081, which has now been successfully released as Windows Terminal Preview v0.3.2142.0.:tada: