Microsoft Windows [Version 10.0.16299.371]
If you look at the full character set for code page 437, there are a number of useful symbols in the "control" range (for example, the symbols for card suits), which were commonly used in text-based games from the DOS era. I was just trying to use some of these recently, and noticed that they no longer work in the new console.
I thought at first this was a problem with the Consolas font, since they do work with the old Raster fonts, but I then tried using the Consolas font in the legacy console, and was surprised to find that they work there too. So there's obviously something the old console is doing to support those characters that is missing in the v2 console.
Below you can see a simple test case comparing the v2 console with Raster fonts (left), the legacy console with the Consolas font (middle), and the v2 console with the Consolas font (right).

In case it's not obvious, the characters being echoed are entered with the Ctrl key, i.e. Ctrl+B, Ctrl+D, etc.
When code page 437 is selected (or one of the regional equivalents) I would expect the characters in the control range to display their correct symbol as defined by the code page (assuming of course that the character isn't actually functioning as a control character).
This is by-design. Legacy mode is required to run those old DOS things. The console team is trying to move away from that old stuff and get into the new such as getting UTF-8 vice UTF-16, somewhat implementing xterm, etc. due to the feedback they've gotten from the community. We don't want the old stuff so if you need it, that's what legacy mode is for.
I mean, as the person who pulled it out, it's by design at that point in time because it made my life more difficult in terms of the other changes I was trying to make to update the rest of the console code.
The way this worked was that it kept an in-memory display canvas of those symbols out of the raster font and tried to stitch them into every other font that used them.
If there were enough demand, we could theoretically re-add it somehow. Probably by attempting to remap them to a Unicode codepoint for each of those symbols behind a checkbox that lets you turn it on or off. But I'm pretty sure demand overall is low as this is the first report I've heard of this in the 3 to 4 years since I took it out. For now, I'd advise using the legacy mode to get at this behavior.
But to one extent, @DarthSpock is right. Implementing UTF-8 is currently a higher priority to us than maintaining or reimplementing support for this particular feature. If more people vocally want this than UTF-8, we can switch gears. :P But I doubt that.
There is one character that is used on e.g. Swedish keyboards that has code 21 decimal in CP437 (i.e. after chcp 437): §, and that works in the non-legacy console and doesn't have a raster font. I think you could do the same hack for all in range 0-31. In DOSBox 0.74 Ctrl+U, Ctrl+V, Ctrl+W, and Alt+21, Alt+22, Alt+23 gives §▬↨ (number-keys from numerical keyboard).
This also affects CP850 etc: "code points 1–31 and 127 (01–1Fhex and 7Fhex) may be either ASCII control characters or code page 437 graphics, depending on context." https://en.wikipedia.org/wiki/Code_page_850#Character_set
All CP437 characters have a corresponding UTF-8 character, see
https://en.wikipedia.org/wiki/Code_page_437#Characters
Alt+127 works in the non-legacy console in CP437 and CP850 and gives ⌂.
I'm also for xterm and UTF-8. WSLtty (WSL Terminal) supports entering CP437 1-31 using Alt+1 to Alt+31, and this even though UTF-8 is selected, but this might collide with acting as an xterm.
@miniksa: Based on your response, I think you may not have understood the limited conditions under which I was expecting this to work. So just to clarify:
I wasn't expecting these characters to work when utf-8 was selected as the code page - I only expected them to work if the selected code page had actually defined symbols for those code points. They don't work in the legacy console when utf-8 is selected either.
I also wasn't expecting them to work with fonts that don't have those glyphs. But Consolas certainly does, and I expect most other fonts would too, considering all those characters are included in WGL4. So I wasn't expecting font fallback to the raster fonts - I assumed it was simply a matter of mapping those code points to existing symbols in the active font, much like you'd be doing for the rest of the characters in the code page.
That said, I can understand if you're just saying this isn't a priority at the moment. I can also accept that it may be a lot more complicated than it appears to an outsider. I just wanted to be sure you hadn't misinterpreted my request.
Sorry I think I'm miscommunicating or mixing the two things here. Let me try a bulleted list of things to see if that helps.
We'd love to have time to implement everything that people are concerned with, but the team and I do not. I just want to be honest: this was removed from the new console on purpose, we believe the legacy console is a sufficient workaround for now, fixing this exists on our backlog, we know about it, we have an idea of how we'd fix it, but it is currently ranked too low relative to other features.
I hope this helps clarify it.
I must admit that considering NTVDM isn't supported on x64 so old DOS apps aren't supported anyway, and I don't see many Win32 CUI apps that would get impacted.
There is a workaround for impacted apps by using the Unicode (UTF-16) version of the console API. Apps trying to render ANSI-art should simply #define UNICODE and use proper code-point anyway as box-drawings and block-elements might not map to the current user codepage either, instead of changing the user's codepage or expecting the whole world to use CP 437.
Codebase using CP 437 internally could stay as it and simply replace their print function by a combination of MultiByteToWideChar and WriteConsoleW to easily become console codepage independent.
Outputting these characters in the v2 console with CP 437 selected.

Codepage 437 is a superset of ASCII. It also includes an alternate mapping for the legacy PC glyphs, which you can get instead of ASCII control characters by specifying the MultiByteToWideChar flag MB_USEGLYPHCHARS.
This would be a lot less painful if the Legacy Console setting wasn't global!
While looking into another issue, I came across something in the code which I think is the reason why these characters aren't displaying "correctly". In the WriteCharsLegacy method in __stream.cpp_, there is actually some code that tries to handle the conversion of control characters to their OEM glyphs (see here). The problem is that the _CharType_ returned by GetStringTypeW is a combination of bit flags - not a single value - so it can't be compared directly with C1_CNTRL. This can easily be fixed by changing that condition to a proper flag test:
if (WI_IsFlagSet(CharType, C1_CNTRL))
I'd love to provide a PR for this if nobody has any objections.
@miniksa knows best here, but that sounds like the right fix to me!
I'm only moderately terrified that it is, in fact, in WriteCharsLegacy. That function is ... one of the worst parts of conhost, and one of our biggest compatibility concerns. I'd wait for Michael to report on whether this is sane or insane, or whether that behavior is now codified and set in stone and can never be changed. :smile:
if (WI_IsFlagSet(CharType, C1_CNTRL))
In short, this is probably fine to fix.
However, I would personally feed a few characters through WriteCharsLegacy under the debugger and assert that your theory is correct first (that multiple flags coming back are what the problem is) before making the change.
I am mildly terrified, less than Dustin, because it is freaking WriteCharsLegacy which is the spawn of hell and I fear some sort of regression in it.
In long, why is it fine to fix?
For reference, this particular segment of code https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/_stream.cpp#L514-L539 appears to only be used when the codepoint is < 0x20 or == 0x7F https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/_stream.cpp#L408 and ENABLE_PROCESSED_OUTPUT is off. https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/_stream.cpp#L320
I looked back at the console v1 code and this particular section had a divergence for "Western" countries and "Far East" countries (a geopolitically-charged term, but what it was, nonetheless.)
For "Western" countries, we would unconditionally run all the characters through MultiByteToWideChar with MB_USEGLYPHCHARS without the C1_CNTRL test and move the result into the buffer.
For "Eastern" countries, we did the C1_CNTRL test and then if true, we would run through MultiByteToWideChar with MB_USEGLYPHCHARS. Otherwise, we would just move the original character into the buffer and call it a day.
Note in both of these, there is a little bit of indirection before MultiByteToWideChar is called through some other helper methods like ConvertOutputToUnicode, but that's the effective conversion point, as far as I can tell. And that's where the control characters would turn into acceptable low ASCII symbols.
When we took over the console codebase, this variation between "Western" and "Eastern" countries was especially painful because conhost.exe would choose which one it was in based on the Codepage for Non-Unicode Applications set in the Control Panel's Regional > Administrative panel and it could only be changed with a reboot. It wouldn't even change properly when you chcp to a different codepage. Heck, chcp would deny you from switching into many codepages. There was a block in place to prevent going to an "Eastern" codepage if you booted up in a "Western" codepage. There was also a block preventing you from going between "Eastern" codepages, if I recall correctly.
In modernizing, I decided a few things:
#ifdef and #elses that would sometimes surround individual lines in the function bodies. Gross.Now, after that backstory. This is where the problem comes in. It looks like the code you're pointing to that didn't check flags and instead checked direct equality... is the way that it was ALWAYS done for the "Eastern" copy of the code. So it was ALWAYS broken for the "Eastern" codepages and country variants of the OS.
I don't know why the "Eastern" copy was checking C1_CNTRL at all in the first place. There is no documentation. I presume it has to do with Shift-JIS or GB-2312 or Unified Hangul or something having a conflict < 0x20 || == 0x7F. Or alternatively, it's because someone wrote the code naively thinking it was a good idea in a hurry and never tested it. Very possible and even probable.
Presuming CJK codepages have no conflict in this range for their DBCS codepages... we could probably remove the check with GetStringTypeW entirely and always run everything through ConvertOutputToUnicode. More risky than just the flag test change... but theoretically an option as well.
Thank you for taking the time to go into this in so much detail. I appreciate the amount of work that must have gone into refactoring that code, and I do understand your concern about breaking things. I have stepped through the function in the debugger to confirm that it was working the way I expected, but I'll try and do a lot more testing before I submit the PR.
Having now checked the character type for all the characters in the control range (0 to 31, and 127), I can confirm that they all have both the C1_CNTRL and C1_DEFINED flags set. A few also have C1_BLANK and/or C1_SPACE. So none of them would have matched on a direct comparison. That said, it seems that the C1_DEFINED flag didn't always exist (I believe it was added in XP), so it's possible a direct comparison with C1_CNTRL would actually have worked in the past (in most cases at least), so maybe that's the reason why this code was originally written the way it is.
Something that came up in testing, though, was that my suggested fix would actually change the behaviour of the null character. Because a null is now successfully matched as a C1_CNTRL, it would no longer fall through to the special-case handling that previously mapped it to a space, and that caused problems elsewhere in the code (see issue #1955). So to address that I've also had to reorder the code slightly - not quite the straightforward fix I'd thought it would be!
One other thing. You mention above that this only applies when ENABLE_PROCESSED_OUTPUT mode is off, but it's actually the other way around - it only applies when ENABLE_PROCESSED_OUTPUT mode is set. That seems wrong to me (I'd expect it to work in both cases), but that's the way the legacy console behaved as well, so if that's a bug it's a separate issue. And since the majority of code would be using the default mode (where ENABLED_PROCESSED_OUTOUT is set), I don't think it's that big a deal.
Having now checked the character type for all the characters in the control range (0 to 31, and 127), I can confirm that they all have both the
C1_CNTRLandC1_DEFINEDflags set. A few also haveC1_BLANKand/orC1_SPACE. So none of them would have matched on a direct comparison. That said, it seems that theC1_DEFINEDflag didn't always exist (I believe it was added in XP), so it's possible a direct comparison withC1_CNTRLwould actually have worked in the past (in most cases at least), so maybe that's the reason why this code was originally written the way it is.Something that came up in testing, though, was that my suggested fix would actually change the behaviour of the null character. Because a null is now successfully matched as a
C1_CNTRL, it would no longer fall through to the special-case handling that previously mapped it to a space, and that caused problems elsewhere in the code (see issue #1955). So to address that I've also had to reorder the code slightly - not quite the straightforward fix I'd thought it would be!One other thing. You mention above that this only applies when
ENABLE_PROCESSED_OUTPUTmode is off, but it's actually the other way around - it only applies whenENABLE_PROCESSED_OUTPUTmode is set. That seems wrong to me (I'd expect it to work in both cases), but that's the way the legacy console behaved as well, so if that's a bug it's a separate issue. And since the majority of code would be using the default mode (whereENABLED_PROCESSED_OUTOUTis set), I don't think it's that big a deal.
Sorry, yes. I probably just mentally transposed the boolean state. And I agree. It's probably not that big of a deal.
Thank for catching and identifying the null issue as well!
:tada:This issue was addressed in #1964, which has now been successfully released as Windows Terminal Preview v0.5.2661.0.:tada:
Handy links:
Most helpful comment
Sorry I think I'm miscommunicating or mixing the two things here. Let me try a bulleted list of things to see if that helps.
We'd love to have time to implement everything that people are concerned with, but the team and I do not. I just want to be honest: this was removed from the new console on purpose, we believe the legacy console is a sufficient workaround for now, fixing this exists on our backlog, we know about it, we have an idea of how we'd fix it, but it is currently ranked too low relative to other features.
I hope this helps clarify it.