PowerShell: Garbled text (mojibake) when VSCode reads from stdin

Created on 20 Jul 2020 · 19 Comments · Source: PowerShell/PowerShell

I'm using echo 你好 | code - but get garbled text (mojibake). The key point is that the same command works in cmd.

I reported this at https://github.com/microsoft/vscode/issues/102917 but was advised to open an issue here. There are some more tests in that issue.

[Image]

[Image]

Environment data

Name                           Value
----                           -----
PSVersion                      7.1.0-preview.5
PSEdition                      Core
GitCommitId                    7.1.0-preview.5
OS                             Microsoft Windows 10.0.19042
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
Issue-Question Resolution-Answered

All 19 comments

I suggested reporting the issue here only because it does not seem to reproduce with cmd.exe.

What does $OutputEncoding return?

Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 65001

[Image]

For PS 5.1:

[Image]

Perhaps the root of the issue is that code relies on chcp, which corresponds to [Console]::OutputEncoding in PowerShell, whereas PowerShell uses $OutputEncoding to communicate with external executables.

As a test, you could assign code page 936 to $OutputEncoding.

/cc @mklement0

Yeah, with $OutputEncoding=[System.Text.Encoding]::GetEncoding(936) both pwsh and PS 5.1 can get the correct result.

@imba-tjd Thanks for the confirmation! I do not think we can resolve the issue automatically. Can you use UTF-8 in the console?

Do you mean chcp 65001? That won't help in my case, because VSC can't read chcp correctly, but that's a separate matter.

For myself I can set $OutputEncoding in my pwsh profile but generally I think this issue should be fixed by VSC.

[Image]

/cc @TylerLeonhardt for information.

To clarify what VSCode is doing:

  • we run chcp
  • this uses the Node.js exec command, which possibly spawns cmd.exe rather than PowerShell
  • we read the output
  • we have a map of supported encodings (see below)
  • we try to find the encoding in the output
  • we then pass this encoding into iconv-lite to convert the input string using this encoding to UTF-8
const windowsTerminalEncodings = {
    '437': 'cp437', // United States
    '850': 'cp850', // Multilingual(Latin I)
    '852': 'cp852', // Slavic(Latin II)
    '855': 'cp855', // Cyrillic(Russian)
    '857': 'cp857', // Turkish
    '860': 'cp860', // Portuguese
    '861': 'cp861', // Icelandic
    '863': 'cp863', // Canadian - French
    '865': 'cp865', // Nordic
    '866': 'cp866', // Russian
    '869': 'cp869', // Modern Greek
    '936': 'cp936', // Simplified Chinese
    '1252': 'cp1252' // West European Latin
};
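
The lookup steps listed above can be sketched roughly as follows. This is only an illustration, not VSCode's actual implementation: the helper name encodingFromChcpOutput and the regular expression are assumptions, and the map is a shortened copy of the one quoted above.

```javascript
// Shortened copy of the code-page map quoted in the thread.
const knownCodePages = {
    '437': 'cp437',   // United States
    '936': 'cp936',   // Simplified Chinese
    '1252': 'cp1252', // West European Latin
};

// Hypothetical helper: take the last run of digits in chcp's output
// (e.g. "Active code page: 936") and look it up in the map.
function encodingFromChcpOutput(stdout, map = knownCodePages) {
    const match = /(\d+)\D*$/.exec(stdout.trim());
    if (!match) {
        return undefined; // no code page number in the output
    }
    return map[match[1]]; // undefined when the code page is not in the map
}

console.log(encodingFromChcpOutput('Active code page: 936'));   // cp936
console.log(encodingFromChcpOutput('Active code page: 65001')); // undefined
```

Note that 65001 (UTF-8) is not in the quoted map, so a lookup of this kind yields no encoding for it.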

I think the culprit here is, as I said, that the Node.js exec command is not using PowerShell.

Is there maybe an environment variable in PowerShell we could get the encoding from?

tl;dr

The problem is that Visual Studio Code's CLI, code, doesn't pick up in-session changes to the active OEM code page, and seemingly always uses the _system's_ OEM code page.

Additionally, a fix to the PowerShell Integrated Console may be required - see https://github.com/PowerShell/vscode-powershell/issues/2816


The _PowerShell Integrated Console_ (the special shell that comes with the PowerShell extension) sets [console]::OutputEncoding invariably to UTF-8, irrespective of what the active OEM code page is.

Note that running a _regular PowerShell session_ in an integrated VSCode terminal does _not_ exhibit this behavior - it, like regular console / Windows Terminal windows, _consistently_ uses the active OEM code page for both [console]::InputEncoding and [console]::OutputEncoding.

While PowerShell itself by default respects the active OEM code page for _receiving data from_ external programs (via [console]::OutputEncoding), it does _not_ do so for sending data _to_ external programs, because the $OutputEncoding preference variable, which controls the encoding used, is set to a _fixed_ default value in both PowerShell editions:

  • It is ASCII(!) encoding in _Windows PowerShell_ (which means that characters outside the 7-bit ASCII range are sent as literal ? chars.)

  • It is more sensibly UTF-8 in _PowerShell [Core] v6+_, but note that it would always have made more sense to let $OutputEncoding default to _whatever the active OEM code page is_.
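
To make the difference between the two defaults concrete, here is a small Node.js sketch. The function toAsciiWithReplacement is a hypothetical stand-in that imitates an ASCII encoder with a ? replacement; it is not how PowerShell is implemented.

```javascript
// Imitate an ASCII encoder with a "?" replacement fallback: every character
// outside the 7-bit ASCII range becomes a literal question mark.
function toAsciiWithReplacement(text) {
    return Array.from(text, ch => (ch.codePointAt(0) <= 0x7f ? ch : '?')).join('');
}

// Windows PowerShell default (ASCII): the non-ASCII characters are lost.
console.log(toAsciiWithReplacement('echo 你好')); // echo ??

// PowerShell [Core] default (UTF-8): each of the two characters is sent as 3 bytes.
console.log(Buffer.from('你好', 'utf8').length); // 6
```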

In the case at hand (a PowerShell [Core] session), _UTF-8_-encoded text is therefore piped to code.

That the PowerShell Integrated Console doesn't _also_ set the _input_ encoding - [console]::InputEncoding - to UTF-8 should be considered a bug, and this bug causes chcp not to recognize the intended change to 65001, the UTF-8 code page - see https://github.com/PowerShell/vscode-powershell/issues/2816

For _direct_ calls to chcp in the PowerShell Integrated Console, you can fix this problem by running [console]::InputEncoding = [console]::OutputEncoding. Afterwards, chcp reports 65001, but note that using chcp to _change_ the OEM code page doesn't work, because .NET _caches_ the [console] encodings. However, this alone is _not_ enough to fix the problem: code still only ever sees the _system_ OEM code page.

Therefore, the UTF-8-encoded text is misinterpreted as code page 936-encoded.
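
The misinterpretation can be reproduced outside VSCode with a few lines of Node.js. This is a sketch that assumes a Node.js build with full ICU, so that TextDecoder accepts the 'gbk' label (used here as a stand-in for code page 936).

```javascript
// UTF-8 bytes for the text, as PowerShell [Core] pipes them by default.
const utf8Bytes = Buffer.from('你好', 'utf8');

// What the receiver sees if it decodes those bytes as code page 936 (GBK).
// Assumption: this Node.js build includes full ICU, which provides 'gbk'.
const misread = new TextDecoder('gbk').decode(utf8Bytes);

// What it sees if it decodes them as UTF-8, matching the sender.
const correct = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(misread); // 浣犲ソ
console.log(correct); // 你好
```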

Here's a cmd.exe demonstration of the problem that shows that code is the culprit:

C:\> chcp 65001
C:\> echo €| code --verbose -
Running "chcp" to detect terminal encoding...
Reading from stdin via: C:\Users\jdoe\AppData\Local\Temp\code-stdin-xbt.txt
Marker file for --wait created: C:\Users\jdoe\AppData\Local\Temp\emsolgg
Detected raw terminal encoding: cp437

Note how cp437 (which happens to be the active OEM code page on my US-English system) is detected rather than 65001.

As I understand it, pwsh uses $OutputEncoding, which is UTF-8, for | code -, and that differs from chcp, which relates to [Console]::xxx.
As for PS 5.1, it seems the problem can't be solved under default settings, because, as you said, non-ASCII characters are sent as literal ? characters.

And the way VSC detects chcp is by starting a new session and running chcp, so it can't detect in-session changes.

"And the way VSC detects chcp is by starting a new session and run chcp, so it can't detect in-session changes."

Indeed it _doesn't_ detect it, but generally it _is_ possible for a _child_ process to correctly detect the console's active OEM code page, as evidenced by running chcp (which itself by definition runs in a child process) _directly_, after having run [console]::InputEncoding = [console]::OutputEncoding = $OutputEncoding, and getting Active code page: 65001.

So the problem must be in _how_ chcp is invoked by VSCode / Node.js: the exec() function @bpasero mentions indeed involves a call to cmd.exe, but that doesn't explain the problem either, given that _direct_ execution of cmd /c chcp again is capable of picking up an in-session change.

It seems that the child process created by Node.js' exec() is disconnected from the console associated with the parent process. I personally don't know enough about Node.js to be able to tell if there's a solution.

If there is no solution, then the best, albeit suboptimal, workaround would be to check the process name of the parent process and _infer_ the value of $OutputEncoding from the originating executable file name (powershell -> ASCII; pwsh -> UTF-8), based on the assumption that the _default_ $OutputEncoding value is in effect, which, of course, isn't necessarily true.
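
A minimal sketch of that inference, assuming the parent executable's path is already known (obtaining it is not shown). The function name guessOutputEncoding and the returned encoding labels are hypothetical.

```javascript
// Hypothetical mapping from the parent shell's executable name to the
// *default* $OutputEncoding described above. If the user has changed
// $OutputEncoding, this guess is wrong.
function guessOutputEncoding(parentExePath) {
    const name = parentExePath
        .split(/[\\/]/)          // drop any directory part
        .pop()
        .toLowerCase()
        .replace(/\.exe$/, '');  // drop the extension
    switch (name) {
        case 'powershell': return 'ascii'; // Windows PowerShell default
        case 'pwsh': return 'utf8';        // PowerShell [Core] default
        default: return undefined;         // unknown shell: no guess
    }
}

console.log(guessOutputEncoding('C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe')); // ascii
console.log(guessOutputEncoding('pwsh.exe')); // utf8
console.log(guessOutputEncoding('cmd.exe'));  // undefined
```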

Thanks, @bpasero.

I've realized that it isn't Node.js' exec() function that is the problem (as an aside: execFile() would be more efficient - no need to involve the shell).

For instance, the following works fine from cmd.exe:

C:\>chcp 65001
C:\>node -pe "require('child_process').execFileSync('chcp').toString()"
Active code page: 65001

Instead, I suspect that because Code.exe - the executable launched by the code.cmd batch file - is a Windows _GUI_-subsystem application, it is therefore disconnected from the caller's console.

Curiously, though, there's at least _some_ code in Code.exe that explicitly attaches to the caller's console, as evidenced by logging messages appearing there (asynchronously).

I see two potential solutions:

  • If technically feasible: call chcp - or, preferably, the GetConsoleOutputCP WinAPI function in-process - from the part of Code.exe that has attached to the caller's console.

  • Otherwise, set an environment variable in code.cmd that the Node.js code can later pick up; e.g.:

for /f "tokens=2 delims=:" %%v in ('chcp') do set CONSOLE_CP=%%v
set CONSOLE_CP=%CONSOLE_CP: =%
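
On the Node.js side, picking up such a variable might look like the sketch below. CONSOLE_CP is the variable name proposed above; the function name encodingFromConsoleCp and the 'utf8' / 'cp<number>' naming are assumptions made for illustration.

```javascript
// Hypothetical reader for the CONSOLE_CP variable set by the batch snippet.
// Returns an encoding name, or undefined if the variable is missing or
// does not contain a plain number.
function encodingFromConsoleCp(env) {
    const raw = (env.CONSOLE_CP || '').trim();
    if (!/^\d+$/.test(raw)) {
        return undefined;
    }
    // Assumption: 65001 is reported as 'utf8', anything else as 'cp<number>'.
    return raw === '65001' ? 'utf8' : `cp${raw}`;
}

console.log(encodingFromConsoleCp({ CONSOLE_CP: '65001' })); // utf8
console.log(encodingFromConsoleCp({ CONSOLE_CP: ' 936' }));  // cp936
console.log(encodingFromConsoleCp({}));                      // undefined
```

In real use, process.env would be passed instead of an object literal.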

Let me summarize (on the assumption that my analysis is correct):

  • There is a problem with Visual Studio Code (code) itself (on Windows): irrespective of what shell is used, it currently cannot detect the caller's _active_ OEM code page, i.e., an in-session switch to a different code page.

  • If the fix still requires a call to the chcp utility (as opposed to WinAPI function GetConsoleOutputCP()), then in order for the PowerShell Integrated Console to work properly, _it_ must fix https://github.com/PowerShell/vscode-powershell/issues/2816.

@mklement0 thanks a lot for the analysis, I will go ahead and update https://github.com/microsoft/vscode/issues/102917 with your findings.

I would prefer that our code not rely on calling native Windows C functions, because of the overhead of pulling in native modules via Node.js. Ideally Node.js would provide this, but until then I would like to keep using chcp.

I think the workaround as a user is to set VSCODE_CLI_ENCODING in the shell that pipes into VSCode. We support this environment variable as a way to explicitly set the encoding to pick.
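
The precedence this implies can be sketched as follows. It is not VSCode's actual code: the function name resolveStdinEncoding, the detect callback, and the 'utf8' fallback are assumptions; only the variable name VSCODE_CLI_ENCODING comes from the comment above.

```javascript
// Hypothetical resolution order: an explicit VSCODE_CLI_ENCODING wins;
// otherwise fall back to whatever the detection callback reports.
function resolveStdinEncoding(env, detect) {
    const explicit = (env.VSCODE_CLI_ENCODING || '').trim();
    if (explicit) {
        return explicit; // user override: detection is skipped entirely
    }
    return detect() || 'utf8'; // assumption: default to UTF-8 if detection fails
}

// With the override set, a wrongly detected code page no longer matters.
console.log(resolveStdinEncoding({ VSCODE_CLI_ENCODING: 'utf8' }, () => 'cp936')); // utf8
console.log(resolveStdinEncoding({}, () => 'cp936'));                              // cp936
```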

My concern is that even if VSC could get the correct active OEM code page, my initial question would still not be solved, because pwsh uses $OutputEncoding, which by default is not equal to chcp's value (I'm using a normal console, not the one provided by the PS extension).

@imba-tjd, if you're running PowerShell as a regular shell in the integrated terminal, you can put the following into $PROFILE to make your session fully UTF-8 (would also work for the PowerShell Integrated Console):

# Make the console (terminal) use UTF-8 in all aspects.
[console]::InputEncoding = [console]::OutputEncoding = $OutputEncoding = [System.Text.Utf8Encoding]::new()
# Tell VSCode what stdin encoding to use.
$env:VSCODE_CLI_ENCODING='utf8'

This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.
