PowerShell: [Console]::OutputEncoding doesn't work when parsing an exe's Unicode output

Created on 14 Oct 2019 · 19 Comments · Source: PowerShell/PowerShell

Steps to reproduce

I have a Windows executable that produces Unicode (UTF-16) output. In PowerShell 5.1, I can set the [Console]::OutputEncoding property so that the command's output is interpreted correctly. On PowerShell Core 6.2.3, that doesn't appear to work.

I've also tried setting [Console]::InputEncoding and $OutputEncoding, but the problem persists.

For example, I use the wsl.exe binary here, so this should repro on any system that has the Windows Subsystem for Linux installed.

[Console]::OutputEncoding = [System.Text.Encoding]::Unicode
wsl.exe --list -v | ForEach-Object { $_ }

Expected behavior

PS C:\Users\svgroot> [Console]::OutputEncoding = [System.Text.Encoding]::Unicode
PS C:\Users\svgroot> wsl.exe --list -v | ForEach-Object { $_ }
  NAME            STATE           VERSION
* Ubuntu          Stopped         2
  Ubuntu-18.04    Stopped         2
  Alpine          Stopped         1

Actual behavior

PS C:\Users\svgroot> [Console]::OutputEncoding = [System.Text.Encoding]::Unicode
PS C:\Users\svgroot> wsl.exe --list -v | ForEach-Object { $_ }
    N A M E                         S T A T E                       V E R S I O N

 *   U b u n t u                     S t o p p e d                   2

     U b u n t u - 1 8 . 0 4         S t o p p e d                   2

     A l p i n e                     S t o p p e d                   1
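The spaced-out text is what UTF-16LE looks like when its bytes are decoded with a byte-oriented decoder such as UTF-8: every ASCII character is followed by a 0x00 byte, which becomes a NUL character (U+0000) that the console renders as a blank. The following language-neutral Python sketch reproduces the effect; the sample line is made up for illustration and is not captured wsl.exe output.

```python
# Simulate a child process that writes UTF-16LE (what .NET calls Encoding.Unicode).
raw = "  NAME  STATE\r\n".encode("utf-16-le")

# Decoding those bytes as UTF-8 keeps every 0x00 byte as a NUL character.
wrong = raw.decode("utf-8")

# Decoding with the encoding the producer actually used gives the intended text.
right = raw.decode("utf-16-le")

print(repr(wrong[:6]))  # ' \x00 \x00N\x00'
print(repr(right))      # '  NAME  STATE\r\n'
```

Stripping the NULs from the mis-decoded string happens to recover this sample, but only because it is pure ASCII; non-ASCII characters would be mangled, so the real fix is decoding with the right encoding in the first place.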

Environment data

Name                           Value
----                           -----
PSVersion                      6.2.3
PSEdition                      Core
GitCommitId                    6.2.3
OS                             Microsoft Windows 10.0.19001
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Labels: Issue-Question, Resolution-Fixed



That's not good - the bug is still present as of PowerShell Core 7.0.0-preview.4.

Here's a repro that doesn't require WSL:

[Console]::OutputEncoding = [text.encoding]::unicode; sfc /? | Write-Output

As a Pester test:

[Console]::OutputEncoding = [text.encoding]::unicode; sfc /? | Write-Output | Should -Not -Match "`0"
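The Pester assertion works because mis-decoded UTF-16 output contains NUL characters. A rough Python equivalent of that check is sketched below; `looks_misdecoded` is a hypothetical helper, the sample text is invented rather than real sfc.exe output, and the heuristic is only reliable for UTF-16 text that is mostly ASCII. It also shows that decoding with an ANSI code page such as cp1252 fails the same way as UTF-8.

```python
def looks_misdecoded(line: str) -> bool:
    # UTF-16 text decoded with a byte-oriented decoder (UTF-8, an ANSI code
    # page, ...) leaves NUL (U+0000) characters behind; correctly decoded
    # console text almost never contains them.
    return "\x00" in line

sample = "Usage: example".encode("utf-16-le")  # hypothetical tool output

print(looks_misdecoded(sample.decode("cp1252")))     # True
print(looks_misdecoded(sample.decode("utf-16-le")))  # False
```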

Has this ever worked in PowerShell Core, or has it been broken ever since 6.0.0?

@vexx32: It's also broken in 6.0.0.

This is due to a breaking change in .NET Core. You should initialize ProcessStartInfo.StandardInputEncoding/StandardErrorEncoding/StandardOutputEncoding if they're redirected. .NET Framework defaults to using Console.OutputEncoding if you don't initialize StandardOutputEncoding, but .NET Core defaults to calling Process.GetEncoding((int)Interop.Kernel32.GetConsoleOutputCP()), which is UTF-8 (on my system).

This is the code that creates ProcessStartInfo:

https://github.com/PowerShell/PowerShell/blob/master/src/System.Management.Automation/engine/NativeCommandProcessor.cs#L1088-L1150
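The principle here, stating the decoding explicitly for redirected output instead of relying on a runtime default, has a direct analogue in Python's subprocess module. In the sketch below, a child Python process stands in for an exe like wsl.exe and writes UTF-16LE to stdout; the parent redirects that stream and decodes it with a caller-chosen encoding, which plays the role of ProcessStartInfo.StandardOutputEncoding. This is an illustration of the idea only, not PowerShell's or .NET's implementation.

```python
import subprocess
import sys

# Stand-in child: writes UTF-16LE bytes to its stdout, as wsl.exe does.
CHILD = (
    "import sys; "
    "sys.stdout.buffer.write('NAME STATE\\r\\n'.encode('utf-16-le'))"
)

def run_and_decode(encoding: str) -> str:
    # Capture the raw bytes from the redirected stdout, then decode them
    # with the explicitly chosen encoding rather than a runtime default.
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        check=True,
    )
    return proc.stdout.decode(encoding)

print(repr(run_and_decode("utf-16-le")))  # 'NAME STATE\r\n'
print(repr(run_and_decode("utf-8")))      # NUL characters interleaved
```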

If I understand correctly, then, a fix should be to set the ProcessStartInfo.StandardInput(/Output)Encoding to match [console]::Input(/Output)Encoding values explicitly?

Should this respect [console] encoding settings or $OutputEncoding? If I recall correctly, those values don't always align.

I don't know if it should use $OutputEncoding or Console.OutputEncoding, but the code would be something like this:

```csharp
using System;
using System.Diagnostics;

bool redirectStdOut = true;
bool redirectStdErr = true;
bool redirectStdIn = false;

var startInfo = new ProcessStartInfo();
if (redirectStdOut)
{
    startInfo.RedirectStandardOutput = true;
    // Match the .NET Framework default: decode the child's stdout with Console.OutputEncoding.
    startInfo.StandardOutputEncoding = Console.OutputEncoding;
}
if (redirectStdErr)
{
    startInfo.RedirectStandardError = true;
    startInfo.StandardErrorEncoding = Console.OutputEncoding;
}
if (redirectStdIn)
{
    startInfo.RedirectStandardInput = true;
    // Text written to the child's stdin is encoded with Console.InputEncoding.
    startInfo.StandardInputEncoding = Console.InputEncoding;
}
```

Actually, to match PS 5.1 behavior, it should not use $OutputEncoding.

Agreed, @0xd4d: I don't know how .StandardInput comes into play, but on the output side it should definitely be [Console]::OutputEncoding, because that is how it has always worked in Windows PowerShell, where it determines how PowerShell decodes stream output _from_ external programs.

$OutputEncoding controls what encoding is used to send data from PowerShell _to_ external programs, via a pipe. It defaults to UTF-8 in PSCore and to ASCII(!) in WinPS. In either edition it can differ from [Console]::OutputEncoding.
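The two settings govern opposite directions of the same pipe. As a hedged Python illustration: the parent encodes what it sends to the child's stdin with one encoding (the role $OutputEncoding plays) and decodes what comes back with another (the role [Console]::OutputEncoding plays). The child is a stand-in that reads UTF-8 and answers in UTF-16LE, and the variable names are hypothetical.

```python
import subprocess
import sys

# Stand-in child: reads UTF-8 from stdin, writes the upper-cased text as UTF-16LE.
CHILD = (
    "import sys; "
    "text = sys.stdin.buffer.read().decode('utf-8'); "
    "sys.stdout.buffer.write(text.upper().encode('utf-16-le'))"
)

to_child_encoding = "utf-8"        # analogous to $OutputEncoding
from_child_encoding = "utf-16-le"  # analogous to [Console]::OutputEncoding

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    input="héllo".encode(to_child_encoding),  # PowerShell -> child
    stdout=subprocess.PIPE,
    check=True,
)
result = proc.stdout.decode(from_child_encoding)  # child -> PowerShell
# result == "HÉLLO"
```

Because the two directions use independent settings, getting one right says nothing about the other, which is why the input and output sides are discussed separately in this thread.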

Thanks for looking into this, everyone. Hopefully this can get fixed soon.

Yep. Note that C:\Windows\system32\sfc.exe in Windows 10 outputs UTF-16. It's a PowerShell question that comes up occasionally.

@mklement0 I guess that would mean .StandardInputEncoding should match $OutputEncoding, then? That's assuming we may be piping _into_ such a command as well.

@vexx32 I've only glanced at the code, and I see that the pipe that is connected to the child process' stdin explicitly uses $OutputEncoding:

https://github.com/PowerShell/PowerShell/blob/425bc36a6fe66b571fc88f25500b0b3a3cf3e2a7/src/System.Management.Automation/engine/NativeCommandProcessor.cs#L1797-L1801

I don't fully understand how that relates to the default .StandardInput encoding - it looks like it may override it.

@SvenGroot do you have any examples on the input side where we have issues with encoding?

@SteveL-MSFT No, I only use OutputEncoding in my scenario.

Here's my _guess_ as to what we should do:

  • When piping data from PowerShell to an external process, it is $OutputEncoding that already drives the standard input encoding for the child process (no change there - this was never broken).

  • When _not_ piping (starting an interactive console application, for instance), i.e. when stdin is _not_ redirected, we should set .StandardInput to [Console]::InputEncoding.

  • Whether redirected or not, .StandardOutput should always be set to [Console]::OutputEncoding
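The proposal above amounts to a small decision rule. A Python sketch of it follows, with hypothetical names and encodings passed as plain strings. It encodes the guess as written in the bullets; later comments in this thread note that the fix concentrated on the output side.

```python
from dataclasses import dataclass


@dataclass
class ChildEncodings:
    stdin: str   # encoding for data going to the child
    stdout: str  # encoding used to decode data coming from the child


def pick_encodings(stdin_piped: bool,
                   output_encoding: str,
                   console_input_encoding: str,
                   console_output_encoding: str) -> ChildEncodings:
    # Piping from PowerShell: $OutputEncoding already governs what is written
    # to the child's stdin, so that path stays unchanged.
    # Not piping (e.g. an interactive console app): use the console's input encoding.
    stdin = output_encoding if stdin_piped else console_input_encoding
    # Output is decoded with the console output encoding either way.
    return ChildEncodings(stdin=stdin, stdout=console_output_encoding)


piped = pick_encodings(True, "utf-8", "cp437", "utf-16-le")
interactive = pick_encodings(False, "utf-8", "cp437", "utf-16-le")
# piped.stdin == "utf-8", interactive.stdin == "cp437", both stdout == "utf-16-le"
```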

@mklement0 When not piping, what is the value of setting .StandardInput to any encoding? For your 3rd bullet, I believe you meant [Console]::OutputEncoding. For my PR, I'm focusing on output only unless someone brings up a case where input encoding is a problem.

@SteveL-MSFT: Thanks for the correction regarding the 3rd bullet point - I've fixed my previous comment.

> what is the value of setting .StandardInput to any encoding?

My thinking is: An interactive console application that reads from stdin probably expects the _console's_ (terminal's) input encoding to be in effect (that's presumably how it works in Windows PowerShell).

Glad to see this was fixed for .StandardOutput.

As for setting .StandardInput to [Console]::InputEncoding: please see #10907, @SteveL-MSFT.

This issue was addressed in #10824, which has now been successfully released as v7.0.0-preview.6.
