PowerShell: Unicode Display Error (discovered via npm STDOUT)

Created on 19 May 2017  ·  14 Comments  ·  Source: PowerShell/PowerShell


@vors Hi. I spoke with you yesterday in Gitter. Thanks for your reply.

Steps to reproduce

Run something that writes Unicode characters to STDOUT, such as npm. (I don't know whether this is actually called STDOUT on Windows; if not, whatever the equivalent is.)

Expected behavior

Render Unicode Characters

Actual behavior

Wrong characters rendered.
[Screenshot: powershell_bad_charset]

Environment data

> $PSVersionTable
Name                           Value
----                           -----
PSVersion                      5.1.14393.1198
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.14393.1198
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

I think this may be the issue?
http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters

Issue-Discussion WG-Interactive-Console

All 14 comments

Node.js outputs UTF-8 by default, though a program is free to change stdout's encoding.

_Generally_, here's what you need to do to make your PowerShell console window UTF-8 aware:

  • On Windows 8.1 or below, ensure that the window uses a TrueType font (in Windows 10, it is by default). This is the prerequisite for being able to display _all_ Unicode characters.

  • Additionally:

    • The console window's code page must be switched to 65001, the UTF-8 code page (which is usually done with chcp 65001, but the PowerShell command below does that implicitly).

    • PowerShell must be instructed to use UTF-8 to communicate with _external utilities_, both when _sending input_ and _receiving output_.

The following magic incantation in PowerShell does this (as stated, this _implicitly_ performs chcp 65001):

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

Do note that _legacy_ utilities that do not support Unicode may _break_ in such a console window.

_Edit_: To _persist_ these settings, i.e. to make your future interactive PowerShell sessions UTF-8-aware by default, add the command above to your $PROFILE file.

You can verify proper UTF-8 handling with the following test command:

PS> $captured = '€' | node -pe "require('fs').readFileSync(0).toString().trim()"; $captured; $captured.Length
€  # '€' character (U+20ac; UTF-8 0xe2 0x82 0xac) was properly echoed.
1  # '€', despite being composed of *3* bytes in UTF-8, was properly recognized as a *single* char.
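
To make the byte arithmetic in the test above concrete, here is a minimal Python sketch (Python is used only for illustration; the thread itself uses PowerShell and Node.js). It shows the three UTF-8 bytes of the euro sign and what they turn into when they are decoded with a legacy code page instead of UTF-8. The choice of code pages 437 and 1252 is an assumption for illustration; the thread does not say which code page the reporter's console was using. The helper name misread is hypothetical.

```python
# The euro sign (U+20AC) is one character but three bytes in UTF-8.
euro = "\u20ac"
utf8_bytes = euro.encode("utf-8")

def misread(data, code_page):
    """Decode UTF-8 bytes with a legacy code page, as a console would
    if it had not been switched to code page 65001."""
    return data.decode(code_page)

print(len(euro), utf8_bytes.hex())           # 1 e282ac
# ascii() keeps the output printable even on a non-UTF-8 console.
print(ascii(misread(utf8_bytes, "cp437")))   # three unrelated characters
print(ascii(misread(utf8_bytes, "cp1252")))  # three unrelated characters
print(utf8_bytes.decode("utf-8") == euro)    # True
```

Each legacy code page maps every byte to its own character, which is why one intended character shows up as three wrong ones.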

Note: Not all Unicode-aware fonts are created equal in terms of the set of Unicode characters they support; in Windows 10, the following fonts seem to support the most characters:

  • MS Gothic
  • NSimSun
  • SimSun-ExtB

$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

Did the trick for me. Is this persistent? Do I need to run this every time I open PowerShell? Any info would be awesome, such as whether I need to put it in a config file.

@mklement0 You're the best! Thanks!

@chuanjiao10 Hope you're feeling better today!

@412andrewmortimer:

My pleasure.

Do I need to run this every time I open PowerShell?

Just put the command in your $PROFILE - I've also updated my previous comment.

@chuanjiao10: I'm not entirely sure what you mean, but, yes, PowerShell should at least have a less obscure way of handling UTF-8 properly and ideally even default to UTF-8 in PowerShell Core.

There is a pending RFC about UTF-8 support; it is being debated here.

@mklement0 So that solution worked, but then I tried to use Docker, namely docker-compose and this is what it says.

Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "logging\__init__.py", line 872, in emit
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Logged from file service.py, line 470
Traceback (most recent call last):
  File "docker-compose", line 3, in <module>
  File "compose\cli\main.py", line 64, in main
  File "compose\cli\main.py", line 116, in perform_command
  File "compose\cli\main.py", line 888, in up
  File "site-packages\colorama\ansitowin32.py", line 40, in write
  File "site-packages\colorama\ansitowin32.py", line 141, in write
  File "site-packages\colorama\ansitowin32.py", line 169, in write_and_convert
  File "site-packages\colorama\ansitowin32.py", line 174, in write_plain_text
LookupError: unknown encoding: cp65001
Failed to execute script docker-compose

This SO answer may help.

I don't have Python installed on Windows at all. Compose is written in Python, so I'm assuming that there is something highly abstracted happening. (Pretty sure compose executes in containers.)

@412andrewmortimer: You're right - I just installed Docker and setting $env:PYTHONIOENCODING='UTF-8' before calling docker-compose --version didn't help.

Note that the problem is not PowerShell-related, however: running chcp 65001 in a cmd.exe window and then running docker-compose --version breaks too.
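
For contrast with the bundled docker-compose executable, here is a minimal Python sketch of what PYTHONIOENCODING does for a stand-alone interpreter. It assumes a Python 3 interpreter is available as sys.executable; the helper name child_stdout_encoding is hypothetical. It says nothing about docker-compose itself, where the variable did not help.

```python
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Start a stand-alone Python child with PYTHONIOENCODING set and
    return the encoding the child reports for its own stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    return result.stdout.decode("ascii").strip()

if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

The child picks up the variable at startup, so it must be set before the interpreter launches.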

@412andrewmortimer: It's a known docker-compose issue.

I don't understand. How do I fix it?

@412andrewmortimer:

  • It's a bug in docker-compose and has nothing to do with PowerShell.

  • You'll have to wait for the team behind docker-compose to release a fix (or try to contribute one yourself).

  • Glancing at the linked issue, it sounds like they'll have to switch to Python 3.x, which is probably a nontrivial undertaking and would explain why the issue is still not fixed, despite first having been reported more than 1 year ago. That, and the fact that using UTF-8 console windows on Windows is still an exotic and brittle proposition, unfortunately.

I still don't understand why.

Shouldn't this be something PowerShell does? Isn't it a bug in both docker-compose and PowerShell? I mean... it's not like docker-compose is magically running without PowerShell. Docker is a command-line tool and PowerShell is a command line. I don't think blaming docker-compose is a good thing to do unless the point is to sweep it under the rug. Maybe it's the fact that "using UTF-8 console windows on Windows is still an exotic and brittle proposition, unfortunately." Is there somewhere to find out why the Windows command line is this way when it's generally not at all like this in other terminals?

Unfortunately I _have_ to use Windows as a host OS at the very least. Is there something I can do to fix this in this repository? PowerShell and development on Windows are pretty painful in general, and if I can fix some of these issues, I would love to. I highly doubt that I am alone in this sentiment.

I think it's obvious when generally the first step in nearly any development I do on Windows is "figure out how to run one command to not have to deal with Windows and that will virtualize or emulate an environment that actually works." It's just really perplexing.

Have you tried to develop with Docker, Ruby, and JavaScript on Windows? It's a pretty painful experience in general. I mean, Git on Windows even ships MINGW, I assume to avoid PowerShell. I mean... why?

I know this has become larger than PowerShell not rendering UTF-8... but it's all tied in as far as I'm concerned. Even if it requires other issues to be opened.

(I'm not trying to be snarky or entitled. I get it. All of this is very complex, and sometimes you can only have one or the other when it comes to some decisions. All in all I just want development (I suppose _the type of development I do_) to be better on Windows.)

I get your frustration, and while PowerShell itself has work left to do with respect to UTF-8 support - and getting it right is the aim of the yet-to-be-released v6 - the problem is related to a layer _below_ the shell and incomplete support for it in the _Python_ version that the current docker-compose is built on:

  • Shells (such as cmd.exe or PowerShell) as well as console applications need to be told by the _environment_ what character encoding to use.

  • In the world of Windows consoles (console windows), it is the active Windows code page that determines the character encoding, as reported by the chcp utility, for instance.

  • Code pages are identified by numbers, and the official Windows code page for UTF-8 is 65001.

  • The Python 2.x version that docker-compose is built on doesn't recognize this particular code page when printing to the console (it doesn't know what encoding cp65001 means).

    • This is indicative of Python's own slow, painful evolution toward complete Unicode support.
    • In a stand-alone 2.x Python version you can fix the problem by setting the PYTHONIOENCODING environment variable, but this doesn't work for _embedded_ versions, which is the case for docker-compose.
    • It seems that the problem was fixed in Python 3.5, so using that as the embedded version for docker-compose would probably fix the issue, but is a non-trivial effort.
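
The lookup failure described in the list above can be sketched in Python. This is a minimal illustration, assuming Python 3 and assuming you control the Python code in question, which is not the case for the bundled docker-compose executable. The helper name ensure_codec_alias is hypothetical; codecs.lookup and codecs.register are standard-library calls. On interpreters that already resolve cp65001, the helper does nothing for that name; the registration branch is what an interpreter lacking the name would need.

```python
import codecs

def ensure_codec_alias(alias, target="utf-8"):
    """Register `alias` as another name for `target` unless Python can
    already resolve it. Returns True if a new alias was registered."""
    try:
        codecs.lookup(alias)
        return False          # this interpreter already knows the name
    except LookupError:
        pass                  # same error type as in the traceback

    target_info = codecs.lookup(target)
    wanted = alias.lower().replace("-", "_").replace(" ", "_")

    def search(name):
        # Consulted by codecs.lookup() after the built-in search fails.
        normalized = name.lower().replace("-", "_").replace(" ", "_")
        return target_info if normalized == wanted else None

    codecs.register(search)
    return True

ensure_codec_alias("cp65001")
print("\u20ac".encode("cp65001").hex())
```

The sketch treats cp65001 as plain UTF-8, which matches how the thread describes the code page.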

There's nothing that any shell invoking docker-compose can do to fix that directly.


However, there is a workaround:

Seemingly, if you don't output to the _console_ directly, the problem doesn't surface.

Using a pipeline in PowerShell that seemingly just passes the lines through seems to help:

docker-compose --version | % ToString  # Doesn't break with code page 65001

PowerShell decodes the input before sending it through the pipeline, and re-encodes it on output - and it _does_ recognize code page 65001 as UTF-8 (but note that when you send output to a _file_ you need to be aware of PowerShell's quirky, cmdlet-dependent encoding defaults).

You can define a function wrapper (be sure to invoke it without the .exe extension):

 function docker-compose { docker-compose.exe $args | ForEach-Object ToString }
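
The decode-and-re-encode idea behind the pipeline workaround can be sketched in Python. This is only an analogy, not a reproduction of what PowerShell does internally: the child process writes to a pipe rather than to the console, and the caller decodes the captured bytes as UTF-8. The helper name run_and_decode and the stand-in child command are assumptions made for this sketch.

```python
import subprocess
import sys

def run_and_decode(cmd, encoding="utf-8"):
    """Run `cmd` with stdout connected to a pipe (not the console),
    then decode the captured bytes ourselves, line by line."""
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    return raw.decode(encoding).splitlines()

# Stand-in child process (an assumption for this sketch): a Python
# one-liner that writes the UTF-8 bytes of U+20AC to its stdout.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write('\\u20ac\\n'.encode('utf-8'))"]

lines = run_and_decode(child)
# ascii() keeps the output printable even on a non-UTF-8 console.
print([ascii(line) for line in lines])
```

Because the child never talks to the console directly, it has no need to look up the console's code page.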

@mklement0 I appreciate your help and understanding! You are awesome, knowledgeable and helpful! I will try your fix ASAP.

I've been watching the compose issue you posted too: https://github.com/docker/compose/issues/2775
