Runtime: System.Console unexpectedly uses a UTF-8 encoding *with BOM* on Unix

Created on 28 Aug 2018 · 4 Comments · Source: dotnet/runtime

On Unix-like platforms, the default character encoding used by System.Console is determined as follows:

  • If any of the following environment variables is set (which is the norm), the character encoding is derived from the first one that is defined: LC_ALL, LC_MESSAGES, LANG; e.g., if
    LANG="en_US.UTF-8" is present, the string "UTF-8" is extracted and the Encoding instance is obtained by passing that string to Encoding.GetEncoding().

  • Otherwise, the default is _BOM-less_ UTF-8, as expected.
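The lookup described above can be sketched roughly as follows. This is only an illustration of the described behavior, not the runtime's actual code: the class and method names (`LocaleCharset`, `ExtractCharset`, `FromEnvironment`) are hypothetical, and the handling of a trailing `@modifier` is an assumption based on the usual `language_TERRITORY.charset@modifier` locale format.

```csharp
using System;

public static class LocaleCharset
{
    // Lookup order as described in the issue: the first variable that is set wins.
    private static readonly string[] Variables = { "LC_ALL", "LC_MESSAGES", "LANG" };

    // Extracts "UTF-8" from "en_US.UTF-8" (or from "en_US.UTF-8@euro").
    // Returns null when the locale string carries no charset part.
    public static string ExtractCharset(string locale)
    {
        if (string.IsNullOrEmpty(locale)) return null;

        int dot = locale.IndexOf('.');
        if (dot < 0) return null;

        string charset = locale.Substring(dot + 1);
        int at = charset.IndexOf('@');          // drop a trailing modifier, if any (assumption)
        if (at >= 0) charset = charset.Substring(0, at);

        return charset.Length == 0 ? null : charset;
    }

    // getEnv is injected so the lookup can be exercised without touching the
    // real environment; pass Environment.GetEnvironmentVariable in practice.
    public static string FromEnvironment(Func<string, string> getEnv)
    {
        foreach (string name in Variables)
        {
            string value = getEnv(name);
            if (!string.IsNullOrEmpty(value))
                return ExtractCharset(value);   // the first defined variable decides
        }
        return null;                            // caller falls back to BOM-less UTF-8
    }
}
```

Calling `LocaleCharset.FromEnvironment(Environment.GetEnvironmentVariable)` would yield the charset name that is then handed to `Encoding.GetEncoding()`, which is the step where the BOM-emitting UTF-8 instance comes from.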

The problem is that Encoding.GetEncoding("UTF-8") returns a UTF-8 encoding _with BOM_, whereas Unix-like platforms do not expect a UTF-8 BOM and most utilities there do not know how to process one.

Here's a quick demonstration in PowerShell:

PS> [Text.Encoding]::GetEncoding('UTF-8').GetPreamble().Count
3  # encoding *with BOM* (3 bytes) was returned.
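For comparison, the same check can be written in C# next to a BOM-less alternative. This is a minimal sketch; the class `BomCheck` and its method names are hypothetical, and using the `UTF8Encoding` constructor this way is one possible workaround, not necessarily what the runtime fix does.

```csharp
using System.Text;

public static class BomCheck
{
    // Number of preamble (BOM) bytes the encoding reports.
    public static int PreambleLength(Encoding encoding) => encoding.GetPreamble().Length;

    // What the name-based lookup returns: UTF-8 that reports a 3-byte preamble.
    public static Encoding Utf8ByName() => Encoding.GetEncoding("UTF-8");

    // Explicitly constructed UTF-8 that reports no preamble.
    public static Encoding Utf8WithoutBom() =>
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
}
```

`BomCheck.PreambleLength(BomCheck.Utf8ByName())` gives 3, matching the PowerShell output above, while `BomCheck.PreambleLength(BomCheck.Utf8WithoutBom())` gives 0.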

As an aside: Curiously, even though on _Windows_ the BOM is also present, switching to UTF-8 in a cmd.exe console with chcp 65001 and using > to redirect a non-ASCII string to a file results in a UTF-8-encoded file _without_ BOM.

area-System.Console bug os-linux

All 4 comments

Regarding the os-linux label: macOS seems to be affected as well.

> Curiously, even though on Windows the BOM is also present, switching to UTF-8 in a cmd.exe console with chcp 65001 and using > to redirect a non-ASCII string to a file results in a UTF-8-encoded file without BOM.

Redirecting on Unix also gives a file without a BOM. The Console strips the BOM from the encoding for output:

https://github.com/dotnet/corefx/blob/775c9f1415be3403864fe2804a9c5e16074d2d68/src/System.Console/src/System/Console.cs#L116-L125

We can do the same thing for Console.InputEncoding and Console.OutputEncoding.
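One way such stripping could look is a thin wrapper that forwards all work to an inner encoding but reports an empty preamble. This is a hedged sketch only: `PreamblelessEncoding` is a hypothetical name, and the helper the linked Console.cs lines rely on may be implemented differently.

```csharp
using System;
using System.Text;

// Hypothetical wrapper: behaves like the inner encoding, except that
// GetPreamble() always returns an empty array.
public sealed class PreamblelessEncoding : Encoding
{
    private readonly Encoding _inner;

    public PreamblelessEncoding(Encoding inner) : base(inner.CodePage)
    {
        _inner = inner;
    }

    // The only behavioral change: no BOM is ever reported.
    public override byte[] GetPreamble() => Array.Empty<byte>();

    public override string WebName => _inner.WebName;
    public override string EncodingName => _inner.EncodingName;

    // Everything else is forwarded unchanged.
    public override int GetByteCount(char[] chars, int index, int count) =>
        _inner.GetByteCount(chars, index, count);

    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex) =>
        _inner.GetBytes(chars, charIndex, charCount, bytes, byteIndex);

    public override int GetCharCount(byte[] bytes, int index, int count) =>
        _inner.GetCharCount(bytes, index, count);

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex) =>
        _inner.GetChars(bytes, byteIndex, byteCount, chars, charIndex);

    public override int GetMaxByteCount(int charCount) => _inner.GetMaxByteCount(charCount);
    public override int GetMaxCharCount(int byteCount) => _inner.GetMaxCharCount(byteCount);

    public override Encoder GetEncoder() => _inner.GetEncoder();
    public override Decoder GetDecoder() => _inner.GetDecoder();
}
```

Under this sketch, `new PreamblelessEncoding(Encoding.GetEncoding("UTF-8"))` still encodes and decodes as UTF-8 but no longer reports the 3-byte preamble. For UTF-8 specifically, constructing `new UTF8Encoding(false)` is the simpler route; a wrapper only pays off when the inner encoding is arbitrary.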

Fixing milestone - this was fixed in master (3.0) branch.

I've since realized that this should be fixed on Windows as well - please see dotnet/corefx#35950.
