PowerShell: [System.Text.Encoding]::GetEncodings() doesn't show full codepage list

Created on 6 Apr 2018 · 11 Comments · Source: PowerShell/PowerShell

I experimented with this last year and it worked, so this seems to be a regression.

We register an additional encoding provider in our code, so this should work.

Steps to reproduce

[System.Text.Encoding]::GetEncodings()

Expected behavior

Show the full list (~140 values), like Windows PowerShell.

Actual behavior

Shows only the standard values:

CodePage Name       DisplayName
-------- ----       -----------
    1200 utf-16     Unicode
    1201 utf-16BE   Unicode (Big-Endian)
   12000 utf-32     Unicode (UTF-32)
   12001 utf-32BE   Unicode (UTF-32 Big-Endian)
   20127 us-ascii   US-ASCII
   28591 iso-8859-1 Western European (ISO)
   65000 utf-7      Unicode (UTF-7)
   65001 utf-8      Unicode (UTF-8)

Environment data

> $PSVersionTable
Name                           Value
----                           -----
PSVersion                      6.1.0-preview.1
PSEdition                      Core
GitCommitId                    v6.1.0-preview.1-31-g879fcd27b8f66ef40dbeb750ade6332cdb10f27a
OS                             Microsoft Windows 10.0.10240
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
Issue-Enhancement Waiting - DotNetCore

All 11 comments

I think it is not a regression.

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); does not affect the results of Encoding.GetEncodings().

This additional provider was introduced in 6.0.0-beta.1, and the return value has not changed since then.

PS C:\Program Files\PowerShell\6.0.0-beta.1> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      6.0.0-beta
PSEdition                      Core
BuildVersion                   3.0.0.0
CLRVersion
GitCommitId                    v6.0.0-beta.1
OS                             Microsoft Windows 10.0.16299
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

PS C:\Program Files\PowerShell\6.0.0-beta.1> [System.Text.Encoding]::GetEncodings()

CodePage Name       DisplayName
-------- ----       -----------
    1200 utf-16     Unicode
    1201 utf-16BE   Unicode (Big-Endian)
   12000 utf-32     Unicode (UTF-32)
   12001 utf-32BE   Unicode (UTF-32 Big-Endian)
   20127 us-ascii   US-ASCII
   28591 iso-8859-1 Western European (ISO)
   65000 utf-7      Unicode (UTF-7)
   65001 utf-8      Unicode (UTF-8)

Incidentally, C# code returns the same results.

// Sample code
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// GetEncodings() still lists only the built-in encodings after registration.
foreach (EncodingInfo e in Encoding.GetEncodings())
{
    Console.WriteLine("{0}\t{1} : {2}", e.CodePage, e.Name, e.DisplayName);
}

Results:

1200    utf-16 : Unicode
1201    utf-16BE : Unicode (Big-Endian)
12000   utf-32 : Unicode (UTF-32)
12001   utf-32BE : Unicode (UTF-32 Big-Endian)
20127   us-ascii : US-ASCII
28591   iso-8859-1 : Western European (ISO)
65000   utf-7 : Unicode (UTF-7)
65001   utf-8 : Unicode (UTF-8)

Unfortunately, the behavior appears to be by design (emphasis added):

Note

The list of supported encodings returned by the GetEncodings method does not include any
additional encodings made available by any EncodingProvider implementations that
were registered by calls to the RegisterProvider method.
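The distinction in the note can be seen directly: a code page supplied by the registered provider resolves by ID even when the enumeration does not list it. The following is a minimal sketch, assuming PowerShell Core (where CodePagesEncodingProvider is available); code page 1252 is only an illustrative choice. On the runtimes discussed in this issue the first line is expected to report False, while on newer runtimes it may report True.

```powershell
# Register the code-pages provider (PowerShell Core already does this at startup;
# repeating the call is harmless).
[System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance)

# Is code page 1252 among the encodings that GetEncodings() enumerates?
$listed = [System.Text.Encoding]::GetEncodings().CodePage -contains 1252

# Direct lookup by ID goes through the registered provider and succeeds either way.
$enc = [System.Text.Encoding]::GetEncoding(1252)

"Listed by GetEncodings():      $listed"
"Resolved by GetEncoding(1252): $($enc.WebName)"
```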

And at least at first glance it's not obvious how to work around that, given that [System.Text.CodePagesEncodingProvider]::Instance lacks a method for enumerating the encodings registered later.

@iSazonov:

Interesting, and while this brute-force workaround (enumerating integers 0 through 65535 and calling .GetEncoding() for each) certainly _works_, you should grab a cup of coffee while it runs... (it took about 45 secs. on my machine).

Based on the URL cited as a source in the linked page, @stknohg may have originally published the workaround; @stknohg: any new insights since?

I'm not sure we necessarily need to solve this problem, however, and if it does get solved, it should probably be in CoreFX.

Another option, assuming that the list of code pages is now _frozen_ (they are _legacy_ technology, after all): simply hard-code the list somewhere (141 entries).
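A rough sketch of what the hard-coded approach might look like, assuming PowerShell Core with the code-pages provider registered. The `$knownCodePages` array is hypothetical and holds only a small illustrative subset, not the full list of 141 entries.

```powershell
# Hypothetical, illustrative subset of code page IDs (NOT the full list of 141).
$knownCodePages = 437, 850, 932, 936, 949, 950, 1250, 1251, 1252

# Resolve each known ID directly instead of probing all 65536 candidates.
$encodings = foreach ($cp in $knownCodePages) {
    try { [System.Text.Encoding]::GetEncoding($cp) }
    catch { }   # skip any ID that the current runtime does not support
}

$encodings | Select-Object CodePage, WebName, EncodingName
```

Resolving a fixed list takes a handful of lookups rather than a full scan, at the cost of having to maintain the list by hand.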

@mklement0 No, I don't have any new insights. I also think this is by design in .NET Core.

Incidentally, I wrote the following code to check the effect of [System.Text.Encoding]::RegisterProvider([System.Text.CodePagesEncodingProvider]::Instance), not as a workaround.

# Probe code page IDs 0 through 65534; unsupported IDs throw and are skipped.
for ($i = 0; $i -lt 65535; $i++) {
    try {
        $enc = [System.Text.Encoding]::GetEncoding($i)
        Write-Output ("{0}, {1}, {2}" -f $i, $enc.WebName, $enc.EncodingName)
    }
    catch {}
}

Understood; thanks for letting us know, @stknohg.

Does it make sense to open an enhancement request in the CoreFX repo? It seems useful.

@iSazonov: Worth a try: https://github.com/dotnet/corefx/issues/28915 (obsolete, see link in next comment).

The new .NET API was implemented, and we get it in .NET 5.0 Preview 7/8.
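One way to check whether a given session has the new API is sketched below. It assumes the addition is the `GetEncodings()` method on `System.Text.EncodingProvider` that shipped with .NET 5; the reflection check keeps the snippet safe to run on older runtimes, where the method is absent.

```powershell
# Number of encodings that the static enumeration reports in this session.
$count = [System.Text.Encoding]::GetEncodings().Count

# Assumption: the new API is EncodingProvider.GetEncodings() (.NET 5 and later).
# Look it up via reflection so the script also runs where it does not exist.
$method = [System.Text.EncodingProvider].GetMethod('GetEncodings')
$hasProviderEnumeration = $null -ne $method

"Encodings listed by Encoding.GetEncodings(): $count"
"EncodingProvider.GetEncodings() available:   $hasProviderEnumeration"

if ($hasProviderEnumeration) {
    # Enumerate the provider's own encodings and show the first few.
    [System.Text.CodePagesEncodingProvider]::Instance.GetEncodings() |
        Select-Object -First 5 CodePage, Name, DisplayName
}
```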
