This API proposal arose out of dotnet/runtime#25804.
This proposal was approved before but we discovered the original proposal was returning EncodingInfo object while we didn't expose any public constructor for this type.
The delta change in this proposal is exposing a public constructor for EncodingInfo to allow creating such objects
As of this writing:
System.Text.Encoding.GetEncodings()
only ever enumerates the _default_ encodings available.
System.Text.CodePagesEncodingProvider.Instance
lacks a method for enumerating _additional_ encodings registered via System.Text.Encoding.RegisterProvider(EncodingProvider)
, which - via a call to System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance)
- makes the Windows code-page character encodings available to .NET Core.
However, it is desirable to have the ability to enumerate _all_ available encodings:
Even in the abstract it seems like an awkward omission not to be able to reflect on the available encodings; more concretely, consider the following use cases, both related to PowerShell Core:
PowerShell Core commands such as Get-Content
for reading text files support an -Encoding
parameter that directly accepts System.Text.Encoding
instances, so users expect to be able to _discover_ all available encodings and/or be assisted in selecting a specific encoding with _tab completion_:
Get-Content file.txt -Encoding <tab-keypress> # should show all available encodings
Convert-TextFile
command will allow conversion between all available character encodings, so their discovery / ease of selection is important there too.Add a GetEncodings()
method to abstract class System.Text.EncodingProvider
:
namespace System.Text
{
public abstract class EncodingProvider
{
// ...
// New virtual method, analogous to System.Text.Encoding.GetEncodings()
public virtual System.Text.EncodingInfo[] GetEncodings ();
// ...
}
public sealed class EncodingInfo // already exposed class
{
// This is internal constructor and we are making it public
public EncodingInfo(int codePage, string name, string displayName);
}
}
And System.Text.CodePagesEncodingProvider
would then implement that method to return the encodings it provides.
Specifically, a call to System.Text.CodePagesEncodingProvider.Instance.GetEncodings()
would then return the array of System.Text.EncodingInfo
instances representing that provider's encodings.
As with System.Text.Encodings.GetEncodings()
, no particular enumeration order is guaranteed.
System.Text.Encodings.GetEncodings()
Once System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance)
has been called, System.Text.Encodings.GetEncodings()
will enumerate the additional encodings too, using the newly introduced method on the provider _if_ registered.
In other words: System.Text.Encodings.GetEncodings()
will return whatever encodings are _currently available_ - whether just the default set by default or the _union_ of the default set and the additional encodings after provider registration.
As with the existing enumeration, the encodings (EncodingInfo
instances) are returned in no guaranteed order.
If there is more than one registered provider that supports a given encoding, the returned list will contain only one entry for it.
GetEncodings
method virtual
rather than abstract
.
public abstract System.Text.EncodingInfo[] GetEncodings()
This cannot be abstract method. Adding abstract method to existing type is a breaking change.
Either way, a new overload is needed for GetEncodings()
I do not think we need a new overload for GetEncodings()
. GetEncodings()
should just return all encodings available.
@mklement0 thanks for the proposal.
For the open question, System.Text.Encodings.GetEncodings() should report all encodings including the provider encodings. I am not sure why this will be a compatibility issue? I cannot think in any breaking scenario here as we are going to return a superset from what we used to return. could you please elaborate more on that?
@tarekgh, @jkotas:
To be clear, my preference too is for System.Text.Encodings.GetEncodings()
to return all _available_ encodings, at least by default.
If there's no concern about backward compatibility (see below) and no need to support discovering _just_ the default encodings _while additional ones happen to be registered_, then I'm happy to
make do without a new System.Text.Encodings.GetEncodings()
overload.
let System.Text.Encodings.GetEncodings()
unconditionally return whatever encodings are _currently_ available.
I'm personally not worried about backward compatibility in this case, but _conceivably_ there is code out there that relies on System.Text.Encodings.GetEncodings()
only ever returning the _default_ encodings; in other words: someone may rely on System.Text.Encodings.GetEncodings()
as a reliable way to discover the _default encodings_.
So: Do wee need to worry about retaining a predictable, invariant way to discover the set of default encodings?
@jkotas:
This cannot be abstract method. Adding abstract method to existing type is a breaking change.
Thanks for pointing that out; I need guidance as to what to propose instead.
please remove the open question then.
Thanks for pointing that out; I need guidance as to what to propose instead.
make the method virtual instead of abstract.
@tarekgh: Thanks - done. I've also added details re return behavior (enumeration order); let me know if that makes sense.
@mklement0
System.Text.CodePagesEncodingProvider would then have to implement that method to return the encodings it provides, in ascending order of the CodePage property values of the EncodingInfo instances, analogous to System.Text.Encodings.GetEncodings().
System.Text.Encodings.GetEncodings documentation doesn't promise for any sort order. From where you got that?
https://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings(v=vs.110).aspx#Remarks
@tarekgh:
You're right - I didn't check the docs, I just inferred it from the de facto behavior.
Do you want me to say that the order is unspecified? What about when forming the union between default and registered encodings?
I prefer not mentioning anything about the order. we just promise to return the whole set supported by the core and the providers.
@tarekgh: Done (I've added a note that no particular enumeration order is guaranteed).
Thanks @mklement0
I have added the following statement too to the proposal integration section. let me know if you have any concern with that.
If there is more than one registered provider support same specific encoding, the returned list will contain only one entry for that specific encoding.
CC @krwq
Thanks, @tarekgh - good addition (I've streamlined the wording a bit).
There may be another scenario if the user wants to get only the default list:
```c#
public virtual System.Text.EncodingInfo[] GetEncodings (bool IncludeDefaultsOnly = false);
@iSazonov: This was discussed above, but both @jkotas and @tarekgh think it's not necessary.
(On a side note: My guess is that _separate method overloads_ are preferred to _optional parameters_; may be worth adding a clarification to the coding style document.)
EncodingProvider.GetEncodings()
should be IEnumerable<EncodingInfo>
EncodingProvider.GetEncodings()
should be virtual and return an empty list (instead of throwing). While this isn't correct it means that one provider that isn't updated doesn't spoil the enumeration for everyone else.Encoding.GetEncodings()
should return all registered encodings across all providers and de-dupe them if necessary.@mklement0 do you want to help in the implementation?
@tarekgh: Sounds good; I'll get started soon (unlike last time, where Anipik thankfully eventually jumped in).
@mklement0 Ping me too if you want.
I gave it a shot, but hit roadblocks (see below - some may require renewed discussion and sign-off).
Given my inexperience, I suggest someone more experienced take this on - @iSazonov, please feel free to take over.
Implementing this API requires CoreCLR (CoreLib) changes too, and I don't know how to coordinate that (even just adding a new virtual System.Text.EncodingProvider.GetEncodings()
method breaks CoreFX-only builds):
Encoding.cs
, whose Encoding.GetEncodings()
method needs modifying, is part of CoreLib.
Encoding.GetEncodings()
would need to iterate over all registered encoding providers in order to ask each for the encodings it provides, but EncodingProvider
currently exposes no method for enumerating them.
On the CoreFX side:
It's unclear how to instantiate EncodingInfo
: There's only an internal
constructor in CoreCLR, internal EncodingInfo(int codePage, string name, string displayName)
, which CoreFX cannot access.
CodePagesEncodingProvider.cs
_itself_ contains only the IDs of a small set of the encodings it supports (the so-called _rare_ ones); the rest will have to be gleaned indirectly from EncodingTable.Data.cs
, via a yet-to-be-added EncodingTable
method such as internal static EncodingInfo[] GetEncodings()
.
@mklement0 thanks for trying. we'll try to get into this at some point.
t's unclear how to instantiate EncodingInfo: There's only an internal constructor
I think you have raised a good point too. we may need to revisit discussing EncodingInfo class to enable constructing it outside coreclr and provide the needed functionality. we'll try to think more about this one.
C#
public sealed partial class EncodingInfo
{
internal EncodingInfo() { }
public int CodePage { get { throw null; } }
public string DisplayName { get { throw null; } }
public string Name { get { throw null; } }
public override bool Equals(object value) { throw null; }
public System.Text.Encoding GetEncoding() { throw null; }
public override int GetHashCode() { throw null; }
}
@mklement0 I haven't an expirience to contribute in CoreCLR/CoreFX :confused:
breaks CoreFX-only builds
I believe it will be two PRs - in CoreCLR repo and in CoreFX repo. After the first one will be merged it will be automatically replicated in Core FX repo. After that the second PR in CoreFX can be continue.
@tarekgh Have you any news about EncodingInfo?
Have you any news about EncodingInfo?
No update yet because it is kind of low priority for now. But I suggest adding the following public constructor to EncodingInfo
C#
public EncodingInfo(int codePage, string name, string displayName) { }
EncodingProvider.GetEncodings()
should be IEnumerable<EncodingInfo>
EncodingProvider.GetEncodings()
should be virtual and return an Enumerable.Empty<EncodingInfo>()
(instead of throwing). While this isn't correct it means that one provider that isn't updated doesn't spoil the enumeration for everyone else.Encoding.GetEncodings()
should return all registered encodings across all providers and de-dupe them if necessary.EncodingInfo.GetEncoding()
calls the static Encoding.GetEncoding()
method, which means getting encoding infos and using EncodingInfo.GetEncoding()
might create an encoding that isn't tied to that encoding provider.EncodingInfo
should take an EncodingProvider
that we'll use in EncodingInfo.GetEncoding()
and call EncodingProvider.GetEncoding(codePage)
.```C#
namespace System.Text
{
public partial class EncodingProvider
{
public virtual IEnumerable
}
public partial class EncodingInfo
{
public EncodingInfo(EncodingProvider provider, int codePage, string name, string displayName);
}
}
```
Most helpful comment
Video
EncodingProvider.GetEncodings()
should beIEnumerable<EncodingInfo>
EncodingProvider.GetEncodings()
should be virtual and return an empty list (instead of throwing). While this isn't correct it means that one provider that isn't updated doesn't spoil the enumeration for everyone else.Encoding.GetEncodings()
should return all registered encodings across all providers and de-dupe them if necessary.