Using Windows-1252 encoding, create a file "test.txt" that contains this sentence:
cette fonction doit être appelée avant l'initialisation de l'API
Try to convert the file "test.txt" from Windows-1252 to UTF-8 using this script:
Param (
[Parameter(Mandatory=$True)][String]$SourcePath
)
Get-ChildItem $SourcePath* -recurse -Include *.txt | ForEach-Object {
$content = $_ | Get-Content
Set-Content -PassThru $_.Fullname $content -Encoding UTF8 -Force}
Expected result, in UTF-8:
cette fonction doit être appelée avant l'initialisation de l'API
Actual result, in UTF-8:
cette fonction doit �tre appel�e avant l'initialisation de l'API
Name                           Value
----                           -----
PSVersion                      6.1.0-preview.1
PSEdition                      Core
GitCommitId                    v6.1.0-preview.1
OS                             Microsoft Windows 6.1.7601 S
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
PowerShell 4.0 does not have this issue.
The default encoding in PowerShell Core is now UTF-8 (without a BOM when creating files).
That means that a Windows 1252-encoded file - in the absence of a BOM defining it as such (there is none for Windows 1252) - is now interpreted as _UTF-8_.
The upshot is that you must now tell Get-Content what encoding to assume - unless it is UTF-8 or there is a BOM.
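To make the misinterpretation concrete, here is a minimal sketch that works purely on bytes in memory, so it does not depend on any cmdlet defaults. The variable names are illustrative only, and the accented characters are built from code points so that the snippet does not depend on how the script file itself is encoded.

```powershell
# Build the sample sentence; 0xEA is 'ê' and 0xE9 is 'é'.
$text = "cette fonction doit $([char]0xEA)tre appel$([char]0xE9)e avant l'initialisation de l'API"

# Encode it as Windows-1252 bytes, as the test file on disk would be (no BOM).
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$bytes  = $cp1252.GetBytes($text)

# Decoding those bytes as UTF-8 cannot map the accented bytes,
# so each one becomes the replacement character U+FFFD.
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding them as Windows-1252 restores the original text.
$asCp1252 = $cp1252.GetString($bytes)

$asUtf8.IndexOf([char]0xFFFD) -ge 0   # True: the text was damaged
$asCp1252 -ceq $text                  # True: the text round-trips
```

This is the same damage as in the actual result reported above: the bytes for ê and é are not valid UTF-8 sequences on their own.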
Regrettably, Get-Content doesn't currently allow you to specify Windows 1252, because Default now represents UTF-8 and no longer the active "ANSI" code page (such as Windows 1252), as on Windows PowerShell, and you cannot pass a [System.Text.Encoding] instance directly.
This is an oversight that must be corrected.
My suggestion: add an ANSI encoding enumeration value on Windows that represents the system's legacy "ANSI" code page (e.g., Windows 1252 on US-English systems).
The - cumbersome - workaround to use in the meantime requires use of the .NET framework directly:
$content = [IO.File]::ReadAllText($_.FullName, [text.encoding]::GetEncoding(1252))
Or, more generically:
$content = [IO.File]::ReadAllText($_.FullName, [text.encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage))
@mklement0
PowerShell Core 6.0 accepts a System.Text.Encoding instance for the -Encoding parameter (#5080).
So we can write the following:
$content = $_ | Get-Content -Encoding ([System.Text.Encoding]::GetEncoding(1252))
# or
$content = $_ | Get-Content -Encoding ([System.Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage))
Additionally, WindowsLegacy is proposed in an RFC.
(But WindowsLegacy is not implemented yet...)
It would be better to discuss this in the RFC if compatibility is necessary.
#5204 may be related.
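Putting the pieces together, a conversion along these lines might look like the sketch below. It assumes PowerShell Core 6.0 or later (Windows PowerShell's -Encoding does not accept an Encoding instance), and the temp-file name is hypothetical and exists only to keep the example self-contained.

```powershell
# Assumption: PowerShell Core 6.0+.
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$sample = "doit $([char]0xEA)tre appel$([char]0xE9)e"   # "doit être appelée"

# Create a Windows-1252 encoded test file (hypothetical name in the temp folder).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'cp1252-sample.txt'
[System.IO.File]::WriteAllText($path, $sample, $cp1252)

# Read it with the explicit source encoding, then rewrite it as UTF-8.
$content = Get-Content -LiteralPath $path -Encoding $cp1252
Set-Content -LiteralPath $path -Value $content -Encoding UTF8 -Force

# Read the file back as UTF-8; TrimEnd drops the newline that Set-Content appends.
$roundTrip = [System.IO.File]::ReadAllText($path, [System.Text.Encoding]::UTF8).TrimEnd()
$roundTrip -ceq $sample

Remove-Item -LiteralPath $path
```

The only change relative to the original script is the explicit -Encoding argument on Get-Content; the write side is unchanged.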
@stknohg:
Ah, thanks. Somehow I had wrongly convinced myself that you couldn't directly pass a System.Text.Encoding instance - thanks for clarifying that.
I think the discussion around the linked RFC eventually led to the current Core behavior of globally defaulting to BOM-less UTF-8 - see https://github.com/PowerShell/PowerShell-RFC/issues/71
The WindowsLegacy meta-setting was intended for a never-implemented $PSDefaultEncoding preference variable, and was meant to _globally_ revert to the old, inconsistent encoding behavior for the sake of backward compatibility - an approach that I personally think is not worth pursuing.
Again, given that OEM - the OEM code page implied by the legacy system locale - already exists as a predefined encoding enumeration value, it should be complemented with an ANSI identifier for the "ANSI" code page implied by the system locale (on Windows only; the equivalent of what Default represents for _Windows_ PowerShell).
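As a rough illustration of why the OEM and "ANSI" code pages need separate identifiers, the sketch below decodes the same byte with two fixed code pages. Code pages 437 and 1252 are chosen here only as a typical US-English pairing, not taken from the discussion above; the last two lines show what the current culture reports, which varies by system.

```powershell
# One byte, two legacy interpretations.
$byte = [byte[]](0xE9)

$ansi = [System.Text.Encoding]::GetEncoding(1252)  # typical "ANSI" code page (assumed pairing)
$oem  = [System.Text.Encoding]::GetEncoding(437)   # typical OEM code page (assumed pairing)

$fromAnsi = $ansi.GetString($byte)   # U+00E9, 'é'
$fromOem  = $oem.GetString($byte)    # a different character under code page 437

$fromAnsi -ceq $fromOem              # False: the code pages disagree on this byte

# Code pages implied by the current culture (values depend on the system).
$textInfo = [cultureinfo]::CurrentCulture.TextInfo
$textInfo.ANSICodePage
$textInfo.OEMCodePage
```

A file written under one of these code pages and read under the other is therefore corrupted for any non-ASCII character, which is why a single legacy identifier cannot cover both cases.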
Certainly, introducing ANSI is simpler and, as you say, does not change behavior globally.
I think it's good.
The workaround proposed by mklement0 works for me.
I propose closing this issue, since the rest of the discussion mainly concerns BOM-less UTF-8, which is indeed covered in PowerShell/PowerShell-RFC#71.
Thanks.
@Calimerou: Alternatively, we could retitle your issue and modify the initial post to propose the missing ANSI encoding-enumeration value, as discussed. If you prefer my creating a new issue instead, let me know.
I would prefer yours.
Thanks in advance.
For now, I work around this issue in my scripts as follows:
````
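# Windows PowerShell reports 'Desktop'; older versions have no PSEdition at all.
$iswinps = ($null, 'Desktop') -contains $PSVersionTable.PSEdition
if (!$iswinps)
{
    # PowerShell Core: pass the Windows-1252 encoding explicitly.
    $encoding = [System.Text.Encoding]::GetEncoding(1252)
}
else
{
    # Windows PowerShell: Default already means the active "ANSI" code page.
    $encoding = [Microsoft.PowerShell.Commands.FileSystemCmdletProviderEncoding]::Default
}
Get-Content -Encoding $encoding ...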
````
HTH