PowerShell: Encoding issue in handling output of git diff

Created on 7 Sep 2018 · 13 Comments · Source: PowerShell/PowerShell

Before dismissing this as a potential issue with git and not with powershell, please read to the end.

Steps to reproduce

Create an empty git repository (git 2.21.0.windows.1) and add a file that contains a German umlaut, e.g. ä:

mkdir gitrep
cd gitrep
git init .
"ä" | Out-File -Encoding utf8 file.txt
git add file.txt
git diff --cached > output.patch
Get-Content output.patch

Expected behavior

diff --git a/file.txt b/file.txt
new file mode 100644
index 0000000..8be8316
--- /dev/null
+++ b/file.txt
@@ -0,0 +1 @@
+ä

Actual behavior

diff --git a/file.txt b/file.txt
new file mode 100644
index 0000000..8be8316
--- /dev/null
+++ b/file.txt
@@ -0,0 +1 @@
+├ñ

Environment data

> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      6.1.3
PSEdition                      Core
GitCommitId                    6.1.3
OS                             Microsoft Windows 10.0.17134
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

The umlaut does display correctly in the terminal:

[screenshot of the terminal output]

But whenever the output of "git diff" is redirected to a file, the umlaut character turns into garbage. The same redirection works without problems in the Windows command line (cmd), which to me indicates that the problem lies within PowerShell.

I have asked a corresponding question on Stack Overflow, but I think this may be a bug that should be brought to attention: https://stackoverflow.com/questions/52205297/the-output-of-git-diff-is-not-handled-correctly-in-powershell

There are related Q&As, but I think this issue is much simpler and easier to reproduce:
https://stackoverflow.com/questions/13675782/git-shell-in-windows-patchs-default-character-encoding-is-ucs-2-little-endian/13751617#13751617
https://stackoverflow.com/questions/36494026/git-diff-does-not-handles-character-encoding-other-than-utf-8

Issue-Question

All 13 comments

I tested with version 6.1.3 of PowerShell Core and the latest git, and the problem is still exactly the same.

Redirection to file uses Out-File and doesn't allow you to specify the encoding. I believe it should be defaulting to UTF8 w/o BOM in 6.1+

Might I suggest in the meantime using | Set-Content instead of >?

But regardless, whatever's going on here should still be sorted out. 🙂

"Might I suggest in the meantime using | Set-Content instead of >?"

I tried git diff --cached | Set-Content output.patch with and without --no-pager, but the result is the same as with output redirection.

"But regardless, whatever's going on here should still be sorted out. 🙂"

In regular PowerShell 5.1 the results are even stranger: there it makes a difference whether output redirection (>) or Set-Content is used. The result is +ñ with Set-Content and ├ñ with output redirection; obviously both are wrong. In both PowerShell 5.1 and PowerShell Core 6.1 the result looks good when it is printed to the terminal (using --no-pager or setting $Env:LESSCHARSET="utf8").

It works in good old cmd.exe. Still, I noticed that after executing cmd /c "git --no-pager diff --cached > output.patch", viewing the file with Get-Content .\output.patch in the console window looks okay in PowerShell Core 6.1 but looks wrong in PowerShell 5.1 (ä).

What are your settings for the following values?

  • [console]::OutputEncoding
  • [console]::InputEncoding
  • $OutputEncoding
  • InputEncoding

> [console]::OutputEncoding
Preamble          :
BodyName          :
EncodingName      : Western European (DOS)
HeaderName        :
WebName           : ibm850
WindowsCodePage   :
IsBrowserDisplay  :
IsBrowserSave     :
IsMailNewsDisplay :
IsMailNewsSave    :
IsSingleByte      : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : False
CodePage          : 850

> [console]::InputEncoding
Preamble          :
BodyName          :
EncodingName      : Western European (DOS)
HeaderName        :
WebName           : ibm850
WindowsCodePage   :
IsBrowserDisplay  :
IsBrowserSave     :
IsMailNewsDisplay :
IsMailNewsSave    :
IsSingleByte      : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 850

> $OutputEncoding
Preamble          :
BodyName          : utf-8
EncodingName      : Unicode (UTF-8)
HeaderName        : utf-8
WebName           : utf-8
WindowsCodePage   : 1200
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 65001

InputEncoding - I don't know
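
The ibm850 value reported for [console]::OutputEncoding would explain the garbage: PowerShell decodes the bytes written by an external program such as git using [console]::OutputEncoding, so the two UTF-8 bytes of ä (C3 A4) are read as two separate code page 850 characters. That matches the ├ñ reported earlier for output redirection. A minimal sketch of the effect, assuming code page 850 is available in the session (the variable names are only for illustration):

```powershell
# Build "ä" from its code point so the sketch does not depend on how this script file is encoded.
$umlaut = [string][char]0x00E4

# The two bytes git emits for "ä" in a UTF-8 file: C3 A4.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($umlaut)

# Decoding those bytes with code page 850 (the console encoding reported above) garbles them.
$cp850 = [System.Text.Encoding]::GetEncoding(850)
$misread = $cp850.GetString($utf8Bytes)

# Decoding the same bytes as UTF-8 restores the original character.
$correct = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

"Bytes:    " + (($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' ')
"As CP850: $misread"
"As UTF-8: $correct"
```

The file written by > or Set-Content then contains the already misread string, which is why choosing a different encoding for the output file alone does not help.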

Oops, forgot the $ on that last one, sorry. Should be $InputEncoding

"Oops, forgot the $ on that last one, sorry. Should be $InputEncoding"

$InputEncoding returns nothing.

The output comes from less, so try setting

$env:LESSCHARSET='UTF-8'

@powercode Could you reproduce the issue?

The problem does not seem to be related to pagers. You can always use the --no-pager option in git, which still shows the problem.

For me, the combination of setting LESSCHARSET and [console]::OutputEncoding to utf8 worked.

@powercode That did it!

Setting [console]::OutputEncoding = [System.Text.Encoding]::UTF8 solved the issue. LESSCHARSET is not needed. I would think that UTF8 should be the default these days, but okay.
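
For readers who prefer not to change the console encoding for the whole session, the fix can be scoped to a single command. The following is only a sketch: Invoke-WithUtf8Console is a hypothetical helper name, not something that ships with PowerShell or git, and it assumes the external program really writes UTF-8.

```powershell
# Hypothetical helper (not part of PowerShell or git): runs a script block while
# [Console]::OutputEncoding is UTF-8, then restores the previous value.
function Invoke-WithUtf8Console {
    param([Parameter(Mandatory)][scriptblock]$ScriptBlock)

    $previous = [Console]::OutputEncoding
    try {
        # PowerShell decodes the output of external programs with this encoding.
        [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
        & $ScriptBlock
    }
    finally {
        # Restore the original setting even if the script block fails.
        [Console]::OutputEncoding = $previous
    }
}

# Usage (run inside the repository from the steps to reproduce):
# Invoke-WithUtf8Console { git --no-pager diff --cached } > output.patch
```

Setting the encoding once per session, as done in this thread, is simpler; the wrapper only avoids affecting other programs that still expect the original code page.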

"Setting [console]::OutputEncoding = [System.Text.Encoding]::UTF8 solved the issue."

This fixed it for me too. Thanks!

"The output comes from less, so try setting $env:LESSCHARSET='UTF-8'"

This has bugged me for YEARS, and setting $env:LESSCHARSET='UTF-8' fixed it in git log output for me (e.g. author name).

It's still not perfect, though. When using

git log -1 --show-signature

to show a GPG-signed commit, German umlauts (and line breaks, it seems) are not displayed correctly, but I can live with that. The line break issue may come from using gpg4win ¯\_(ツ)_/¯

The behavior is the same for PowerShell 5, PowerShell 7, the VS Code integrated PowerShell terminal, and Git Bash for Windows (MINGW64).
