[x] I was not able to find an open or closed issue matching what I'm seeing
Which version of Git for Windows are you using? 32-bit or 64-bit? Include the
output of git version as well.
$ git --version
git version 2.8.2.windows.1
Windows7 64bit
defaults
user.name in ~/.config contains a 'german double s' / 'sharp S' / 'ß'.
Relevant entries in ~/.config:
[user]
email = ...
name = ...ß...
[gui]
spellingdictionary = de_DE
encoding = utf-8
gcwarning = false
Bash
A commit in Git Bash works fine.
A commit in Git Gui changes the author name from '...ß...' to '...Ã??...'. The commit date is also wrong.
Correct author name and time stamp in the repository.
A commit with Git Gui should have the same result as the command 'git commit' in Git Bash.
Git Gui destroys the author name and date.
The problem occurs in all repositories.
I have been using some of these repositories for ten years.
I suspect this is due to (what appears to be) a default encoding setting in Git GUI.
Go to Edit | Options.... The first column applies to the current repository (e.g. git config --local ...). The second column applies globally (e.g. git config --global ...).
If the default file contents encoding is ISO-8859-1, that would explain why characters like ß show up as Ã??: the ß is encoded as UTF-8, but ISO-8859-1 is a different byte encoding. Change the default file contents encoding to a UTF-8 compatible encoding and you should be good to go.
NOTE: If you change this setting, it may not "fix" the author name of past commits. But future commits made by Git GUI should result in the author's name appearing correctly. It is possible to go back and fix the author name of those past commits if they don't appear correctly after changing this setting. It's a bit involved, depending on your level of knowledge about Git, but it can be done.
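To make the byte-level mechanism concrete, here is a minimal sketch of what happens when the UTF-8 bytes of "Groß" are misread as ISO-8859-1 and then written out as UTF-8 again. The function names are made up for this illustration, and it assumes `iconv` and `od` are available. It only shows the general mix-up, not necessarily what Git GUI does internally.

```shell
#!/bin/sh
# UTF-8 bytes of "Groß": 47 72 6f c3 9f (the ß is the two bytes c3 9f).
utf8_name() { printf 'Gro\303\237'; }

# Misread those bytes as ISO-8859-1 and re-encode the result as UTF-8.
misread_as_latin1() { utf8_name | iconv -f ISO-8859-1 -t UTF-8; }

# Print a byte stream as one continuous hex string.
to_hex() { od -An -tx1 | tr -d ' \n'; }

utf8_name | to_hex; echo          # 47726fc39f
misread_as_latin1 | to_hex; echo  # 47726fc383c29f
```

The byte c3 becomes "Ã" (c3 83) and the byte 9f becomes the control character U+009F (c2 9f), which many terminals render as "?".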
Thanks fourpastmidnight, but the default file encoding was already set to utf-8.
(see above, 'Relevant entries in ~/.config')
Full test case:
#!/bin/bash
#Full test case
# install current git version (2.8.3)
# use only default options
# open git bash
# show some properties
jo@eld MINGW64 /t/tmp
$ uname
MINGW64_NT-6.1
jo@eld MINGW64 /t/tmp
$ git --version
git version 2.8.3.windows.1
jo@eld MINGW64 /t/tmp
$ echo $LANG
de_DE.UTF-8
# remove global gitconfig
jo@eld MINGW64 /t/tmp
$ mv ~/.gitconfig ~/.gitconfig.org.$(date +'%Y%m%dT%H%M%S')
# create new repository
jo@eld MINGW64 /t/tmp
$ git init test
Initialized empty Git repository in T:/tmp/test/.git/
jo@eld MINGW64 /t/tmp
$ cd test
# set encoding to utf-8, users name and email
# and show some file properties and git config
jo@eld MINGW64 /t/tmp/test (master)
$ git config gui.encoding utf-8
jo@eld MINGW64 /t/tmp/test (master)
$ file .git/config
.git/config: ASCII text
jo@eld MINGW64 /t/tmp/test (master)
$ git config user.name Groß
jo@eld MINGW64 /t/tmp/test (master)
$ git config user.email [email protected]
jo@eld MINGW64 /t/tmp/test (master)
$ file .git/config
.git/config: UTF-8 Unicode text
jo@eld MINGW64 /t/tmp/test (master)
$ git config --list
core.symlinks=false
core.autocrlf=input
core.fscache=true
color.diff=auto
color.status=auto
color.branch=auto
color.interactive=true
help.format=html
http.sslcainfo=C:/Program Files/Git/mingw64/ssl/certs/ca-bundle.crt
sendemail.smtpserver=/bin/msmtp.exe
diff.astextplain.textconv=astextplain
rebase.autosquash=true
credential.helper=manager
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
core.hidedotfiles=dotGitOnly
gui.encoding=utf-8
user.name=Groß
[email protected]
# commit a change
jo@eld MINGW64 /t/tmp/test (master)
$ echo 'Test file' > t.txt
jo@eld MINGW64 /t/tmp/test (master)
$ git add t.txt
jo@eld MINGW64 /t/tmp/test (master)
$ git commit -m 'a test file' -m 'Groß'
[master (root-commit) 7cdf009] a test file
1 file changed, 1 insertion(+)
create mode 100644 t.txt
jo@eld MINGW64 /t/tmp/test (master)
$ git log
commit 7cdf00988023305d1701dcde5a8e27dfc91a49c3
Author: Groß <[email protected]>
Date: Sat May 21 15:43:09 2016 +0200
a test file
Groß
# amend the commit
jo@eld MINGW64 /t/tmp/test (master)
$ git commit --amend -m 'A test file' -m 'Groß'
[master 95d73cc] A test file
Date: Sat May 21 15:43:09 2016 +0200
1 file changed, 1 insertion(+)
create mode 100644 t.txt
jo@eld MINGW64 /t/tmp/test (master)
$ git log
commit 95d73cc05de08a54e4659e76ce929de802ec6424
Author: Groß <[email protected]>
Date: Sat May 21 15:43:09 2016 +0200
A test file
Groß
# open git gui
# and amend the commit
jo@eld MINGW64 /t/tmp/test (master)
$ git gui
# show commit
# author name is destroyed
jo@eld MINGW64 /t/tmp/test (master)
$ git log
commit ce95aa942ad9330baebfee2dea3889f14043b7de
Author: GroÃ? <[email protected]>
Date: Sat May 21 15:43:09 2016 +0200
A test file / Amend Last Commit in Git Gui
Groß
jo@eld MINGW64 /t/tmp/test (master)
$
@dscho
This is a regression, possibly caused by commit 30395c645adf21828bbccbdcd3c5c36c39f07050.
Tag v2.8.1.windows.1 works.
(But the author date is reset.)
Commit 30395c645adf21828bbccbdcd3c5c36c39f07050 should be reverted.
It causes behavior that differs from https://github.com/git/git.
If necessary, this change would have to be made in https://github.com/git/git (for all platforms).
@bitjo: Ah, good find. And I apologize, I missed your configuration entry for the Gui in your original post.
Commit 30395c6 should be reverted.
So you ask me to reintroduce a bug?
So, @bitjo, can you fix it and create a pull request? If there are tests for Git GUI, could you also write a test?
@dscho
Commit 30395c6 should be reverted.
So you ask me to reintroduce a bug?
Yes, please.
In my opinion, the commit introduced a bug:
A commit with the option 'Amend Last Commit' in Git Gui behaves like 'git commit --reset-author --amend' (tested on Linux with git version 2.7.3 and on Windows with git version 2.8.1).
Under Windows with git version 2.8.2 or 2.8.3, the commit acts like 'git commit --amend' (without '--reset-author').
The implementations in different environments (Linux, Windows, older versions) do not behave the same.
(One side effect of the change on Windows since git version 2.8.2 is that UTF-8 characters in the author's name are not handled correctly.)
Revert commit 30395c645adf21828bbccbdcd3c5c36c39f07050.
Then the implementations in different environments (Linux, Windows, older versions) behave the same.
The proposal of http://article.gmane.org/gmane.comp.version-control.git/243921 can be discussed.
But a user of Git Gui will generally use this function to correct the current commit. '--reset-author' is then the correct behavior.
For a 'git commit --amend' (without '--reset-author') in Git Gui, we should rather introduce a new option (like: 'Amend Last Commit and preserve committer information').
Changes must be made in the master repository of Git Gui.
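For readers who want to see the two command-line behaviors being compared, the following sketch runs both in a throwaway repository. The identities Alice and Bob, the `example.invalid` addresses, and the helper names are hypothetical. It demonstrates only the `git commit` command line, not what Git GUI does internally.

```shell
#!/bin/sh
# Sketch: plain --amend versus --amend --reset-author in a throwaway repository.
set -e
command -v git >/dev/null 2>&1 || exit 0   # skip quietly when git is not installed
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
cd "$(mktemp -d)"
git init -q .

# Hypothetical identities used only for this demonstration.
as_alice() { git -c user.name=Alice -c user.email=alice@example.invalid "$@"; }
as_bob()   { git -c user.name=Bob   -c user.email=bob@example.invalid   "$@"; }
author_of_head() { git log -1 --format=%an; }

echo 'Test file' > t.txt
git add t.txt
as_alice commit -q -m 'a test file'

# Bob amends Alice's commit: the original author is kept.
as_bob commit -q --amend -m 'A test file'
plain_amend_author=$(author_of_head)

# Bob amends again with --reset-author: the author becomes the amending user.
as_bob commit -q --amend --reset-author -m 'A test file'
reset_author=$(author_of_head)

echo "plain --amend:          $plain_amend_author"   # Alice
echo "--amend --reset-author: $reset_author"         # Bob
```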
@fourpastmidnight
So, @bitjo, can you fix it and create a pull request? If there are tests for Git GUI, could you also write a test?
I'm sorry, but I have no development environment under Windows.
Therefore, I do not want to submit even an extremely simple change (it's just a 'git revert').
I cannot really test the change.
I would also like to write a test case.
But I do not know how, because interactions in Git Gui are necessary.
How can I deal with this in a bash script?
But we should open a separate issue for this kind of discussion.
Or send me references on how to do this.
Hmm, no, I don't think reverting that commit is the answer. That commit clearly corrects incorrect behavior in Git GUI with respect to amending a commit. Having said that, it _appears_ (from a cursory glance) that the solution does not handle UTF-8 in the commit author fields appropriately, which would be causing the problem you're experiencing. The _real fix_ for the problem you're experiencing, then, is handling UTF-8 appropriately in Git commit author fields. NOTE: This is the first time I'm looking at the TCL language, so I'm not real familiar with how it deals with character encodings.
I see on lines 30 and 31 when they initially load the commit for amending, that they attempt to use UTF-8 as the default encoding for reading the commit blob. But when they go to actually set the various author fields starting on new line 122, there's no mention of character encoding. Again, I'm not real familiar with TCL and how it deals with character encodings.
@fourpastmidnight
Hmm, no, I don't think reverting that commit is the answer. That commit clearly corrects incorrect behavior in Git GUI with respect to amending a commit.
Why?
I have never seen a formal definition of Git Gui.
Why should this function be a 'git commit --amend' (without '--reset-author')?
For many years I have seen: Git Gui -> Commit Amend ...: this is 'git commit --reset-author --amend'.
(For people who are not Git professionals, this is plausible behavior. Git professionals will never use Git Gui.)
We cannot break with this experience in Git Gui for Windows only.
This change would have to be made in Git Gui for all environments.
The UTF-8 problem is secondary.
The implementations in different environments (Linux, Windows, older versions) do not behave the same.
The patch in question was submitted for inclusion upstream, and when it gets accepted, your desired revert would cause the rift to deepen.
One side effect of the change for Windows since git version 2.8.2 is that UTF-8 characters in the author's name are not handled correctly.
Is this not the problem you want to see fixed? I do not see how the revert is a solution to that end. Instead, it will only pile up technical debt.
@bitjo so, are you prepared to work on a real fix that does not regress Git GUI even further?
@dscho
The implementations in different environments (Linux, Windows, older versions) do not behave the same.
The patch in question was submitted for inclusion upstream, and when it gets accepted, your desired revert would cause the rift to deepen.
I did not know that the patch had been submitted for inclusion upstream.
I thought that cross-platform changes are made in the git repository and then merged into Git for Windows, and that only platform-dependent changes are made in Git for Windows.
Under these circumstances, discussing the patch in this issue is, of course, misplaced.
My above comments about the patch should be ignored.
This issue should then only deal with the UTF-8 problem.
@bitjo so, are you prepared to work on a real fix that does not regress Git GUI even further?
I'm not familiar with Tcl and the Git Gui implementation.
But I can help with testing.
I’ve experienced the same behavior of Git GUI as described by @bitjo. Moreover, Git GUI seems to keep cached commit information (date and wrongly encoded author) and apply it to every subsequent commit (not limited to amends) made in Git GUI until I relaunch it.
@Melebius do you have any experience with Tcl?
@dscho A very little one, only with maintaining Expect scripts. I don’t understand why @bitjo and I as the affected users are asked for a fix and not the original author.
@Melebius do you know who the original author is? It's not me.
In fact, the original author of Git GUI was swallowed by the black hole known as Google. It will come as no surprise to you that he will never fix this.
The thing is: this is Open Source. You get to use the software. Just like that. You can install it on as many machines as you want. You do not pay for it. You are never asked to support the lives of those who wrote the software. You will never contribute to their being able to pay their rent, as low as it might be. And you do not get to tell them what they should or should not do.
If you are interested in seeing this fixed, and if you are prepared to do a little something to that end yourself, let me know, then I will invest some time to point you to the correct code location.
@dscho I meant the author of the discussed commit 30395c6. It was not made a long time ago. However, I might be able to do some work on it if time permits. In the meantime, I would like to ask you to defer the incorporation of the faulty commit into upstream. You mentioned it may happen and I am not familiar with the process of submitting and reviewing patches in the Git project.
I think the problem is in passing the author name via set env(...) from Tcl.
Someone should test if changing this to something like this (line 395)
if {[catch {set cmt_id [eval "GIT_AUTHOR_NAME='$author_name'; " git $cmd]} err]} {
would fix this.
I think the problem is in passing the author name via set env(...) from Tcl.
Someone should test if changing this to something like this (line 395)
if {[catch {set cmt_id [eval "GIT_AUTHOR_NAME='$author_name'; " git $cmd]} err]} {
would fix this.
This doesn't work.
I wonder why nobody tagged me here, I just stumbled upon this bug report.
I'll try to fix it.
I wonder why nobody tagged me here
Probably because you have not associated your e-mail orgad.shaneh[.at.]audiocodes.com with your GitHub account 😉
hmm... I've been trying to investigate this for 2 days. It looks like an inherent bug in TCL.
I have a test script that demonstrates the problem:
#!/usr/bin/env tclsh
encoding system utf-8
# Groß
set a [encoding convertfrom utf-8 [binary decode hex 47726fc39f]]
set env(GIT_AUT) $a
puts [binary encode hex $env(GIT_AUT)]
puts $a
puts [encoding convertto utf-8 $a]
puts [exec env | grep GIT_AUT]
puts [exec env | grep GIT_AUT | hexdump -c]
The output of this is:
47726fdf # good
Groß # good
Groà # double conversion?
GIT_AUT=GroÖ³ # garbage
0000000 G I T _ A U T = G r o 326 263 302 237 \n # garbage
0000010
If I remove encoding system utf-8, then all I get is a single question mark (Gro?). If I add it then it looks like TCL sets the environment variable correctly, but over-translates it on exec...
It looks like there's no way to pass a utf-8 string in an environment variable.
Does anyone have a suggestion?
Hi Orgad,
Maybe https://www.tcl.tk/doc/howto/i18n.html which says:
The system encoding is the character encoding used by the operating system for items such as file names and environment variables. Text files used by text editors and other applications are usually encoded in the system encoding as well, unless the application that produced them explicitly saves them in another format (for example, if you use a Shift-JIS text editor on an ISO 8859-1 system).
Tcl automatically converts strings from UTF-8 format to the system encoding and vice versa whenever it communicates with the operating system. For example, Tcl automatically handles any encoding conversion needed if you execute commands
...
Aside: I just did the thing of googling for the final "string of frustration" which is often the best summary of the search question. I find that if I hit 'send' then when I read back the posted email I get a new impetus...
The quoted section _looks_ to explain how TCL massages the i18n strings.
Hope it helps
Philip
lmgtfy "tch pass a utf-8 string in an environment variable" ;-)
@PhilipOakley Thanks for your help. I already read this documentation, that's why I tried to use encoding system utf-8.
Still, this doesn't work as expected. I tried many permutations of encodings and conversions.
If you're able to find a way that works, share it please. The expected output for the last line is:
0000000 G I T _ A U T = G r o 303 237 \n
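For comparison, this is a small sketch of the expected pass-through in a plain POSIX shell, where the child process sees exactly the bytes the parent put into the environment. It reuses the variable name GIT_AUT from the test script above. The helper names are made up for this illustration. It does not reproduce the Tcl behavior; it only shows the target byte sequence.

```shell
#!/bin/sh
# "Groß" as UTF-8 bytes: 47 72 6f c3 9f
name=$(printf 'Gro\303\237')

# The child process prints its own view of GIT_AUT.
child_view() {
    GIT_AUT=$name sh -c 'printf "GIT_AUT=%s\n" "$GIT_AUT"'
}

# Print a byte stream as one continuous hex string.
to_hex() { od -An -tx1 | tr -d ' \n'; }

child_view | to_hex; echo   # 4749545f4155543d47726fc39f0a
```

The trailing bytes c3 9f 0a correspond to the `303 237 \n` in the expected hexdump line.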
@patthoyts any ideas?
Okay, after a couple experiments and a couple of web searches, I came to the conclusion that one should never set the system encoding. Period. It simply does not do what you expect it to. From https://www.tcl.tk/doc/howto/i18n.html:
The system encoding is the character encoding used by the operating system for items such as file names and environment variables.
If you change the system encoding in Tcl, you do not change the _system_ encoding, just Tcl's idea of it. Not a good idea.
It also appears that the results are different depending whether we run in Mintty or in a Win32 console...
My guess is that we should make sure that the environment variables are set using SetEnvironmentVariableW()...
@dscho, being related to Tcl/Tk development some time ago, I concur that you should never touch encoding system of a running interpreter.
Basically, the Tcl's idea of string processing is quite _modern:_
... UTF-8 but the interpreter is free to convert strings internally to UTF-16; this all happens transparently).
... UTF-8 or cp1250 etc) or decode it from that format--to that internal string representation.
... GetACP()) on Windows), and it can be changed to any other encoding or to binary which is "identity" encoding. CRLF translation can also be specified, and by default it's what's sensible for the underlying platform.
This is a problem in the handling of the Windows environment in Tcl.
The environment is read when the interpreter is initialized from the C runtime provided _environ array. This is unfortunately all using the narrow character encoding so these are encoded using the system encoding. Any unicode character that cannot be represented in the system encoding will be lost.
Tcl needs to be updated to use the unicode environment on Windows. A test using an extension to try this out (tclenv) shows this should work once a patch to tcl is ready. This will need a bit of a rework in this part of Tcl as it doesn't currently do any per-platform environment initialization.
C:\Code\tcl.git\win>set AA=мир
C:\Code\tcl.git\win>tclsh
% set env(AA)
???
% load tclenv.dll
% dict get [env::get] AA
мир
Or should we introduce an abstraction layer for setting environment variables in Git GUI and use tclenv.dll for Windows? That would fix the issue even with current (or older) Tcl/Tk version, yes?
This is just a test extension; it doesn't correctly maintain the 'env' variable linkage, so it could lead to confusion without extra work to keep things in sync. As your main concern is git-for-windows, we really want a patched or fixed version of wish in the git-for-windows distribution.
I'm rather surprised this hasn't been raised before since we went all unicode in 2000.
The changes in #917 will already help with this problem. It will work as long as the information we put into the environment can be represented in the system encoding. My system uses Windows-1252, and the example given by @bitjo, using Groß as user name, works with the proposed changes.
Git GUI seems to keep cached commit information (date and wrongly encoded author) and apply it to every subsequent commit (not limited to amends) made in Git GUI until I relaunch it.
@Melebius : I am experiencing this too, not just with Git GUI, but also when I do rebase from Git Bash (the mintty one), and possibly elsewhere as I have not yet tracked down the exact chain of events leading to the errors in my project database (it tends to be discovered some undefined time after the fact). And it is not only wrongly encoded author, but in fact sometimes wrong author(!)
This is however a critical flaw, as it means that the contents of the database cannot be trusted to show the true history. And as such it should be reported in a separate issue, as I think it will be overlooked in this one.
Do you know if this has been done already? I tried searching for it, but found nothing.
@superole : Are you sure you don't have GIT_AUTHOR_NAME environment variable set?
@orgads : yes, I am quite sure. I use the config settings user.name and user.email. There is no GIT_AUTHOR_NAME env var on my system currently, but I guess git could have set it at the time of the error and failed to clean up after. Although I fail to see any good reason why git would tamper with that variable.
But as I said this is a separate, and possibly unrelated bug, and should be reported as its own issue.
@orgads _Are you sure you don't have GIT_AUTHOR_NAME environment variable set?_
IMHO that’s what the commit 30395c6 does and what I was trying to point out. See new lines 384–386 of commit.tcl.
@superole Agree. However, I haven’t filed any separate issue on that, the best matching I found was this one. Do you launch Git GUI from Git Bash? I usually launch Git GUI directly using Explorer’s context menu and run Git Bash (or even Git in cmd.exe) separately, so the issue does not affect CLI operations in my case.
@Melebius : yes, I normally launch it as a background process from git bash ($ git gui&), and keep it open.
Aha, so Git GUI may have altered my bash environment, thereby messing up my CLI operations...
This seems like really bad behaviour to me.
@orgads : patthoyts' commit cfe616b amends your pull request to the git-gui project to fix this by ensuring the environment variables are reset and the author information is reset once the commit is completed.
Should this fix not also be included here?
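The effect of such a cleanup can be illustrated outside Git GUI with a minimal shell sketch. An override that a long-running process exports stays visible to every later child command until it is explicitly unset. The name 'Amended Author' and the helper `show_author` are made up for this illustration, which is not the actual git-gui code.

```shell
#!/bin/sh
# A child process reports the GIT_AUTHOR_NAME it inherits, if any.
show_author() { sh -c 'printf "%s\n" "${GIT_AUTHOR_NAME:-not set}"'; }

export GIT_AUTHOR_NAME='Amended Author'   # override set for the amend
during=$(show_author)                     # seen by the amend itself
after_without_reset=$(show_author)        # still seen by the next, unrelated commit

unset GIT_AUTHOR_NAME                     # the cleanup step after the commit
after_reset=$(show_author)

echo "during amend:     $during"               # Amended Author
echo "next, no reset:   $after_without_reset"  # Amended Author
echo "next, with reset: $after_reset"          # not set
```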
should this fix not also be included here?
The fix, as I understand it, should be in the next G4W (the aforementioned 'here'?) release.
The cascade from the git-gui project to the Git project to the G4W project does take a finite time ;-)
Philip
The cascade from the git-gui project to the Git project to the G4W project does take a finite time ;-)
Yes, and it is very possible to help. For example, by checking whether the latest prereleases at https://github.com/git-for-windows/git/releases/ have the fixes, and by preparing appropriate PRs if they do not.
FYI this issue reproduces for me on Linux. Git gui "amend" breaks Unicode characters in the Author field. git-gui version 0.20.0.44.gccc98 git version 2.11.0
@ilor
It is logical that the error is now platform-independent.
Since Git 2.11 the broken commit (see https://github.com/git-for-windows/git/issues/761#issuecomment-220792200) is contained in platform-independent repos.
My workaround under Windows (as root / administrator in Git-Bash):
curl https://raw.githubusercontent.com/git/git/v2.10.2/git-gui/lib/commit.tcl > /mingw64/share/git-gui/lib/commit.tcl
The path /mingw64/share/git-gui/lib/ is platform dependent.
The workaround provides the file git-gui/lib/commit.tcl from Git version 2.10.2.
With the workaround, you can use git-gui as before.
I see the same behavior as @ilor with git-gui version 0.21.GITGUI and git version 2.14.1 on Ubuntu Linux.
Relevant bits of my ~/.gitconfig:
[user]
name = Sybren A. Stüvel
email = [email protected]
[color]
ui = auto
[gui]
encoding = utf-8
As it has been determined that the issue is not Windows-specific, I'll close this ticket, suggesting to move the discussion to the Git mailing list (no HTML, not even alternate part, just plain text, otherwise the mail will be rejected).