git status returns different results when core.fscache enabled

Created on 1 Mar 2016  路  19Comments  路  Source: git-for-windows/git

  • [x] I was not able to find an open
    or closed issue
    matching what I'm seeing

    Setup

Win10, x64
git version 2.6.3.windows.1
Default install as far as I recall
Problem repros in all shells

Details

  • Clone https://github.com/latkin/filetest

    • foo:bar file has invalid Windows filename. It is copied as size-0 file foo.

    • Add foo to gitignore

  • With core.fscache false, working directory is shown as clean by git status
  • With core.fscache true, file foo:bar is flagged with deleted status.

Expected: Working directory status is reported the same with/without core.fscache, in all cases.

Motivation: I work on windows but most of my team is on OSX. They put files named like foo:bar in the repo. I can happily ignore them with a private gitignore when core.fscache is turned off, but they pollute my working directory when I try to use core.fscache true (which I'd like to do for perf reasons).

Most helpful comment

I am experiencing a similar problem when using sparse checkout. Is it the same or should I open a new issue?

core.sparseCheckout is set to true.
core.fscache can be either true or false
core.preloadindex can be either true or false
git version shows git version 2.9.0.windows.1
A sparse-checkout file is used to exclude a folder with many small files in it.

I checkout a branch. git status shows no changes. I cherry-pick a commit from a different branch. The cherry-pick succeeds without needing to correct any conflicts. Now running git status shows a deleted file for every single file which was excluded by the sparse-checkout file. None of these files were changed by the cherry-picked commit.

Running git reset --hard HEAD at this point (since there are no actual changes and it's fine to reset) works as a workaround to fix the broken git status results.

Does this sound like the same problem?

edit: Sorry, after reading through the comments it seems obvious that this doesn't have the same root cause. Not sure why I posted this here. Opened a new issue #811

All 19 comments

@latkin gitignore is only for untracked files. Use git update-index --assume-unchanged foo:bar instead, see git-update-index for more information.

The behavior with core.fscache on seems to be correct, there is no file foo:bar, so it is reported as deleted. That being said, having git deal with paths in your tree, that it can't represent in the working copy seems futile.

git update-index --assume-unchanged solves my problem perfectly, thank you for the pointer.

I will be on my merry way, but it does seem like something to investigate that fscache true/false changes results here. FWIW I agree the fscache=true result seems more accurate in this case.

Poked around a bit and see where the root cause is (but not how to fix it).

For whatever reason, _wfopen creates files on Windows even if there are bogus chars in the path (at least :, haven't tested others), and you can write content into such file.

If you look up such a file by full path with GetFileAttributesEx (which is what mingw_lstat does), Windows tells you the file is there, has nonzero length, etc. Thus in fscache = false case, Windows says the file is there so git does not report it as deleted.

If you instead enumerate files in a directory via FindFirstFile/FindNextFile (which is what fscache_lstsat ultimately does), the file is _not reported_. Thus with fscache = true, the file is reported as deleted.

There is also a non-hidden file that's created, matching the filename before the : char. e.g. create file foo:bar and a standard file foo is created with 0 content, and also a phantom file foo:bar that's invisible to FindNextFile

Alright, finally have real clarity. Git for windows doesn't currently handle Alternate File Streams correctly, this issue is simply a compilcation due to that.

File paths in Windows can't truly contain a : character, but : can be used in a path-like way when calling various system APIs, in order to access alternate streams. e.g. fopen("C:\\src\\foo:bar") opens the bar stream of the file C:\src\foo. The file c:\src\foo will be created if it doesn't already exist.

As mentioned above, if you call APIs like fopen, GetFileAttributesEx, etc using such a stream-qualified path, they will not report an invalid file name, they will simply start working with the custom stream. An API like FindFirstFile will only return proper file names, i.e. stream-unqualified. Thus there is inconsistency between behavior of fscache true/false.

Without being too expert on mingw or git internals, it seems to me like it would make sense for the various mingw_XYZ and fscache_XYZ functions to check for : in filenames that could be misinterpreted as a stream qualifier, and report the same error as if it was an invalid file path. I don't see any scenario where direct access to alternate file streams is the right behavior in git.

I'm happy to provide such a patch if this sounds right.

We already disallow tracking such illegal file names in Git for Windows, so I do not really see why we need to care all that much: you are talking about untracked files which are completely uninteresting to Git until the time when they are staged.

Please read this as my vote to close this ticket as WONTFIX.

I think the original problem is that the 'file with colon' is already in the repo he is trying to use - see the original

Motivation:I work on windows but most of my team is on OSX. They put files named likefoo:barin the repo. I can happily ignore them with a privategitignorewhencore.fscacheis turned off, but they pollute my working directory when I try to usecore.fscache` true (which I'd like to do for perf reasons).'

However, he has an existing workaround, which isn't great but works, so it may be useful if it's possible to give @latkin a pointer as to if his line of reasoning as to a suggested 'fix' (to the particular problem of ignoring those colon-files under core.fscache true) is in the right direction (it's his itch to scratch after that ..)

@latkin Seeing as you have a possible solution, a draft pull request may be a suitable method of exploring the issue (of ignoring colon-files when core.fscache is true). At least then @dscho can see if it is a small tweak, or a major code churn, and how far it reaches.

@PhilipOakley yes, that's correct - the issue is mysterious/confusing behavior for unsuspecting Windows users when working with repos of unix origin. I'm not concerned with malicious/determined Windows users who are deliberately trying to do bogus stuff.

Consider this toy repo: https://github.com/latkin/filetest It contains 3 files: foo:bar1, foo:bar2, and foo:bar3.

If you clone that repo today on Windows, the clone succeeds and no errors are reported. However there are problems:

  • A mysterious untracked, size-0 file foo has been created
  • Although none of the actual files has been created, for some reason nothing is reported as deleted

    • Until you change to core.fscache = true and now suddenly all 3 files are reported as deleted

This is very confusing and unintuitive. Expected experience is to simply report all 3 files as having invalid paths.

I've sent a proposal PR which is very tiny and fixes this for the 99% case - it simply blocks checkout of files with : in the name. Malicious windows users can still screw with things by deliberately creating alternate streams, but that's probably not worth worrying about.

----- Original Message -----

@PhilipOakley yes, that's correct - the issue is mysterious/confusing behavior for unsuspecting Windows users when working with repos of unix origin. I'm not concerned with malicious/determined Windows users who are deliberately trying to do bogus stuff.

Consider this toy repo: https://github.com/latkin/filetest It contains 3 files: foo:bar1, foo:bar2, and foo:bar3.

If you clone that repo today on Windows, the clone succeeds and no errors are reported. However there are problems:

a.. A mysterious untracked, size-0 file foo has been created 
b.. Although none of the actual files has been created, for some reason nothing is reported as deleted 
  a.. Until you change to core.fscache = true and now suddenly all 3 files are reported as deleted 

This is very confusing and unintuitive. Expected experience is to simply report all 3 files as having invalid paths.

I've sent a proposal PR which is very tiny and fixes this for the 99% case - it simply blocks checkout of files with : in the name. Malicious windows users can still screw with things by deliberately creating alternate streams, but that's probably not worth worrying about.

I think the commit message is missing the original "why" explanation that in the wild, in mixed communities, G4W should be able to cope with repositories where files-with-colons are already present.

The commit message should also cover the "how to use it" after the fix - does it still require a local gitignore to be set?, and confirm the patch fixes the core.fscache=true issue (and it still works when false).

Finally, as an aside, have you considered the sparse checkout options to mark those files as --assume-unchanged, so they can be in the index, but not in the file system (most descriptions on the web get that option wrong - it's actually about personal FS 'memory' management).

If you amend the commit message you can force push the commit to github and re-use the existing PR - this keeps the thread together.

If some of the confirmatory points don't need to be in the main part of the commit, then stick them after a three dashes line.

Philip

Thanks @PhilipOakley, I've amended the commit message and PR comment.

The change doesn't fix the underlying inconsistency between standard lstat and fscache lstat, it simply blocks problematic files from being checked out in the first place. Not perfect, but it makes the fix considerably less invasive, while still covering the primary scenario.

A full fix would require changes to a bunch of mingw APIs, which seemed too much for an admittedly minor bug.

I am experiencing a similar problem when using sparse checkout. Is it the same or should I open a new issue?

core.sparseCheckout is set to true.
core.fscache can be either true or false
core.preloadindex can be either true or false
git version shows git version 2.9.0.windows.1
A sparse-checkout file is used to exclude a folder with many small files in it.

I checkout a branch. git status shows no changes. I cherry-pick a commit from a different branch. The cherry-pick succeeds without needing to correct any conflicts. Now running git status shows a deleted file for every single file which was excluded by the sparse-checkout file. None of these files were changed by the cherry-picked commit.

Running git reset --hard HEAD at this point (since there are no actual changes and it's fine to reset) works as a workaround to fix the broken git status results.

Does this sound like the same problem?

edit: Sorry, after reading through the comments it seems obvious that this doesn't have the same root cause. Not sure why I posted this here. Opened a new issue #811

Same here when checking out cBioPortal on Windows.
--> https://github.com/cBioPortal

So in case anybody wondered why I did not do anything about this ticket yet: you can thank the people who gave me the thumbs down on a wontfix, without spending any time nor effort to work toward a real fix. That was a real demotivator.

The real fix may actually be a fallout of something I try to work on, but do not get the time: it would appear that accessing paths via the \\?\ name space can actually be faster than via plain files, as the \\?\ name space sidesteps the Win32 file name checks (including the test for a colon, IIUC). This would allow Git for Windows to write those files as a side effect of a performance enhancement.

The major holdup in that endeavor is that the current core.longpaths feature (which does use the \\?\ name space to create files with an overly long path), which would almost provide everything needed for that performance improvement, uses a slow Win32 API to normalize the file path first, which goes against the goal of speeding everything up.

Something to keep in mind though is that even if git does successfully create "foo:bar" (using "\?\" or whatever), you're still likely to have problems if the other tools in your tool chain don't support them too. I'm thinking here about your compiler, editor, make/build scripts, and etc.

It may be better to spend the time educating the Mac folks.

Something to keep in mind though is that even if git does successfully create "foo:bar" (using "\?" or whatever), you're still likely to have problems if the other tools in your tool chain don't support them too. I'm thinking here about your compiler, editor, make/build scripts, and etc.

Right. I still want to support core.longpaths = false (maybe with a synonym that explains better what this does), but I would like to change the default to true. Granted, many tools do not handle those files gracefully. But that's not Git for Windows' problem, and hence it does not add to my maintenance burden anymore :grin:

It may be better to spend the time educating the Mac folks.

I do not believe that will work.

Just wanted to say this appears to still be an issue. I was going through my usual "I know how to solve git problems" troubleshooting (ahem rm -rf and reclone) but it persisted. I could not reproduce it on my linux vm doing the same "troubleshooting", just to make sure I wasn't crazy. Went to update git-for-windows and found a new build cut just 3 hours ago and am still encountering this.

It would be nice if there is not a long term fix to add something which detects and warns users of this behavior. I realize it's not strictly gits "job" to warn users like this, but it's certainly unexpected behavior for most users.

Thanks! I do really like the availability of git-for-windows, it certainly helps my workflow out.

$ git --version
git version 2.19.1.windows.1

@h110hawk Would you have any spare time to help with curating this issue? Even if it is just nailing down a specific aspect of the issue to allow the multiple views to be cross-compared.

A 'good' solution is better than one that only resolves the issue(s) for half the users. The Windows compatibility (relative to Linux expectations) is part of the problem.

I still want to support core.longpaths = false (maybe with a synonym that explains better what this does), but I would like to change the default to true.

In the meantime, I performed some testing and it seems that the performance benefits I have come to expect did not materialize. Worse, since Git for Windows typically works on relative paths, switching to longpaths=true would impact performance negatively.

Coming back to the reported problem, I would tend to mark this as a "Known Issue". And the underlying issue is not that git status returns different results when core.fscache is enabled vs when it is disabled, but the underlying issue is that Git thinks it just wrote a file called with:colons when it wrote a stream called colons for a file called with.

  • With core.fscache false, working directory is shown as clean by git status

  • With core.fscache true, file foo:bar is flagged with deleted status.

In the next version, foo:bar will trigger an error: the colon is not a legal character in a file name on Windows.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

daxelrod picture daxelrod  路  4Comments

tldzyx picture tldzyx  路  3Comments

jesterKing picture jesterKing  路  3Comments

educhana picture educhana  路  5Comments

yegorich picture yegorich  路  3Comments