$ git --version --build-options
git version 2.11.1.windows.1
built from commit: 1c1842bcba45569a84112ec64f72b08eb2d57c68
sizeof-long: 4
machine: x86_64
$ cmd.exe /c ver
Microsoft Windows [Version 6.1.7601]
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt
Path Option: Cmd
SSH Option: OpenSSH
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
Enable Builtin Difftool: Enabled
no
CMD
git commit
del filename
git checkout .
I was managing my large files with the Git LFS extension. Some of them were more than 4 GB in size. After deleting one of those files from my working tree and doing a normal git checkout, I ended up with a somehow crippled file of only 46 MB.
For testing purposes I tried to commit a 4.3 GB file to my Git repository without the LFS extension.
After deleting that file from the working tree and checking out again, I expected the 4.3 GB file to
be present again. Instead I ended up with the same small file.
It seems the file was never committed correctly; the .git directory is only about 100 MB in size.
Reinstalling Git and changing machines did not change the issue.
Files smaller than 4 GB are not affected.
After that I searched the gitconfig for settings related to 64-bit. I found core.packedGitLimit, which should default to 8 GiB on 64-bit systems. When I set it to 8g manually, Git told me the value was out of range. Only after setting it to a value smaller than 4 GB could I use Git normally again.
Issue is not repository-specific
The size of the address space does not determine whether large files are supported.
32-bit processes can handle files larger than 4 GB perfectly well if the
developer decides to implement that :). You would not try to load a
whole file into memory anyway (that would not scale well), but rather operate on chunks.
So the question is: does the 32-bit Git for Windows support large files?
On Feb 15, 2017 10:45 AM, "J Wyman" notifications@github.com wrote:
From the data you provided:
$ git --version --build-options
git version 2.11.1.windows.1
built from commit: 1c1842bcba45569a84112ec64f72b08eb2d57c68
sizeof-long: 4
machine: x86_64
It appears that you are using a 32-bit version of Git. 4-byte longs can
address 4 GiB of memory, which is the most likely source of your problem.
Have you tried a 64-bit version of Git? If so, does your problem still
reproduce?
I used the 64-bit installer from git-for-windows... I thought I'd get a 64-bit Git with it:
see my output: machine: x86_64
That was my mistake, I misread the output of the git --version --build-options. I realized it nearly immediately (hence the deletion of the post). Apologies about that. 😔
@elmorisor @whoisj the red herring was sizeof-long: 4. It is perfectly legitimate for 64-bit compilers to define the long type as 32-bit, and that is the case for GCC on Windows (which Git for Windows uses to compile the source code).
The problem is the Git source code, which uses unsigned long in places where size_t would be correct. I think that is the issue here.
I have also hit this problem.
Originally I raised it as a BitBucket support issue, then Git-LFS (https://github.com/git-lfs/git-lfs/issues/2434).
size_t may also be platform-specific: __uint64 or long long?
size_t may also be platform-specific
Yes, it most certainly is. On 32-bit platforms, for example, you simply cannot map 4GB files into memory via mmap() (or Windows' equivalent). On those platforms, you still can read and write such large files, of course, using off_t as data type.
Would really appreciate a fix for this.
@polygonica I warmly welcome you to work on this. (While it may seem convenient to expect others, including myself, to fulfill your wishes, it rarely works.)
There is already code to stream large objects (so that they do not have to be mapped into memory), and it should be possible to at least fall back to that option in git add.
Just to confirm, this is only reproducible on 32 bit builds correct?
@dscho I can take a stab at it this weekend
Just to confirm, this is only reproducible on 32 bit builds correct?
@isometric I am not really sure, as I have not followed the recent developments on the size_t vs unsigned long issue closely. It used to be the case, and may still be the case, that Git internally handles memory buffers using unsigned long (which is 32-bit even in 64-bit Windows). If that has not changed, you will likely run into the same issues with 64-bit Git for Windows.
At first glance, you may want to run git grep EXPENSIVE in git.git's t/ subdirectory (I usually do that with the -O option to open the files in the pager, so that I can scroll back and forth for more context). Some of those EXPENSIVE tests work on large files. Other prerequisites related to large files are LONG_IS_64BIT and TAR_HUGE.
It is always worth a look to see whether Git's test suite has something related, because then it is relatively easy and quick to run a test to validate possible fixes (or prove that they don't fix the issue).
I also stumbled across the plug_bulk_checkin() function yesterday, which you may want to have a closer look at when you take that stab this weekend (for which I am very grateful!). I could imagine that it solves at least part of the problem reported in this ticket.
Had a number of power issues in the neighbourhood this weekend, so I didn't get a chance to take a look. I'll try to find some time next weekend.
BTW, I was able to hash-object/cat-file a 5 GB blob successfully with Git in Ubuntu on Windows. It turns out that hash-object produces the same (correct?) result on both Linux and Windows, but cat-file fails on Windows only. I'm using 64-bit for both versions of Git.
I used this repro instead:
git hash-object -w --no-filters M:\tmp\5gb.bin
ecc0720b2a71b74c0980dbdf31556097355883ef
git cat-file -p ecc0720b2a71b74c0980dbdf31556097355883ef > m:\tmp\ignore2.bin
error: bad object header
fatal: Not a valid object name ecc0720b2a71b74c0980dbdf31556097355883ef
It looks like this error boils down to unpack_object_header_buffer(), which uses unsigned long everywhere instead of size_t. On Windows this type is 32-bit, but on Linux it is 64-bit, hence the difference.
There are many places in Git that use unsigned long for a size.
It looks like this error boils down to unpack_object_header_buffer(), which uses unsigned long everywhere instead of size_t.
Right. And Git also uses unsigned long in other places where off_t would be appropriate. There is no excuse for that, really.
Can I do anything to help? Is this a bug in Git for Windows, or does it need to be fixed upstream? This bug prevents me not only from committing 4 GB files directly, but also from using LFS.
Can I do anything to help?
@ksulli help is always welcome. Please note that there have been a couple of patches flying about on the mailing list that try to address the unsigned long vs off_t/size_t issue (which is mostly at play here).
However, for the concrete purpose of resolving this issue, I think there is some sort of streaming mode available in the internal Git API. That would make it possible to, say, generate an object larger than 4 GB via git hash-object -w --stdin, and everything should work correctly. The trick will then be to activate that mode automatically when calling git add.
How's your C fu?
I see that there is already something called big_file_threshold in index_fd(): https://github.com/git-for-windows/git/blob/918fa5c06c9d7f5e8a2d980e4e2744c63b1d7cbd/sha1_file.c#L1900-L1922
So the trick would be to first test whether it works now, and if it does not, investigate in that code (possibly using a debugger and/or inserting debug statements) where things go south.
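For reference, the threshold mentioned above is exposed as the core.bigFileThreshold config setting (default 512 MiB); blobs above it take the streaming, undeltified path. Lowering it in a scratch repository is an easy way to exercise that code path with smaller test files (the path and value here are illustrative):

```shell
# In a throwaway repository, force anything over 100 MiB onto the
# streaming code path instead of the in-memory one.
git init -q /tmp/bigfile-demo && cd /tmp/bigfile-demo
git config core.bigFileThreshold 100m
git config core.bigFileThreshold
```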
If you need to debug this, that is really easy: install the Git for Windows SDK (it'll clone about half a gig worth of Git objects, though), then call sdk cd git, edit the Makefile therein and delete the -O2 from the CFLAGS, then run make -j15 install. After that, you should be able to test this using gdb.
Please let me know when/where you get stuck.
Thanks for the pointers, I'm a bit rusty with compiled languages in general but this issue really irks me so I'll do my best.
Thanks! As I said, any help is welcome. If you get stuck, just holler (and provide details ;-)).
I also have a problem on windows with files >4 GB via Git LFS.
From the Git LFS thread I learned that there is an unsolved problem in the Git engine on Windows which causes the big-file problem; see https://github.com/git-lfs/git-lfs/issues/2434
Can you please tell me when we can expect a bugfix for that in Git?
It really needs someone to help upstream Git with the migration to a streaming interface, if I understand dscho's well-informed comment above.
If you are able to help with coding that would be great. (many codez make all issues shallow ;-)
Note that the referenced git-lfs issue contains a workaround for using >4 GB files with git-lfs on Windows. It is just a slight change in workflow for those who can't wait or don't have the time to fix this directly.
The long vs size_t difference doesn't matter, because neither of them should be used for file sizes or offsets. Instead, off_t must be used. Indeed, off_t is used throughout the code for this purpose. If there is a case where long or size_t is incorrectly used for a file size or offset, it must be changed to off_t.
There is plenty of discussion on the upstream mailing list about the issue of the size of various types on different systems, and their incompatibilities. The archive https://public-inbox.org/git/?q= is probably the most useful one for searching.
Can you please tell me when we can expect a bugfix for that in Git?
I think it might be this mindset that turned the discussion in this ticket away from a useful course: if you want something, you gotta put some effort behind it, not just wait for others to miraculously fulfill your wishes without getting anything in return.
So I'll close this ticket, and let those who are putting in more effort than mere words (you know who you are) be active elsewhere (you know where), being grateful for it (you know I am).
I'm sorry some users either don't realize or don't appreciate that much of the work on git-for-windows is done by volunteers. I think we lose a lot by closing this issue though - it is still an issue, it contains information about the root cause, and it is linked to as the cause of a git-lfs bug (https://github.com/git-lfs/git-lfs/issues/2434). Would you consider reopening? Perhaps someone will pick it up someday (perhaps even me); while closing it may send a message, I also think it will create a good bit of confusion from folks watching the issue or dealing with it.
Extending on the previous comment, try
https://public-inbox.org/git/[email protected]/
The current code for detecting zlib decode length errors is full of poorly defined behaviour, because the up/down casting of the different variable types on different architectures produces different results (as opposed to undefined behaviour...).
I expect that some 'C language lawyer' action is needed to cast the zlib stream length to ptrdiff_t and then use that (pointer arithmetic) ubiquitously to get consistent results on all platforms.
I think the git-lfs link is a red herring because it fails to get to the bottom of the problem for systems where Windows can handle proper 64-bit addresses.
I think we lose a lot by closing this issue
I disagree. The valuable technical discussion with people following up with patches was not happening here. There are people putting their money where their mouth is, making sure that their wishes come true by putting some energy and effort behind it. Just not here. So: Let's just draw the curtain of charity over the rest of this ticket, and let it rest in peace.
I understand the frustration, but following that logic means all the real issues that aren't seeing active investigation and/or fixing should be closed. Is that the plan moving forward?
To someone who experiences the bug and ends up here via Google etc., there will be confusion. They'll think: "oh, this is a known issue, cool - wait, it's closed - why am I still encountering it?"
It would help users if something visible about the issue (perhaps title) could at least be updated to indicate that this issue is not fixed and users should not have any expectation that it ever will be.
@aggieNick02 are you really trying your best to bind our time here? Is that what you want? To keep talking, talking, talking, and not get anything done?
It would help users if something visible about the issue (perhaps title) could at least be updated to indicate that this issue is not fixed and users should not have any expectation that it ever will be.
I am totally not on board with this idea. Why? Because it implies that you are strictly a user, not responsible for anything, while others should do all that.
How about getting involved instead? How about you update this ticket with the progress? How about you pay attention to the discussion on the mailing list, summarizing where the progress is at?
That is easily something you can do. And something that takes away the burden from others. Rather than piling and piling even more responsibility on those few who take care of the issues you want to see resolved. Or better put: trying to pile, because really, it is not the responsibility of anyone to take care of your wishes, not if you do not give them money or time or anything in return.
So: while I see what you are saying about the confusion and about opaque progress, I have to point out that this is a community effort, and if you choose not to be part of that effort, you have no say in how it is run. If you choose to be part of the effort, your contributions will be appreciated. And even better: you can then have what you want, because you make it so.
Dear dscho
It appears to me that you are frustrated about this, and I can understand that. But open source does not only work by "if I want a bug fixed, I dive into whatever project and fix it myself". When experienced people (like you) maintain a project, others should not be put down for politely asking for a bugfix. We all have our own projects to maintain and put effort into for others, and should not be forced to do that work ourselves. But you can run this as you like, and that includes closing a ticket that is still not fixed.
Thx for your comments
Dear @JohnFrampton thank you for speaking up. However, your speaking up does not help getting the issue at hand resolved, does it? What can you do to help?
Well, I downloaded the code and will have a look, and will find out what I understand and how I can deal with it. I will give it a try. But currently I'm paid to work on something else, so ... let's see ...
I will report as soon as I have achieved anything.
Well, I downloaded the code and will have a look, and will find out what I understand and how I can deal with it.
That's good. Now let's also get you into the conversation with the people who are already working on this: please head over to https://github.com/gitgitgadget/git/pull/115.
Hi, just for others to know: this issue still occurs in Git for Windows 2.29.2. The linked gitgitgadget#115 is also closed but, as far as I can tell, not "complete" - it links to this, which is still open but has been quiet for over a year.
Thanks for the update @srothery . If you want/need to work with larger files on windows, it is possible, but involves workarounds. There is a bit of discussion at https://github.com/git-lfs/git-lfs/issues/2434, with the workaround explained in a post there by @technoweenie.
It isn't perfect, but it is workable. We run a self-hosted git-lfs server with >4GB files both committed from and pulled to windows machines.
Thanks @aggieNick02. @technoweenie's fix was to do with the smudge filter; should I also disable the clean filter? If I do both of those, does that mean that for the whole repo no LFS files will go into my working folder, only into .git/lfs/objects? I was looking to see if I could disable smudge/clean just for my files that are >4 GB, but can't spot examples or hints that this is possible.
So it's been a little while since I've configured this, but here's what I remember/have settled on: with the smudge filter skipped, a freshly checked-out LFS file needs a git checkout of the file followed by a git-lfs pull to fix this. If you forget to do this, you'll be reminded, as git status will notice your local corrupt file is different from what it should be. .gitconfig has the following lfs section (filter-process has to be skipped too):
[filter "lfs"]
	smudge = git-lfs smudge --skip -- %f
	process = git-lfs filter-process --skip
	required = true
	clean = git-lfs clean -- %f