We're looking to switch from SVN to Git but are experiencing poor LFS performance on large projects.
The repo is stored on an SSD on a local GitLab server (latest) with an E5-2623v4 and 64GB of RAM. top and other monitoring commands show the hardware isn't being maxed out, so I think it can be ruled out as the issue.
Reading through other issues, it seems most people's slow performance was fixed by Lars's code, which my limited knowledge understands to be a git-lfs-smudge that runs only once, as opposed to once for every file. Is this the default behavior of the git lfs clone command, or does it need to be specified in the attributes file?
Is there any other information/testing I can provide to help narrow down this issue?
unity-macminiv7-02:PullTestMac UnityDev$ git lfs clone http://Username:Password@URL/Project ProjectFolder
Cloning into 'ProjectFolder'...
remote: Counting objects: 21075, done.
remote: Compressing objects: 100% (21038/21038), done.
remote: Total 21075 (delta 33), reused 21072 (delta 33)
Receiving objects: 100% (21075/21075), 2.94 MiB | 0 bytes/s, done.
Resolving deltas: 100% (33/33), done.
Checking out files: 100% (21220/21220), done.
Git LFS: (20726 of 20726 files) 18.70 GB / 18.70 GB
Hi @CopyRunStart, thanks for opening this issue and sorry that you're having trouble. I'm not immediately sure what's going on and causing the later transfers to take a long time, but let's dive in and see what we can find.
The first 8 GB and 1,000 files go very fast, especially over 10Gb Ethernet, operating at _near wire speed_.
The last 19,000 files on the other hand take over an hour and transfer at less than 1MB/s with small spikes to 25MB/s (to be clear, Bytes not Bits).
This sounds like it could be a memory (?) leak. If the transfer throughput consistently and monotonically decreases over time, that points to some sort of generalized resource leak. I'm not _sure_ it's a memory leak, since I don't think that less available memory alone would cause that significant a slowdown in the transfers, but some other resource could be leaking over time.
I'd be curious about a few things:
1) memory usage: ps aux -o rss | grep lfs | grep -v grep | awk '{ print $4 }'
2) CPU usage: ps aux -o rss | grep lfs | grep -v grep | awk '{ print $3 }'
3) the number of open file descriptors: lsof -p "$(ps aux | grep lfs | grep -v grep | awk '{ print $2 }')" | wc -l
You can run these periodically with the following script:
#!/usr/bin/env bash
# Grab the PID of the running git-lfs process.
LFS="$(ps aux | grep git-lfs | grep -v grep | awk '{ print $2 }')";
SLEEP_INTERVAL="10";
# Log the process's CPU%, memory%, and open file descriptor count
# every $SLEEP_INTERVAL seconds.
while true; do
  PCPU="$(ps -p "$LFS" -o pcpu | tail -n 1 | awk '{ print $1 }')";
  PMEM="$(ps -p "$LFS" -o pmem | tail -n 1 | awk '{ print $1 }')";
  LSOF="$(lsof -p "$LFS" | wc -l | awk '{ print $1 }')";
  echo "$(date) - $PCPU\t$PMEM\t$LSOF";
  sleep "$SLEEP_INTERVAL";
done
Reading through other issues, it seems most people's slow performance was fixed by Lars's code, which my limited knowledge understands to be a git-lfs-smudge that runs only once, as opposed to once for every file. Is this the default behavior of the git lfs clone command, or does it need to be specified in the attributes file?
I don't think this is applicable to your situation of running git-lfs-clone(1) directly. The process filter that Lars contributed primarily speeds up checkout operations where the files are already cached locally (i.e., switching from one branch to another), by cutting per-file process overhead. The git-lfs-clone(1) command disables the smudge (and process) filter entirely and performs a parallelized checkout itself, so I don't think the process filter comes into play here.
Is there any other information/testing I can provide to help narrow down this issue?
I'd be curious if you could run the script above against a running clone operation and post some of the output over time.
Thanks for getting back to me so quickly @ttaylorr.
First I want to say I was incorrect about it taking _over an hour_ for the last 19,000 files. That seems to have only happened once. The rest of my post still seems to be true. The first 8GB and 1,000 files copy at wire-speed and the rest copies in the high KB/s and low MB/s range with occasional spikes to 25MB/s. (This is observed/measured in Activity Monitor on OSX and Task Manager Performance in Windows).
Does LFS process/download large files first? I'm wondering if this could be an issue of the large sequential transfers going first, and then it slowing down with the thousands of smaller files.
I'm assuming this script is to be run on the client side, because ps aux | grep git-lfs on the server didn't find any PIDs. I don't have access to a 10GbE Mac today, and obviously that script can't be run on Windows, but I think the results on 1GbE are still relevant. The same behavior occurs: wire speed for the first ~8GB, etc.
I had to edit the script from
LFS="$(ps aux | grep git-lfs | grep -v grep | awk '{ print $2 }')";
to
LFS="$(ps aux | grep 'git-lfs clone' | grep -v grep | awk '{ print $2 }')";
as it was confused by two PIDs (git-lfs clone and git-lfs filter-process). If you need the results of git-filter-process as well, I can run two scripts.
unity-macminiv7-02:PullTestMac UnityDev$ time git lfs clone http://Username:Password@URL/Project PullTest7
Cloning into 'PullTest7'...
remote: Counting objects: 21075, done.
remote: Compressing objects: 100% (21038/21038), done.
remote: Total 21075 (delta 33), reused 21072 (delta 33)
Receiving objects: 100% (21075/21075), 2.94 MiB | 0 bytes/s, done.
Resolving deltas: 100% (33/33), done.
Checking out files: 100% (21220/21220), done.
Git LFS: (20726 of 20726 files) 18.70 GB / 18.70 GB
real 14m27.127s
user 4m32.126s
sys 4m35.612s
Thu Jun 15 16:02:17 EDT 2017 - 86.7\t0.1\t22
Thu Jun 15 16:02:27 EDT 2017 - 89.8\t0.1\t21
Thu Jun 15 16:02:38 EDT 2017 - 108.7\t0.1\t26
Thu Jun 15 16:02:48 EDT 2017 - 85.8\t0.1\t24
Thu Jun 15 16:02:58 EDT 2017 - 91.7\t0.1\t24
Thu Jun 15 16:03:08 EDT 2017 - 128.3\t0.1\t26
Thu Jun 15 16:03:18 EDT 2017 - 103.3\t0.1\t26
Thu Jun 15 16:03:28 EDT 2017 - 90.1\t0.1\t24
Thu Jun 15 16:03:38 EDT 2017 - 14.0\t0.1\t24
Thu Jun 15 16:03:48 EDT 2017 - 22.6\t0.1\t24
Thu Jun 15 16:03:58 EDT 2017 - 10.8\t0.1\t24
Thu Jun 15 16:04:08 EDT 2017 - 9.5\t0.1\t24
Thu Jun 15 16:04:18 EDT 2017 - 13.8\t0.1\t22
Thu Jun 15 16:04:28 EDT 2017 - 10.7\t0.1\t21
Thu Jun 15 16:04:38 EDT 2017 - 110.9\t0.1\t28
Thu Jun 15 16:04:49 EDT 2017 - 11.4\t0.1\t24
Thu Jun 15 16:04:59 EDT 2017 - 11.6\t0.1\t24
Thu Jun 15 16:05:09 EDT 2017 - 11.1\t0.1\t24
Thu Jun 15 16:05:19 EDT 2017 - 9.9\t0.1\t24
Thu Jun 15 16:05:29 EDT 2017 - 14.5\t0.1\t24
Thu Jun 15 16:05:39 EDT 2017 - 10.4\t0.1\t21
Thu Jun 15 16:05:49 EDT 2017 - 12.7\t0.1\t24
Thu Jun 15 16:05:59 EDT 2017 - 104.4\t0.1\t24
Thu Jun 15 16:06:09 EDT 2017 - 15.8\t0.1\t24
Thu Jun 15 16:06:19 EDT 2017 - 55.7\t0.1\t26
Thu Jun 15 16:06:29 EDT 2017 - 12.5\t0.1\t24
Thu Jun 15 16:06:39 EDT 2017 - 24.9\t0.1\t23
Thu Jun 15 16:06:49 EDT 2017 - 27.5\t0.1\t24
Thu Jun 15 16:06:59 EDT 2017 - 19.8\t0.1\t24
Thu Jun 15 16:07:09 EDT 2017 - 18.2\t0.1\t24
Thu Jun 15 16:07:19 EDT 2017 - 25.1\t0.1\t24
Thu Jun 15 16:07:29 EDT 2017 - 19.1\t0.1\t24
Thu Jun 15 16:07:39 EDT 2017 - 18.7\t0.1\t24
Thu Jun 15 16:07:50 EDT 2017 - 39.7\t0.1\t24
Thu Jun 15 16:08:00 EDT 2017 - 98.1\t0.1\t24
Thu Jun 15 16:08:10 EDT 2017 - 108.3\t0.1\t24
Thu Jun 15 16:08:20 EDT 2017 - 87.8\t0.1\t22
Thu Jun 15 16:08:30 EDT 2017 - 11.4\t0.1\t24
Thu Jun 15 16:08:40 EDT 2017 - 19.7\t0.1\t21
Thu Jun 15 16:08:50 EDT 2017 - 12.3\t0.1\t24
Thu Jun 15 16:09:00 EDT 2017 - 11.1\t0.1\t24
Thu Jun 15 16:09:10 EDT 2017 - 10.5\t0.1\t24
Thu Jun 15 16:09:20 EDT 2017 - 12.0\t0.1\t24
Thu Jun 15 16:09:30 EDT 2017 - 13.9\t0.1\t26
Thu Jun 15 16:09:40 EDT 2017 - 10.7\t0.1\t24
Thu Jun 15 16:09:50 EDT 2017 - 11.1\t0.1\t24
Thu Jun 15 16:10:00 EDT 2017 - 18.0\t0.1\t30
Thu Jun 15 16:10:10 EDT 2017 - 66.2\t0.1\t24
Thu Jun 15 16:10:20 EDT 2017 - 11.6\t0.1\t24
Thu Jun 15 16:10:31 EDT 2017 - 13.7\t0.1\t24
Thu Jun 15 16:10:41 EDT 2017 - 11.2\t0.1\t24
Thu Jun 15 16:10:51 EDT 2017 - 14.2\t0.1\t22
Thu Jun 15 16:11:01 EDT 2017 - 12.2\t0.1\t24
Thu Jun 15 16:11:11 EDT 2017 - 11.8\t0.1\t25
Thu Jun 15 16:11:21 EDT 2017 - 11.0\t0.1\t24
Thu Jun 15 16:11:31 EDT 2017 - 10.7\t0.1\t24
Thu Jun 15 16:11:41 EDT 2017 - 11.2\t0.1\t24
Thu Jun 15 16:11:51 EDT 2017 - 11.5\t0.2\t24
Thu Jun 15 16:12:01 EDT 2017 - 11.6\t0.2\t24
Thu Jun 15 16:12:11 EDT 2017 - 13.2\t0.2\t24
Thu Jun 15 16:12:21 EDT 2017 - 10.5\t0.2\t24
Thu Jun 15 16:12:31 EDT 2017 - 10.8\t0.2\t24
Thu Jun 15 16:12:41 EDT 2017 - 98.4\t0.2\t24
Thu Jun 15 16:12:51 EDT 2017 - 89.7\t0.2\t24
Thu Jun 15 16:13:01 EDT 2017 - 12.0\t0.2\t24
Thu Jun 15 16:13:11 EDT 2017 - 20.5\t0.2\t24
Thu Jun 15 16:13:21 EDT 2017 - 14.7\t0.2\t24
Thu Jun 15 16:13:32 EDT 2017 - 15.0\t0.2\t24
Thu Jun 15 16:13:42 EDT 2017 - 12.3\t0.2\t24
Thu Jun 15 16:13:52 EDT 2017 - 66.4\t0.2\t24
Thu Jun 15 16:14:02 EDT 2017 - 12.2\t0.2\t24
Thu Jun 15 16:14:12 EDT 2017 - 18.5\t0.2\t24
Thu Jun 15 16:14:22 EDT 2017 - 16.9\t0.2\t24
Thu Jun 15 16:14:32 EDT 2017 - 28.4\t0.2\t23
Thu Jun 15 16:14:42 EDT 2017 - 13.9\t0.2\t24
Thu Jun 15 16:14:52 EDT 2017 - 12.8\t0.2\t24
Thu Jun 15 16:15:02 EDT 2017 - 11.4\t0.2\t24
Thu Jun 15 16:15:12 EDT 2017 - 10.9\t0.2\t24
Thu Jun 15 16:15:22 EDT 2017 - 12.1\t0.2\t24
Thu Jun 15 16:15:32 EDT 2017 - 13.1\t0.2\t24
Thu Jun 15 16:15:42 EDT 2017 - 66.7\t0.2\t24
Thu Jun 15 16:15:52 EDT 2017 - 98.9\t0.2\t26
Thu Jun 15 16:16:02 EDT 2017 - 28.0\t0.2\t23
Thu Jun 15 16:16:13 EDT 2017 - 25.2\t0.2\t24
Thu Jun 15 16:16:23 EDT 2017 - 20.1\t0.2\t24
Thu Jun 15 16:16:33 EDT 2017 - 21.9\t0.2\t19
One thing that might help narrow the scope is using git lfs fetch instead of git lfs clone. The clone command wraps a git clone. If you want to look at _just_ LFS download performance on a repository, you can do:
# in your repository working directory
$ rm -rf .git/lfs/objects
$ git lfs fetch
Example:
$ rm -rf .git/lfs/objects
$ git lfs fetch
Fetching master
Git LFS: (96 of 505 files) 39.62 MB / 1.17 GB
@technoweenie Thanks.
It seems like no change. I'm using the time command to time the results. During every test, the first 8.09GB downloads at wire speed. I'm extremely curious to find out what's different about that first 8.09GB that makes it faster.
unity-macminiv7-02:PullTest7 UnityDev$ rm -rf .git/lfs/objects
unity-macminiv7-02:PullTest7 UnityDev$ time git lfs fetch
Fetching master
Git LFS: (20726 of 20726 files) 18.70 GB / 18.70 GB
real 14m14.012s
user 1m54.300s
sys 2m45.042s
git -c lfs.concurrenttransfers=100 lfs clone http://Username:Password@URL/Project PullTest9
Nearly cut the time in half, which is great but it's still the same behavior of the first 8.09GB at wire-speed and the rest moving far slower. Average speed of the full clone is about 50MB/s. I feel we can still do better!
Additionally, this is not an LFS problem, but SourceTree doesn't seem to have an lfs.concurrenttransfers option, and getting my 3D artists to use the command line would be..... difficult.
@CopyRunStart AFAIK SourceTree respects the lfs.concurrenttransfers option. What is the distribution of your LFS file sizes? Are they all roughly the same size, or do you have many small files and a few big files?
Try to clone like this:
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs clone http://Username:Password@URL/Project
This should give you more detailed output that might hint at the problem. If you redact the output (remove at least the Authorization field and your password from the URL), then you could post it here, too.
@larsxschneider
Forgive my ignorance, but I only know how to set lfs.concurrenttransfers from the command line. If I wanted to use it in SourceTree, would I put it in .gitattributes?
Here is the breakdown of file sizes.
> 1k: 13221
> 2k: 2321
> 4k: 8230
> 8k: 141
> 16k: 126
> 32k: 197
> 64k: 158
> 128k: 123
> 256k: 147
> 512k: 108
> 1M: 85
> 2M: 90
> 4M: 107
> 8M: 139
> 16M: 95
> 32M: 80
> 64M: 19
> 128M: 4
> 256M: 8
> 512M: 8
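(For anyone who wants to reproduce a breakdown like this, something along the following lines should work. It's only a sketch: it assumes a git-lfs version that supports git lfs ls-files --name-only, tries both GNU and BSD stat, and the bucket boundaries are approximate.)
#!/usr/bin/env bash
# Rough sketch: bucket the working-tree copies of LFS-tracked files by size,
# in power-of-two bins similar to the breakdown above.
git lfs ls-files --name-only | while IFS= read -r f; do
  [ -f "$f" ] || continue
  stat -c %s "$f" 2>/dev/null || stat -f %z "$f"   # GNU stat, then BSD stat
done | awk '
  { bucket = 1024; while ($1 > bucket) bucket *= 2; count[bucket]++ }
  END { for (b in count) printf "<= %dk: %d\n", b / 1024, count[b] }
' | sort -n -k2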
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs clone returns "GIT_TRACE is not a recognized command" on both *nix and Windows. I tried git -c GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs clone to no avail. What am I doing wrong with the syntax?
Please try to run this in git-bash:
GIT_TRACE=1 GIT_CURL_VERBOSE=1 git lfs clone http://Username:Password@URL/Project
If you configure LFS in your global Git config then SourceTree should use it:
git config --global lfs.concurrenttransfers 10
You have more than 20k tiny files in LFS. That is not ideal. If you have time then you can watch this talk to learn why: https://www.youtube.com/watch?v=YQzNfb4IwEY
@larsxschneider
I watched that video and all your other videos last week; I'll go ahead and re-watch that one.
I have the output from that command but it is half a million lines long. Is there something in particular we are looking for?
@larsxschneider
Looking deeper into the files created by the project, Unity does not seem to be a perfect fit for LFS (or any other free VCS!).
Meta files can range in size anywhere from 8 bytes to 200 MB, so I cannot fix this issue by excluding .meta files from LFS. The same is true of .mat files. Assuming I was able to get all .meta files down to 16 MB or less, would Git handle them better than LFS would?
As for .mat files, what do you at Autodesk do with large projects that have a lot of .mat files produced by 3DS Max?
I've been putting some time into Git LFS performance for video game repos recently, as it is really a big problem from time to time.
@larsxschneider I just watched the talk you linked most recently - great talk. However, I had to facepalm at some of the 'workarounds' you offered to help solve the many-small-files performance issue. Forcing the user to name their small files specifically so they can (or can't) be tracked in LFS? That isn't a practical solution.
I haven't gone and performed any robust performance tests, but I am assuming that the root cause of such extremely slow performance around many small files is that each file requires its own individual connection to be brought up and then torn down, correct?
I am assuming that the root cause of such extremely slow performance around many small files is that each file requires its own individual connection to be brought up and then torn down, correct?
It depends. In the classic 'one smudge per checkout entry' environment, this is true: for each file in the checkout, LFS must spawn a new smudge process and bring up (and tear down) its own connection. This improves with the process filter, and again with the delay capability.
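(For context, this is roughly what git lfs install writes to your ~/.gitconfig, shown here only for illustration: the per-file path goes through filter.lfs.smudge/clean, one process per file, while the long-running path goes through filter.lfs.process.)
git config --global filter.lfs.clean   "git-lfs clean -- %f"
git config --global filter.lfs.smudge  "git-lfs smudge -- %f"
git config --global filter.lfs.process "git-lfs filter-process"
git config --global filter.lfs.required true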
So I had a read of all of your recently linked threads on this @ttaylorr - Looks like some great steps towards resolving these scalability issues. :) Could be I don't fully understand the proposals, but none seem to yet tackle the restriction around having to spin up a new TCP connection per file. Is this right?
none seem to yet tackle the restriction around having to spin up a new TCP connection per file.
Correct. How do you think we could solve this? Maybe using HTTP/2?
I think there are actually quite a few different solutions on offer, depending on where we want to move the bottleneck at scale. I was discussing this with someone who works at GitHub recently actually, and they brought up the idea of archiving/packing the data on the server before sending it down. However I would worry about that - One could be asking the server to pack tens of GB, as a common case. That could put considerable strain on server memory or temporary storage resources.
Personally I think the best solution is a (relatively) simple file stream - Let the server just push one file after another down a single datastream. Let the client be responsible for cutting it back up into the requisite files.
It has the benefit of working transparently with future improvements, such as file compression done on the client/server, concurrency, etc.
Sadly I don't know too much about the underlying LFS framework - Is the protocol sitting on top of HTTP?
Personally I think the best solution is a (relatively) simple file stream - Let the server just push one file after another down a single datastream. Let the client be responsible for cutting it back up into the requisite files.
Interesting idea. I think this is a solid start, though we may run into additional challenges if we start talking about things like multiplexing file downloads. Maybe multiplexing file downloads isn't a goal, but if it is, I think more discussion is warranted here.
Sadly I don't know too much about the underlying LFS framework - Is the protocol sitting on top of HTTP?
Yes.
Indeed a simple ordered file stream would be fine. If we wanted parallelism I think the better way to do it would be to run concurrent identical processes, each with their own set of unique files that the client/server negotiated to push up/down.
Does the LFS client currently know all of the files it needs to go get? Or is it still stuck behind Git slurping one file at a time?
Does the LFS client currently know all of the files it needs to go get? Or is it still stuck behind Git slurping one file at a time?
With the 'delay' capability (see: #2466) we can determine all of the files up front, or in a stream-like fashion.
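(For anyone following along, here's a rough sketch of the Batch API exchange the client uses to learn which objects it needs, per the public git-lfs API docs; the URL and oid below are placeholders.)
curl -s -X POST "https://example.com/Project.git/info/lfs/objects/batch" \
  -H "Accept: application/vnd.git-lfs+json" \
  -H "Content-Type: application/vnd.git-lfs+json" \
  -d '{ "operation": "download", "transfers": ["basic"], "objects": [ { "oid": "<sha256>", "size": 12345 } ] }'
# The response lists a download href per object; with the "basic" transfer
# adapter, each object is then fetched with its own HTTP request.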
Is this "process filter" implemented yet? I'm currently working on a project that uses git-lfs for a lot of files (tens of thousands) and after a rebase (and checkout of a branch with only a few differences), the git status
command is now so slow that I just break it off after 10+ minutes... I can see it spawning lots and lots of new processes (under windows) which doesn't look very efficient to me.
The git-lfs status
command did manage to complete (after 5 minutes) and did not seem to spawn any additional processes.
Is there anything that can be done? At this point I'm tempted to somehow disable it completely and fetch the resources in a different way. Using git 2.9.0.windows.1
and git-lfs git-lfs/2.2.1 (GitHub; windows amd64; go 1.8.3; git 621d1f82)
Absolutely! Please update to the latest Git for Windows version and the latest Git LFS version to get the speed up.
One word of warning: Unfortunately, Git 2.14.2+ was shipped with half the feature. That means you get the speed up but you don't get the progress output. When Git processes LFS files then it will appear hanging. This will be fixed with Git 2.15.0 which is scheduled to be released next week.
Thanks for the quick update, I will update to 2.15.0 as soon as it is available. EDIT: misread that, will update to 2.14.2+ right away to test this.
It is performing a lot better. The first git status was still slow but completed in just under 10 minutes (and I saw no processes being spawned):
real 9m27.017s
user 0m0.015s
sys 0m0.000s
The second git status took less than 3 seconds. Thanks a lot :)
It is still true, though, that git credential approve is executed in a new process for each transferred file (on checkout), correct? Note that it is not re-prompting for credentials--it is getting cached credentials, but it seems to still re-authorize for each file transfer, and after each successful authorization, it calls 'credential approve' to re-cache the approved credentials. On Windows, this overhead raises the bar for which types of files are worth tracking with LFS. For example, it was not worth using LFS to track DLLs in a particular repo of mine, because (1) the DLLs often compressed moderately well, and (2) the file sizes varied widely enough that transferring many smaller/medium DLLs via LFS took longer, in large part due to the credential approve calls.
Whoa, really? That should be investigated. Unfortunately, I am swamped with other stuff right now and I'm out on vacation for the following two weeks.
@m-akinc it might help the devs if you could create a new issue for this specific problem if there isn't one already. It should help track and resolve it.
Created https://github.com/git-lfs/git-lfs/issues/2690 to track that.
Hey @m-akinc: Would you be able to confirm that the fix in #2695 works? I uploaded some sample builds for windows, linux, and mac to that PR.
@technoweenie, I really appreciate you addressing it so quickly, so I'm sorry I haven't had a chance to try it yet. I should get to it soon.
@m-akinc no worries! I already merged the PR, but feedback is appreciated whenever you have the time :)
I ran into some perf issues today with the GitHub client on Windows. Adding git config --global lfs.concurrenttransfers 20 as suggested above resolved the issue. And yeah, it's rough that these game engines use the same filename for all game assets.
I'm running into an issue with slow git status as well. I have a machine learning codebase with the models stored as LFS objects. Git status is fast on my local machine, but I have the repo rsynced to a server, and there every git status takes about 60 seconds. I've seen something similar before, where the first git status after a big change takes some time, but that usually goes away after the first or second git status.
Since I'm using oh-my-zsh, git status runs at EVERY new CLI prompt, so this breaks my workflow.
It happens with both rsync -a and scp -r. What is interesting is that on my dev machine (where it is fast) the branch is clean, but on the server it shows "modified" on all the LFS-tracked files. What might be complicating matters is that the server is on UTC but my dev machine is on EST-US.
You're going to see this problem any time you copy data from one location to another, whether that's with Git LFS or not, because the stat information for Git includes the device and inode, which are almost certainly going to be different on a different system. When the stat information is dirty, Git will re-read all the files and filter them, which is expensive for large files.
If you're seeing them modified, then you should make sure that Git LFS is installed on the remote machine; otherwise, Git will treat them as regular files, which will show as modified. In addition, you may benefit from setting core.checkStat to minimal and core.trustctime to false so that the device and inode information and the ctime are not included. You should definitely prefer rsync -a over scp, since that will preserve modification times, which are stored in the index.
This problem isn't specific to Git LFS; it will happen with any situation where you use Git and a filtering mechanism and copy from one location to another. Git LFS doesn't manage the index: that's all Git.
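Concretely, on the machine that holds the copied repository, that would be something like the following (a sketch; the rsync paths are placeholders):
git config core.checkStat minimal   # compare only size and mtime, not device/inode etc.
git config core.trustctime false    # ignore ctime, which a copy won't preserve
# and prefer rsync -a, which preserves modification times, e.g.:
rsync -a /path/to/repo/ user@server:/path/to/repo/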
@bk2204 Thanks! I always wondered about that but my google-fu apparently lacked the right keywords to figure out how to resolve that.
Edit: Just setting core.checkStat to minimal on my remote solved the issue with the extraneous files being flagged as modified.
@clifton
And yeah, it's rough that these game engines use the same filename for all game assets.
Perhaps you can try using different directories with separate LFS configurations:
/store-mat-files-in-lfs/.gitattributes: *.mat filter=lfs diff=lfs merge=lfs -text
/store-mat-files-in-lfs/a.mat
/.gitattributes: *.mat -text
/a.mat
In this case only the files in store-mat-files-in-lfs will be pushed to LFS; the rest of the files are stored as usual.
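If you go this route, git check-attr is a quick way to confirm which rule wins for a given path (illustrative paths from the layout above):
git check-attr filter -- store-mat-files-in-lfs/a.mat a.mat
# store-mat-files-in-lfs/a.mat: filter: lfs
# a.mat: filter: unspecified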