When there is a new commit available to pull on a branch, git pull takes over a minute and uses a consistent ~20% CPU on all cores of my 4-core (8 virtual core) machine for the duration. The same behavior occurs for another user on a different computer.
a new commit
Are you sure it's only pulling ONE commit?
This repo has like 45687465873465 commits per second... so I wouldn't be surprised if it's just taking a while to parse all the deltas.
@EmpireJones If you do a keybase log send, and post the resulting ID here, I can take a look.
@dabura667 To clarify, I'm referring to a team repo. But yes, only one commit.
@strib be070d5ea27fde18b31af71c
Thanks!
Is this the fetch you did around 18:05 your local time? The logs are a bit truncated unfortunately, so I can't see how long it took. It seems to be quite a big repo though, from what I see in the log it's looking at a ton of objects.
Do you still have the terminal output for what git said while you were fetching? I'm interested to see how much data was transferred.
Also, if you're willing to share more logs, that might be useful. To do so, you can try:
tar -czf /keybase/private/strib@github,empirejones/client-12730.tgz ~/.cache/keybase/keybase.git*
Yeah, that should be the one.
Log sent.
Initializing Keybase... done.
Syncing with Keybase... done.
Counting and decrypting: 16 objects... done.
Reading and decrypting metadata: (100.00%) 16/16 objects... done.
Fetching and decrypting objects: (100.00%) 16/16 objects... done.
Thanks! I don't see anything obviously unusual going on, other than all the data being read. A few more questions:
The repo has ~1500 commits.
Ok thanks. I wonder if the repo just has too many pack files and that's slowing everything down. One thing you can try is forcing garbage collection on the repo. We do have a keybase git gc command that you can try, but I don't think we've turned on the ability to repack objects into a single pack file yet.
However, if you can be sure that no one will try to push to the repo for a few minutes, you can try running regular git gc. To do that, cd /keybase/team/<teamname>/.kbfs_git/<reponame>. Then ls objects/pack/*.pack to see if you have a lot of packfiles. If so, you can try git gc. But if someone pushes during that time, it might leave data in a weird state.
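For reference, here's a minimal sketch of what that repack does, using a throwaway local repo instead of the Keybase mount (in the real case you'd cd into the .kbfs_git/<reponame> path instead). Each commit-plus-incremental-repack leaves one more packfile behind; an explicit git gc consolidates them into one:

```shell
#!/bin/sh
# Demo: simulate a repo that has accumulated several packfiles,
# then let an explicit 'git gc' consolidate them into one.
set -e
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name demo

# Each commit + incremental repack leaves behind one more packfile.
for i in 1 2 3; do
  echo "$i" > "file$i"
  git add "file$i"
  git commit --quiet -m "commit $i"
  git repack --quiet -d
done

before=$(ls .git/objects/pack/*.pack | wc -l)
git gc --quiet
after=$(ls .git/objects/pack/*.pack | wc -l)
echo "packs before gc: $before"
echo "packs after gc:  $after"
```

Note that plain `git repack` only packs loose objects (hence the pile-up), while `git gc` implies a full `repack -a -d`, which is what collapses everything into a single pack.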
I wasn't able to run git gc inside the .kbfs_git/<reponame> folder. I get these errors:
First Run:
error: unable to create temporary file: Interrupted system call
fatal: unable to force loose object
error: failed to run repack
Additional runs:
Counting objects: 24304, done.
error: pack-objects died of signal 7
error: failed to run repack
The ./objects/pack/ folder currently has 79 files.
git count-objects -v reports this info:
warning: garbage found: ./objects/05/tmp_obj_7wObmh
warning: garbage found: ./objects/pack/tmp_pack_odRnG4
warning: garbage found: ./objects/pack/tmp_pack_8qMNAb
count: 14
size: 9
in-pack: 24360
packs: 38
size-pack: 100460
prune-packable: 14
garbage: 3
size-garbage: 3584
Running git gc in my local repo folder did not result in any performance improvements.
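As a side note on the "garbage found" warnings above: those tmp_* files are debris left behind when a repack is interrupted, and git count-objects -v counts them as garbage. A throwaway-repo sketch (the tmp filename is just illustrative) showing they can simply be deleted once no gc/repack is running:

```shell
#!/bin/sh
# Demo: leftover tmp_* files from an interrupted repack show up as
# "garbage" in 'git count-objects -v' and are safe to delete by hand.
set -e
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name demo
echo hi > f && git add f && git commit --quiet -m init

# Simulate the debris a failed 'git gc' leaves behind.
touch .git/objects/pack/tmp_pack_odRnG4

g_with=$(git count-objects -v 2>/dev/null | sed -n 's/^garbage: //p')

# Stale temp files can simply be removed.
rm .git/objects/pack/tmp_pack_*
g_clean=$(git count-objects -v 2>/dev/null | sed -n 's/^garbage: //p')

echo "garbage with tmp file: $g_with"
echo "garbage after cleanup: $g_clean"
```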
Ah sorry, try git gc --quiet; then you can probably avoid those system call errors.
If that doesn't work, I can try to get you a build with a working keybase-based gc next week.
Does that only need to be run from one machine? If so, it didn't seem to impact performance. It ran without errors though.
Thanks for continuing to assist with this!
I discovered a large file (~90mb) which was committed accidentally by a team member. I'm working on getting it removed from history. Just using a branch with the commit removed does not fix the performance issue (perhaps because the file is still in the pack files via the other branches).
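For anyone following along, a minimal sketch of removing such a file from history with the built-in git filter-branch (git-filter-repo or BFG are the usual recommendations, but they're separate installs). This uses a throwaway repo, with bigfile.bin standing in for the ~90mb file:

```shell
#!/bin/sh
# Demo: scrub an accidentally committed large file from all history
# using 'git filter-branch' with an index filter.
set -e
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"
git config user.email demo@example.com
git config user.name demo

echo "keep me" > notes.txt
git add notes.txt && git commit --quiet -m "normal commit"

dd if=/dev/zero of=bigfile.bin bs=1024 count=100 2>/dev/null
git add bigfile.bin && git commit --quiet -m "oops: large binary"

# Rewrite every commit, dropping the file from each tree.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --index-filter 'git rm --cached --ignore-unmatch bigfile.bin' HEAD

# History no longer references the file. (The old objects linger until
# the refs/original backup is dropped and a gc runs, which is one reason
# pack size may not shrink immediately.)
git ls-tree -r HEAD --name-only
```

The lingering refs/original backup plus the old packs is also consistent with the pack not shrinking until after a full gc, as seen above.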
Does that only need to be run from one machine? If so, it didn't seem to impact performance. It ran without errors though.
Yes, just from one machine. Oh well -- is there only one pack file now in that objects/pack directory?
Does this sound accurate -- that a large file would impact the checkout performance of future commits?
My gut instinct is that it wouldn't be related, especially not after the first time you fetched it.
Is this a repo you'd be willing to share with me (by copying it from your .kbfs_git into a shared folder with me, not by giving me access to the original) for debugging purposes? Totally reasonable if the answer is no, but that might speed things up a bit.
Yes, after doing a git gc --quiet, there is only one file. I was able to get that file down from ~102mb to ~15mb by removing remaining branches with the large file (locally and remotely), but that didn't seem to help at all, so it seems the large file was a red herring.
Sorry, I can't share the repo. Thanks again.
@strib Any update on that "build with a working keybase-based gc"?
Not yet, but we're supposed to be working on it this week. But it won't be any better than the regular git gc, so I'm not sure it will help in your case. It's hard for me to know what's going on without a way to reproduce the issue myself.
Maybe now that you only have one pack file, you can send logs again and I can take a fresh look?
@strib Good news: I was able to find an easily reproducible case of this on a relatively small repo with many commits.
Thanks so much for the repro! I've spent most of the day trying to figure this out, and I think I have a grasp of it now, though I'm not yet sure how to fix it. I'm going to open up an internal issue for it, but I'll summarize it here as well (though it's going to get a bit technical):
What I think we _want_ to happen in this case is one of two things:
I'll try to put some effort into the second option sometime in the next week or two.
Hey, any news on this thread? We are also experiencing the issue: when there is at least one commit to pull, git pull takes several minutes (~5) to complete. Whenever I do a push I see a message that says:
Tip: this repo could be sped up with some garbage collection. Run this command:
keybase git gc ?? --?? ??
But when I try it I get this:
This may take several minutes...
▶ ERROR Getting object d66c??????????????????8bfa failed: object not found
I have tried the above command a few times and I have gotten an error on a different hash each time.
There's been some discussion of it here, but it looks like it's going to take a lot of work so we haven't been able to prioritize it yet: https://github.com/src-d/go-git/issues/909. We're a small team that has a few other things going on right now.
Your "object not found" issue seems unrelated. Please open a new issue for it, and run keybase log send to include a log so we can debug it.
Ah ok, thanks for the reply. I thought it could somehow be related to the same issue. I just did another keybase git gc and it succeeded :)
@strib I see you're still pursuing this issue.
My pulls are always quick but fetching has become extremely slow for all machines. I've gc'd the KBFS repo, no change. If you think this issue is related then I'll just hold tight but otherwise I'm happy to send logs and create a new issue if necessary.
@soulofmischief probably related. I'll be working on a fix this week I hope...
The next Keybase release should have significant performance improvements for fetches/pulls on repos with large histories. In the helm example above, when pulling a single new commit, the time taken on my machine went from 3m48s to less than 5s. (It's available in the unstable Linux nightly build now, see https://keybase.io/docs/linux-user-guide#nightly-builds if you're interested.)
Thanks for everyone's patience here. I think our long national nightmare is almost over.