Client: git pull takes over a minute and 20% of CPU

Created on 7 Jul 2018 · 25 comments · Source: keybase/client

When there is a new commit available to pull on a branch, git pull takes over a minute and uses a consistent ~20% CPU on all cores of my 4-core (8 virtual cores) machine for the duration. The same behavior is occurring for another user on a different computer.

acked

All 25 comments

a new commit

Are you sure it's only pulling ONE commit?

This repo has like 45687465873465 commits per second... so I wouldn't be surprised if it's just taking a while to parse all the deltas.

@EmpireJones If you do a keybase log send and post the resulting ID here, I can take a look.

@dabura667 To clarify, I'm referring to a team repo. But yes, only one commit.

@strib be070d5ea27fde18b31af71c

Thanks!

Is this the fetch you did around 18:05 your local time? The logs are a bit truncated unfortunately, so I can't see how long it took. It seems to be quite a big repo, though; from what I see in the log, it's looking at a ton of objects.

Do you still have the terminal output for what git said while you were fetching? I'm interested to see how much data was transferred.

Also, if you're willing to share more logs, that might be useful. To do so, you can try:

tar -czf /keybase/private/strib@github,empirejones/client-12730.tgz ~/.cache/keybase/keybase.git*

Yeah, that should be the one.

Log sent.

Initializing Keybase... done.
Syncing with Keybase... done.
Counting and decrypting: 16 objects... done.
Reading and decrypting metadata: (100.00%) 16/16 objects... done.
Fetching and decrypting objects: (100.00%) 16/16 objects... done.

Thanks! I don't see anything obviously unusual going on, other than all the data being read. A few more questions:

  • Has this happened for more than one commit, or did it happen just the once (on multiple devices) with the same commit?
  • If it was just one commit, did the commit contain an unusual amount of data? A few big binary files checked in or anything? The output above says only 16 objects, so I guess it wasn't a lot of files.
  • After you fetch the commit, if you pull a second time, is it fast?
  • Do you know anything about the origin of the repo? Was it imported from a pre-existing repo with a big pre-existing history, or was it empty initially and has grown only via keybase git commits?

Answering in order:

  • More than one commit. I've noticed it during 4 pulls so far (which is every pull of this repo), but didn't before that because I was working on a branch alone (nobody else committing).
  • N/A, as it was more than one commit, but each was just a handful of lines of code in 4 files.
  • Yep, pulling again happens quickly.
  • It's a somewhat big repo, but the majority of it is text/code; I try to avoid checking in any large files (e.g. binaries), although it might have some graphics. It was previously stored on Bitbucket and then AWS CodeCommit. I didn't truncate the history when switching to Keybase, which was recent.

The repo has ~1500 commits.

Ok, thanks. I wonder if the repo just has too many pack files and that's slowing everything down. One thing you can try is forcing garbage collection on the repo. We do have a keybase git gc command that you can try, but I don't think we've turned on its ability to re-pack objects into a single pack file yet.

However, if you can be sure that no one will try to push to the repo for a few minutes, you can try running regular git gc against the repo's storage in KBFS:

cd /keybase/team/<teamname>/.kbfs_git/<reponame>
ls objects/pack/*.pack
git gc

The ls will show whether you have a lot of pack files; if so, git gc should consolidate them. But if someone tries to push while it runs, it might leave data in a weird state.

I wasn't able to do a git gc inside of the .kbfs_git/<reponame> folder. I am getting these errors:
First Run:

error: unable to create temporary file: Interrupted system call
fatal: unable to force loose object
error: failed to run repack

Additional runs:

Counting objects: 24304, done.
error: pack-objects died of signal 7 
error: failed to run repack

The ./objects/pack/ folder currently has 79 files.

git count-objects -v reports this info:

warning: garbage found: ./objects/05/tmp_obj_7wObmh
warning: garbage found: ./objects/pack/tmp_pack_odRnG4
warning: garbage found: ./objects/pack/tmp_pack_8qMNAb
count: 14
size: 9
in-pack: 24360
packs: 38
size-pack: 100460
prune-packable: 14
garbage: 3
size-garbage: 3584

Running git gc in my local repo folder did not result in any performance improvements.

Ah sorry, try git gc --quiet; then you can probably avoid those system call errors.

If that doesn't work, I can try to get you a build with a working keybase-based gc next week.

Does that only need to be run from one machine? If so, it didn't seem to impact performance. It ran without errors though.

Thanks for continuing to assist with this!

I discovered a large file (~90 MB) which was committed accidentally by a team member. I'm working on getting it removed from history. It appears that just using a branch with the commit removed does not fix the performance issue (perhaps because the file is still in the pack files via the other branches).

  • Does this sound accurate -- that a large file would impact the checkout performance of future commits?
  • Or is the slow performance more likely due to the number of commits/changes total?

Does that only need to be run from one machine? If so, it didn't seem to impact performance. It ran without errors though.

Yes, just from one machine. Oh well -- is there only one pack file now in that objects/pack directory?

Does this sound accurate -- that a large file would impact the checkout performance of future commits?

My gut instinct is that it wouldn't be related, especially not after the first time you fetched it.

Is this a repo you'd be willing to share with me (by copying it from your .kbfs_git into a shared folder with me, not by giving me access to the original) for debugging purposes? Totally reasonable if the answer is no, but that might speed things up a bit.

Yes, after doing a git gc --quiet, there is only one pack file. I was able to get it down from ~102 MB to ~15 MB by removing the remaining branches that contained the large file (locally and remotely), but that didn't seem to help at all, so the large file appears to have been a red herring.

Sorry, I can't share the repo. Thanks again.

@strib Any update on that "build with a working keybase-based gc"?

Not yet, but we're supposed to be working on it this week. But it won't be any better than the regular git gc, so I'm not sure it will help in your case. It's hard for me to know what's going on without a way to reproduce the issue myself.

Maybe now that you only have one pack file, you can send logs again and I can take a fresh look?

@strib Good news, I was able to find an easily reproducible case of this on a relatively small repo with many commits.

  1. Clone https://github.com/helm/helm
  2. Add keybase as remote and push to keybase
  3. Copy cloned ./helm folder to ./helm2
  4. Make a minor change in ./helm, and push
  5. Pull the change from ./helm2 (full command sequence spelled out below)
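
Spelled out as commands (team/repo names are placeholders; keybase://team/<teamname>/<reponame> is the usual form of a Keybase team remote, and the Keybase repo is assumed to already exist, e.g. via keybase git create):

git clone https://github.com/helm/helm.git helm
cd helm
git remote add keybase keybase://team/<teamname>/<reponame>
git push keybase master
cd ..
cp -r helm helm2        # helm2 inherits the keybase remote
cd helm
# make a minor change, then:
git commit -am "minor change"
git push keybase master
cd ../helm2
git pull keybase master # this is the pull that takes over a minute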

Thanks so much for the repro! I've spent most of the day trying to figure this out, and I think I have a grasp of it now, though I'm not yet sure how to fix it. I'm going to open up an internal issue for it, but I'll summarize it here as well (though it's going to get a bit technical):

  • Our git layer uses the awesome https://github.com/src-d/go-git library, and creates a storage layer on top of KBFS to pass to go-git.
  • go-git has to treat the repo with the storage layer as its "local" repo, and communicates with the other one via the remote git protocol. Since Keybase doesn't (and can't) implement this remote protocol, the Keybase repo is always considered the "local" repo, while the client's local checkout is actually the "remote" repo for this purpose.
  • Because of this, when a client pulls from a Keybase repo, we actually turn that into a push from the Keybase repo into the client's local checkout.
  • A push first lists all the commits that are in use as references on the client's local checkout.
  • Then it tries to extrapolate, using its "local" storage (i.e., KBFS), the COMPLETE list of commits and git objects that is implied by the initial list of commits. This entails walking the entire git commit history that precedes each commit.
  • That would be pretty fast with local storage, but it turns out to be pretty slow in KBFS, and that's where the time and CPU get used up (see the sketch below).
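
To make the expensive step concrete, here's a minimal sketch of that full walk, using go-git's plumbing/revlist package against an ordinary on-disk repo (the path is a placeholder; in Keybase git the storer is our KBFS-backed storage layer instead):

package main

import (
	"fmt"
	"log"

	git "gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/revlist"
)

func main() {
	// Open a repo; in Keybase git, repo.Storer would be our KBFS-backed
	// storage layer rather than the local filesystem.
	repo, err := git.PlainOpen("/path/to/repo")
	if err != nil {
		log.Fatal(err)
	}

	head, err := repo.Head()
	if err != nil {
		log.Fatal(err)
	}

	// Enumerate EVERY object (commit, tree, blob) reachable from HEAD,
	// with no ignore list. This is the walk a pull triggers on the
	// Keybase side: cheap on local disk, slow when each object read
	// goes through KBFS.
	objs, err := revlist.Objects(repo.Storer, []plumbing.Hash{head.Hash()}, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("walked %d objects\n", len(objs))
}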

What I think we _want_ to happen in this case is one of two things:

  1. Fetch the entire list of objects from the client's local checkout, rather than trying to get them from KBFS. Unfortunately, the git remote protocol doesn't have a way to do this easily as far as I can tell, and we'd have to carve up go-git pretty badly to get around it.
  2. Rewrite go-git's push algorithm to inspect only the objects that differ between the asked-for commit and the client's advertised references. I started in on this a little bit today, but it's a pretty big change and seems to require some new machinery in go-git that I'm not prepared to work on right now (rough sketch of the intent below).

I'll try to put some effort into the second option sometime in the next week or two.
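
For the record, option 2 is conceptually just "wants minus haves": send only the objects reachable from the commits the client asked for, but not from the refs it already advertised. go-git's revlist API can express that set difference today, so the intent looks roughly like this (ObjectsToSend is a hypothetical helper, not existing go-git API):

package revsketch

import (
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/revlist"
	"gopkg.in/src-d/go-git.v4/plumbing/storer"
)

// ObjectsToSend returns the objects reachable from the commits the client
// asked for ("wants") but not from the refs it already advertised
// ("haves").
func ObjectsToSend(s storer.EncodedObjectStorer, wants, haves []plumbing.Hash) ([]plumbing.Hash, error) {
	return revlist.Objects(s, wants, haves)
}

The catch is that revlist.Objects currently walks the haves' entire history just to build the ignore set, so this alone saves nothing; making the walk stop early once it reaches history the client already has is the new machinery go-git needs.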

Hey, any news on this thread? We're also experiencing this issue: when there is at least one commit to pull, git pull takes several minutes to complete (~5 minutes). Whenever I do a push I see a message that says:

Tip: this repo could be sped up with some garbage collection. Run this command:
keybase git gc ?? --?? ??

But when I try it I get this:

This may take several minutes...
▶ ERROR Getting object d66c??????????????????8bfa failed: object not found

I have tried the above command a few times and I have gotten an error on a different hash each time.

There's been some discussion of it in https://github.com/src-d/go-git/issues/909, but it looks like it's going to take a lot of work, so we haven't been able to prioritize it yet. We're a small team that has a few other things going on right now.

Your "object not found" issue seems unrelated. Please open a new issue for it, and run keybase log send to include a log so we can debug it.

Ah ok, thanks for the reply. I thought it could somehow be related to the same issue. I just did another keybase git gc and it succeeded :)

@strib I see you're still pursuing this issue.

My pulls are always quick but fetching has become extremely slow for all machines. I've gc'd the KBFS repo, no change. If you think this issue is related then I'll just hold tight but otherwise I'm happy to send logs and create a new issue if necessary.

@soulofmischief probably related. I'll be working on a fix this week I hope...

The next Keybase release should have significant performance improvements for fetches/pulls on repos with large histories. In the helm example above, when pulling a single new commit, the time taken on my machine went from 3m48s to less than 5s. (It's available in the unstable Linux nightly build now, see https://keybase.io/docs/linux-user-guide#nightly-builds if you're interested.)

Thanks for everyone's patience here. I think our long national nightmare is almost over.
