Checkout: suggestions for large repository

Created on 24 Aug 2019 · 17 Comments · Source: actions/checkout

I'm experimenting with GitHub Actions on a large code repository at $WORK. We use a mix of Concourse CI and Jenkins and are looking at GitHub Actions as a potential CI/CD tool with less hosting maintenance.

In some simple experiments we found the actions/checkout step to be slower than expected. A checkout step takes ~6m 53s using with.fetch-depth: 1, and that's before my job can do anything useful.

In our Jenkins setup we have a persistent clone that we do local clones from. In Concourse we use the git resource. In both cases the process of fetching a given version of the code feels faster than actions/checkout, and I was wondering if there are any tuning parameters we could apply to speed up the process.
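For readers unfamiliar with the "persistent clone plus local clones" pattern mentioned above, here is a minimal sketch of the idea. It is not our actual Jenkins configuration. The paths under `$work`, the `upstream` stand-in repository, and the `main` branch name are all hypothetical, and a throwaway local repository plays the role of the remote so the script runs without network access. The pattern only helps when the build machine's disk survives between builds.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the real remote (hypothetical).
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "first"

# One-time setup: a persistent bare mirror kept on the build machine.
git clone -q --mirror "file://$work/upstream" "$work/cache.git"

# Upstream moves on between builds.
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "second"

# Per build: transfer only what is new into the mirror...
git -C "$work/cache.git" fetch -q --prune origin

# ...then clone from local disk, which avoids a second network transfer.
git clone -q --branch main "$work/cache.git" "$work/build-1"
git -C "$work/build-1" log --oneline -1
```

The expensive network transfer happens once for the mirror; each later build only pays for the new objects plus a local clone.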

I can see a glimpse into what the action is doing in the output.

git remote add origin https://github.com/{org}/{repo}
git config gc.auto 0
git config --get-all http.https://github.com/{org}/{repo}.extraheader
git config --get-all http.proxy
git -c http.extraheader="AUTHORIZATION: basic ***" fetch --tags --prune --progress --no-recurse-submodules --depth=1 origin +refs/heads/*:refs/remotes/origin/*

Is there a way we can opt out of features like fetching all of the tags (we have a lot of tags) or submodules (we don't use submodules)?
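To make the question concrete, here is a rough sketch of the narrower fetch I have in mind: one branch, no tags, no submodule recursion, depth 1. This is plain git, not an option that actions/checkout is known to expose. The `$work` directory, the `upstream` stand-in repository, the `main` branch, and the `v1.0` tag are hypothetical and exist only so the script runs locally.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the large remote (hypothetical).
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
for n in 1 2 3; do
  git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "commit $n"
done
git -C "$work/upstream" tag v1.0

# Same setup steps as the action's log output, but with a narrower fetch.
git init -q "$work/ci"
cd "$work/ci"
git remote add origin "file://$work/upstream"
git config gc.auto 0

# Fetch a single branch only: skip tags and submodules, history depth 1.
git fetch -q --no-tags --no-recurse-submodules --depth=1 origin \
  "+refs/heads/main:refs/remotes/origin/main"
git checkout -q --detach refs/remotes/origin/main

git rev-list --count HEAD   # number of commits available locally
git tag -l                  # prints nothing because tags were skipped
```

The difference from the logged command is the refspec (`refs/heads/main` instead of `refs/heads/*`) and `--no-tags` instead of `--tags`.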

I started looking at what the Concourse git resource does to draw some comparisons. Looking here I can see it does something like a clone + checkout operation:

git clone --single-branch $depthflag $uri $branchflag $destination $tagflag

cd $destination

git fetch origin refs/notes/*:refs/notes/* $tagflag

if [ "$depth" -gt 0 ]; then
  "$bin_dir"/deepen_shallow_clone_until_ref_is_found_then_check_out "$depth" "$ref" "$tagflag"
else
  git checkout -q "$ref"
fi

I'm not sure if that's any better or worse, but our experience is that it "feels" more performant than what we see with GitHub's actions/checkout.

All 17 comments

I have a similar problem.

git fetch + git checkout takes ~4 minutes (TeamCity needs only a few seconds, because it applies just the changes since the previous run).

I solved it with:

steps:
    - name: Clone working branch
      run: git clone --single-branch --branch ${{ github.head_ref }} --depth 1 https://${{ secrets.CLONE_TOKEN }}:[email protected]/${{ github.repository }}.git .

The shallow clone takes ~30 seconds.

Maybe we need an official shallow-clone action?

@zoispag just curious: what scope did you apply to secrets.CLONE_TOKEN, compared with the default GITHUB_TOKEN that GitHub provides to each action?

@softprops I gave repo access.
Because I needed to clone via HTTPS from a private repo, I used the OAuth token of an actual GitHub user account.

Thanks. It would be great if GITHUB_TOKEN had access to do that. I feel hesitant about personal access tokens with the repo scope because they give access to all of the repos your GitHub user has access to. I believe the GITHUB_TOKEN provided by GitHub is scoped to permissions for the specific repo only.

In any case I tried your solution and it worked! Down from 7 minutes, I'm at about 1m 22s. That's a huge saving. Thank you for posting a reply, and sorry I was late to notice it!

No worries about the late reply!
I feel the same about the CLONE_TOKEN, but GITHUB_TOKEN did not allow me to perform this task. 😞 I am using a "dedicated" GitHub account, though, which generally acts as a bot, so by definition it has access only to this repo.

7m to 1m22s is a huge improvement, so happy to help! 🎉 😁

Codified what worked for us here
https://github.com/meetup/express-checkout

@softprops I wanted to do the same, but didn't have the time yet.
It would be great to include me as a contributor in your README though 😉

This is definitely something that needs to be improved. Fetching all tags and branches can be slow.

fixing this in v2, will hopefully merge early next week

@softprops @XhmikosR checkout v2-beta (now in master). waiting for feedback/stabilization, then will push v2 tag. v2 fetches only a single commit by default

@ericsciple I've enabled it in our repo and we went from 30-40 second checkouts to 5-7 seconds ❤️ (even in our teeny tiny 12k-commit repo). This is good! I can only imagine what improvements actual big repos will see!

Thanks for all the work from you and everyone involved!

I haven't had a chance to take a look yet, but would it be possible to post a comment on this issue when an official v2 release is published?

I just ran a benchmark on our largest repo. It is now comparable in performance to our workaround. As soon as the beta version becomes v2 we're going to switch to actions/checkout proper.

published v2 tag

Works great! Thanks folks

With checkout@v2 it indeed saves a lot of bandwidth, but the download speed seems much slower. Please see the comparison below. I suspect it is because GitHub has to do the compression on the fly when fetching with --depth=1?

V1: [screenshot not preserved]

V2: [screenshot not preserved]
