Hi.
GitHub is internally evaluating partial-clone (Actually GitLab already provides the feature!).
This can be useful for a large monorepo when combined with the sparse-checkout feature (see git sparse-checkout --help).
Thus, I hereby request support for those features.
Thanks.
workaround
Here is a suggested workaround for folks who want to customize the whole process, though I haven't tested it.
$ REPO="https://${GITHUB_ACTOR}:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
$ git clone --your-options $REPO
$ git checkout --your-options <your-subcommand>
FWIW, Travis CI also supports sparse checkout, described at https://docs.travis-ci.com/user/customizing-the-build#git-sparse-checkout
It would be even better to have full-fledged partial-clone though.
actions/checkout@v2 is optimized to fetch only a single commit by default, so you should get some of the same benefits (no unnecessary history).
However, it doesn't solve the files-outside-of-the-user’s-work-area-in-the-tree problem.
It might be a good idea to add a no-checkout option to the action, so that people could customize the checkout command directly without reimplementing the initial setup.
For now I'm using this
- name: Sparse checkout
  shell: bash
  run: |
    REPO="https://${GITHUB_ACTOR}:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
    BRANCH="${GITHUB_REF/#refs\/heads\//}"
    # Following code is based on logs of actions/checkout@v, with sparseCheckout stuff inserted in the middle
    echo "Syncing repository: $GITHUB_REPOSITORY"
    echo "Working directory is '$(pwd)' GITHUB_WORKSPACE=$GITHUB_WORKSPACE BRANCH=$BRANCH"
    git version
    git init $GITHUB_WORKSPACE
    git remote add origin https://github.com/$GITHUB_REPOSITORY
    git config --local gc.auto 0
    # Now interesting part
    git config core.sparseCheckout true
    # Add here contents of sparse-checkout line by line
    echo "..." >> .git/info/sparse-checkout
    # echo ...
    git -c protocol.version=2 fetch --no-tags --prune --progress --depth=10 origin +${GITHUB_SHA}:refs/remotes/origin/${BRANCH}
    git checkout --progress --force -B $BRANCH refs/remotes/origin/$BRANCH
Not ideal but still saves quite a bit of time in my case
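For anyone puzzled by the BRANCH line in the step above, here is a small sketch of what that expansion does. It uses the POSIX form ${1#refs/heads/}, which should behave the same as the bash form ${GITHUB_REF/#refs\/heads\//} for this literal prefix. branch_from_ref is a made-up helper name, not part of any action. On pull_request events GITHUB_REF looks like refs/pull/<number>/merge, so nothing gets stripped there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip a leading "refs/heads/" from a ref name.
# Refs that do not start with that prefix are printed unchanged.
branch_from_ref() {
  printf '%s\n' "${1#refs/heads/}"
}

branch_from_ref "refs/heads/main"        # prints: main
branch_from_ref "refs/heads/feature/x"   # prints: feature/x
branch_from_ref "refs/pull/172/merge"    # prints: refs/pull/172/merge
```

So a step that relies on BRANCH being a plain branch name probably needs extra handling for pull request events.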
Not sure whether you are aware of the partial clone feature? It is still somewhat experimental, and I would be loath to risk overloading our servers (_especially_ during COVID-19, when we're running at or over capacity, human-wise), but maybe it would be worth playing with it for a few moments (in coordination with @github/git-core and @github/git-systems, maybe)?
In Git 2.26, partial clone does not play well with --depth, causing a massive regression.
Right, I saw some discussion on the Git mailing list, but I wasn't able to monitor that (COVID-19 🌧️).
The thing is: typically the bulk of the payload consists of blobs. The trees and commits are usually pretty light-weight. In other words, a non-shallow partial clone (that is then populated via a sparse checkout) can be a lot faster than a shallow clone. At least that's what our friends over at Google reported internally.
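To make the comparison concrete, a non-shallow partial clone populated via a sparse checkout could look roughly like the sketch below. sparse_partial_clone is a hypothetical helper, not something actions/checkout provides, and it has not been benchmarked. It assumes a Git version that ships git sparse-checkout (2.25 or later) and a server that accepts --filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: blobless, non-shallow partial clone plus cone-mode sparse checkout.
# Usage: sparse_partial_clone <repo-url> <target-dir> <dir> [<dir>...]
sparse_partial_clone() {
  repo_url="$1"
  target_dir="$2"
  shift 2

  # --filter=blob:none fetches commits and trees up front and defers blobs
  # until a checkout actually needs them. No --depth, so history stays complete.
  git clone --filter=blob:none --no-checkout "$repo_url" "$target_dir" || return 1

  # Restrict the working tree to the requested directories before populating it.
  git -C "$target_dir" sparse-checkout init --cone || return 1
  git -C "$target_dir" sparse-checkout set "$@" || return 1

  # Populate the working tree; only blobs inside the cone should be downloaded.
  git -C "$target_dir" checkout
}

# Example call (placeholder URL and directory names):
#   sparse_partial_clone "https://github.com/OWNER/REPO.git" work folder1 folder2/folder3
```

Whether this beats a shallow clone will depend on how much of the repository's size sits in blobs versus trees and commits.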
Right, I saw some discussion on the Git mailing list
And I was the one who brought the topic
non-shallow partial clone (that is then populated via a sparse checkout) can be a lot faster than a shallow clone
I guess it may be true for some classes of repositories, but it definitely doesn't hold in my case.
In the context of actions/checkout, I think it would be better to provide settings controlling checkout patterns and --filter, just like it allows controlling depth now.
Right, I saw some discussion on the Git mailing list
And I was the one who brought the topic
Whoops, I did not realize that you were the one, sorry ;-)
This is what I did to get partial clone:
- name: Partial Clone
  run: |
    REPO="https://${GITHUB_ACTOR}:${{ secrets.GITHUB_TOKEN }}@github.com/${GITHUB_REPOSITORY}.git"
    git clone --filter=blob:none --no-checkout --depth 1 --sparse $REPO .
    git sparse-checkout init --cone
    git sparse-checkout add "folder1" "folder2/folder3"
    git checkout
In case you want more control over what gets cloned, you might want to avoid the cone option.
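Without cone mode, the sparse-checkout file takes gitignore-style patterns, one per line. A minimal sketch of writing such a file follows; write_sparse_patterns and the example patterns are made up for illustration, and it assumes core.sparseCheckout has already been set to true in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one sparse-checkout pattern per line to a pattern file.
# Usage: write_sparse_patterns <pattern-file> <pattern> [<pattern>...]
write_sparse_patterns() {
  pattern_file="$1"
  shift
  # Create the containing directory (for example .git/info) if it is missing.
  mkdir -p "$(dirname "$pattern_file")"
  # Overwrites any existing file; a leading "!" negates a pattern.
  printf '%s\n' "$@" > "$pattern_file"
}

# Example call (illustrative patterns only):
#   write_sparse_patterns .git/info/sparse-checkout '/src/' '/docs/*.md' '!/src/vendor/'
```

Patterns written this way only take effect on the next checkout or read-tree that updates the working tree.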
git config --global user.email [email protected]
git config --global user.name github-actions
You don't really need to set the user.email or user.name, right?
Yeah it's optional, only required if you want to commit and push the changes
But shouldn't that be configured if/when a commit is to be created, rather than already during the checkout?
Yep, you are right. I have updated the text; it was only relevant in my case, as I wanted to push the changes as well.
Does anyone know how to check out the PR in Actions? This step only seems to work on push events and not during PR events.
Does anyone know how to check out the PR in Actions?
I think you will have to use ${{github.head_ref}} (see https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions), probably guarding the step behind an if: github.event_name == 'pull_request' condition.
I was looking into what those variables store.
This is what they seem to store for this PR:
echo ${{ github.head_ref }} -> renovate/playwright-1.x
echo ${{ github.base_ref }} -> master
echo ${{ github.event.pull_request.head.sha }} -> 8795a56fa8a91017e212c0311c17b4e6df1df512
echo ${{ github.event.pull_request.head.ref }} -> renovate/playwright-1.x
I can check out the PR using the SHA, but pushing the changes to the PR seems to be a problem.
I guess the right way of cloning the PR is by using the PR ID number, as shown here, which none of the above context variables seem to store.
And yeah thanks for helping out
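On fetching a PR by its number: GitHub exposes each pull request under refs/pull/<number>/head, and the pull_request event payload carries the number as github.event.pull_request.number. A rough sketch of building the fetch refspec follows. pr_fetch_refspec, the PR_NUMBER variable, and the local ref name under refs/remotes/origin/pr/ are assumptions made for illustration, and this only covers fetching, not pushing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a refspec that fetches a PR head into a local tracking ref.
# Usage: pr_fetch_refspec <pr-number>
pr_fetch_refspec() {
  printf '+refs/pull/%s/head:refs/remotes/origin/pr/%s\n' "$1" "$1"
}

# Example use inside a workflow step, assuming the step exports
# PR_NUMBER from ${{ github.event.pull_request.number }}:
#   git fetch --no-tags origin "$(pr_fetch_refspec "$PR_NUMBER")"
#   git checkout --force "refs/remotes/origin/pr/$PR_NUMBER"
```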
If you want to push to a PR, things get really awkward and sometimes impossible. To push successfully to a PR, you will have to know the URL of the originating repository (there might be a workflow variable to help you with that), but the contributor will _also_ need to have checked the checkbox "Allow edits by maintainers", and then I am _still_ uncertain that secrets.GITHUB_TOKEN would be enough to push there. And no, you cannot update the branch by pushing to refs/pull/<number>/head, that's prohibited.
I hadn't really thought of the "Allow edits by maintainers" checkbox. I just tried committing to this PR with the checkbox unchecked, and it gives a permission denied error in GitHub Desktop.
You are correct, this will get complicated. I think it's better to wait until sparse mode is added to the checkout action. Anyway, I don't really need that right now, but it would have been a good thing to have as well.
Thanks
Any update here?
@mambax how about giving it a try yourself? Just
inputs: section in https://github.com/actions/checkout/blob/main/action.yml
IGitSourceSettings interface: https://github.com/actions/checkout/blob/main/src/git-source-settings.ts
fetch() method according to the new parameter: https://github.com/actions/checkout/blob/25a956c84d5dd820d28caab9f86b8d183aeeff3d/src/git-command-manager.ts#L171-L197
🤣 Can do, but since the last update was 5 months ago I thought I'd ask 😭
@dscho Are you sure it is that straightforward? My first glance says that async fetch() uses git fetch. When I go and read the documentation of git-fetch I do not see a parameter (with my limited git knowledge) that would allow specifying a folder.
I do, however, see https://git-scm.com/docs/git-sparse-checkout, which seems to support this case.
Are you sure it is that straightforward?
It should be pretty straight-forward.
My first glance says that async fetch() uses git fetch. When I go and read the documentation of git-fetch I do not see a parameter (with my limited git knowledge) that would allow specifying a folder.
Right. I only pointed to the part where git fetch is executed, and obviously it would need to be called with those --filter=blob:none --no-checkout --depth 1 options mentioned here: https://github.com/actions/checkout/issues/172#issuecomment-689169138. (Even if git fetch's documentation does not talk about --filter, that command does support the option.)
What I forgot to say is that the git checkout part _also_ needs to be adjusted accordingly. That is, if input parameters for the sparse mode are provided, these commands still need to be run before git checkout (you can probably just add a sister function to checkout and call it just before checkout() is called):
git sparse-checkout init --cone
git sparse-checkout add "folder1" "folder2/folder3"
Feel free to point me to your code if you get stuck with this.