Hello,
Can you provide some info on how to use the checkout action with a cache properly?
I ask because frequent builds use up the whole LFS download quota very quickly (1 GB per month), presumably because everything is downloaded from scratch every time, right?
It would also be great to have such an option out of the box!
Thank you in advance.
It looks like the lfs info is under .git/lfs, so you might be able to cache that directory.
I tried this just today, but unfortunately it doesn't seem to work. I have a cache action for .git/lfs/objects (where all the data files are) and run actions/checkout@v2 with lfs: true and clean: false, and I get the following output:
Syncing repository: owner/repo
Working directory is 'd:\a\repo\repo'
"C:\Program Files\Git\bin\git.exe" version
git version 2.25.1.windows.1
"C:\Program Files\Git\bin\git.exe" lfs version
git-lfs/2.10.0 (GitHub; windows amd64; go 1.12.7; git a526ba6b)
"C:\Program Files\Git\bin\git.exe" config --local --get remote.origin.url
##[error]fatal: --local can only be used inside a git repository
Deleting the contents of 'd:\a\repo\repo'
...
Looking at the code, it seems this wipes everything, so the benefit of caching is lost.
Thanks @dabo248 for the solution.
However, it is quite verbose for something that should be (in my opinion) the default behavior.
Because GitHub bills for Git LFS download bandwidth, Git LFS is not usable with GitHub Actions in practice unless it is cached.
So it would make sense for the lfs: true option of actions/checkout to cache the LFS data by default, wouldn't it?
@ericsciple, can you re-open the issue?
@dabo248 Thanks for the solution, but I was curious to know if that key is even valid. According to the documentation, the key cannot be a directory, so your key is interpreted as a plain string and the cache would not be invalidated when files are added to the lfs directory. Correct me if I'm wrong.
but I was curious to know if that key is even valid
You're right @samesfahani-tuplehealth. Using .git/lfs for the key is not a good idea and makes the cache useless as soon as the large files change.
But before git lfs pull is called, the files are present as tiny text files containing a hash, and we can build a key based on those tiny text files.
Here's an example:
- name: Checkout repository
  uses: actions/checkout@v2
- name: Cache git lfs
  uses: actions/[email protected]
  with:
    path: .git/lfs
    key: ${{ hashFiles('**/*.zip') }} # Adapt to target the type of the files committed with git lfs
- name: Pull lfs data, if not cached
  run: git lfs pull
@jcornaz That's pretty much what I came to understand as well. However, even with that approach, if you have more than just zip files or you no longer want zip files to cache, whatever the use case, then you would need to update your CI file. How about this:
- name: Checkout code
  uses: actions/checkout@v2
- name: Create LFS file list
  run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id
- name: Restore LFS cache
  uses: actions/cache@v2
  id: lfs-cache
  with:
    path: .git/lfs
    key: ${{ runner.os }}-lfs-${{ hashFiles('.lfs-assets-id') }}-v1
- name: Git LFS Pull
  run: git lfs pull
Source: https://www.develer.com/en/avoiding-git-lfs-bandiwdth-waste-with-github-and-circleci/
The author's method was meant for CircleCI, but the same concept still stands: we create a file that lists the hashes of everything tracked within LFS and we run hashFiles on that. Any time a file is added to, changed in, or removed from LFS, this file changes, so the cache key changes with it. I've also added a -v1 to the end in case you ever want to invalidate the cache manually, but you shouldn't need to.
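One possible refinement, offered only as a sketch: with the key above, any change to the LFS file list produces a brand new key, so the whole cache misses and every object is downloaded again. The restore-keys input of actions/cache can fall back to the most recent cache that shares a prefix, so git lfs pull should only need to fetch what is missing. Moving the version marker in front of the hash is my own assumption (it is not in the workflow above); it keeps a manual bump to v2 from falling back to an old v1 cache. The "Report cache status" step is purely illustrative.

```yaml
# Sketch only: step names and the key layout are illustrative assumptions.
- name: Checkout code
  uses: actions/checkout@v2
  # lfs is left at its default (false), so checkout itself does not download LFS objects

- name: Create LFS file list
  run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

- name: Restore LFS cache
  uses: actions/cache@v2
  id: lfs-cache
  with:
    path: .git/lfs
    # Version marker placed before the hash (assumption, differs from the key above)
    key: ${{ runner.os }}-lfs-v1-${{ hashFiles('.lfs-assets-id') }}
    # Fallback: reuse the newest cache with this prefix when there is no exact match
    restore-keys: |
      ${{ runner.os }}-lfs-v1-

- name: Report cache status
  # cache-hit is 'true' only when the exact key matched
  run: echo "Exact LFS cache hit: ${{ steps.lfs-cache.outputs.cache-hit }}"

- name: Git LFS Pull
  # Always run this: it checks the files out into the working tree and
  # downloads only the objects that are not already in .git/lfs
  run: git lfs pull
```

The trade-off is that a fallback cache can still contain objects that are no longer referenced, so the cache may grow over time until the version marker is bumped.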
@samesfahani-tuplehealth You're absolutely right, the static key is not useful. Setting a hash as the key is the way to go!