Checkout: Cache for LFS

Created on 27 Feb 2020 · 7 Comments · Source: actions/checkout

Hello,

Can you provide some info on how to use the checkout action with a cache properly?
I ask because frequent builds use up the LFS download quota (1 GB per month) very quickly. (That's because it downloads everything from scratch every time, right?)

It would also be great if such an option were available out of the box!
Thank you in advance.

question


All 7 comments

It looks like the LFS data is stored under .git/lfs, so you might be able to cache that directory.

I was trying to do this just today, but unfortunately it doesn't seem to work. I have a cache action for .git/lfs/objects (where all the data files are) and run actions/checkout@v2 with lfs: true and clean: false, and I get the following output:

Syncing repository: owner/repo
Working directory is 'd:\a\repo\repo'
"C:\Program Files\Git\bin\git.exe" version
git version 2.25.1.windows.1
"C:\Program Files\Git\bin\git.exe" lfs version
git-lfs/2.10.0 (GitHub; windows amd64; go 1.12.7; git a526ba6b)
"C:\Program Files\Git\bin\git.exe" config --local --get remote.origin.url
##[error]fatal: --local can only be used inside a git repository
Deleting the contents of 'd:\a\repo\repo'
...

Looking at the code, it seems that checkout wipes the whole working directory in this case, so the benefit of caching is lost.

Thanks @dabo248 for the solution.

However, it is quite verbose for something that should be (in my opinion) the default behavior.

Due to GitHub's policy of billing for Git LFS download bandwidth, Git LFS is not usable with GitHub Actions in practice unless it is cached.

So it would make sense for the lfs: true option of actions/checkout to cache the LFS data by default, wouldn't it?

@ericsciple, can you re-open the issue?

@dabo248 Thanks for the solution, but I was curious to know whether that key is even valid. As I read the documentation, the key cannot be a directory, so your key is interpreted as a plain string; the cache would not get invalidated when new files are added to the lfs directory. Correct me if I'm wrong?

but I was curious to know if that key is even valid

You're right, @samesfahani-tuplehealth. Using .git/lfs for the key is not a good idea and will make the cache useless as soon as the large files change.

But before git lfs pull is called, the files are already there as tiny text files, each containing a hash. We can build a key from those tiny text files.

Here's an example:

- name: Checkout repository
  uses: actions/checkout@v2

- name: Cache git lfs
  uses: actions/[email protected]
  with:
    path: .git/lfs
    key: ${{ hashFiles('**/*.zip') }} # Adapt to target the type of the files committed with git lfs

- name: Pull lfs data, if not cached
  run: git lfs pull
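
For context on the "tiny text files" mentioned above: before git lfs pull runs, each LFS-tracked file in the working tree is a small pointer file, which is why hashFiles can build a key from it. Below is a minimal sketch of a debugging step that prints one such pointer. It assumes the checkout step left the LFS objects undownloaded (the default, lfs: false) and that the repository tracks at least one file in LFS. The step name and the `first` variable are made up for illustration.

```yaml
- name: Checkout repository
  uses: actions/checkout@v2

# Hypothetical debugging step, not part of the original example
- name: Show an LFS pointer file
  shell: bash
  run: |
    # Pick the first file tracked by Git LFS (-n prints file names only)
    first="$(git lfs ls-files -n | head -n 1)"
    # Before `git lfs pull`, this prints the pointer, which has the form:
    #   version https://git-lfs.github.com/spec/v1
    #   oid sha256:<64 hex characters>
    #   size <size in bytes>
    cat "$first"
```

Because the oid line changes whenever the large file's content changes, a key hashed from the pointer files changes with it.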

@jcornaz That's pretty much what I came to understand as well. However, even with that approach, if you track more than just zip files, or you no longer want to cache zip files, whatever the use case, you would need to update your CI file. How about this:

- name: Checkout code
  uses: actions/checkout@v2

- name: Create LFS file list
  run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

- name: Restore LFS cache
  uses: actions/cache@v2
  id: lfs-cache
  with:
    path: .git/lfs
    key: ${{ runner.os }}-lfs-${{ hashFiles('.lfs-assets-id') }}-v1

- name: Git LFS Pull
  run: git lfs pull

Source: https://www.develer.com/en/avoiding-git-lfs-bandiwdth-waste-with-github-and-circleci/

The author's method was meant for CircleCI, but the same concept still stands: we create a file that lists the hashes of everything tracked in LFS, and we run hashFiles on that file. Any time a file is added to, changed in, or removed from LFS, the list changes, so the cache key changes and the old cache no longer matches. I've also added a -v1 to the end of the key in case you ever want to invalidate the cache manually, but you shouldn't need to.
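
One possible refinement, not taken from the thread: with the key above, any change to the LFS file list is a full cache miss, so every object is downloaded again. A sketch using the restore-keys input of actions/cache to fall back to an older cache is shown below. Moving the version tag in front of the hash is an assumption made here so that the fallback prefix still respects manual invalidation, and the final reporting step is purely illustrative.

```yaml
- name: Restore LFS cache
  uses: actions/cache@v2
  id: lfs-cache
  with:
    path: .git/lfs
    # Assumption: version tag placed before the hash so that the
    # restore-keys prefix below changes when the version is bumped
    key: ${{ runner.os }}-lfs-v1-${{ hashFiles('.lfs-assets-id') }}
    # On an exact-key miss, restore the most recent cache with this prefix
    restore-keys: |
      ${{ runner.os }}-lfs-v1-

- name: Git LFS Pull
  # Downloads only the objects missing from .git/lfs, then checks files out
  run: git lfs pull

# Hypothetical step showing how the id on the cache step can be used
- name: Report cache status
  # cache-hit is 'true' only when the primary key matched exactly
  run: echo "LFS cache exact hit: ${{ steps.lfs-cache.outputs.cache-hit }}"
```

After a fallback restore, the exact key still counts as a miss, so the cache action saves a fresh cache under the new key at the end of the job. The trade-off is that a restored directory may carry objects that are no longer referenced, so bumping the version tag occasionally may be worthwhile.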

@samesfahani-tuplehealth You're absolutely right, the static key is not useful. Setting a hash as the key is the way to go!
