Cache: Overriding cache

Created on 10 Oct 2020 · 6 Comments · Source: actions/cache

Hi,

I would like an option to override the existing cache.
This is especially useful when building Docker images, where the original cache might be useless once the base image is updated.

Thanks

All 6 comments

This is crazy

Why's that?

I desperately need this as well. CI is failing and there seems to be no way to bust the cache.

Cache hit occurred on the primary key Linux-maven-2d15385c799c9a1e31be36551973654b2bb55900a19be80d07dcf3e716f5daa9, not saving cache.

Not saving?

That's why my build incrementality does not work. I hoped I could accumulate files in the cache directory with every build, which would let me avoid rebuilding unaffected modules.

As a workaround, I use a key with a randomized suffix and a restore-key without it.

That way, the cache is restored through the restore-key, which matches all previously saved caches, and according to the spec we get the latest one: "If there are multiple partial matches for a restore key, the action returns the most recently created cache." - https://docs.github.com/en/free-pro-team@latest/actions/guides/caching-dependencies-to-speed-up-workflows#matching-a-cache-key
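To make the matching rule concrete, here is a small shell model of it. This is only a sketch of the quoted sentence, not the action's implementation: `pick_restore_match` is a hypothetical name, the input format (`<created-epoch> <key>` per line) is an assumption, and it ignores anything else the real service may take into account.

```shell
# Hypothetical model of restore-key matching.
# Reads lines of the form "<created-epoch> <key>" on stdin and prints the key
# of the most recently created entry whose key starts with the given prefix.
pick_restore_match() {
  prefix="$1"
  awk -v p="$prefix" 'index($2, p) == 1' | sort -n -k1,1 | tail -n 1 | cut -d' ' -f2
}

# Three caches saved under random suffixes, plus one for another OS.
printf '%s\n' \
  '100 Linux-maven-aaa' \
  '300 Linux-maven-ccc' \
  '200 Linux-maven-bbb' \
  '400 macOS-maven-ddd' | pick_restore_match 'Linux-maven-'
# prints: Linux-maven-ccc
```

Because the full key with the random suffix never matches an existing entry, the restore always goes through the prefix, and the newest entry wins.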

redisson$ git diff c5d170f3bec0fb38152d0dd22634cd2a03e7fb70~1 c5d170f3bec0fb38152d0dd22634cd2a03e7fb70
diff --git a/.github/workflows/maven.yml b/.github/workflows/maven.yml
index 6472f5313..2e0f96427 100644
--- a/.github/workflows/maven.yml
+++ b/.github/workflows/maven.yml
@@ -13,13 +13,16 @@ jobs:
       uses: actions/setup-java@v1
       with:
         java-version: 1.8
+    - name: Generate random cache key to ensure the cach directories are saved after build
+      id: random-cache-key
+      run: head /dev/random -c 32 > random-cache-key
     - name: Setup caching directories for local Maven repo and for DB of successfuly built hashversioned modules
       uses: actions/cache@v2
       with:
         path: |
           ~/.m2/repository
           ~/successful-hashvers
-        key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
+        key: ${{ runner.os }}-maven-${{ hashFiles('random-cache-key') }}
         restore-keys: |
           ${{ runner.os }}-maven-
     - name: Determine affected Maven projects

https://github.com/avodonosov/redisson/commit/c5d170f3bec0fb38152d0dd22634cd2a03e7fb70
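The commit above writes raw random bytes to a file and hashes that file for the key. A possible variation, shown here only as a sketch, is to produce a printable suffix directly. `random_suffix` is a hypothetical helper name, and the choice of `/dev/urandom` (which does not block waiting for entropy) over `/dev/random` is my assumption, not part of the original commit.

```shell
# Hypothetical helper: print a 32-character lowercase hex string
# built from 16 random bytes.
random_suffix() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

suffix="$(random_suffix)"
echo "cache key suffix: ${suffix}"
```

The value would still need to be passed to the cache step, for example as a step output, so that it can be appended to the `key`.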

So every build restores the latest cache and repackages it as a new cache entry. This probably results in redundant storage use, unless the implementation inherits old data intelligently, à la OverlayFS. On the other hand, GitHub evicts caches that have not been accessed in over 7 days, so it may not be an issue at all. Also, the latest cache in this arrangement only grows and never shrinks. It would be good if the action provided a way to purge files based on their last access time.
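Until the action offers something like that, a purge could be approximated in the workflow itself. The following is a minimal sketch, not a feature of actions/cache: `purge_unused` and the marker file are hypothetical names, `-anewer` is supported by GNU and BSD `find` but is not in POSIX, and the approach assumes the filesystem records access times (mounts using `noatime` or `relatime` may not update them on every read).

```shell
# Hypothetical helper: delete regular files under directory "$1" whose last
# access time is not newer than the modification time of marker file "$2".
purge_unused() {
  dir="$1"
  marker="$2"
  find "$dir" -type f ! -anewer "$marker" -exec rm -f {} +
}

# Intended use in a workflow (paths are examples only):
#   touch "$HOME/build-start-marker"      # right after the cache is restored
#   ... run the build ...
#   purge_unused "$HOME/.m2/repository" "$HOME/build-start-marker"
```

Files the build never read keep an access time older than the marker and are removed before the cache is saved again, so the cache can shrink as well as grow.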

If this is how updating the cache is intended to be done, it would make sense to add this to the documentation.

If cache updating is not intended at all, then this is a big miss and the action does not deserve to be called a cache.

Updatable caches can be very useful for build tools like Gradle and Bazel, whose build cache functionality lets them skip tasks whose inputs haven't changed. Some people have reported a 10x build speedup: https://redfin.engineering/we-switched-from-maven-to-bazel-and-builds-got-10x-faster-b265a7845854

See also: https://about.gitlab.com/blog/2020/09/01/using-bazel-to-speed-up-gitlab-ci-builds/

I am exploring something similar for Maven: https://github.com/avodonosov/hashver-maven-plugin

If the examples in the GitHub Actions docs included caching for Gradle, Bazel and the like, I suppose GitHub could save a significant part of the compute resources spent on workflows modeled on those examples.
