Cache: Same cache key not shared between different pull requests

Created on 10 Nov 2019  路  14Comments  路  Source: actions/cache

Hi, I'm not sure if this is a feature or a bug, but when we use the same cache key for different PRs (sequentially without a race condition), the second PR doesn't find the cache even though the keys are the same and the first PR completes the workflow successfully (including cache upload).

I looked through the actions/cache code and I think it may be related to the scope attribute of the ArtifactCacheEntry object. I weren't able to hack around the code to ignore this attribute, so I'm not 100% sure. The attribute does contain values that would point to the caches being scoped to different PRs, though (refs/pull/1660/merge and refs/pull/1661/merge on the two test pull requests I tried).

This is the setup we use:

      - uses: actions/cache@v1
        id: cache
        with:
          path: .github/vendor/bundle
          key: github-gems-${{ hashFiles('**/.github/Gemfile.lock') }}
          restore-keys: |
            github-gems-

The bundle is installing gems to the correct folder, since the cache is successfully fetched starting from the second commit on each pull request.

Let me know if I can provide more info. Thanks!

documentation

Most helpful comment

@chrispat

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo

My assumption (before finding this thread) was that caches would be shared between branches in a single repository. I kept wondering why my merge commits were not using cache even though the cache key was identical to the parent commit's build.

I would definitely love to see caches be shared between branches. I can understand wanting to avoid fork PRs from "poisoning the cache", but I think it should be safe for all branches in a given repository to use a shared cache.

EDIT: A compromise might be that when a feature branch is merged into the default branch it's cache could be merged into the default branch's cache, and then the least recently used of this cache can be purged to bring it down under the limit. Worst case this is no worse than now, but best case we can use the same cache as the final commit in the feature branch.

All 14 comments

Yes, it sounds like this behavior is caused by the scope. The PR refs/pull/1660/merge has read-write permissions to caches in the scope refs/pull/1660/merge and read-only permissions on the default branch scope (typically master). PR refs/pull/1661/merge does not have permissions to read the cache from refs/pull/1660/merge and vice versa. Essentially, you would need some action to trigger after merging the change into master which saves the cache. Once saved to the default (master) branch scope, it will be readable by PRs.

Interesting! I鈥檓 glad there鈥檚 a way for it to work in its current state, but I feel like sharing (writable) caches between PRs would also be nice. Are you thinking about expanding this functionality, e.g. by making the scope a parameter of the cache action?

I want to confirm that adding the cache persist to master push indeed gives pull request access to the same cache. This is the workflow I used (filename cache_persist.yml):

name: Cache persist

on: 
  push:
    branches:
      - master

env:
  BUNDLE_GEMFILE: .github/Gemfile

jobs:
  cache_persist:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v1

      - uses: actions/setup-ruby@v1
        with:
          ruby-version: '2.6'

      - uses: actions/cache@v1
        with:
          path: .github/vendor/bundle
          key: github-gems-${{ hashFiles('**/.github/Gemfile.lock') }}
          restore-keys: |
            github-gems-

      - name: bundle install
        run: |
          gem install bundler --no-document
          bundle check --path=vendor/bundle || bundle install --jobs=4 --retry=3 --path=vendor/bundle

Some good next steps for this issue:

  • acknowledging that this is the expected behavior by updating README.md or another document describing the scope functionality and the possibility to add read-only cache by using the master push, (and closing this issue), or
  • making the cache scope mechanism publicly modifiable with appropriate description and API, (and closing this issue), or
  • (the most low-effort one) using this issue as the reference for this functionality, and closing it with such confirmation

@dhadka I鈥檓 down for adding a PR to a documentation somewhere describing what鈥檚 up with the scoping, if you approve.

There is some documentation on scopes in the help docs - https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows#cache-scope

The scope was designed this way to prevent poisoning the cache of other branches. For example, without scoping, someone could create a PR and inject malicious libraries into the cache, which are subsequently read by other branches.

@chrispat I think we can help make the documentation clearer on how the scope works, particularly with respect to the benefit of creating the cache in the default branch so it can be shared among PRs.

Actually, the linked docs is slightly wrong in saying "You can restore a cache created in any branch of a repository, including base branches of forked repositories."

I've created an internal issue to update the help docs.

@gyfis does the cache persist workflow that you posted copy the cache from a feature branch to master?

@YashMathur I wouldn鈥檛 say copy, really -
the push if: master workflow is simply using the same cache as the pull request ones from feature branches, and in so making it available for newer pull request workflows.

@gyfis If I understand this correctly, you're running the bundle install every time you merge a PR to master so the cache lives in master's scope. It can then be accessed by other branches.

I don't think the Cache persist workflow will be able to restore any cache that is created by pull requests. Is that correct?

@YashMathur Yes, that鈥檚 exactly right.

This may be obvious but if the cache doesn鈥檛 change often, all workflows are fast (PR workflows read the master cache, and the master push workflows also read the master cache). If we don鈥檛 count parallelism, a new cache key in this setup is build exactly two times - once in a new feature branch that changes it, and once in a master push branch after merge - and then it鈥檚 saved and accessible everywhere.

(Trying to paraphrase the docs here to see if I鈥檓 understanding them correctly): Master branch workflow never reads any cache from feature branches, and there鈥檚 no cache passing between feature branches, either - that way no branch can pollute cache, unless merged into master.

WHOAH -- I will just add on that I found this _very_ confusing and have been spending a lot of hours trying to figure out why my cache hasn't been working as I'd expected. I definitely think the docs could be clearer that the cache is not accessible across branches by default, and adding an example of the job shown here as a way to achieve cache-sharing would be super helpful.

Thanks to those in the thread for laying this out so clearly -- this was driving me bananas coming from gitlab where cross-branch caching is the default. Since it takes about an hour to build the cache for my project, this is going to be a huge win for us!

Here is the excerpt from the documentation

A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually master). For example, a cache created on the default branch master would be accessible from any pull request. Also, if the branch feature-b has the base branch feature-a, a workflow triggered on feature-b would have access to caches created in the default branch (master), feature-a, and feature-b.

Access restrictions provide cache isolation and security by creating a logical boundary between different workflows and branches. For example, a cache created for the branch feature-a (with the base master) would not be accessible to a pull request for the branch feature-b (with the base master).

I think it pretty clearly lays out the access restrictions but I would be happy to take feedback on how to improve it.

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo, however, I don't think I would ever allow that for PRs coming from forks of public repos.

@chrispat

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo

My assumption (before finding this thread) was that caches would be shared between branches in a single repository. I kept wondering why my merge commits were not using cache even though the cache key was identical to the parent commit's build.

I would definitely love to see caches be shared between branches. I can understand wanting to avoid fork PRs from "poisoning the cache", but I think it should be safe for all branches in a given repository to use a shared cache.

EDIT: A compromise might be that when a feature branch is merged into the default branch it's cache could be merged into the default branch's cache, and then the least recently used of this cache can be purged to bring it down under the limit. Worst case this is no worse than now, but best case we can use the same cache as the final commit in the feature branch.

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo, however, I don't think I would ever allow that for PRs coming from forks of public repos.

@chrispat This would be great for us. I would prefer to not have to trick the flow by checking into master after each test run to get this kind of functionality. I would imagine forks are an edge case that few people care about in practice.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

gladhorn picture gladhorn  路  4Comments

s-weigand picture s-weigand  路  5Comments

aleks-ivanov picture aleks-ivanov  路  5Comments

evandrocoan picture evandrocoan  路  3Comments

Lyeeedar picture Lyeeedar  路  5Comments