Hey team!
First, awesome job on this feature, it will immensely help our CI speed for our JavaScript projects, kudos!
I've been running into the "over the limit" error for a yarn project with workspaces enabled:
Post job cleanup.
/bin/tar -cz -f /home/runner/work/_temp/3c08f6f0-f11f-4d8f-bed5-d491e7d8d443/cache.tgz -C /home/runner/.cache/yarn .
##[warning]Cache size of 231440535 bytes is over the 200MB limit, not saving cache.
But when I run the same tar command locally, I get a 100.3 MB bundle. Is there anything I'm missing here?
Here's my workflow:
name: Test
on:
  push:
    branches:
      - '**'
    tags:
      - '!**'
jobs:
  test:
    name: Test, lint, typecheck and build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - name: Dump GitHub context
        env:
          GITHUB_CONTEXT: ${{ toJson(github) }}
        run: echo "$GITHUB_CONTEXT"
      - name: Use Node.js 10.16.0
        uses: actions/setup-node@v1
        with:
          node-version: 10.16.0
      - name: Cache yarn node_modules
        uses: actions/cache@v1
        with:
          path: ~/.cache/yarn
          key: ${{ runner.OS }}-yarn-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-yarn-
      - name: Install
        run: yarn install --frozen-lockfile
      # ...
Thanks a lot!
As a follow-up, if I use path: node_modules it does save the cache successfully:
Cache yarn node_modules (1s)
Cache not found for input keys: ["Linux-yarn-db9ae0b40b59cc89d84b533d906606822c72106dfc77455cf7eee198d6e55858","Linux-yarn-"].
Run actions/cache@v1
  with:
    path: node_modules
    key: Linux-yarn-db9ae0b40b59cc89d84b533d906606822c72106dfc77455cf7eee198d6e55858
    restore-keys: Linux-yarn-
Cache not found for input keys: ["Linux-yarn-db9ae0b40b59cc89d84b533d906606822c72106dfc77455cf7eee198d6e55858","Linux-yarn-"].
...
Post Cache yarn node_modules (27s)
Cache saved successfully
Post job cleanup.
/bin/tar -cz -f /home/runner/work/_temp/86ca5e8a-8003-4159-b3a0-36827075ec88/cache.tgz -C /home/runner/work/web-tools/web-tools/node_modules .
Cache saved successfully
And when I re-run the task, I get this, but yarn does not actually install from the cache, as it takes the same amount of time as a clean yarn install:
Run actions/cache@v1
/bin/tar -xz -f /home/runner/work/_temp/ce74f1ec-ba00-4e1f-9508-f80715195305/cache.tgz -C /home/runner/work/web-tools/web-tools/node_modules
::set-output name=cache-hit,::true
Cache restored from key: Linux-yarn-db9ae0b40b59cc89d84b533d906606822c72106dfc77455cf7eee198d6e55858
Could that be related?
With the recent increase to 400MB now it does save the cache, but yarn still does not seem to pick up the cache from ~/.cache/yarn and still does a clean install. If I do yarn cache dir it outputs /home/runner/.cache/yarn/v6.
It's possible there's additional content in ~/.cache/yarn that's leading to the file size discrepancy between local and the runner. It could be something populated during actions/setup-node.
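One way to test that hypothesis is to measure the directory the same way the log shows it being archived (`tar -cz`), both locally and in a workflow step, and compare the numbers. The sketch below is only an illustration: `cache_report` is a made-up helper name, and it assumes a POSIX shell with `tar`, `du`, `sort`, and `wc` available.

```shell
# Print the gzip-compressed size of a directory (as tar -cz would produce it)
# followed by the uncompressed size in KB of each top-level entry.
# The default path is the one used in the workflow above; pass another to override.
cache_report() {
  local dir="${1:-$HOME/.cache/yarn}"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # Stream the archive to stdout and count bytes, so nothing is written to disk.
  local bytes
  bytes=$(tar -cz -f - -C "$dir" . | wc -c | tr -d ' ')
  echo "compressed_bytes=$bytes"
  # Largest top-level entries first; an empty directory prints nothing here.
  du -sk "$dir"/* 2>/dev/null | sort -rn
}
```

Calling `cache_report ~/.cache/yarn` on both machines should show whether the runner has extra top-level entries that the local cache lacks.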
~/.cache/yarn may not work for all scenarios; a better example could use yarn cache dir to determine the path and access it via a step output.
Example:
- name: Get Yarn Cache Directory
  id: yarn-cache
  run: |
    echo "::set-output name=dir::$(yarn cache dir)"
- uses: actions/cache@v1
  with:
    path: ${{ steps.yarn-cache.outputs.dir }}
    key: ${{ runner.OS }}-yarn-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      ${{ runner.os }}-yarn-
Trying it out right now, thanks a lot for the fast reply :+1:
So the cache folder is indeed different:
Run echo "::set-output name=dir::$(yarn cache dir)"
::set-output name=dir::/home/runner/.cache/yarn/v6
Then:
Run actions/cache@v1
/bin/tar -xz -f /home/runner/work/_temp/3b300616-8bbe-4e56-b10d-dcbc31bbca18/cache.tgz -C /home/runner/.cache/yarn/v6
::set-output name=cache-hit,::true
Cache restored from key: Linux-yarn-db9ae0b40b59cc89d84b533d906606822c72106dfc77455cf7eee198d6e55858
But install still takes 2.55min to run. Full thing:
- name: Use Node.js 10.16.0
  uses: actions/setup-node@v1
  with:
    node-version: 10.16.0
- name: Get Yarn Cache Directory
  id: yarn-cache
  run: |
    echo "::set-output name=dir::$(yarn cache dir)"
- uses: actions/cache@v1
  with:
    path: ${{ steps.yarn-cache.outputs.dir }}
    key: ${{ runner.OS }}-yarn-${{ hashFiles('**/yarn.lock') }}
    restore-keys: |
      ${{ runner.os }}-yarn-
- name: Install
  run: yarn install --frozen-lockfile
But install still takes 2.55min to run
What install times are you seeing without the cache?
You could try inserting a few yarn cache list calls to verify what's in the cache before and after the install. See https://yarnpkg.com/en/docs/cli/cache#toc-yarn-cache-list-pattern
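As a rough complement to `yarn cache list`, the before/after check can also be done by counting entries in the cache directory directly, which does not depend on the cache having a layout yarn can parse. This is a minimal sketch; `count_entries` and `with_cache_snapshot` are hypothetical helper names, not part of yarn or the cache action.

```shell
# Count top-level entries in a directory; prints 0 if it does not exist.
count_entries() {
  local dir="$1"
  [ -d "$dir" ] || { echo 0; return 0; }
  find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Run a command and print the entry count of a directory before and after it.
# The wrapped command's exit status is passed through unchanged.
with_cache_snapshot() {
  local dir="$1"; shift
  local status=0
  echo "before=$(count_entries "$dir")"
  "$@" || status=$?
  echo "after=$(count_entries "$dir")"
  return "$status"
}
```

In a workflow step this could be invoked as `with_cache_snapshot "$(yarn cache dir)" yarn install --frozen-lockfile`. A `before` count of 0 despite a reported cache hit would suggest the archive was restored to a different path than the one yarn reads.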
What install times are you seeing without the cache?
Same time. Let me try yarn cache list.
Hmmm interesting:
Run yarn cache list && yarn install --frozen-lockfile
yarn cache v1.19.1
error An unexpected error occurred: "ENOENT: no such file or directory, scandir '/home/runner/.cache/yarn/v6/@apollo/node_modules'".
Ok, so I renamed the cache key and now yarn cache list does show a bunch of packages. I did get 1min shaved off the install step too now (so install now takes around 2min instead of 3min), so I guess caching semi-works? What do you think? Should it be faster?
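A plausible reading of why renaming the key helped: the earlier key had first been saved with path: node_modules, so restoring it into the yarn cache directory unpacked node_modules content there, which would explain the scandir error. That is an inference from the logs above, not something confirmed in the thread. The sketch below shows the general idea of a key with a manually bumped version segment; `cache_key` is a hypothetical helper, and the hash is a plain SHA-256 of one lockfile, which is only an approximation of what hashFiles does.

```shell
# Build a key of the form <os>-yarn-<version>-<sha256 of lockfile>.
# <version> is a manual knob: bump it to abandon entries saved under the old
# key, for example after changing which path is being cached.
cache_key() {
  local version="$1" lockfile="$2" hash
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$lockfile" | cut -d' ' -f1)
  else
    # Fallback for systems that ship shasum instead of sha256sum.
    hash=$(shasum -a 256 "$lockfile" | cut -d' ' -f1)
  fi
  echo "$(uname -s)-yarn-${version}-${hash}"
}
```

If a version segment like this is added to key in the workflow, the restore-keys prefix probably needs it too; otherwise a bare prefix such as Linux-yarn- would still match the stale entry.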
1 minute sounds reasonable, as yarn install might be doing other work besides just fetching packages.
Unfortunately, it all depends on specific ecosystems and use-cases. Some scenarios will have much larger savings when caching, while others won't see much benefit.
For comparison, how long does install take on your local machine?
Ignoring all differences in CPU speed and resources, that would be the ideal install time as it's a fully populated cache (assuming this isn't a fresh install).
So to replicate exactly the same scenario as with GitHub Actions, I'd need to remove my local node_modules and run yarn install --frozen-lockfile and that should be very similar, correct?
In fact, would it be possible to pre-populate node_modules with the cache before doing the install step? With this it would be pretty much instantaneous. Or are there caveats with this approach?
By the way, we use workspaces so that might come into play as well.
Update:
yarn --frozen-lockfile after deleting node_modules: 157.49s
Note that I'm currently tethering from my phone so network latency might come into play. Pretty sure in the end it would be pretty similar.
So my last question is: can we somehow keep or restore a cached node_modules folder instead of using the global cache? I know CircleCI somehow does this on the install step, could it be feasible with GitHub Actions?
You can definitely use a node_modules in your cache instead of caching the yarn cache directly, that's how we handle caching with npm (npm has a cache but it's not meant to be populated by the user).
I'd recommend trying it out and seeing which is faster
Using node_modules yields the same result as not using cache, which is weird. Probably missing something for yarn cache to happen correctly. Is it due to Docker maybe?
I even tried caching both node_modules and ~/.cache/yarn but no improvements either.
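One thing that may be worth checking, given that the project uses workspaces: yarn can leave a node_modules folder inside individual workspace packages as well as at the repository root, so caching only the root folder might not cover everything the install writes. Whether that explains the result here is a guess. The sketch below simply lists every top-level node_modules directory under a checkout; `list_node_modules` is an illustrative name.

```shell
# List each node_modules directory under a root, without descending into
# them, so nested dependencies' own node_modules folders are not reported.
list_node_modules() {
  local root="${1:-.}"
  find "$root" -type d -name node_modules -prune -print | sort
}
```

Running `list_node_modules .` after a local install shows which paths a node_modules-based cache would need to include.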
For now it seems only using the ~/.cache/yarn folder is the best approach. I was comparing with CircleCI but I think in their case they use Docker Layer Caching so node_modules are cached by default. Even in their docs they only mention caching ~/.cache/yarn: https://circleci.com/docs/2.0/caching/#yarn-node.
Caching node_modules is just weird (and would likely cause issues)... There's npm ci: https://docs.npmjs.com/cli/ci
@teohhanhui to be confirmed, we only use yarn so I can't tell. I'd say this is the subject for another issue.
In any case, yarn caching works to an extent here, so I'll close this issue. Thanks again for your help @joshmgross :+1: