Do you want to request a feature or report a bug?
Bug
What is the current behavior?
We usually structure our Dockerfiles to take advantage of the Docker cache mechanism: we first copy package.json and yarn.lock, then run yarn --pure-lockfile, and then copy the rest of the package. This way the dependencies, which change less often than the package source code, can be cached by Docker, and we can often avoid having Yarn run on every build.
With Yarn workspaces, you can't run yarn if you have just the package.json and the yarn.lock files, because it will complain about the missing workspace packages.
If the current behavior is a bug, please provide the steps to reproduce.
Have a Dockerfile like this:
FROM node:9-alpine
WORKDIR /app
# Copy only the manifests first, so the install layer below can be cached
COPY package.json .
COPY yarn.lock .
# This is where it fails: the workspace packages haven't been copied yet
RUN yarn --pure-lockfile
COPY . .
RUN yarn test
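For completeness, a minimal root package.json that triggers the error might look like this (the name and the packages/* glob are illustrative; any workspace layout behaves the same):

{
  "name": "monorepo-root",
  "private": true,
  "workspaces": ["packages/*"]
}

With only this file and yarn.lock present in the image, yarn --pure-lockfile aborts because the directories matched by the workspaces globs don't exist yet.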
What is the expected behavior?
There should be a way to tell Yarn to install all the dependencies but defer the package linking to a later step, once the whole source tree has been copied. That way the step that repeats on every build is reduced to just the package linking, while the fetching and installation of the dependencies stays cached.
I am currently struggling with caching in a multi-stage build. Since it's not possible to use COPY to match files with a glob pattern and preserve the directory layout, I've decided to manually copy each individual package.json from my workspace packages, do a yarn install, and then copy everything into the final image.
Are you asking to install the dependencies listed in the yarn.lock rather than from the package.json? At least in our case, the root-level package.json doesn't have any dependencies; they are all listed in the other packages in the workspace.
Given a workspace containing appA, appB and moduleC, my Dockerfile looks like:
FROM node:8
WORKDIR /app
# With multiple sources, COPY's destination must be a directory ending in /
COPY package.json yarn.lock ./
COPY appA/package.json appA/yarn.lock appA/
COPY moduleC/package.json moduleC/yarn.lock moduleC/
RUN yarn
COPY appA/ appA/
COPY moduleC/ moduleC/
RUN cd appA && yarn run build
If I understand your suggestion correctly, that could change to:
FROM node:8
WORKDIR /app
COPY package.json yarn.lock ./
RUN yarn --install-from-yarn-lock
COPY appA/ appA/
COPY moduleC/ moduleC/
RUN cd appA && yarn install --link-only && yarn run build
But I'd get all of the dependencies for appB in my Docker image.
Don't necessarily take this as criticism, though. If I didn't mind adding all the dependencies and code for appB to the image, your suggestion would make the Dockerfile much simpler and much less fragile. I like it; I'm just still struggling with how to put things together correctly myself, because as you can imagine we have a lot more than just appA, appB & moduleC. :)
This seems like it's still an unresolved challenge, yes?
It's not pretty, but I use a bash script docker-prepare.sh to copy all package.json files from all packages first, then run yarn in the root to install all dependencies, and then I copy everything else.
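For illustration, a minimal sketch of what such a script could look like, assuming the workspace packages live under packages/ and the manifests are staged into a docker-build/ directory (both names are assumptions, not from the comment above):

#!/usr/bin/env bash
# docker-prepare.sh -- stage only the manifests, preserving the workspace
# layout so a single COPY in the Dockerfile keeps the paths intact.
set -euo pipefail

DEST=docker-build          # staging directory (illustrative name)
rm -rf "$DEST"
mkdir -p "$DEST"
cp package.json yarn.lock "$DEST/"

# Mirror every workspace package.json into the staging directory,
# skipping anything under node_modules.
find packages -path '*/node_modules' -prune -o -name package.json -print |
while read -r manifest; do
  mkdir -p "$DEST/$(dirname "$manifest")"
  cp "$manifest" "$DEST/$manifest"
done

The Dockerfile can then COPY docker-build/ first, run yarn, and copy the real sources afterwards, so the install layer only invalidates when a manifest changes.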
There should be some official way to do this, even if it's just a recommended, well-designed script file to execute.
+1 I'm thinking of adding a flag like yarn --no-workspace, followed later by yarn --link-workspace, so that the prior command can be cached.
If this seems right, I'll raise a PR for the same.
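To make the idea concrete, here is how those proposed flags (hypothetical, not existing Yarn options) might slot into the Dockerfile from the original report:

FROM node:9-alpine
WORKDIR /app
COPY package.json yarn.lock ./
# Hypothetical flag: fetch/install without linking workspaces;
# cached as long as the manifests are unchanged
RUN yarn --no-workspace
COPY . .
# Hypothetical flag: cheap linking step that reruns on every source change
RUN yarn --link-workspace
RUN yarn test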
Any ideas/solutions on that?
Theoretically, there is yarn --focus that could help with this issue, but it requires the packages to have been published to a package registry. So... this can't be the solution for private-only packages inside a monorepo.
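For context, a focused install is run from inside a single workspace package, which is also where the registry limitation comes from (appA is just the example name from earlier in the thread):

# Sibling workspace packages are fetched as their published versions from
# the registry instead of being linked locally, which is why unpublished
# private packages can't use this.
cd appA
yarn install --focus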
We were running into the same issue until we came across this. After implementing this suggestion, we saw a staggering 70% reduction in build time!
There is one catch with Docker multi-stage builds and caching, though: --cache-from can only reuse layers from images you can pull, and intermediate stages aren't part of the final image. Hence it is necessary to build and push the individual stages as their own tags and pass each of them via --cache-from in subsequent builds.
Speeding up multistage builds in Docker
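Roughly, the per-stage caching dance looks like this (the myapp image name and the deps stage name are illustrative; the Dockerfile is assumed to name its dependency stage with FROM ... AS deps):

# Pull previous images so their layers are available as cache
# (|| true keeps the very first build from failing when nothing exists yet)
docker pull myapp:deps || true
docker pull myapp:latest || true

# Build and tag the intermediate stage explicitly, then the final image
docker build --target deps --cache-from myapp:deps -t myapp:deps .
docker build --cache-from myapp:deps --cache-from myapp:latest -t myapp:latest .

# Push both tags so the next CI run can pull them for --cache-from
docker push myapp:deps
docker push myapp:latest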