Yarn: [Documentation]: Recommended way to handle yarn and docker

Created on 11 Oct 2016 · 24 Comments · Source: yarnpkg/yarn

The goal of this issue is to update the README to explain how to use yarn with docker.

I want a docker image to build my node projects - and use yarn to install the packages as the installation is much faster (and more deterministic) than using npm.

One reason why yarn is fast is of course the local yarn cache. So the docker image needs to mount the yarn cache directory when building the projects. Any other hints on how to use docker and yarn?

needs-discussion triaged

All 24 comments

For actually installing Yarn in a Docker or Vagrant image, you can use the Debian package repo (assuming you're using an Ubuntu or Debian Docker image). Enabling the package repo and then running apt-get install yarn will also install Node.js as a dependency.

As for mounting the cache, that's a pretty good idea, I'm not too sure how to do it though (I'm not very familiar with Docker myself).

You wouldn't mount the Yarn cache directory. Instead, you should make sure you take advantage of Docker's image layer caching.

These are the commands I am using:

COPY package.json yarn.lock ./
RUN yarn --pure-lockfile

It would depend on the environment and approach you want to take to building your assets.

repeated docker build

@kyteague's approach is one you could take if you didn't want to use a global cache and instead just cache the project's dependencies in a higher docker layer (i.e. if your development is going to be running docker build over and over without changing dependencies). If you change the package.json, you lose the cache at the higher layer and have to do a full reinstall.
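As a rough sketch of how the layer-caching approach fits into a whole Dockerfile: the base image tag, the working directory, and the start command below are illustrative assumptions, not taken from the thread, and the image is assumed to ship with Yarn already installed.

```dockerfile
# Sketch only: base image, paths, and CMD are illustrative assumptions.
FROM node:8-alpine

WORKDIR /usr/src/app

# Copy only the dependency manifests first. This layer and the install
# layer below are reused until package.json or yarn.lock changes.
COPY package.json yarn.lock ./
RUN yarn install --pure-lockfile

# Copy the rest of the source. Edits here do not invalidate the install layer.
COPY . .

CMD ["node", "./server"]
```

With this ordering, a .dockerignore that excludes node_modules keeps the final COPY from overwriting the modules installed in the image with whatever is on the host.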

manual docker run

A more sophisticated approach for development is to run a development container (node:6 or similar with yarn installed) and mount the cache in to do the install. Note that the following uses a separate .docker-yarn-cache directory intended only for use with docker, because libs like node-sass can have c-lib issues if you install them on OSX and then try to use them on Debian, etc. Something like:

docker run -itv ~/.docker-yarn-cache:/root/yarn-cache -v `pwd`:/opt/project --workdir /opt/project node:6 yarn

Typically I combine the docker run approach with some bash and mount the caches in as volumes. Then at the end I copy my assets out of the container and ship them to S3, etc., or COPY them into my production docker image.

yarn on host

You could also run yarn on the host if its OS is similar to your container's (debian/debian, for example) and then write your Dockerfile to COPY the node_modules folder in with the rest of the project. This gives you access to your host's .yarn-cache for speed, and you don't have to deal with installing during the docker build.

docker-compose (development)

If your project has a "watch mode" script, you can use a docker-compose file to alleviate some of the concerns of the "repeated docker build" approach by running something along the lines of yarn && yarn run watch as the command with the same volume mounts as the "manual docker run" approach.


So really it depends on your goals and build environment ("watch" development with Docker for Mac/CI from a dev image to an alpine-based prod image/etc).

@kyteague and @ChristopherBiscardi, great comments! I think it would be valuable to add a page to the documentation around best practices for using Yarn in Docker. Would you like to write a page about "Using Yarn in Docker" for our documentation? The website is in a separate repo: https://github.com/yarnpkg/website

Ideally, docker could use an external cache but there's some resistance to that ( https://github.com/docker/docker/issues/17745 ), so adding a --no-cache option would be good so the docker image isn't made needlessly large by a cache that won't ever be used.

This is the fastest setup I've got so far, as the yarn-cache can be reused by many containers:

Dockerfile

FROM node:6.7.0

RUN curl -o- -L https://yarnpkg.com/install.sh | bash

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

ARG NODE_ENV
ENV NODE_ENV $NODE_ENV

COPY . /usr/src/app

EXPOSE 8000

ENTRYPOINT ["sh", "./entrypoint.sh"]

CMD ["node", "./server"]

entrypoint.sh

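# Abort if the install fails instead of starting the app with missing modules.
set -e
# Install dependencies at container start, then hand off to the CMD.
"$HOME/.yarn/bin/yarn" install --pure-lockfile
exec "$@"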

docker-compose.yml

app:
  build: .
  volumes_from:
    - yarn-cache

yarn-cache:
  image: busybox
  volumes:
    - /root/.yarn-cache

@rstuven That will do the yarn install at run time rather than build time and means that the docker image is not self contained/fully reproducible (which is the reason why we and I believe many others use docker).

@daveisfera Yes, that's why I stressed its "fastest" quality. I failed to point out that this is rather for a development workflow, where fast iterations matter most. On the other hand, the yarn.lock file should guarantee the reproducible aspect, but yes, it's not enough.

Another approach: https://medium.com/@mfornasa/using-yarn-with-docker-c116ad289d56

Am I wrong to assume that Yarn's global cache stores the package zips (which contain cross-platform code by which I mean it will be compiled at install-time)?

If the global cache only contains the downloaded archives and no build artifacts, would we not be able to at least mount the host's global cache so that the Docker container wouldn't have to download them?

@kyteague --pure-lockfile will not generate yarn.lock, which means that if I have changed package.json and rebuild the image, the old yarn.lock will be copied into the image and be out of sync with the modified package.json?

We use rocker which allows build-time mounting of a yarn-cache with a custom MOUNT command that doesn't commit the cache to the final Docker image, while using the smart Docker build layer caching as well.

Looks like there are plenty of ways now.
If anyone wants to submit a good way to do this, feel free to send a PR for the docs website https://github.com/yarnpkg/website

Another way is to MITM yarn traffic with a caching proxy and a self-signed cert, using the cafile option.
Here's a crude example: https://github.com/komlevv/docker-squid-cache
It has 2 services: a caching proxy and a root certificate server.

  • this works during build stage - you don't lose the cache when package.json changes
  • other services can also take advantage of the cache, not limited to yarn

For production

COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile --no-cache --production

Note: You don't want dev dependencies in a production image. You also need to make sure that Yarn's cache folder is not bundled into the image.

For test and CI

COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

Note: In a test / CI environment you still want to install NPM modules via Docker builder in order to utilize Docker layer caching. The next time your image is being built on a CI server, these two steps will be skipped in favor of using an existing layer, unless either package.json or yarn.lock was changed.

For local development

COPY package.json yarn.lock ./

Note: In development mode (locally) it would be faster to install NPM modules at run time; this way you can attach a volume with the Yarn cache to your container.


The approach above can be implemented by using a single Dockerfile:

FROM node:8.9.1-alpine

ARG NODE_ENV=production
ENV NODE_ENV=$NODE_ENV

# Set a working directory
WORKDIR /usr/src/app

# Install native dependencies
# RUN set -ex; \
#   apk add --no-cache ...

# Install Node.js dependencies
COPY package.json yarn.lock ./
RUN set -ex; \
  if [ "$NODE_ENV" = "production" ]; then \
    yarn install --no-cache --frozen-lockfile --production; \
  elif [ "$NODE_ENV" = "test" ]; then \
    yarn install --no-cache --frozen-lockfile; \
  fi;

...

Note: It's better to install native dependencies, if any, via a separate RUN command coming before yarn install.

docker-compose.yml:
version: '3'

volumes:
  yarn:

services:
  api:
    image: api
    build:
      context: ./
      args:
        NODE_ENV: "development"
    volumes:
      - yarn:/home/node/.cache/yarn
      - ./src:/usr/src/app/src
      - ./package.json:/usr/src/app/package.json
      - ./yarn.lock:/usr/src/app/yarn.lock
    ...

Source: Node.js API Starter Kit - Node.js ❤ GraphQL

You can also do it in a multi stage Dockerfile for production, like the following for something that runs as a static front end and doesn't need node/yarn at runtime:

FROM node:alpine
WORKDIR /usr/src/app
COPY . /usr/src/app/

# We don't need to do this cache clean, I guess it wastes time / saves space: https://github.com/yarnpkg/rfcs/pull/53
RUN set -ex; \
  yarn install --frozen-lockfile --production; \
  yarn cache clean; \
  yarn run build

FROM nginx:alpine
WORKDIR /usr/share/nginx/html
COPY --from=0 /usr/src/app/build/ /usr/share/nginx/html

Note: Maybe use --no-cache, since that seems to have been added now; then we can skip the cache clean.

Source

The only way I can find to avoid an extra 100MB of cache is to do this on the latest version of yarn (1.5.1):

RUN yarn install --frozen-lockfile --production && yarn cache clean

Just to be clear, there is no --no-cache option, not yet. So use yarn cache clean for now.

https://github.com/yarnpkg/rfcs/pull/53#issuecomment-399678507

From this comment, I like the conciseness of using /dev/shm as volatile storage for the cache.
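A minimal sketch of the /dev/shm idea, assuming /dev/shm is a tmpfs inside the build container (as is typical) and using Yarn's --cache-folder flag to point the cache there. The base image tag, the working directory, and the yarn-cache subdirectory name are assumptions for illustration.

```dockerfile
# Sketch only: base image, paths, and cache directory name are assumptions.
FROM node:8-alpine

WORKDIR /usr/src/app
COPY package.json yarn.lock ./

# Files written to a tmpfs during RUN are not committed to the image layer,
# so the cache is discarded without a separate "yarn cache clean" step.
RUN yarn install --frozen-lockfile --production --cache-folder /dev/shm/yarn-cache
```

One thing worth checking before relying on this: the shared-memory mount has a size limit, so a large dependency tree may need a bigger allowance (docker build accepts --shm-size for this).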

@jeremejevs to prevent the yarn cache from winding up in docker layers, we would need to have them together in a single RUN yarn install && yarn cache clean command in our Dockerfile, right?
I don't know for sure, but I assume a separate RUN yarn cache clean command would just mark the cache dir as deleted in a new docker layer, while the earlier RUN yarn install layer would still contain the entire cache.

@jedwards1211 That is correct, yes.

A tad off topic but I'm doing RUN yarn install --frozen-lockfile --production --no-cache && yarn build and I get a big delay before it transitions to the next layer. If I appended && rm -rf ./node_modules to the command (as I have no need of them after build) would that reduce the delay/produce a leaner layer?

AFAICT yarn still doesn't have a --no-cache option

Yes, yarn cache clean is the only way to avoid putting the cache in your layer. There's also an experimental feature that allows you to mount a cache directory during build (but I haven't had much success with making it work effectively yet), and multi-stage builds have a lot of promise but are still difficult to use.
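For reference, a sketch of what a build-time cache mount can look like, assuming the feature meant here is BuildKit's RUN --mount=type=cache (the comment does not name it). The base image tag, the working directory, and the /tmp/yarn-cache path are illustrative assumptions, and the build has to run with BuildKit enabled.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image, paths, and cache location are assumptions.
FROM node:8-alpine

WORKDIR /usr/src/app
COPY package.json yarn.lock ./

# The cache mount persists between builds on the same builder but is not
# part of the resulting image layer.
RUN --mount=type=cache,target=/tmp/yarn-cache \
    yarn install --frozen-lockfile --production --cache-folder /tmp/yarn-cache
```

Passing --cache-folder explicitly avoids having to know where Yarn puts its cache by default for the build user.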
