Moby: Multi-stage build leaves "<none>" images behind

Created on 18 Jul 2017 · 24 comments · Source: moby/moby

Description

When performing multi-stage builds, the final image is created correctly, but the intermediate stages are also left behind as untagged <none> images.

We have a project with a Java Maven module (one image) and a standard PHP+JS website as a separate image. Both of these leave intermediate images behind when built as multi-stage builds.

Reproduce with JAVA/Maven

Take a Java/Maven application and follow the same steps as described in the multi-stage build blog post:

This is the Dockerfile: one build stage and one final image

FROM maven:latest AS buildstep
WORKDIR /usr/src/rcla-backend
COPY pom.xml .
RUN mvn -B -f pom.xml dependency:resolve
COPY . .
RUN mvn -B package -DskipTests

FROM java:8-jre-alpine
WORKDIR /rcla-backend
COPY --from=buildstep /usr/src/rcla-backend/target/*.jar rcla-backend.jar
ENTRYPOINT ["java", "-jar", "/rcla-backend/rcla-backend.jar"]
CMD ["--spring.profiles.active=dev"]

Execute the command: docker build -t testjava .

Result from JAVA/Maven

After the build completes, docker images shows the final testjava image, but also an untagged <none> image left over from the build stage:

(screenshot of docker images output)
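
As a side note, a quick way to list just those leftover untagged images is the dangling filter (a standard docker CLI filter, shown here as a minimal sketch):

# list only dangling (untagged) images
docker images --filter dangling=true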

Reproduce with simple website image

In the example below, we have the source code of a website, laid out as follows:

  • PHP files in the root folder
  • JS files in the subfolder ./js
  • PNG files in the subfolder ./img

The Dockerfile is just a mockup of a multi-stage build: it has three "useless" build stages and a final stage that is actually meaningful.

Dockerfile

FROM eboraas/apache-php AS buildstep1
RUN apt-get update && apt-get -y install php5-curl
ADD ./js/*.js /var/www/html/

FROM eboraas/apache-php AS buildstep2
RUN apt-get update && apt-get -y install php5-curl
ADD *.php /var/www/html/

FROM eboraas/apache-php AS buildstep3
RUN apt-get update && apt-get -y install php5-curl
ADD ./img/*.png /var/www/html/

FROM eboraas/apache-php
RUN apt-get update && apt-get -y install php5-curl
ADD * /var/www/html/

Result of website image

The result is one final (tagged) image and three untagged <none> images:

(screenshot of docker images output)

Expected result:

Only the final image is created.

Additional information you deem important (e.g. issue happens only occasionally):

After docker system prune, those images are gone (as expected).

Output of docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:20:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:21:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 15
Server Version: 17.06.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.702GiB
Name: NLLD1-Luka.infodation.local
ID: LFGK:MH6I:6KWR:I3CF:XRIA:K6IC:3ANM:T6G7:EUJM:6S5T:XR2L:B3YA
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 10.4.1.218:5000
 10.4.1.188:8082
 10.4.1.188:8083
 127.0.0.0/8
Live Restore Enabled: false

WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior.
         Reformat the filesystem with ftype=1 to enable d_type support.
         Running without d_type support will not be supported in future releases.

Additional environment details (AWS, VirtualBox, physical, etc.):

VMWare virtual machine (internal environment) running CentOS 7

Labels: area/builder, version/17.06

Most helpful comment

This is not a bug; when using multi-stage builds, each stage produces an image. That image is stored in the local image cache, and will be used on subsequent builds (as part of the caching mechanism). You can _run_ each build-stage (and/or tag the stage, if desired).

In addition, if you want to _tag_ an intermediate stage as a final image, set the --target option when running docker build (see Specifying target build stage (--target))

If these intermediate images were purged/pruned automatically, the build cache would be gone with each build, forcing you to rebuild the entire image each time.

I'm closing this issue because this is not a bug, but feel free to continue the conversation, or to add more information if needed.

All 24 comments

This is not a bug; when using multi-stage builds, each stage produces an image. That image is stored in the local image cache, and will be used on subsequent builds (as part of the caching mechanism). You can _run_ each build-stage (and/or tag the stage, if desired).

In addition, if you want to _tag_ an intermediate stage as a final image, set the --target option when running docker build (see Specifying target build stage (--target)), as sketched below.
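
For illustration, a minimal sketch using the Java Dockerfile from the report above (the stage name buildstep comes from that example; the tag name is made up):

# build and tag only the first stage
docker build --target buildstep -t testjava:buildstep .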

If these intermediate images were purged/pruned automatically, the build cache would be gone with each build, forcing you to rebuild the entire image each time.

I'm closing this issue because this is not a bug, but feel free to continue the conversation, or to add more information if needed.

Do you think it would be a good idea to give us a switch, so we can decide whether to keep these intermediate images?
On some builders that only run multi-stage builds, these intermediate images are never run, but have to be discarded for security reasons.

@thaJeztah
thanks for the reply ... as @wrfly says, that would be an interesting feature

@lukastosic @wrfly
I agree that this behavior should be controlled by the user. For now, I'm running docker image prune -f after my docker build -t app . command to clean up those intermediate images.
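
As a sketch, the full sequence is just (the image name app is a placeholder):

# build the final image, then force-remove all dangling images
# (including the intermediates this build just produced)
docker build -t app .
docker image prune -f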

So there is no way to run prune without deleting those intermediate images after a docker build of a multi-stage Dockerfile?

@albertorm95 can you elaborate? Do you mean: run docker image prune afterwards, and have it remove all untagged, unused images _unless_ those were the result of a multi-stage build?

@thaJeztah Yes, that prune:

remove all untagged, unused images unless those were the result of a multi-stage build

Because I have a script that runs every hour or so, and that script does docker image prune, but this removes the images that were needed to create the multi-stage build; only the final image, which has a name:tag, survives.
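
For illustration only (the exact schedule is an assumption, not from the comment), such a periodic cleanup could be a single crontab entry:

# every hour, remove dangling images without prompting
0 * * * * docker image prune -f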

I'd love to see a switch added to auto-clean the intermediate build artifacts. We've got a bunch of automated builds (always from source, for unimportant reasons). We use multi-stage builds to build from source and then create a much smaller runtime image. The build image is huge compared to the runtime image. So my top-level build scripts always run this command after my docker build commands:

docker rmi $(docker images -q -f dangling=true)

Cheers!

@satchm0h @thaJeztah @albertorm95

After working on several Node (Angular) projects, I fully appreciate keeping intermediate images :)

For example we do multi-stage build for angular like this:

# Just do NPM INSTALL
FROM node:6-alpine AS dependencies
WORKDIR /code
COPY package.json .
RUN npm install 

# Compile Angular
FROM dependencies AS build
COPY . .
RUN yarn webpack:prod

# Prepare final image
FROM nginx:alpine AS release
COPY --from=build /code/target/www /usr/share/nginx/html
COPY ./nginx/site.conf /etc/nginx/conf.d/default.conf

The resulting final nginx image with the website is ~38MB, but the intermediate image is ~800MB.

Now the intermediate image "saves the day": usually, for long stretches (2-3 sprints), we don't update package.json, so all node modules stay the same. Here the intermediate image kicks in, and npm install is not repeated every time. (Even though we use Nexus as a repository manager, it still takes a long time to finish.)

The intermediate image is rebuilt only when we change package.json, and as said, that usually happens only every 2-3 sprints; the many builds in between finish quicker.

So, very useful indeed, but also very annoying, because we can't control when we want it and when we don't. After several sprints we end up with several "old" intermediate images that will never be used again, and then we either remove them manually or run docker system prune -f (or docker image prune; we are fine with system prune because we don't use those Docker hosts for anything except builds).

@thaJeztah, thanks for the hint about the --target + --tag combination.
It's a useful workaround, but it makes the user run docker build several times with different values for those parameters to give a name to every stage image (see the sketch below).
Subsequent runs are fast thanks to layer caching, but this is still a suboptimal scenario.
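
Sketched out against the Angular Dockerfile above (the repository name app is made up for the example):

# one build per stage, each with its own --target and tag
docker build --target dependencies -t app:dependencies .
docker build --target build -t app:build .
# final build; earlier stages are served from the layer cache
docker build -t app:release .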

I'm thinking about a new docker build flag that enables automatic image naming based on the aliases provided with the FROM .. AS .. instruction.
Would you mind?

Hi.

The problem with keeping intermediate build images as dangling images is that whenever someone runs e.g. docker system prune to clean up dangling images (our CI runs that command every hour), the cached layers are gone.
And because the cached layers are gone, the next build takes more time.
Also, seeing multiple dangling images appear seemingly out of nowhere is very strange; there is no hint of where those images came from.

It would be nice to be able to tell docker to either REMOVE the intermediate images or TAG them.
As @eugene-bright suggested, the tag names for the intermediate images could come from the aliases used in the Dockerfile.
IMHO the default behaviour could be to tag all intermediate images.
Additionally, a flag could be passed to docker to remove intermediate images automatically after the build is done, OR to leave them as dangling images.
Having dangling images can still be useful (for saving build time) in specific cases, e.g. on CI servers where someone wants caching for a limited time, until a cleaning job comes along and removes the dangling images.

Let's say we have the following Dockerfile:

FROM docker.io/node:8.12.0 AS build
WORKDIR /usr/local/src
COPY package.json .
COPY yarn.lock .
RUN yarn install --production --frozen-lockfile
COPY . .

FROM docker.io/node:8.12.0-alpine
WORKDIR /usr/local/src
COPY --from=build /usr/local/src .
CMD node src/app.js

We could use different flags provided to docker build, with the following results:

  1. By default: tag the intermediate image with its alias from the Dockerfile (in this case the build alias):
$ docker build --tag user/app:1.0.0 .
Successfully built d069a7d27039 (intermediate)
Successfully tagged user/app:build
Successfully built 5e5c3d0246ae
Successfully tagged user/app:1.0.0

$ docker images
REPOSITORY  TAG    IMAGE ID     CREATED        SIZE
user/app    1.0.0  5e5c3d0246ae 8 seconds ago  70MB
user/app    build  d069a7d27039 10 seconds ago 725MB
  2. Provide an --intermediate-images=dangling flag to leave intermediate images as dangling images:
$ docker build --tag user/app:1.0.0 --intermediate-images=dangling .
Successfully built d069a7d27039 (intermediate)
Successfully built 5e5c3d0246ae
Successfully tagged user/app:1.0.0
$ docker images
REPOSITORY  TAG    IMAGE ID     CREATED        SIZE
user/app    1.0.0  5e5c3d0246ae 8 seconds ago  70MB
<none>     <none>  d069a7d27039 10 seconds ago 725MB
  3. Provide an --intermediate-images=remove flag to remove intermediate images:
$ docker build --tag user/app:1.0.0 --intermediate-images=remove .
$ docker images
REPOSITORY  TAG    IMAGE ID     CREATED        SIZE
user/app    1.0.0  5e5c3d0246ae 8 seconds ago  70MB

When using the new BuildKit-based builder (currently opt-in, but it can be enabled either through the daemon configuration or with the DOCKER_BUILDKIT=1 environment variable), the builder no longer uses images for the build cache, which means that intermediate stages no longer show up as untagged images.

BuildKit also features (configurable) garbage collection for the build cache, which allows much more flexible handling of the cache (see https://github.com/moby/moby/pull/37846).
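
For illustration, such a garbage-collection policy is configured in /etc/docker/daemon.json; a minimal sketch (the storage limit is an arbitrary example value):

{
  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "10GB"
    }
  }
}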

I don't think intermediate stages should automatically be tagged, as in many cases those intermediate stages are not used as actual images. However, having a way to build _multiple_ targets in a single build may be something to think about. Something like:

docker build --target stage1=image:tag1, stage3=image:tag3 ...

but perhaps alternative syntaxes could be thought of, also in light of buildkit supporting different output-formats than just docker images

The following workaround can help remove just the temporary stages:

  1. Add something like LABEL autodelete="true" to the stages you want deleted
  2. After the build, execute docker rmi $(docker images -q -f "dangling=true" -f "label=autodelete=true")

e.g.

FROM mcr.microsoft.com/dotnet/core/sdk:2.2 AS build
LABEL autodelete="true"
...

FROM mcr.microsoft.com/dotnet/core/aspnet:2.2-alpine3.9 AS runtime
LABEL description="My app"
COPY --from=build ...
...

and

# collect dangling images from stages labelled for deletion
list=$(docker images -q -f "dangling=true" -f "label=autodelete=true")
if [ -n "$list" ]; then
    docker rmi $list
fi

@dluc Thanks for the workaround. It is very disappointing that there is nothing built into the build command. Multi-stage builds are great and help with quite a few use cases. One of those use cases is security, where files need to be omitted from the final image. Another is the Angular case discussed above, where intermediates need to stay around. Both of these cases seem quite valid, and I happen to use both. I don't understand the reluctance to add a CLI option that lets the user control the lifecycle of the intermediate images. From Stack Overflow and several other threads, it's obvious that people are having to create workarounds like the one @dluc posted. Why not solve the problem properly?

What annoys me is docker images not showing where those images come from. If I use something like AS builder, I would expect them to be tagged with builder#1 or something similar. Also, if I make a change to a layer, the old image stays and a new one is generated, so in the end I have tons of dangling images and I don't even know which project they come from.

This won't be fixed for the classic builder; I'd recommend trying the next-generation builder (BuildKit), which is still opt-in (the tracking issue for making it the default is https://github.com/moby/moby/issues/40379)

The easiest way to enable BuildKit is to set the DOCKER_BUILDKIT=1 environment variable in the shell where you run your build: https://github.com/moby/moby/issues/34151#issuecomment-430695846

BuildKit uses a build cache that's separate from the image store (and it can be cleaned up separately using docker builder prune).
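
A minimal sketch of both opt-in routes and the separate cache cleanup (the image name app is just a placeholder):

# opt in per build via the environment variable
DOCKER_BUILDKIT=1 docker build -t app .

# or enable it daemon-wide in /etc/docker/daemon.json:
#   { "features": { "buildkit": true } }

# the BuildKit build-cache is pruned separately from images
docker builder prune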

This is not a bug; when using multi-stage builds, each stage produces an image. That image is stored in the local image cache, and will be used on subsequent builds (as part of the caching mechanism). You can _run_ each build-stage (and/or tag the stage, if desired).

In addition, if you want to _tag_ an intermediate stage as a final image, set the --target option when running docker build (see Specifying target build stage (--target))

If these intermediate images were purged/pruned automatically, the build cache would be gone with each build, forcing you to rebuild the entire image each time.

I'm closing this issue because this is not a bug, but feel free to continue the conversation, or to add more information if needed.

However, there is a new question: if I change the way the intermediate images are built, or some dependent files change, many new <none> images are generated. These redundant <none> images need to be cleaned up.
Maybe there could be a command flag to decide whether to delete the intermediate images?

docker system prune or docker image prune will remove those images. Or enable BuildKit (https://github.com/moby/moby/issues/34151#issuecomment-663120902); then those images are not created in the first place (at least not as part of docker build)

The problem I'm having is that every time I run docker build after changing a source file, it generates a <none> image, but there's no way, that I'm aware of, to delete only the old images (while keeping the latest one for caching purposes).

Am I missing something?

Edit: what I want is to be able to delete all old and unused intermediate images from previous builds, while keeping the most recent one for caching.

@HeCorr no, there's no "one step" solution for keeping only the "last" intermediate stages when using the classic builder

if possible, I would recommend building with BuildKit enabled (DOCKER_BUILDKIT=1), which uses a separate store for the build cache and allows you to clean up the build cache while preserving (e.g.) the cache from the last XX hours (docker builder prune --filter until=24h); I _think_ it also defaults to preserving "active" build cache (so it only removes the build cache of older builds)

@thaJeztah okay, I'll give that a try. thanks ;)

I actually came up with a dirty trick that works sometimes: put LABEL package=pkgname on both FROM stages, run docker build (...), then docker rmi $(docker images --filter label=package=pkgname -q | sed 1,2d) (this removes all matching images except the two newest ones).
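
Spelled out, the trick looks something like this (pkgname and the base images are placeholders):

# in the Dockerfile, label both stages:
#   FROM node:16 AS builder
#   LABEL package=pkgname
#   ...
#   FROM alpine
#   LABEL package=pkgname
#   ...

# after docker build, remove all labelled images except the two newest
# (docker images lists newest first; sed 1,2d drops the first two IDs)
docker rmi $(docker images --filter label=package=pkgname -q | sed 1,2d)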

@thaJeztah DOCKER_BUILDKIT=1 seems to solve my problem. thank you very much :)

actually... upon further testing, I concluded that BuildKit shows the same space inefficiency.

here I used ncdu to analyze usage on /var/lib/docker:

all pruned ----- 2.9 GiB
first build ---- 8.0 GiB
second build --- 9.7 GiB
third build ---- 11.4 GiB
image prune ---- 11.4 GiB

@HeCorr did you run docker builder prune to clean up the build-cache? Note that automatic garbage collection is also possible (but documentation is still missing at the moment; see https://github.com/docker/cli/issues/2325)
