As suggested by @cpuguy83 in https://github.com/docker/docker/issues/3156
here is the use case for a flexible -v option at build time.
When building a Docker image I need to install a database and an app. It's all wrapped up in two tarballs: one for the DB and one for the app that needs to be installed in it (schema, objects, static data, credentials, etc.). The whole solution is then run via a shell script that handles several shell variables and tunes OS credentials and other things accordingly.
When I explode the above tarballs (or use the Dockerfile ADD directive) the whole thing bloats up to about 1.5GB(!). Not ideal, as you can imagine.
I would like to have a '-v /distrib/ready2installApp:/distrib' directive possible at build time (as it is possible today with docker run)
but
Could we have an --unmount-volume option that I can run at the end of the Dockerfile?
or
Given how VOLUME works right now in a Dockerfile, maybe we need a new Dockerfile directive for a temporary volume that people can use while installing? I think the Puppet example supplied by @fatherlinux was along similar lines...
or
Whatever you guys can think of.
The objective is to avoid carrying around all that dead weight, which is useless for a deployed app or service. However, that dead weight is necessary at install time. Not everybody has a simple "yum install" from the official repositories. :)
thank you very much
I'm looking for a similar solution.
Recently the enterprise I work for enabled the Zscaler proxy with SSL inspection, which implies having certificates installed and some environment variables set during build.
A temporary solution was to create a new Dockerfile with the certificates and environment variables set. But that doesn't seem reasonable in the long term.
So my first thought was to set up a transparent proxy for HTTP and HTTPS, but again I need to pass a certificate during build.
The ideal scenario is that, with the same Dockerfile, I would be able to build my image on my laptop, at home, and at the enterprise.
# Enterprise
$ docker build -v /etc/ssl:/etc/ssl -t myimage .
# Home
$ docker build -t myimage .
I have a slightly different use case for this feature - Caching packages which are downloaded / updated by the ASP.Net 5 package manager. The package manager manages its own cache folder so ultimately I just need a folder which I can re-use between builds.
I.e:
docker build -v /home/dokku/cache/dnx/packages:/opt/dnx/packages -t "dokku/aspnettest" .
@yngndrw what you propose would be OK for me too, i.e., we need to mount extra resources at build time that are not necessary at run time because they have already been installed in the container.
FWIW I saw somewhere in these pages somebody saying something along the lines of (and I hope I'm paraphrasing it right) "resolve your compilation issue on a similar host machine, then just install the deployable artifact or exe in the container".
I'm afraid it's not that simple, guys. At times I need to install in /usr/bin, but I also need to edit some config file. I check for the OS I'm running on, the kernel params I need to tune, the files I need to create depending on variables or manifest build files. There are many dependencies that are just not satisfied with a simple copy of a compiled product.
I re-state what I said when I opened the issue: there is a difference between a manifest declaration file and its process, and the run-time of an artifact.
If we truly believe in infrastructure-as-code, and furthermore in immutable infrastructure, which Docker itself is promoting (and I like it, btw), then this needs to be seriously considered IMO (see the bloating in post 1 herewith).
Thank you again
Another use case that is really interesting is upgrading software. There are times, like with FreeIPA, when you should really test with a copy of the production data to make sure that all of the different components can cleanly upgrade. You still want to do the upgrade in a "build" environment. You want the production copy of the data to live somewhere else, so that when you move the new upgraded versions of the containers into production, they can mount the exact data that you did the upgrade on.
Another example, would be Satellite/Spacewalk which changes schema often and even changed databases from Oracle to Postgresql at version 5.6 (IIRC).
There are many, many scenarios when I temporarily need access to data while doing an upgrade of software in a containerized build, especially with distributed/micro services....
Essentially, I am now forced to do a manual upgrade by running a regular container with a -v bind mount, then doing a "docker commit." I cannot understand why the same capability wouldn't be available with an automated Dockerfile build?
Seconding @yngndrw pointing out caching: the exact same reasoning applies to many popular projects such as Maven, npm, apt, rpm -- allowing a shared cache can dramatically speed up builds, but must not make it into the final image.
I agree with @stevenschlansker. There can be many reasons to attach a cache volume, or some kind of data a few gigabytes in size, which must be present (in parsed state) in the final image, but not as raw data.
I've also been bitten by the consistent resistance to extending docker build to support the volumes that can be used by docker run. I have not found the 'host-independent builds' mantra to be very convincing, as it only seems to make developing and iterating on Docker images more difficult and time-consuming when you need to re-download the entire package repository every time you rebuild an image.
My initial use case was a desire to cache OS package repositories to speed up development iteration. A workaround I've been using with some success is similar to the approach suggested by @fatherlinux, which is to just give up wrestling with docker build and the Dockerfile altogether, and start from scratch using docker run on a standard shell script followed by docker commit.
As a bit of an experiment, I extended my technique into a full-fledged replacement for docker build using a bit of POSIX shell scripting: dockerize.
If anyone wants to test out this script or the general approach, please let me know if it's interesting or helpful (or if it works at all for you). To use, put the script somewhere in your PATH and add it as a shebang for your build script (the #! thing), then set relevant environment variables before a second shebang line marking the start of your Docker installation script.
FROM, RUNDIR, and VOLUME variables will be automatically passed as arguments to docker run.
TAG, EXPOSE, and WORKDIR variables will be automatically passed as arguments to docker commit.
All other variables will be evaluated in the shell and passed as environment arguments to docker run, making them available within your build script.
For example, this script will cache and reuse Alpine Linux packages between builds (the VOLUME mounts a home directory to CACHE, which is then used as a symlink for the OS's package repository cache in the install script):
#!/usr/bin/env dockerize
FROM=alpine
TAG=${TAG:-wjordan/my-image}
WORKDIR=/var/cache/dockerize
CACHE=/var/cache/docker
EXPOSE=3001
VOLUME="${HOME}/.docker-cache:${CACHE} ${PWD}:${WORKDIR}:ro /tmp"
#!/bin/sh
ln -s ${CACHE}/apk /var/cache/apk
ln -s ${CACHE}/apk /etc/apk/cache
set -e
apk --update add gcc g++ make libc-dev python
[...etc etc build...]
So, after meeting the French contingent :) from Docker at MesoCon last week (it was a pleasure guys) I was made aware they have the same issue in-house and they developed a hack that copies over to a new slim image what they need.
I'd say that hacks are not welcome in the enterprise world ;) and this request should be properly handled.
Thank you for listening guys...
I'm also in favor of adding a build-time -v flag to speed up builds by sharing a cache directory between them.
@yngndrw I don't understand why you closed two related issues. I read your #59 issue and I don't see how it relates to this one. In some cases containers become super-bloated with things that are not needed at run time. Please read the 1st post.
I hope I'm not missing something here... as it has been a long day :-o
@zrml Issue https://github.com/aspnet/aspnet-docker/issues/59 was related to the built-in per-layer caching that docker provides during a build to all docker files, but this current issue is subtly different as we are talking about using host volumes to provide dockerfile-specific caching which is dependent on the dockerfile making special use of the volume. I closed issue https://github.com/aspnet/aspnet-docker/issues/59 as it is not specifically related to the aspnet-docker project / repository.
The other issue that I think you're referring to is issue https://github.com/progrium/dokku/issues/1231, which was regarding the Dokku processes explicitly disabling the built-in docker layer caching. Michael made a change to Dokku in order to allow this behaviour to be configurable and this resolved the issue in regards to the Dokku project / repository, so that issue was also closed.
There is possibly still a Docker-related issue that is outstanding (I.e. Why was Docker not handling the built-in layer caching as I expected in issue https://github.com/aspnet/aspnet-docker/issues/59), but I haven't had a chance to work out why that is and confirm if it's still happening. If it is still an issue, then a new issue for this project / repository should be raised for it as it is distinct from this current issue.
@yngndrw exactly, so we agree this is different and known at docker.com, so I'm re-opening it if you don't mind... well, I cannot. Would you mind doing it, please?
I'd like to see some comments from our colleagues in SF at least before we close it
BTW I was asked by @cpuguy83 to open a user case and explain it all, from log #3156
@zrml I'm not sure I follow - Is it https://github.com/aspnet/aspnet-docker/issues/59 that you want to re-open ? It isn't an /aspnet/aspnet-docker issue so I don't think it's right to re-open that issue. It should really be a new issue on /docker/docker, but would need to be verified and would need re-producible steps generating first.
no, no.. this one #14080 that you closed yesterday.
This issue is still open ?
@yngndrw I believe I mis-read the red "closed" icon. Apologies.
Heartily agree that build time -v would be a huge help.
Build caching is one use case.
Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
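For context, deleting the key in a later step doesn't help, since it still lives in an earlier layer; a minimal sketch of the problem (repo URL and paths are made up):
# the key is baked into the COPY layer even though a later RUN removes it
COPY id_rsa /root/.ssh/id_rsa
RUN git clone git@github.com:example/private-repo.git /src \
    && rm /root/.ssh/id_rsa   # too late: the key is still extractable from the image history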
I'm commenting here because this is hell in a corporate world.
We have an SSL-intercepting proxy; while I can direct traffic through it, heaps of projects assume they have good SSL connections, so they die horribly.
Even though my machine (and thus the docker builder) trusts the proxy, docker images don't.
Worse still, the best practice is now to use curl inside the container, which is painful: I have to modify Dockerfiles to make them even build. I could mount the certificates with a -v option and be happy.
That being said, it's less the fault of docker and more the fault of package managers using https when they should be using a system similar to how apt-get works, as that is still secure and verifiable, and also cacheable by an http proxy.
@btrepp thank you for another good use case.
I can think of another situation.
One of the things I would like to do with my dockerfiles is not ship the build tools with the "compiled" docker image. There's no reason a C app needs gcc, nor a ruby app bundler, in the image, but using docker build currently you will have this.
An idea I've had is specifying a dockerfile that runs multiple docker commands when building inside it. Pseudo-ish dockerfiles below.
Docker file that builds others
FROM dockerbuilder
RUN docker build -t docker/builder myapp/builder/Dockerfile
RUN docker run -v /app:/app builder
RUN docker build -t btrepp/myapplication myapp/Dockerfile
btrepp/myapplication dockerfile
FROM debian:jessie+sayrubyruntime
ADD . /app # (this is code that has been built using the builder dockerfile)
ENTRYPOINT ["rails s"]
Here we have a temporary container that does all the bundling install/package management and any build scripts, but it produces the files that the runtime container needs.
The runtime container then just adds the results of this, meaning it shouldn't need much more than ruby installed. In the case of say GCC or even better statically linked go, we may not need anything other than the core OS files to run.
That would keep the docker images super light.
The issue here is that the temporary builder container would go away at the end, meaning it would be super expensive without the ability to load a cache of sorts; we would be grabbing debian:jessie a whole heap of times.
I've seen people do similar techniques to this, but using external http servers to add the build files. I would prefer to keep it all being built by docker, though there is possibly a way of using a docker image to do this properly, using run and thus being able to mount volumes.
Here is another example. Say I want to build a container for systemtap that has all of the debug symbols for the kernel in it (which are Yuuuuge). I have to mount the underlying /lib/modules so that the yum command knows which RPMs to install.
Furthermore, maybe I would rather have these live somewhere other than in the 1.5GB image (from the debug symbols)
I went to write a Dockerfile, then realize it was impossible :-(
docker run --privileged -v /lib/modules:/lib/modules --tty=true --interactive=true rhel7/rhel-tools /bin/bash
yum --enablerepo=rhel-7-server-debug-rpms install kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r)
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
52dac30dc495 rhel7/rhel-tools:latest "/bin/bash" 34 minutes ago Exited (0) 15 minutes ago dreamy_thompson
docker commit dreamy_thompson stap:latest
I'd like to repeat my use case here from #3949 as that bug has been closed for other reasons.
I'd really like to sandbox proprietary software in docker. It's illegal for me to host it anywhere, and the download process is not realistically (or legally) able to be automated. In total, the installers come to about 22GB (and they are getting bigger with each release). I think it's silly to expect that this should be copied into the docker image at build time.
Any news in this needed feature?
thank you
+1 for this feature!
Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
This is our use case as well (ssh keys rendered using tmpfs on the host in this case).
Another use case for this is a local cache of the node_modules directory on a CI server to reduce build times.
npm install is very slow, and even in the current "best" case, where the package.json is ADDed to the image, npm install is run, and only then are the actual project sources added and built, any change to package.json means all dependencies have to be redownloaded again.
See npm/npm#8836 for an issue about this on the Node/npm side.
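A minimal sketch of that "best case" layering pattern (image tag and paths are illustrative):
FROM node
WORKDIR /app
# this layer is reused until package.json changes...
COPY package.json .
RUN npm install
# ...but any change to package.json invalidates it and forces a full re-download
COPY . .
RUN npm run build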
Related aspnet-docker issue regarding slow package restoration and the resulting image size of caching the current packages in the layer. Would be much better to use a mounted volume for the caching of the package.
https://github.com/aspnet/aspnet-docker/issues/123
This isn't a language-specific issue, it will affect many people given that package managers are now an accepted standard.
The OP has nailed the issue on the head, in that "docker build -v" would greatly help decoupling the build process from the runtime environment.
I've seen several projects which now build "Mulberry harbours" which are then used to build the actual docker that is then pushed/distributed. This is overly complex from both an administration and compute resource perspective, which in turn translates to slower CI and unit testing, and overall a less productive development workflow.
I've been thinking about this, and the other option I can think of is the ability to mark layers as "src" layers.
Something along the lines of those layers only being accessible during a docker build, but not pulled in the resulting image file.
This way docker can cache earlier layers/images, temporary build artifacts, but these aren't required to utilize the final image.
Eg.
FROM ubuntu
RUN apt-get install gcc
ADDPRIVATE . /tmp/src <--these can be cached by docker locally
RUNPRIVATE make <-- basically these layers become scoped to the current build process/dockerfile
RUN make install <--result of this layer is required.
Of course this means you would need to know what you are doing better, as you could very well leave critical files out.
@yngndrw
A much better solution for situations like netcore would be for them to not use HTTPS for package management; then it's trivial to set up iptables+squid as a transparent caching proxy for docker builds. My personal opinion is that these package managers should up their game; they are terrible to use in corporate environments due to ssl re-signing, whereas things such as apt-get work perfectly fine and are already cacheable with iptables+squid for docker.
I can also see a downside to using build-time volumes: dockerfiles won't be as reproducible, and it's going to require extra setup outside of docker build -t btrepp/myapp .; it's also going to make automated builds on Docker Hub difficult.
@btrepp: I like your suggestion. For my use cases I could even live with a hardcoded (I know it's generally a bad thing) TMP dir that Docker tells us about, so that when the final artifact is built from all the layers, Docker knows it can forget/leave out the one mounted on /this_is_the_tmp_explosion_folder_that_will_be_removed_from_your_final_container_image
Easy enough....
@btrepp I quite like your source layer idea.
However regarding package managers not using SSL, I would have to disagree.
If you were to want to cache packages like that, then you should probably use a (Local) private package feed instead which mirrors the official source. Reverting to HTTP seems like a bad idea to me, especially given that a lot of package managers don't seem to sign their packages and therefore rely on HTTPS.
There is a tool, grammarly/rocker, which can be used until this issue is fixed.
@yngndrw
My point is that the local proxy etc. is a problem that was solved long ago. Package managers only need verification, they don't need privacy. Using https is a lazy way of providing verification, but it comes with privacy attached.
There's zero reason "super_awesome_ruby_lib" needs to be private when being pulled down via http(s). The better way would be for ruby gems to have a keyring, or even a known public key, and for it to sign packages. This is more or less how apt-get works, and allows standard http proxies to cache things.
Regarding a local private package feed, docker doesn't even support this well itself. There's zero way of disabling the standard feed, and it _rightly_ fails if the https certificate is not in the cert store. I'm pretty sure docker always wants to at least check the main feed when pulling images too. Afaik the rocket/rkt implementation was going to use signing+http to get container images.
If the main motivation for build-time volumes is just caching of packages, then I think pressure should be placed on the package managers to better support caching, rather than compromising some of the automation/purity docker currently has.
To be clear, I'm not advocating that package managers switch to just using http and drop https. They do need verification of packages to protect against man-in-the-middle attacks. What they don't need is the privacy aspect that using https as a "security catch-all sledgehammer" offers.
That's a really narrow view. You're asking the entire universe of package managers to change how they behave to fit Docker's prescription of how they think applications will be built.
There's also a ton of other examples of why this is necessary in this thread. Saying "well you should just change how all the tools you use to build your applications work" doesn't drive the problem away, it'll only drive the users away.
(I also strongly disagree with Docker's attachment to the public registry -- I would very much prefer to forbid access to the public registry, and only allow our internal one to be used. But that's a different subject entirely.)
For me, I also need docker build -v.
In our case we want to build an image which consists of a pre-configured installation of the concerned product, and the installer is over 2GB. Not being able to mount a host volume, we're not able to build the image with the installer even though we've already downloaded it in the host OS, for which we can use various tools/protocols, say a proxy with https cert/auth, or maybe even BitTorrent.
As a workaround, we have to use wget to re-download the installer during docker build, which is a much more restricted environment, much less convenient, more time consuming, and error prone.
Also because of the flexibility of the product installation/configuration options, it makes much more sense for us to ship the images with the product pre-installed, rather than shipping an image merely with the installer.
@thaJeztah any chance of this happening?
Fwiw this is the sole reason I don't (or really, can't) use docker
We carry a patch in Red Hat versions of docker that includes the -v option. But the true solution to this would be to build new and different ways to build OCI container images other than docker build.
@rhatdan RHEL or Fedora?
We also have implemented the -v option of docker build in our internal version of docker at resin.io. You can find the diff here https://github.com/resin-io/docker/commit/9d155107b06c7f96a8951cbbc18287eeab8f60cc
@rhatdan @petrosagg can you create a PR for this?
@jeremyherbert the patch is in the docker daemon that comes in all recent versions of RHEL, CentOS, and Fedora...
@graingert We have submitted it in the past and it has been rejected.
@rhatdan do you have a link to it?
@runcom Do you have the link?
@thaJeztah is this something you guys would have rejected?
Here's a list of existing issues that have been closed or not responded to:
https://github.com/docker/docker/issues/3949
https://github.com/docker/docker/issues/3156
https://github.com/docker/docker/issues/14251
https://github.com/docker/docker/issues/18603
Info about the Project Atomic patches used in RHEL/CentOS/Fedora can be found at:
http://www.projectatomic.io/blog/2016/08/docker-patches/
@daveisfera looks like they only add read-only volumes, not RW volumes, so it won't work for @yngndrw's and my use case.
@graingert Why do you need RW volumes? I do understand read-only as a work-around for certain cases.
Testing schema migrations would be one good reason...
@cpuguy83 Another use-case for RW would be ccache
@fatherlinux I'm not sure I follow. Why would you need a volume for this? Also why must it be done during the build phase?
I have a slightly different use case for this feature - Caching packages which are downloaded / updated by the ASP.Net 5 package manager. The package manager manages its own cache folder so ultimately I just need a folder which I can re-use between builds.
I would bind mount for example:
docker build -v /home/jenkins/pythonapp/cache/pip:/root/.cache/pip -t pythonapp .
docker build -v /home/jenkins/scalaapp/cache/ivy2:/root/.ivy2 -t scalaapp .
Because there are many times that a schema migration has to be done when the software is installed. If you run read-only containers, you should never be installing software at any time other than when you are in the build phase.....
I know that the contents of these directories will not cause the build to be host-dependent (missing these mounts will cause the build to work anyway, just slower)
NFS solved this like 30 years ago...
NFS solved this like 30 years ago...
Not a helpful comment
@graingert sorry, that seriously came off wrong. I was trying to respond too quickly and did not give enough context. In seriousness, we are looking at NFS in combination with CRI-O to solve some of these types of problems.
Both image registries and builds have a lot of qualities in common. What you are talking about is basically a caching problem. NFS, and particularly its built-in caching, can make builds host-independent and handle all of the caching for you.
Hence, even with a -v build-time option, a build doesn't have to be locked to only one host. It might not be Internet-scale independent, but it's quite enough for the many people who confine their build environment to a single site or location.
@fatherlinux I'd use gitlab or travis caching to take the cache directory and upload/download into S3
@graingert yeah, but that only works for certain types of data/apps, and also only at the bucket level, right, not at the POSIX metadata and block level. For certain types of front-end and middleware apps, no problem. For a database schema migration, you kind of need to test ahead of time and have the cache local for speed, and it typically needs to be POSIX.
Imagine I had a MySQL Galera cluster with 1TB of data and I want to do an upgrade and they are all in containers. Containerized/orchestrated multi-node, sharded Galera is really convenient. I don't want to have to manually test a schema migration during every upgrade.
I want to snapshot the data volume (a PV in the Kube world), then expose it to a build server, then test the upgrade and schema migration. If everything works right and tests pass, then we build the production containers and let the schema migration happen in production....
@graingert sorry, forgot to add: then discard the snapshot which was used in the test run... I don't want to orchestrate a build and test event separately, though that would be possible...
@fatherlinux I think that's an orthogonal use case...
@graingert not a useful comment. Orthogonal to what? Orthogonal to the request for a -v during build which is what I understood this conversation to be about?
There are a few different uses I see for this flag.
The latter two use cases could be solved more cleanly with two new keywords.
BUILDCONSTFILE <path>
would run a COPY <path> before each RUN and delete <path> from the image afterwards.
TEST <cmd> WITH <paths>
would COPY the paths, run the command, and on a 0 exit status continue the build from the parent image; otherwise it would halt the build.
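To illustrate, a Dockerfile using the proposed keywords might read something like this (hypothetical syntax, file names made up):
FROM ubuntu
RUN apt-get install -y gcc
# copied in before each RUN and deleted from the image afterwards
BUILDCONSTFILE installer.tar.gz
RUN tar -xzf installer.tar.gz -C /opt && /opt/install.sh
# copies tests/ in and runs the command; exit 0 continues from the parent image, otherwise the build halts
TEST ./run-tests.sh WITH tests/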
Personally I think TEST ... WITH is better handled in another CI step that tests your container as a whole
Let me preface with this: I _think_ I'm ok with adding --mount to build ("-v" probably not so much). Not 100% sure on the implementation, how caching is handled (or not handled), etc.
For the docker project what we do is build a builder image.
It basically has everything we need, copies code in, but does not actually build docker.
We have a Makefile that orchestrates this. So make build builds the image, make binary builds the binary with build as a dependency, etc.
Making a binary runs the build image and does the build; with this we can mount in what we need, including package caches for incremental builds.
Most of this is pretty straightforward and easily orchestrated.
So there are certainly ways to handle this case today; docker alone just can't handle 100% of it (and that's not necessarily a bad thing), and you'll have to make this work with your CI system.
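A rough sketch of that kind of Makefile (target names, image names and mount paths are illustrative, not the actual Docker project Makefile; recipe lines are tab-indented as make requires):
# build the builder image: toolchain plus the source copied in, but no actual build
build:
	docker build -t myproject-builder -f Dockerfile.build .

# run the builder image to produce the binary, mounting package caches and an output dir
binary: build
	docker run --rm \
		-v "$(CURDIR)/.cache:/root/.cache" \
		-v "$(CURDIR)/bundles:/output" \
		myproject-builder ./build.sh binary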
@cpuguy83 I think this would nail most of my use cases. Just so I understand, do you mean --mount to mean read only? and -v to be read/write?
@cpuguy83 we are also mostly building "builder" images which IMHO is becoming a more and more common pattern...
@fatherlinux swarm services and now (for 1.13) docker run support --mount, which is much more precise and flexible: https://docs.docker.com/engine/reference/commandline/service_create/#/add-bind-mounts-or-volumes
Looks like the docs are missing the 3rd type of mount, tmpfs.
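For reference, the run-time --mount syntax looks roughly like this (image and paths are just an example):
# read-only bind mount of a host cache directory into a container
docker run --rm \
  --mount type=bind,source="$HOME/.m2",target=/root/.m2,readonly \
  maven:3 mvn -version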
Ahh, very cool, thank you...
@cpuguy83 we're also using the builder pattern a lot and we have a need for caching that is not persisted in the image and also survives layer invalidation.
We build Yocto images and we have a shared sstate cache on NFS storage. Another use case is an npm cache, so that you can invalidate the whole RUN npm install layer but recalculate it faster thanks to cached packages.
As a possible compromise based on @graingert's post, could one not have an optional hash of your huge file in the dockerfile, and then docker checks this when running the build? There would be no issues with deterministic builds then, and it would be obvious to the person building that they don't have the required dependencies, rather than it just exploding with a strange error at some point. Same thing goes for ssh keys, etc which would need to be distributed with the dockerfile anyway.
I also think that any idea which requires _copying_ of the huge file is less than ideal. File sizes I am interested in using are on the order of 10-40GB, and even with a good ssd that's at least a minute or two worth of copying. This is my problem with the ADD directive already in docker; I don't want to ADD a 30GB file to my image every time it builds and have to deal with having all of that extra free space, as well as needing to squash images.
That would not work for what we're using this for. We have a volume that contains sstate caches from the yocto build system that is bind-mounted RW in the build because any cache miss will be calculated during the build and saved in sstate for future ones. Our directories are also at ~30GB so even calculating the hash would take a while.
I never understood the deterministic build concept. There are ways to shoot yourself in the foot even with today's semantics. For example you can curl something from an internal IP. Suddenly this Dockerfile doesn't work everywhere and it's host dependent. But there are legitimate cases of why you'd want to do that. For example a local HTTP cache.
So since builds are not deterministic anyway, and since one can emulate the bind-mounted volume over the network today, why not provide a native way of doing it with the appropriate warnings if need be?
@petrosagg @zrml @thaJeztah What we know is:
Dockerfile syntax is frozen (per an earlier comment; after the HEALTHCHECK instruction was later added, the freeze was removed, but the related issues remained closed).
Given all that we know, I think this will likely be closed as either Dupe or WontFix. It doesn't seem to matter what use cases we give. Update: I am happy to be wrong here. The proposal looks open :)
Our company moved to an agnostic container runtime, and will soon have to move to an agnostic image building experience as well. But this won't be the right place to discuss that because negativity doesn't help. That should be a separate post.
@rdsubhas care to share the link when you are done?
@rdsubhas that's a nice summary. It doesn't look like this thread will be closed as dupe/wontfix, since @cpuguy83 thinks he's ok with adding --mount during the build, which covers most use cases.
What I'd like to know, given the current proposal, is: which counter-arguments are left regarding the idea? If there aren't any, maybe we should start discussing the implementation details for the --mount mechanism.
To reinforce the argument that builds are already host-dependent and non-reproducible, here is a list of Dockerfile fragments with this property:
# Install different software depending on the kernel version of the host
RUN wget http://example.com/$(uname -r)/some_resource
# example.intranet is only accessible from specific hosts
RUN wget http://example.intranet/some_resource
# get something from localhost
RUN wget http://localhost/some_resource
# gcc will enable optimizations supported by the host's CPU
RUN gcc -march=native .....
# node:latest changes as time goes by
FROM node
# ubuntu package lists change as time goes by
RUN apt-get update
# install different software depending on the docker storage driver
RUN if [ $(mount | head -n 1 | awk '{print $5}') == "zfs" ]; then .....; fi
Honestly, if we just add the --mount and let the user handle cache invalidation (--no-cache), I think we'll be fine. We may want to look at finer-grained cache control from the CLI than all or nothing, but that's a separate topic.
I have been facing a similar issue for a while now, but I've opted to go with the increased size of the image until a solution is finalized. I'll try to describe my scenario here in case someone finds a better workaround.
Use --build-arg to pass a token during build (strongly discouraged). This is a very attractive and easy option since "it just works" without any added steps.
ADD and COPY are executed in separate layers, so I'm stuck with data from previous layers. The size of some of my images more than doubled in some cases, but the overall size is tolerable for now. I think there was a PR (I can't seem to find it) to remove build-time args from the build history, but it wasn't acceptable due to caching concerns, iirc.
I'll be happy to hear of any other workarounds being used out there.
@misakwa we'll likely support secrets on build in 1.14.
That's very exciting to hear @cpuguy83. I'll keep an eye out for when its released. It'll definitely simplify some of my workflows.
we'll likely support secrets on build in 1.14.
Will it also work for build-time mapping of other types of volumes, like for example a yarn-cache?
BTW there is an interesting way to build production images using docker-compose; I found it works and is quite effective:
So you have a compose file docker-compose.build.yml, something like this:
services:
  my-app:
    image: mhart/alpine-node:7.1.0
    container_name: my-app-build-container # to have a fixed name
    volumes:
      - ${YARN_CACHE}:/root/.cache/yarn # attach yarn cache from host
      - ${HOME}/.ssh:/.ssh:ro # attach secrets
      - ./:/source
    environment: # set any vars you need
      TEST_VAR: "some value"
    ports:
      - "3000"
    working_dir: /app/my-app # set the needed working dir even if it doesn't exist in the container at build time
    command: sh /source/my-app.docker.build.sh # build script
1) You build the container using docker-compose:
$ docker-compose -f docker-compose.build.yml up --force-recreate my-app
it creates the container and runs the shell build script my-app.docker.build.sh; I don't use a Dockerfile and do everything in the build script (the app sources are in the mounted /source folder).
Then you create an image from the container, replacing the CMD with what needs to be run in the target env:
docker commit -c "CMD npm run serve" my-app-build-container my-app-build-image:tag
So your image is ready; it used an external yarn cache and external secret keys that were available only at build time.
@whitecolor yep, that works :) except for one thing: docker build is really effective at uploading the build context. Mounted source volumes unfortunately don't work with remote docker daemons (e.g. docker-machine on cloud for low-powered/bandwidth laptops). For that we have to do a cumbersome series of docker run, docker cp, docker run, etc. commands and then snapshot the final image, but it's really hacky.
It really helps to have this officially part of docker build, and to use layering and the build context 😄
@rdsubhas Yes you are correct
@whitecolor That is a really simple and effective solution. I just cut down a 30-40 min build on a project to about 5 minutes. I look forward to the possibility of having a --mount on build feature but for now this solution really unblocks my pipeline.
This is a comment I left for issue #17745 which I had understood had been closed but was not marked duplicate. Seems I was wrong about that latter point: I'll admit I'm used to systems like Bugzilla that explicitly mark something as "RESOLVED DUPLICATE", and display such up in the top description area of a bug. I'm no mind reader. (So my apologies @graingert, I had little way of knowing, thus there is no need to yell at me in 20pt font -- that was excessive.)
In my case, where this would be useful is on Debian systems: mounting /var/cache/apt as a volume, so you're not re-downloading the same .deb files over and over again. (A truly "unlimited" Internet quota just doesn't exist, especially here in Australia, and even if it did, there's time wasted waiting for the download.)
Or another scenario, you're doing a build, but it also produces test reports such as failure listings and code coverage reports that you don't need to ship with the image, but are useful artefacts to have around. These could be written to a volume when a CI server goes to build the image for the CI server to pick up and host.
Or tonight, I'm doing some Gentoo-based images for myself; I'd like to mount /usr/portage from the host. It is not hard for a Dockerfile to realise, "hey, /usr/portage (in the container) is empty, no problem, I'll just grab that" when running without the volume mounted, OR it just uses the volume as-is, saving the time of fetching a fresh copy.
Adding those smarts is a trivial if statement in a Bourne shell script… IF the underlying logic to mount the volume is present in the first place. Right now for my Gentoo images, I'm having to pull /usr/portage every time I do a build (luckily the mirror is on my LAN), which means a good few minutes' wait for that one step to complete.
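The "trivial if statement" mentioned above would be something along these lines (using emerge-webrsync as one plausible way to populate the tree):
#!/bin/sh
# if /usr/portage was not bind-mounted in (i.e. it is empty), fetch a fresh copy;
# otherwise just use the mounted tree as-is
if [ -z "$(ls -A /usr/portage 2>/dev/null)" ]; then
    emerge-webrsync
fi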
So lots of reasons why this is a worthwhile proposal, and I'm doubtful that the nested builds proposed in #7115 is going to help in the above instances.
@whitecolor has an interesting approach, but if doing that, I might as well use a Makefile completely external to the Docker system to achieve the build.
@sjlongland I wasn't yelling at you, I was poly-filling a big "RESOLVED DUPLICATE" notice
I am using docker and docker-compose to build several containers for our infrastructure. The containers are microservices, mostly written in nodeJS, but there is one microservice written in Java, using the maven framework.
Every time we rebuild the java container, tens of dependencies are downloaded from Maven; this takes several minutes. Then the code is built in about 15 seconds.
This is very ugly and it impacts our CI strategy pretty hard.
In this scenario it doesn't really matter if the volume with the build dependencies is missing or empty, because in that case the dependencies would simply be downloaded. Reproducibility is not affected.
I understand that there are security concerns, because I could tamper with the dependencies and inject nasty code in there; IMHO that could be easily circumvented by not allowing images build with "build volumes" to be published on docker-hub or docker-store.
To spell this out differently, there should be a distinction of scopes between the enterprise use and the personal use of docker.
@stepps check out https://pypi.python.org/pypi/shipwright instead of docker-compose
I've been following this thread for a while, looking for a good solution for myself. For building minimal containers in a flexible way with minimal effort I really like https://github.com/edannenberg/gentoo-bb by @edannenberg.
It's based off using Gentoo's portage and emerge, so @sjlongland you may like it for your Gentoo-based images. Dist files and binary packages are cached, so it doesn't need to download or build them again making rebuilds fast. It has hooks to easily customise the build process. Installing 3rd party software is easy, such as using git to clone a repo and then build it, keeping only the build in the final image. It templates the Dockerfile.
A simple example, for figlet, is:
build.conf:
IMAGE_PARENT="gentoobb/glibc"
Dockerfile.template:
FROM ${IMAGE_PARENT}
ADD rootfs.tar /
USER figlet
CMD ["gentoo-bb"]
ENTRYPOINT ["figlet"]
build.sh
PACKAGES="app-misc/figlet"
configure_rootfs_build() {
useradd figlet
}
I like @whitecolor's solution: it's simple, using just Docker technology plus a simple shell script or anything else you want to use. I'm using gentoo-bb as it's more complete. Shipwright looks good, with more developer-focused features such as dealing with branches. https://github.com/grammarly/rocker also seems interesting. Thanks for sharing, everyone.
Just another voice added to the pile. Our very complex dev environment would be vastly simpler if we could mount local volumes on build.
A workaround is to run an http server during the build that exposes the local files and then use curl/wget etc. to get the files into the docker build. But I really wish such hacks were unnecessary.
Another use case.. I want to build docker images for building a proprietary OS which has 10s of different versions. The install media is >80GB, so I cannot just copy this into the docker build environment. A bind mount would be much preferable.
Another one: my project distributes Dockerfiles in the repository for building from sources in the container. Currently, we pull another git clone from github inside the container. There are shallow clones and all, but still...
So, I just tested [1] on a rhel7 build host, and Red Hat's build of the docker daemon DOES have the -v option for build. I haven't tested on CentOS/Fedora, but one would imagine Fedora/CentOS probably have it too. It's worth testing. Also, RHEL Developer subscriptions are now free [2]:
@fatherlinux Under Fedora `docker build -v` is also available.
@fatherlinux The CentOS 7 version includes it.
+1 I think this would be really useful feature to add to the official docker.
Just updated on both centos and linuxmint (now running 17.03.1-ce). Am I missing something here? I can't see the -v option.
On mint
$ docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg list Set build-time variables (default [])
--cache-from stringSlice Images to consider as cache sources
--cgroup-parent string Optional parent cgroup for the container
--compress Compress the build context using gzip
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label list Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--network string Set the networking mode for the RUN instructions during build (default "default")
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--security-opt stringSlice Security options
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag list Name and optionally a tag in the 'name:tag' format (default [])
--ulimit ulimit Ulimit options (default [])
$ cat /etc/lsb-release
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=18
DISTRIB_CODENAME=sarah
DISTRIB_DESCRIPTION="Linux Mint 18 Sarah"
$ docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:45:26 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:45:26 2017
OS/Arch: linux/amd64
Experimental: false
On centos 7
# docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg list Set build-time variables (default [])
--cache-from stringSlice Images to consider as cache sources
--cgroup-parent string Optional parent cgroup for the container
--compress Compress the build context using gzip
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label list Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--network string Set the networking mode for the RUN instructions during build (default "default")
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--security-opt stringSlice Security options
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag list Name and optionally a tag in the 'name:tag' format (default [])
--ulimit ulimit Ulimit options (default [])
# docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:05:44 2017
OS/Arch: linux/amd64
# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
@wilfriedroset In CentOS 7, the non-official Docker packages provide the option. I think it's part of the EPEL repository.
thanks @nathanjackson. Do we have an ETA for this feature in the official release ?
@wilfriedroset AFAIK, there is NO ETA, because it was decided (several times) that this feature SHOULD NOT be in official docker, to preserve "build portability," aka allowing your Dockerfiles to run anywhere, including the Docker build service.
In my experience, limited build portability is what customers really want. They want to set up a build environment/farm and ensure that builds can always be rebuilt in that environment. The -v build option does not prevent this in any way.
For example, if you use NFS mounts, just make sure all of the build servers have that mount in their fstabs and you build will complete without issue anywhere in the farm.
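For example, an /etc/fstab entry like this on every build host would do (server name and paths are illustrative):
# shared, read-only build cache exported over NFS
nfs-server.example.com:/exports/build-cache  /mnt/build-cache  nfs  ro,noatime  0  0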
On RHEL 7.3
[root@rhel7 ~]# docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg value Set build-time variables (default [])
--cgroup-parent string Optional parent cgroup for the container
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label value Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag value Name and optionally a tag in the 'name:tag' format (default [])
--ulimit value Ulimit options (default [])
-v, --volume value Set build-time bind mounts (default [])
another use case on a CI building node projects is to share the CI's yarn cache when building all the images.
+1: installing node_modules again and again is really terrible, especially for nodejs microservices.
I'm trying to solve this problem with nfs. I think "repeatability" is not a good reason for not implementing this feature...
This seems like it will be even more important with #31257 and #32063 merged in.
Take a look at #32507
@fatherlinux could you explain how build portability works when you can have COPY commands within the Dockerfile? I have an issue where I want to minimize the number of copies of a large file (for time-complexity reasons) and am looking for a build-time read-only option to share the file with the container.
@arunmk See https://github.com/moby/moby/issues/32507
@arunmk @cpuguy83 exactly. The idea is that you really don't want to COPY data into the container on build. That can make it very large. We just want the data available at build time. Per above, you can do a -v bind mount in Red Hat's version of the docker daemon, which allows you to have data available, but it's read-only right now (that burned me last week).
So, if you need it today, check out Fedora, CentOS, or RHEL and you can mount in a Read Only copy of data at build time...
And, if you need portability within a build farm, I would suggest NFS or some such....
If you don't care about copying it in but rather just care about having it in the final image, you can use multi-stage builds to handle this.
A contrived example:
FROM fatImage AS build
COPY bigData /data
RUN some_stuff /data
FROM tinyImage
COPY --from=build /data/result .
Thanks for the clarification @fatherlinux
@cpuguy83 thanks for the detail. Let me add more detail to my issue which may be uncommon: I have a build system that generates a 3.3GB file. That is added to an RPM which is built within a docker container. So there are two copies that are produced: one from the build system into the docker container, one from within the docker container to within the RPM. Now, I cannot avoid the second copy. I was thinking of avoiding the first copy but it looks like that is also not possible, even with the multi-stage builds.
I can understand that, if the large file was used repeatedly, the multi-stage copy would have reduced the number of times the copy runs to '1'. I use it once and wanted to reduce the number to '0'. Am I right in understanding that it won't be possible?
@arunmk No matter what it's going to have to be copied to the build instance from the client.
@cpuguy83 thanks for the clarification. Looks like I have to take the overhead for now. Is that to have atomicity?
@fatherlinux
I tried what you said, using -v on RHEL7 to try and readonly mount a directory during build, but get this error:
Volumes aren't supported in docker build. Please use only bind mounts.
This will only work with the docker package from RHEL not the one from Docker. Patch was not accepted upstream.
@fatherlinux
I tried what you said, using -v on RHEL7 to try and readonly mount a directory during build, but get this error:
Volumes aren't supported in docker build. Please use only bind mounts.
@fcntl you need to use bind mounts, as the error said; you probably used -v /something rather than -v /hostsomething:/containersomething
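So with the patched (Red Hat) daemon the working form is host-path:container-path, for example (paths and tag made up):
# bind-mounts the host directory into the build; read-only per the behaviour noted above
docker build -v /srv/distrib/installer:/distrib -t myimage .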
@thebigb and perhaps others, we've set up an infrastructure to be able to use ccache during docker builds. we've published it at https://github.com/WebHare/ccache-memcached-server if it helps you, although ideally resolving this issue would probably obsolete it.
I was just about to add, a use case I really need this for is ccache. I would like to be able to mount my ccache cache during a docker image build--there's no sense in it being in the image itself. @unilynx I'll have a look at your workaround--good timing!
Just another voice.
My use case: currently I use rocker's MOUNT command to share the /root/.cache and /var/cache/apk directories.
For some reason I have very (very, very) slow network access to apk packages and pip packages. Any rebuild makes the process incredibly time-consuming. A build-time MOUNT feature would make things a lot easier.
@embray @roxma have a look at https://github.com/moby/moby/issues/32507 if that would address your use case; feedback welcome
With the introduction of multi-stage builds, I find the need to specify a volume mount for Maven's Local Cache is critical.
@gim913 This is not how you participate in any community. If you would like to contribute, please review the existing proposals linked here to see if any of them solves your use-case.
@gim913 At this stage of docker integration into various distributions, changing environments (i.e. dropping docker completely) seems a lot more disruptive than changing your 'OS' (I assume you mean switching from a different Linux distribution to the Red Hat build which apparently includes -v?).
Wouldn't it be easier to just take RedHat's version of docker? Perhaps someone here can point you towards the relevant patches/forks/commits to get the '-v' option in the build.
@unilynx here you go
I was looking at some examples that used wget and got here... my use case is similar... I want to unzip a large tarball and just run it. I don't want to litter the Dockerfile with the tarball or waste time doing a wget from a local web server. Mounting, like you can do with docker compose, seems like a reasonable thing to do at build time. Please merge Puneeth's change if it looks ok :-)
I precompile python wheels and want to install those in the container without copying them and making a layer I really don't need, or have to somehow try to squash. Day 1 and I am already looking into rocker
😢 😢 😢
This would be easy to add and extremely useful (or a mount command, see rocker again). How much time is spent (in the community) scripting around this or similar missing features?
@awbacker Multi-stage builds solve this pretty well, where you can do something like:
FROM something AS my_wheels
RUN compile_all_the_things
FROM something
COPY --from=my_wheels /wherever /wherever
RUN do_stuff_with_wheels
The first part is only run if something changes. The cache for it can be shared amongst other builds/dockerfiles as well.
This makes the whole build self-contained.
There's also a proposal that would allow RUN --mount, where the mount spec would tell it to mount a thing from the my_wheels build target instead of copying it.
Like for @kenyee, this could mount something from the build context, which in 17.07-experimental is only sent incrementally as needed.
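For illustration, a rough sketch of what that proposal could look like with the RUN --mount syntax BuildKit later shipped (the my_wheels stage name, the Python base image and the /wheels path are assumptions used only for this example):
# syntax=docker/dockerfile:experimental
FROM python:3 AS my_wheels
# build the wheels once; "some-package" is a placeholder
RUN pip wheel --wheel-dir=/wheels some-package
FROM python:3
# mount the wheels from the my_wheels stage instead of copying them into a layer
RUN --mount=type=bind,from=my_wheels,source=/wheels,target=/wheels \
    pip install /wheels/*.whl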
@cpuguy83 That doesn't work in practice - at least for Gradle Java builds. I have a base Docker image that has the Gradle jar files pre-cached, but the Gradle build of your source is what triggers the download of all of your dependencies into the cache.
@cpuguy83 Multi-stage doesn't allow you to remove the copied wheels from the resulting image, which is what @awbacker is talking about. The content of the /wherever folder will be cached and the image size will increase.
@BryanHunt So part of your build process is downloading the deps? For sure Gradle must provide a way to cache these without going through and actually building?
@cpuguy83 Yep, deps are downloaded as part of the build. Basically the same as Maven. For reference: https://github.com/gradle/gradle/issues/1049
was there a PR for build mounts somewhere?
@graingert Here
👍 for this. At Lunar Way we want to do the complete "build -> test -> build production image" process in a single Docker build, in order to remove build and test dependencies from the CI server. With multi-stage builds we can do this, but we cannot get the test results out of the intermediate container in the build process. We therefore have to do it in two steps right now - with a separate Dockerfile for building the test image, running it, and then only proceeding to the build-prod-image step if the tests succeed.
A -v option on docker build would allow us to store the test results in a folder mounted in from the CI server and remove the need for the current 2-step process.
@tbflw By default Docker build does not remove intermediate containers after an unsuccessful build. So if a test fails, you can get the test results from those.
Please, we also really, really need this feature! Resorting to other tools like rocker, or forking docker with ad-hoc patches, is far uglier than breaking the evangelic notion of "build portability".
@BryanHunt @stepps @yngndrw others too @awhitford
One way to cache build dependencies is to make your build work like the example multi-stage go build in the documentation or the python onbuild Dockerfile.
Here is an example I made that seems to work for maven. I'll copy it here.
FROM maven
WORKDIR /usr/src/app
# /root/.m2 is a volume :(
ENV MAVEN_OPTS=-Dmaven.repo.local=../m2repo/
COPY pom.xml .
# v2.8 doesn't work :(
RUN mvn -B -e -C -T 1C org.apache.maven.plugins:maven-dependency-plugin:3.0.2:go-offline
COPY . .
RUN mvn -B -e -o -T 1C verify
FROM openjdk
COPY --from=0 /usr/src/app/target/*.jar ./
It needs to be set up so it downloads dependencies before it copies the rest of the codebase in. Also make sure that the place your artifacts get stored aren't in a VOLUME.
@sixcorners That doesn't work for Gradle
@BryanHunt This Dockerfile or this approach doesn't work for gradle? cpuguy83 asked if there was a way to download dependencies without actually performing a build. You linked to a resolve dependencies task. Couldn't you just add the build.gradle file and run that task?
@sixcorners When you have many modules, you have to replicate your directory structure along with the build files and property files. I suppose it could be done, but I see this as very error prone.
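For reference, the approach being discussed might look roughly like this for a single-module project (a sketch only: it assumes the project ships the Gradle wrapper and uses an openjdk base image, and, as noted, with many modules you would have to replicate the whole directory structure of build files):
FROM openjdk:11 AS build
WORKDIR /usr/src/app
# copy only the wrapper and build scripts first so this layer stays cached
COPY gradlew settings.gradle build.gradle ./
COPY gradle ./gradle
# resolve and download dependencies without compiling any sources
RUN ./gradlew --no-daemon dependencies
COPY . .
RUN ./gradlew --no-daemon build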
The multi-stage trick by @sixcorners is interesting and I have seen it used with different package managers (e.g. npm, composer).
There is an issue though: whenever the list of dependencies changes, the COPY pom.xml in the stage-0 image causes that layer to be thrown away, and thus the whole cache is gone. That means that whenever a developer changes anything in the pom (a comment, a 1 kB dependency), the whole cache has to be redownloaded.
For CI machines building the image and then running the tests with dependencies that keep changing, that is thousands and thousands of packages that have to be redownloaded (either from a proxy or from upstream), which makes the rebuild quite slow. A local file-based cache mounted as a volume is way faster.
That is also an issue when developers iterate on the build of an image, especially if they are on slow connections. One can set up a local Nexus instance and point http_proxy at it, but that has other side effects (such as channeling every HTTP request through Nexus).
Multistage is a nice workaround, but it is not ideal.
A solution we are about to try is to build an image by building our shared libraries and retaining the dependency cache. This image would then become our build image for our apps. It's not ideal, but we think it's worth a try.
There is an issue though: whenever the list of dependencies changes, the COPY pom.xml in the stage-0 image causes that layer to be thrown away, and thus the whole cache is gone. That means that whenever a developer changes anything in the pom (a comment, a 1 kB dependency), the whole cache has to be redownloaded.
@hashar note that the COPY --from feature is not limited to build stages; from the Dockerfile reference:
Optionally COPY accepts a flag --from=<name|index> that can be used to set the source location to a previous build stage (created with FROM .. AS <name>) that will be used instead of a build context sent by the user. The flag also accepts a numeric index assigned for all previous build stages started with the FROM instruction. _In case a build stage with a specified name can't be found, an image with the same name is attempted to be used instead._
This allows you to _build_ an image for your dependencies, tag it, and use that to copy your dependencies from. For example:
FROM maven
WORKDIR /usr/src/app
# /root/.m2 is a volume :(
ENV MAVEN_OPTS=-Dmaven.repo.local=../m2repo/
COPY pom.xml .
# v2.8 doesn't work :(
RUN mvn -B -e -C -T 1C org.apache.maven.plugins:maven-dependency-plugin:3.0.2:go-offline
COPY . .
RUN mvn -B -e -o -T 1C verify
docker build -t dependencies:1.0.0 .
And specify using the dependencies:1.0.0 image for your dependencies:
FROM openjdk
COPY --from=dependencies:1.0.0 /usr/src/app/target/*.jar ./
Or (just a very basic example to test);
$ mkdir example && cd example
$ touch dep-one.jar dep-two.jar dep-three.jar
$ docker build -t dependencies:1.0.0 . -f -<<'EOF'
FROM scratch
COPY . /usr/src/app/target/
EOF
$ docker build -t myimage -<<'EOF'
FROM busybox
RUN mkdir /foo
COPY --from=dependencies:1.0.0 /usr/src/app/target/*.jar /foo/
RUN ls -la /foo/
EOF
In the output of the build, you'll see:
Step 4/4 : RUN ls -la /foo/
---> Running in 012a8dbef91d
total 8
drwxr-xr-x 1 root root 4096 Oct 7 13:27 .
drwxr-xr-x 1 root root 4096 Oct 7 13:27 ..
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-one.jar
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-three.jar
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-two.jar
---> 71fc7f4b8802
I don't know if anyone has mentioned this use case yet (I briefly searched the page), but mounting an SSH auth socket into the build container would make it much easier to use dependencies that are deployed via private git repositories. There would be less need for boilerplate in the Dockerfile for copying keys around in non-final build stages, etc.
buildkit has native support for git
https://github.com/moby/buildkit
My solution: create a bash script (~/bin/docker-compose or the like):
#!/bin/bash
trap 'kill $(jobs -p)' EXIT
socat TCP-LISTEN:56789,reuseaddr,fork UNIX-CLIENT:${SSH_AUTH_SOCK} &
/usr/bin/docker-compose "$@"
And in Dockerfile using socat:
...
ENV SSH_AUTH_SOCK /tmp/auth.sock
...
&& apk add --no-cache socat openssh \
&& /bin/sh -c "socat -v UNIX-LISTEN:${SSH_AUTH_SOCK},unlink-early,mode=777,fork TCP:172.22.1.11:56789 &> /dev/null &" \
&& bundle install \
...
or any other ssh-using commands will work.
Then run docker-compose build
To throw another use case on the pile: I use Docker for Windows to generate a filesystem for building an embedded Linux system in one container, and I would like to share this with other containers during their build step. I interact with this container, changing configuration and rebuilding etc., so performing the build in a Dockerfile and using multi-stage builds isn't really a good fit, as I would lose incremental builds. I want to cache my previous build artefacts, as it takes about 1.5 hours to do a clean build. Due to the way Windows deals with symbolic links I can't do my build into a host-mounted volume, so I use named volumes. Ideally I would like to share these named volumes in the build steps of my other images; at the moment I have to create a tar of the build output (about 4 GB) and then do a docker cp to make it available on the Windows host for subsequent builds.
In the case of Python, when we pip install a package, it and its dependencies are downloaded to a cache folder and then installed to site-packages.
As good practice we use pip --no-cache-dir install package so as not to store rubbish/cache in the current layer. But best practice would be to keep the cache folder outside the build context entirely, so a build-time -v would help.
Some users above suggested using COPY . /somewhere/in/container/. That is OK for application files, but not for a cache: COPY creates one more layer of its own, so removing the cache in a later layer doesn't help. Another bad side effect is that if the cache changes, the COPY sees a changed context and all following layers are invalidated and forced to rebuild.
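For what it's worth, with the RUN --mount syntax that BuildKit eventually shipped, a pip cache kept out of the image looks roughly like this (a sketch; the Python tag, WORKDIR and file names are assumptions):
# syntax=docker/dockerfile:experimental
FROM python:3.7
WORKDIR /app
COPY requirements.txt .
# the cache mount persists between builds but never ends up in an image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt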
@wtayyeb If your Dockerfile runs pip install ... only when the requirements file changes, then a build-time -v doesn't seem that important, since requirements don't change as often as the application does.
@wtayyeb You can use a multi-stage Dockerfile to have both the cache and a lean image. That is, use an installer image to install Python into some directory and then, for your final image, use COPY --from to transfer only the necessary Python files without any installation artifacts or even pip itself.
@manishtomar, Thanks, yes and no! In a clean build all dependencies are downloaded, built, converted to wheels, cached, and then installed into the destination environment. So if the requirements never change, that is a one-time job. But if one tiny dependency is updated, all the dependencies have to be re-downloaded, re-built, re-wheeled and re-cached to be usable.
When using a CI to build and test your libraries and your applications in a matrix of several jobs, multiply the above work by the number of concurrent jobs on your CI server and you get iowait rising to more than 3s and a load average above 15, even with SSDs. (These numbers are real for 2 concurrent builds of an app with ~20 dependencies.) I think the pip cache is doing it the right way, avoiding re-downloading, re-building and re-wheeling packages that are already prepared. Without a bind -v we lose time and server resources.
@ibukanov, Thanks. I am using a multi-stage Dockerfile to build my app packages and use them later. That would help if I had only one Dockerfile and wanted to build it several times, but what if there are several Dockerfiles, each built against a different Python version (2.7 and 3.6 for now), with several C extensions that need to be built for the selected base image? And what about the above paragraph?
@thaJeztah Your suggestion is great and it will save us some time; however, in the case of build caches we really don't want to have to copy anything from the other image.
Why can't we access another image without copying it?
@thedrow my example was with the features that are currently there; have a look at the RUN --mount
proposal (https://github.com/moby/moby/issues/32507), which may be a closer fit to your use case
Reading the above thread, I see a large number of people trying to find kludges to work around a basic functionality gap in the docker build process. I see no compelling argument on the basis of portability that doesn't conflate host mounts with image mounts - arguments which are frankly specious and lazy.
I am also a gentoo container user and was redirected from https://github.com/moby/moby/issues/3156 which is a completely valid use case for this missing functionality.
All I really want is the ability to mount the contents of another image at build-time so that I don't bloat my images.
@kbaegis sounds like an exact match with the feature that's proposed in https://github.com/moby/moby/issues/32507
Sure. That one's only been an unimplemented P3 in the backlog for one year rather than 3 years.
It looks like https://github.com/projectatomic/buildah is actually going to outstrip docker build pretty quickly here for this basic functionality. I think I'm just going to switch my pipeline over once that happens.
@kbaegis what did you come here to add to this discussion? You described a use-case that _exactly_ matches a different proposal;
All I really want is the ability to mount the contents of another image at build-time so that I don't bloat my images.
It’s open-source, things don’t come to existence magically.
What am I looking to add to the discussion?
Succinctly: that I'm moving on from this toolset. I'm sure that's valuable information for the development team, as I'm sure I'm not alone.
The glacial pace and low priority of supporting this use case (and of any reliable workaround that provides this functionality) have forced me onto other tools, and I'm abandoning this build pipeline due to the missing functionality.
I've got a (rehash, I'm sure) use case to add. #32507 may suit this better.
I'm building a docker image for some bioinformatics pipelines. A few of the tools require some databases to be present prior to their compilation/installation (please don't ask, it's not my code). These databases weigh in at a lovely 30gb minimum.
During runtime, I certainly intend for those databases to be mounted as -v volumes. Unfortunately, I cannot do this during the build process without "baking" them in, resulting in a rather obscenely sized image.
@draeath take a look at https://github.com/grammarly/rocker. It already supports a lovely MOUNT instruction.
@draeath also, check out Buildah, it supports mounts by default because it is set up more like a programming tool. Also supports mounts with a Dockerfile:
Thank you both @fatherlinux and @lig - this will help me get my task done. I still think I shouldn't have to stray outside the project to do it, though, and would still love to see this and #32507 implemented ;)
I've come here via some googling to ask for the same feature, volumes at 'docker build' time, not 'docker run' time.
We have an embedded system that contains a CPU. The manufacturer provides tooling to compose a system image, and then transfer the image into the CPU. This tooling is 3rd party to me and I cannot change it. The manufacturer is also unlikely to alter it at my request.
I want to build a docker image that does a first pass "build the firmware image", and then be able to spawn containers that just push the firmware image to the fresh-off-the-line PCBs. A Dockerfile might look like:
----------[ Cut Here ]----------
FROM base-image as builder
COPY src src
RUN build-src
FROM base-image as flasher
COPY --from=builder build-artifacts .
RUN cpu-build-and-flash --build-only
----------[ Cut Here ]----------
Unfortunately, the cpu-build-and-flash step requires access to the target device via USB bus, even though it's not going to push the firmware image to the device. Thus I need to take the '-v /dev/usb/bus:/dev/usb/bus' from the 'docker run' command and have it in the build instead.
It's clear that this isn't currently possible.
The workaround I'm going ahead with is to manually create a flashing image by 'docker container commit'ing a container to an image. I'd much rather just mount the USB bus at build time.
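For anyone replicating that workaround, it might look roughly like this (a sketch; flasher-prep is a hypothetical stage that stops just before the cpu-build-and-flash step, and depending on the setup the device may need --privileged or --device rather than -v):
docker build --target flasher-prep -t flasher-prep .
docker run --name flasher-tmp -v /dev/usb/bus:/dev/usb/bus flasher-prep cpu-build-and-flash --build-only
docker container commit flasher-tmp flasher:latest
docker rm flasher-tmp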
Update for anyone who is interested: I've recently rebuilt my entire pipeline successfully with buildah. I currently have the two build pipelines running in parallel, and the oci/buildah pipeline is generating smaller images (specifically, removing /usr/portage in my case by masking it with another mount).
And finally this feature is here: https://github.com/docker/docker-py/issues/1498
But I want RW volumes for a build cache
I'd also like to see this feature (with write capabilities) so that a unit test results file can be exported during the multistage build process in a CI pipeline. To keep with the spirit of build portability, if the -v switch was not provided, the file would simply be written internally within the test image at that stage.
The ideal goal is to build once, test once, and still have the results file given to the host system, even in the event (and especially in the event) that tests fail, stopping the build.
Yes please. All day.
Not entirely relevant, but we're migrating a part of our deployment infrastructure and needed a way to copy files from an image after build. The following did the trick:
docker build -t x .
ID=$(docker create x)
docker cp $ID:/package.deb .
docker rm $ID
It should have been added when multi-stage Dockerfiles were introduced. Eventually everyone is going to face this issue as soon as they start running unit tests as a stage in a multi-stage Dockerfile, especially in CI build pipelines. We are also facing this issue, where we have to publish unit test reports to VSTS. We're already applying the workaround @hoffa mentioned, but after all it is a workaround and it complicates things.
Should we make a different issue for people that want/need build-time volumes for a build cache?
@ajbouh Yes, probably at https://github.com/moby/buildkit/issues
See https://github.com/moby/moby/issues/32507#issuecomment-391685221
While you can't add volumes at build time, you can add hosts, so I now build all my docker images with something like --add-host yum-mirror:$MIRROR_IP, which serves up a yum mirror that my build images then detect via a wrapper around yum. Handy when my project changes dependencies many times a day and I'm offline or on a bad connection (part of the project involves updating and cleaning up its many deps).
I find Docker's resistance to solving this problem infuriating.
Experimental support for buildkit was recently merged, and with it comes an option to RUN --mount=<opts> <command>.
link to @cpuguy83 note: https://github.com/moby/buildkit/pull/442
@glensc @cpuguy83 When can we expect a release for this merged feature?
+1
RUN --mount doesn't have volume support, so things like https://github.com/avsm/docker-ssh-agent-forward remain impossible at build time. What is the solution for this?
@peter-edge https://github.com/moby/buildkit/pull/655
docker build --secret
is finally available in Docker 18.09 https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066
Can we close this issue?
--secret is not usable for the caching use case, from what I can tell.
@AkihiroSuda RUN --mount in general looks like something that could fit as the solution for this issue.
Yes, I suppose RUN --mount=type=cache (for cache volumes) and --mount=type=secret with docker build --secret (for secret volumes) almost cover the issue.
@AkihiroSuda so, a working example solving the original issue would be good to see
@AkihiroSuda From the article (https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066) I saw 2 use cases of using mount during build: Secret and SSH
[Secret]
docker build --secret id=mysite.key,src=path/to/mysite.key .
RUN --mount=type=secret,id=mysite.key,required <command-to-run>
[SSH]
RUN --mount=type=ssh git clone [email protected]:myorg/myproject.git myproject
There are 2 other use cases (that I remember) that aren't explained how to use in the article nor in this issue:
1) [Cache] RUN --mount=type=cache
2) Volumes in general (for example, to mount SSL certificates, or in the case of large volumes that should be used during build, but not included in the generated image, and so on...)
One use case is mounting a yarn workspace before running webpack.
You can do all of this..
RUN --mount=type=cache,from=<some image>,source=<path in from image>,target=<target>
You can also replace from=<some image> with from=<some build stage>.
Here's a contrived example:
# syntax=docker/dockerfile:1.0.0-experimental
FROM busybox as hello
RUN echo hello > /hello.txt
FROM scratch
RUN --mount=type=cache,from=busybox,source=/bin,target=/bin --mount=type=cache,from=hello,source=/hello.txt,target=/tmp/hello.txt echo /tmp/hello.txt
Here's some documentation on this: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
I agree with @AkihiroSuda, this should handle all the cases... but please do let us know if it does not.
@AkihiroSuda @cpuguy83 : Unfortunately, the current implementation (buildkit in docker 18.09) has issues with private registries. As of now, these new features can't be used if you have to fetch your images through a private registry. See my tests in https://github.com/moby/moby/issues/38303.
I think this would also be useful for Jenkins artifacts. For example, if I'm creating a Docker image and compiling something inside it, I want to get some artifacts out, such as JUnit/pytest output.
This would be very useful. I really would rather not need to add --experimental to support RUN --mount=type=cache /user/.cache/pip pip install (in order to save tons of package index bandwidth).
buildah bud (buildah build-using-dockerfile) has a --volume/-v option:
https://github.com/containers/buildah/blob/master/docs/buildah-bud.md
buildah can run builds as non-root without a docker socket.
Because package downloads from the network are more reproducible?
No need to add "--experimental", only "DOCKER_BUILDKIT=1" on the client.
Yes, network builds are more reproducible in that the context is all in the Dockerfile. If you have to mount context from the host to make the build work it's a bad experience.
Note that you can also mount an image into the build.
Yes, network builds are more reproducible in that the context is all in the Dockerfile.
Surely having RUN apt-get update in the Dockerfile makes sure that one has all the steps needed to build the image. However, it is not reproducible, since additional context gets downloaded from a third party. The only difference with a mount is that all external contexts are indeed defined in the Dockerfile.
If you have to mount context from the host to make the build work it's a bad experience.
My bad experience with Docker build is that it's never reproducible, and we could definitely benefit from mounting a cache from the host, which would arguably speed up some use cases.
What I end up doing eventually is a multi-stage build: one image gets the context from the network and thus acts as a snapshot of the remote context. Then I tag that with some arbitrary version; the date works fine. E.g.:
RUN apt-get update
docker build -t aptupdate-20190417
And in the actual image:
FROM aptupdate-20190417
FROM somebaseimage
COPY --from=aptupdate-20190417 /var/apt /var/apt
Repeat with other remote contexts and you more or less have something that is reproducible.
Or in short: a Dockerfile that relies on network access is probably not reproducible. A mount might make it not reproducible, but it would help make some use cases reproducible. I guess the point is that a Dockerfile should have all the steps required to actually build the image, though in my experience most people write their own tooling to instrument building images.
I mean, RUN --mount=type=cache is exactly for this.
Or you can even mount from another image from a registry and it will be fetched.
Your apt commands can be made (relatively) reproducible by pinning what you want to fetch.
But if you really want to control all the bits, then why are you using apt in your build? Storing this on a build host is not reproducible and easily breaks from host to host.
Keeping it in a registry is not bad other than the potential for network failure... which is of course a fair criticism.
-v on buildah and Red Hat's fork was explicitly rejected here because it's overly broad... not to say it's not useful, but it easily breaks from host to host, which goes against the design of docker build.
Meanwhile, the reason RH added it (or more precisely, why they decided to work on it) was to be able to mount RHEL credentials into the build environment.
Yes, network builds are more reproducible in that the context is all in the Dockerfile. If you have to mount context from the host to make the build work it's a bad experience.
I vehemently disagree. The network may be down or compromised, in which case a local cache prevents the whole build from failing while the internet is down.
I could specify volumes: once in my docker-compose.yml, but instead I need to set DOCKER_BUILDKIT=1 and add RUN --mount=type=cache to Dockerfiles managed upstream? Why?
With CI builds, we're talking about a nontrivial amount of unnecessary re-downloading tens to thousands of packages (tens or hundreds of times a day) that could just be cached in a volume mount (in a build that runs as nonroot without the ability to execute privileged containers with their own volumes on the host).
Package indexes are in many cases generously supported by donations. Wasting that money on bandwidth to satisfy some false idea of reproducibility predicated upon a false belief that remote resources are a more reproducible cache of build components is terribly frustrating.
Please just add --volume so that my docker-compose.yml works.
Please just add --volume so that my docker-compose.yml works.
Making your "docker-compose" just work is backwards.
docker-compose consumers this project, not the other way around.
docker-compose interacts with the docker socket. docker-compose YAML is a consolidated way to store container options (which can be converted to k8s pod defs, which podman supports to a degree). How should I specify DOCKER_BUILDKIT=1 in a reproducible way? I could specify build_volumes: in a reproducible way in a docker-compose.yml.
When I -- in my CI build script that runs n times a day -- build an image by e.g. calling docker-compose build (e.g. with ansible) or packer (instead of buildah and podman), I have a few objectives: don't re-download packages that haven't changed, and, if I need to flush the cache volume, I can flush the cache volume. The options as I see them:
A. Reinstall and discard the cache on every build, or COPY the host's pip cache in and delete it again at the cost of extra layers:
RUN pip install app && rm -rf /root/.cache
COPY . /app/src/app
COPY .cache/pip /app/.cache/pip
RUN pip install /app/src/app \
    && rm -rf /app/.cache/pip
B. Patch every upstream Dockerfile (and every ONBUILD) to use RUN --mount=type=cache and set an environment variable:
# Fork by copying to modify every pip install line
RUN --mount=type=cache /app/.cache/pip pip install /app/src/pip
$ DOCKER_BUILDKIT=1 docker build . [...]
...and rely on the --mount=type=cache cache (?)
C1. Mount the cache as a volume at build time:
$ buildah bud -v .cache/pip:/app/.cache.pip
$ docker build -v .cache/pip:/app/.cache.pip
C2. Or declare it once in docker-compose.yml:
services:
  app:
    image: imgname:latest
    build: .
    build_volumes:  # "build_volumes" ?
      - ./.cache/pip:/app/.cache/pip
$ docker-compose build
:point_up: Just a reminder: you can pin a downloaded file by checking its checksum. Some package managers, such as pip, also support that.
@westurner Thanks for the detailed explanation.
I think that the following would be similar to your case B, but you could clear the cache and it would end up like your case C2 (what you are asking for, I think):
_docker-compose.yml:_
services:
my-cache:
build: ./my-cache
image: local/my-cache
my-image:
build: ./my-image
_my-cache/Dockerfile:_
FROM python
RUN pip install app
_my-image/Dockerfile:_
FROM my-repo/my-image
RUN --mount=target=/export,type=bind,from=local/my-cache
RUN pip install /app/src/app
(https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypecache)
You can build the cache image with:
docker-compose build my-cache
The command RUN --mount=target=/export,type=bind,from=local/my-cache should bind to the image. If you want to refresh the cache, you can remove and rebuild the cache image.
If this still uses the cache in the RUN --mount..., you can use a .env file with a version, include the version in image: local/my-cache:$MY_VERSION and in from=local/my-cache:$MY_VERSION (it should be included as a build arg).
You could include the my-cache service in another docker-compose file if you don't want it to be in the same file as your main services.
You would still need to use DOCKER_BUILDKIT=1 (like in your B case, but I think this won't be necessary in future versions) and it would still not be reproducible (but your C2 case isn't either).
What penalty do you see if it isn't reproducible? If you put the cache image local/my-cache in Docker Hub (with a different repo name) or in a private registry and use a version for each build that creates a different cache, with the same version always having the same cache, wouldn't that make it reproducible? You wouldn't even need to include the service in the docker-compose file and call the build command. (Docker Hub would be accessed over the network, but that is the same for your other images, I assume, and after you download it once it should not be needed anymore, unless you generate a new version with a new cache.)
DISCLAIMER: I haven't tested the above code.
@Yajo The checksum support in pip was originally implemented in 'peep' and then merged into pip. You can add known good hashes as URL fragments in pip requirements file entries. (There is funding for security improvements in the PyPA project this year; TUF (The Update Framework; just like Docker Notary) support in PyPI is planned for later this year.) Correctly bootstrapping pip and PyPI (with keys and trust) in docker images will likely be a topic later this year.
(edit; a bit OT but for the concerned) https://discuss.python.org/t/pypi-security-work-multifactor-auth-progress-help-needed/1042/
@lucasbasquerotto Thanks for your help. This is significantly more complicated than just specifying --volume at build time. Namely, it seems to require:
- DOCKER_BUILDKIT=1 in the docker build shell env
- RUN --mount=type=cache and args
If I can COPY files from the host, or specify build-time parameters that aren't stored elsewhere, I don't see how mounting a volume at build time is any less reproducible?
COPY || REMOTE_FETCH || read() - which of these is most reproducible?
@westurner
Specifying DOCKER_BUILDKIT=1 in the docker build shell env
If you use docker-compose, as I saw in your other posts, and if you are running it from a container, like:
$ sudo curl -L --fail https://github.com/docker/compose/releases/download/1.24.0/run.sh -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
Then you can edit the downloaded file in /usr/local/bin/docker-compose
to use that env variable. Change from:
exec docker run --rm $DOCKER_RUN_OPTIONS $DOCKER_ADDR $COMPOSE_OPTIONS $VOLUMES -w "$(pwd)" $IMAGE "$@"
to
DOCKER_BUILDKIT=1
exec docker run --rm $DOCKER_RUN_OPTIONS $DOCKER_ADDR $COMPOSE_OPTIONS $VOLUMES -w "$(pwd)" --env DOCKER_BUILDKIT=$DOCKER_BUILDKIT $IMAGE "$@"
This is a very easy change and it's transparent to whoever runs the command.
_(If you don't run as a container, then the above doesn't apply)_
Modifying any/every upstream Dockerfile RUN instruction with RUN --cache and args
In the case I exposed it would be RUN --mount=type=bind..., but in any case, having to change the Dockerfile is also bad IMO. A -v option would really be much better and more transparent.
Read/write access into another image? Mutability! Or is said cache frozen with probably-stale versions?
When you bind the image, it would probably create a container (or whatever it would be called, with a replicated filesystem), and changes made there while building shouldn't change the original image (it wouldn't make sense). So if you build using a cache image named my-repo/my-cache:my-version, the next build would be exactly the same (immutability). If you want to use a more up-to-date cache, you can create a new image with a new version and use it, like my-repo/my-cache:my-new-version.
Which of these are most reproducible?
I consider reproducible to mean something that would be exactly the same even if you run it on another machine. In this sense, if you push an image to a (safe and reliable) docker registry and never change that image, I would consider it reproducible (if you have concerns about the internet connection, you could use a private registry and access it inside a VPN or something like that; I've never used a private registry myself).
If the COPY command is copying your machine's cache, I don't consider it reproducible, because if you run pip install (or apt-get, or whatever) on another machine, at another time, can you guarantee that the contents of the cache will be the same? Maybe this is a concern for you, maybe not.
On the other hand, if you have files in some reliable place that you "own" (like an S3 bucket), download those files onto your machine and copy them with the COPY command, then you can reproduce it from another machine with the same results (assuming the files haven't changed and the other machine is identical to the previous one). So I would consider this reproducible. It depends on where those files come from and how much control you have over them.
Truth be told, I don't consider anything 100% reproducible in all cases (after all, hardware can fail), but the more reliable, the better. When I refer to some process being reproducible, I'm mainly referring to its contents and result being the same, and this would include something downloaded from the network, assuming the contents don't change over time (I'm disregarding the possibility of network failure in this case).
There's some kind of Docker networking bug which makes go mod download unreliable inside a container, too (at least for applications our size), so just running it every time to download all of my GOPATH/pkg/mod over again is not just wasteful, but broken. 🤷♀
I could avoid a whole lot of unnecessary file copying if I could use --volume!
@kevincantu RUN --mount=type=cache should cover your use case
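For example, something along these lines (a minimal sketch; the Go version and paths are assumptions):
# syntax=docker/dockerfile:experimental
FROM golang:1.14 AS build
WORKDIR /src
COPY go.mod go.sum ./
# the module cache is reused across builds but never stored in a layer
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod go build ./...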
That requires at least one successful download of modules from within a docker build, and in this particular case I've not yet ever seen that..
https://github.com/moby/moby/issues/14080#issuecomment-484314314 by @westurner is a pretty good overview, but I couldn't get buildkit to work:
$ sudo docker -v
Docker version 19.03.1, build 74b1e89
$ sudo DOCKER_BUILDKIT=1 docker build .
[+] Building 0.1s (2/2) FINISHED
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 407B 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
failed to create LLB definition: Dockerfile parse error line 8: Unknown flag: mount
My Dockerfile does start with # syntax=docker/dockerfile:experimental.
I'd actually like to use it via docker-compose. I tried ENV DOCKER_BUILDKIT 1 in the Dockerfile and also passing it from docker-compose.yml via ARG DOCKER_BUILDKIT, but it's all the same:
$ sudo docker-compose up --build
Building web
ERROR: Dockerfile parse error line 10: Unknown flag: mount
@lucasbasquerotto How would what you proposed in https://github.com/moby/moby/issues/14080#issuecomment-484639378 translate to an installed version of docker-compose?
Finally, I'm not even sure if this would cover my use case; perhaps some of you can tell me whether I should pursue this. I want to use a build-time cache for local development which survives between builds, so that after updating dependencies only the new ones have to be downloaded. I would add RUN --mount=type=cache,target=/deps to the Dockerfile and set the dependency manager's cache to /deps.
for docker compose see https://github.com/docker/compose/pull/6865, which will be in an upcoming release candidate of compose
I have another use case... I want to build containers for arm on an x86_64 host with binfmt configured. This requires that I have the architecture-specific static qemu CPU emulator in /usr/bin.
My current solution is to add qemu-arm-static into the container as a file, like:
FROM arm32v7/alpine:3.10
COPY qemu-arm-static /usr/bin/qemu-arm-static
RUN apk update && apk upgrade
RUN apk add alpine-sdk cmake
...
The easier solution would be to mount my file inside the container only when needed, like:
docker build -v /usr/bin/qemu-arm-static:/usr/bin/qemu-arm-static -t test:arm32v7 .
This works very well for docker run, but I miss this functionality when building containers.
Is there another solution for building arm containers on x86_64 hosts, or can we allow volumes at build time for at least this case?
@jneuhauser latest kernels allow these binaries to be loaded statically, so there's no need to configure them every time. You can achieve this e.g. by running the linuxkit/binfmt image in privileged mode once after boot.
latest kernels allow these binaries to be statically loaded, so there‘s no need to configure them every time.
@alehaa Don't you still need the static qemu emulator binary within the container, though?
@cybe This is not required anymore if the F flag is used (which is what the linuxkit/binfmt package does). You can find more information about this here.
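For reference, that one-time registration is a single privileged run of the image (a sketch; pick a concrete tag for your setup):
docker run --rm --privileged linuxkit/binfmt:<tag>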
Could someone provide a working setup for trying out buildkit? I can't get it working on Ubuntu. My setup is as follows:
cat /etc/docker/daemon.json
{
"experimental": true
}
Dockerfile
# syntax=docker/dockerfile:experimental
FROM ruby:2.6.3
RUN --mount=type=cache,target=/bundle/vendor
sudo docker -v
Docker version 19.03.1, build 74b1e89
DOCKER_BUILDKIT=1 sudo docker build .
Error response from daemon: Dockerfile parse error line 12: Unknown flag: mount
sudo doesn't carry env vars with it unless you tell it to with sudo -E or declare the variable within the sudo command.
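In other words, either of these should work:
sudo DOCKER_BUILDKIT=1 docker build .
DOCKER_BUILDKIT=1 sudo -E docker build .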
I wrote a few words about this feature and created some minimal examples showing how to cache
Edit: see below
@cpuguy83 thanks!
@thisismydesign sorry to ruin your excitement, but you can't --cache node_modules; it will not be present in the final image, so your app is broken.
@glensc Damn you're right.. is there a way to make a build-time cache part of the final image?
Honestly, I thought this would be considered for a feature advertised as
allows the build container to cache directories for compilers and package managers.
You should be able to map ~/.npm instead… https://docs.npmjs.com/files/folders.html#cache
@thisismydesign
You can use another image as a cache, though, either by building it in your Dockerfile or using a literal image stored in a registry somewhere, and use COPY --from:
FROM example/my_node_modules:latest AS node_modules
FROM nodejs AS build
COPY --from=node_modules /node_modules ./node_modules
...
This is just an example you can use this for many different things.
Ugh I hate to bring this up and get involved here (also hi friends)
but we have a use case for this.
Is there a good place I can get involved or a call or list I can join to get a digest here?
Also if we need someone to put some resources on this I have 1 kris nova and a small team I can probably persuade to look at this.
TLDR Can I code this please? Is there anyone I can talk to about this?
_TLDR_ Can I code this please? Is there anyone I can talk to about this?
I can't speak for Docker but my impression is that they're not open to adding volume mounting to builds (and that they should probably close this issue)
A lot of the use cases for buildtime -v are now covered by buildkit. It has at least resolved it for me.
I will check out buildkit then - I also have some hacky bash that gets the job done if anyone is interested.
thanks @unilynx
+1 to @unilynx on closing this issue out, buildkit solved the build time volume issues for me too.
I bet if someone dropped a few links and an example we could convince our friends to press the shiny close button.
(I would also benefit from them)
The use case of caching isn't solved for me and many others as the build time volumes with buildkit are not present in the final image.
So I was able to pull all my build artifacts out of the temporary volume used at build time and reconstruct the image with the previous cache using the bash I mentioned above.
I was also able to rebuild my image on top of itself such that the overlay filesystem only grabbed a small delta.
I was even able to re-use the volume for other images at build time.
are other folks not able to do this?
(cache) mounts are in the "experimental" front-end; described in https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md (about to head into a meeting, but I can link more extended examples)
thanks @thaJeztah LMK if I can help here in any way :)
https://github.com/moby/moby/issues/14080#issuecomment-547662701
@thisismydesign sorry to ruin your excitement, but you can't --cache node_modules, it will not be present in the final image, so your app is broken.
@thaJeztah I don't believe the issue above is solved. I'd love to see some examples where it's possible to cache e.g. npm install during build time in a way that also allows the resulting image to use the cached installation.
@kris-nova I didn't solve this problem but then again I'm not looking to use bash scripts. Perhaps we need a new issue but this is a pretty common use case that AFAIK isn't solved yet.
@thaJeztah Here are some examples using cache mounts showing how the final image won't contain the mount, and therefore it doesn't cover many use cases of build-time caching:
For npm: wouldn't one use the cache mounts for the npm cache directory (see https://docs.npmjs.com/cli-commands/cache.html, usually ~/.npm)?
@ankon That could work, thanks, I'll give it a try. Another use case I'm not sure about is Bundler and Ruby.
So I think (I haven't tested yet) that for Bundler you can at least get rid of the network dependency by using a build volume at $BUNDLE_PATH and then, during the build, running:
bundle install
bundle package
bundle install --standalone --local
This basically means you have a cached bundle install directory; from there you package gems into ./vendor/cache and re-install into ./bundle. But this doesn't spare the time spent installing and building gems; it might actually make the build step longer.
If you want to save the cached data into the image, then copy it into the image from the cache.
Thanks, however, it still is more of a workaround because...
I don't know how much effort it would be to simply have a native option for mounting the same volume into the final image, but I'm pretty sure it would make the usage easier. These are just 2 examples from scripting languages where the way to use this cache wasn't obvious to me. I can most certainly imagine this will come up in other contexts as well.
@thisismydesign It seems like what you want is to be able to share a cache between build and run?
buildkit is a linux only solution, what do we do on windows?
@thisismydesign I'm not sure why you expect a (cache) mount to stay in the final image. I wouldn't expect that, and I don't want ~1 GB in my image just because I used a download cache mount.
buildkit is a linux only solution, what do we do on windows?
You can use buildkit on Windows.
https://docs.docker.com/develop/develop-images/build_enhancements/
You may find it easier to set the daemon setting through the Docker for Windows UI rather than setting the environment variable before executing.
@nigelgbanks at the top of your link:
Only supported for building Linux containers
Oh sorry I just assume you were building Linux containers on Windows.
@thisismydesign It seems like what you want is to be able to share a cache between build and run?
That would solve my use case around caching, yes.
Making this easier could save millions of package re-downloads in CI builds per year.
Do any CI services support experimental buildkit features?
Do any CI services support experimental buildkit features?
Do they have to explicitly support it? I'm using gitlab-ci with buildkit and it just works. After all, it's just a different way of invoking 'docker build'.
Of course, unless you bring your own runners to gitlab, odds of getting a cache hit during build are low anyway.
Copying from a named stage of a multi-stage build is another solution:
FROM golang:1.7.3 AS builder
COPY --from=builder
But then container image locality is still a mostly-unsolved issue for CI job scheduling. Runners would need to be more sticky and share (intermediate) images in a common filesystem in order to minimize unnecessary requests to (perennially underfunded) package repos.
I just tried buildkit, but it only marginally improves my workflow, which would be 100% helped by "real" volume or bind mounts to the host.
I am using docker build to cross-compile old glibc versions, which should then become part of new build containers that provide these glibcs to build under and link against.
The repeated glibc source download is now solved by a bind mount (from buildkit); the archive can be read-only, no problem. But I have no way to access the build dir for analysis after failed builds, since the container bombs out on error. (If I restart it to access it, it restarts the build, so that doesn't help.)
Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir, when, had the build dir been a mount in the first place, it would have been so easy. (Just do make install after the build and I have a clean container without the build dir and without the downloaded sources.)
So I still believe this is a very valid feature request that would make our lives a lot easier. Just because a feature could be abused, and could break other functionality if misused, does not mean it should be avoided at all cost. Just consider it an extra use for a more powerful tool.
But I have no way to access the build dir for analysis after failed builds
Sounds like a feature request for buildkit. This is definitely a known missing piece.
One could do this today by having a target for fetching the "build dir". You'd just run that after a failed run; everything should still be cached, you just need the last step to grab the data.
Understand this is a bit of a work-around, though.
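For illustration, such a target could be a stage that only collects the build directory and is exported with BuildKit's local output (a sketch; the builder stage name and the /build path are assumptions):
FROM scratch AS build-dir
COPY --from=builder /build /
# after a failed build, grab the data with:
# DOCKER_BUILDKIT=1 docker build --target build-dir --output type=local,dest=./build-dir .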
Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir
Can you explain more what you are wanting/expecting here?
Can you explain more what you are wanting/expecting here?
In this case it's just wanting to kill 2 birds with 1 stone:
Since this, and all the other cases where the build container (as well as "container build") needs building to be as painless as possible, would be solved so much more elegantly by just providing -v functionality, I have a hard time understanding the resistance to providing this feature. Apart from the "cache-aware" functionality buildkit apparently offers, I can only see it as a convoluted and cumbersome way to achieve exactly this functionality, and only partially at that. (And in many cases where caching is the main goal, it would also be solved by -v, at the cost of having to lock the mounted volume to a specific container as long as it runs, but the cache with buildkit has the same restriction afaict.)
Can you explain more what you are wanting/expecting here?
I'm using a multi-stage build process, where the build environment itself is containerized, and the end result is an image containing only the application and the runtime environment (without the build tools).
What I'd like is some way for the interim Docker build container to output unit test and code coverage results files to the host system in the events of both a successful build and a failed build, without having to pass them into the build output image for extraction (because the whole build process is short-circuited if the unit tests don't pass in the earlier step, so there won't be an output image in that situation, and that's when we need the unit test results the most). I figure if a host volume could be mounted to the Docker build process, then the internal test commands can direct their output to the mounted folder.
@mcattle
Indeed, very similar to (one of) the functionalities I need. Since moving to buildah a few days ago I have every function I needed and more. Debugging my build container would have been utterly impossible without the ability to flexibly enter the exited container, with links to the host. Now I'm a happy camper. (I'm sorry to crash the party with a "competitor"; I'd happily remove this comment if offence is taken, but it was such an effective solution for the use cases presented in this thread that I thought I should mention it.)
There is no offense in saying another tool suits your needs better.
If something works for you, that's wonderful.
The shortcomings of both the v1 builder in Docker and the buildkit builder are pretty well understood in this context, and we are looking at how to address them, just preferably without having to resort to bind mounts from the client.
without having to resort to bind mounts from the client.
Here I explained why a build-time -v option does not resort to or sacrifice reproducibility any more than depending on network resources at build time:
https://github.com/moby/moby/issues/14080#issuecomment-484314314
COPY || REMOTE_FETCH || read() - which of these is most reproducible?
I'm going with buildah for build-time -v (and cgroups v2) as well.
@mcattle I have had the same requirement. I solved it with labeling.
I'm going with buildah for build-time -v (and cgroups v2) as well.
I'm seriously considering switching from Ubuntu (which has just docker) to Fedora (which has replaced docker with podman/buildah) on our build server because of "-v" support.
Btw, Podman also supports rootless mode, and so far it has seemed fully Docker compatible (except for differences in --user/USER impact and image caching that come from using rootless mode instead of running as root like the Docker daemon does).
PS. While cgroups v2 is needed for rootless operation, support for that is more about the container runtime than Docker. If you use crun instead of runc (like Fedora does), you get cgroups v2 support. runc does have some v2 & rootless support in Git, but I had some problems when testing it on Fedora (31) a few months ago.
EDIT: Ubuntu has podman/buildah/etc. in Groovy (imported from Debian unstable, I think), just not in the latest 20.04 LTS. It hasn't been backported to LTS, at least not yet, whereas it's been in Fedora since 2018, I think.
@eero-t perhaps you could describe your use-case, and what's missing in the options that BuildKit currently provides that is not addressed for those.