As suggested by @cpuguy83 in https://github.com/docker/docker/issues/3156
here is the use case for a flexible -v option at build time.
When building a Docker image I need to install a database and an app. It's all wrapped up in two tarballs: one for the DB and one for the app that needs to be installed in it (schema, objects, static data, credentials, etc.). The whole solution is then run via a shell script that handles several shell variables and tunes OS credentials and other things accordingly.
When I explode the above tarballs (or use the Dockerfile ADD directive) the whole thing bloats up to about 1.5GB(!). Not ideal, as you can imagine.
I would like to have a '-v /distrib/ready2installApp:/distrib' directive possible at build time (as it is possible today with docker run)
but
Could we have an --unmount-volume option that I can run at the end of the Dockerfile?
or
Given how VOLUME works right now in a Dockerfile, maybe we need a new Dockerfile directive for a temporary volume that people can use while installing? I think the Puppet example supplied by @fatherlinux was along similar lines...
or
Whatever you guys can think of.
The objective is to avoid carrying around all that dead weight, which is useless for a deployed app or service. However, that dead weight is necessary at install time. Not everybody has a simple "yum install" from the official repositories. :)
thank you very much
I'm looking for a similar solution.
Recently the enterprise I work for enabled the Zscaler proxy with SSL inspection, which implies having certificates installed and some environment variables set during build.
A temporary solution was to create a new Dockerfile with the certificates and environment variables set. But that doesn't seem reasonable in the long term.
So my first thought was to set up a transparent proxy for HTTP and HTTPS, but again I need to pass a certificate during build.
The ideal scenario is that, with the same Dockerfile, I would be able to build my image on my laptop, at home, and at the enterprise.
# Enterprise
$ docker build -v /etc/ssl:/etc/ssl -t myimage .
# Home
$ docker build -t myimage .
I have a slightly different use case for this feature - Caching packages which are downloaded / updated by the ASP.Net 5 package manager. The package manager manages its own cache folder so ultimately I just need a folder which I can re-use between builds.
I.e:
docker build -v /home/dokku/cache/dnx/packages:/opt/dnx/packages -t "dokku/aspnettest" .
@yngndrw what you propose would be OK for me too, i.e., we need to mount extra resources at build time that are not necessary at run time because they have already been installed in the container.
FWIW I saw somewhere in these pages somebody saying something along the lines of (and I hope I'm paraphrasing it right) "resolve your compilation issue on a similar host machine, then just install the deployable artifact or exe in the container".
I'm afraid it's not that simple, guys. At times I need to install in /usr/bin, but I also need to edit some config file. I check for the OS I'm running on, the kernel params I need to tune, the files I need to create depending on variables or manifest build files. There are many dependencies that are just not satisfied with a simple copy of a compiled product.
I re-state what I said when I opened the issue: there is a difference between a manifest declaration file and its process, and the run-time of an artifact.
If we truly believe in infrastructure-as-code, and furthermore in immutable infrastructure, which Docker itself is promoting (and I like it, btw), then this needs to be seriously considered IMO (see the bloating in post 1 herewith).
Thank you again
Another use case that is really interesting is upgrading software. There are times, like with FreeIPA, when you should really test with a copy of the production data to make sure that all of the different components can cleanly upgrade. You still want to do the upgrade in a "build" environment. You want the production copy of the data to live somewhere else, so that when you move the new upgraded versions of the containers into production, they can mount the exact data that you did the upgrade on.
Another example, would be Satellite/Spacewalk which changes schema often and even changed databases from Oracle to Postgresql at version 5.6 (IIRC).
There are many, many scenarios when I temporarily need access to data while doing an upgrade of software in a containerized build, especially with distributed/micro services....
Essentially, I am now forced to do a manual upgrade by running a regular container with a -v bind mount, then doing a "docker commit." I cannot understand why the same capability wouldn't be available with an automated Dockerfile build?
Seconding @yngndrw pointing out caching: the exact same reasoning applies to many popular projects such as Maven, npm, apt, rpm -- allowing a shared cache can dramatically speed up builds, but must not make it into the final image.
I agree with @stevenschlansker. There can be many reasons to attach a cache volume, or some kind of data a few gigabytes in size, which must be present (in parsed state) in the final image, but not as raw data.
I've also been bitten by the consistent resistance to extending docker build to support the volumes that can be used by docker run. I have not found the 'host-independent builds' mantra to be very convincing, as it only seems to make developing and iterating on Docker images more difficult and time-consuming when you need to re-download the entire package repository every time you rebuild an image.
My initial use case was a desire to cache OS package repositories to speed up development iteration. A workaround I've been using with some success is similar to the approach suggested by @fatherlinux, which is to just give up wrestling with docker build and the Dockerfile altogether, and start from scratch using docker run on a standard shell script followed by docker commit.
As a bit of an experiment, I extended my technique into a full-fledged replacement for docker build using a bit of POSIX shell scripting: dockerize.
If anyone wants to test out this script or the general approach, please let me know if it's interesting or helpful (or if it works at all for you). To use, put the script somewhere in your PATH and add it as a shebang for your build script (the #! thing), then set relevant environment variables before a second shebang line marking the start of your Docker installation script.
FROM, RUNDIR, and VOLUME variables will be automatically passed as arguments to docker run.
TAG, EXPOSE, and WORKDIR variables will be automatically passed as arguments to docker commit.
All other variables will be evaluated in the shell and passed as environment arguments to docker run, making them available within your build script.
For example, this script will cache and reuse Alpine Linux packages between builds (the VOLUME mounts a home directory to CACHE, which is then used as a symlink for the OS's package repository cache in the install script):
#!/usr/bin/env dockerize
FROM=alpine
TAG=${TAG:-wjordan/my-image}
WORKDIR=/var/cache/dockerize
CACHE=/var/cache/docker
EXPOSE=3001
VOLUME="${HOME}/.docker-cache:${CACHE} ${PWD}:${WORKDIR}:ro /tmp"
#!/bin/sh
ln -s ${CACHE}/apk /var/cache/apk
ln -s ${CACHE}/apk /etc/apk/cache
set -e
apk --update add gcc g++ make libc-dev python
[...etc etc build...]
So, after meeting the French contingent :) from Docker at MesoCon last week (it was a pleasure guys) I was made aware they have the same issue in-house and they developed a hack that copies over to a new slim image what they need.
I'd say that hacks are not welcome in the enterprise world ;) and this request should be properly handled.
Thank you for listening guys...
I'm also in favor of adding a build-time -v flag to speed up builds by sharing a cache directory between them.
@yngndrw I don't understand why you closed two related issues. I read your #59 issue and I don't see how it relates to this one. In some cases containers become super-bloated with things that are not needed at run time. Please read the 1st post.
I hope I'm not missing something here... as it has been a long day :-o
@zrml Issue https://github.com/aspnet/aspnet-docker/issues/59 was related to the built-in per-layer caching that docker provides during a build to all docker files, but this current issue is subtly different as we are talking about using host volumes to provide dockerfile-specific caching which is dependent on the dockerfile making special use of the volume. I closed issue https://github.com/aspnet/aspnet-docker/issues/59 as it is not specifically related to the aspnet-docker project / repository.
The other issue that I think you're referring to is issue https://github.com/progrium/dokku/issues/1231, which was regarding the Dokku processes explicitly disabling the built-in docker layer caching. Michael made a change to Dokku in order to allow this behaviour to be configurable and this resolved the issue in regards to the Dokku project / repository, so that issue was also closed.
There is possibly still a Docker-related issue that is outstanding (I.e. Why was Docker not handling the built-in layer caching as I expected in issue https://github.com/aspnet/aspnet-docker/issues/59), but I haven't had a chance to work out why that is and confirm if it's still happening. If it is still an issue, then a new issue for this project / repository should be raised for it as it is distinct from this current issue.
@yngndrw exactly, so we agree this is different and known at docker.com, so I'm re-opening it if you don't mind... well, I cannot. Would you mind doing it, please?
I'd like to see some comments from our colleagues in SF at least before we close it
BTW I was asked by @cpuguy83 to open a user case and explain it all, from log #3156
@zrml I'm not sure I follow - Is it https://github.com/aspnet/aspnet-docker/issues/59 that you want to re-open ? It isn't an /aspnet/aspnet-docker issue so I don't think it's right to re-open that issue. It should really be a new issue on /docker/docker, but would need to be verified and would need re-producible steps generating first.
no, no.. this one #14080 that you closed yesterday.
This issue is still open ?
@yngndrw I believe I mis-read the red "closed" icon. Apologies.
Heartily agree that build time -v would be a huge help.
Build caching is one use case.
Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
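For context, deleting the key in a later step doesn't help, since it still lives in an earlier layer; a minimal sketch of the problem (repo URL and paths are made up):
# the key is baked into the COPY layer even though a later RUN removes it
COPY id_rsa /root/.ssh/id_rsa
RUN git clone git@github.com:example/private-repo.git /src \
    && rm /root/.ssh/id_rsa   # too late: the key is still extractable from the image history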
I'm commenting here because this is hell in a corporate world.
We have an SSL-intercepting proxy; while I can direct traffic through it, heaps of projects assume they have good SSL connections, so they die horribly.
Even though my machine (and thus the docker builder) trusts the proxy, docker images don't.
Worse still, the best practice is now to use curl inside the container, which is painful: I have to modify Dockerfiles to make them even build. I could mount the certificates with a -v option and be happy.
That being said, it's less the fault of docker and more the fault of package managers using https when they should be using a system similar to how apt-get works, as that is still secure and verifiable, and also cacheable by an http proxy.
@btrepp thank you for another good use case.
I can think of another situation.
One of the things I would like to do with my dockerfiles is not ship the build tools with the "compiled" docker image. There's no reason a C app needs gcc, nor a ruby app bundler, in the image, but using docker build currently you will have this.
An idea I've had is specifying a dockerfile that runs multiple docker commands when building inside it. Pseudo-ish dockerfiles below.
Docker file that builds others
FROM dockerbuilder
RUN docker build -t docker/builder myapp/builder/Dockerfile
RUN docker run -v /app:/app builder
RUN docker build -t btrepp/myapplication myapp/Dockerfile
btrepp/myapplication dockerfile
FROM debian:jessie+sayrubyruntime
ADD . /app # (this is code that has been built using the builder dockerfile)
ENTRYPOINT ["rails s"]
Here we have a temporary container that does all the bundling install/package management and any build scripts, but it produces the files that the runtime container needs.
The runtime container then just adds the results of this, meaning it shouldn't need much more than ruby installed. In the case of say GCC or even better statically linked go, we may not need anything other than the core OS files to run.
That would keep the docker images super light.
The issue here is that the temporary builder container would go away at the end, meaning it would be super expensive without the ability to load a cache of sorts; we would be grabbing debian:jessie a whole heap of times.
I've seen people do similar techniques to this, but using external http servers to add the build files. I would prefer to keep it all being built by docker, though there is possibly a way of using a docker image to do this properly, using run and thus being able to mount volumes.
Here is another example. Say I want to build a container for systemtap that has all of the debug symbols for the kernel in it (which are Yuuuuge). I have to mount the underlying /lib/modules so that the yum command knows which RPMs to install.
Furthermore, maybe I would rather have these live somewhere other than in the 1.5GB image (from the debug symbols)
I went to write a Dockerfile, then realize it was impossible :-(
docker run --privileged -v /lib/modules:/lib/modules --tty=true --interactive=true rhel7/rhel-tools /bin/bash
yum --enablerepo=rhel-7-server-debug-rpms install kernel-debuginfo-$(uname -r) kernel-devel-$(uname -r)
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
52dac30dc495 rhel7/rhel-tools:latest "/bin/bash" 34 minutes ago Exited (0) 15 minutes ago dreamy_thompson
docker commit dreamy_thompson stap:latest
I'd like to repeat my use case here from #3949 as that bug has been closed for other reasons.
I'd really like to sandbox proprietary software in docker. It's illegal for me to host it anywhere, and the download process is not realistically (or legally) able to be automated. In total, the installers come to about 22GB (and they are getting bigger with each release). I think it's silly to expect that this should be copied into the docker image at build time.
Any news in this needed feature?
thank you
+1 for this feature!
Another use case is using ssh keys at build time for building from private repos without them being stored in the layer, eliminating the need for hacks (though well engineered) such as this one: https://github.com/dockito/vault
This is our use case as well (ssh keys rendered using tmpfs on the host in this case).
Another use case for this is a local cache of the node_modules directory on a CI server to reduce build times.
npm install is very slow, and even in the current "best" case, where the package.json is ADDed to the image, npm install is run, and only then are the actual project sources added and built, any change to package.json means all dependencies have to be redownloaded again.
See npm/npm#8836 for an issue about this on the Node/npm side.
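A minimal sketch of that "best case" layering pattern (image tag and paths are illustrative):
FROM node
WORKDIR /app
# this layer is reused until package.json changes...
COPY package.json .
RUN npm install
# ...but any change to package.json invalidates it and forces a full re-download
COPY . .
RUN npm run build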
Related aspnet-docker issue regarding slow package restoration and the resulting image size of caching the current packages in the layer. Would be much better to use a mounted volume for the caching of the package.
https://github.com/aspnet/aspnet-docker/issues/123
This isn't a language-specific issue, it will affect many people given that package managers are now an accepted standard.
The OP has nailed the issue on the head, in that "docker build -v" would greatly help decoupling the build process from the runtime environment.
I've seen several projects which now build "Mulberry harbours" which are then used to build the actual docker that is then pushed/distributed. This is overly complex from both an administration and compute resource perspective, which in turn translates to slower CI and unit testing, and overall a less productive development workflow.
I've been thinking about this, and the other option I can think of is the ability to mark layers as "src" layers.
Something along the lines of those layers only being accessible during a docker build, but not pulled in the resulting image file.
This way docker can cache earlier layers/images, temporary build artifacts, but these aren't required to utilize the final image.
Eg.
FROM ubuntu
RUN apt-get install gcc
ADDPRIVATE . /tmp/src <--these can be cached by docker locally
RUNPRIVATE make <-- basically these layers become scoped to the current build process/dockerfile
RUN make install <--result of this layer is required.
Of course this means you would need to know what you are doing better, as you could very well leave critical files out.
@yngndrw
A much better solution for situations like netcore would be for them to not use HTTPS for package management; then it's trivial to set up iptables+squid as a transparent caching proxy for docker builds. My personal opinion is that these package managers should up their game; they are terrible to use in corporate environments due to ssl re-signing, whereas things such as apt-get work perfectly fine and are already cacheable with iptables+squid for docker.
I can also see a downside to using build-time volumes: dockerfiles won't be as reproducible, and it's going to require extra setup outside of docker build -t btrepp/myapp .; it's also going to make automated builds on Docker Hub difficult.
@btrepp: I like your suggestion. For my use cases I could even live with a hardcoded (I know it's generally a bad thing) TMP dir that Docker tells us about, so that when the final artifact is built from all the layers, Docker knows it can forget/leave out the one mounted on /this_is_the_tmp_explosion_folder_that_will_be_removed_from_your_final_container_image
Easy enough....
@btrepp I quite like your source layer idea.
However regarding package managers not using SSL, I would have to disagree.
If you were to want to cache packages like that, then you should probably use a (Local) private package feed instead which mirrors the official source. Reverting to HTTP seems like a bad idea to me, especially given that a lot of package managers don't seem to sign their packages and therefore rely on HTTPS.
There is a tool, grammarly/rocker, which can be used until this issue is fixed.
@yngndrw
My point is that the local proxy etc. is a problem that was solved long ago. Package managers only need verification, they don't need privacy. Using https is a lazy way of providing verification, but it comes with privacy attached.
There's zero reason "super_awesome_ruby_lib" needs to be private when being pulled down via http(s). The better way would be for ruby gems to have a keyring, or even a known public key, and for it to sign packages. This is more or less how apt-get works, and allows standard http proxies to cache things.
Regarding a local private package feed, docker doesn't even support this well itself. There's zero way of disabling the standard feed, and it _rightly_ fails if the https certificate is not in the cert store. I'm pretty sure docker always wants to at least check the main feed when pulling images too. Afaik the rocket/rkt implementation was going to use signing+http to get container images.
If the main motivation for build-time volumes is just caching of packages, then I think pressure should be placed on the package managers to better support caching, rather than compromising some of the automation/purity docker currently has.
To be clear, I'm not advocating that package managers switch to just using http and drop https. They do need verification of packages to protect against man-in-the-middle attacks. What they don't need is the privacy aspect that using https as a "security catch-all sledgehammer" offers.
That's a really narrow view. You're asking the entire universe of package managers to change how they behave to fit Docker's prescription of how they think applications will be built.
There's also a ton of other examples of why this is necessary in this thread. Saying "well you should just change how all the tools you use to build your applications work" doesn't drive the problem away, it'll only drive the users away.
(I also strongly disagree with Docker's attachment to the public registry -- I would very much prefer to forbid access to the public registry, and only allow our internal one to be used. But that's a different subject entirely.)
For me, I also need docker build -v.
In our case we want to build an image which consists of a pre-configured installation of the concerned product, and the installer is over 2GB. Not being able to mount a host volume, we're not able to build the image with the installer even though we've already downloaded it in the host OS, for which we can use various tools/protocols, say a proxy with https cert/auth, or maybe even BitTorrent.
As a workaround, we have to use wget to re-download the installer during docker build, which is a much more restricted environment, much less convenient, more time consuming, and error prone.
Also because of the flexibility of the product installation/configuration options, it makes much more sense for us to ship the images with the product pre-installed, rather than shipping an image merely with the installer.
@thaJeztah any chance of this happening?
Fwiw this is the sole reason I don't (or really, can't) use docker
We carry a patch in Red Hat versions of docker that includes the -v option. But the true solution to this would be to build new and different ways to build OCI container images other than docker build.
@rhatdan RHEL or Fedora?
We also have implemented the -v option of docker build in our internal version of docker at resin.io. You can find the diff here https://github.com/resin-io/docker/commit/9d155107b06c7f96a8951cbbc18287eeab8f60cc
@rhatdan @petrosagg can you create a PR for this?
@jeremyherbert the patch is in the docker daemon that comes in all recent versions of RHEL, CentOS, and Fedora...
@graingert We have submitted it in the past and it has been rejected.
@rhatdan do you have a link to it?
@runcom Do you have the link?
@thaJeztah is this something you guys would have rejected?
Here's a list of existing issues that have been closed or not responded to:
https://github.com/docker/docker/issues/3949
https://github.com/docker/docker/issues/3156
https://github.com/docker/docker/issues/14251
https://github.com/docker/docker/issues/18603
Info about the Project Atomic patches used in RHEL/CentOS/Fedora can be found at:
http://www.projectatomic.io/blog/2016/08/docker-patches/
@daveisfera looks like they only add read-only volumes, not RW volumes, so it won't work for @yngndrw's and my use case.
@graingert Why do you need RW volumes? I do understand read-only as a work-around for certain cases.
Testing schema migrations would be one good reason...
@cpuguy83 Another use-case for RW would be ccache
@fatherlinux I'm not sure I follow. Why would you need a volume for this? Also why must it be done during the build phase?
I have a slightly different use case for this feature - Caching packages which are downloaded / updated by the ASP.Net 5 package manager. The package manager manages its own cache folder so ultimately I just need a folder which I can re-use between builds.
I would bind mount for example:
docker build -v /home/jenkins/pythonapp/cache/pip:/root/.cache/pip -t pythonapp .
docker build -v /home/jenkins/scalaapp/cache/ivy2:/root/.ivy2 -t scalaapp .
Because there are many times that a schema migration has to be done when the software is installed. If you run read-only containers, you should never be installing software at any time other than when you are in the build phase.....
I know that the contents of these directories will not cause the build to be host-dependent (missing these mounts will cause the build to work anyway, just slower)
NFS solved this like 30 years ago...
NFS solved this like 30 years ago...
Not a helpful comment
@graingert sorry, that seriously came off wrong. I was trying to respond too quickly and did not give enough context. In seriousness, we are looking at NFS in combination with CRI-O to solve some of these types of problems.
Both image registries and builds have a lot of qualities in common. What you are talking about is basically a caching problem. NFS, and particularly its built-in caching, can make builds host-independent and handle all of the caching for you.
Hence, even with a -v build-time option, a build doesn't have to be locked to only one host. It might not be Internet-scale independent, but it's quite enough for the many people who confine their build environment to a single site or location.
@fatherlinux I'd use gitlab or travis caching to take the cache directory and upload/download into S3
@graingert yeah, but that only works for certain types of data/apps, and also only at the bucket level, right, not at the POSIX metadata and block level. For certain types of front-end and middleware apps, no problem. For a database schema migration, you kind of need to test ahead of time and have the cache local for speed, and it typically needs to be POSIX.
Imagine I had a MySQL Galera cluster with 1TB of data and I want to do an upgrade and they are all in containers. Containerized/orchestrated multi-node, sharded Galera is really convenient. I don't want to have to manually test a schema migration during every upgrade.
I want to snapshot the data volume (a PV in the Kube world), then expose it to a build server, then test the upgrade and schema migration. If everything works right and tests pass, then we build the production containers and let the schema migration happen in production....
@graingert sorry, forgot to add: then discard the snapshot which was used in the test run... I don't want to orchestrate a build and test event separately, though that would be possible...
@fatherlinux I think that's an orthogonal use case...
@graingert not a useful comment. Orthogonal to what? Orthogonal to the request for a -v during build which is what I understood this conversation to be about?
There are a few different uses I see for this flag.
The latter two use cases could be solved more cleanly with two new keywords.
BUILDCONSTFILE <path>
would run a COPY <path> before each RUN and delete <path> from the image afterwards.
TEST <cmd> WITH <paths>
would COPY the paths, run the command, and on a 0 exit status continue the build from the parent image; otherwise it would halt the build.
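To illustrate, a Dockerfile using the proposed keywords might read something like this (hypothetical syntax, file names made up):
FROM ubuntu
RUN apt-get install -y gcc
# copied in before each RUN and deleted from the image afterwards
BUILDCONSTFILE installer.tar.gz
RUN tar -xzf installer.tar.gz -C /opt && /opt/install.sh
# copies tests/ in and runs the command; exit 0 continues from the parent image, otherwise the build halts
TEST ./run-tests.sh WITH tests/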
Personally I think TEST ... WITH is better handled in another CI step that tests your container as a whole
Let me preface with this: I _think_ I'm ok with adding --mount to build ("-v" probably not so much). Not 100% sure on the implementation, how caching is handled (or not handled), etc.
For the docker project what we do is build a builder image.
It basically has everything we need, copies code in, but does not actually build docker.
We have a Makefile that orchestrates this. So make build builds the image, make binary builds the binary with build as a dependency, etc.
Making a binary runs the build image and does the build; with this we can mount in what we need, including package caches for incremental builds.
Most of this is pretty straightforward and easily orchestrated.
So there are certainly ways to handle this case today; docker alone just can't handle 100% of it (and that's not necessarily a bad thing), and you'll have to make this work with your CI system.
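A rough sketch of that kind of Makefile (target names, image names and mount paths are illustrative, not the actual Docker project Makefile; recipe lines are tab-indented as make requires):
# build the builder image: toolchain plus the source copied in, but no actual build
build:
	docker build -t myproject-builder -f Dockerfile.build .

# run the builder image to produce the binary, mounting package caches and an output dir
binary: build
	docker run --rm \
		-v "$(CURDIR)/.cache:/root/.cache" \
		-v "$(CURDIR)/bundles:/output" \
		myproject-builder ./build.sh binary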
@cpuguy83 I think this would nail most of my use cases. Just so I understand, do you mean --mount to mean read only? and -v to be read/write?
@cpuguy83 we are also mostly building "builder" images which IMHO is becoming a more and more common pattern...
@fatherlinux swarm services and now (for 1.13) docker run support --mount, which is much more precise and flexible: https://docs.docker.com/engine/reference/commandline/service_create/#/add-bind-mounts-or-volumes
Looks like the docs are missing the 3rd type of mount, tmpfs.
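For reference, the run-time --mount syntax looks roughly like this (image and paths are just an example):
# read-only bind mount of a host cache directory into a container
docker run --rm \
  --mount type=bind,source="$HOME/.m2",target=/root/.m2,readonly \
  maven:3 mvn -version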
Ahh, very cool, thank you...
@cpuguy83 we're also using the builder pattern a lot and we have a need for caching that is not persisted in the image and also survives layer invalidation.
We build Yocto images and we have a shared sstate cache on NFS storage. Another use case is an npm cache, so that you can invalidate the whole RUN npm install layer but recalculate it faster thanks to cached packages.
As a possible compromise based on @graingert's post, could one not have an optional hash of your huge file in the dockerfile, and then docker checks this when running the build? There would be no issues with deterministic builds then, and it would be obvious to the person building that they don't have the required dependencies, rather than it just exploding with a strange error at some point. Same thing goes for ssh keys, etc which would need to be distributed with the dockerfile anyway.
I also think that any idea which requires _copying_ of the huge file is less than ideal. File sizes I am interested in using are on the order of 10-40GB, and even with a good ssd that's at least a minute or two worth of copying. This is my problem with the ADD directive already in docker; I don't want to ADD a 30GB file to my image every time it builds and have to deal with having all of that extra free space, as well as needing to squash images.
That would not work for what we're using this for. We have a volume that contains sstate caches from the yocto build system that is bind-mounted RW in the build because any cache miss will be calculated during the build and saved in sstate for future ones. Our directories are also at ~30GB so even calculating the hash would take a while.
I never understood the deterministic build concept. There are ways to shoot yourself in the foot even with today's semantics. For example you can curl something from an internal IP. Suddenly this Dockerfile doesn't work everywhere and it's host dependent. But there are legitimate cases of why you'd want to do that. For example a local HTTP cache.
So since builds are not deterministic anyway, and since one can emulate the bind-mounted volume over the network today, why not provide a native way of doing it with the appropriate warnings if need be?
@petrosagg @zrml @thaJeztah What we know is:
Dockerfile syntax is frozen (per an earlier comment; after the HEALTHCHECK instruction was later added, the freeze was removed, but the related issues remained closed).
Given all that we know, I think this will likely be closed as either Dupe or WontFix. It doesn't seem to matter what use cases we give. Update: I am happy to be wrong here. The proposal looks open :)
Our company moved to an agnostic container runtime, and will soon have to move to an agnostic image building experience as well. But this won't be the right place to discuss that because negativity doesn't help. That should be a separate post.
@rdsubhas care to share the link when you are done?
@rdsubhas that's a nice summary. It doesn't look like this thread will be closed as dupe/wontfix, since @cpuguy83 thinks he's ok with adding --mount during the build, which covers most use cases.
What I'd like to know, given the current proposal, is: which counter-arguments are left regarding the idea? If there aren't any, maybe we should start discussing the implementation details for the --mount mechanism.
To reinforce the argument that builds are already host-dependent and non-reproducible, here is a list of Dockerfile fragments with this property:
# Install different software depending on the kernel version of the host
RUN wget http://example.com/$(uname -r)/some_resource
# example.intranet is only accessible from specific hosts
RUN wget http://example.intranet/some_resource
# get something from localhost
RUN wget http://localhost/some_resource
# gcc will enable optimizations supported by the host's CPU
RUN gcc -march=native .....
# node:latest changes as time goes by
FROM node
# ubuntu package lists change as time goes by
RUN apt-get update
# install different software depending on the docker storage driver
RUN if [ $(mount | head -n 1 | awk '{print $5}') == "zfs" ]; then .....; fi
Honestly, if we just add the --mount and let the user handle cache invalidation (--no-cache), I think we'll be fine. We may want to look at finer-grained cache control from the CLI than all or nothing, but that's a separate topic.
I have been facing a similar issue for a while now, but I've opted to go with the increased size of the image until a solution is finalized. I'll try to describe my scenario here in case someone finds a better workaround.
Use --build-arg to pass a token during build (strongly discouraged). This is a very attractive and easy option since "it just works" without any added steps.
ADD and COPY are executed in separate layers, so I'm stuck with data from previous layers. The size of some of my images more than doubled in some cases, but the overall size is tolerable for now. I think there was a PR (I can't seem to find it) to remove build-time args from the build history, but it wasn't acceptable due to caching concerns, iirc.
I'll be happy to hear of any other workarounds being used out there.
@misakwa we'll likely support secrets on build in 1.14.
That's very exciting to hear @cpuguy83. I'll keep an eye out for when its released. It'll definitely simplify some of my workflows.
we'll likely support secrets on build in 1.14.
Will it also work for build-time mapping of other types of volumes, like for example a yarn-cache?
BTW there is an interesting way to build production images using docker-compose; I found it works and is quite effective:
So you have a compose file docker-compose.build.yml, something like this:
services:
  my-app:
    image: mhart/alpine-node:7.1.0
    container_name: my-app-build-container # to have a fixed name
    volumes:
      - ${YARN_CACHE}:/root/.cache/yarn # attach yarn cache from host
      - ${HOME}/.ssh:/.ssh:ro # attach secrets
      - ./:/source
    environment: # set any vars you need
      TEST_VAR: "some value"
    ports:
      - "3000"
    working_dir: /app/my-app # set the needed working dir even if it doesn't exist in the container at build time
    command: sh /source/my-app.docker.build.sh # build script
1) You build the container using docker-compose:
$ docker-compose -f docker-compose.build.yml up --force-recreate my-app
it creates the container and runs the shell build script my-app.docker.build.sh; I don't use a Dockerfile and do everything in the build script (the app sources are in the mounted /source folder).
Then you create an image from the container, replacing the CMD with what needs to be run in the target env:
docker commit -c "CMD npm run serve" my-app-build-container my-app-build-image:tag
So your image is ready; it used an external yarn cache and external secret keys that were available only at build time.
@whitecolor yep, that works :) except for one thing: docker build is really effective at uploading the build context. Mounted source volumes unfortunately don't work with remote docker daemons (e.g. docker-machine on cloud for low-powered/bandwidth laptops). For that we have to do a cumbersome series of docker run, docker cp, docker run, etc. commands and then snapshot the final image, but it's really hacky.
It really helps to have this officially part of docker build, and to use layering and the build context 😄
@rdsubhas Yes you are correct
@whitecolor That is a really simple and effective solution. I just cut down a 30-40 min build on a project to about 5 minutes. I look forward to the possibility of having a --mount on build feature but for now this solution really unblocks my pipeline.
This is a comment I left for issue #17745 which I had understood had been closed but was not marked duplicate. Seems I was wrong about that latter point: I'll admit I'm used to systems like Bugzilla that explicitly mark something as "RESOLVED DUPLICATE", and display such up in the top description area of a bug. I'm no mind reader. (So my apologies @graingert, I had little way of knowing, thus there is no need to yell at me in 20pt font -- that was excessive.)
In my case, where this would be useful is on Debian systems: mounting /var/cache/apt as a volume, so you're not re-downloading the same .deb files over and over again. (A truly "unlimited" Internet quota just doesn't exist, especially here in Australia, and even if it did, there's time wasted waiting for the download.)
Or another scenario, you're doing a build, but it also produces test reports such as failure listings and code coverage reports that you don't need to ship with the image, but are useful artefacts to have around. These could be written to a volume when a CI server goes to build the image for the CI server to pick up and host.
Or tonight, I'm doing some Gentoo-based images for myself; I'd like to mount /usr/portage from the host. It is not hard for a Dockerfile to realise, "hey, /usr/portage (in the container) is empty, no problem, I'll just grab that" when running without the volume mounted, OR it just uses the volume as-is, saving the time of fetching a fresh copy.
Adding those smarts is a trivial if statement in a Bourne shell script… IF the underlying logic to mount the volume is present in the first place. Right now for my Gentoo images, I'm having to pull /usr/portage every time I do a build (luckily the mirror is on my LAN), which means a good few minutes' wait for that one step to complete.
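The "trivial if statement" mentioned above would be something along these lines (using emerge-webrsync as one plausible way to populate the tree):
#!/bin/sh
# if /usr/portage was not bind-mounted in (i.e. it is empty), fetch a fresh copy;
# otherwise just use the mounted tree as-is
if [ -z "$(ls -A /usr/portage 2>/dev/null)" ]; then
    emerge-webrsync
fi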
So lots of reasons why this is a worthwhile proposal, and I'm doubtful that the nested builds proposed in #7115 is going to help in the above instances.
@whitecolor has an interesting approach, but if doing that, I might as well use a Makefile completely external to the Docker system to achieve the build.
@sjlongland I wasn't yelling at you, I was poly-filling a big "RESOLVED DUPLICATE" notice
I am using docker and docker-compose to build several containers for our infrastructure. The containers are microservices, mostly written in nodeJS, but there is one microservice written in Java, using the maven framework.
Every time we rebuild the java container, tens of dependencies are downloaded from Maven; this takes several minutes. Then the code is built in about 15 seconds.
This is very ugly and it impacts our CI strategy pretty hard.
In this scenario it doesn't really matter if the volume with the build dependencies is missing or empty, because in that case the dependencies would simply be downloaded. Reproducibility is not affected.
I understand that there are security concerns, because I could tamper with the dependencies and inject nasty code in there; IMHO that could be easily circumvented by not allowing images build with "build volumes" to be published on docker-hub or docker-store.
To spell this out differently, there should be a distinction of scopes between the enterprise use and the personal use of docker.
@stepps check out https://pypi.python.org/pypi/shipwright instead of docker-compose
I've been following this thread for a while, looking for a good solution for myself. For building minimal containers in a flexible way with minimal effort I really like https://github.com/edannenberg/gentoo-bb by @edannenberg.
It's based off using Gentoo's portage and emerge, so @sjlongland you may like it for your Gentoo-based images. Dist files and binary packages are cached, so it doesn't need to download or build them again making rebuilds fast. It has hooks to easily customise the build process. Installing 3rd party software is easy, such as using git to clone a repo and then build it, keeping only the build in the final image. It templates the Dockerfile.
A simple example, for figlet, is:
build.conf:
IMAGE_PARENT="gentoobb/glibc"
Dockerfile.template:
FROM ${IMAGE_PARENT}
ADD rootfs.tar /
USER figlet
CMD ["gentoo-bb"]
ENTRYPOINT ["figlet"]
build.sh
PACKAGES="app-misc/figlet"
configure_rootfs_build() {
useradd figlet
}
I like @whitecolor's solution: it's simple, using just Docker technology plus a simple shell script or anything else you want to use. I'm using gentoo-bb as it's more complete. Shipwright looks good, with more developer-focused features such as dealing with branches. https://github.com/grammarly/rocker also seems interesting. Thanks for sharing, everyone.
Just another voice added to the pile. Our very complex dev environment would be vastly simpler if we could mount local volumes on build.
A workaround is to run an http server during the build that exposes the local files and then use curl/wget etc. to get the files into the docker build. But I really wish such hacks were unnecessary.
Another use case.. I want to build docker images for building a proprietary OS which has 10s of different versions. The install media is >80GB, so I cannot just copy this into the docker build environment. A bind mount would be much preferable.
Another one: my project distributes Dockerfiles in the repository for building from sources in the container. Currently, we pull another git clone from github inside the container. There are shallow clones and all, but still...
So, I just tested [1] on a rhel7 build host, and Red Hat's build of the docker daemon DOES have the -v option for build. I haven't tested on CentOS/Fedora, but one would imagine Fedora/CentOS probably have it too. It's worth testing. Also, RHEL Developer subscriptions are now free [2]:
@fatherlinux Under Fedora `docker build -v` is also available.
@fatherlinux The CentOS 7 version includes it.
+1 I think this would be really useful feature to add to the official docker.
Just updated on both centos and linuxmint (now running 17.03.1-ce). Am I missing something here? I can't see the -v option.
On mint
$ docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg list Set build-time variables (default [])
--cache-from stringSlice Images to consider as cache sources
--cgroup-parent string Optional parent cgroup for the container
--compress Compress the build context using gzip
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label list Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--network string Set the networking mode for the RUN instructions during build (default "default")
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--security-opt stringSlice Security options
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag list Name and optionally a tag in the 'name:tag' format (default [])
--ulimit ulimit Ulimit options (default [])
$ cat /etc/lsb-release
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=18
DISTRIB_CODENAME=sarah
DISTRIB_DESCRIPTION="Linux Mint 18 Sarah"
$ docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:45:26 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.1-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: c6d412e
Built: Fri Mar 24 00:45:26 2017
OS/Arch: linux/amd64
Experimental: false
On centos 7
# docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg list Set build-time variables (default [])
--cache-from stringSlice Images to consider as cache sources
--cgroup-parent string Optional parent cgroup for the container
--compress Compress the build context using gzip
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label list Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--network string Set the networking mode for the RUN instructions during build (default "default")
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--security-opt stringSlice Security options
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag list Name and optionally a tag in the 'name:tag' format (default [])
--ulimit ulimit Ulimit options (default [])
# docker version
Client:
Version: 17.03.1-ce
API version: 1.27
Go version: go1.7.5
Git commit: c6d412e
Built: Mon Mar 27 17:05:44 2017
OS/Arch: linux/amd64
# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
@wilfriedroset In CentOS 7, the non-official Docker packages provide the option. I think it's part of the EPEL repository.
thanks @nathanjackson. Do we have an ETA for this feature in the official release ?
@wilfriedroset AFAIK, there is NO ETA, because it was decided (several times) that this feature SHOULD NOT be in official docker, to preserve "build portability," aka allowing your Dockerfiles to run anywhere, including the Docker build service.
In my experience, limited build portability is what customers really want. They want to set up a build environment/farm and ensure that builds can always be rebuilt in that environment. The -v build option does not prevent this in any way.
For example, if you use NFS mounts, just make sure all of the build servers have that mount in their fstabs and you build will complete without issue anywhere in the farm.
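For example, an /etc/fstab entry like this on every build host would do (server name and paths are illustrative):
# shared, read-only build cache exported over NFS
nfs-server.example.com:/exports/build-cache  /mnt/build-cache  nfs  ro,noatime  0  0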
On RHEL 7.3
[root@rhel7 ~]# docker build --help
Usage: docker build [OPTIONS] PATH | URL | -
Build an image from a Dockerfile
Options:
--build-arg value Set build-time variables (default [])
--cgroup-parent string Optional parent cgroup for the container
--cpu-period int Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota int Limit the CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
--cpuset-cpus string CPUs in which to allow execution (0-3, 0,1)
--cpuset-mems string MEMs in which to allow execution (0-3, 0,1)
--disable-content-trust Skip image verification (default true)
-f, --file string Name of the Dockerfile (Default is 'PATH/Dockerfile')
--force-rm Always remove intermediate containers
--help Print usage
--isolation string Container isolation technology
--label value Set metadata for an image (default [])
-m, --memory string Memory limit
--memory-swap string Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--no-cache Do not use cache when building the image
--pull Always attempt to pull a newer version of the image
-q, --quiet Suppress the build output and print image ID on success
--rm Remove intermediate containers after a successful build (default true)
--shm-size string Size of /dev/shm, default value is 64MB
-t, --tag value Name and optionally a tag in the 'name:tag' format (default [])
--ulimit value Ulimit options (default [])
-v, --volume value Set build-time bind mounts (default [])
another use case on a CI building node projects is to share the CI's yarn cache when building all the images.
+1: installing node_modules again and again is really terrible, especially for nodejs microservices.
I'm trying to solve this problem with nfs. I think "repeatability" is not a good reason for not implementing this feature...
This seems like it will be even more important with #31257 and #32063 merged in.
Take a look at #32507
@fatherlinux could you explain how build portability works when you can have COPY commands within the Dockerfile? I have an issue where I want to minimize the number of copies of a large file (for time-complexity reasons) and am looking for a build-time read-only option to share the file with the container.
@arunmk See https://github.com/moby/moby/issues/32507
@arunmk @cpuguy83 exactly. The idea is that you really don't want to COPY data into the container on build. That can make it very large. We just want the data available at build time. Per above, you can do a -v bind mount in Red Hat's version of the docker daemon, which allows you to have data available, but it's read-only right now (that burned me last week).
So, if you need it today, check out Fedora, CentOS, or RHEL and you can mount in a Read Only copy of data at build time...
And, if you need portability within a build farm, I would suggest NFS or some such....
If you don't care about copying it in but rather just care about having it in the final image, you can use multi-stage builds to handle this.
A contrived example:
FROM fatImage AS build
COPY bigData /data
RUN some_stuff /data
FROM tinyImage
COPY --from=build /data/result .
Thanks for the clarification @fatherlinux
@cpuguy83 thanks for the detail. Let me add more detail to my issue which may be uncommon: I have a build system that generates a 3.3GB file. That is added to an RPM which is built within a docker container. So there are two copies that are produced: one from the build system into the docker container, one from within the docker container to within the RPM. Now, I cannot avoid the second copy. I was thinking of avoiding the first copy but it looks like that is also not possible, even with the multi-stage builds.
I can understand that, if the large file was used repeatedly, the multi-stage copy would have reduced the number of times the copy runs to '1'. I use it once and wanted to reduce the number to '0'. Am I right in understanding that it won't be possible?
@arunmk No matter what it's going to have to be copied to the build instance from the client.
@cpuguy83 thanks for the clarification. Looks like I have to take the overhead for now. Is that to have atomicity?
@fatherlinux
I tried what you said, using -v on RHEL7 to try and readonly mount a directory during build, but get this error:
Volumes aren't supported in docker build. Please use only bind mounts.
This will only work with the docker package from RHEL not the one from Docker. Patch was not accepted upstream.
@fatherlinux
I tried what you said, using -v on RHEL7 to try and readonly mount a directory during build, but get this error:
Volumes aren't supported in docker build. Please use only bind mounts.
@fcntl you need to use bind mounts, as the error said; you probably used -v /something rather than -v /hostsomething:/containersomething
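So with the patched (Red Hat) daemon the working form is host-path:container-path, for example (paths and tag made up):
# bind-mounts the host directory into the build; read-only per the behaviour noted above
docker build -v /srv/distrib/installer:/distrib -t myimage .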
@thebigb and perhaps others, we've set up an infrastructure to be able to use ccache during docker builds. we've published it at https://github.com/WebHare/ccache-memcached-server if it helps you, although ideally resolving this issue would probably obsolete it.
I was just about to add, a use case I really need this for is ccache. I would like to be able to mount my ccache cache during a docker image build--there's no sense in it being in the image itself. @unilynx I'll have a look at your workaround--good timing!
Just another voice.
My use case: currently I use rocker's MOUNT command to share the /root/.cache and /var/cache/apk directories.
For some reason I have very (very, very) slow network access to apk packages and pip packages. Any rebuild makes the process incredibly time-consuming. A build-time MOUNT feature would make things a lot easier.
@embray @roxma have a look at https://github.com/moby/moby/issues/32507 if that would address your use case; feedback welcome
With the introduction of multi-stage builds, I find the need to specify a volume mount for Maven's Local Cache is critical.
@gim913 This is not how you participate in any community. If you would like to contribute, please review the existing proposals linked here to see if any of them solves your use-case.
@gim913 At this stage of docker integration into various distributions, changing environments (i.e. dropping docker completely) seems a lot more disruptive than changing your 'OS' (I assume you mean switching from a different Linux distribution to the Red Hat build which apparently includes -v?).
Wouldn't it be easier to just take RedHat's version of docker? Perhaps someone here can point you towards the relevant patches/forks/commits to get the '-v' option in the build.
@unilynx here you go
I was looking at some examples that used wget and got here... my use case is similar... I want to unzip a large tarball and just run it. I don't want to litter the Dockerfile with the tarball or waste time doing a wget from a local web server. Mounting, like you can do with docker compose, seems like a reasonable thing to do at build time. Please merge Puneeth's change if it looks ok :-)
I precompile python wheels and want to install those in the container without copying them and making a layer I really don't need, or have to somehow try to squash. Day 1 and I am already looking into rocker
😢 😢 😢
This would be easy to add and extremely useful (or a mount command, see rocker again). How much time is spent (in the community) scripting around this or similar missing features?
@awbacker Multi-stage builds solve this pretty well, where you can do something like:
FROM something AS my_wheels
RUN compile_all_the_things
FROM something
COPY --from=my_wheels /wherever /wherever
RUN do_stuff_with_wheels
The first part is only run if something changes. The cache for it can be shared amongst other builds/dockerfiles as well.
This makes the whole build self-contained.
There's also a proposal that would allow RUN --mount, where the mount spec would tell it to mount a thing from the my_wheels build target instead of copying it.
Like for @kenyee, this could mount something from the build context, which in 17.07-experimental is only sent incrementally as needed.
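For illustration, a rough sketch of what that proposal could look like with the RUN --mount syntax BuildKit later shipped (the my_wheels stage name, the Python base image and the /wheels path are assumptions used only for this example):
# syntax=docker/dockerfile:experimental
FROM python:3 AS my_wheels
# build the wheels once; "some-package" is a placeholder
RUN pip wheel --wheel-dir=/wheels some-package
FROM python:3
# mount the wheels from the my_wheels stage instead of copying them into a layer
RUN --mount=type=bind,from=my_wheels,source=/wheels,target=/wheels \
    pip install /wheels/*.whl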
@cpuguy83 That doesn't work in practice - at least for Gradle Java builds. I have a base Docker image that has the Gradle jar files pre-cached, but the Gradle build of your source is what triggers the download of all of your dependencies into the cache.
@cpuguy83 Multi-stage doesn't allow you to remove the copied wheels from the resulting image, which is what @awbacker is talking about. The content of the /wherever folder will be cached and the image size will increase.
@BryanHunt So part of your build process is downloading the deps? For sure Gradle must provide a way to cache these without going through and actually building?
@cpuguy83 Yep, deps are downloaded as part of the build. Basically the same as Maven. For reference: https://github.com/gradle/gradle/issues/1049
was there a PR for build mounts somewhere?
@graingert Here
👍 for this. At Lunar Way we want to do the complete "build -> test -> build production image" process in a single Docker build, in order to remove build and test dependencies from the CI server. With multi-stage builds we can do this, but we cannot get the test results out of the intermediate container in the build process. We therefore have to do it in two steps right now - with a separate Dockerfile for building the test image, running it, and then only proceeding to the build-prod-image step if the tests succeed.
A -v option on docker build would allow us to store the test results in a folder mounted in from the CI server and remove the need for the current 2-step process.
@tbflw By default Docker build does not remove intermediate containers after an unsuccessful build. So if a test fails, you can get the test results from those.
Please, we also really, really need this feature! Resorting to other tools like rocker, or forking docker with ad-hoc patches, is far uglier than breaking the evangelic notion of "build portability".
@BryanHunt @stepps @yngndrw others too @awhitford
One way to cache build dependencies is to make your build work like the example multi-stage go build in the documentation or the python onbuild Dockerfile.
Here is an example I made that seems to work for maven. I'll copy it here.
FROM maven
WORKDIR /usr/src/app
# /root/.m2 is a volume :(
ENV MAVEN_OPTS=-Dmaven.repo.local=../m2repo/
COPY pom.xml .
# v2.8 doesn't work :(
RUN mvn -B -e -C -T 1C org.apache.maven.plugins:maven-dependency-plugin:3.0.2:go-offline
COPY . .
RUN mvn -B -e -o -T 1C verify
FROM openjdk
COPY --from=0 /usr/src/app/target/*.jar ./
It needs to be set up so it downloads dependencies before it copies the rest of the codebase in. Also make sure that the place your artifacts get stored aren't in a VOLUME.
@sixcorners That doesn't work for Gradle
@BryanHunt This Dockerfile or this approach doesn't work for gradle? cpuguy83 asked if there was a way to download dependencies without actually performing a build. You linked to a resolve dependencies task. Couldn't you just add the build.gradle file and run that task?
@sixcorners When you have many modules, you have to replicate your directory structure along with the build files and property files. I suppose it could be done, but I see this as very error prone.
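For reference, the approach being discussed might look roughly like this for a single-module project (a sketch only: it assumes the project ships the Gradle wrapper and uses an openjdk base image, and, as noted, with many modules you would have to replicate the whole directory structure of build files):
FROM openjdk:11 AS build
WORKDIR /usr/src/app
# copy only the wrapper and build scripts first so this layer stays cached
COPY gradlew settings.gradle build.gradle ./
COPY gradle ./gradle
# resolve and download dependencies without compiling any sources
RUN ./gradlew --no-daemon dependencies
COPY . .
RUN ./gradlew --no-daemon build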
The multi-stage trick by @sixcorners is interesting and I have seen it used with different package managers (e.g. npm, composer).
There is an issue though: whenever the list of dependencies changes, the COPY pom.xml in the stage-0 image causes that layer to be thrown away, and thus the whole cache is gone. That means that whenever a developer changes anything in the pom (a comment, a 1 kB dependency), the whole cache has to be redownloaded.
For CI machines building the image and then running the tests with dependencies that keep changing, that is thousands and thousands of packages that have to be redownloaded (either from a proxy or from upstream), which makes the rebuild quite slow. A local file-based cache mounted as a volume is way faster.
That is also an issue when developers iterate on the build of an image, especially if they are on slow connections. One can set up a local Nexus instance and point http_proxy at it, but that has other side effects (such as channeling every HTTP request through Nexus).
Multistage is a nice workaround, but it is not ideal.
A solution we are about to try is to build an image by building our shared libraries and retaining the dependency cache. This image would then become our build image for our apps. It's not ideal, but we think it's worth a try.
There is an issue though: whenever the list of dependencies changes, the COPY pom.xml in the stage-0 image causes that layer to be thrown away, and thus the whole cache is gone. That means that whenever a developer changes anything in the pom (a comment, a 1 kB dependency), the whole cache has to be redownloaded.
@hashar note that the COPY --from feature is not limited to build stages; from the Dockerfile reference:
Optionally COPY accepts a flag --from=<name|index> that can be used to set the source location to a previous build stage (created with FROM .. AS <name>) that will be used instead of a build context sent by the user. The flag also accepts a numeric index assigned for all previous build stages started with the FROM instruction. _In case a build stage with a specified name can't be found, an image with the same name is attempted to be used instead._
This allows you to _build_ an image for your dependencies, tag it, and use that to copy your dependencies from. For example:
FROM maven
WORKDIR /usr/src/app
# /root/.m2 is a volume :(
ENV MAVEN_OPTS=-Dmaven.repo.local=../m2repo/
COPY pom.xml .
# v2.8 doesn't work :(
RUN mvn -B -e -C -T 1C org.apache.maven.plugins:maven-dependency-plugin:3.0.2:go-offline
COPY . .
RUN mvn -B -e -o -T 1C verify
docker build -t dependencies:1.0.0 .
And specify using the dependencies:1.0.0 image for your dependencies:
FROM openjdk
COPY --from=dependencies:1.0.0 /usr/src/app/target/*.jar ./
Or (just a very basic example to test);
$ mkdir example && cd example
$ touch dep-one.jar dep-two.jar dep-three.jar
$ docker build -t dependencies:1.0.0 . -f -<<'EOF'
FROM scratch
COPY . /usr/src/app/target/
EOF
$ docker build -t myimage -<<'EOF'
FROM busybox
RUN mkdir /foo
COPY --from=dependencies:1.0.0 /usr/src/app/target/*.jar /foo/
RUN ls -la /foo/
EOF
In the output of the build, you'll see:
Step 4/4 : RUN ls -la /foo/
---> Running in 012a8dbef91d
total 8
drwxr-xr-x 1 root root 4096 Oct 7 13:27 .
drwxr-xr-x 1 root root 4096 Oct 7 13:27 ..
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-one.jar
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-three.jar
-rw-r--r-- 1 root root 0 Oct 7 13:26 dep-two.jar
---> 71fc7f4b8802
I don't know if anyone has mentioned this use case yet (I briefly searched the page), but mounting an SSH auth socket into the build container would make it much easier to use dependencies that are deployed via private git repositories. There would be less need for boilerplate in the Dockerfile for copying keys around in non-final build stages, etc.
buildkit has native support for git
https://github.com/moby/buildkit
My solution: create a bash script (~/bin/docker-compose or the like):
#!/bin/bash
trap 'kill $(jobs -p)' EXIT
socat TCP-LISTEN:56789,reuseaddr,fork UNIX-CLIENT:${SSH_AUTH_SOCK} &
/usr/bin/docker-compose "$@"
And in Dockerfile using socat:
...
ENV SSH_AUTH_SOCK /tmp/auth.sock
...
&& apk add --no-cache socat openssh \
&& /bin/sh -c "socat -v UNIX-LISTEN:${SSH_AUTH_SOCK},unlink-early,mode=777,fork TCP:172.22.1.11:56789 &> /dev/null &" \
&& bundle install \
...
or any other ssh-using commands will work.
Then run docker-compose build
To throw another use case on the pile: I use Docker for Windows to generate a filesystem for building an embedded Linux system in one container, and I would like to share this with other containers during their build step. I interact with this container, changing configuration and rebuilding etc., so performing the build in a Dockerfile and using multi-stage builds isn't really a good fit, as I would lose incremental builds. I want to cache my previous build artefacts, as it takes about 1.5 hours to do a clean build. Due to the way Windows deals with symbolic links I can't do my build into a host-mounted volume, so I use named volumes. Ideally I would like to share these named volumes in the build steps of my other images; at the moment I have to create a tar of the build output (about 4 GB) and then do a docker cp to make it available on the Windows host for subsequent builds.
In the case of Python, when we pip install a package, it and its dependencies are downloaded to a cache folder and then installed to site-packages.
As good practice we use pip --no-cache-dir install package so as not to store rubbish/cache in the current layer. But best practice would be to keep the cache folder outside the build context entirely, so a build-time -v would help.
Some users above suggested using COPY . /somewhere/in/container/. That is OK for application files, but not for a cache: COPY creates one more layer of its own, so removing the cache in a later layer doesn't help. Another bad side effect is that if the cache changes, the COPY sees a changed context and all following layers are invalidated and forced to rebuild.
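For what it's worth, with the RUN --mount syntax that BuildKit eventually shipped, a pip cache kept out of the image looks roughly like this (a sketch; the Python tag, WORKDIR and file names are assumptions):
# syntax=docker/dockerfile:experimental
FROM python:3.7
WORKDIR /app
COPY requirements.txt .
# the cache mount persists between builds but never ends up in an image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt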
@wtayyeb If your Dockerfile runs pip install ... only when the requirements file changes, then a build-time -v doesn't seem that important, since requirements don't change as often as the application does.
@wtayyeb You can use a multi-stage Dockerfile to have both the cache and a lean image. That is, use an installer image to install Python into some directory and then, for your final image, use COPY --from to transfer only the necessary Python files without any installation artifacts or even pip itself.
@manishtomar, Thanks, yes and no! In a clean build all dependencies are downloaded, built, converted to wheels, cached, and then installed into the destination environment. So if the requirements never change, that is a one-time job. But if one tiny dependency is updated, all the dependencies have to be re-downloaded, re-built, re-wheeled and re-cached to be usable.
When using a CI to build and test your libraries and your applications in a matrix of several jobs, multiply the above work by the number of concurrent jobs on your CI server and you get iowait rising to more than 3s and a load average above 15, even with SSDs. (These numbers are real for 2 concurrent builds of an app with ~20 dependencies.) I think the pip cache is doing it the right way, avoiding re-downloading, re-building and re-wheeling packages that are already prepared. Without a bind -v we lose time and server resources.
@ibukanov, Thanks. I am using a multi-stage Dockerfile to build my app packages and use them later. That would help if I had only one Dockerfile and wanted to build it several times, but what if there are several Dockerfiles, each built against a different Python version (2.7 and 3.6 for now), with several C extensions that need to be built for the selected base image? And what about the above paragraph?
@thaJeztah Your suggestion is great and it will save us some time; however, in the case of build caches we really don't want to have to copy anything from the other image.
Why can't we access another image without copying it?
@thedrow my example was with the features that are currently there; have a look at the RUN --mount
proposal (https://github.com/moby/moby/issues/32507), which may be a closer fit to your use case
Reading the above thread, I see a large number of people trying to find kludges to work around a basic functionality gap in the docker build process. I see no compelling argument on the basis of portability that doesn't conflate host mounts with image mounts - arguments which are frankly specious and lazy.
I am also a gentoo container user and was redirected from https://github.com/moby/moby/issues/3156 which is a completely valid use case for this missing functionality.
All I really want is the ability to mount the contents of another image at build-time so that I don't bloat my images.
@kbaegis sounds like an exact match with the feature that's proposed in https://github.com/moby/moby/issues/32507
Sure. That one's only been an unimplemented P3 in the backlog for one year rather than 3 years.
It looks like https://github.com/projectatomic/buildah is actually going to outstrip docker build pretty quickly here for this basic functionality. I think I'm just going to switch my pipeline over once that happens.
@kbaegis what did you come here to add to this discussion? You described a use-case that _exactly_ matches a different proposal;
All I really want is the ability to mount the contents of another image at build-time so that I don't bloat my images.
It’s open-source, things don’t come to existence magically.
What am I looking to add to the discussion?
Succinctly: that I'm moving on from this toolset. I'm sure that's valuable information for the development team, as I'm sure I'm not alone.
The glacial pace and low priority of supporting this use case (and of any reliable workaround that provides this functionality) have forced me onto other tools, and I'm abandoning this build pipeline due to the missing functionality.
I've got a (rehash, I'm sure) use case to add. #32507 may suit this better.
I'm building a docker image for some bioinformatics pipelines. A few of the tools require some databases to be present prior to their compilation/installation (please don't ask, it's not my code). These databases weigh in at a lovely 30gb minimum.
During runtime, I certainly intend for those databases to be mounted as -v volumes. Unfortunately, I cannot do this during the build process without "baking" them in, resulting in a rather obscenely sized image.
@draeath take a look at https://github.com/grammarly/rocker. It already supports a lovely MOUNT instruction.
@draeath also, check out Buildah, it supports mounts by default because it is set up more like a programming tool. Also supports mounts with a Dockerfile:
Thank you both @fatherlinux and @lig - this will help me get my task done. I still think I shouldn't have to stray outside the project to do it, though, and would still love to see this and #32507 implemented ;)
I've come here via some googling to ask for the same feature, volumes at 'docker build' time, not 'docker run' time.
We have an embedded system that contains a CPU. The manufacturer provides tooling to compose a system image, and then transfer the image into the CPU. This tooling is 3rd party to me and I cannot change it. The manufacturer is also unlikely to alter it at my request.
I want to build a docker image that does a first pass "build the firmware image", and then be able to spawn containers that just push the firmware image to the fresh-off-the-line PCBs. A Dockerfile might look like:
----------[ Cut Here ]----------
FROM base-image as builder
COPY src src
RUN build-src
FROM base-image as flasher
COPY --from=builder build-artifacts .
RUN cpu-build-and-flash --build-only
----------[ Cut Here ]----------
Unfortunately, the cpu-build-and-flash step requires access to the target device via USB bus, even though it's not going to push the firmware image to the device. Thus I need to take the '-v /dev/usb/bus:/dev/usb/bus' from the 'docker run' command and have it in the build instead.
It's clear that this isn't currently possible.
The workaround I'm going ahead with is to manually create a flashing image by 'docker container commit'ing a container to an image. I'd much rather just mount the USB bus at build time.
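For anyone replicating that workaround, it might look roughly like this (a sketch; flasher-prep is a hypothetical stage that stops just before the cpu-build-and-flash step, and depending on the setup the device may need --privileged or --device rather than -v):
docker build --target flasher-prep -t flasher-prep .
docker run --name flasher-tmp -v /dev/usb/bus:/dev/usb/bus flasher-prep cpu-build-and-flash --build-only
docker container commit flasher-tmp flasher:latest
docker rm flasher-tmp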
Update for anyone who is interested: I've recently rebuilt my entire pipeline successfully with buildah. I currently have the two build pipelines running in parallel, and the oci/buildah pipeline is generating smaller images (specifically, removing /usr/portage in my case by masking it with another mount).
And finally this feature is here: https://github.com/docker/docker-py/issues/1498
But I want RW volumes for a build cache
I'd also like to see this feature (with write capabilities) so that a unit test results file can be exported during the multistage build process in a CI pipeline. To keep with the spirit of build portability, if the -v switch was not provided, the file would simply be written internally within the test image at that stage.
The ideal goal is to build once, test once, and still have the results file given to the host system, even in the event (and especially in the event) that tests fail, stopping the build.
Yes please. All day.
Not entirely relevant, but we're migrating a part of our deployment infrastructure and needed a way to copy files from an image after build. The following did the trick:
docker build -t x .
ID=$(docker create x)
docker cp $ID:/package.deb .
docker rm $ID
It should have been added when multi-stage Dockerfiles were introduced. Eventually everyone is going to face this issue as soon as they start running unit tests as a stage in a multi-stage Dockerfile, especially in CI build pipelines. We are also facing this issue, where we have to publish unit test reports to VSTS. We're already applying the workaround @hoffa mentioned, but after all it is a workaround and it complicates things.
Should we make a different issue for people that want/need build-time volumes for a build cache?
@ajbouh Yes, probably at https://github.com/moby/buildkit/issues
See https://github.com/moby/moby/issues/32507#issuecomment-391685221
While you can't add volumes at build time, you can add hosts, so I now build all my docker images with something like --add-host yum-mirror:$MIRROR_IP, which serves up a yum mirror that my build images then detect via a wrapper around yum. Handy when my project changes dependencies many times a day and I'm offline or on a bad connection (part of the project involves updating and cleaning up its many deps).
I find Docker's resistance to solving this problem infuriating.
Experimental support for buildkit was recently merged, and with it comes an option to RUN --mount=<opts> <command>.
link to @cpuguy83 note: https://github.com/moby/buildkit/pull/442
@glensc @cpuguy83 When can we expect a release for this merged feature?
+1
RUN --mount doesn't have volume support, so things like https://github.com/avsm/docker-ssh-agent-forward remain impossible at build time. What is the solution for this?
@peter-edge https://github.com/moby/buildkit/pull/655
docker build --secret
is finally available in Docker 18.09 https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066
Can we close this issue?
--secret is not usable for the caching use case, from what I can tell.
@AkihiroSuda RUN --mount in general looks like something that could fit as the solution for this issue.
Yes, I suppose RUN --mount=type=cache (for cache volumes) and --mount=type=secret with docker build --secret (for secret volumes) almost cover the issue.
@AkihiroSuda so, a working example solving the original issue would be good to see
@AkihiroSuda From the article (https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066) I saw 2 use cases of using mount during build: Secret and SSH
[Secret]
docker build --secret id=mysite.key,src=path/to/mysite.key .
RUN --mount=type=secret,id=mysite.key,required <command-to-run>
[SSH]
RUN --mount=type=ssh git clone [email protected]:myorg/myproject.git myproject
There are 2 other use cases (that I remember) that aren't explained how to use in the article nor in this issue:
1) [Cache] RUN --mount=type=cache
2) Volumes in general (for example, to mount SSL certificates, or in the case of large volumes that should be used during build, but not included in the generated image, and so on...)
One use case is mounting a yarn workspace before running webpack.
You can do all of this..
RUN --mount=type=cache,from=<some image>,source=<path in from image>,target=<target>
You can also replace from=<some image> with from=<some build stage>.
Here's a contrived example:
# syntax=docker/dockerfile:1.0.0-experimental
FROM busybox as hello
RUN echo hello > /hello.txt
FROM scratch
RUN --mount=type=cache,from=busybox,source=/bin,target=/bin --mount=type=cache,from=hello,source=/hello.txt,target=/tmp/hello.txt echo /tmp/hello.txt
Here's some documentation on this: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
I agree with @AkihiroSuda, this should handle all the cases... but please do let us know if it does not.
@AkihiroSuda @cpuguy83 : Unfortunately, the current implementation (buildkit in docker 18.09) has issues with private registries. As of now, these new features can't be used if you have to fetch your images through a private registry. See my tests in https://github.com/moby/moby/issues/38303.
I think this would also be useful for Jenkins artifacts. For example, if I'm creating a Docker image and compiling something inside it, I want to get some artifacts out, such as JUnit/pytest output.
This would be very useful. I really would rather not need to add --experimental to support RUN --mount=type=cache /user/.cache/pip pip install (in order to save tons of package index bandwidth).
buildah bud (buildah build-using-dockerfile) has a --volume/-v option:
https://github.com/containers/buildah/blob/master/docs/buildah-bud.md
buildah can run builds as non-root without a docker socket.
Because package downloads from the network are more reproducible?
No need to add "--experimental", only "DOCKER_BUILDKIT=1" on the client.
Yes, network builds are more reproducible in that the context is all in the Dockerfile. If you have to mount context from the host to make the build work it's a bad experience.
Note that you can also mount an image into the build.
Yes, network builds are more reproducible in that the context is all in the Dockerfile.
Surely having RUN apt-get update in the Dockerfile makes sure that one has all the steps needed to build the image. However, it is not reproducible, since additional context gets downloaded from a third party. The only difference with a mount is that all external contexts are indeed defined in the Dockerfile.
If you have to mount context from the host to make the build work it's a bad experience.
My bad experience with Docker build is that it's never reproducible, and we could definitely benefit from mounting a cache from the host, which would arguably speed up some use cases.
What I end up doing eventually is a multi-stage build: one image gets the context from the network and thus acts as a snapshot of the remote context. Then I tag that with some arbitrary version; the date works fine. E.g.:
RUN apt-get update
docker build -t aptupdate-20190417
And in the actual image:
FROM aptupdate-20190417
FROM somebaseimage
COPY --from=aptupdate-20190417 /var/apt /var/apt
Repeat with other remote contexts and you more or less have something that is reproducible.
Or in short: a Dockerfile that relies on network access is probably not reproducible. A mount might make it not reproducible, but it would help make some use cases reproducible. I guess the point is that a Dockerfile should have all the steps required to actually build the image, though in my experience most people write their own tooling to instrument building images.
I mean, RUN --mount=type=cache is exactly for this.
Or you can even mount from another image from a registry and it will be fetched.
Your apt commands can be made (relatively) reproducible by pinning what you want to fetch.
But if you really want to control all the bits, then why are you using apt in your build? Storing this on a build host is not reproducible and easily breaks from host to host.
Keeping it in a registry is not bad other than the potential for network failure... which is of course a fair criticism.
-v on buildah and Red Hat's fork was explicitly rejected here because it's overly broad... not to say it's not useful, but it easily breaks from host to host, which goes against the design of docker build.
Meanwhile, the reason RH added it (or more precisely, why they decided to work on it) was to be able to mount RHEL credentials into the build environment.
Yes, network builds are more reproducible in that the context is all in the Dockerfile. If you have to mount context from the host to make the build work it's a bad experience.
I vehemently disagree. The network may be down or compromised, in which case a local cache prevents the whole build from failing while the internet is down.
I could specify volumes: once in my docker-compose.yml, but instead I need to set DOCKER_BUILDKIT=1 and add RUN --mount=type=cache to Dockerfiles managed upstream? Why?
With CI builds, we're talking about a nontrivial amount of unnecessary re-downloading tens to thousands of packages (tens or hundreds of times a day) that could just be cached in a volume mount (in a build that runs as nonroot without the ability to execute privileged containers with their own volumes on the host).
Package indexes are in many cases generously supported by donations. Wasting that money on bandwidth to satisfy some false idea of reproducibility predicated upon a false belief that remote resources are a more reproducible cache of build components is terribly frustrating.
Please just add --volume so that my docker-compose.yml works.
Please just add --volume so that my docker-compose.yml works.
Making your "docker-compose" just work is backwards.
docker-compose consumers this project, not the other way around.
docker-compose interacts with the docker socket. docker-compose YAML is a consolidated way to store container options (which can be converted to k8s pod defs, which podman supports to a degree). How should I specify DOCKER_BUILDKIT=1 in a reproducible way? I could specify build_volumes: in a reproducible way in a docker-compose.yml.
When I -- in my CI build script that runs n times a day -- build an image by e.g. calling docker-compose build (e.g. with ansible) or packer (instead of buildah and podman), I have a few objectives: don't re-download packages that haven't changed, and, if I need to flush the cache volume, I can flush the cache volume. The options as I see them:
A. Reinstall and discard the cache on every build, or COPY the host's pip cache in and delete it again at the cost of extra layers:
RUN pip install app && rm -rf /root/.cache
COPY . /app/src/app
COPY .cache/pip /app/.cache/pip
RUN pip install /app/src/app \
    && rm -rf /app/.cache/pip
B. Patch every upstream Dockerfile (and every ONBUILD) to use RUN --mount=type=cache and set an environment variable:
# Fork by copying to modify every pip install line
RUN --mount=type=cache /app/.cache/pip pip install /app/src/pip
$ DOCKER_BUILDKIT=1 docker build . [...]
...and rely on the --mount=type=cache cache (?)
C1. Mount the cache as a volume at build time:
$ buildah bud -v .cache/pip:/app/.cache.pip
$ docker build -v .cache/pip:/app/.cache.pip
C2. Or declare it once in docker-compose.yml:
services:
  app:
    image: imgname:latest
    build: .
    build_volumes:  # "build_volumes" ?
      - ./.cache/pip:/app/.cache/pip
$ docker-compose build
:point_up: Just a reminder: you can pin a downloaded file by checking its checksum. Some package managers, such as pip, also support that.
@westurner Thanks for the detailed explanation.
I think that the following would be similar to your case B, but you could clear the cache and it would end up like your case C2 (what you are asking for, I think):
_docker-compose.yml:_
services:
my-cache:
build: ./my-cache
image: local/my-cache
my-image:
build: ./my-image
_my-cache/Dockerfile:_
FROM python
RUN pip install app
_my-image/Dockerfile:_
FROM my-repo/my-image
RUN --mount=target=/export,type=bind,from=local/my-cache
RUN pip install /app/src/app
(https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypecache)
You can build the cache image with:
docker-compose build my-cache
The command RUN --mount=target=/export,type=bind,from=local/my-cache should bind to the image. If you want to refresh the cache, you can remove and rebuild the cache image.
If this still uses the cache in the RUN --mount..., you can use a .env file with a version, include the version in image: local/my-cache:$MY_VERSION and in from=local/my-cache:$MY_VERSION (it should be included as a build arg).
You could include the my-cache service in another docker-compose file if you don't want it to be in the same file as your main services.
You would still need to use DOCKER_BUILDKIT=1 (like in your B case, but I think this won't be necessary in future versions) and it would still not be reproducible (but your C2 case isn't either).
What penalty do you see if it isn't reproducible? If you put the cache image local/my-cache in Docker Hub (with a different repo name) or in a private registry and use a version for each build that creates a different cache, with the same version always having the same cache, wouldn't that make it reproducible? You wouldn't even need to include the service in the docker-compose file and call the build command. (Docker Hub would be accessed over the network, but that is the same for your other images, I assume, and after you download it once it should not be needed anymore, unless you generate a new version with a new cache.)
DISCLAIMER: I haven't tested the above code.
@Yajo The checksum support in pip was originally implemented in 'peep' and then merged into pip. You can add known good hashes as URL fragments in pip requirements file entries. (There is funding for security improvements in the PyPA project this year; TUF (The Update Framework; just like Docker Notary) support in PyPI is planned for later this year.) Correctly bootstrapping pip and PyPI (with keys and trust) in docker images will likely be a topic later this year.
(edit; a bit OT but for the concerned) https://discuss.python.org/t/pypi-security-work-multifactor-auth-progress-help-needed/1042/
@lucasbasquerotto Thanks for your help. This is significantly more complicated than just specifying --volume at build time. Namely, it seems to require:
- DOCKER_BUILDKIT=1 in the docker build shell env
- RUN --mount=type=cache and args
If I can COPY files from the host, or specify build-time parameters that aren't stored elsewhere, I don't see how mounting a volume at build time is any less reproducible?
COPY || REMOTE_FETCH || read() - which of these is most reproducible?
@westurner
Specifying DOCKER_BUILDKIT=1 in the docker build shell env
If you use docker-compose, as I saw in your other posts, and if you are running it from a container, like:
$ sudo curl -L --fail https://github.com/docker/compose/releases/download/1.24.0/run.sh -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
Then you can edit the downloaded file in /usr/local/bin/docker-compose
to use that env variable. Change from:
exec docker run --rm $DOCKER_RUN_OPTIONS $DOCKER_ADDR $COMPOSE_OPTIONS $VOLUMES -w "$(pwd)" $IMAGE "$@"
to
DOCKER_BUILDKIT=1
exec docker run --rm $DOCKER_RUN_OPTIONS $DOCKER_ADDR $COMPOSE_OPTIONS $VOLUMES -w "$(pwd)" --env DOCKER_BUILDKIT=$DOCKER_BUILDKIT $IMAGE "$@"
This is a very easy change and it's transparent to whoever runs the command.
_(If you don't run as a container, then the above doesn't apply)_
Modifying any/every upstream Dockerfile RUN instruction with RUN --cache and args
In the case I exposed it would be RUN --mount=type=bind..., but in any case, having to change the Dockerfile is also bad IMO. A -v option would really be much better and more transparent.
Read/write access into another image? Mutability! Or is said cache frozen with probably-stale versions?
When you bind the image, it would probably create a container (or whatever it would be called, with a replicated filesystem), and changes made there while building shouldn't change the original image (it wouldn't make sense). So if you build using a cache image named my-repo/my-cache:my-version, the next build would be exactly the same (immutability). If you want to use a more up-to-date cache, you can create a new image with a new version and use it, like my-repo/my-cache:my-new-version.
Which of these are most reproducible?
I consider reproducible to mean something that would be exactly the same even if you run it on another machine. In this sense, if you push an image to a (safe and reliable) docker registry and never change that image, I would consider it reproducible (if you have concerns about the internet connection, you could use a private registry and access it inside a VPN or something like that; I've never used a private registry myself).
If the COPY command is copying your machine's cache, I don't consider it reproducible, because if you run pip install (or apt-get, or whatever) on another machine, at another time, can you guarantee that the contents of the cache will be the same? Maybe this is a concern for you, maybe not.
On the other hand, if you have files in some reliable place that you "own" (like an S3 bucket), download those files onto your machine and copy them with the COPY command, then you can reproduce it from another machine with the same results (assuming the files haven't changed and the other machine is identical to the previous one). So I would consider this reproducible. It depends on where those files come from and how much control you have over them.
Truth be told, I don't consider anything 100% reproducible in all cases (after all, hardware can fail), but the more reliable, the better. When I refer to some process being reproducible, I'm mainly referring to its contents and result being the same, and this would include something downloaded from the network, assuming the contents don't change over time (I'm disregarding the possibility of network failure in this case).
There's some kind of Docker networking bug which makes go mod download unreliable inside a container, too (at least for applications our size), so just running it every time to download all of my GOPATH/pkg/mod over again is not just wasteful, but broken. 🤷♀
I could avoid a whole lot of unnecessary file copying if I could use --volume!
@kevincantu RUN --mount=type=cache should cover your use case
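For example, something along these lines (a minimal sketch; the Go version and paths are assumptions):
# syntax=docker/dockerfile:experimental
FROM golang:1.14 AS build
WORKDIR /src
COPY go.mod go.sum ./
# the module cache is reused across builds but never stored in a layer
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod go build ./...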
That requires at least one successful download of modules from within a docker build, and in this particular case I've not yet ever seen that..
https://github.com/moby/moby/issues/14080#issuecomment-484314314 by @westurner is a pretty good overview, but I couldn't get buildkit to work:
$ sudo docker -v
Docker version 19.03.1, build 74b1e89
$ sudo DOCKER_BUILDKIT=1 docker build .
[+] Building 0.1s (2/2) FINISHED
 => [internal] load build definition from Dockerfile 0.0s
 => => transferring dockerfile: 407B 0.0s
 => [internal] load .dockerignore 0.0s
 => => transferring context: 2B 0.0s
failed to create LLB definition: Dockerfile parse error line 8: Unknown flag: mount
My Dockerfile does start with # syntax=docker/dockerfile:experimental.
I'd actually like to use it via docker-compose. I tried ENV DOCKER_BUILDKIT 1 in the Dockerfile and also passing it from docker-compose.yml via ARG DOCKER_BUILDKIT, but it's all the same:
$ sudo docker-compose up --build
Building web
ERROR: Dockerfile parse error line 10: Unknown flag: mount
@lucasbasquerotto How would what you proposed in https://github.com/moby/moby/issues/14080#issuecomment-484639378 translate to an installed version of docker-compose?
Finally, I'm not even sure if this would cover my use case; perhaps some of you can tell me whether I should pursue this. I want to use a build-time cache for local development which survives between builds, so that after updating dependencies only the new ones have to be downloaded. I would add RUN --mount=type=cache,target=/deps to the Dockerfile and set the dependency manager's cache to /deps.
for docker compose see https://github.com/docker/compose/pull/6865, which will be in an upcoming release candidate of compose
I have another use case... I want to build containers for arm on an x86_64 host with binfmt configured. This requires that I have the architecture-specific static qemu CPU emulator in /usr/bin.
My current solution is to add qemu-arm-static into the container as a file, like:
FROM arm32v7/alpine:3.10
COPY qemu-arm-static /usr/bin/qemu-arm-static
RUN apk update && apk upgrade
RUN apk add alpine-sdk cmake
...
The easier solution would be to mount my file inside the container only when needed, like:
docker build -v /usr/bin/qemu-arm-static:/usr/bin/qemu-arm-static -t test:arm32v7 .
This works very well for docker run, but I miss this functionality when building containers.
Is there another solution for building arm containers on x86_64 hosts, or can we allow volumes at build time for at least this case?
@jneuhauser latest kernels allow these binaries to be loaded statically, so there's no need to configure them every time. You can achieve this e.g. by running the linuxkit/binfmt image in privileged mode once after boot.
latest kernels allow these binaries to be statically loaded, so there‘s no need to configure them every time.
@alehaa Don't you still need the static qemu emulator binary within the container, though?
@cybe This is not required anymore if the F flag is used (which is what the linuxkit/binfmt package does). You can find more information about this here.
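For reference, that one-time registration is a single privileged run of the image (a sketch; pick a concrete tag for your setup):
docker run --rm --privileged linuxkit/binfmt:<tag>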
Could someone provide a working setup for trying out buildkit? I can't get it working on Ubuntu. My setup is as follows:
cat /etc/docker/daemon.json
{
"experimental": true
}
Dockerfile
# syntax=docker/dockerfile:experimental
FROM ruby:2.6.3
RUN --mount=type=cache,target=/bundle/vendor
sudo docker -v
Docker version 19.03.1, build 74b1e89
DOCKER_BUILDKIT=1 sudo docker build .
Error response from daemon: Dockerfile parse error line 12: Unknown flag: mount
sudo doesn't carry env vars with it unless you tell it to with sudo -E or declare the variable within the sudo command.
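In other words, either of these should work:
sudo DOCKER_BUILDKIT=1 docker build .
DOCKER_BUILDKIT=1 sudo -E docker build .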
I wrote a few words about this feature and created some minimal examples showing how to cache
Edit: see below
@cpuguy83 thanks!
@thisismydesign sorry to ruin your excitement, but you can't --cache node_modules; it will not be present in the final image, so your app is broken.
@glensc Damn you're right.. is there a way to make a build-time cache part of the final image?
Honestly, I thought this would be considered for a feature advertised as
allows the build container to cache directories for compilers and package managers.
You should be able to map ~/.npm instead… https://docs.npmjs.com/files/folders.html#cache
@thisismydesign
You can use another image as a cache, though, either by building it in your Dockerfile or using a literal image stored in a registry somewhere, and use COPY --from:
FROM example/my_node_modules:latest AS node_modules
FROM nodejs AS build
COPY --from=node_modules /node_modules ./node_modules
...
This is just an example you can use this for many different things.
Ugh I hate to bring this up and get involved here (also hi friends)
but we have a use case for this.
Is there a good place I can get involved or a call or list I can join to get a digest here?
Also if we need someone to put some resources on this I have 1 kris nova and a small team I can probably persuade to look at this.
TLDR Can I code this please? Is there anyone I can talk to about this?
_TLDR_ Can I code this please? Is there anyone I can talk to about this?
I can't speak for Docker but my impression is that they're not open to adding volume mounting to builds (and that they should probably close this issue)
A lot of the use cases for buildtime -v are now covered by buildkit. It has at least resolved it for me.
I will check out buildkit then - I also have some hacky bash that gets the job done if anyone is interested.
thanks @unilynx
+1 to @unilynx on closing this issue out, buildkit solved the build time volume issues for me too.
I bet if someone dropped a few links and an example we could convince our friends to press the shiny close button.
(I would also benefit from them)
The use case of caching isn't solved for me and many others as the build time volumes with buildkit are not present in the final image.
So I was able to pull all my build artifacts out of the temporary volume used at build time and reconstruct the image with the previous cache using the bash I mentioned above.
I was also able to rebuild my image on top of itself such that the overlay filesystem only grabbed a small delta.
I was even able to re-use the volume for other images at build time.
are other folks not able to do this?
(cache) mounts are in the "experimental" front-end; described in https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md (about to head into a meeting, but I can link more extended examples)
thanks @thaJeztah LMK if I can help here in any way :)
https://github.com/moby/moby/issues/14080#issuecomment-547662701
@thisismydesign sorry to ruin your excitement, but you can't --cache node_modules, it will not be present in the final image, so your app is broken.
@thaJeztah I don't believe the issue above is solved. I'd love to see some examples where it's possible to cache e.g. npm install during build time in a way that also allows the resulting image to use the cached installation.
@kris-nova I didn't solve this problem but then again I'm not looking to use bash scripts. Perhaps we need a new issue but this is a pretty common use case that AFAIK isn't solved yet.
@thaJeztah Here are some examples using cache mounts showing how the final image won't contain the mount, and therefore it doesn't cover many use cases of build-time caching:
For npm: wouldn't one use the cache mounts for the npm cache directory (see https://docs.npmjs.com/cli-commands/cache.html, usually ~/.npm)?
@ankon That could work, thanks, I'll give it a try. Another use case I'm not sure about is Bundler and Ruby.
So I think (I haven't tested yet) that for Bundler you can at least get rid of the network dependency by using a build volume at $BUNDLE_PATH and then, during the build, running:
bundle install
bundle package
bundle install --standalone --local
This basically means you have a cached bundle install directory; from there you package gems into ./vendor/cache and re-install into ./bundle. But this doesn't spare the time spent installing and building gems; it might actually make the build step longer.
If you want to save the cached data into the image, then copy it into the image from the cache.
Thanks, however, it still is more of a workaround because...
I don't know how much effort it would be to simply have a native option for mounting the same volume into the final image, but I'm pretty sure it would make the usage easier. These are just 2 examples from scripting languages where the way to use this cache wasn't obvious to me. I can most certainly imagine this will come up in other contexts as well.
@thisismydesign It seems like what you want is to be able to share a cache between build and run?
buildkit is a linux only solution, what do we do on windows?
@thisismydesign I'm not sure why you expect a (cache) mount to stay in the final image. I wouldn't expect that, and I don't want ~1 GB in my image just because I used a download cache mount.
buildkit is a linux only solution, what do we do on windows?
You can use buildkit on Windows.
https://docs.docker.com/develop/develop-images/build_enhancements/
You may find it easier to set the daemon setting through the Docker for Windows UI rather than setting the environment variable before executing.
@nigelgbanks at the top of your link:
Only supported for building Linux containers
Oh sorry I just assume you were building Linux containers on Windows.
@thisismydesign It seems like what you want is to be able to share a cache between build and run?
That would solve my use case around caching, yes.
Making this easier could save millions of package re-downloads in CI builds per year.
Do any CI services support experimental buildkit features?
Do any CI services support experimental buildkit features?
Do they have to explicitly support it? I'm using gitlab-ci with buildkit and it just works. After all, it's just a different way of invoking 'docker build'.
Of course, unless you bring your own runners to gitlab, odds of getting a cache hit during build are low anyway.
Copying from a named stage of a multi-stage build is another solution:
FROM golang:1.7.3 AS builder
COPY --from=builder
But then container image locality is still a mostly-unsolved issue for CI job scheduling. Runners would need to be more sticky and share (intermediate) images in a common filesystem in order to minimize unnecessary requests to (perennially underfunded) package repos.
I just tried buildkit, but it only marginally improves my workflow, which would be 100% helped by "real" volume or bind mounts to the host.
I am using docker build to cross-compile old glibc versions, which should then become part of new build containers that provide these glibcs to build under and link against.
The repeated glibc source download is now solved by a bind mount (from buildkit); the archive can be read-only, no problem. But I have no way to access the build dir for analysis after failed builds, since the container bombs out on error. (If I restart it to access it, it restarts the build, so that doesn't help.)
Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir, when, had the build dir been a mount in the first place, it would have been so easy. (Just do make install after the build and I have a clean container without the build dir and without the downloaded sources.)
So I still believe this is a very valid feature request that would make our lives a lot easier. Just because a feature could be abused, and could break other functionality if misused, does not mean it should be avoided at all cost. Just consider it an extra use for a more powerful tool.
But I have no way to access the build dir for analysis after failed builds
Sounds like a feature request for buildkit. This is definitely a known missing piece.
One could do this today by having a target for fetching the "build dir". You'd just run that after a failed run; everything should still be cached, you just need the last step to grab the data.
Understand this is a bit of a work-around, though.
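For illustration, such a target could be a stage that only collects the build directory and is exported with BuildKit's local output (a sketch; the builder stage name and the /build path are assumptions):
FROM scratch AS build-dir
COPY --from=builder /build /
# after a failed build, grab the data with:
# DOCKER_BUILDKIT=1 docker build --target build-dir --output type=local,dest=./build-dir .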
Also, I fail to see why I should be jumping through hoops like building a new container from an old one just to get rid of my build dir
Can you explain more what you are wanting/expecting here?
Can you explain more what you are wanting/expecting here?
In this case it's just wanting to kill 2 birds with 1 stone:
Since this, and all the other cases where the build container (as well as "container build") needs building to be as painless as possible, would be solved so much more elegantly by just providing -v functionality, I have a hard time understanding the resistance to providing this feature. Apart from the "cache-aware" functionality buildkit apparently offers, I can only see it as a convoluted and cumbersome way to achieve exactly this functionality, and only partially at that. (And in many cases where caching is the main goal, it would also be solved by -v, at the cost of having to lock the mounted volume to a specific container as long as it runs, but the cache with buildkit has the same restriction afaict.)
Can you explain more what you are wanting/expecting here?
I'm using a multi-stage build process, where the build environment itself is containerized, and the end result is an image containing only the application and the runtime environment (without the build tools).
What I'd like is some way for the interim Docker build container to output unit test and code coverage results files to the host system in the events of both a successful build and a failed build, without having to pass them into the build output image for extraction (because the whole build process is short-circuited if the unit tests don't pass in the earlier step, so there won't be an output image in that situation, and that's when we need the unit test results the most). I figure if a host volume could be mounted to the Docker build process, then the internal test commands can direct their output to the mounted folder.
@mcattle
Indeed, very similar to (one of) the functionalities I need. Since moving to buildah a few days ago I have every function I needed and more. Debugging my build container would have been utterly impossible without the ability to flexibly enter the exited container, with links to the host. Now I'm a happy camper. (I'm sorry to crash the party with a "competitor"; I'd happily remove this comment if offence is taken, but it was such an effective solution for the use cases presented in this thread that I thought I should mention it.)
There is no offense in saying another tool suits your needs better.
If something works for you, that's wonderful.
The shortcomings of both the v1 builder in Docker and the buildkit builder are pretty well understood in this context, and we are looking at how to address them, just preferably without having to resort to bind mounts from the client.
without having to resort to bind mounts from the client.
Here I explained why a build-time -v option does not resort to or sacrifice reproducibility any more than depending on network resources at build time:
https://github.com/moby/moby/issues/14080#issuecomment-484314314
COPY || REMOTE_FETCH || read() - which of these is most reproducible?
I'm going with buildah for build-time -v (and cgroups v2) as well.
@mcattle I have had the same requirement. I solved it with labeling.
I'm going with buildah for build-time -v (and cgroups v2) as well.
I'm seriously considering switching from Ubuntu (which has just docker) to Fedora (which has replaced docker with podman/buildah) on our build server because of "-v" support.
Btw, Podman also supports rootless mode, and so far it has seemed fully Docker compatible (except for differences in --user/USER impact and image caching that come from using rootless mode instead of running as root like the Docker daemon does).
PS. While cgroups v2 is needed for rootless operation, support for that is more about the container runtime than Docker. If you use crun instead of runc (like Fedora does), you get cgroups v2 support. runc does have some v2 & rootless support in Git, but I had some problems when testing it on Fedora (31) a few months ago.
EDIT: Ubuntu has podman/buildah/etc. in Groovy (imported from Debian unstable, I think), just not in the latest 20.04 LTS. It hasn't been backported to LTS, at least not yet, whereas it's been in Fedora since 2018, I think.
@eero-t perhaps you could describe your use-case, and what's missing in the options that BuildKit currently provides that is not addressed for those.