Description
It's documented that ARG can appear before FROM, so that arguments may be substituted into image names etc.
Rather than having some ARG before and some ARG after FROM, for consistency I attempted to place all my ARG before FROM. However, to my surprise (after a lot of debugging) I determined that my arguments are _always_ blank after FROM.
I believe the meta-arg functionality/refactoring may somehow be responsible:
https://github.com/moby/moby/commit/239c53bf836174108dbae445a394a290f5fe2898
Steps to reproduce the issue:
ARG environment
FROM alpine:3.5
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment
Build the image with a build arg, then run it to print the environment ARG (stored in /value_of_environment):
docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment
Describe the results you received:
development
Describe the results you expected:
production
Additional information you deem important (e.g. issue happens only occasionally):
Altering the Dockerfile so that ARG comes after FROM, i.e.
FROM alpine:3.5
ARG environment
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment
then running again:
docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment
gives the expected output of production.
Output of docker version:
Client:
Version: 17.06.0-ce
API version: 1.30
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:31:53 2017
OS/Arch: darwin/amd64
Server:
Version: 17.06.0-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 02c1d87
Built: Fri Jun 23 21:51:55 2017
OS/Arch: linux/amd64
Experimental: true
Output of docker info:
Containers: 59
Running: 0
Paused: 0
Stopped: 59
Images: 370
Server Version: 17.06.0-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 457
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.818GiB
Name: moby
ID: BCV5:MEMK:BYKI:I2IU:QY2V:5DRM:F2FP:JFAG:SM46:M2WJ:73YV:3KLP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 20
Goroutines: 40
System Time: 2017-07-16T19:58:09.054157098Z
EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@thaJeztah correct me if I'm wrong.
@Benjamin-Dobell after investigating this, https://github.com/moby/moby/commit/239c53bf836174108dbae445a394a290f5fe2898 is not the origin of this behavior.
Basically, after the FROM instruction all the build arguments are reset and thus aren't available in the Dockerfile.
From what I found, the purpose of ARG before FROM is to use it inside the FROM instruction: https://github.com/moby/moby/pull/31352
Yes, this doesn't look like a bug; see this pull request, which adds some more information https://github.com/docker/cli/pull/333
@thaJeztah I guess we can close this
Irrespective of whether this was implemented this way intentionally or it's a bug; I think it's a bit of a usability nightmare.
It's not clearly documented that this is the expected behaviour, and it makes for a messy Dockerfile. But more importantly, it opens a Pandora's box of confusing edge-cases.
What if I intend to use an ARG in both my FROM statement _and_ after it? Am I expected to have multiple ARG statements referring to the same build-arg?
What happens if I use default value syntax ARG argument=some_value before FROM and just ARG argument after FROM? What is the expected value of argument after FROM if no argument build-arg was passed?
What is the expected value of argument after FROM if no argument build-arg was passed?
The same as it would be if you're not using multi-stage build; empty / no value set
@thaJeztah I know that's true now, I've experimented with it. The issue is that it's hugely non-obvious.
If this is _expected_ behaviour and no-one is willing to change it, then at the very least ARG ought to be deprecated (before FROM), and when used prior to FROM the syntax should instead be FROMARG (which _must_ come before FROM).
ARG is reset after each FROM. If this is documented; why would ARG before FROM have to be deprecated?
/cc @tonistiigi @dnephin
Improved documentation is always appreciated, and would have saved me _some_ time. However, just because behaviour is documented doesn't preclude the behaviour itself from scrutiny.
ARG has too much complexity to it. I'd argue this functionality shouldn't have been added to the ARG keyword in the first place; it's effectively been repurposed and its behaviour is now far too nuanced. A new keyword FROMARG from the outset would have made a lot more sense.
I should note that I'm not actually an advocate of expanding the grammar when the _usage_ of the existing grammar can be expanded.
However, in this particular instance ARG has had its existing semantics altered; the behaviour is not additive. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.
It's extremely confusing in single stage builds, and perhaps more-so in multi-stage ones. If arguments really are tied to build stages (although I must confess I'm not sure why this is desirable), then you've suddenly a need to look at the previous "stage", beyond the FROM verb.
Realistically, you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages. Additionally:
a “cache miss” occurs upon its first usage, not its definition
So there is zero incentive to intersperse ARG definitions throughout a file. Therefore, the most logical behaviour would be to _encourage_ all ARG definitions to be placed at the top of a file (where they can clearly be seen) _and_ then update the behaviour to ensure there's no funny business with build stages.
However, in this particular instance ARG has had its semantics altered. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.
The new ARG features are 100% backward compatible. No previous Dockerfile needs any changes.
then you've suddenly a need to look at the previous "stage", beyond the FROM verb.
It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.
a “cache miss” occurs upon its first usage, not its definition
All args are used in every RUN command. If an argument changes, it breaks the cache from the very first time RUN is used.
However, in this particular instance ARG has had its semantics altered.
The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.
If arguments really are tied to build stages, then you've suddenly a need to look at the previous "stage", beyond the FROM verb.
I think you're misunderstanding the scope. They are only scoped to the stage where they are declared.
(although I must confess I'm not sure why this is desirable) ... you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages
The use cases supported by a Dockerfile expanded quite a bit with multi-stage builds. It's no longer the case that a single Dockerfile will produce a single image. You can use --target to run different stages. At this time the build is still sequential but in the future we should be able to build more optimally. Not every build stage will run on every build.
In this context the design should make more sense. Although the values might not change, which lines actually run will change depending on the --target, which means the args must be defined in each stage, not in the meta section before a FROM.
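As a rough illustration of the per-stage scoping described above, here is a minimal sketch of a two-stage Dockerfile. The stage names `build` and `runtime`, the arg name `app_env`, and the file paths are assumptions made up for this example; they do not come from this thread.

```dockerfile
# Hypothetical two-stage Dockerfile; stage names, arg name and paths are assumptions.
FROM alpine:3.5 AS build
# Declared in this stage, so it is only visible until the next FROM.
ARG app_env
RUN echo "build stage sees: $app_env" > /build_env

FROM alpine:3.5 AS runtime
# A new stage starts with no build args; declare it again to use it here.
ARG app_env
COPY --from=build /build_env /build_env
RUN echo "runtime stage sees: $app_env" > /runtime_env
```

With a file like this, `docker build --target build --build-arg app_env=production .` should stop after the first stage, so only the lines of that stage (and its own ARG declaration) take part in the build.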
All args are used in every RUN command. If an argument changes, it breaks the cache from the very first time RUN is used.
Yikes! That also needs documenting... and changing.
It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.
When looking at a Dockerfile, what syntax marks the beginning of a new build stage?
FROM does, and yet, somehow it accesses ARG defined prior to this line.
I was just clarifying what "first use" means. You use an ARG by executing a RUN command. No changes from the time ARG was introduced.
FROM defines a stage. What do you mean by accessing ARG?
There is a specific syntax that can be used to avoid redefining a default value for ARG multiple times in same file (something that you asked in https://github.com/moby/moby/issues/34129#issuecomment-315856425 btw). That requires both places to define that they want to share it. No ARG defined before FROM accidentally leaks into any build stage.
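The syntax referred to here is presumably the one described in the Dockerfile reference: declare the default once before FROM, then re-declare the ARG without a value inside the stage. A minimal sketch, assuming a `busybox` base image and a hypothetical output file `/image_version`:

```dockerfile
# The default is written once, in the section before the first FROM.
ARG VERSION=latest
FROM busybox:$VERSION
# Re-declaring without a value opts this stage in to the value declared above
# (or to whatever --build-arg VERSION=... supplied).
ARG VERSION
RUN echo "$VERSION" > /image_version
```

Both places have to name the arg, which is the "both places define that they want to share it" requirement; the default itself appears only once.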
To be clear, I'm not saying I don't understand how the current implementation works, what has been written in this issue explains it clearly enough. I'm suggesting the implementation itself is non-ideal and confusing; after all, I read the existing docs and literally cloned Docker compose, Docker client and finally Docker before working out what was going on - at which point I opened this issue.
It's just too complicated. Adding so much complexity to the Dockerfile syntax and the corresponding documentation is simply not sustainable.
The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.
I don't think it's 100% accurate that multi-stage and ARG in FROM are independent. They _should_ have been independent, but I think the existence of multi-stage impacted the implementation of ARG in FROM.
The properties of ARG _were_:
1. It may appear after FROM.
2. The argument defined by ARG may be used on any line following the definition.
(2. is the way Dockerfiles always worked: sequential, state is additive, never subtractive.)
A feature request comes along:
I'd like to use arguments in FROM.
Reasonable enough, the two previously defined properties still hold if implemented. We now have a third property:
3. ARG may appear before FROM.
This can cleanly be implemented, without any backwards compatibility issues. Except, it wasn't; it could have been, but it wasn't.
Instead, property 2. was violated: suddenly ARG can't always be used after it's defined. If it appears before FROM, then it can only be used in FROM, not on all subsequent lines.
That's changing the semantics of ARG, hence why I'm suggesting it should have been FROMARG, a keyword that can only appear in the "meta section" prior to FROM.
Mind you, this constraint is artificial in nature, there's zero reason 3. shouldn't have been implemented cleanly. The only reason the current implementation was deemed acceptable is because multi-stage builds were also coming, and it was also violating 2., albeit in a (roughly) well-defined fashion.
Anyway, my issue is complexity; that's subjective and given I'm not a maintainer, not for me to decide. Documentation is certainly better than nothing, so this issue may be closed if you see fit.
As a new user of ARG it was very unintuitive why my ARG was empty. I saw someone use an example of ARG in a Dockerfile, but they were using it in the FROM line. For me it makes sense to define any parameterisation of a Dockerfile at the very top, so I didn't question it. Only upon rereading the docs after reading this issue do I understand why.
I would suggest a warning that ARG gets reset after FROM in the documentation, as not everyone is up to speed on multistage builds.
@Benjamin-Dobell I wanted to use build-args in multistage builds to pass secure keys to intermediate build stages which would then disappear. I haven't completely got confirmation that this is secure, but I was actually happy to see your issue.
For the record, aside from implementation details which respondents seem to be burdening you with, clearing build args -- at least so they can't be read from the build history -- seems IMO to be a very important feature... well worth the complexity.
UPDATE -- sigh ... I guess I spoke prematurely. Multistage builds don't help with the fact that args are written to build history.
@shaunc Are you saying that build-arg defined for an intermediate stage is visible in the history of the final stage? This should not happen if you use COPY --from.
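A minimal sketch of the COPY --from pattern mentioned here, under stated assumptions: the stage name `fetch`, the arg name `api_token`, and the file `/artifact.txt` are hypothetical. This only illustrates the claim in the comment above and is not a verified security recommendation.

```dockerfile
# Hypothetical sketch; the stage name, arg name and file paths are assumptions.
FROM alpine:3.5 AS fetch
ARG api_token
# The arg is visible to this RUN and is recorded in this stage's history.
RUN echo "fetched with a token of length ${#api_token}" > /artifact.txt

FROM alpine:3.5
# Only the file is copied; the final stage never declares api_token.
COPY --from=fetch /artifact.txt /artifact.txt
```

The intermediate stage can still remain in the local build cache along with its history, and the Dockerfile reference advises against passing secrets as build args at all, so treat this as a way to keep the value out of the final image's history rather than as secret management.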
I ran into the same issue, and to underline the impact of this behaviour I want to share my example here, whose cause took a significant amount of time to figure out. It's still totally unexpected, and I wouldn't exactly call that user experience.
Please, if you don't see the necessity to change that behaviour, then at least document it as the creator of this issue suggested, so that people can stumble upon this.
docker image build \
--build-arg NODE_VERSION="4.8.3" \
--build-arg NPM_VERSION="4.5.0" \
.
Does not work as expected. NPM_VERSION holds "latest".
ARG NODE_VERSION="latest"
ARG NPM_VERSION="latest"
FROM node:${NODE_VERSION}-alpine
RUN npm install -g npm@${NPM_VERSION}
...
Works as intended. NPM_VERSION holds "4.5.0".
ARG NODE_VERSION="latest"
FROM node:${NODE_VERSION}-alpine
ARG NPM_VERSION="latest"
RUN npm install -g npm@${NPM_VERSION}
...
Please, if you don't see the necessity to change that behaviour, then at least document it so that people can stumble upon this.
https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
https://docs.docker.com/engine/reference/builder/#scope
If this is a common pattern a PR would probably be accepted that detects this case (at least for variable substitution) and shows a warning about possible misuse.
As far as this keyword behaves with multiple FROM statements, in "multi-stage" builds, ARG lets you specify different defaults for different stages, but there is no way (nor should there be) to pass different values explicitly to different stages. That's far more convoluted than having ARGs go into effect from the keyword down, across any number of stages/FROMs.
If you want to use the same ARG before and after FROM, simply re-declare it after, e.g.:
ARG my_arg
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"
# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"
simply re-declare it
This is an oversimplification. You are not considering default values and the programming rule of one single source of truth.
ARG my_arg="default"
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"
# Re-declare
ARG my_arg="default"
# This should not be empty
RUN echo "my_arg is $my_arg"
We now have the arg's default value defined twice in one file - we have lost the single source of truth.
This is an oversimplification. You are not considering default values
The example given actually takes care of default values;
docker build --no-cache -<<'EOF'
ARG my_arg=latest
FROM busybox:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"
# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"
EOF
Sending build context to Docker daemon 2.048kB
Step 1/5 : ARG my_arg=latest
Step 2/5 : FROM busybox:$my_arg
---> 59788edf1f3e
Step 3/5 : RUN echo "my_arg is $my_arg"
---> Running in 029ff9c3cdc8
my_arg is
Removing intermediate container 029ff9c3cdc8
---> f9135f511c84
Step 4/5 : ARG my_arg
---> Running in 7c9616537324
Removing intermediate container 7c9616537324
---> 35ccdf7ea0a9
Step 5/5 : RUN echo "my_arg is $my_arg"
---> Running in 1e712eef0399
my_arg is latest
Removing intermediate container 1e712eef0399
---> 56c25e303cb9
Successfully built 56c25e303cb9
I also posted some examples in https://github.com/moby/moby/issues/37622#issuecomment-412101935, https://github.com/moby/moby/issues/37345#issuecomment-400245466
I lost a couple of hours to this. Intuitively, I was expecting that an ARG before FROM in a multi-stage build would be a global ARG (for all stages). It simply gets cleared instead.
This is a horrible way to deal with something that seems to be a global value.
I have a dockerfile with multiple FROM statements and things are breaking because I can't pass the arg values as I originally thought. Sure, maybe I should read the documentation a bit more but it seems I am not alone in expecting this behaviour (ARG being global) so maybe things should work as the MAJORITY think it should?
I have a reverse twist on this. I remembered from the docs that ARG had to appear before FROM in order to be used in FROM, so I put an ARG before the FROM of my second builder declaration. And got an invalid-format error on the FROM line, because that ARG appeared after the first FROM in the file, and so was ignored when processing the second FROM line. So ARG-before-the-first-FROM is global for all FROM lines and not used in any other lines, while ARG-after-FROM is used only between that FROM and the next FROM. It is consistent in a way, but completely non-intuitive, so really the ARG-before-FROM ought to be named FROMARG as suggested earlier in this thread, because otherwise it just breaks expectations left and right.
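The rule described in this comment can be sketched as follows. The arg name `base_tag` and the stage name `builder` are assumptions chosen for illustration only.

```dockerfile
# Hypothetical sketch; the arg name and stage name are assumptions.
# Declared before the first FROM, so every FROM line below can use it.
ARG base_tag=3.5

FROM alpine:${base_tag} AS builder
RUN echo "built" > /out.txt

# An ARG placed here, after the first FROM, belongs to the builder stage
# and would not be visible to the FROM line below.
FROM alpine:${base_tag}
COPY --from=builder /out.txt /out.txt
```

Any arg that a later FROM line needs therefore has to be declared at the very top of the file, not just above the FROM that uses it.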
Docker version 19.03.6, build 369ce74a3c
Linux 5.3.0-46-generic #38~18.04.1-Ubuntu
ARG VERSION="kinetic"
FROM ros:${VERSION}-ros-base
RUN apt-get update && apt-get install -y \
ros-${VERSION}-ros-tutorials \
ros-${VERSION}-common-tutorials \
&& rm -rf /var/lib/apt/lists/
results in:
E: Unable to locate package ros--ros-tutorials
E: Unable to locate package ros--common-tutorials
Why isn't this global? What is the fix? It seems to me from reading this and other threads that a global arg is the desired and expected...
@darrahts see https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
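Following the linked documentation, one likely fix for the Dockerfile above is to re-declare the arg inside the stage. This is a sketch that assumes nothing else in the build depends on VERSION:

```dockerfile
ARG VERSION="kinetic"
FROM ros:${VERSION}-ros-base
# Re-declare without a value so the stage picks up the default from above.
ARG VERSION
RUN apt-get update && apt-get install -y \
    ros-${VERSION}-ros-tutorials \
    ros-${VERSION}-common-tutorials \
    && rm -rf /var/lib/apt/lists/
```

The default stays defined in one place, and `--build-arg VERSION=...` should then apply to both the FROM line and the package names.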