When writing Dockerfiles, layering is often important, and it's best practice to group shell commands in logical units (e.g., combine apt-get update and apt-get install in a single RUN).
Unfortunately, the Dockerfile syntax can become complicated if many instructions have to be combined in a RUN instruction.
Basically, any script or set of commands has to be rewritten in most cases before it can run in a Dockerfile.
Having support for multi-line RUN instructions has been discussed in the past (for example, https://github.com/moby/moby/issues/1799, https://github.com/moby/moby/issues/1554, and https://github.com/moby/moby/issues/16058#issuecomment-138011204, possibly others), and was partly addressed by adding support for line-continuation symbols (\), later enhanced with the escape= directive to assist in writing Dockerfiles targeting Windows.
Further changes were put on hold, pending a major refactor of the builder; now that the Dockerfile syntax is no longer frozen, and there's a clearer roadmap for the builder, I'm opening this proposal to start the discussion again :)
My proposal is to add support for heredoc-style notation in the Dockerfile, similar to what's implemented in @jlhawn's Dockramp (https://github.com/jlhawn/dockramp#tokens). This notation makes multi-line commands (RUN, possibly extending to other Dockerfile instructions as well) easier to write and, even though heredoc is not a known concept on Windows, benefits Windows Dockerfiles as well.
The full definition of here documents can be found here, but I'll provide some examples below.
The basic notation is;
RUN <<[-]word
(run instructions)
word
Where
- << marks the start of the here document
- -, if set, strips leading tabs from the here document
- word can be any word, and is used as delimiter
- if word is quoted (' or "), no (variable) expansion is performed inside the here document.
To see this in action, create a shell-script containing the following;
#! /bin/bash
cat <<EOF
# example 1
echo $PWD
echo \$PWD
echo `pwd`
EOF
cat <<'EOF'
# example 2
echo $PWD
echo \$PWD
echo `pwd`
EOF
cat <<"EOF"
# example 3
echo $PWD
echo \$PWD
echo `pwd`
EOF
cat <<-EOF
# example 4
echo $PWD
echo \$PWD
echo `pwd`
EOF
cat <<-'EOF'
# example 5
echo $PWD
echo \$PWD
echo `pwd`
EOF
cat <<-"EOF"
# example 6
echo $PWD
echo \$PWD
echo `pwd`
EOF
Which produces something like:
# example 1
echo /Users/sebastiaan/projects/docker-proposals/heredoc
echo $PWD
echo /Users/sebastiaan/projects/docker-proposals/heredoc
# example 2
echo $PWD
echo \$PWD
echo `pwd`
# example 3
echo $PWD
echo \$PWD
echo `pwd`
# example 4
echo /Users/sebastiaan/projects/docker-proposals/heredoc
echo $PWD
echo /Users/sebastiaan/projects/docker-proposals/heredoc
# example 5
echo $PWD
echo \$PWD
echo `pwd`
# example 6
echo $PWD
echo \$PWD
echo `pwd`
The heredoc notation in the Dockerfile should largely follow the behavior as described above;
If word is not quoted, environment variables that are known in the builder's context are expanded (as is done today when using the shell syntax);
ENV FOO=hello
RUN <<EOL
echo $FOO
EOL
Is expanded _by the Dockerfile parser_ to;
ENV FOO=hello
RUN <<EOL
echo hello
EOL
Or (in the image's configuration);
/bin/sh -c ' echo hello\n'
or in JSON format:
RUN ["/bin/sh", "-c", " echo hello\n"]
If word _is_ quoted, no expansion takes place, other than expansion by the shell, when executing the command:
ENV FOO=hello
RUN <<'EOL'
echo $FOO
EOL
Produces
/bin/sh -c ' echo \$FOO\n'
or in JSON format:
RUN ["/bin/sh", "-c", " echo $FOO\n"]
The quoted syntax can be useful for Windows Dockerfiles as well, think of:
ENV FOO=hello
RUN <<'EOL'
dir C:\some\directory
EOL
When using the <<- syntax, all leading tabs are removed.
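To make the tab stripping visible, here is a minimal sketch using a plain POSIX shell (not the builder, whose behavior is still only proposed). The script is written with printf so the tab characters are explicit; the variable names are just for illustration.

```shell
# Write the script with printf so the tab characters (\t) are explicit.
demo_script=$(mktemp)
printf 'cat <<-EOF\n\tindented with a tab\n    indented with spaces\n\tEOF\n' > "$demo_script"

# Run it: leading tabs are stripped, including the one before the closing EOF.
# Leading spaces are left untouched.
stripped=$(sh "$demo_script")
rm -f "$demo_script"
printf '%s\n' "$stripped"
```

Only the tab-indented line loses its indentation; the space-indented line keeps its four spaces.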
Note that, due to the way the builder works;
- `pwd` or $(pwd) are not expanded by the builder (but will be executed by the shell)
escape directive
My original intent was to have here-documents _ignore_ the escape-directive, basically, pass anything inside the here-document as-is to the shell (which could be bash, CMD.exe or PowerShell).
While this would solve many use-cases, there are some caveats;
If word is not quoted, _all_ environment variables would be expanded; there is no way to have _some_ environment variables expanded, and others unexpanded. For example:
ENV FOO=hello
ENV BAR=baz
RUN <<-EOL
echo $FOO;
echo \$BAR
EOL
Would result in;
/bin/sh -c 'echo hello; echo \baz\n'
We also need to take into account possible expansion of this syntax to Dockerfile instructions, other than just RUN (see below).
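For comparison with the first caveat, this is how a POSIX shell itself treats an unquoted here document: a backslash before $ keeps a single variable unexpanded while the others are expanded. This is only a sketch of shell behavior; whether the builder could or should mirror it while ignoring the escape directive is exactly the open question.

```shell
FOO=hello
BAR=baz

# Unquoted delimiter: $FOO is expanded, \$BAR is kept as a literal $BAR.
mixed=$(cat <<EOF
echo $FOO
echo \$BAR
EOF
)
printf '%s\n' "$mixed"
```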
Although we could start with just supporting this syntax for RUN, the here-doc syntax could also be implemented for other Dockerfile instructions. Here are some examples that came up in a discussion I had with @tonistiigi;
COPY <<EOF /dest
this is contents
EOF
ARG myscript=<<EOF
stuff
EOF
RUN $myscript
COPY $myscript /
Finally this example came up as well;
RUN <<EOF | sh
echo aa
EOF
ping @tonistiigi @tianon @jlhawn @simonferquel @duglin PTAL
I like the idea of being able to group a set of RUN commands together w/o needing to && things together. However, I would prefer if under the covers the builder just looked at it as a series of RUN commands wrapped by a single commit. Meaning, all of the same env var processing happens - nothing special. The only tricky part might be the cache processing - we may need to consider the entire group of RUN commands as a single entity for this purpose, but that's a detail to be worked out later.
If we really want to make this more generic we could look at grouping more than just RUN cmds, I know that would make a ton of people happy ;-)
But +1 to the concept
👍 For the heredoc in Dockerfile. That will address many issues we encountered before. Grouping more than just RUN cmds would be even better.
I love the idea of "heredoc RUN" -- this would allow for actual newlines in a single RUN line, which are currently impossible. :smile: :+1:
One thing I'd note is that the <<-'EOF' style _only_ removes leading tab characters, not spaces (as an intentional feature), which is often useful for usage text; for example:
usage() {
self="$(basename "$0")"
cat <<-EOF
usage: $self arg arg
ie: $self abc xyz foo bar
EOF
}
usage
Whose output is:
usage: script arg arg
ie: script abc xyz foo bar
Doing things like this would be amazing:
RUN <<-'EOF'
set -ex
foo
bar
baz
EOF
Which is way easier to both read and write properly than:
RUN set -ex; \
foo; \
bar; \
baz
(which is a lot more error prone)
cc @yosifkit (relevant to your interests too)
I am a bit confused by these two examples:
# 1 (context "ARG" assuming it is related to the COPY)
ARG myscript=<<EOF
stuff
EOF
# is this going to look for the file "stuff\n" in the context? I believe that would be the current behavior
COPY $myscript /
# I don't think this should have a special case if a variable ($myscript) was defined as a heredoc since there is no correct way to tell if --build-arg would need the same special treatment
# 2, is this to change the shell that runs the script?
RUN <<EOF | sh
echo aa
EOF
# Is there a need for this special syntax when you have shebang and the `SHELL` command
# the parser would have to check the first line for it so that it can exec the right thing
RUN <<EOF
#!/bin/sh
echo aa
EOF
This one will need extra parameters to control file permissions (ownership would be nice too) so that it can be possible to embed executable scripts or files with reduced permissions. What would be the default permissions and ownership of the created file?
COPY <<EOF /dest
this is contents
EOF
This was literally the first thing I wished for when I started writing docker files.
I think it makes sense to consider this a simple syntactic sugar for arguments provided to commands (as the original proposal AIUI has it) and _not_ to have RUN special case any \n found in the argument as suggested in https://github.com/moby/moby/issues/34423#issuecomment-320668076. So the argument to RUN is always just a string (which may now include a literal \n using this new syntax).
Given that, it easily extends to the argument given to most commands, since the expansion happens before those commands "see" it.
I think the COPY $myscript / example is indeed a bit odd since it does reference a file stuff\n in the context, which is liable to be a bit of an unusual occurrence and not terribly useful in practice. I'd argue that it is better to have all commands accept the heredoc syntactic sugar than a subset though.
I notice that, compared to the shell, Perl also takes <<\FOO as the same as <<'FOO' WRT quote expansion. It also allows you to stack them:
print(<<EOA, <<EOB);
This is string A
EOA
This is string B
EOB
Not sure if there are any places in the Dockerfile syntax where that might be useful. http://perldoc.perl.org/perlop.html#%3C%3C_EOF_ is the reference if anyone is interested.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04 is the POSIXy shell definition although on the face of it it looks pretty similar (if not identical) to the bash version linked in the proposal.
Some background on the ARG/COPY examples. I don't think they are 100% critical if there is confusion.
The idea was that heredoc in this command would associate the value with a heredoc(file) type. So
ARG myscript=<<EOF
stuff
EOF
is not equal to
ARG myscript="stuff\n"
The first one can be used as an argument to RUN and COPY. Second can be used for variable replacements. This can be expanded in the future to also include an argument that can keep a reference to a build stage(source in buildkit). --build-arg would work the same with all arg types, in the case of the first one replacing the heredoc body.
COPY $myscript /foo would behave the same as COPY <<EOT /foo, creating a file foo with the expected contents. --chown is being added in another PR but I'm not sure we should worry about this or mode much as if the user wants to control this there are always better ways. Inlining is a simple and readable way for the default case. For consistency it should probably be same as RUN echo aa > /foo.
# 2, is this to change the shell that runs the script?
RUN <<EOF | sh
echo aa
EOF
The shebang would work as well. The benefit of | is that it can be used for more complex io redirects. For example RUN echo foo | sed s/f/b/ > file works fine atm.
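As a rough illustration of the redirect point, this is what the same shape looks like in a plain shell today: the here-document body is fed through a pipe just like the output of echo would be. It says nothing about how the builder would parse RUN <<EOF | sh; the variable name is only for the example.

```shell
# The here-document body is piped into sed, like `echo foo | sed s/f/b/`.
piped=$(cat <<EOF | sed s/f/b/
foo
EOF
)
printf '%s\n' "$piped"
```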
I'm a bit confused by this thread. Can someone articulate the exact problem we're trying to solve? At first I thought it was about people having to use && in their RUN but now I'm beginning to wonder if it's closer to people asking for a START and END transaction kind of thing where it's all merged into one commit, or if people are just looking for a way to insert \n into the strings. Each has a very different possible solution.
The idea was that heredoc in this command would associate the value with a heredoc(file) type.
@tonistiigi I think I was confused, I was mostly familiar with Perl's usage of << which it turns out is a bit different to a shell. This is not the behaviour I was expecting from the shell (based on my incorrect mapping from the Perl behaviour):
$ cp <<EOF foo
This is a heredoc
EOF
cp: missing destination file operand after 'foo'
Try 'cp --help' for more information.
But reading the spec this is correct/expected because what it actually does is pipe This is a heredoc\n into the stdin of the cp process (which doesn't care about its stdin of course).
If it had been the Perl-ish way then this would instead have been the same as cp "This is a heredoc\n" foo and would have looked for a file named "This is a heredoc\n".
I might need to have a harder think but it seems that the stdin for a command in a Dockerfile is not really a concept which exists, so the proposal is not so much for shell like handling, but I think it is also not that similar to the Perl variant. I'm not quite sure yet if I think the proposal here is somewhere in the middle or if it is either one or the other depending on the specific command being used.
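A small sketch of the shell semantics described above, for anyone who wants to check it: the body arrives on stdin, and a command that only looks at its arguments never sees it. This is plain shell behavior, not a statement about what the Dockerfile syntax would do.

```shell
# A command that reads stdin receives the here-document body.
from_stdin=$(cat <<EOF
This is a heredoc
EOF
)

# A command that only inspects its arguments gets none: the body is not an argument.
arg_count=$(sh -c 'echo $#' sh <<EOF
This is a heredoc
EOF
)

printf 'stdin: %s\nargs: %s\n' "$from_stdin" "$arg_count"
```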
@duglin IMHO it is mostly about inserting \n into strings, which in turn can enable writing RUN commands which can then be written without using && \ or ; \ (because you can write a more natural free form script using set -e and you don't have to worry about backticks and line continuations in the same way you do today, so it all becomes easier to read). I don't think anyone else has been talking about START/END transactions here other than yourself, I do not believe that is what this issue is about at all.
The reason I jumped to the start/end thing is because I don't think multiple RUNs are just a matter of inserting \n as much as it's providing a list of RUNs to be executed within the scope of a single commit. While I'm sure there are use cases for inserting a script on a RUN in place of just a simple cmd line, based on what I've heard in the past I think the more popular request is to just be able to do something like this:
RUN cmd1
RUN cmd2
RUN cmd3
END
or
RUN --nocommit cmd1
RUN --nocommit cmd2
RUN cmd3
and not get 3 different layers/commits. And the way people do this today is via &&. See the first sentence of the first comment of this issue.
So, if that's the driving use case then we should focus on that and less on inserting '\n' - hence my question about the true problem being asked to be solved. I know that at the end the opening comment got into "other Dockerfile commands", but I wonder whether that's something that's been asked for by the community nearly as much as simply being able to specify multiple RUNs within a single commit - because generic heredoc support starts to get into an area where Dockerfiles are not just a list of commands, but takes a step towards becoming a full scripting language. And while that might be a really cool thing to do, I think that should be solved in a holistic way and not piecemeal to ensure we have a consistent solution.
I may have a use case to consider from one of my Dockerfiles:
RUN printf "\
server {\n\
listen 80;\n\
server_name _;\n\
\n\
location = ${PUBLIC_URL} {\n\
rewrite ^.*$ ${PUBLIC_URL}/ permanent;\n\
}\n\
root /usr/share/nginx/html;\n\
location ${PUBLIC_URL} {\n\
alias /usr/share/nginx/html;\n\
try_files \$uri /index.html;\n\
}\n\
}" > /etc/nginx/conf.d/default.conf
Writing this in place would be probably more readable & elegant:
ARG NGINX_CONFIG=<<EOF
server {
listen 80;
server_name _;
location = ${PUBLIC_URL} {
rewrite ^.*$ ${PUBLIC_URL}/ permanent;
}
root /usr/share/nginx/html;
location ${PUBLIC_URL} {
alias /usr/share/nginx/html;
try_files \$uri /index.html;
}
}
EOF
RUN printf "$NGINX_CONFIG" > /etc/nginx/conf.d/default.conf
I'm totally here for the ability to embed \n directly. Here's another similar example from the openjdk official image:
# add a simple script that can auto-detect the appropriate JAVA_HOME value
# based on whether the JDK or only the JRE is installed
RUN { \
echo '#!/bin/sh'; \
echo 'set -e'; \
echo; \
echo 'dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"'; \
} > /usr/local/bin/docker-java-home \
&& chmod +x /usr/local/bin/docker-java-home
vs
# add a simple script that can auto-detect the appropriate JAVA_HOME value
# based on whether the JDK or only the JRE is installed
RUN <<-'EOR'
set -ex
cat > /usr/local/bin/docker-java-home <<-'EOF'
#!/bin/sh
set -e
dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"
EOF
chmod +x /usr/local/bin/docker-java-home
EOR
This is an excellent idea, it looks like this would require a major change on how we process Dockerfiles though. At the moment we parse the Dockerfile using an AST but we do so by processing line-by-line with lots of loops and hacks.
@thaJeztah Is there a clear grammar of the Dockerfile documented somewhere? Is it planned to refactor the parser now that it's not frozen anymore?
As far as I can tell Dockerfiles are usually small, so I guess they are loaded into memory and not processed as a stream. It is not that hard to write a preprocessor which replaces the \n with \nRUN and removes the heredoc. You don't even need to touch the code of the current parser. I don't understand why this is claimed to be such a big deal. I don't support the idea of calling these multiline RUNs in some kind of transaction; using them as syntactic sugar is much easier.
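A minimal sketch of such a preprocessor, assuming only the simplest form (a line that is exactly RUN <<WORD with an unquoted word, closed by WORD on its own line). The function name expand_heredoc_runs is made up for this example; quoting, <<-, and other instructions are not handled.

```shell
# Hypothetical helper: rewrite each line of a "RUN <<WORD" block as its own RUN.
expand_heredoc_runs() {
  awk '
    # Opening line such as "RUN <<EOF": remember the delimiter, emit nothing.
    !in_doc && /^RUN <<[A-Za-z_]+$/ { delim = substr($0, 7); in_doc = 1; next }
    # Closing delimiter: leave here-document mode.
    in_doc && $0 == delim { in_doc = 0; next }
    # Body line: turn it into a separate RUN instruction.
    in_doc { print "RUN " $0; next }
    # Everything else passes through unchanged.
    { print }
  '
}

expanded=$(printf 'FROM busybox\nRUN <<EOF\necho one\necho two\nEOF\nCMD ["sh"]\n' | expand_heredoc_runs)
printf '%s\n' "$expanded"
```

Note that this kind of rewrite gives every body line its own RUN, so shell state (variables, cd, loops spanning lines) is not shared between them and each line still produces its own layer.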
Wow, I am amazed this is still not implemented.. Why?
One Dockerfile to rule them all!
As this gives a better, more direct view of what is going on, right?
Please someone who has the power be our 'hero'-doc implementer!
+1
when is it coming?
I support @duglin's probing relating to the wider community's motivation for wanting this kind of thing. IMO these issue discussions will remain confusing unless it's spelled out in big letters that there will be no implicit insertion of &&s at the shell-language level.
I just learned docker a few weeks ago for the purposes of a C++ build environment for a CI pipeline. The reason I found this issue and vote for it is ease of readability of Dockerfiles. The && \ is very boilerplate syntax for me. I'd like it to be unnecessary. I am seeing remarks about how Dockerfiles are becoming a whole scripting language, but it's already a DSL, and I don't think it needs to measure up against Python or anything. My gut feel reading comments from @duglin is that maybe this is being overthought. Why not address the immediate concerns with a well-thought solution to the specific problem? I'd rather a potentially-deprecated feature be added for a specific problem than for that one problem to be ultimately never addressed because the developers are waiting on a big-bang level change to the whole domain-specific language.
Basically, I am doing something like this right now:
RUN true \
&& boost_version=1.68.0 \
&& boost_dir=boost_1_68_0 \
&& wget -nv https://dl.bintray.com/boostorg/release/${boost_version}/source/${boost_dir}.tar.gz \
&& tar -xzf ${boost_dir}.tar.gz \
&& cd ${boost_dir} \
&& ./bootstrap.sh \
&& ./b2 -j$NUM_PARALLEL \
--without-python \
toolset=clang-6 \
link=shared \
runtime-link=shared \
cxxflags="-stdlib=libc++" \
linkflags="-stdlib=libc++" \
install \
&& cd .. && rm -rf *
I'd rather see a more flexible multi-line syntax for RUN. I expect as a default (majority use case) that the end result of a RUN is atomic; meaning && is implied. And escaping newline should be unnecessary. So something like this:
RUN {
boost_version=1.68.0
boost_dir=boost_1_68_0
wget -nv https://dl.bintray.com/boostorg/release/${boost_version}/source/${boost_dir}.tar.gz
tar -xzf ${boost_dir}.tar.gz
cd ${boost_dir}
./bootstrap.sh
./b2 -j$NUM_PARALLEL \
--without-python \
toolset=clang-6 \
link=shared \
runtime-link=shared \
cxxflags="-stdlib=libc++" \
linkflags="-stdlib=libc++" \
install
cd .. && rm -rf *
}
To me this is much more straightforward.
Remember that I started off my comment with my familiarity with Docker and how I use it for a very specific purpose. I don't have knowledge of specific edge cases, or other contexts. But based on how I use Dockerfiles and how many samples online I see doing things (which were learning material for me), I think there's some reasonable and common patterns you can incorporate support for in your DSL.
I hope my feedback is helpful. Looking forward to some compromise here without waiting on a kitchen sink implementation.
For me it does not matter, all of these ideas are better than && \ everywhere in my Dockerfiles.
Even a:
echo "do stuff"
rm /build/*
echo "do more"
RUNEND
would do, as long as it makes multi-line RUN easier!
These days I tend to use multi-stage builds and put commands in separate RUN statements, which makes use of the build cache.
@glensc that's great, but it just would not fulfill a:
RUNBEGIN # or RUN {
cat > /usr/local/bin/docker-java-home <<-'EOF'
#!/bin/sh
set -e
dirname "$(dirname "$(readlink -f "$(which javac || which java)")")"
RUNEND # or }
While this is wonderful for situations where you like to overview the whole process at once.
I am new to docker and the dockerfile syntax, so my perspective may be one that is different from the majority of voices heard here. Though valuable, I hope. I read the best practices for dockerfiles, I am proficient at shell scripting, and I read the discussion here.
I also wanted to see something like this (or like START / COMMIT) to help reduce the number of storage layers without having to choke down the horrid syntax that is line continuations and &&'s everywhere. (The #comments issues make things worse, and now I'm afraid to put comments anywhere within a multiline RUN, because I'm uncertain what it will do).
Rather than changing RUN (which seems like it comes with some challenges), I propose a different syntax entirely. SHELL [OPTIONS] / ENDSHELL. Example:
# With standard RUN
RUN date > /tmp/timestamp && echo foo
# With SHELL.
SHELL
set -e
date > /tmp/timestamp
echo foo
ENDSHELL
# With SHELL and comments
SHELL
set -e
# put the current date into /tmp/timestamp
date > /tmp/timestamp
echo foo
ENDSHELL
# with SHELL, specific shell specified
SHELL /bin/bash
set -e
alias ts='date > /tmp/timestamp'
ts
echo foo
ENDSHELL
# with SHELL, asking for build context env to be imposed
ENV FOO=fxx
SHELL --env=yes /bin/bash
set -e
echo ${FOO//x/o}
ENDSHELL
Note the following considerations:
- […] RUN command
- […] SHELL and ENDSHELL line is given _verbatim_ as a script to the relevant shell.
- […] set -e)
- […] --env=yes causes them to be added to the environment in which the shell executes. (e.g. execve to modify the environment)
- […] "END""SHELL" evaluates to the literal you wanted.
- […] ENDSHELL is seen.
Anyway. I hope this perspective is useful.
@DerrickRice I like your proposal. I have only one question:
Why the --env=yes?
Why not just include it by default?
If the environment variables are in the way for the script, the script could just be run before the environment variables have been set.
not sure if this discussion is still active, a small use-case in favor of some form of this syntax:
Our applications get their default configurations from a module, but configurations can be overridden by environment variables. Variables would be overridden via an ini or cfg file. We call our docker build steps with GNU make.
It would be very nice to have something along the lines of
ARG bar="baz"
ENV BAR=${bar}
RUN cat <<-EOF > setup.ini
FOO=$BAR
EOF
so that we are able to override these environment variables with docker build arguments.
This helped me (https://github.com/jen-soft/pydocker)
[ Dockerfile.py ]
from pydocker import DockerFile # sudo pip install -U pydocker
d = DockerFile(base_img='debian:8.2', name='jen-soft/custom-debian:8.2')
d.RUN_bash_script('/opt/set_repo.sh', r'''
cat >/etc/apt/sources.list <<EOL
deb http://security.debian.org/ jessie/updates main
deb-src http://security.debian.org/ jessie/updates main
EOL
apt-get clean && apt-get update
''')
d.EXPOSE = 80
d.WORKDIR = '/opt'
d.CMD = ["python", "--version"]
# d.generate_files()
d.build_img()
# sudo wget -qO- https://get.docker.com/ | sh
python Dockerfile.py
docker images
Happy birthday :birthday:!!!
This proposal is 2 years old today and first related issue will soon be 6 years old (see #1554)... :smile:
@ptdel You can simply use one-liner like
RUN echo "FOO=$BAR" > setup.ini
Looks better IMO.
+1 would use multiline, devops engineers must reduce cognitive burden wherever possible and we don't need a bunch of extra \ everywhere
I missed an overview with examples, so I have tried to visualize the proposals below.
Read the proposals above too, as my examples are not complete.
This is how it is now:
# Install dependencies
RUN apt-get update -qq && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https apt-utils && \
wget -O- -q https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
wget -q https://packages.microsoft.com/config/debian/9/prod.list -O /etc/apt/sources.list.d/mssql-release.list && \
apt-get update -qq && \
DEBIAN_FRONTEND=noninteractive ACCEPT_EULA=Y apt-get install -q -y --no-install-recommends msodbcsql17 unixodbc-dev && \
apt-get purge -q -y apt-transport-https apt-utils && \
apt-get autoremove -q -y && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /etc/apt/sources.list.d
# Create directory for DB-files
RUN mkdir db
# Install Python dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
Then there is the original proposal, a single multi-line statement:
RUN <<-'EOR'
# Update package lists
apt-get update -qq
# Install
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https apt-utils
wget -O- -q https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
wget -q https://packages.microsoft.com/config/debian/9/prod.list -O /etc/apt/sources.list.d/mssql-release.list
apt-get update -qq
DEBIAN_FRONTEND=noninteractive ACCEPT_EULA=Y apt-get install -q -y --no-install-recommends msodbcsql17 unixodbc-dev
# Cleanup
apt-get purge -q -y apt-transport-https apt-utils
apt-get autoremove -q -y
rm -rf /var/lib/apt/lists/*
rm -rf /etc/apt/sources.list.d
EOR
# Create directory for DB-files
RUN mkdir db
# Install Python dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
Another alternative is the proposal from @DerrickRice:
...
# Install dependencies
SHELL /bin/bash
# Update package lists
apt-get update -qq
# Install
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https apt-utils
wget -O- -q https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
wget -q https://packages.microsoft.com/config/debian/9/prod.list -O /etc/apt/sources.list.d/mssql-release.list
apt-get update -qq
DEBIAN_FRONTEND=noninteractive ACCEPT_EULA=Y apt-get install -q -y --no-install-recommends msodbcsql17 unixodbc-dev
# Cleanup
apt-get purge -q -y apt-transport-https apt-utils
apt-get autoremove -q -y
rm -rf /var/lib/apt/lists/*
rm -rf /etc/apt/sources.list.d
ENDSHELL
# Create directory for DB-files
RUN mkdir db
# Install Python dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
Simple grouping is quite powerful. You have to include RUN with each line, but you can also include other statements and reduce the layering further.
...
STARTGROUP
## Install dependencies
# Update package lists
RUN apt-get update -qq
# Install
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends apt-transport-https apt-utils
RUN wget -O- -q https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
RUN wget -q https://packages.microsoft.com/config/debian/9/prod.list -O /etc/apt/sources.list.d/mssql-release.list
RUN apt-get update -qq
RUN DEBIAN_FRONTEND=noninteractive ACCEPT_EULA=Y apt-get install -q -y --no-install-recommends msodbcsql17 unixodbc-dev
# Cleanup
RUN apt-get purge -q -y apt-transport-https apt-utils
RUN apt-get autoremove -q -y
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /etc/apt/sources.list.d
ENDGROUP
## Create directory for DB-files
RUN mkdir db
STARTGROUP
## Install Python dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
ENDGROUP
...
IMO, the STARTGROUP/ENDGROUP version has the most potential. Here are my thoughts on why I would prefer STARTGROUP/ENDGROUP to HEREDOCs or ENDSHELL.
1) Currently, the Dockerfile syntax looks like a set of lines, where each line starts with a keyword, describing the type of instruction. The only current case (afaik), that breaks this convention is the line continuation feature, which requires an explicit backslash at the end of the line.
Both the HEREDOC and the ENDSHELL variants of the proposal break this convention. I know, that this might seem like a nitpick, but having a uniform syntax is very important in my opinion.
Changing such an invariant might also require non-trivial rewrites of tooling (linters/syntax highlighting).
RUN <<EOF
# blah blah
ENV python ... # a shell command, that looks like a docker instruction
EOF
Also, SHELL is already a command with a slightly different meaning, so I'd recommend using a different keyword to avoid ambiguities.
If I understand the STARTGROUP proposal correctly, then the indentation of the RUN commands is purely cosmetic (current Dockerfile syntax already supports arbitrary leading whitespace), which preserves the "one line - one keyword" convention.
2) I feel, like the HEREDOC and ENDSHELL proposals are trying to solve the symptoms of the problem, not the problem itself. As I see it, the root problem is the following:
A Dockerfile is often split into several logical stages (install requirements from apt, build my application, configure some service). Most of the times, it is desirable for a single logical stage to be treated by docker as a single operation.
However, if this logical stage consists of multiple docker instructions, they are treated separately. This is undesirable because we don't want to keep the intermediate state in the final image and because in most cases, an invalidated build cache should trigger a rebuild for the whole logical stage.
In some cases, a single logical stage just so happens to consist of only RUN commands. In these cases, you can use the && hack to mash the RUN commands together. This is currently considered a 'best practice'.
The above description leaves us with 2 problems:
- If a single logical stage that consists of multiple docker instructions has commands other than RUN, then you can't do anything about that.
- The syntax used in the current 'recommended' workaround hack is somewhat error-prone and looks really ugly.
`HEREDOC` and `ENDSHELL` only solve the ugly syntax problem, without fixing the root cause.
P.S. I guess, these proposals aren't mutually exclusive and you could argue that the ugly syntax is something, that should be fixed regardless of if the STARTGROUP proposal gets implemented.
"ugly syntax" is really what I would like the heredoc for. Including newlines in the string that gets sent to the shell is genuinely useful.
A series of RUN lines cannot accomplish everything that a single heredoc RUN can. Two examples are flow control (like while loops and if/else conditionals) and that the output of a "line" is used as a variable that is then used by later lines. These are not easily possible across RUN lines (and not something I would advocate to be added in the Dockerfile syntax*).
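To illustrate with plain shell, here is a sketch where separate sh -c invocations stand in for separate RUN lines and one multi-line script stands in for a heredoc RUN. The variable and library names are made up; nothing here calls the builder.

```shell
# Two separate shells, standing in for two RUN lines: the assignment is lost.
sh -c 'demo_version=1.68.0'
separate=$(sh -c 'echo "${demo_version:-unset}"')

# One shell fed a multi-line script, standing in for a single heredoc RUN:
# the variable and the loop both work.
single=$(sh -c 'set -e
demo_version=1.68.0
for lib in system thread; do
  echo "boost_$lib $demo_version"
done')

printf '%s\n%s\n' "$separate" "$single"
```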
Heredoc: a way to have the Dockerfile parser slurp up lines until a predetermined marker to use for a single Dockerfile command
STARTGROUP: a way to group Dockerfile commands into a single commit layer
While STARTGROUP/ENDGROUP can accomplish the reduction of layers that a heredoc makes easy, it is only tangentially related and I think the discussion for such is a separate issue.
* since I don't think that the goal of the Dockerfile should be to become a full programming language
@yosifkit after thinking a bit about what you said, I agree
it is only tangentially related and I think the discussion for such is a separate issue.
It seems, like these 2 proposals are indeed similar, but not completely the same and although there is an overlap between the 2, there are use cases, which can only be satisfied by either one of them.
Unfortunately, there have been a few proposals for the merge-command-group-into-single-layer feature, which were closed as duplicates of this issue (or older issues suggesting heredocs). As a result, people like me, who wanted the merge-command-group-into-single-layer feature ended up in this discussion.
@thaJeztah what do you think about reopening one of the older merge-command-group-into-single-layer proposal issues? (for example #29719) Or if you want, I can open a new one.
P.S. I apologize if my earlier comment came off as dismissive towards the HEREDOC proposal usefulness.
While the STARTGROUP syntax is definitely the most powerful - and, indeed, represents a capability I've long felt to be missing in Docker - it does not actually solve the issue I have: I want to turn this:
RUN \
echo '# Own private key so other Foobar containers can SSH into the SSHD container:' >> '/home/foo/.ssh/authorized_keys' && \
cat '/home/foo/.ssh/id_ed25519.pub' >> '/home/foo/.ssh/authorized_keys' && \
echo '' >> '/home/foo/.ssh/authorized_keys' && \
echo '# Pregenerated staff key so service staff can SSH into the SSHD container:' >> '/home/foo/.ssh/authorized_keys' && \
cat '/etc/ssh/mounted_keys/id_ed25519-login.pub' >> '/home/foo/.ssh/authorized_keys' && \
echo '' >> '/home/foo/.ssh/authorized_keys' && \
echo '# Pregenerated staff key again, this time as a certificate authority,' >> '/home/foo/.ssh/authorized_keys' && \
echo '# so that it can alternatively be used to sign individual private keys rather than being shared by all service staff:' >> '/home/foo/.ssh/authorized_keys' && \
echo "cert-authority $(cat '/etc/ssh/mounted_keys/id_ed25519-login.pub')" >> '/home/foo/.ssh/authorized_keys' && \
true
Into this:
RUN cat <<EOF >> '/home/foo/.ssh/authorized_keys'
# Own private key so other Foobar containers can SSH into the SSHD container:
$(cat '/home/foo/.ssh/id_ed25519.pub')
# Pregenerated staff key so service staff can SSH into the SSHD container:
$(cat '/etc/ssh/mounted_keys/id_ed25519-login.pub')
# Pregenerated staff key again, this time as a certificate authority,
# so that it can alternatively be used to sign individual private keys rather than being shared by all service staff:
cert-authority $(cat '/etc/ssh/mounted_keys/id_ed25519-login.pub')
EOF
I very much like the SHELL/ENDSHELL syntax. My one suggestion for an amendment for it: start the shell in set -e mode by default. In almost any situation where you'd be using it, if any step of the script fails, you want to have the whole build fail. If -e is not on by default, you'll get a lot of people complaining about builds silently ignoring errors. In the rare case where this is not the desired behaviour, they can always use set +e explicitly.
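A quick sketch of the difference, using sh directly (the SHELL/ENDSHELL block itself is only a proposal, so this just shows what the -e default would buy):

```shell
# Default behaviour: the failing step is ignored and the last command decides.
without_e=0
sh -c 'false
echo "still running"' >/dev/null || without_e=$?

# With -e, the first failing step aborts the script with a non-zero status.
with_e=0
sh -e -c 'false
echo "never reached"' >/dev/null || with_e=$?

echo "without -e: $without_e, with -e: $with_e"
```

Without -e the script exits 0 even though a step failed, which is the "silently ignoring errors" case.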
A large part of the discussion here seems to be about whether it would be better to have multi-line commands run in a single shell, or to have multiple Dockerfile statements grouped together in a single layer.
However, each solves a different problem, and so imho it would be preferable to have both.
Multi-line RUN statements would be desirable when you want to keep the state of a single shell invocation. So you'd keep the environment variables, the current working directory, etc. A here-doc syntax would work nicely for this.
Grouping would be used when you want multiple statements, including non-RUN statements, to result in a single layer.
You can't combine both in a single feature, as you can't simultaneously have a single shell invocation while having non-RUN statements in between the shell commands.
One other benefit of grouping, which I have not seen mentioned before, is that it would be possible to add a 'development build mode' (docker build --devel?), where groups would be ignored, resulting in separate layers.
This would be useful when earlier steps take up a lot of time — because they install dependencies, check out sources, etc. — while the later steps are still in flux while you are writing the Dockerfile — e.g. running configure with different parameters.
When you are done writing the Dockerfile, you would run in 'production mode', and a single layer would be created for the grouped statements without any modifications to the Dockerfile.
+1 heredoc would make my job now lot easier..
My proposals:
HEREDOC > path/to/file
things here
blah blah
RUN command...
ANY_WORD > path/to/file
gsadjgskldagj
sdgklsjdagklaj
ANY_WORD
RUN command...