Good day,
I'm exploring how to use pipenv with multi-stage Docker builds. In a nutshell, the idea is to "compile" stuff in a base image and only copy the resulting artifacts to the final image.
With Python it gets tricky, since you need to copy the package dependencies as well.
I've checked out several ideas, and it looks like `pip install --user` together with setting `PYTHONUSERBASE` is the simplest way to install dependencies into a side directory, e.g.:
```dockerfile
FROM alpine AS builder
# Install your gcc, python3-dev, etc. here
RUN apk add --no-cache python3
COPY . /src/
WORKDIR /src
ENV PYROOT /pyroot
RUN PYTHONUSERBASE=$PYROOT pip3 install --user -r requirements.txt
RUN PYTHONUSERBASE=$PYROOT pip3 install --user .

# The final image
FROM alpine
RUN apk add --no-cache python3
ENV PYROOT /pyroot
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
```
The problem is that pipenv disregards `PYTHONUSERBASE`:
```console
$ docker run --rm -ti python:3.6-alpine sh
/ # pip install --upgrade pip; pip install pipenv==2018.10.13  # skipped output
/ # mkdir /tmp/foo; cd /tmp/foo
/tmp/foo # pipenv install requests  # skipped output
/tmp/foo # PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 - 00:00:01
/tmp/foo # ls /pyroot
ls: /pyroot: No such file or directory
```
I found a workaround by using `pipenv lock -r` and then installing from the resulting requirements.txt as in my original idea, but I'm not sure this is the best way to go, particularly if I have custom (private) sources defined in my Pipfile - I don't want to replicate their configuration into pip.
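For reference, the workaround boils down to something like this (a rough sketch; note that any `[[source]]` entries from the Pipfile are lost along the way, which is exactly the problem):

```dockerfile
# Sketch of the lock-then-pip workaround (flag names as of pipenv 2018.x)
RUN pipenv lock -r > requirements.txt
RUN PYTHONUSERBASE=$PYROOT pip3 install --user -r requirements.txt
```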
Any other ideas?
Pipenv unsets user settings because they are incompatible with virtualenv settings. I understand the approach you're taking, but maybe you can say more about what problem you are trying to solve? You want the wheels and sdists or whatever? If so, on Linux they're stored in `~/.cache/pipenv/wheels`
Apologies if the original intent was not clear enough.
All I want is to install my package and its dependencies into a separate directory instead of Python's `site-packages`, and then to copy that directory into the final Docker image.
Since the `--system` flag does not use a virtualenv (it seems), maybe it's worth supporting `PIP*` env vars in that special case?
I'll try to elaborate on the problem: I want to keep my final, "production" Docker image as small as possible, therefore I need to build/install my app's dependencies in an earlier stage.
Having dependencies installed into the general site-packages (either the system's or a dedicated venv's) is problematic, since I can't reliably pick out the actual packages my app depends on later on.
Wheels do not help much either, because: a) I don't know which wheels to pick up from the cache (some may belong to my app's reqs, and some may belong to, e.g., a code-generation tool my app uses during install); and b) I need to copy the wheels to the final Docker image prior to installing them there, meaning they will end up in the resulting Docker image.
@techalchemy My comment appeared before yours for some reason (GitHub still recovering?). Just making sure you got a notification.
Ah okay, you may be able to just use `PIPENV_VENV_IN_PROJECT=1` and copy the local `.venv` directory's site-packages.
While it's nice to know about this option, I don't see how it helps me. Site-packages from a vanilla venv weigh 12MB (because of pip and setuptools).
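(Easy to check for yourself - a sketch; the exact path varies with the Python version:)

```console
$ python3 -m venv /tmp/v
$ du -sh /tmp/v/lib/python3.6/site-packages
```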
What I'm trying to do is to run

```
pipenv install --dev
pipenv install
pipenv install .  # May require stuff installed by --dev
```

and easily collect the results of only the last two lines. So another venv does not help me - I need to "split" the installation paths within the venv.
Do you think a PEEP suggesting that `PIP*` options be obeyed when the `--system` flag is used would fly?
Why not just specify your library dependencies in your package metadata and add uninstall steps for pip/setuptools via `pipenv run pip uninstall setuptools`? In any case we already support any pip environment variable you set - besides the user one. I'm struggling to understand what you are gaining by doing all this extra work.
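For illustration, standard pip variables pass straight through to pip, e.g. (a hypothetical private index URL):

```console
$ PIP_INDEX_URL=https://pypi.example.com/simple pipenv install --system --deploy
```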
OK, I'll try once again to explain. Bear with me please.
Here is what I would have wanted to do (in a Dockerfile):
```dockerfile
# Install all stuff into the system so we can install setup.py for our app later on
RUN pipenv install --deploy --system --dev
# Install our app's LOCKED dependencies aside
RUN PYTHONUSERBASE=/pyroot pipenv install --deploy --system
# Install the app aside as well - all dependencies are already present, so it's just the app
RUN PYTHONUSERBASE=/pyroot pip install --user .
# Take /pyroot to the next docker image stage
```
This looks very clear and "human" (as in pipenv's motto) to me :)
Other solutions that require fiddling with the virtualenv look error-prone IMHO. I've tried the `PIPENV_VENV_IN_PROJECT=1` approach + uninstall, and there are a couple of issues:
**Uninstall does not remove dependencies**
(Skipped pipenv output for brevity)

```console
$ export PIPENV_VENV_IN_PROJECT=1
$ pipenv run pip freeze  # empty new vanilla
$ pipenv install --dev requests
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
requests==2.20.0
urllib3==1.24.1
$ pipenv uninstall --all-dev
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
urllib3==1.24.1
```
I.e. the `requests` package is gone, but its dependencies are left behind.
**The venv bin directory contains other files**
And I need to fish my app's entrypoints out of there.
Again, it's doable (we are in software, after all), but I think it should be easier.
Whatever we conclude in this discussion, I think there should be a PEEP/doc explaining the recommended way to use pipenv with Docker multi-stage builds.
After all, `pipenv install --system --deploy` looks very good and almost nails it.
Here is what works: `PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy`
In action:
```console
/tmp/foo $ grep -A 2 packages Pipfile
[packages]
requests = "*"
[dev-packages]
[requires]
/tmp/foo $ PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 - 00:00:02
/tmp/foo $ ls /pyroot/bin/
chardetect
/tmp/foo $ ls /pyroot/lib/python3.6/site-packages/
chardet  chardet-3.0.4.dist-info  idna  idna-2.7.dist-info  requests  requests-2.20.0.dist-info  urllib3  urllib3-1.24.1.dist-info
```
Hooray!
Thanks for the hint that "in any case we already support any pip environment variable you set - besides the user one" - that's what made me have a look at the other vars I can override...
I'm happy now.
I mean I understand the steps you are attempting. I am trying to understand why there is a strict constraint on not including incidental dependencies and why you have to fish entry points out.
Historically we had documentation on using `--system --deploy`, but that doesn't accomplish the goal of preserving artifacts; you need a shared cache directory for that. Also, we don't really recommend running pipenv as root, even in Docker containers.
> I am trying to understand why there is a strict constraint on not including incidental dependencies
We'd like to have our Docker images as lean as possible. One of the reasons is that we have some systems running over mobile internet lines, so when pushing an upgrade for 20 microservices, those megabytes start to add up. On Alpine, pip + setuptools alone weigh 10MB. Compare that to Alpine itself (5MB) + python3 (40MB), and that's over a 20% increase. Another reason is security - I want to include only what's really used by my app, to minimize the attack surface. And it's not just me freaking out; it seems to be where the whole industry is going. With Go, they compile statically and build a Docker image that does not even have a shell. Distroless is another example.
> and why you have to fish entry points out.
My setup.py installs entrypoints that go to the venv's `bin` dir. Since that dir contains other stuff that does not belong to my app, I now need to explicitly specify which files to collect from there. When a developer adds a new entrypoint, they now need to remember to update the Dockerfile - yet another thing to remember. It also makes it harder to use one generic Dockerfile to build all my Python apps. With the "aside" installation, my Dockerfile instruction can just take all of `/pyroot`.
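That is, the generic instructions can stay as simple as this (a sketch - it's also what the working examples further down this thread converge on):

```dockerfile
# One generic pair of instructions per app - entrypoints included
COPY --from=builder /pyroot/ /pyroot/
ENV PATH="/pyroot/bin:$PATH"
```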
> Historically we had documentation on using --system --deploy but that doesn't accomplish the goal of preserving artifacts, you need a shared cache directory for that.
I'm not sure what you mean. Can you please elaborate? Which artifacts are not preserved exactly?
> Also, we don't really recommend running pipenv as root, even in docker containers.
Not even during the build stage?
Regardless, it works well under a non-root user as well (inside a Docker container):
```console
$ mkdir /pyroot; chown appinstall:appinstall /pyroot
$ PIP_USER=1 PYTHONUSERBASE=/pyroot su-exec appinstall:appinstall pipenv install --system --deploy
$ ls -lah /pyroot/
total 16
drwxr-xr-x    4 appinsta appinsta    4.0K Nov  6 02:25 .
drwxr-xr-x   19 root     root        4.0K Nov  6 02:23 ..
drwxr-xr-x    2 appinsta appinsta    4.0K Nov  6 02:25 bin
drwxr-xr-x    3 appinsta appinsta    4.0K Nov  6 02:25 lib
```
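In Dockerfile form, the non-root variant could look roughly like this (a sketch; the `appinstall` user name is just my convention, and I'm using the plain `USER` instruction instead of su-exec):

```dockerfile
FROM python:3.6-alpine AS builder
RUN pip install pipenv
RUN adduser -D appinstall \
    && mkdir /pyroot \
    && chown appinstall:appinstall /pyroot
COPY Pipfile Pipfile.lock /src/
WORKDIR /src
USER appinstall
# PIP_USER + PYTHONUSERBASE redirect the "system" install into /pyroot
RUN PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy
```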
Where do I need to copy these `/bin` and `/lib` dirs so that I can reach them from another build stage?
@haizaar Can you publish a working Dockerfile? (without private code, of course)
@derPuntigamer It's all here: https://tech.zarmory.com/2018/09/docker-multi-stage-builds-for-python-app.html (scroll down for pipenv version). Questions are welcome.
@haizaar that's only for Alpine it seems. Will try to work it out for deb/ubuntu; the pipenv user info should be enough to go on though.
@ekhaydarov
> thats only for alpine it seems. will try work it out for deb/ubuntu, the pipenv user info should be enough to go on though
Can you elaborate on what you mean? Why would the pipenv steps for Ubuntu be any different?
Can confirm this works on the non-Alpine `python:3.7-slim-stretch` base image.
I only needed a simple script on top of the pipenv deps. I added an initial `base` stage for the env vars, and I didn't need to muck around with symlinks, console scripts, or entrypoints, so I love how simple this ended up - only 2-3 steps per stage:
```dockerfile
FROM python:3.7-slim-stretch AS base
ENV PYROOT /pyroot
ENV PYTHONUSERBASE $PYROOT

FROM base AS builder
RUN pip install pipenv
COPY Pipfile* ./
RUN PIP_USER=1 PIP_IGNORE_INSTALLED=1 pipenv install --system --deploy --ignore-pipfile

FROM base
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY myscript.py ./
CMD ["python", "myscript.py"]
```
many thanks for the hard work here @haizaar!
I would suggest also adding `$PYROOT/bin`, to avoid a `command not found` issue when running Python bin commands like gunicorn:
```dockerfile
FROM base
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY --from=builder $PYROOT/bin/ $PYROOT/bin/
ENV PATH="$PYROOT/bin:$PATH"
COPY myscript.py ./
CMD ["python", "myscript.py"]
```
Is there anything unresolved on this topic? It seems to me that everything in the recent comments is working.
LGTM.
I'll retract my last comment - `PIP_IGNORE_INSTALLED` is broken in the latest release: https://github.com/pypa/pipenv/issues/4453
Since running into #4432, which is possibly related to #4453, I have changed my multi-stage Docker builds to use the venv method described here: https://sourcery.ai/blog/python-docker/. This does come with the drawbacks @haizaar mentioned of including unneeded additional files in the final image, but it appears to be the most stable approach, since using `$PYTHONUSERBASE` was causing packages to be missing or not found in the final image.
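For completeness, the venv method amounts to copying the whole virtualenv between stages, roughly like this (a sketch of the general pattern, not a verbatim copy of the linked post):

```dockerfile
FROM python:3.7-slim-stretch AS builder
RUN pip install pipenv
ENV PIPENV_VENV_IN_PROJECT=1
COPY Pipfile* ./
# Creates ./.venv with the locked dependencies
RUN pipenv install --deploy

FROM python:3.7-slim-stretch
# Copy the whole venv, pip and setuptools included (the size drawback mentioned above)
COPY --from=builder /.venv /.venv
ENV PATH="/.venv/bin:$PATH"
COPY myscript.py ./
CMD ["python", "myscript.py"]
```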