Pipenv: How to use pipenv with multistage docker builds?

Created on 4 Nov 2018 · 20 comments · Source: pypa/pipenv

Good day,

I'm exploring how to use pipenv with multi-stage docker builds. In a nutshell, the idea is to "compile" stuff in a base image and only copy the resulting artifacts to the final image.

With Python it gets tricky, since you need to copy the package dependencies as well.

I've checked out several ideas, and it looks like pip install --user together with setting PYTHONUSERBASE is the simplest way to install dependencies into a side directory, e.g.:

FROM alpine AS builder
# Install your gcc, python3-dev, etc. here
RUN apk add --no-cache python3
COPY . /src/
WORKDIR /src
ENV PYROOT /pyroot
RUN PYTHONUSERBASE=$PYROOT pip3 install --user -r requirements.txt
RUN PYTHONUSERBASE=$PYROOT pip3 install --user .

# The final image
FROM alpine
RUN apk add --no-cache python3
ENV PYROOT /pyroot
COPY --from=builder $PYROOT/lib/ $PYROOT/lib/

(The full story)

The problem is that pipenv disregards PYTHONUSERBASE:

$ docker run --rm -ti python:3.6-alpine sh
/ # pip install --upgrade pip; pip install pipenv==2018.10.13  # skipped output
/ # mkdir /tmp/foo; cd /tmp/foo
/tmp/foo # pipenv install requests  # skipped output
/tmp/foo # PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:01
/tmp/foo # ls /pyroot
ls: /pyroot: No such file or directory

I found a workaround by using pipenv lock -r and then installing requirements.txt as in my original idea, but I'm not sure this is the best way to go, particularly if I have custom (private) sources defined in my Pipfile - I don't want to replicate their configuration into pip.
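For reference, here is roughly what that workaround looks like as builder-stage steps (a sketch; /pyroot is the side directory from my example above):

```dockerfile
# Export the locked dependencies and install them into the side directory
RUN pipenv lock -r > requirements.txt
RUN PYTHONUSERBASE=/pyroot pip3 install --user -r requirements.txt
```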

Any other ideas?

Docker Type


All 20 comments

Pipenv unsets user settings because they are incompatible with virtualenv settings. I understand the approach you’re taking, but maybe you can say more about the problem you are trying to solve. Do you want the wheels and sdists or whatever? If so, on Linux they’re stored in ~/.cache/pipenv/wheels

Apologies if the original intent was not clear enough.

All I want is to install my package and its dependencies into a separate directory instead of python's site-packages; and then to copy that directory into the final docker image.

Since the --system flag does not use a virtualenv (it seems), maybe it's worth supporting PIP* env vars in that special case?

I'll try to elaborate on the problem: I want to keep my final, "production" docker image as small as possible, therefore I need to build/install my app's dependencies in an earlier stage.
Having dependencies installed into the general site-packages (either the system's or a dedicated venv's) is problematic, since I can't reliably pick out the actual packages my app depends on later.
Wheels do not help much either, because: a) I don't know which wheels to pick up from the cache (some may belong to my app's reqs, and some may belong to, e.g., a code-generation tool my app uses during install); and b) I'd need to copy the wheels into the final docker image before installing them there, meaning they will end up in the resulting docker image.

@techalchemy My comment appeared before yours for some reason (GitHub still recovering?). Just making sure you got a notification.

Ah okay, you may be able to just use PIPENV_VENV_IN_PROJECT=1 and copy the local .venv directory’s site packages
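A minimal sketch of that suggestion (untested; the image tag and paths are illustrative):

```dockerfile
FROM python:3.6-alpine AS builder
RUN pip install pipenv
WORKDIR /src
COPY Pipfile Pipfile.lock ./
# Create the virtualenv in-project, at /src/.venv
RUN PIPENV_VENV_IN_PROJECT=1 pipenv install --deploy

FROM python:3.6-alpine
# Copy only the venv's site-packages into the final image
COPY --from=builder /src/.venv/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
```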

While it's nice to know about this option, I don't see how it helps me. The site-packages of a vanilla venv weighs 12MB (because of pip and setuptools).

What I'm trying to do is to run

pipenv install --dev
pipenv install
pipenv install .    # May require stuff installed by --dev

and easily collect the results of only the last two lines. So another venv does not help me; I need to "split" the installation paths within the venv.

Do you think a PEEP suggesting that PIP* options be obeyed when the --system flag is used would fly?

Why not just specify your library dependencies in your package metadata, and add uninstall steps for pip/setuptools via pipenv run pip uninstall setuptools? In any case, we already support any pip environment variable you set — besides the user one. I’m struggling to understand what you are gaining by doing all this extra work.

OK, I'll try once again to explain. Bear with me please.

Here is what I would've wanted to do (in a Dockerfile):

# Install all stuff in our system so we can install setup.py for our app later on
RUN pipenv install --deploy --system --dev
# Install our app LOCKED dependencies aside
RUN PYTHONUSERBASE=/pyroot pipenv install --deploy --system
# Install the app aside as well - all dependencies already present, so it's just the app
RUN PYTHONUSERBASE=/pyroot pip install --user .

# Take pyroot to the next docker image stage

This looks very clear and "human" (as in pipenv's motto) to me :)

Other solutions that require fiddling with the virtualenv look error-prone IMHO. I've tried the PIPENV_VENV_IN_PROJECT=1 approach + uninstall, and there are a couple of issues:

Uninstall does not remove dependencies
(Skipped pipenv output for brevity)

$ export PIPENV_VENV_IN_PROJECT=1
$ pipenv run pip freeze  # empty new vanilla
$ pipenv install --dev requests
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
requests==2.20.0
urllib3==1.24.1
$ pipenv uninstall --all-dev
$ pipenv run pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
urllib3==1.24.1

I.e. the requests package is gone, but its dependencies are left behind.

Venv bin directory contains other files
And I need to fish my app's entrypoints out of there.
(Screenshot of the venv's bin directory.)
Again, it's doable, we are in software after all, but I think it should be easier.

Whatever we conclude in this discussion, I think there should be a PEEP/doc explaining the recommended way to use pipenv with docker multi-stage builds.
After all, pipenv install --system --deploy looks very good and almost nails it.

FOUND IT!!!

Here is what works: PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy

In action:

/tmp/foo $ grep -A 2 packages Pipfile
[packages]
requests = "*"

[dev-packages]

[requires]
/tmp/foo $ PIP_USER=1 PYTHONUSERBASE=/pyroot pipenv install --system --deploy
Installing dependencies from Pipfile.lock (b14837)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 5/5 — 00:00:02
/tmp/foo $ ls /pyroot/bin/
chardetect
/tmp/foo $ ls /pyroot/lib/python3.6/site-packages/
chardet                    chardet-3.0.4.dist-info    idna                       idna-2.7.dist-info         requests                   requests-2.20.0.dist-info  urllib3                    urllib3-1.24.1.dist-info

Hooray!
Thanks for the hint about "in any case we already support any pip environment variable you set — besides the user one" - I figured I'd have a look at the other vars I can override...

I'm happy now.

I mean I understand the steps you are attempting. I am trying to understand why there is a strict constraint on not including incidental dependencies and why you have to fish entry points out.

Historically we had documentation on using --system --deploy, but that doesn’t accomplish the goal of preserving artifacts; you need a shared cache directory for that. Also, we don’t really recommend running pipenv as root, even in docker containers.

I am trying to understand why there is a strict constraint on not including incidental dependencies

We'd like to have our docker images as lean as possible. One of the reasons is that we have some systems running over mobile internet links, so when pushing an upgrade for 20 microservices, those megabytes start to add up. On alpine, pip + setuptools alone weigh 10MB. Compare that to alpine itself (5MB) + python3 (40MB), and that's over a 20% increase. Another reason is security: I want to include only what's really used by my app, to minimize the attack surface. It's not just me freaking out; it seems to be where the whole industry is going. With Go, people compile statically and build docker images that don't even have a shell. Distroless is another example.

and why you have to fish entry points out.

My setup.py installs entrypoints that go to the venv's bin dir. Since that dir contains other stuff that does not belong to my app, I now need to explicitly specify which files to collect from there. When a developer adds a new entrypoint, they now need to remember to update the Dockerfile - yet another thing to remember. It also makes it harder to use one generic Dockerfile to build all my Python apps. With an "aside" installation, my Dockerfile instructions can just take all of /pyroot.

Historically we had documentation on using --system --deploy, but that doesn’t accomplish the goal of preserving artifacts; you need a shared cache directory for that.

I'm not sure what you mean. Can you please elaborate? Which artifacts are not preserved exactly?

Also, we don’t really recommend running pipenv as root, even in docker containers.

Not even during the build stage?
Regardless, it works well under a non-root user as well (inside a docker container):

$ mkdir /pyroot; chown appinstall:appinstall /pyroot
$ PIP_USER=1 PYTHONUSERBASE=/pyroot su-exec appinstall:appinstall pipenv install --system --deploy
$ ls -lah /pyroot/
total 16
drwxr-xr-x    4 appinsta appinsta    4.0K Nov  6 02:25 .
drwxr-xr-x   19 root     root        4.0K Nov  6 02:23 ..
drwxr-xr-x    2 appinsta appinsta    4.0K Nov  6 02:25 bin
drwxr-xr-x    3 appinsta appinsta    4.0K Nov  6 02:25 lib

Where should these bin and lib dirs be copied, so that they are reachable from another build stage?

@haizaar Can you publish a working Dockerfile? (without private code, of course)

@derPuntigamer It's all here: https://tech.zarmory.com/2018/09/docker-multi-stage-builds-for-python-app.html (scroll down for pipenv version). Questions are welcome.

@haizaar that's only for alpine, it seems. Will try to work it out for deb/ubuntu; the pipenv user info should be enough to go on though

@ekhaydarov

that's only for alpine, it seems. Will try to work it out for deb/ubuntu; the pipenv user info should be enough to go on though

Can you elaborate on what you mean? Why should the pipenv steps for Ubuntu be any different?

can confirm this works on the non-alpine python:3.7-slim-stretch base image.

I only needed a simple script on top of the pipenv deps. I added an initial base stage for the env vars, and I didn't need to muck around with symlinks, console scripts, or entrypoints, so I love how simple this ended up: only 2-3 steps per stage:

FROM python:3.7-slim-stretch AS base

ENV PYROOT /pyroot
ENV PYTHONUSERBASE $PYROOT


FROM base AS builder

RUN pip install pipenv

COPY Pipfile* ./

RUN PIP_USER=1 PIP_IGNORE_INSTALLED=1 pipenv install --system --deploy --ignore-pipfile


FROM base

COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY myscript.py ./

CMD ["python","myscript.py"]

many thanks for the hard work here @haizaar!

I would also suggest copying $PYROOT/bin, to avoid a "command not found" issue when running a Python bin command like gunicorn:

FROM base

COPY --from=builder $PYROOT/lib/ $PYROOT/lib/
COPY --from=builder $PYROOT/bin/ $PYROOT/bin/

ENV PATH="$PYROOT/bin:$PATH"

COPY myscript.py ./

CMD ["python","myscript.py"]

Is there anything unresolved on this topic? It seems to me that everything in the recent comments is working.

LGTM.

I'll retract my last comment - PIP_IGNORE_INSTALLED is broken in the latest release: https://github.com/pypa/pipenv/issues/4453

Since running into #4432, which is possibly related to #4453, I have changed my multistage docker builds to use the venv method described here: https://sourcery.ai/blog/python-docker/ This does come with the drawbacks @haizaar mentioned of including unneeded additional files in the final image, but it appears to be the most stable approach, since using $PYTHONUSERBASE caused packages to be missing or not found in the final image.
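For completeness, the venv method from that post boils down to something like this (a sketch; the image tag, paths, and script name are illustrative):

```dockerfile
FROM python:3.8-slim AS builder
RUN pip install pipenv
ENV PIPENV_VENV_IN_PROJECT 1
WORKDIR /app
COPY Pipfile Pipfile.lock ./
# Creates /app/.venv with all the locked dependencies (pip/setuptools included)
RUN pipenv install --deploy

FROM python:3.8-slim
WORKDIR /app
# Copy the whole virtualenv; larger than strictly needed, but nothing goes missing
COPY --from=builder /app/.venv ./.venv
ENV PATH /app/.venv/bin:$PATH
COPY . .
CMD ["python", "myscript.py"]
```

The venv works when copied across stages because both stages share the same base image, so the .venv's python symlink still resolves.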
