In the current iteration, the Docker Environment only lets users provide a base image, onto which it installs dependencies. This isn't a solid approach going forward; it should accept a completely custom user Dockerfile, defaulting to a Dockerfile that we specify.
Default: I propose a python:3.6 base image on which we install our dependencies, similar to what we have now. @cicdw thoughts?
Yea, we might want to allow users to choose their python version, but otherwise +1 on that base image.
I still think there's lots of room for an implementation similar to what we have today, where users can provide pip/conda installable dependencies and they're inserted into the default dockerfile. I just think that we should in addition allow fully custom dockerfiles (as long as Prefect is properly set up, which users take responsibility for).
This is similar to how dask deployments have a "PIP_EXTRAS" environment variable (might be forgetting the exact name); I'm guessing most of the time a base image + dependencies will satisfy requirements.
I think we should get rid of the automatic copying of files honestly, but definitely agree with having a flexible base image + dependencies, as long as there's an equally obvious / easy way to provide a fully custom docker file.
I endorse that -- copying files is too tricky to do in a general way.
So here's a thought:
DockerEnvironment -> accepts a fully custom Dockerfile
PrefectDockerEnvironment -> subclasses DockerEnvironment and basically builds a simple Dockerfile for you from a base + dependencies + your flow.
Open to any name changes...
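A minimal sketch of how that split might look, to make the proposal concrete. The constructor arguments (`dockerfile`, `base_image`, `python_dependencies`) and the `render_dockerfile` method are assumptions made for illustration, not an existing Prefect API; the `ENV`/`COPY` lines are the ones quoted further down in this thread.

```python
from typing import List, Optional


class DockerEnvironment:
    """Hypothetical: wraps a complete, user-supplied Dockerfile."""

    def __init__(self, dockerfile: str) -> None:
        # The user is responsible for making this Dockerfile Prefect-compatible.
        self.dockerfile = dockerfile


class PrefectDockerEnvironment(DockerEnvironment):
    """Hypothetical: generates a Dockerfile from a base image and dependencies."""

    def __init__(
        self,
        base_image: str = "python:3.6",
        python_dependencies: Optional[List[str]] = None,
    ) -> None:
        self.base_image = base_image
        self.python_dependencies = list(python_dependencies or [])
        super().__init__(self.render_dockerfile())

    def render_dockerfile(self) -> str:
        lines = [f"FROM {self.base_image}"]
        if self.python_dependencies:
            # Assumes every dependency is pip-installable.
            lines.append("RUN pip install " + " ".join(self.python_dependencies))
        # Copy the serialized flow to the location Prefect expects.
        lines.append('ENV PREFECT_ENVIRONMENT_FILE="/root/.prefect/flow_env.prefect"')
        lines.append("COPY flow_env.prefect $PREFECT_ENVIRONMENT_FILE")
        return "\n".join(lines) + "\n"
```

With this shape, anything that accepts a `DockerEnvironment` only ever needs to read `.dockerfile`, regardless of whether it was written by hand or generated.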
@jlowin will there be an ID mismatch if a user:
LocalEnvironment
flow.deploy()
^ The answer is strictly yes (because that's true any time you build a flow and overwrite the environment to point at another flow), but we'd avoid that workflow.
Users wouldn't build an image outside Prefect, they'd provide the Dockerfile to Prefect and be responsible for ensuring that it complied with Prefect's expectations; namely that it copied a serialized flow (that Prefect provides) into the container. Basically, that just means ensuring the following lines are in the Dockerfile (at a minimum), which we could simply explain through good documentation:
ENV PREFECT_ENVIRONMENT_FILE="/root/.prefect/flow_env.prefect"
COPY flow_env.prefect $PREFECT_ENVIRONMENT_FILE
As long as those lines are present, then flow.serialize() will work as expected.
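Besides documentation, a custom Dockerfile could be checked for those two lines before building. The sketch below is only an illustration: `missing_prefect_lines` is a hypothetical helper, and it does naive exact-line matching, so an equivalent Dockerfile written differently (for example `ENV KEY value` syntax) would be reported as missing the lines.

```python
from typing import List

# The two lines a custom Dockerfile needs, as given above.
REQUIRED_LINES = (
    'ENV PREFECT_ENVIRONMENT_FILE="/root/.prefect/flow_env.prefect"',
    "COPY flow_env.prefect $PREFECT_ENVIRONMENT_FILE",
)


def missing_prefect_lines(dockerfile: str) -> List[str]:
    """Return the required lines that do not appear in the Dockerfile text."""
    present = {line.strip() for line in dockerfile.splitlines()}
    return [line for line in REQUIRED_LINES if line not in present]
```

An empty list means both lines were found; otherwise the returned lines can be shown to the user in an error message.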
That is true, but for various reasons related to how Docker interprets the FROM base_image command, I don't believe that we'll be able to execute a health check against the flow inside the docker container (at least this was my experience yesterday - the healthcheck failed unless all dependencies were created in the _same_ Dockerfile). So while we might gain a simpler API for providing fully custom Dockerfiles, we lose an important diagnostic tool for detecting deployment bugs.
Feels to me like that's a problem with some detail of our implementation; Docker images are just layered, and the FROM instruction simply says "start with the following layers". I don't think there could be a way for an image to be aware of whether an instruction was run before or after FROM (apart from implementation details).
However, this is easily solved by recommending that people add the healthcheck COPY to the minimal example in my previous reply, or by spinning up the container and executing the healthcheck script directly from the container's shell.
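The second option could look roughly like the sketch below, which shells out to the Docker CLI. The script path `/root/.prefect/healthcheck.py` and both function names are assumptions for illustration; it also assumes the image has `python` on its PATH and that the healthcheck script was copied in at build time.

```python
import subprocess
from typing import List

# Hypothetical location of the healthcheck script inside the image.
DEFAULT_SCRIPT = "/root/.prefect/healthcheck.py"


def healthcheck_command(image: str, script: str = DEFAULT_SCRIPT) -> List[str]:
    """Build the `docker run` invocation that executes the script in a throwaway container."""
    return ["docker", "run", "--rm", image, "python", script]


def run_healthcheck(image: str, script: str = DEFAULT_SCRIPT) -> bool:
    """Return True if the healthcheck script exits with status 0."""
    result = subprocess.run(healthcheck_command(image, script))
    return result.returncode == 0
```

Because the check runs against the finished image, it does not matter which Dockerfile (or which layer) installed the dependencies.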
I would like to revisit this soon. NB I merged in #1052 which allows for more customizability by enabling the use of local images. This gives a new workflow of:
There is another workflow that I would still like to support which is:
Providing the contents as a string is desirable and something I wish I currently had. I think we could get by with using BytesIO:
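from io import BytesIO
from docker import APIClient

# Dockerfile contents held in a string rather than in a file on disk
dockerfile = '''
# Shared Volume
FROM busybox:buildroot-2014.02
VOLUME /data
CMD ["/bin/sh"]
'''
# build() needs a binary file-like object, so encode the string first
f = BytesIO(dockerfile.encode('utf-8'))
cli = APIClient(base_url='tcp://127.0.0.1:2375')
# build() yields the daemon's output line by line; collect it into a list
response = [line for line in cli.build(
    fileobj=f, rm=True, tag='yourname/volume'
)]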
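One caveat with passing only the Dockerfile as `fileobj`: there is no build context, so a `COPY flow_env.prefect ...` instruction would have nothing to copy. docker-py's `build` also accepts `custom_context=True`, in which case `fileobj` is expected to be a tar archive containing the Dockerfile and any other files. A rough sketch of packing such an archive in memory follows; `build_context` is a hypothetical helper, not part of docker-py or Prefect.

```python
import io
import tarfile
from typing import Dict


def build_context(dockerfile: str, files: Dict[str, bytes]) -> io.BytesIO:
    """Pack a Dockerfile string plus extra files into an in-memory tar archive."""
    entries = {"Dockerfile": dockerfile.encode("utf-8")}
    entries.update(files)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # tar headers must carry the payload size
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


# Usage (requires a running Docker daemon, so left commented out):
# context = build_context(dockerfile, {"flow_env.prefect": serialized_flow})
# cli.build(fileobj=context, custom_context=True, rm=True, tag="yourname/flow")
```

Here `serialized_flow` stands in for whatever bytes Prefect would write to `flow_env.prefect`.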
Closed with #1740