Docker-stacks: images are huge

Created on 9 May 2016 · 13 comments · Source: jupyter/docker-stacks

Current images are quite huge, which is especially surprising regarding the minimal images. Are all these packages like editors (jed, vim, emacs), pandoc, tex, etc. really necessary for running the notebook?

Enhancement

All 13 comments

jed, vim, emacs

These are not required, but were oft requested as being necessary for getting work done in the terminal window of the notebook server. (/cc @fperez)

pandoc, tex

These are required by the notebook server for many of the "Download as" menu items to work, otherwise they return 500 errors.

It's been asked in the past whether there could be a tiny-notebook stack that includes the bare minimum to get the notebook server to start but doesn't enable all of its features. I bet there could be.

I'm assuming you're looking for a base image for your own stack?

Well, I only do a minimal customisation, but yeah a more modular stack would be useful. The problem with size is transfer time and obviously storage cost.

Mind you I haven't checked which exact packages are bumping the size significantly, these are just the ones that looked suspicious ;-)

As for editors, I usually avoid putting them in the base image and instead just run another container with an editor (like the web-based Brackets) and --volumes-from=FOO, which allows me to edit files in data volumes. This works for me, but I suppose it doesn't handle all scenarios (files outside of volumes).

Docker's linear inheritance makes it _super_ bad at doing anything modular. It's possible that we should remove tex from the minimal-notebook image, though it does mean that some notebook functionality (download as .pdf) would stop working. If we do remove it from minimal, we should add it back into ~every descendant image.

Aye, maybe modular wasn't the right word; I meant the split into tiny-notebook and minimal-notebook (or even a really minimal minimal-notebook plus a full-notebook), like @parente mentioned.

tiny-notebook seems a good idea for the spartan, no-frills, no-extras version. We could add a note that some functionality like download as PDF is not available in this version.

I'm OK dropping those editors from the tiny one too. Do we have the package sizes for each? I thought jed was tiny; emacs is obviously much bigger and can probably be dropped from most (jed is a decent emacs stand-in at a very small size). vim without too many plugins should also be pretty small, but we should check.
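To answer the size question for the apt-installed tools (jed, vim, emacs, pandoc, TeX), one option is `dpkg-query`'s `Installed-Size` field, which Debian reports in KiB. The sketch below assumes a Debian/Ubuntu-based image with `dpkg-query` on the PATH; the function names are hypothetical, and conda-installed packages are not covered.

```python
import subprocess


def parse_installed_sizes(text):
    """Parse '<KiB>\t<package>' lines into (package, KiB) pairs, largest first."""
    sizes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, _, package = line.partition("\t")
        # Installed-Size can be empty (e.g. for some meta packages); treat as 0.
        kib = int(size.strip()) if size.strip() else 0
        sizes.append((package.strip(), kib))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def installed_sizes():
    """Ask dpkg for every installed package's Installed-Size (in KiB)."""
    out = subprocess.run(
        ["dpkg-query", "-W", "-f", r"${Installed-Size}\t${Package}\n"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_installed_sizes(out)


if __name__ == "__main__":
    # Print the 20 largest packages, converted from KiB to MiB.
    for package, kib in installed_sizes()[:20]:
        print(f"{kib / 1024:8.1f} MiB  {package}")
```

Running this inside a container (for example via `docker run --rm <image> python3 script.py`) would give a ranked list to base the keep/drop decision on, rather than guessing which packages look suspicious.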

I think we need to identify the use case for tiny before we can make a decision on what to have in it vs what to drop.

My take: It's the base image for someone looking to get the bare minimum conda and notebook dependencies, not to use it for work, but rather as the basis for his/her own custom Docker image stack where he/she can choose whether to include jed, emacs, vim, none, etc.

Just as an FYI, we have been looking into having a texlive package on conda-forge. The main use case for doing this was/is nbconvert. So it would be good to get some feedback in the near future on what is light enough. Maybe this would be a way to cut down on image size in the long run while retaining functionality.

I think @parente's definition is spot-on, and we can adopt it as the criteria for keeping/dropping packages (size aside). That would point to dropping _all_ editors, tex-related and other similar tools from it, even if it means the terminal isn't as useful or some functionality like pdf export doesn't work.

Using debian relationship terminology, we should keep in tiny only things that the system depends on, with the standard "works for most people as a day-to-day tool" being the equivalent of recommends.

From there, people can build a "bells-and-whistles" one that's the equivalent of suggests, including everything and the kitchen sink. In some sense, I think our try.jupyter.org is a bit like that, meant to demonstrate multiple kernels and lots of functionality.
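The depends / recommends / suggests split can be sketched as cumulative tiers, where each image contains everything from the tiers below it. This also fits Docker's linear inheritance (tiny, then minimal, then the larger stacks). The package-to-tier assignment here is a hypothetical example drawn from packages mentioned in this thread, not an agreed list.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Debian-style relationship levels, ordered from smallest image to largest."""
    DEPENDS = 0     # tiny: only what the notebook server needs to start
    RECOMMENDS = 1  # day-to-day image that works for most people
    SUGGESTS = 2    # bells-and-whistles image


# Hypothetical assignment, based only on packages discussed in this thread.
PACKAGES = {
    "conda": Tier.DEPENDS,
    "notebook": Tier.DEPENDS,
    "jed": Tier.RECOMMENDS,
    "vim": Tier.RECOMMENDS,
    "pandoc": Tier.RECOMMENDS,   # needed for several "Download as" formats
    "texlive": Tier.RECOMMENDS,  # needed for PDF export
    "emacs": Tier.SUGGESTS,      # large; jed is the lighter stand-in
}


def packages_for(tier):
    """Return every package an image at `tier` would include (cumulative)."""
    return sorted(name for name, level in PACKAGES.items() if level <= tier)


if __name__ == "__main__":
    for tier in Tier:
        print(tier.name, packages_for(tier))
```

Because each tier is a superset of the one below, moving a package (say, texlive) down out of a tier only needs a change in one place, which is the property the Dockerfile chain lacks when a package has to be re-added to every descendant image.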

Also, for comparison, some SciPy images are only a few hundred megabytes (see https://hub.docker.com/r/termoshtt/scipy/tags/ for instance). I'm not sure at which level of the stack TeX becomes useful, or whether there's a documented rationale for which packages are included in which stacks, apart from the self-descriptive minimal stack.

Built and pushed #209 to Docker Hub, but the build is bad. Pushed a fix and doing a new build. Will close this when there's a _working_ version on Docker Hub.

Rebuild is working. Tag 8015c88c4b11.

Before creating a new issue, I found this existing (closed) issue about large images, so I thought I'd first ask here:

  • Is there a chance to optimise the various image sizes at all (potentially eliminating near-duplicate layers)
  • or is it just the combination of the many python modules + the chain of derived images (base->spark->pyspark) ?!
  • update: Approximately 1 GB of the size could be explained by https://github.com/jupyter/docker-stacks/issues/474#issuecomment-333126019

Example: recent pyspark-notebook image size = 5.017 GB!

REPOSITORY                                                                    TAG                        IMAGE ID            CREATED             SIZE
jupyter/pyspark-notebook                                                      400c69639ea5               19c3fcaecea4        2 weeks ago         5.017 GB
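To compare image or layer sizes in a script, it helps to turn Docker's human-readable sizes back into bytes. A minimal sketch, assuming Docker's CLI uses decimal (base-1000) units such as kB, MB and GB, and accepting sizes both with a space ("5.017 GB", as in the row above) and without one. The per-layer SIZE column of `docker history` uses the same style, so the same parser could be used to see which layers dominate; the helper names here are hypothetical.

```python
import re

# Decimal multipliers, assuming Docker's base-1000 human-readable units.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([kKmMgGtT]?B)\s*$")


def size_to_bytes(text):
    """Convert a size such as '5.017 GB', '25.3MB' or '0B' to whole bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return round(float(number) * _UNITS[unit.upper()])


def row_size_bytes(row):
    """Size of one `docker images` row; columns are separated by 2+ spaces."""
    return size_to_bytes(re.split(r"\s{2,}", row.strip())[-1])


def total_layer_bytes(sizes):
    """Sum a list of per-layer sizes, e.g. taken from `docker history`."""
    return sum(size_to_bytes(s) for s in sizes)
```

With decimal units, the 5.017 GB reported above is 5,017,000,000 bytes, roughly 4.67 GiB, which matters when comparing against tools that report binary units.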
