Docker-stacks: images are huge

Created on 9 May 2016 · 13 comments · Source: jupyter/docker-stacks

Current images are quite huge, which is especially surprising for the minimal images. Are all these packages, like editors (jed, vim, emacs), pandoc, tex, etc., really necessary for running the notebook?

Enhancement

gimoh

Most helpful comment

I think we need to identify the use case for tiny before we can make a decision on what to have in it vs what to drop.

My take: it's the base image for someone who wants the bare minimum conda and notebook dependencies, not to use directly for work but as the basis for their own custom Docker image stack, where they can choose whether to include jed, emacs, vim, none, etc.

parente on 13 May 2016

All 13 comments

jed, vim, emacs

These are not required, but were often requested as necessary for getting work done in the notebook server's terminal window. (/cc @fperez)

pandoc, tex

These are required by the notebook server for many of the "Download as" menu items to work; without them, those items return HTTP 500 errors.
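
As a rough illustration of that dependency, here is a sketch that checks whether the external tools each export needs are on PATH. The format-to-tool mapping is an assumption for illustration (pandoc for converting Markdown cells to LaTeX/reST, plus a LaTeX engine for PDF, assumed here to be xelatex; older nbconvert releases used pdflatex). It is not taken from nbconvert's source, and the names `REQUIRED_TOOLS` and `available_exports` are hypothetical.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical, approximate mapping of export format -> external tools it
# shells out to. Assumes xelatex as the LaTeX engine for PDF export.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "html": [],
    "latex": ["pandoc"],
    "rst": ["pandoc"],
    "pdf": ["pandoc", "xelatex"],
}


def available_exports(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report which export formats should work, based on tools found on PATH.

    `which` is injectable so the check can be tested without the tools installed.
    """
    return {
        fmt: all(which(tool) is not None for tool in tools)
        for fmt, tools in REQUIRED_TOOLS.items()
    }


if __name__ == "__main__":
    for fmt, ok in available_exports().items():
        print(f"{fmt:6} {'ok' if ok else 'missing tools'}")
```

Running it inside a container gives a quick view of which "Download as" entries would be expected to fail in that image.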

It's been asked in the past whether there could be a tiny-notebook stack that includes the bare minimum to get the notebook server to start, but doesn't enable all of its features. I bet there could be.

I'm assuming you're looking for a base image for your own stack?

parente on 9 May 2016

Well, I only do minimal customisation, but yes, a more modular stack would be useful. The problem with size is transfer time and, obviously, storage cost.

Mind you, I haven't checked which packages contribute most to the size; these are just the ones that looked suspicious ;-)

As for editors, I usually avoid putting them in the base image. Instead I run a separate container with an editor (like the web-based Brackets) and --volumes-from=FOO, which lets me edit files in data volumes. This works for me, but I suppose it doesn't handle all scenarios (e.g., files outside of volumes).
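
A minimal sketch of that sidecar-editor pattern, driving the Docker CLI from Python. `FOO` is the notebook container from the comment above; `example/web-editor` and the port are hypothetical placeholders for whatever editor image you use (e.g. a Brackets image), and the function names are made up for illustration.

```python
import shlex
import subprocess
from typing import List, Optional


def editor_sidecar_cmd(
    notebook_container: str,
    editor_image: str,
    port: Optional[int] = None,
) -> List[str]:
    """Build a `docker run` argv for a throwaway editor container that
    mounts every volume of an existing notebook container."""
    cmd = ["docker", "run", "--rm", "-d",
           f"--volumes-from={notebook_container}"]
    if port is not None:
        # Publish the editor's web UI on the same port number on the host.
        cmd += ["-p", f"{port}:{port}"]
    cmd.append(editor_image)
    return cmd


def start_editor_sidecar(notebook_container: str, editor_image: str,
                         port: Optional[int] = None) -> str:
    """Start the editor container and return the ID that `docker run -d` prints."""
    cmd = editor_sidecar_cmd(notebook_container, editor_image, port)
    print("running:", shlex.join(cmd))
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `start_editor_sidecar("FOO", "example/web-editor", port=8080)`. Because the editor container mounts FOO's volumes, files saved in the editor show up in the notebook container; anything written outside a volume is not shared, which is the limitation mentioned above.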

gimoh on 13 May 2016

Docker's linear inheritance makes it _super_ bad at doing anything modular. It's possible that we should remove tex from the minimal-notebook image, though that would mean some notebook functionality (download as .pdf) stops working. If we do remove it from minimal, we should add it back in nearly every descendant image.
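
A toy model of why linear inheritance makes this awkward: each image has exactly one parent (one `FROM` line) and inherits everything in that parent's chain, so dropping a package low in the chain drops it everywhere below unless each child re-adds it. The image names mirror docker-stacks, but the package sets and the `build_chain` helper are simplified, hypothetical examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Image:
    """A Docker image in a linear chain: exactly one parent (or none)."""
    name: str
    parent: Optional["Image"]
    adds: FrozenSet[str] = frozenset()

    def packages(self) -> FrozenSet[str]:
        # Everything from the parent chain is inherited; a child can only add.
        inherited = self.parent.packages() if self.parent else frozenset()
        return inherited | self.adds


def build_chain(tex_in_minimal: bool, readd_tex_in_scipy: bool) -> Image:
    """Build base -> minimal -> scipy and return the leaf image."""
    base = Image("base-notebook", None, frozenset({"notebook"}))
    minimal_adds = {"pandoc", "jed", "vim"} | ({"texlive"} if tex_in_minimal else set())
    minimal = Image("minimal-notebook", base, frozenset(minimal_adds))
    scipy_adds = {"scipy"} | ({"texlive"} if readd_tex_in_scipy else set())
    return Image("scipy-notebook", minimal, frozenset(scipy_adds))
```

The same linearity means a child image can't shrink its parent: deleting files in a later layer hides them, but the parent's layer is still downloaded and stored, so size savings have to happen where a package is first installed.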

minrk on 13 May 2016

Aye, maybe "modular" wasn't the right word. I meant a split into tiny-notebook and minimal-notebook (or even a truly minimal minimal-notebook plus a full-notebook), as @parente mentioned.

gimoh on 13 May 2016

tiny-notebook seems like a good idea for the spartan, no-frills, no-extras version. We could add a note that some functionality, such as Download as PDF, is not available in this version.

willingc on 13 May 2016

I'm OK with dropping those editors from the tiny one too. Do we have the package sizes for each? I thought jed was tiny; emacs is obviously much bigger and can probably be dropped from most images (jed is a decent emacs stand-in at a very small size). vim without too many plugins should also be pretty small, but we should check.
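
One way to get those numbers on a Debian-based image is to ask dpkg for each package's Installed-Size field, which it reports in KiB. A minimal sketch, assuming it runs inside the image; the helper names `parse_dpkg_sizes` and `installed_sizes` are hypothetical.

```python
import subprocess
from typing import List, Sequence, Tuple


def parse_dpkg_sizes(output: str) -> List[Tuple[str, int]]:
    """Parse lines of '<Installed-Size in KiB>\t<Package>' into
    (package, KiB) pairs, largest first."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        size_kib, name = line.split("\t", 1)
        # Installed-Size can be empty for some packages; treat that as 0.
        rows.append((name, int(size_kib) if size_kib.strip() else 0))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def installed_sizes(packages: Sequence[str]) -> List[Tuple[str, int]]:
    """Query dpkg for each package's installed size (Debian/Ubuntu only).

    Raises CalledProcessError if dpkg-query finds no match for a package.
    """
    out = subprocess.run(
        ["dpkg-query", "-W",
         "--showformat=${Installed-Size}\t${Package}\n", *packages],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_dpkg_sizes(out)
```

For example, `installed_sizes(["jed", "vim", "emacs"])`. Installed-Size covers only the named package, not its dependencies, so a metapackage like emacs can look small even though it pulls in much larger packages; comparing image sizes before and after an `apt-get install` gives the real cost.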

fperez on 13 May 2016

Just FYI, we have been looking into having a texlive package on conda-forge. The main use case for doing this was/is nbconvert, so it would be good to get some feedback in the near future on what is light enough. Maybe this would be a way to cut down on image size in the long run while retaining functionality.

jakirkham on 13 May 2016

I think @parente's definition is spot-on, and we can adopt it as the criterion for keeping/dropping packages (size aside). That would point to dropping _all_ editors, tex-related tools, and other similar tools from it, even if it means the terminal isn't as useful or some functionality like PDF export doesn't work.

Using Debian relationship terminology, we should keep in tiny only the things that the system depends on, with the standard of "works for most people as a day-to-day tool" being the equivalent of recommends.

From there, people can build a "bells-and-whistles" one that's the equivalent of suggests, including everything and the kitchen sink. In some sense, I think our try.jupyter.org is a bit like that, meant to demonstrate multiple kernels and lots of functionality.
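
The tiering can be written down as data. A sketch, assuming Debian's three levels (Depends, Recommends, Suggests) map onto tiny-notebook, minimal-notebook and a hypothetical full-notebook. The per-package tier assignments below are illustrative guesses, not a decision from this thread.

```python
from enum import IntEnum
from typing import Dict, Set


class Tier(IntEnum):
    """Debian-style relationship strength, strongest first."""
    DEPENDS = 1     # needed for the notebook server to start
    RECOMMENDS = 2  # "works for most people as a day-to-day tool"
    SUGGESTS = 3    # bells and whistles


# Hypothetical tier assignments, for illustration only.
PACKAGE_TIERS: Dict[str, Tier] = {
    "notebook": Tier.DEPENDS,
    "pandoc": Tier.RECOMMENDS,
    "texlive": Tier.RECOMMENDS,
    "jed": Tier.RECOMMENDS,
    "vim": Tier.RECOMMENDS,
    "emacs": Tier.SUGGESTS,
}

# Each image includes every package whose tier is at or above its cut-off.
FLAVORS: Dict[str, Tier] = {
    "tiny-notebook": Tier.DEPENDS,
    "minimal-notebook": Tier.RECOMMENDS,
    "full-notebook": Tier.SUGGESTS,
}


def packages_for(flavor: str) -> Set[str]:
    """Return the packages an image flavor should contain."""
    limit = FLAVORS[flavor]
    return {pkg for pkg, tier in PACKAGE_TIERS.items() if tier <= limit}
```

Keeping the assignments in one table makes the rule easy to review: each image's package list is derived from a package's tier, so moving a package between images is a one-line change here.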

fperez on 13 May 2016

Also, for comparison, something like scipy has images that are a few hundred megabytes (for instance https://hub.docker.com/r/termoshtt/scipy/tags/). I'm not sure at what level of the stack TeX becomes useful, or whether there's a documented rationale for which packages are included in which stacks, apart from the self-descriptive minimal stack.

lsb on 17 May 2016

Built and pushed #209 to Docker Hub, but the build is bad. Pushed a fix and am doing a new build. Will close this when there's a _working_ version on Docker Hub.

parente on 27 May 2016

Rebuild is working. Tag 8015c88c4b11.

parente on 29 May 2016

Before creating a new issue, I found this existing (closed) issue about large images, so I thought I'd ask here first:

Is there a chance to optimise the various image sizes at all (potentially eliminating near-duplicate layers), or is it just the combination of the many Python modules plus the chain of derived images (base->spark->pyspark)?
Update: approximately 1 GB of the size could be explained by https://github.com/jupyter/docker-stacks/issues/474#issuecomment-333126019

Example: a recent pyspark-notebook image is 5.017 GB!

REPOSITORY                                                                    TAG                        IMAGE ID            CREATED             SIZE
jupyter/pyspark-notebook                                                      400c69639ea5               19c3fcaecea4        2 weeks ago         5.017 GB
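
To see which layers account for that size, `docker history` lists every layer of an image (including those inherited from its parent images) with its size and the build step that created it. A minimal sketch that shells out to the Docker CLI, assuming a version whose `docker history` supports `--format`; `--human=false` makes the size print as raw bytes. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def parse_history(output: str) -> List[Tuple[int, str]]:
    """Parse '<size in bytes>\t<created by>' lines, largest layer first."""
    layers = []
    for line in output.splitlines():
        if not line.strip():
            continue
        size, created_by = line.split("\t", 1)
        layers.append((int(size), created_by))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)


def largest_layers(image: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return the `top` largest layers of `image` as (bytes, build step)."""
    out = subprocess.run(
        ["docker", "history", "--human=false", "--no-trunc",
         "--format", "{{.Size}}\t{{.CreatedBy}}", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_history(out)[:top]


if __name__ == "__main__":
    for size, step in largest_layers("jupyter/pyspark-notebook"):
        print(f"{size / 1e9:6.2f} GB  {step[:80]}")
```

Keep in mind that `docker images` reports each image's full size, including layers it shares with other images; shared layers are stored only once locally, and `docker system df -v` breaks image sizes down into shared and unique parts.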