Docker-stacks: [Discussion] How to support additional stacks?

Created on 17 Dec 2017 · 5 Comments · Source: jupyter/docker-stacks

We've been collecting PRs that add new image definitions to this project (see #512, #486, #444). On one hand, I'd love to showcase the diversity of languages supported by Jupyter. On the other, I'm not sure we have the maintainer bandwidth for PHP, Ruby, Cling, etc. How might we proceed? Some ideas:

  • Create a Jupyter "docker-stacks starter kit" project that people can use to set up their own custom image build process based on what we use here. Have a central place to link to such community images.
  • Go ahead and accept additional image PRs here, but ask the contributors if they're willing to help maintain the images. Put maintainer info in each stack README.
  • Go ahead and accept additional image PRs here, noting which we're most adept at maintaining and which are maintained by best effort.
  • Create separate repos under github.com/jupyter for each image. Maintain each project as best we can.


I like option one the most. Building a generation tool and a wiki-style index of community images seems a lot more sustainable than growing this repo forever. Part of what's great about Docker Hub, etc. is that there's no reason for all of these images to be in one place, as long as they are discoverable. The only reason to keep them together is to consolidate maintenance effort, and we are already at (or past) our limit with the images we have, so we can't responsibly add more. At least not without a commitment from each PR author that they will support their proposed image indefinitely.

Maybe we can learn from Docker's official repositories about how to draw these lines.

I'm in favor of the first option as well. I'm thinking that a Read the Docs site for this repo could serve as a place to post a list of community-maintained images, not to mention provide motivation to finally consolidate the duplicate text spread across the READMEs.

Frankly, eight containers seem like overkill.
The main advantage of this effort is that complex integrations are done 'once'.

Maybe the maintenance burden can be eased and value increased at the same time.
A defensible approach could be to delineate three or four containers:

  • Base: in-memory workloads on a single machine (but still multiple-container use cases)
  • Middle: a mix of big-data and in-memory work across several machines (<10)
  • Upper: big-data cloud environment use cases spanning multiple locations and many multiples of machines
  • Blue-sky: experimental components and configurations

To help decide what goes into each container, it helps to define additional (arbitrary) constraints.
Some constraints could be:

  • Container image sizes (compressed):

    • Base < 1 GB, Middle < 2.5 GB, Upper < 3.5 GB, Blue-sky unlimited

  • Runtime memory use (at startup or for hello world type apps):

    • Base < 4 GB, Middle < 8 GB, Upper < 16 GB, Blue-sky unlimited
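Taken together, these thresholds act as a simple classification rule: an image belongs in the smallest tier whose size and memory limits it fits under. A minimal Python sketch, assuming the figures above are strict upper bounds in GB and that compressed image size and startup memory are measured separately. The table and the `smallest_tier` function are hypothetical illustrations, not anything that exists in docker-stacks:

```python
# Hypothetical tier limits from the proposal above:
# (tier name, max compressed image size in GB, max startup memory in GB).
# None means "unlimited" (the Blue-sky tier).
TIER_LIMITS = [
    ("Base", 1.0, 4.0),
    ("Middle", 2.5, 8.0),
    ("Upper", 3.5, 16.0),
    ("Blue-sky", None, None),
]


def smallest_tier(image_size_gb: float, startup_mem_gb: float) -> str:
    """Return the lowest tier whose limits the image satisfies.

    Limits are treated as strict upper bounds, matching the '<' in the proposal.
    """
    for name, max_size, max_mem in TIER_LIMITS:
        size_ok = max_size is None or image_size_gb < max_size
        mem_ok = max_mem is None or startup_mem_gb < max_mem
        if size_ok and mem_ok:
            return name
    # Not reachable while Blue-sky has no limits; kept as a guard if the table changes.
    raise ValueError("no tier matches")


print(smallest_tier(0.8, 2.0))   # fits Base
print(smallest_tier(3.0, 10.0))  # too big for Middle, fits Upper
```

Because the tiers are checked in order, an image that exceeds any single limit of a tier simply falls through to the next one, so one oversized dimension is enough to bump it up.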

Another approach to consider:
Identify 'official/lead containers' that run a component and make the Jupyter container compatible with them out of the box. Maybe scripted to play nicely with the component?

Any way you view it, IMO, eight containers seem like overkill.

"Frankly, eight containers seem like overkill."

The Docker Hub pull counts agree with this sentiment. A very simple action we can take is to deprecate the r-notebook and pyspark-notebook images. They've been pulled 3-4x fewer times than the next least frequently pulled image (base-notebook) and 20-30x fewer times than the most pulled image (datascience-notebook, with Python, Julia, and R).

https://gist.github.com/parente/316d5c242aeb484484c8

We're well down the path of encouraging recipes and community stacks. repo2docker has grown in popularity as a way to build arbitrary images. There's the potential for work to be done here to reduce the number of images we maintain (#693) and to make the Spark images more cluster agnostic (#626). I think we can close this issue and track those efforts separately.
