We've been collecting PRs that add new image definitions to this project (see: #512, #486, #444). On one hand, I'd love to showcase the diversity of languages supported by Jupyter. On the other, I'm not sure we have the maintainer bandwidth for PHP, Ruby, Cling, etc. How might we proceed? Some ideas:
I like option one the most. Building a generation tool and a wiki-style index of community images seems a lot more sustainable than growing this repo forever. Part of what's great about docker hub, etc. is that there's no reason for all of these images to be in one place, as long as they are discoverable. The only reason to do that is to consolidate maintenance effort, and we are already at (or past) our limit with the images we have, so we can't very responsibly add more. At least not without a commitment from each PR author that they will support their proposed image indefinitely.
Maybe we can learn from the Docker official repositories about how to draw these lines.
I'm in favor of the first option as well. I'm thinking that a ReadTheDocs site for this repo could serve as a place to post a list of community-maintained images, not to mention provide motivation to finally consolidate the duplicate text spread across the READMEs.
Frankly, eight containers seems like overkill.
The main advantage of this effort is that complex integrations are done 'once'.
Maybe the maintenance burden can be eased and the value increased at the same time.
A defensible approach could be to delineate three or four containers:
Base: In-memory workloads on a single machine (but still multiple-container use cases).
Middle: A mix of big data and in-memory work across several machines (<10).
Upper: Big data cloud environment use cases. Multiple locations, many multiples of machines.
Blue-sky: Experimental components and configurations
To help determine what goes into each container, it helps to have additional, explicitly defined (if arbitrary) constraints.
Some constraints could be:
Another approach to consider is:
Identify 'official/lead containers' that run a component and make the Jupyter container compatible with them out of the box. Maybe scripted to play nice with the component?
Any way you view it, IMO, eight containers seems like overkill.
Frankly, eight containers seems like overkill.
The Docker Hub pull counts agree with this sentiment. A very simple action we can take is to deprecate the r-notebook and pyspark-notebook images. They've been pulled 3-4x fewer times than the next least frequently pulled image (base-notebook) and 20-30x fewer times than the most pulled image (datascience-notebook, w/ Python, Julia, and R).
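For anyone who wants to redo this comparison later, here is a rough sketch of the arithmetic. The function name, the 3x threshold, and especially the pull counts are placeholders I made up for illustration; they are not the real Docker Hub numbers, so plug in the counts shown on each image's Docker Hub page before drawing any conclusions.

```python
def deprecation_candidates(pull_counts, baseline, min_factor=3.0):
    """Return images pulled at least `min_factor` times less often than `baseline`.

    pull_counts: dict mapping image name -> total pulls (entered by hand).
    baseline:    image to compare against, e.g. the next least pulled image.
    min_factor:  hypothetical cutoff; 3.0 mirrors the "3-4x fewer" remark above.
    """
    base = pull_counts[baseline]
    candidates = []
    for name, pulls in pull_counts.items():
        if name == baseline:
            continue
        # An image with zero pulls is trivially below any threshold.
        if pulls == 0 or base / pulls >= min_factor:
            candidates.append(name)
    return sorted(candidates)


# Placeholder numbers only, NOT real pull counts.
PLACEHOLDER_PULLS = {
    "base-notebook": 300,
    "r-notebook": 100,
    "pyspark-notebook": 80,
    "datascience-notebook": 2400,
}

if __name__ == "__main__":
    print(deprecation_candidates(PLACEHOLDER_PULLS, baseline="base-notebook"))
```

Comparing against the next least pulled image rather than the most pulled one keeps the check conservative: an image only shows up if it lags well behind even the low end of the stacks we keep.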
We're well down the path of encouraging recipes and community stacks. repo2docker has grown in popularity as a way to build arbitrary images. There's the potential for work to be done here to reduce the number of images we maintain (#693) and to make the Spark images more cluster agnostic (#626). I think we can close this issue and track those efforts separately.