What Docker image are you using?
jupyter/datascience-notebook
What do you expect to happen?
It would be nice to have a way to know which Docker image is using which Python version.
Here is a use case: the current image uses Python 3.7.1, but I'm using PyArrow from pyarrow-0.9.0-cp36-cp36m-manylinux1_x86_64.whl, so I need Python 3.6. How can I tell which Docker image is using which Python version?
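To make the constraint in this use case concrete, here is a minimal sketch that reads the required CPython version out of a wheel filename. It relies only on the standard wheel naming convention, where the last three dash-separated fields are the Python tag, ABI tag, and platform tag. The function names are my own, and the sketch deliberately ignores pure-Python and abi3 wheels.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp36') from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # A wheel name ends with: {python tag}-{abi tag}-{platform tag}
    if len(parts) < 5:
        raise ValueError("not a wheel filename: %r" % filename)
    return parts[-3]


def required_cpython(filename):
    """Return (major, minor) for a CPython-specific wheel such as cp36."""
    tag = wheel_python_tag(filename)
    digits = tag[2:]
    if not tag.startswith("cp") or not digits.isdigit() or len(digits) < 2:
        raise ValueError("not a CPython-specific tag: %r" % tag)
    # 'cp36' -> (3, 6); 'cp310' -> (3, 10)
    return int(digits[0]), int(digits[1:])


def interpreter_matches(filename, version_info=None):
    """True if the given interpreter (default: the running one) fits the wheel."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == required_cpython(filename)
```

Running `interpreter_matches("pyarrow-0.9.0-cp36-cp36m-manylinux1_x86_64.whl")` inside a candidate image would tell you whether that image's Python can install the wheel.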
Understanding what's in each build is a common request. At the moment, the image tags are the prefix of the git SHA that triggered their build. This provides a link from binary to source, but lacks the exact manifest of packages and versions installed.
The Docker Cloud logs capture everything installed. I'm not sure if they're publicly available (https://cloud.docker.com/u/jupyter/repository/registry-1.docker.io/jupyter/base-notebook/builds/f4a7f579-5af0-4bbb-a587-79721afdd0ed) and they're not very convenient.
We talked about extending the image tag to capture key package version numbers in the past. I think of that as a partial solution since what counts as "key" varies from person to person. I don't think we can encode the version numbers of every package into the docker image tag.
We used to have a post-build hook that would update a wiki page here on GitHub with links to both the build and merge commit that triggered it. Maybe we can take that idea a step further by having a post-build processing step that scrapes the logs or runs commands (conda list, apt-get, etc.) to get a list of packages that are posted / committed back here and indexed by the image tag / git SHA?
I see. Makes sense. The Docker Cloud link did not work for me.
Having a page that lists the version info of the packages in the image would be a good start. Then we can turn that around and have a page with various versions of a package and the image that contains them.
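The "turn that around" step could be sketched as a simple index inversion. This assumes the per-image package lists are already available as a mapping from image tag to package versions. The tags and versions in the usage note are placeholders, not real builds.

```python
from collections import defaultdict


def invert_manifests(manifests):
    """Turn {image_tag: {package: version}} into {package: {version: [image_tags]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for image_tag, packages in manifests.items():
        for name, version in packages.items():
            index[name][version].append(image_tag)
    # Convert to plain dicts so the result is easy to print or dump as JSON.
    return {name: dict(versions) for name, versions in index.items()}
```

For example, with placeholder data `{"aaaaaaaaaaaa": {"python": "3.6.6"}, "bbbbbbbbbbbb": {"python": "3.7.1"}}`, looking up `invert_manifests(data)["python"]["3.6.6"]` returns the list of image tags that ship that Python version.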
Still, the image tag could contain the version numbers of the core languages, like Python, R, and Scala, since they will dictate what versions of other packages might be available.
For tagging language versions in an image like jupyter/all-spark-notebook, do you envision it would look something like:
jupyter/all-spark-notebook:7db1bd2a7511-py37-r31-sc211?
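A rough sketch of how a tag in that format could be assembled from the git SHA and the language versions. The helper names are hypothetical, and the R and Scala versions in the usage note are only guesses chosen to reproduce the example tag.

```python
def short_version(version):
    """Collapse 'major.minor[.patch]' to 'majorminor', e.g. '3.7.1' -> '37'."""
    major, minor = version.split(".")[:2]
    return major + minor


def language_tag(git_sha, versions):
    """Build '<sha>-<prefix><majorminor>-...' from (prefix, version) pairs.

    `versions` is an ordered list of pairs so the tag layout is stable.
    """
    parts = [git_sha[:12]]
    for prefix, version in versions:
        parts.append(prefix + short_version(version))
    return "-".join(parts)
```

Calling `language_tag("7db1bd2a7511", [("py", "3.7.1"), ("r", "3.1.0"), ("sc", "2.11.12")])` yields `7db1bd2a7511-py37-r31-sc211`. One caveat with dropping the dot: a suffix like `py311` no longer says unambiguously where major ends and minor begins.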
For building what I'll call package manifests, I think we could do something like:
In a post_build hook, run apt list --installed, conda list, julia -E 'import Pkg; Pkg.status()', and R --silent -e 'installed.packages(.Library)[, c(1,3)]'.
The "magic happens here" step is #2. We need a place to ship the package lists that does some form of authentication yet doesn't require us to expose credentials as part of the Docker Cloud build process.
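The collection half of that idea could look roughly like the sketch below, which runs the listed commands and concatenates their output into one text manifest. It assumes the script runs inside the built image (for example via docker run from the hook). The section titles and function names are my own, and it does not attempt the shipping step at all.

```python
import subprocess

# Commands are the ones listed in the comment above; titles are illustrative.
MANIFEST_COMMANDS = {
    "Apt Packages": ["apt", "list", "--installed"],
    "Conda Packages": ["conda", "list"],
    "Julia Packages": ["julia", "-E", "import Pkg; Pkg.status()"],
    "R Packages": ["R", "--silent", "-e", "installed.packages(.Library)[, c(1,3)]"],
}


def run_command(argv):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
    except FileNotFoundError:
        # Not every image has every tool (e.g. no Julia in base-notebook).
        return "(command not found: %s)" % argv[0]
    return result.stdout.strip()


def build_manifest(image_tag, commands=None, runner=run_command):
    """Return a plain-text manifest with one section per command."""
    if commands is None:
        commands = MANIFEST_COMMANDS
    lines = ["# Build manifest for %s" % image_tag, ""]
    for title, argv in commands.items():
        lines += ["## %s" % title, "", runner(argv), ""]
    return "\n".join(lines)
```

The `runner` parameter is there so the formatting can be exercised without the tools installed, and so a log-scraping implementation could be swapped in later.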
Yes, for the language version tagging format. If we do the _package manifest_ as well, then the Git SHA could be skipped.
> If we do the package manifest as well, then the Git SHA could be skipped.
We still need something unique to distinguish one build from the next, for the sake of immutable builds that people want to pin.
First example of a manifest: https://github.com/jupyter/docker-stacks/wiki/base-notebook-eb149a8c333a
Manifests for the other images should start showing up soon, assuming I don't have a bug.
The entire setup is still a work-in-progress. PR #838 has a note about the current status.
TODOs based on results:
Also, would it make sense to add a LABEL recording the git version of the build? Users could then keep using the latest build and, if something turns out to be incompatible, easily find the git version of the Docker image that worked.
@rahulpshah I'm not sure how to inject that into the Docker image at build time on DockerHub. If you've done that before or would like to figure out how, a PR would be welcome.
I think a maintained document with the versions would be all that is required (it could even be a JSON or bash-variable manifest that gets referenced). That way, whenever there is a simple version bump, only one file changes. It would also then be a lot easier to find the tag (although scouring the source control should not be a requirement).
Since docker images and git commits can be referenced by multiple tags, a git tag like datascience-1.0.1 and docker tag like 1.0.1 could be applied every time the version manifest changes.
A dev then could look in one place at the version manifest and the tags would be clearly visible. Creating some documentation somewhere central would be trivial.
I am happy to help, but I guess this might not be high on your priority list, and something like this would require buy-in from a lot of people.
The nice thing is that the current system can remain in place. Extra tags wouldn't break anything for anyone downstream.
At the very least, adding special tags for programming language changes would be ideal. For example, move the tag python3.6.6 to whichever notebook Docker image is the latest one with that version of Python.
Just some ideas from a downstream user perspective. The system is usable now that I know how it works though. With a bit of digging I was able to find the appropriate image.
@dnk8n I'm not opposed to extra tags as long as they can be automatically assigned and updated. I'd be happy to review a PR from you or anyone extending the make and/or Docker Hub hook automation to enable it.
After the PR above, jupyter/base-notebook now receives tags like the following:
jupyter/base-notebook:ff9357a77d78
jupyter/base-notebook:python-3.7.3
jupyter/base-notebook:notebook-5.7.8
jupyter/base-notebook:lab-0.35.5
jupyter/base-notebook:hub-1.0.0
All but the git SHA tag will be overwritten on the next build+push to Docker Hub unless the versions of Python, Jupyter Notebook, etc. change in the image. I'll document this fact on the wiki page tracking builds to avoid confusion about old build entries also showing tags like python-3.7.3.
I plan to apply the above tags to the other images, plus additional tags appropriate to each image (e.g., Julia and R versions in datascience-notebook).
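To illustrate which tags move and which stay put, here is a small simulation that models the registry as a plain dict from tag to image. It is not how the hooks are implemented. The first build uses the tags listed above, while the second build's SHA and its notebook version are made up for the example.

```python
def version_tags(git_sha, versions):
    """Return the git SHA tag plus one '<name>-<version>' tag per package."""
    tags = [git_sha[:12]]
    tags += ["%s-%s" % (name, version) for name, version in versions.items()]
    return tags


def push(registry, image_id, tags):
    """Point every tag at image_id, the way pushing tags to a registry does."""
    for tag in tags:
        registry[tag] = image_id


registry = {}

# Build 1: the tags shown in the comment above.
first = "ff9357a77d78"
push(registry, first, version_tags(first, {
    "python": "3.7.3", "notebook": "5.7.8", "lab": "0.35.5", "hub": "1.0.0",
}))

# Build 2: hypothetical SHA, with only the notebook version bumped (made up).
second = "0123456789ab"
push(registry, second, version_tags(second, {
    "python": "3.7.3", "notebook": "5.7.9", "lab": "0.35.5", "hub": "1.0.0",
}))
```

After the second push, `registry["python-3.7.3"]` points at the second build, while `registry["notebook-5.7.8"]` and `registry["ff9357a77d78"]` still point at the first. That is why only the git SHA tag is safe to pin.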
Is there any update on the Python version tag?
For example, if I want to use Python 3.5 instead, could I just change an environment variable?
Thank you for this functionality, just one suggestion.
A reference from the main README, Docker Hub, or Read the Docs to the wiki page tracking builds would be very helpful, in my opinion.
Speaking from personal experience, I only understood that this functionality was available after seeing this issue (and after spending some time searching). The GitHub wiki is not the first place I look for this kind of information.
@ABCurado thanks, good suggestion. I've just created a dedicated issue to implement your idea.
If I'm right, my PR will fix this issue as well, because tagging will be consistent and we will add tags for things like the Python version to all the images.