Docker-stacks: An easy way to update package versions

Created on 25 Aug 2020 · 12 Comments · Source: jupyter/docker-stacks

I see a lot of commits just updating several packages.

It would be great to have a tool that automatically finds all outdated versions (perhaps with a blacklist), tries to update them, and commits the changes automatically if the tests pass.

That way we would always have the latest (and greatest) packages and save a lot of manual work.

Maintenance

mathbunnyru

👍3

All 12 comments

One implementation thought: We could set up a GitHub Action workflow to run that tool on some cadence and open / close PRs based on test status.

parente on 26 Aug 2020

Good idea. Note that this tool already exists and can be used at least as a base. It is described in the contributor documentation.

$ make check-outdated/base-notebook

# INFO     test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO     test_outdated:test_outdated.py:82
# Package     Current    Newest
# ----------  ---------  --------
# conda       4.7.12     4.8.2
# jupyterlab  1.2.5      2.0.0
# python      3.7.4      3.8.2
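
As a rough illustration of how that output could feed an automated update step, here is a minimal Python sketch that parses the table into structured rows. The names `Outdated`, `parse_outdated`, and the `skip` blacklist parameter are hypothetical and not part of docker-stacks; the sketch only assumes the table layout shown above.

```python
import re
from typing import Iterable, List, NamedTuple


class Outdated(NamedTuple):
    package: str
    current: str
    newest: str


def parse_outdated(log: str, skip: Iterable[str] = ()) -> List[Outdated]:
    """Extract table rows from the check-outdated log, minus blacklisted packages."""
    skipped = set(skip)
    rows = []
    in_table = False
    for raw in log.splitlines():
        # Drop the leading "# " that prefixes each log line in the sample.
        line = raw.lstrip("# ").strip()
        # The dashed separator line marks the start of the data rows.
        if re.fullmatch(r"-+(\s+-+)+", line):
            in_table = True
            continue
        if in_table:
            parts = line.split()
            if len(parts) != 3:
                break  # first line that is not a 3-column row ends the table
            row = Outdated(*parts)
            if row.package not in skipped:
                rows.append(row)
    return rows


SAMPLE = """\
# INFO     test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO     test_outdated:test_outdated.py:82
# Package     Current    Newest
# ----------  ---------  --------
# conda       4.7.12     4.8.2
# jupyterlab  1.2.5      2.0.0
# python      3.7.4      3.8.2
"""

if __name__ == "__main__":
    # Hypothetical blacklist: leave the Python interpreter version alone.
    for row in parse_outdated(SAMPLE, skip={"python"}):
        print(f"{row.package}: {row.current} -> {row.newest}")
```

A scheduled job could loop over these rows, bump one pin at a time, and run the test suite before committing.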

romainx on 26 Aug 2020

I think one of the best ways to do it would be to use a tool like dependabot. However, it does not support conda, and its Docker support will not help with our use cases.

romainx on 20 Oct 2020

I think the idea of having a GitHub action to update the dependencies would be a good approach (and overcomes the limitations from dependabot).

I would be happy to give this a go

trallard on 13 Jan 2021

👍2

> I think the idea of having a GitHub action to update the dependencies would be a good approach (and overcomes the limitations from dependabot).
>
> I would be happy to give this a go

I don't think anyone has worked on this issue for the past few months, so I say give it a go if you want to. It would be awesome :)

mathbunnyru on 13 Jan 2021

👍1

Great - will get working on this and create a draft PR as soon as possible

trallard on 14 Jan 2021

This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.

I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

maresb on 23 Apr 2021

> This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.
>
> I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

That's not a silly question at all.

I see several positive things in pinning versions:

1. Reproducibility. If we don't fix versions and build the same code at different times, it will give different results. That's not something I expect (I have some background in C++). This is really important.
2. It's quite an easy strategy for rebuilding the images: they are rebuilt when someone pushes an update. If we do not fix versions, when do we build the images? (Should it be every day, or should we track dependencies?)
3. I had trouble with dependency resolution when not fixing versions. That was a long time ago; I hope it's better now with conda.
4. People can see which versions we're using and decide whether they want to use the image.
5. Suppose you try to change datascience-notebook without having touched scipy-notebook at all, and something has broken in the dependencies of scipy-notebook. Now, instead of dealing only with datascience-notebook, you have to change code you didn't touch.

> When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.

But we can't say this to our users, right? So sometimes we will have to fix some versions.

mathbunnyru on 23 Apr 2021

@mathbunnyru, I used to think similarly, but my perspective has changed.

I think reproducibility is ultimately the responsibility of the end user, and that is easily achieved by pinning a Docker build number. Moreover, the current practice of pinning major/minor version numbers doesn't provide exact reproducibility. For that you'd need not only the patch number but also the conda-forge build number.

For exact reproducibility, I add the following command in my Dockerfile: conda env export > $CONDA_DIR/environment.yaml. From there, it's easy to generate a build artifact with (docker run --rm image-name cat /opt/conda/environment.yaml) > environment.yaml. I don't have any good ideas for how to publish it though... naively committing it would trigger an infinite loop in CI.
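
To illustrate why the build number matters, here is a minimal Python sketch that compares the conda entries of two exported environments. It assumes the `name=version=build` form that `conda env export` writes for conda packages and ignores any nested `pip:` section. The function names and the sample entries are hypothetical, made up for illustration rather than taken from a real build.

```python
from typing import Dict, Iterable, Tuple


def parse_spec(spec: str) -> Tuple[str, str, str]:
    """Split a `name=version=build` entry into its three parts."""
    name, version, build = spec.split("=")
    return name, version, build


def diff_envs(old: Iterable[str], new: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return packages whose version or build differs between two exports."""
    old_map = {n: (v, b) for n, v, b in map(parse_spec, old)}
    new_map = {n: (v, b) for n, v, b in map(parse_spec, new)}
    changed = {}
    for name in sorted(old_map.keys() | new_map.keys()):
        before = old_map.get(name)
        after = new_map.get(name)
        if before != after:
            changed[name] = (
                "=".join(before) if before else "absent",
                "=".join(after) if after else "absent",
            )
    return changed


if __name__ == "__main__":
    # Illustrative entries only: same numpy version, different build string.
    monday = ["numpy=1.20.2=py38_0", "pandas=1.2.4=py38_0"]
    friday = ["numpy=1.20.2=py38_1", "pandas=1.2.4=py38_0"]
    for name, (before, after) in diff_envs(monday, friday).items():
        print(f"{name}: {before} -> {after}")
```

A major.minor pin would treat both sample environments as identical, while the exported entries show that numpy was rebuilt.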

I do agree with your 5. While I think it's a fact of life that upstream dependencies will change and break things, I can see how pinning makes things more tame.

I'm not suggesting never to fix versions, just that fixing versions is overrated, and that environment.yaml may be a better way to guarantee reproducibility.

maresb on 23 Apr 2021

Thanks for your ideas @maresb.

It would be great to hear from @parente and @romainx

mathbunnyru on 24 Apr 2021

The major.minor version pinning approach used here originated in the early days of conda-forge when it was extremely difficult to get a working build with the number of packages in these images. I think it's reasonable to experiment with an unpinned strategy today as long as users are informed about the change, there is a manifest of what actually got installed during a build (there is on the wiki), and active maintainers are ok with troubleshooting a potential decrease in build stability.

parente on 24 Apr 2021

In fact, we should do it not only for conda dependencies but also for other parts of the stack, like the upstream Ubuntu image.

We should also change the build policy and switch to some kind of regular build (daily, weekly) instead of building after a merge on the master branch.
The drawback is that the time no longer spent updating the images will certainly have to be spent fixing the builds.
But I'm also OK with giving it a try 👍
Having everything build correctly the first time will be a good indicator 😄

romainx on 26 Apr 2021
