I see a lot of commits just updating several packages.
It would be great to have a tool that automatically finds all outdated versions (maybe with a blacklist), tries to update them, and commits the changes automatically if the tests pass.
That way we would always have the latest (and greatest) packages and cut down on a lot of manual work.
One implementation thought: We could set up a GitHub Action workflow to run that tool on some cadence and open / close PRs based on test status.
Good idea. Note that this tool already exists and can be used, at least as a base. It is described in the contributor documentation.
$ make check-outdated/base-notebook
# INFO test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO test_outdated:test_outdated.py:82
# Package    Current   Newest
# ---------- --------- --------
# conda      4.7.12    4.8.2
# jupyterlab 1.2.5     2.0.0
# python     3.7.4     3.8.2
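As a rough sketch of how that output could feed an automated updater, the table can be parsed and filtered against a blacklist. The names `parse_outdated` and `updatable` and the blacklist contents are hypothetical; only the table layout is taken from the output above.

```python
from __future__ import annotations

from typing import NamedTuple


class Outdated(NamedTuple):
    name: str
    current: str
    newest: str


def parse_outdated(report: str) -> list[Outdated]:
    """Extract rows from the 'Package Current Newest' table in the report."""
    rows = []
    in_table = False
    for line in report.splitlines():
        # Drop the leading "# " that prefixes every output line.
        fields = line.lstrip("# ").split()
        if fields[:3] == ["Package", "Current", "Newest"]:
            in_table = True
            continue
        # Skip everything before the header, the dashed separator,
        # and any line that is not a three-column row.
        if not in_table or len(fields) != 3 or fields[0].startswith("-"):
            continue
        rows.append(Outdated(*fields))
    return rows


def updatable(rows: list[Outdated], blacklist: set[str]) -> list[Outdated]:
    """Keep only the packages that are not on the (hypothetical) blacklist."""
    return [row for row in rows if row.name not in blacklist]


REPORT = """\
# INFO test_outdated:test_outdated.py:80 3/8 (38%) packages could be updated
# INFO test_outdated:test_outdated.py:82
# Package Current Newest
# ---------- --------- --------
# conda 4.7.12 4.8.2
# jupyterlab 1.2.5 2.0.0
# python 3.7.4 3.8.2
"""

rows = parse_outdated(REPORT)
# Assumption: python upgrades are reviewed by hand, so it is blacklisted here.
candidates = updatable(rows, blacklist={"python"})
for row in candidates:
    print(f"{row.name}: {row.current} -> {row.newest}")
```

Each remaining candidate could then be bumped in its Dockerfile, tested, and committed only if the tests pass.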
I think one of the best ways to do it would be to use a tool like dependabot. However, it does not support conda, and its Docker support will not help with our use cases.
I think the idea of having a GitHub action to update the dependencies would be a good approach (and overcomes the limitations from dependabot).
I would be happy to give this a go
I think no one has been working on this issue for the past few months, so I say give it a go if you want to. It would be awesome :)
Great - will get working on this and create a draft PR as soon as possible
This is perhaps a silly question, but why pin the dependencies in the first place? Especially for scipy-notebook.
I've been gaining more experience with conda-forge, and it seems like a good strategy is to leave most things unpinned. When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.
That's not a silly question at all.
I see several positive things in pinning versions:
1. Reproducibility. If we don't fix versions and build the same code at different times, it will give different results. It's not something that I expect (I have some background in C++). This is really important.
2. It's quite an easy strategy to rebuild the images - they are rebuilt if someone pushes an update. If we do not fix versions, when do we build the images? (Should it be every day, or should we track dependencies?)
3. I had trouble with dependency resolution when not fixing versions. That was a long time ago; I hope it's better now with conda.
4. People see which versions we're using and they can decide if they want to use the image or not.
5. Let's assume you try to change datascience-notebook and haven't touched scipy-notebook at all, and something has broken in the dependencies of scipy-notebook. Now, instead of dealing just with datascience-notebook, you have to change code you didn't touch.
> When something unexpected happens, it's probably because of some error in the upstream dependencies, which can (and should) be fixed in the respective feedstock.
But we can't say this to our users, right? So sometimes we will have to fix some versions.
@mathbunnyru, I used to think similarly, but my perspective has changed.
I think reproducibility is ultimately the responsibility of the end user, and that is easily achieved by pinning a Docker build number. Moreover, the current practice of pinning major/minor version numbers doesn't provide exact reproducibility. For that you'd need not only the patch number but also the conda-forge build number.
For exact reproducibility, I add the following command in my Dockerfile: conda env export > $CONDA_DIR/environment.yaml. From there, it's easy to generate a build artifact with (docker run --rm image-name cat /opt/conda/environment.yaml) > environment.yaml. I don't have any good ideas for how to publish it though... naively committing it would trigger an infinite loop in CI.
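To illustrate why the exported file pins more than a major.minor spec does, here is a small sketch that compares the `dependencies` entries of two exports. It assumes entries in the `name=version=build` form that `conda env export` writes and ignores the nested `pip:` section; the versions and build strings are made up for the example, and `parse_spec` / `diff_manifests` are hypothetical helper names.

```python
from __future__ import annotations


def parse_spec(spec: str) -> tuple[str, str, str]:
    """Split a 'name=version=build' entry into its three parts."""
    name, version, build = spec.split("=")
    return name, version, build


def diff_manifests(old: list[str], new: list[str]) -> dict[str, tuple[str, str]]:
    """Map each package present in both lists to its (old, new) 'version=build'
    strings, keeping only the packages whose version or build changed."""
    old_pins = {name: f"{v}={b}" for name, v, b in map(parse_spec, old)}
    new_pins = {name: f"{v}={b}" for name, v, b in map(parse_spec, new)}
    return {
        name: (old_pins[name], new_pins[name])
        for name in sorted(old_pins.keys() & new_pins.keys())
        if old_pins[name] != new_pins[name]
    }


# Made-up entries: both builds satisfy "numpy=1.18" and "pandas=1.0",
# yet neither the patch version nor the build number is guaranteed to match.
build_a = ["numpy=1.18.1=py37h0000000_0", "pandas=1.0.1=py37h0000000_0"]
build_b = ["numpy=1.18.1=py37h0000000_1", "pandas=1.0.3=py37h0000000_0"]

changes = diff_manifests(build_a, build_b)
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

Both packages show up as changed even though a major.minor pin would treat the two builds as equivalent.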
I do agree with your point 5. While I think it's a fact of life that upstream dependencies will change and break things, I can see how pinning makes things more tame.
I'm not suggesting never to fix versions, just that fixing versions is overrated, and that environment.yaml may be a better way to guarantee reproducibility.
Thanks for your ideas @maresb.
It would be great to hear from @parente and @romainx
The major.minor version pinning approach used here originated in the early days of conda-forge when it was extremely difficult to get a working build with the number of packages in these images. I think it's reasonable to experiment with an unpinned strategy today as long as users are informed about the change, there is a manifest of what actually got installed during a build (there is one on the wiki), and active maintainers are OK with troubleshooting a potential decrease in build stability.
In fact, we could do it not only for conda dependencies but also for other parts of the stack, like the upstream Ubuntu image.
We should also change the build policy and switch to some kind of regular build (daily, weekly) instead of building after a merge on the master branch.
The drawback is that the time no longer spent updating the images will certainly have to be spent fixing the builds.
But I'm also OK to give it a try 👍
Having everything build correctly the first time will be a good indicator 😄