Hi,
Maybe this has already been resolved, but I can't find it.
Simple use case: I have a repo with 2 notebooks, the first one with a very simple dependency and a second with a lot of heavy dependencies. I would like to define a specific environment for each notebook without having to create independent repositories.
For example, the following repo:
.
├── binder
│   ├── requirements_abc.txt
│   └── Dockerfile_def
├── notebooks
│   ├── abc.ipynb
│   └── def.ipynb
└── readme.txt
The current URL format is:
https://mybinder.org/v2/<provider-name>/<org-name>/<repo-name>/<branch|commit|tag>?filepath=<path/to/notebook.ipynb>
From a user's perspective, I would like something like:
https://mybinder.org/v2/<provider-name>/<org-name>/<repo-name>/<branch|commit|tag>?filepath=<path/to/notebook.ipynb>&dependencies=<binder/dependencies_file>
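For reference, here is a minimal Python sketch of how such a launch URL is put together, assuming the `gh` (GitHub) provider and org/repo/ref names that need no escaping. `binder_url` is a hypothetical helper, not part of Binder, and the `dependencies` parameter is only the proposal above: mybinder.org does not act on it, so the helper just passes any extra parameters through.

```python
from urllib.parse import urlencode


def binder_url(provider, org, repo, ref, filepath=None, extra_params=None):
    """Build a mybinder.org launch URL (hypothetical helper, for illustration).

    Assumes org, repo and ref need no escaping. `extra_params` is only a
    pass-through for illustration, e.g. the proposed `dependencies` parameter,
    which mybinder.org does not support.
    """
    base = "https://mybinder.org/v2/{}/{}/{}/{}".format(provider, org, repo, ref)
    params = {}
    if filepath is not None:
        # Notebook to open once the environment has started.
        params["filepath"] = filepath
    if extra_params:
        params.update(extra_params)
    # urlencode percent-encodes each value, so "/" in a path becomes "%2F".
    return base + ("?" + urlencode(params) if params else "")
```

For example, `binder_url("gh", "my-org", "my-repo", "main", filepath="notebooks/abc.ipynb")` gives `https://mybinder.org/v2/gh/my-org/my-repo/main?filepath=notebooks%2Fabc.ipynb`.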
Maybe this could be achieved using tags and/or commits, but that would require overwriting the dependencies file on each new notebook commit.
Another workaround could be to use separate gists (#306) for each notebook and dependency file.
I don't know if this makes sense.
Thanks in advance.
Hmmm - at this point, Binder follows a "1 repository -> 1 environment" rule. You could try accomplishing this with having different branches for your repo, and pointing binder to those branches, but I don't think the project will support defining multiple different environments within a single repo. Does that make sense?
@choldgraf Thanks for answering.
Branching could be an option but far from perfect, IMHO.
To provide some context, here is our detailed use case:
We run a free (no ads, no data collection, no business, ...) static site about Python and science in Spanish. Most of the posts are educational, and we would like our readers to be able to run the code from a specific post without installing anything or downloading data. A new post gets most of its visits during its first days or weeks, and it may only need, for instance, numpy, so a single dependencies file for all the posts would slow down the mybinder launch with unnecessary installations unrelated to that post. Also, since we write about a lot of different topics, a dependencies file covering every post would be huge, and maintaining a branch for each new post could be complex because we accept contributions from anyone who wants to contribute.
The traffic is not huge, so I think we are within the Binder usage guidelines. But if you think this is not a correct way to use mybinder, just let us know so we can look for alternatives.
Thanks again.
Do you think the total set of dependencies for all posts would be that big? Maybe you only need about 10 or 15 dependencies and then new posts would not need a new package because it is already in the list of dependencies.
I am -1 on having different sets of dependencies per repository. Besides the technical complexity for Binderhub we'd also have to work out how to handle the case where a user tries to run a notebook that doesn't have all its dependencies installed (so you might have to hide it or some such ...).
> Do you think the total set of dependencies for all posts would be that big? Maybe you only need about 10 or 15 dependencies and then new posts would not need a new package because it is already in the list of dependencies.
Yes, definitely. We are people with different backgrounds and we try to encourage people from other fields to write in our site so we talk about a lot of stuff.
> I am -1 on having different sets of dependencies per repository. Besides the technical complexity for Binderhub we'd also have to work out how to handle the case where a user tries to run a notebook that doesn't have all its dependencies installed (so you might have to hide it or some such ...).
Right now you can point directly to a notebook, so my idea was to point to a notebook plus a reduced set of dependencies needed to run it. If that is very complex, then we should look for alternatives.
Thanks for your thoughts.
Thanks for the clarification @kikocorreoso ! I think your best bet is to try and install a list of dependencies you'll conceivably need for most posts. Binder will turn the environment into a Docker image, and installing extra dependencies that aren't used all the time generally isn't a big issue (unless the image is really big).
In a pinch, for users that write a script requiring some very specific dependency, you could also consider having them explicitly install that dependency at the top of the post. This could be informative for readers anyway, as often it's useful to highlight things that are not part of the "standard scipy stack". Think that'd work?
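A minimal sketch of that "install it at the top of the post" approach, assuming the post runs as a notebook on a standard Python kernel. `ensure_package` is a hypothetical helper: it installs a package with pip only when the module cannot already be imported, using the kernel's own interpreter so the package lands in the running environment. Inside a notebook, IPython's `%pip install <package>` magic does the same job in one line.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Pip-install a package only if `module_name` cannot be imported.

    Hypothetical helper. Returns True if an install was run, False if the
    module was already available in the current environment.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Run pip through the interpreter executing this kernel so the package
    # is installed into the same environment the notebook is using.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True
```

A post could then start with, say, `ensure_package("sklearn", "scikit-learn")`. Packages already in the shared environment are skipped, so the cell costs nothing for posts that don't need anything extra.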
> In a pinch, for users that write a script requiring some very specific dependency, you could also consider having them explicitly install that dependency at the top of the post. This could be informative for readers anyway, as often it's useful to highlight things that are not part of the "standard scipy stack". Think that'd work?
Yesterday, before reading your answer, I was thinking this could be a practical hack for our use case. Thanks for confirming it :smile:
Thanks for the constructive feedback.
Hi all, I'm just curious whether the maintainers would be interested in reconsidering this. Like @kikocorreoso, we are interested in spinning up a submission-based blog to showcase environmental data science projects/packages/viz.
I'm wondering if the "create a Binder" form at https://mybinder.org/ could have an optional "path to an environment file" argument, like the existing optional "path to a notebook file" argument.
When supplied, the optional "path to an environment file" argument would override Binder's usual search for a Dockerfile/environment.yml/etc. in the repo; without it, Binder would behave as it does now. I'd be interested in working on this if the maintainers are open to it.
I appreciate the suggestions above on how to work around the "1 repo 1 environment" model. Unfortunately in our case, it doesn't suit our needs.
> Thanks for the clarification @kikocorreoso ! I think your best bet is to try and install a list of dependencies you'll conceivably need for most posts. Binder will turn the environment into a Docker image, and installing extra dependencies that aren't used all the time generally isn't a big issue (unless the image is really big).
Projects/posts will inevitably have non-overlapping dependencies and may require different versions of the same package, especially (in my experience) when geospatial Python packages are involved. So a single environment.yml file won't be able to serve all posts in our use case. Branches are also tricky for us, since contributors would need to merge in posts but not the environment file, and there would need to be a branch for every post.
Hi @rbavery, would you mind posting this on the Discourse forum instead? https://discourse.jupyter.org/
There's a wider potential audience who may be interested in the pros and cons. All Jupyter projects are community projects, and if enough of the community are interested the decision here may change, or we might find an alternative way to achieve what you want.
I think https://github.com/jupyterhub/binderhub/issues/555#issuecomment-390112357 remains the best compromise.
An alternative option is the idea of "binder boxes" or splitting the environment from the content. I did a bit of searching on the forum and https://discourse.jupyter.org/t/tip-embed-custom-github-content-in-a-binder-link-with-nbgitpuller/922/20 was the best I could find. There are more threads to read though. For your use case you'd have several repos that define the environments and then pull in the notebook that you want (per blog post) via nbgitpuller or the like.
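A minimal Python sketch of such a "binder box" link, assuming the environment repo installs nbgitpuller, the post's notebook lives in a separate content repo given as a git URL without a ".git" suffix, and nbgitpuller clones into its default folder named after that repo. `nbgitpuller_binder_url` and all repo names are hypothetical; the link relies on Binder's `urlpath` parameter and on nbgitpuller's `git-pull` endpoint with its `repo`, `branch` and `urlpath` parameters. Because the `git-pull` query is itself the value of Binder's `urlpath`, it has to be URL-encoded twice, which the two nested `urlencode` calls handle.

```python
from urllib.parse import urlencode


def nbgitpuller_binder_url(env_spec, content_repo, branch, notebook_path):
    """Launch an environment-only repo on mybinder.org and pull content into it.

    Hypothetical helper.
    env_spec:      Binder spec of the environment repo, e.g. "gh/my-org/env-numpy/main".
    content_repo:  git URL of the repo holding the posts (no ".git" suffix assumed).
    branch:        branch of the content repo to pull.
    notebook_path: path of the notebook inside the content repo.
    """
    # Assumes nbgitpuller's default of cloning into a folder named after the repo.
    repo_dir = content_repo.rstrip("/").split("/")[-1]
    # Inner query: parameters for nbgitpuller's git-pull endpoint.
    pull = "git-pull?" + urlencode({
        "repo": content_repo,
        "branch": branch,
        "urlpath": "tree/{}/{}".format(repo_dir, notebook_path),
    })
    # Outer query: Binder's urlpath parameter, which encodes the inner query again.
    return "https://mybinder.org/v2/{}?{}".format(env_spec, urlencode({"urlpath": pull}))
```

Each post then gets its own link pointing at whichever environment repo matches its dependencies, while the notebooks themselves stay in one content repo. For JupyterLab, the inner `urlpath` would start with `lab/tree/` instead of `tree/`.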
I realise both are workarounds, but I think they represent the best trade-off between maintenance complexity, usability and functionality.
Edit after seeing @manics' comment: I'd be happy to discuss this in the forum.
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/allow-for-multiple-different-dependencies-per-repository/4109/1
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:
https://discourse.jupyter.org/t/allow-for-multiple-different-dependencies-per-repository/4109/2