There are a bunch of hosts out there besides vanilla GitHub repositories. We should support most of these eventually!
Some that come to mind:
repo2docker expects to be given a URL that points to a .git repository, so the changes needed for each provider basically entail knowing how to turn that provider's URL structure into a link that repo2docker can work with.
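A rough sketch of what that mapping might look like. The function name and the template table are made up for illustration and are not existing binderhub code:

```python
# Hypothetical URL templates keyed by provider name (illustrative only,
# not binderhub's actual configuration).
GIT_URL_TEMPLATES = {
    "github": "https://github.com/{user}/{repo}.git",
    "gitlab": "https://gitlab.com/{user}/{repo}.git",
    "bitbucket": "https://bitbucket.org/{user}/{repo}.git",
}


def build_git_url(provider, user, repo):
    """Return a clonable .git URL for a user/repo pair on a known provider."""
    try:
        template = GIT_URL_TEMPLATES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return template.format(user=user, repo=repo)
```

Each new host would then mostly amount to one more entry in the table, plus whatever ref-resolution logic that host needs.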
cc @yuvipanda @betatim @rsignell-usgs from the old issue
In https://github.com/jupyterhub/binderhub/blob/master/binderhub/repoproviders.py you see there's a 'RepoProvider' base class, and a GitHubRepoProvider subclass. We'll just need to implement similar code for other hosts.
The primary functionality they need to provide is to transform a non-hash commit ref (like a tag or branch name) into a commit hash, using an API of some sort. This lets us skip rebuilding images that have already been built. It should be fairly straightforward for other hosts.
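To make that concrete, here is a minimal sketch of the idea. The class below is a simplified stand-in, not the real `RepoProvider` from repoproviders.py, and `lookup_ref` and `needs_build` are hypothetical names; `lookup_ref` stands for whatever host API call turns a branch or tag into a hash:

```python
import re

# A full SHA-1 commit hash is 40 lowercase hex characters.
HASH_RE = re.compile(r"^[0-9a-f]{40}$")


class SketchRepoProvider:
    """Simplified stand-in for a provider class (assumption, not binderhub's)."""

    def __init__(self, spec, lookup_ref):
        # spec looks like "user/repo/ref"; the ref itself may contain slashes.
        self.user, self.repo, self.unresolved_ref = spec.split("/", 2)
        # lookup_ref: callable (user, repo, ref) -> commit hash, e.g. a host API call.
        self._lookup_ref = lookup_ref
        self._resolved_ref = None

    def get_resolved_ref(self):
        """Return a commit hash, asking the host only when necessary."""
        if self._resolved_ref is None:
            ref = self.unresolved_ref
            if HASH_RE.match(ref):
                # Already a full hash, so no API call is needed.
                self._resolved_ref = ref
            else:
                self._resolved_ref = self._lookup_ref(self.user, self.repo, ref)
        return self._resolved_ref


def needs_build(provider, built_hashes):
    """True if no image has been built yet for the resolved commit."""
    return provider.get_resolved_ref() not in built_hashes
```

A subclass for a new host would only need to supply its own lookup, and the resolved hash is what lets the builder decide whether an image already exists.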
I think this is a great place for folks new to binder to get involved and make patches! :D
@yuvipanda how are we gonna handle things like rate limits on all these various providers? I imagine that any provider will have a process for throttling lots of hits.
Yeah, that too will be in the provider specific subclass. You can see the code for this in the GitHubRepoProvider, for example.
One of the things missing from providers right now is the ability to tell whether they can understand a specific URL. In particular, if I pass a GitLab URL, the UI will still try to pushState /v2/gh/. We will likely need an endpoint that takes a URL as a parameter and goes through the providers (in order?) to ask whether they can handle it (probably asynchronously, since a provider may need to make requests).
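Something along these lines, as a synchronous sketch only. The `can_handle` method, the class names, and the short keys other than `gh` are assumptions for illustration; none of this exists in binderhub today:

```python
from urllib.parse import urlparse


class GitHubSketch:
    key = "gh"

    @staticmethod
    def can_handle(url):
        return urlparse(url).hostname in ("github.com", "www.github.com")


class GitLabSketch:
    key = "gl"  # hypothetical key

    @staticmethod
    def can_handle(url):
        return urlparse(url).hostname == "gitlab.com"


class GenericGitSketch:
    key = "git"  # hypothetical fallback key

    @staticmethod
    def can_handle(url):
        # Accept anything that at least looks like a remote git URL.
        return urlparse(url).scheme in ("http", "https", "git", "ssh")


# Order matters: specific hosts are asked before the generic fallback.
PROVIDERS = [GitHubSketch, GitLabSketch, GenericGitSketch]


def provider_key_for(url):
    """Return the key of the first provider that claims the URL, or None."""
    for provider in PROVIDERS:
        if provider.can_handle(url):
            return provider.key
    return None
```

The endpoint would return that key to the UI, which could then pushState to the matching prefix instead of always assuming /v2/gh/.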
Yup, this needs to happen on the JS side too, since we probably wanna provide UX clues.
Is there a reason not to switch to a scheme like /v2/gl/ (gitlab) or /v2/bb/ (bitbucket) etc? And then make the mapping from "gh" to a hostname something that is configurable by the admin of the binder instance?
@betatim that makes sense to me, and I think that's the 'scheme' we have now, where gh is just the key for github. Additional providers can be registered as new keys. I think we can start with a git provider, which accepts any git url as the escape hatch, then we can bless gitlab, bitbucket with special handling as we develop it.
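For illustration, a minimal sketch of such a key registry. The registry contents and the `parse_build_path` helper are hypothetical; only the `gh` key and the /v2/ prefix come from this thread:

```python
# Hypothetical admin-configurable registry: provider key -> hostname.
# None marks the generic "git" escape hatch, where the spec carries the URL.
PROVIDER_REGISTRY = {
    "gh": "github.com",
    "gl": "gitlab.com",
    "bb": "bitbucket.org",
    "git": None,
}


def parse_build_path(path, registry=PROVIDER_REGISTRY):
    """Split '/v2/<key>/<spec>' into (key, hostname, spec)."""
    parts = path.strip("/").split("/", 2)
    if len(parts) != 3 or parts[0] != "v2":
        raise ValueError(f"not a /v2/<key>/<spec> path: {path!r}")
    _, key, spec = parts
    if key not in registry:
        raise ValueError(f"no provider registered for key {key!r}")
    # The provider behind the key is responsible for interpreting the spec.
    return key, registry[key], spec
```

Registering a new provider is then just adding a key, and unknown keys fail loudly rather than being guessed at.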
@minrk curious why the need for a key or a scheme? seems like the shorthand will just lead to pain in the future (e.g. how do you pick a provider shorthand for provider #50?).
I do really like the ability to reverse engineer the repo being bound given the URL, which might have been the motivation. Hmm.
One compromise might be to use the full site name (e.g. github) rather than gh.
The API strictly defines what those keys mean. Right now, only gh has a meaning, which is "find this repo on github.com". The key identifies the 'provider' and then the provider is responsible for interpreting the rest of the URL. It may be appropriate to use clearer, less compressed provider keys, though.
The reason for special handling of github.com (and others in the future) is that we can resolve refs and check if a new build is necessary much more efficiently with the GitHub API than we can with git itself. So when we know something about the hosting provider, we can take a shortcut that's much less expensive than a shallow git clone. We can also do potential things in the future like provide links to the original, which we can't do in general for repo URLs.
> We can also do potential things in the future like provide links to the original, which we can't do in general for repo URLs.
I think this is a useful future feature we can implement as a jupyter extension. Would be quite useful for people who want to go back to the original repo after clicking a binder link.
Note that one of these got completed in #266!
gists are coming in #306
I'm thinking of closing this and handling individual providers with their own issues. WDYT @willingc ?
Seems like a good idea at this point @choldgraf
ok, closing! see top comment for links to provider-specific issues
Nice touch editing the top issue with the links too. @choldgraf