I managed to set up BinderHub by following the steps described in the manual. However, naturally, I cannot reach repositories stored in my organization's directory (private ones). Is there a HOWTO on configuring the deployment so that it can access restricted (private) GitHub repositories?
That's awesome! yay :)
I think two things need to happen for support for private repos:
Set github.clientId and github.clientSecret in your secret.yaml file to an OAuth application you create that has access to your private repositories
Does this sound correct, @minrk?
@drorata do you think you'll have time to try to make a patch? That'd be awesome...
This would be an awesome addition, +1
+1 on adding the functionality, -1 to supporting it on mybinder.org -
I don't think the binder team funding should support private repos.
I agree with @ctb! The default deploy on mybinder.org won't have access to any private repos, so no private stuff will be built on mybinder.org.
yep I agree with that too - it's my understanding that @drorata is working on his own binderhub deployment!
First, thanks for the responses. Secondly, indeed, I am trying to deploy BinderHub in a private setting, namely both the endpoint and the repositories are private. I still have to figure out how to set the firewall on the deployed Binder's endpoint, but that's a different story.
It goes without saying that the public mybinder.org should not support private repos. However, this use case is crucial if you want to turn this promising tool to something useful for wider audiences.
I promise I'll try to give feedback on patches. @yuvipanda can you be more specific about what github.clientId and github.clientSecret are and how to get them? Where and how should they be included in secret.yaml?
As for the 2nd point of @yuvipanda, I'm afraid it is too cryptic for me.
@drorata when you set up binder, you can create a GitHub OAuth application. This will give you a client id and secret.
These can go in secret.yaml:
github:
  clientId: '...'
  clientSecret: '...'
Then the binder instance ought to be accessing GitHub with credentials, which can have access to private repos.
We may want to support passing a personal access token, which is easier to limit in terms of specific access scope.
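To make the personal access token idea concrete, here is a minimal sketch of what an authenticated ref lookup against the GitHub API could look like. This is not BinderHub's actual code: the function name `build_ref_request` and the token value are hypothetical, and the sketch only builds the request without sending it. The assumption is that the ref is resolved through GitHub's "get a commit" endpoint and that the token is sent in the `Authorization` header.

```python
from typing import Optional
from urllib.request import Request

GITHUB_API = "https://api.github.com"


def build_ref_request(user: str, repo: str, ref: str,
                      access_token: Optional[str] = None) -> Request:
    """Build (but do not send) a request that resolves a ref to a commit.

    Hypothetical helper for illustration only.
    """
    url = f"{GITHUB_API}/repos/{user}/{repo}/commits/{ref}"
    headers = {"Accept": "application/vnd.github+json"}
    if access_token:
        # A personal access token is passed in the Authorization header.
        headers["Authorization"] = f"token {access_token}"
    return Request(url, headers=headers)


# Without a token the request is anonymous and cannot see private repos.
anonymous = build_ref_request("org-name", "repo-name", "master")
# With a (placeholder) token the same lookup is made on behalf of its owner.
authed = build_ref_request("org-name", "repo-name", "master",
                           access_token="hypothetical-token")
```

The point of the sketch is that the only difference between the two lookups is the header, so a token with a narrow scope limits what the deployment can read.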
@minrk I have added the section to secret.yaml and executed:
helm upgrade binder jupyterhub/binderhub --version=v0.1.0-789e30a -f secret.yaml -f config.yaml
This setting still doesn't manage to access the private repo. I get the following error:
Could not resolve ref for gh:org-name/repo-name/master. Double check your URL.
I am not sure what I should have put in the Homepage URL and Authorization callback URL fields. I filled the former with the IP of binder and the latter with the IP which is opened by my Binder's deployment. Do you have a hint? It might be the cause of the problem.
I'm not sure that an OAuth application can access anything on GitHub. I thought all it can do is let your app obtain a token from a user so it can act on their behalf.
I tested with repo2docker and there I can execute something like:
jupyter-repo2docker https://username:token@github.com/org/reponame.git
However, it doesn't work when passing this to the binder's interface. The error I get is:
Spec is not of the form "user/repo/ref", provided: "https://username:token@github.com/org/reponame.git/master".
It is coming from this line.
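For anyone wondering why a full URL is rejected here, the following is a rough sketch of how a "user/repo/ref" spec could be validated. It is an illustration only, not the actual BinderHub implementation; the function name `parse_github_spec` is made up, and the real check may differ in detail.

```python
from typing import Tuple


def parse_github_spec(spec: str) -> Tuple[str, str, str]:
    """Split a spec of the form "user/repo/ref" into its three parts.

    Hypothetical helper; the ref may itself contain slashes.
    """
    parts = spec.split("/", 2)
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f'Spec is not of the form "user/repo/ref", provided: "{spec}".'
        )
    user, repo, ref = parts
    return user, repo, ref


# A plain spec parses fine.
print(parse_github_spec("org/reponame/master"))

# A clone URL does not: "https://..." splits into "https:", "" and the rest,
# so the empty middle part fails the check.
try:
    parse_github_spec("https://example.invalid/org/reponame.git/master")
except ValueError as err:
    print(err)
```

So the form only accepts the short spec, and credentials embedded in a URL never reach repo2docker through this path.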
Is this a reasonable direction for enabling access to private repos?
Now that I look a little deeper, it seems @minrk already implemented the way to go by providing username and token (https://github.com/jupyterhub/binderhub/commit/d9d2229272012ec2a747fd5552e3644470967391#diff-c5688934f1e6dc3e932b6c84c1bbbd5d). However, I don't yet understand how to deploy binderhub and provide it with the username and token.
Seems like https://github.com/jupyterhub/binderhub/blob/master/helm-chart/binderhub/values.yaml#L31 could also be helpful/related, but I don't know how to set it during the deployment process.
Can I somehow assist on this one? I would really like to have a way to access private repos :)
Thinking out loud here.
I see two cases:
Case 1 is probably easier to handle. Case 2 seems harder because the hub keeps the image it has built which means that version of the private repo is now hanging around on the hub with very little the original user can do to have it removed.
Things to do for case 1:
@betatim At least in my case, the first use case is enough. For internal usage purposes (1) an organization would host both binderhub on its servers (cloud or in-house) and (2) the content that needs to be shared is derived from private repos. Regarding the steps you mentioned:
repo2docker and by providing the token in the URL I could produce the correct image. So, at least as a first step, wouldn't it be possible to pass a URL containing the token from binderhub to repo2docker?
binderhub on the organization's cloud, it is easy to grant access only from within the organization's network. There can also be other issues with this setting, like security of the cluster; after all, this setting allows code execution on the cluster. However, at least at first, this should not be, in my mind, a concern.
I'm +1 on case 1, and agree that case 2, while more flexible, sounds a lot more fraught with difficulties. I think basic binderhub authentication for a single organization/repo is a good first step.
As repo2docker already supports accessing private repos (using token and the syntax: jupyter-repo2docker https://username:token@github.com/org/reponame.git) shouldn't it be simple to allow binderhub to forward such a URL?
It seems to me that this would be fine if the binderhub was running w/ authentication in front of it and with the caveat that there might be some within-team security risks associated with the process, yeah? I'm assuming here that a private github repo BinderHub would be in the context of a small team / company / etc that doesn't care so much about security risks _within_ the company.
@choldgraf This is exactly my assumption; in-team security risks are not a problem. I don't think it is a good idea to have authentication at the binderhub level. If a user accessed the hub it is safe to assume he has the rights to access the private repos that the hub can access. Otherwise, authentication at binderhub level would mean another layer of complexity in terms of practices: issue passwords, save them, reset them etc. The use case is a simple in-house deployment that allows serving of private repos.
Is there any update or change on this front? I would still find it useful and worthy to be able to start an environment from a private repo.
Related issue(s):
Nothing specific has been implemented yet - I think we all agree it'd be a very useful feature, just a question of finite person-hours and a long list of to-do items. Do you think it'd be possible to enable this by setting up credentials for git like this: https://git-scm.com/docs/gitcredentials on the binderhub machine?
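As a rough idea of what the git credentials route could involve, here is a small sketch that formats a credential in the key=value form that `git credential approve` reads on standard input. The helper name `credential_input` and the example values are hypothetical, and whether the build pods would actually pick up credentials stored this way is an open question, not something this sketch answers.

```python
def credential_input(host: str, username: str, password: str,
                     protocol: str = "https") -> str:
    """Format a credential as git's key=value input, ended by a blank line.

    Hypothetical helper for illustration only.
    """
    fields = {
        "protocol": protocol,
        "host": host,
        "username": username,
        "password": password,
    }
    for key, value in fields.items():
        # The format is line based, so values must not contain newlines or NULs.
        if "\n" in value or "\0" in value:
            raise ValueError(f"invalid character in {key}")
    return "".join(f"{key}={value}\n" for key, value in fields.items()) + "\n"


# Placeholder values; a real token should never be hard-coded.
payload = credential_input("github.com", "binder-bot", "hypothetical-token")
print(payload, end="")
```

With a helper configured (for example `git config --global credential.helper store`), this text could be piped into `git credential approve` on the machine that performs the clone.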
To be honest I don't know... I would like to try and fiddle with the token approach. As it works out-of-the-box for repo2docker I guess it shouldn't be too hard.
I'd give that a shot as a one-off solution to this...maybe that'll unblock
On Fri, Jan 5, 2018 at 7:45 AM Dror Atariah notifications@github.com
wrote:
To be honest I don't know... I would like to try and fiddle with the token
approach. As it works out-of-the-box for repo2docker I guess it shouldn't
be too hard.
After changing something with the code, what steps should I redo to check the effect?
On Fri, Jan 5, 2018 at 7:49 AM Dror Atariah notifications@github.com
wrote:
After changing something with the code, what steps
https://github.com/jupyterhub/binderhub/blob/master/CONTRIBUTING.md#installation
should I redo to check the effect?
Yeah... at least to start with I guess... I'm not sure yet :)
if you're changing binderhub code, a stop/restart of the command to begin binderhub should work fine...though in your case I don't know that you'd have to modify binderhub itself, just get the computer set up with the credentials needed
Here's a naive attempt on accessing private repos using username and token provided via environment variables: https://github.com/drorata/binderhub/commit/025837d249269cde96baa692a03cc91da13937c6
This way, when starting bhub, if GITHUB_CLIENT_ID and GITHUB_ACCESS_TOKEN are set, they will be used. I tested it on a local deployment and it works as expected. I don't know how insecure this approach is. A workflow in this case could be, when setting up bhub, to create a dedicated GitHub user and use its token.
What do you (@choldgraf) think?
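For readers who don't want to open the commit, the environment-variable idea described above could look roughly like the sketch below. This is a reconstruction under assumptions, not the code from the linked commit: the function name `clone_url` is made up, and it assumes the two variables are simply used as the username and password parts of an HTTPS clone URL.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def clone_url(org: str, repo: str,
              environ: Optional[Mapping[str, str]] = None) -> str:
    """Return an HTTPS clone URL, with credentials if both variables are set.

    Hypothetical helper; reads GITHUB_CLIENT_ID and GITHUB_ACCESS_TOKEN.
    """
    env = os.environ if environ is None else environ
    username = env.get("GITHUB_CLIENT_ID")
    token = env.get("GITHUB_ACCESS_TOKEN")
    if username and token:
        # Percent-encode both parts so special characters cannot break the URL.
        userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
        return f"https://{userinfo}@github.com/{org}/{repo}.git"
    # Fall back to an anonymous URL, which only works for public repos.
    return f"https://github.com/{org}/{repo}.git"


# Placeholder values passed explicitly instead of touching the real environment.
print(clone_url("org", "reponame",
                {"GITHUB_CLIENT_ID": "binder-bot",
                 "GITHUB_ACCESS_TOKEN": "hypothetical-token"}))
```

Using a dedicated GitHub user with read-only access keeps the impact small if the token ever leaks.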
hmmm - generating the URL with username/pass hard-coded in there does sound insecure. I think the best thing to do is create an oauth application for your binderhub deployment (https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/) then put the credentials on the machine on which you're running BinderHub (e.g. maybe this points in the right direction: https://stackoverflow.com/questions/2505096/cloning-a-private-github-repo)
that said, I think @yuvipanda 's point would still remain as to how to pass the oauth credential to the builder that's spawned in kubernetes, since that's what does the actual git cloning etc. (correct me if I'm wrong on that yuvi)
@choldgraf just to make sure we're on the same page: it is not hard-coded; the credentials are passed as environment variables. Are they, at any point, communicated outside of bhub (except of course when it clones the repo)?
I was under the impression that somewhere in the logs it prints out the URL of the repo, but I could be mistaken! Can take a look tomorrow if I have a moment
But the logs are stored on the same machine where bhub is running, right? Is it sent out to some other server?
BTW, is there a pointer to instructions on how to deploy a modified version of bhub?
Yep, though they're displayed in the browser I think (in the little drop-down terminal)...but that's where I'm not sure exactly what is and isn't displayed in there, so we'd need to double check.
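If the repo URL does end up in the build log, one possible mitigation is to strip the credentials before anything is printed. The sketch below shows the idea with the standard library only; the function name `redact_credentials` is hypothetical, and nothing here claims that BinderHub or repo2docker currently do this.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_credentials(url: str) -> str:
    """Replace any user:password part of a URL with a fixed marker.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        # Nothing to hide, return the URL unchanged.
        return url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Rebuild the URL with the credentials replaced by "***".
    return urlunsplit(
        (parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment)
    )


# Placeholder credentials; only the redacted form should ever be logged.
print(redact_credentials("https://binder-bot:hypothetical-token@github.com/org/reponame.git"))
```

Logging only the redacted form would keep the token out of the drop-down terminal in the browser, assuming every log statement goes through such a helper.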
re: modified versions of bhub, I'm not really sure how to do this with Kubernetes but I agree we should have documentation up there even if it's just to say "no you can't do this"
(have a bunch of other things I need to take care of this week so sorry for the slow responses!)
Are there instructions on how one could set this up? I cannot seem to find guidance in the docs about what exactly goes into the secrets.yml file.
@tallamjr check out the changes made in: https://github.com/jupyterhub/binderhub/pull/783
that adds a section to talk about how to get access to private repositories in a binderhub