Binderhub: Accessing private GitHub repositories

Created on 2 Nov 2017 · 37 Comments · Source: jupyterhub/binderhub

I managed to set up BinderHub by following the steps described in the manual. However, naturally, I cannot reach repositories stored under my organization (the private ones). Is there a HOWTO on configuring the deployment so that it can access restricted parts of GitHub?

configuration

All 37 comments

That's awesome! yay :)

I think two things need to happen for support for private repos:

  1. Set github.clientId and github.clientSecret in your secret.yaml file to an OAuth application you create that has access to your private repositories
  2. Somehow add support for passing this to the builder that's spawned. I don't think we support this yet, but it should be easy to add. We need to find out how to pass OAuth credentials to git so that it can clone. It'd probably require a patch to github.com/jupyter/repo2docker and https://github.com/jupyterhub/binderhub/blob/master/binderhub/build.py#L45

Does this sound correct, @minrk?

@drorata do you think you'll have time to try to make a patch? That'd be awesome...

This would be an awesome addition, +1

+1 on adding the functionality, -1 on supporting it on mybinder.org; I don't think the Binder team's funding should support private repos.

I agree with @ctb! The default deploy on mybinder.org won't have access to any private repos, so no private stuff will be built on mybinder.org.

yep I agree with that too - it's my understanding that @drorata is working on his own binderhub deployment!

First, thanks for the responses. Secondly, indeed, I am trying to deploy BinderHub in a private setting, namely both the endpoint and the repositories are private. I still have to figure out how to set up the firewall on the deployed Binder's endpoint, but that's a different story.

It goes without saying that the public mybinder.org should not support private repos. However, this use case is crucial if you want to turn this promising tool into something useful for wider audiences.

I promise I'll try to give feedback on patches. @yuvipanda can you be more specific about what github.clientId and github.clientSecret are and how to get them? Where and how should they be included in secret.yaml?

As for the 2nd point of @yuvipanda, I'm afraid it is too cryptic for me.

@drorata when you set up binder, you can create a GitHub OAuth application. This will give you a client id and secret.

These can go in secret.yaml:

github:
  clientId: '...'
  clientSecret: '...'

Then the binder instance ought to be accessing GitHub with credentials, which can have access to private repos.

We may want to support passing a personal access token, which is easier to limit in terms of specific access scope.
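A minimal sketch of what passing a personal access token could look like on the GitHub API side, assuming the hub resolves a ref through the REST "get a commit" endpoint (`GET /repos/{owner}/{repo}/commits/{ref}`). GitHub answers unauthenticated requests for a private repository with a 404, so the token has to travel in the `Authorization` header. The helper `build_ref_request` is hypothetical and is not BinderHub's actual code.

```python
import urllib.request
from typing import Optional


def build_ref_request(owner: str, repo: str, ref: str,
                      token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GitHub API request that resolves `ref` to a commit.

    Hypothetical helper: it only illustrates where a personal access token goes.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{ref}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A personal access token carries only the scopes it was created with,
        # so it can be limited more tightly than a full OAuth application.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


# Usage (not executed here): send the request and read the resolved commit SHA.
# import json
# with urllib.request.urlopen(build_ref_request("org-name", "repo-name", "master", token)) as resp:
#     sha = json.load(resp)["sha"]
```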

@minrk I have added the section to secret.yaml and executed:

helm upgrade binder jupyterhub/binderhub --version=v0.1.0-789e30a -f secret.yaml -f config.yaml

This setting still doesn't manage to access the private repo. I get the following error:

Could not resolve ref for gh:org-name/repo-name/master. Double check your URL.

I am not sure what I should have put in the Homepage URL and Authorization callback URL fields. I filled the former with the IP of Binder and the latter with the IP exposed by my Binder deployment. Do you have a hint? It might be the cause of the problem.

I'm not sure that an OAuth application can access anything on GitHub. I thought all it can do is let your app obtain a token from a user so it can act on their behalf.

I tested with repo2docker and there I can execute something like:

jupyter-repo2docker https://username:token@github.com/org/reponame.git

However, it doesn't work when passing this to the Binder interface. The error I get is:

Spec is not of the form "user/repo/ref", provided: "https://username:token@github.com/org/reponame.git/master".

It is coming from this line.

Is this a reasonable direction for enabling access to private repos?

Now when I look a little deeper, it seems @minrk already implemented the way to go with providing username and token (https://github.com/jupyterhub/binderhub/commit/d9d2229272012ec2a747fd5552e3644470967391#diff-c5688934f1e6dc3e932b6c84c1bbbd5d). However, I don't yet understand how to deploy binderhub and provide it with the username and token.

Seems like https://github.com/jupyterhub/binderhub/blob/master/helm-chart/binderhub/values.yaml#L31 could also be helpful/related, but I don't know how to set it during the deployment process.

Can I somehow assist on this one? I would really like to have a way to access private repos :)

Thinking out loud here.

I see two cases:

  1. binderhub instance owns the credentials (organisation X sets up the hub, provides a token that allows access to repositories owned by that org for all users of the hub)
  2. the user comes with their own token (the hub can now access everything I can access, limited by scope of the token)

Case 1 is probably easier to handle. Case 2 seems harder because the hub keeps the image it has built which means that version of the private repo is now hanging around on the hub with very little the original user can do to have it removed.

Things to do for case 1:

  1. modify repo2docker to know to look for a token and use it when cloning a repo (use an environment variable, not a command line argument)
  2. pass the token from binderhub to repo2docker. Maybe the secrets are already mounted in the repo2docker container??
  3. setup authentication for accessing the binderhub instance or firewall it or ??
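For step 1 ("use an environment variable, not a command line argument"), here is a minimal sketch, assuming the hypothetical variables `GITHUB_USERNAME` and `GITHUB_ACCESS_TOKEN`. It hands git an HTTP Basic auth header through git's `GIT_CONFIG_COUNT`/`GIT_CONFIG_KEY_<n>`/`GIT_CONFIG_VALUE_<n>` environment variables (available in git 2.31 and later), scoped with `http.<url>.extraHeader` to github.com, so the token appears neither in the process arguments nor in the cloned repo's remote URL. The function `git_clone_with_token` is an illustration, not repo2docker's API.

```python
import base64
from typing import Dict, List, Mapping, Tuple


def git_clone_with_token(repo_url: str, dest: str,
                         env: Mapping[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return the git command and child environment for cloning `repo_url` into `dest`.

    If the hypothetical GITHUB_USERNAME and GITHUB_ACCESS_TOKEN variables are set,
    an Authorization header is passed to git via GIT_CONFIG_* environment
    variables instead of the command line or the URL.
    """
    child_env = dict(env)
    user = env.get("GITHUB_USERNAME")
    token = env.get("GITHUB_ACCESS_TOKEN")
    if user and token:
        creds = base64.b64encode(f"{user}:{token}".encode()).decode()
        # Append after any GIT_CONFIG_* entries that are already set.
        n = int(env.get("GIT_CONFIG_COUNT", "0"))
        # Scope the header to github.com so it is not sent to other hosts,
        # e.g. submodules hosted elsewhere.
        child_env[f"GIT_CONFIG_KEY_{n}"] = "http.https://github.com/.extraHeader"
        child_env[f"GIT_CONFIG_VALUE_{n}"] = f"Authorization: Basic {creds}"
        child_env["GIT_CONFIG_COUNT"] = str(n + 1)
    return ["git", "clone", "--", repo_url, dest], child_env
```

Usage would be along the lines of `cmd, env = git_clone_with_token(url, "reponame", os.environ)` followed by `subprocess.run(cmd, env=env, check=True)`. The token is still visible to anything that can read the process environment, so this narrows exposure rather than eliminating it.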

@betatim At least in my case, the first use case is enough. For internal usage purposes (1) an organization would host both binderhub on its servers (cloud or in-house) and (2) the content that needs to be shared is derived from private repos. Regarding the steps you mentioned:

  1. I experimented with repo2docker and by providing the token in the URL I could produce the correct image. So, at least as a first step, wouldn't it be possible to pass a URL containing the token from binderhub to repo2docker?
  2. An organization should be careful enough and take the precautions needed to keep it safe. By hosting binderhub on the organization's cloud, it is easy to grant access only from within the organization's network. There can also be other issues with this setting, like security of the cluster; after all, this setting allows code execution on the cluster. However, at least initially, this should not, in my mind, be a concern.

I'm +1 on case 1, and agree that case 2, while more flexible, sounds a lot more fraught with difficulties. I think basic binderhub authentication for a single organization/repo is a good first step.

As repo2docker already supports accessing private repos (using a token and the syntax jupyter-repo2docker https://username:token@github.com/org/reponame.git), shouldn't it be simple to allow binderhub to forward such a URL?

It seems to me that this would be fine if the binderhub was running with authentication in front of it, with the caveat that there might be some within-team security risks associated with the process, yeah? I'm assuming here that a private-GitHub-repo BinderHub would be used in the context of a small team / company / etc. that doesn't care so much about security risks _within_ the company.

@choldgraf This is exactly my assumption; in-team security risks are not a problem. I don't think it is a good idea to have authentication at the binderhub level. If a user can reach the hub, it is safe to assume they have the rights to access the private repos that the hub can access. Otherwise, authentication at the binderhub level would add another layer of complexity in terms of practices: issuing passwords, saving them, resetting them, etc. The use case is a simple in-house deployment that allows serving of private repos.

Is there any update or change on this front? I would still find it useful and worthy to be able to start an environment from a private repo.

Related issue(s):

Nothing specific has been implemented yet - I think we all agree it'd be a very useful feature, just a question of finite person-hours and a long list of to-do items. Do you think it'd be possible to enable this by setting up credentials for git like this: https://git-scm.com/docs/gitcredentials on the binderhub machine?
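Setting up credentials via gitcredentials could use git's built-in `store` helper (enabled with `git config --global credential.helper store`), which keeps one plaintext URL of the form `https://user:token@host` per line in `~/.git-credentials`. A minimal sketch of formatting such a line, assuming a dedicated account and token; `credential_store_line` is hypothetical. As noted earlier in the thread, on Kubernetes the cloning happens in the spawned builder, so the credentials would have to be present there.

```python
from urllib.parse import quote


def credential_store_line(username: str, token: str, host: str = "github.com") -> str:
    """Format one entry for git's `store` credential helper (~/.git-credentials).

    Each entry is a URL, so characters such as '@', ':' or '/' in the
    username or token are percent-encoded to keep the URL parseable.
    """
    return f"https://{quote(username, safe='')}:{quote(token, safe='')}@{host}"
```

Because the file is plaintext, it should be readable only by the account that runs git (for example, mode 0600).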

To be honest I don't know... I would like to try and fiddle with the token approach. As it works out-of-the-box for repo2docker I guess it shouldn't be too hard.

I'd give that a shot as a one-off solution to this...maybe that'll unblock your problem!

After changing something in the code, what steps (https://github.com/jupyterhub/binderhub/blob/master/CONTRIBUTING.md#installation) should I redo to check the effect?

what code are you changing? something in binderhub?

Yeah... at least to start with I guess... I'm not sure yet :)

if you're changing binderhub code, stopping and restarting the command that launches binderhub should work fine...though in your case I don't know that you'd have to modify binderhub itself, just get the computer set up with the credentials needed

Here's a naive attempt on accessing private repos using username and token provided via environment variables: https://github.com/drorata/binderhub/commit/025837d249269cde96baa692a03cc91da13937c6

This way, when starting bhub, if GITHUB_CLIENT_ID and GITHUB_ACCESS_TOKEN are set, they will be used. I tested it on a local deployment and it works as expected. I don't know how insecure this approach is. One possible workflow: when setting up bhub, create a dedicated GitHub user and use its token.
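The environment-variable approach boils down to something like the following sketch. It is not the code from the linked commit: `repo_clone_url` is hypothetical, and it treats GITHUB_CLIENT_ID as the git username, as the comment above describes. The resulting URL contains the token, so anything that logs or displays it will leak the token.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def repo_clone_url(org: str, repo: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the clone URL for org/repo.

    Credentials are embedded only if both GITHUB_CLIENT_ID (used here as the
    git username) and GITHUB_ACCESS_TOKEN are set; otherwise the URL is anonymous.
    """
    env = os.environ if env is None else env
    user = env.get("GITHUB_CLIENT_ID")
    token = env.get("GITHUB_ACCESS_TOKEN")
    if user and token:
        return (f"https://{quote(user, safe='')}:{quote(token, safe='')}"
                f"@github.com/{org}/{repo}.git")
    # Anonymous URL: only works for public repos.
    return f"https://github.com/{org}/{repo}.git"
```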

What do you (@choldgraf) think?

hmmm - generating the URL with username/pass hard-coded in there does sound insecure. I think the best thing to do is create an oauth application for your binderhub deployment (https://developer.github.com/apps/building-oauth-apps/creating-an-oauth-app/) then put the credentials on the machine on which you're running BinderHub (e.g., maybe this points in the right direction: https://stackoverflow.com/questions/2505096/cloning-a-private-github-repo)

that said, I think @yuvipanda 's point would still remain as to how to pass the oauth credential to the builder that's spawned in kubernetes, since that's what does the actual git cloning etc. (correct me if I'm wrong on that yuvi)
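On the Kubernetes side, a common way to hand a credential to a spawned pod is an environment variable populated from a Secret with `valueFrom.secretKeyRef`, so the token is neither baked into the image nor written into the pod's command. A minimal sketch that patches a pod spec held as a plain dict; the container name `builder`, the Secret name `binder-github`, and the key `access-token` are hypothetical, and this is not how BinderHub's build.py is actually structured.

```python
import copy
from typing import Any, Dict


def with_token_from_secret(pod_spec: Dict[str, Any], container_name: str,
                           secret_name: str, key: str,
                           env_name: str = "GITHUB_ACCESS_TOKEN") -> Dict[str, Any]:
    """Return a copy of `pod_spec` whose named container reads `env_name`
    from the Kubernetes Secret `secret_name` (entry `key`)."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    entry = {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }
    for container in spec["containers"]:
        if container["name"] == container_name:
            container.setdefault("env", []).append(entry)
            return spec
    raise KeyError(f"no container named {container_name!r}")
```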

@choldgraf just to make sure we're on the same page: it is not hard-coded; the credentials are passed as environment variables. Are they at any point communicated outside of bhub (except, of course, when it clones the repo)?

I was under the impression that somewhere in the logs it prints out the URL of the repo, but I could be mistaken! Can take a look tomorrow if I have a moment
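If a credential-bearing clone URL does end up in the logs or in the build output shown in the browser, one mitigation is to redact credentials before printing anything. A minimal sketch using only `urllib.parse`; `redact_url` is hypothetical and not part of BinderHub or repo2docker.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return `url` with any user:password credentials replaced by '***'."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))
```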

But the logs are stored on the same machine where bhub is running, right? Is it sent out to some other server?

BTW, is there a pointer to instructions on how to deploy a modified version of bhub?

Yep, though they're displayed in the browser I think (in the little drop-down terminal)...but that's where I'm not sure exactly what is and isn't displayed in there, so we'd need to double check.

re: modified versions of bhub, I'm not really sure how to do this with Kubernetes but I agree we should have documentation up there even if it's just to say "no you can't do this"

(have a bunch of other things I need to take care of this week so sorry for the slow responses!)

Are there instructions on how one could set this up? I cannot seem to find guidance in the docs about what exactly goes into the secrets.yml file.

@tallamjr check out the changes made in: https://github.com/jupyterhub/binderhub/pull/783

that adds a section to talk about how to get access to private repositories in a binderhub
