Jupyterhub: Kubernetes Spawner

Created on 17 Sep 2016 · 32 Comments · Source: jupyterhub/jupyterhub

I have been working a little bit on a JupyterHub Kubernetes Spawner (here: https://github.com/danielfrg/jupyterhub-kubernetes_spawner) and I was wondering if there is any interest in moving it to the main supported spawners under the JupyterHub organisation on GitHub.

I have been testing it this week and it's been working perfectly, but of course there is more work to do. I plan to maintain it and add a couple of other features and example deployments.

enhancement

All 32 comments

Thanks for letting us know about your work with Kubernetes. :smile:

@minrk Thoughts?

@danielfrg Were you aware of https://github.com/yuvipanda/jupyterhub-kubernetes-spawner? If so, how do the two compare?

cc @yuvipanda

Thanks @ryanlovett for passing along.

Hah, awesome! @danielfrg would you be interested in merging these two into one that we can then put in the official repo?

Yes, I am definitely ok with that.

I would say the big difference is how they connect to the Kubernetes API. For mine, I generated a bunch of code from the official Kubernetes swagger definition (https://github.com/kubernetes/kubernetes/blob/master/api/swagger-spec/v1.json) using swagger codegen (https://github.com/swagger-api/swagger-codegen), and I use that generated code as a REST client. In general, though, I think both implementations are similar.

I've been meaning to move to using https://github.com/yuvipanda/kubesession
for connecting to it. My experience with codegen from swagger for an API as
big as k8s hasn't been very positive, especially since we're only using a
tiny subset of it. How tied are you to using that?

Since the API is so big I haven't tested all of it, but I have had zero issues with it so far.

We are indeed using a small part of their API, but if that's the issue we could remove the generated files that are not used.

I am not tied to using it, but I like it since I didn't have to write it and don't have to maintain it as much as a hand-written version.

kubesession is just a small wrapper around requests that only does authentication and parsing .kube/config / serviceaccount files, and nothing else. It's a very tiny piece, so not that much maint, and is much easier to understand (at least for me!) vs codegen from swagger. So you'd make API requests like how you'd normally use requests (or the tornado http client, which is what we should be using). We can even inline this into the spawner, since it isn't much code.
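To make the "thin wrapper" idea concrete, here is a minimal sketch of the serviceaccount half of it. This is not kubesession's actual code: `load_serviceaccount` and `pods_url` are hypothetical names, and the sketch assumes the conventional in-cluster mount path with `token`, `ca.crt` and `namespace` files.

```python
import os

# Conventional in-cluster mount point for serviceaccount credentials (assumed here).
SERVICEACCOUNT_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def load_serviceaccount(path=SERVICEACCOUNT_DIR):
    """Read serviceaccount files and return what an HTTP client needs.

    Hypothetical helper for illustration only; not part of kubesession.
    """
    with open(os.path.join(path, "token")) as f:
        token = f.read().strip()
    with open(os.path.join(path, "namespace")) as f:
        namespace = f.read().strip()
    return {
        # Bearer token authentication header for the API server.
        "headers": {"Authorization": "Bearer " + token},
        # Path to the cluster CA bundle, to be used for TLS verification.
        "verify": os.path.join(path, "ca.crt"),
        "namespace": namespace,
    }


def pods_url(api_host, namespace):
    """Build the core v1 URL for listing or creating pods in a namespace."""
    return "{}/api/v1/namespaces/{}/pods".format(api_host.rstrip("/"), namespace)
```

With something like this, a request is an ordinary HTTP call: the returned `headers` and `verify` values can be passed as the keyword arguments of the same names to `requests.get`, or translated to the equivalent options of the tornado HTTP client.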

So I guess the question is: do we use code gen or not for the client.

I'll take a look through your code later today (hopefully!) and form more opinions! Excited to have more people working on it :D

Yes, that's the question.

I agree that I don't love the generated code, but I think it gives me a level of security when I form the pod spec from those generated objects instead of just adding fields to a dictionary. That security might be a little false, but it's something I can hold on to.

Another solution could be to use the generated code to produce a valid Kubernetes JSON spec and make the request using tornado, as you mention.

I've thought about this some more, and I still think we shouldn't use the generated code, for the following reasons:

  1. It's a very leaky abstraction. Ultimately we'll have to reason about what JSON goes on the wire and how it reacts, so it's not like we can actually forget about it. Using generated code only means we've to worry about two things instead of one now.
  2. Kubernetes' API is actually very well documented (via Swagger!) and quite sanely designed. This is the reason why autogenerating code actually works, and is also a reason why it isn't needed, IMO.
  3. When the API gets updated, we'll have to regenerate all the code. If the code generator changes the API of the generated objects when this happens, we've to deal with it before being able to use the new APIs.
  4. It's also 17691 lines of generated code we'll be shipping (not counting comments), and most of those will never be used.
  5. The generated code is still not very good - I see you've had to add another layer of abstraction on top (https://github.com/danielfrg/jupyterhub-kubernetes_spawner/blob/master/kubernetes_spawner/kube.py) to make it nice and usable. This is 3 layers of leaky abstractions when we can get away with none.

Ultimately, I don't think it gives us any security. I agree we should do a good job of validating the objects we send, but given that we don't really have strong types in python we should do so via unit tests rather than rely on this.
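As a rough sketch of what "plain dictionaries, validated by unit tests" could look like: the helper name `make_pod_spec`, the container name and the image name below are placeholders chosen for illustration, not code from either spawner.

```python
import json
import unittest


def make_pod_spec(name, image, port=8888, env=None):
    """Return a Kubernetes v1 Pod manifest as a plain dict (hypothetical helper)."""
    env_list = [{"name": k, "value": v} for k, v in sorted((env or {}).items())]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "notebook",
                    "image": image,
                    "ports": [{"containerPort": port}],
                    "env": env_list,
                }
            ]
        },
    }


class MakePodSpecTest(unittest.TestCase):
    def test_basic_fields(self):
        pod = make_pod_spec("jupyter-alice", "example/singleuser", env={"USER": "alice"})
        self.assertEqual(pod["kind"], "Pod")
        container = pod["spec"]["containers"][0]
        self.assertEqual(container["ports"], [{"containerPort": 8888}])
        self.assertEqual(container["env"], [{"name": "USER", "value": "alice"}])

    def test_is_json_serializable(self):
        # JSON is what goes on the wire, so the manifest must round-trip unchanged.
        pod = make_pod_spec("jupyter-bob", "example/singleuser")
        self.assertEqual(json.loads(json.dumps(pod)), pod)
```

Keeping the function that builds the object separate from the code that sends it means tests like these run without a cluster, and the builder could later be swapped for generated objects without touching the request code.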

I also have some deployments that progress from less to more complex in https://github.com/yuvipanda/jupyterhub-simplest-k8s, so we should probably also collaborate on that. I'm currently trying to finish up https://github.com/yuvipanda/jupyterhub-singlenode-deploy to a solid end point (along with the systemd spawner) before moving back to the k8s work. I'm also travelling for the next few weeks so I might not be able to work on any kubernetes code in the short term unfortunately :(

I don't fully agree with some of the comments.

  1. The generated code does add one extra layer of abstraction in order to generate valid Kubernetes JSON, but I believe this is something we would have to add anyway, even if it's a templates.py file that has a bunch of dictionaries.
  2. Yes, the Kubernetes API is well defined, and it's defined via swagger; that's another reason I chose to use the generated code. I haven't seen the code, but I know that inside Kubernetes there is some code generation happening, from Go structs to JSON and maybe protobuf soon: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/protobuf.md
  3. When the API gets updated we do have to regenerate the code, but if we write our own layer of abstraction we have to update it there too; there is no way to get away from this. To be honest, I don't think the part of the API that we need will change a lot, so it's not a big deal.
  4. I agree that we should not include all the generated code, just the part that we actually use, and maybe even have a Kubernetes client lib that is independent of the spawner. When I did some research I couldn't find one that actually worked. I still plan to add a replica set and other features, and that's the reason I have not deleted a lot of the generated code.
  5. I am not a fan of the generated code quality, but personally, after using thrift, protobuf and so many other code generation tools, I have learned to trust and appreciate them :P. I also think one layer of abstraction will be good to decouple the spawner logic from the Kubernetes REST requests.

At the end of the day it's not that important.

We should collaborate on the deployments. I also have a couple of them here: https://github.com/danielfrg/jupyterhub-kubernetes_spawner/tree/master/examples and I think those examples are more important than the other discussion.

Are there no acceptable kubernetes Python APIs that can be used? The Kubernetes docs recommend pykube.

Is there any progress on this? I've almost decided to use Kubernetes with JupyterHub. Having a supported spawner would be great.

Sorry that I was out for a while - I would like to see progress here.

pykube is nice for connecting to the API, but it doesn't provide object creation like the swagger objects do.

I still feel that some code that translates from Python objects to a JSON representation with some validation is very useful, but as I mentioned before, this might just be me: I have used this type of tool (like protobuf) in the past and found it very useful.

I too agree on pykube being an awkward middle ground where it is a bit of the worst
of both worlds...

I'm going to set aside Saturday to play around with the swagger generated code for the k8s api. Hopefully, that'll give me enough information to either accept that they're useful in this case, or stronger points on why we shouldn't use them. Either way, let's try to get this resolved by end of next week?

I'd also like others to chime in :) If other members of the community have strong feelings one way or the other too, I can be easily persuaded!

I'm not generally a fan of generated client APIs, and I don't love the idea of maintaining our own Kubernetes API wrapper (even if it is generated). Is there really nobody in the world using Kubernetes from Python?

Perhaps work with Eldarion folks (who typically are very nice) to improve pykube, if possible, as Eldarion will likely continue to maintain pykube since their biz depends on it.

> I'd also like others to chime in :) If other members of the community have strong feelings one way or the other too, I can be easily persuaded!

Same here.

> Is there really nobody in the world using Kubernetes from Python?

There are alternatives, and I have tested some of them, but none of them is really complete or up to date (v1beta API for deployments), for example: https://github.com/mnubo/kubernetes-py

There are also some packages on PyPI that ship the swagger generated code: https://pypi.python.org/pypi/python-k8sclient. I don't know what the source for that library is, but I could maintain something similar, independent of the JupyterHub spawner.

In general I don't see a lot of activity around using Python to deploy and manage Kubernetes.

I spent more time looking through the generated code, and I continue to be unconvinced we should include that. However, I'd also like it to be flexible, so we can use it to generate the JSON objects in the future if we need to. So here's what I'm going to do:

  1. Clean up https://github.com/yuvipanda/jupyterhub-kubernetes-spawner to separate code that generates the objects from the code that makes API calls even more
  2. Write better unit tests for the code that generates objects
  3. Write actual documentation
  4. Incorporate the features in https://github.com/danielfrg/jupyterhub-kubernetes_spawner into it - particularly, to better support persistent volume claims. Me and @ryanlovett have been discussing this over the last few weeks, and have a good test case for this.
  5. Work with @danielfrg to get it to a shape that he's comfortable with too :)
  6. Continue adding more features / polish as we see fit, along with @danielfrg and other contributors

Since most of my systemd work is over now, I'll start working on this tomorrow, and hopefully aim to finish all 5 points and make a release by end of next week.

As a side note, I also looked at pykube - we've used it at wikimedia for other k8s clients, and are not big fans of the approach. It's a fairly leaky abstraction, except unlike generated code it's hand maintained and hence prone to falling behind. It didn't really give us much, and we went back to just making JSON objects by hand.

@danielfrg I hope this is acceptable.

@nnashok I don't think you need to wait for this to settle before using it - I think both the spawners right now are already being used (mine is in use at Wikimedia rn), and am pretty sure it will not be too hard to migrate off that / upgrade from that eventually. Feel free to start using either one :)

@yuvipanda I'm at Grace Hopper this week but happy to review docs and iterate with you.

@willingc awesome! I'm currently working on docs to set up a local dev environment that enables rapid development, will push it when done and poke you :)

You can see the code improvements I'm making in the https://github.com/yuvipanda/jupyterhub-kubernetes-spawner/tree/fixup branch.

A significant chunk of my refactoring work is done, and I'm maintaining a PR with TODO items at https://github.com/yuvipanda/jupyterhub-kubernetes-spawner/pull/9 :) I'll spend some time early next week:

  1. writing a lot more docs (I've already written a good chunk of inline docs)
  2. Adding first class support for persistent volume claims (there's already good support for volumes), esp. alongside dynamic volume provisioning (http://blog.kubernetes.io/2016/10/dynamic-provisioning-and-storage-in-kubernetes.html). This will allow us to easily provision a disk per user.
  3. Add a configuration with dynamic volume provisioning to https://github.com/yuvipanda/jupyterhub-simplest-k8s, and write a lot more docs there too
  4. Move kubespawner to the jupyterhub org, and make a release

How does that sound to everyone?

(@minrk @willingc @danielfrg feel free to comment on the PR!)

@danielfrg I've now merged my refactoring and moved it to the org, at https://github.com/jupyterhub/kubespawner. I got all of the same features you had right now, and am currently working on StorageClass + PVC support for easier persistent storage for single user notebooks.
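For readers unfamiliar with the StorageClass + PVC combination, a per-user claim can be sketched as a plain dict in the same style as the pod objects. This is an illustrative sketch, not kubespawner's code: the helper names are hypothetical, and it assumes the `storageClassName` field of the v1 PersistentVolumeClaim spec (older clusters selected the class through an annotation instead, so check the API reference for your version).

```python
def make_pvc_spec(name, storage_class, storage="10Gi"):
    """Return a v1 PersistentVolumeClaim manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Names the StorageClass that should dynamically provision the volume.
            "storageClassName": storage_class,
            # One node mounts the volume read-write, which suits a single-user notebook.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},
        },
    }


def pod_volume_for_claim(volume_name, claim_name):
    """Return the entry for a pod's spec.volumes list that refers to an existing claim."""
    return {"name": volume_name, "persistentVolumeClaim": {"claimName": claim_name}}


# Example: one claim per user, referenced from that user's pod.
claim = make_pvc_spec("claim-alice", "standard")
volume = pod_volume_for_claim("home", claim["metadata"]["name"])
```

The container would then mount the volume through a `volumeMounts` entry whose `name` matches the volume name (`home` in this example).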

Thanks @danielfrg and @yuvipanda. Looking forward to future iterations :smile:

As an update, https://github.com/kubernetes-incubator/client-python is in the kubernetes-incubator. I'll keep an eye on it to see if it graduates.

Nice, I see that they are using the swagger generated code. There have always been a couple of these around GitHub, but it's awesome to see one officially supported.

/me nods. If there's an officially maintained one that has 'upstream
support' I'll be happy to switch over to that :)
