Add documentation on how to utilize KubeSpawner's profile_list traitlet implemented here: https://github.com/jupyterhub/kubespawner/pull/137
See https://gitter.im/jupyterhub/jupyterhub?at=5a59236e6117191e615b6729 for a recent question about this: it is possible to set up JupyterHub to let users select a Docker image, and we should document how to do this.
This would be of high value to me personally. I'd also welcome pointers to dev documentation, examples, source code, etc. that might help me on my way.
I'd be curious as well as following the existing examples via kubeflow didn't quite give me enough of the info necessary to proceed.
Thanks a bunch!
I'd also like to see how to do this :) Happy to be a guinea pig!
With https://github.com/jupyterhub/kubespawner/pull/137 the profile list is finally built-in!
I'm using profile_list, but it keeps selecting the default image specified in the config.yml file under singleuser.image; when I don't specify singleuser.image, the image puller hangs. Ideas? I'm having difficulty finding much documentation on profile_list with z2jh.
I found my issue, by the way. I was looking at the master branch and using image instead of image_spec. image_spec is what the latest tagged release uses; image is the new name on master.
A benefit of setting this here instead of in hub.extraConfig under c.KubeSpawner.profile_list, is that we can access the overrides and search for images to pull in the image-pullers.
```yaml
singleuser:
  profileList:
    - display_name: "Default: Shared, 8 CPU cores"
      description: "By selecting this choice, you will be assigned an environment that will run on a shared machine with CPU only."
      default: True
    - display_name: "Dedicated, 4 CPU cores & 26GB RAM, 1 NVIDIA Tesla K80 GPU"
      description: "By selecting this choice, you will be assigned an environment that will run on a dedicated machine with a single GPU, just for you."
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
```
NOTE: the actual machine types chosen depend on your node pools etc. In this case I imagined an n1-highmem-8 machine on Google Cloud for the default profile, allowing for about 50 users with settings not shown in the example above. For the GPU profile I imagine a dedicated n1-highmem-4 machine to spin up whenever someone needs it. It is currently not possible to share GPUs and their memory the way one shares CPUs and normal memory, but work is progressing towards it.
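For comparison, the same two profiles can also be expressed directly as `c.KubeSpawner.profile_list` inside `hub.extraConfig`. This is only a sketch of what that Python would look like; the final `c.` assignment is left as a comment so the snippet stands on its own outside a JupyterHub config file.

```python
# Sketch: the two profiles from the YAML example above, written as the
# Python list that c.KubeSpawner.profile_list expects (inside hub.extraConfig).
profile_list = [
    {
        "display_name": "Default: Shared, 8 CPU cores",
        "description": "Runs on a shared machine with CPU only.",
        "default": True,
    },
    {
        "display_name": "Dedicated, 4 CPU cores & 26GB RAM, 1 NVIDIA Tesla K80 GPU",
        "description": "Runs on a dedicated machine with a single GPU, just for you.",
        "kubespawner_override": {
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    },
]

# In an actual JupyterHub config (e.g. hub.extraConfig), you would then set:
# c.KubeSpawner.profile_list = profile_list
```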
That YAML looks a lot nicer than the JSON in hub.extraConfig 😄
Yepp it does! Thanks @vilhelmen for making it possible!
I think we should have some sort of narrative descriptive documentation for this as well.
@consideRatio What do you mean by _Personal_ (vs. _Shared_) machine in your example above? Do you imply by the _Personal_ term an exclusive use of a K8s host by a specific user or something else? Please clarify ...
@ablekh ah, I revised my example!
@consideRatio Thank you! So, if I understand correctly, by _Dedicated_ you refer to a machine with a GPU card present, but that does not mean that that machine is dedicated / reserved for a particular user. That is, the GPU-containing machine is still technically shared as a part of given K8s cluster (and, thus, could be used as shared for users, selecting other, CPU-only, profiles). Sorry, I'm not trying to nitpick, but rather to fully understand the profiles functionality (using your example). Would appreciate some further clarification. :-)
By including GPU in the example, I raised a lot of questions not relating to profileList :D
Yes, technically it is still shared as part of a given k8s cluster. I updated the example again :p
Okay, so in the setup I imagined, if using GPUs, I would not want CPU-only users or pods to fill up a GPU node and make it unavailable for GPU-needing users. Actually, this is handled automatically with the extra resource type nvidia.com/gpu: nodes that register such GPU resources will only admit pods requesting those resources, via taints applied automatically on such nodes and tolerations applied automatically to pods requesting such resources.
So such a GPU machine will only be shared among users requesting nvidia.com/gpu. If it only has a single GPU, only one such user will fit on it, and hence it will be dedicated to that user.
But during NeurIPS 2018 we ran into a limit on the number of nodes and CPUs we had available, so what we did then was use massive n1-highmem-16 nodes with 8 NVIDIA Tesla K80 GPUs attached. Then up to 8 users would share a node, as each user only requested 1 GPU. (You currently cannot request a fraction of a GPU.)
@consideRatio Thank you very much for prompt and detailed reply. I'm glad that my understanding was pretty close to reality (I was not aware of K8s' automated GPU bounding based on using nvidia.com/gpu resource type). Clearly, I still have to cover a lot of both K8s and JupyterHub ground. :-) Happy New Year!
:D you're welcome @ablekh, happy new year!
@consideRatio I'd love to put together a little tutorial-style page that shows how to let users select the docker images or the resources available to them. Do you have an already-working configuration that I can use to start writing narrative around?
@choldgraf the one above does the job and is in use on the JupyterHub I have deployed. But that is not all that's required for a GPU; still, it allows the user to customize the environment they end up with. I wrote down some stuff I think may be relevant to have in mind.
Relevant knowledge:
- `singleuser.startTimeout` longer than 5 minutes is important if choosing some profile may trigger a cluster autoscaler scale-up event that you need to wait for, as that can take a while: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/9ad9dd3b1c426a7401e632af5921a357bbf5d05d/jupyterhub/values.yaml#L220
- `c.Spawner.pre_spawn_hook`: I have no working configuration of this, but I expect it to go quite smoothly if implemented with the example provided. One could read the username and, based on that, configure fields in `c.KubeSpawner.profile_list`.

TBH I was strongly considering just dropping a text box in the container select page on our instructor dev cluster for container selection.
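To make the `pre_spawn_hook` idea above concrete, here is an untested sketch (as noted, no working configuration of this exists yet). The `GPU_USERS` allow-list and the profile contents are hypothetical; the hook signature (a function receiving the spawner) matches JupyterHub's `Spawner.pre_spawn_hook`.

```python
# Untested sketch: give a hypothetical allow-list of users an extra GPU
# profile by setting profile_list per user from pre_spawn_hook.
GPU_USERS = {"alice"}  # hypothetical allow-list

BASE_PROFILES = [
    {"display_name": "Default: Shared, 8 CPU cores", "default": True},
]

GPU_PROFILE = {
    "display_name": "Dedicated GPU",
    "kubespawner_override": {"extra_resource_limits": {"nvidia.com/gpu": "1"}},
}

def pre_spawn_hook(spawner):
    """Append the GPU profile for users in GPU_USERS, based on the username."""
    profiles = list(BASE_PROFILES)
    if spawner.user.name in GPU_USERS:
        profiles.append(GPU_PROFILE)
    spawner.profile_list = profiles

# In an actual JupyterHub config (e.g. hub.extraConfig), you would set:
# c.Spawner.pre_spawn_hook = pre_spawn_hook
```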
You could get fancy and override the container select page to put in resource values, but I don't have any examples or the bandwidth to make one right now. We need all our systems deployed yesterday and I just fixed an XFS deadlock bug hosing our container build system.
I had this for our GPU configuration blob:
```yaml
- display_name: 'Tensorflow-CPU'
  default: False
  kubespawner_override:
    image_spec: 'giant path to internal server'
    cpu_limit: 2
    mem_limit: '2G'
- display_name: 'Tensorflow-GPU'
  default: False
  kubespawner_override:
    image_spec: 'giant path to internal server'
    extra_resource_guarantees:
      nvidia.com/gpu: 1
    extra_resource_limits:
      nvidia.com/gpu: 1
```
I'm fuzzy on the details now, but the system complained until I had both extra_resource_limits and extra_resource_guarantees set since GPU devices are not a shared resource.
@vilhelmen oh, I recall having some issue like that using both guarantees and limits, but it worked at some later point, as I remember.
PS: KubeSpawner's image_spec is deprecated in favor of image as of KubeSpawner 0.10, if I recall correctly.
@consideRatio Ah, right, image_spec, I think the switch to KubeSpawner 0.10 just happened this week. I have a todo in our internal mirror to migrate that. I _just_ got our internal ci system building all the docker-stacks containers, GPU/CPU TF images, and overlaying our secret sauce layers. I have to untangle my other patches on top of z2jh first so we don't end up in some weird state. Thanks for the reminder!
I just installed zero-to-jupyterhub 0.7.0 and wanted to use the profileList to give our users the option to select an image, but I always get redirected immediately to the default image with no way of selecting one. I had a simpler setup enabled before, then updated the config and ran helm upgrade ... - is that not good enough, and do I have to delete/reinstall? I tried both image and image_spec inside kubespawner_override but neither worked. Any help would be much appreciated!
@danyx23 Could you post your configuration file? The configuration format I show hasn't been released yet, so 0.7.0 won't have access to it and you need to configure it through the hub's extra config block.
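For the 0.7.0 workaround mentioned above, the hub's extra config block would carry plain Python. A sketch of an image-selection profile list follows; the image names are hypothetical placeholders, and `image_spec` is used since that is the field name in the 0.7.0-era release (newer KubeSpawner renames it to `image`).

```python
# Sketch for z2jh 0.7.0: an image-selection profile_list to place inside
# hub.extraConfig. Image names below are hypothetical placeholders.
image_profiles = [
    {
        "display_name": "Minimal notebook",
        "default": True,
        "kubespawner_override": {"image_spec": "jupyter/minimal-notebook:latest"},
    },
    {
        "display_name": "Datascience notebook",
        "kubespawner_override": {"image_spec": "jupyter/datascience-notebook:latest"},
    },
]

# Inside hub.extraConfig this would then be:
# c.KubeSpawner.profile_list = image_profiles
```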
@vilhelmen ah, that explains it! I was trying to add a profileList option directly below singleuser. I'll add it via the extra config block for now then. Thanks for the help!
@danyx23 you can use the latest release listed here as well, it should be fine I think.
At the time of writing, the latest release is chart version: 0.8-68b9a91
Closed by @choldgraf's work in #1098 !
Hey! Could we get a fresh release because of this? I am using a78b3d6 and profileList is still happily ignored.
profileList works for me: https://github.com/openmicroscopy/kubernetes-apps/pull/24/files
I am very sorry - it works, had wrong version deployed :woman_facepalming: