Add documentation on how to utilize KubeSpawner's profile_list traitlet implemented here: https://github.com/jupyterhub/kubespawner/pull/137
See https://gitter.im/jupyterhub/jupyterhub?at=5a59236e6117191e615b6729 for a recent question about this: it is possible to set up JupyterHub to let users select a Docker image, and we should document how to do this.
This would be of high value to me personally. I'd also welcome pointers to dev documentation, examples, source code, etc. that might help me on my way.
I'd be curious as well as following the existing examples via kubeflow didn't quite give me enough of the info necessary to proceed.
Thanks a bunch!
I'd also like to see how to do this :) Happy to be a guinea pig!
With https://github.com/jupyterhub/kubespawner/pull/137 the profile list is finally built-in!
I'm using profile_list, but it keeps selecting the default image specified in the config.yml file under singleuser.image; when I don't specify singleuser.image, the image puller hangs. Ideas? I'm having difficulty finding much documentation on profile_list with z2jh.
I found my issue, by the way. I was looking at the master branch and using image instead of image_spec. image_spec is what the latest tagged release uses; image is the new name on master.
A benefit of setting this here instead of in hub.extraConfig under c.KubeSpawner.profile_list, is that we can access the overrides and search for images to pull in the image-pullers.
```yaml
singleuser:
  profileList:
    - display_name: "Default: Shared, 8 CPU cores"
      description: "By selecting this choice, you will be assigned an environment that will run on a shared machine with CPU only."
      default: True
    - display_name: "Dedicated, 4 CPU cores & 26GB RAM, 1 NVIDIA Tesla K80 GPU"
      description: "By selecting this choice, you will be assigned an environment that will run on a dedicated machine with a single GPU, just for you."
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
```
NOTE: the actual machine types chosen depend on your node pools etc. In this case I imagined an n1-highmem-8 machine on Google Cloud for the default profile, allowing for about 50 users with settings not shown in the example above. For the GPU profile I imagine a dedicated n1-highmem-4 machine to spin up whenever someone needs it. It is currently not possible to share GPUs and their memory the way one shares CPUs and normal memory, but work is progressing towards it.
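For comparison, the same two profiles can also be expressed directly as `c.KubeSpawner.profile_list` inside `hub.extraConfig`. This is only a sketch of what that Python would look like; the final `c.` assignment is left as a comment so the snippet stands on its own outside a JupyterHub config file.

```python
# Sketch: the two profiles from the YAML example above, written as the
# Python list that c.KubeSpawner.profile_list expects (inside hub.extraConfig).
profile_list = [
    {
        "display_name": "Default: Shared, 8 CPU cores",
        "description": "Runs on a shared machine with CPU only.",
        "default": True,
    },
    {
        "display_name": "Dedicated, 4 CPU cores & 26GB RAM, 1 NVIDIA Tesla K80 GPU",
        "description": "Runs on a dedicated machine with a single GPU, just for you.",
        "kubespawner_override": {
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    },
]

# In an actual JupyterHub config (e.g. hub.extraConfig), you would then set:
# c.KubeSpawner.profile_list = profile_list
```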
That YAML looks a lot nicer than the JSON in hub.extraConfig 😄
Yepp it does! Thanks @vilhelmen for making it possible!
I think we should have some sort of narrative descriptive documentation for this as well.
@consideRatio What do you mean by _Personal_ (vs. _Shared_) machine in your example above? Do you imply by the _Personal_ term an exclusive use of a K8s host by a specific user or something else? Please clarify ...
@ablekh ah, I revised my example!
@consideRatio Thank you! So, if I understand correctly, by _Dedicated_ you refer to a machine with a GPU card present, but that does not mean that that machine is dedicated / reserved for a particular user. That is, the GPU-containing machine is still technically shared as a part of given K8s cluster (and, thus, could be used as shared for users, selecting other, CPU-only, profiles). Sorry, I'm not trying to nitpick, but rather to fully understand the profiles functionality (using your example). Would appreciate some further clarification. :-)
By including GPU in the example, I raised a lot of questions not relating to profileList :D
Yes, technically it is still shared as part of a given k8s cluster. I updated the example again :p
Okay, so in the setup I imagined, if using GPUs, I would not want CPU-only users or pods to fill up a GPU node and make it unavailable for GPU-needing users. Actually, this is handled automatically with the extra resource type nvidia.com/gpu: nodes that register such GPU resources will only admit pods requesting those resources, via taints applied automatically on such nodes and tolerations applied automatically to pods requesting such resources.
So such a GPU machine will only be shared among users requesting nvidia.com/gpu. If it only has a single GPU, only one such user will fit on it, and hence it will be dedicated to that user.
But during NeurIPS 2018 we ran into a limit on the number of nodes and CPUs we had available, so what we did then was use massive n1-highmem-16 nodes with 8 NVIDIA Tesla K80 GPUs attached. Then up to 8 users would share a node, as each user only requested 1 GPU. (You currently cannot request a fraction of a GPU.)
@consideRatio Thank you very much for prompt and detailed reply. I'm glad that my understanding was pretty close to reality (I was not aware of K8s' automated GPU bounding based on using nvidia.com/gpu resource type). Clearly, I still have to cover a lot of both K8s and JupyterHub ground. :-) Happy New Year!
:D you're welcome @ablekh, happy new year!
@consideRatio I'd love to put together a little tutorial-style page that shows how to let users select the docker images or the resources available to them. Do you have an already-working configuration that I can use to start writing narrative around?
@choldgraf the one above does the job and is in use on the JupyterHub I have deployed. But that is not all that's required for a GPU; still, it allows the user to customize the environment they end up with. I wrote down some stuff I think may be relevant to have in mind.
Relevant knowledge:
- `singleuser.startTimeout` longer than 5 minutes is important if choosing some profile may trigger a cluster autoscaler scale-up event that you need to wait for, as that can take a while: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/9ad9dd3b1c426a7401e632af5921a357bbf5d05d/jupyterhub/values.yaml#L220
- `c.Spawner.pre_spawn_hook`: I have no working configuration of this, but I expect it to go quite smoothly if implemented with the example provided. One could read the username and, based on that, configure fields in `c.KubeSpawner.profile_list`.

TBH I was strongly considering just dropping a text box in the container select page on our instructor dev cluster for container selection.
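To make the `pre_spawn_hook` idea above concrete, here is an untested sketch (as noted, no working configuration of this exists yet). The `GPU_USERS` allow-list and the profile contents are hypothetical; the hook signature (a function receiving the spawner) matches JupyterHub's `Spawner.pre_spawn_hook`.

```python
# Untested sketch: give a hypothetical allow-list of users an extra GPU
# profile by setting profile_list per user from pre_spawn_hook.
GPU_USERS = {"alice"}  # hypothetical allow-list

BASE_PROFILES = [
    {"display_name": "Default: Shared, 8 CPU cores", "default": True},
]

GPU_PROFILE = {
    "display_name": "Dedicated GPU",
    "kubespawner_override": {"extra_resource_limits": {"nvidia.com/gpu": "1"}},
}

def pre_spawn_hook(spawner):
    """Append the GPU profile for users in GPU_USERS, based on the username."""
    profiles = list(BASE_PROFILES)
    if spawner.user.name in GPU_USERS:
        profiles.append(GPU_PROFILE)
    spawner.profile_list = profiles

# In an actual JupyterHub config (e.g. hub.extraConfig), you would set:
# c.Spawner.pre_spawn_hook = pre_spawn_hook
```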
You could get fancy and override the container select page to put in resource values, but I don't have any examples or the bandwidth to make one right now. We need all our systems deployed yesterday and I just fixed an XFS deadlock bug hosing our container build system.
I had this for our GPU configuration blob:
```yaml
- display_name: 'Tensorflow-CPU'
  default: False
  kubespawner_override:
    image_spec: 'giant path to internal server'
    cpu_limit: 2
    mem_limit: '2G'
- display_name: 'Tensorflow-GPU'
  default: False
  kubespawner_override:
    image_spec: 'giant path to internal server'
    extra_resource_guarantees:
      nvidia.com/gpu: 1
    extra_resource_limits:
      nvidia.com/gpu: 1
```
I'm fuzzy on the details now, but the system complained until I had both extra_resource_limits and extra_resource_guarantees set since GPU devices are not a shared resource.
@vilhelmen oh, I recall having some issue like that using both guarantees and limits, but it worked at some later point, as I remember.
PS: KubeSpawner's image_spec is deprecated in favor of image as of KubeSpawner 0.10, if I recall correctly.
@consideRatio Ah, right, image_spec, I think the switch to KubeSpawner 0.10 just happened this week. I have a todo in our internal mirror to migrate that. I _just_ got our internal ci system building all the docker-stacks containers, GPU/CPU TF images, and overlaying our secret sauce layers. I have to untangle my other patches on top of z2jh first so we don't end up in some weird state. Thanks for the reminder!
I just installed zero-to-jupyterhub 0.7.0 and wanted to use the profileList to give our users the option to select an image, but I always get redirected immediately to the default image with no way of selecting one. I had a simpler setup enabled before, then updated the config and ran helm upgrade ... - is that not good enough, and do I have to delete/reinstall? I tried both image and image_spec inside kubespawner_override but neither worked. Any help would be much appreciated!
@danyx23 Could you post your configuration file? The configuration format I show hasn't been released yet, so 0.7.0 won't have access to it and you need to configure it through the hub's extra config block.
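For the 0.7.0 workaround mentioned above, the hub's extra config block would carry plain Python. A sketch of an image-selection profile list follows; the image names are hypothetical placeholders, and `image_spec` is used since that is the field name in the 0.7.0-era release (newer KubeSpawner renames it to `image`).

```python
# Sketch for z2jh 0.7.0: an image-selection profile_list to place inside
# hub.extraConfig. Image names below are hypothetical placeholders.
image_profiles = [
    {
        "display_name": "Minimal notebook",
        "default": True,
        "kubespawner_override": {"image_spec": "jupyter/minimal-notebook:latest"},
    },
    {
        "display_name": "Datascience notebook",
        "kubespawner_override": {"image_spec": "jupyter/datascience-notebook:latest"},
    },
]

# Inside hub.extraConfig this would then be:
# c.KubeSpawner.profile_list = image_profiles
```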
@vilhelmen ah, that explains it! I was trying to add a profileList option directly below singleuser. I'll add it via the extra config block for now then. Thanks for the help!
@danyx23 you can use the latest release listed here as well, it should be fine I think.
At the time of writing, the latest release is chart version: 0.8-68b9a91
Closed by @choldgraf's work in #1098 !
Hey! Could we get a fresh release because of this? I am using a78b3d6 and profileList is still happily ignored.
profileList works for me: https://github.com/openmicroscopy/kubernetes-apps/pull/24/files
I am very sorry - it works, had wrong version deployed :woman_facepalming: