Zero-to-jupyterhub-k8s: Using the pre-puller on a scalable kubernetes cluster

Created on 11 Dec 2017 · 9 Comments · Source: jupyterhub/zero-to-jupyterhub-k8s

I've been struggling to get the pre-puller to work. This is probably down to RBAC issues that I could get to the bottom of if I tried. However, when looking through how the pre-puller works, I noticed that it calls a series of tasks using helm hooks.

I was wondering if anyone has any thoughts on how well this works on a kubernetes cluster which can autoscale?

I imagine the scenario where I deploy jupyterhub onto a cluster with two nodes, the image gets pre-pulled and jupyterhub gets up and running. Then some users log onto the cluster and that causes an extra two nodes to be added to the cluster by the cluster autoscaler. Do those two new nodes get the image pre-pulled also? I have a feeling the answer is no.

As a workaround for both this issue and my general pre-pulling issues, I've been using a DaemonSet which runs a pre-pull container as an initContainer and then the Google pause container (original article). This results in the image being pulled on all nodes in the cluster, even if they are added at a later time. Using the initContainer and pause also avoids the container constantly re-pulling the image and restarting.

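apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      name: prepull
  template:
    metadata:
      labels:
        name: prepull
    spec:
      initContainers:
      # Runs once per node: asks the node's Docker daemon to pull the image
      - name: prepull
        image: docker
        command: ["docker", "pull", "{{ .Values.singleuser.image.name }}:{{ .Values.singleuser.image.tag }}"]
        volumeMounts:
        - name: docker
          mountPath: /var/run/docker.sock
      volumes:
      # Only works on nodes whose container runtime exposes the Docker socket
      - name: docker
        hostPath:
          path: /var/run/docker.sock
      containers:
      # Keeps the pod running with minimal resources so the pull is not repeated
      - name: pause
        image: gcr.io/google_containers/pause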

I can see some downsides. For instance, a notebook may be created on a node before the pre-puller has finished, causing the image to be pulled twice. But overall this feels neater and more robust than the current method. I would be interested to hear from others, particularly @yuvipanda.

All 9 comments

heya!

I totally agree this method is better. The only problem, as you said, is notebooks getting scheduled onto a node before the image has been pulled. But that seems better than the alternative :)

My hope is that we write a simple controller that can do the following:

  1. Do docker image pulls when required
  2. Use a taint or a label on the node to indicate when pulling has been completed
  3. Have the hub use a node selector to only target nodes with the label / taint

This should ensure that notebooks don't get scheduled before the image is pulled. If this is implemented as a CRD, that'd also allow us to wait in helm install / upgrade for pulling to happen (which makes deploy checks easy - I want helm install to fail if the image can't be found, for example).
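A minimal sketch of step 3, assuming the controller marks a node with a label once the pull has finished. The label key `example.org/image-prepulled`, the pod name, and the image are hypothetical placeholders, not something the chart defines:

```yaml
# Hypothetical: the controller (or an admin, via `kubectl label node`) would set
# example.org/image-prepulled=true on a node once the image is present.
apiVersion: v1
kind: Pod
metadata:
  name: example-notebook                    # placeholder name
spec:
  # The scheduler only places this pod on nodes that carry the label
  nodeSelector:
    example.org/image-prepulled: "true"     # hypothetical label key
  containers:
  - name: notebook
    image: jupyter/base-notebook            # placeholder image
```

With a selector like this, a pod stays Pending until at least one labelled node has room for it, which is the waiting behaviour described above.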

It would be awesome if you could help us with the CRD-based controller, but if not, using this as a DaemonSet is very appropriate. We might want to document this in z2jh too.

I totally agree about the helm install failing if the image can't be found.

Some more thoughts:

  • Some images can take a long time to pull; ours takes 5-10 minutes, for example. That is a long time for helm install to appear to hang.
  • If a new user logs in and there are not enough resources for their notebook, the cluster is scaled up. We could keep them waiting until the prepuller has pulled the image, or we could just let their pod be created and pull the image itself. Either way they will just see the "Your notebook is starting" message for a while. So what benefit do the taints add?
  • Is a CRD not overkill for this?

Another possibility could be to run an internal registry. Could this be set up by zero-to-jupyterhub? Presumably pulling internally should be a lot faster than from an external registry, and this might be more useful on large shared Kubernetes clusters.

@manics I imagine that would improve the startup time of a node but doesn't necessarily solve the issues above.

For anyone not already familiar with CRDs, here is a talk from KubeCon which describes them.

https://www.youtube.com/watch?v=yn04ERW0SbI

I'm doing a lot of autoscaler-friendly work for v0.6, and I think setting up the pre-puller to work like this is important!

Here are the requirements I want for 0.6:

  1. helm install should block until the image is on all nodes by default (you should be able to turn this off if you do not want it!)
  2. we should have daemonsets that make sure images are being pulled on new nodes as they come up. By default it is ok if pods come up on a new node before the image is pulled.
  3. Get rid of dependency on https://github.com/yuvipanda/kube-image-puller
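
Requirement 1 could be sketched with a Helm hook: Helm waits for a hook Job to finish before continuing, and a failed hook fails the release. This is only an illustration of the hook mechanism, not the chart's implementation; the Job name, image, and command are hypothetical placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prepull-wait                        # hypothetical name
  annotations:
    # Run before the rest of the release is installed or upgraded
    "helm.sh/hook": pre-install,pre-upgrade
    # Clean the Job up once it has succeeded
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  backoffLimit: 0                           # do not retry: a failure fails the hook
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: wait
        image: busybox                      # placeholder image
        # Placeholder: a real check would wait until every node has the image
        command: ["sh", "-c", "echo 'replace with a real readiness check'"]
```

Making the hook optional (for example, behind a chart value) would cover the "turn this off" part of the requirement.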

I'm going to play with a few options and see where I land!

Thanks for starting this, @jacobtomlinson!

Another requirement is that the images used for the prepuller themselves must be small! No point bringing in a large image to pull a smaller (or only a bit larger) image.
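
One way to avoid a separate puller image altogether, sketched under assumptions: run the target image itself as the initContainer with a trivial command, so that the kubelet pulls it in order to start the container. This assumes the single-user image ships `/bin/sh`; the `prepull-kubelet` name is a hypothetical placeholder.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-kubelet                     # hypothetical name
spec:
  selector:
    matchLabels:
      name: prepull-kubelet
  template:
    metadata:
      labels:
        name: prepull-kubelet
    spec:
      initContainers:
      # The kubelet pulls this image to start the container, so no Docker
      # client image or socket mount is needed
      - name: pull-singleuser
        image: "{{ .Values.singleuser.image.name }}:{{ .Values.singleuser.image.tag }}"
        # Assumes the image contains /bin/sh; exits as soon as it has started
        command: ["/bin/sh", "-c", "echo image pulled"]
      containers:
      - name: pause
        image: gcr.io/google_containers/pause
```

The only extra image each node fetches here is the pause container, which is small.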

@jacobtomlinson not entirely done yet, but take a look at https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/399 :)

Closed via #399 and #418!
