I'm unable to get taints working on an instance group dedicated to core pods.
Here is the configuration for the jhub-core instance group created for the core pods:
```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  name: jhub-core
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-09-26
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    hub.jupyter.org/node-purpose: core
    kops.k8s.io/instancegroup: jhub-core
  role: Node
  subnets:
  - us-west-2a
  taints:
  - hub.jupyter.org/dedicated=core:NoSchedule
```
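The taint itself can be double-checked on the node; a sketch, with `<jhub-core-node>` as a placeholder for the actual node name:

```sh
# Confirm the taint is present on the jhub-core node
kubectl describe node <jhub-core-node> | grep Taints
# Expected output, roughly:
#   Taints: hub.jupyter.org/dedicated=core:NoSchedule
```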
The core pods don't seem to have the relevant tolerations applied when deployed via Helm. Here is the config.yaml used by Helm:
```yaml
proxy:
  secretToken: "-------------"
  pdb:
    enabled: false
singleuser:
  defaultUrl: "/lab"
hub:
  pdb:
    enabled: false
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']
image:
  name: jupyter/all-spark-notebook
  tag: latest
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
  userPods:
    nodeAffinity:
      matchNodePurpose: require
  userScheduler:
    pdb:
      enabled: false
```
JupyterHub chart version: 0.9-445a953

```sh
helm upgrade -i jhub jupyterhub/jupyterhub --namespace jhub --version=0.9-445a953 -f .\aws-manifests\jhub\helm\config.yaml
```
For example, the hub pod appears to have only the following tolerations:
"tolerations": [
{
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
},
{
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
}
],
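For reference, the same information can be pulled straight from the pod spec with something like this (`<hub-pod-name>` is a placeholder):

```sh
# Dump the tolerations of the hub pod as JSON
kubectl get pod <hub-pod-name> -n jhub -o jsonpath="{.spec.tolerations}"
```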
How can I add a hub.jupyter.org/dedicated=core:NoSchedule toleration to all the core pods, in order to prevent any other pods from getting scheduled on this jhub-core instance group?
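Concretely, I would expect every core pod to carry a toleration like this (standard pod-spec syntax, shown here for clarity):

```yaml
tolerations:
- key: hub.jupyter.org/dedicated
  operator: Equal
  value: core
  effect: NoSchedule
```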
Thank you!
I eventually ended up customizing the Helm chart and implemented the needed changes by taking the following steps.
**1. Create Instance Groups**

Create two instance groups, jhub-core and jhub-user, with the following taints.
```yaml
# jhub-core
...
spec:
  ...
  taints:
  - hub.jupyter.org/dedicated=core:NoSchedule
```

```yaml
# jhub-user
...
spec:
  ...
  taints:
  - hub.jupyter.org/dedicated=user:NoSchedule
```
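For completeness, the usual kops flow to apply these instance group changes looks roughly like this (the exact steps depend on your state store and whether the nodes already exist):

```sh
# Edit or create the instance groups, then roll the changes out
kops edit ig jhub-core
kops edit ig jhub-user
kops update cluster --yes
# Existing nodes must be replaced for new taints to take effect
kops rolling-update cluster --yes
```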
**2. Create Helm chart configuration**
```yaml
# config.yaml
proxy:
  # Generate random hex to use as security token: `openssl rand -hex 32`
  secretToken: "xxxxxxxxx"
singleuser:
  defaultUrl: "/lab"
hub:
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']
image:
  name: jupyter/all-spark-notebook
  tag: latest
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
  userPods:
    nodeAffinity:
      matchNodePurpose: require
```
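As I understand it, `matchNodePurpose: require` only generates node *affinity* against the `hub.jupyter.org/node-purpose` label, along the lines of the sketch below; it does not add tolerations for the taints, which is why the chart customization in step 4 is needed.

```yaml
# Illustrative sketch (not verbatim chart output) of the affinity
# produced by matchNodePurpose: require for core pods
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: hub.jupyter.org/node-purpose
          operator: In
          values:
          - core
```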
**3. Fetch the chart**

Fetch whichever version you prefer:

```sh
helm fetch jupyterhub/jupyterhub --version=0.9-445a953 --untar
```
**4. Customize the chart**

For example, edit ./templates/hub/deployment.yaml and add the following tolerations to it.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  ...
  template:
    spec:
      ...
      tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
        effect: NoSchedule
      # Same toleration under the underscore spelling of the key; the chart
      # also recognizes this form because some tooling (e.g. the gcloud CLI)
      # does not accept "/" in taint keys
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
        effect: NoSchedule
```
Similarly, add the tolerations to the other core deployment templates: ./templates/proxy/deployment.yaml and ./templates/scheduling/user-scheduler/deployment.yaml.
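Before packaging, it's worth sanity-checking that the edits render; `helm template` can do that locally, for example:

```sh
# Render the customized chart and confirm the dedicated-node tolerations appear
helm template ./jupyterhub -f ./config.yaml | grep -B1 -A3 "hub.jupyter.org/dedicated"
```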
**5. Package the chart**

Edit Chart.yaml and tag the chart with a new version.

```yaml
...
version: 0.9-445a953-1
```

Package the chart:

```sh
helm package .\jupyterhub\
```

This will package the customized chart and create the file jupyterhub-0.9-445a953-1.tgz.
**6. Deploy the chart**

Use Helm to deploy JupyterHub:

```sh
helm install -n jhub ./jupyterhub-0.9-445a953-1.tgz --namespace jhub -f ./config.yaml
```
**7. Verify the customization**

Get the nodes for the jhub-core instance group:

```
kubectl get nodes --show-labels | grep core
ip-172-20-311-23.us-west-2.compute.internal   Ready   node   15h   v1.12.10   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t2.medium,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,hub.jupyter.org/node-purpose=core,kops.k8s.io/instancegroup=jhub-core,kubernetes.io/hostname=ip-172-20-311-23.us-west-2.compute.internal,kubernetes.io/role=node,node-role.kubernetes.io/node=
```
So ip-172-20-311-23.us-west-2.compute.internal is the node assigned to the jhub-core instance group.

Get the pods and their nodes:
```
k get po -n jhub -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE
continuous-image-puller-mkrxx     1/1     Running   0          4m44s   100.96.15.21   ip-172-20-492-121.us-west-2.compute.internal   <none>
hub-6c4f7bc9f7-8nc22              1/1     Running   0          4m43s   100.96.16.61   ip-172-20-311-23.us-west-2.compute.internal    <none>
proxy-676fbc695d-gmzf7            1/1     Running   0          4m43s   100.96.16.62   ip-172-20-311-23.us-west-2.compute.internal    <none>
user-scheduler-7c5b5bd699-2f8qx   1/1     Running   0          4m43s   100.96.16.63   ip-172-20-311-23.us-west-2.compute.internal    <none>
user-scheduler-7c5b5bd699-vv9cn   1/1     Running   0          4m43s   100.96.16.64   ip-172-20-311-23.us-west-2.compute.internal    <none>
```
All the core pods are now assigned to the ip-172-20-311-23.us-west-2.compute.internal node, while the continuous-image-puller, which is not a core pod, landed on a different node.
Describe the configuration of a pod:

```
k describe po hub-6c4f7bc9f7-8nc22 -n jhub
Namespace:       jhub
Labels:          app=jupyterhub
                 component=hub
                 hub.jupyter.org/network-access-proxy-api=true
                 hub.jupyter.org/network-access-proxy-http=true
                 hub.jupyter.org/network-access-singleuser=true
                 pod-template-hash=6c4f7bc9f7
                 release=jhub
...
Node-Selectors:  <none>
Tolerations:     hub.jupyter.org/dedicated=core:NoSchedule
                 hub.jupyter.org_dedicated=core:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
```
Note that the tolerations have been applied.
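To spot-check every pod in the namespace at once, a jsonpath one-liner along these lines can help:

```sh
# Print each pod's name followed by its toleration keys
kubectl get po -n jhub -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.tolerations[*].key}{"\n"}{end}'
```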
They should have been added automatically, so why weren't they? I will try to verify that this works in my deployment.
This still seems to be unsupported; I'm happy to open a PR for it, unless there's something I'm missing that makes this more difficult than it appears to be?
Thanks again @jaskiratr for your thorough description of the issue and what you did to resolve it.
@phoban01, a PR doing step 4 as described in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1486#issuecomment-557570792 makes sense! Currently the following configuration is not working as intended; tolerations should be added.
An update should be made in the following pod definitions.