I'm unable to get taints working on an instance group dedicated to core pods.
Here is the configuration for the jhub-core instance group created for the core pods:
```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: null
  name: jhub-core
spec:
  image: kope.io/k8s-1.12-debian-stretch-amd64-hvm-ebs-2019-09-26
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    hub.jupyter.org/node-purpose: core
    kops.k8s.io/instancegroup: jhub-core
  role: Node
  subnets:
  - us-west-2a
  taints:
  - hub.jupyter.org/dedicated=core:NoSchedule
```
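The taint itself can be double-checked on the node; a sketch, with `<jhub-core-node>` as a placeholder for the actual node name:

```sh
# Confirm the taint is present on the jhub-core node
kubectl describe node <jhub-core-node> | grep Taints
# Expected output, roughly:
#   Taints: hub.jupyter.org/dedicated=core:NoSchedule
```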
The core pods don't seem to have the relevant tolerations applied when deployed via Helm. Here is the config.yaml used by Helm:
```yaml
proxy:
  secretToken: "-------------"
  pdb:
    enabled: false
singleuser:
  defaultUrl: "/lab"
hub:
  pdb:
    enabled: false
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']
image:
  name: jupyter/all-spark-notebook
  tag: latest
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
  userPods:
    nodeAffinity:
      matchNodePurpose: require
  userScheduler:
    pdb:
      enabled: false
```
JupyterHub chart version: 0.9-445a953

```sh
helm upgrade -i jhub jupyterhub/jupyterhub --namespace jhub --version=0.9-445a953 -f .\aws-manifests\jhub\helm\config.yaml
```
For example, the hub pod appears to have only the following tolerations:
"tolerations": [
{
"key": "node.kubernetes.io/not-ready",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
},
{
"key": "node.kubernetes.io/unreachable",
"operator": "Exists",
"effect": "NoExecute",
"tolerationSeconds": 300
}
],
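For reference, the same information can be pulled straight from the pod spec with something like this (`<hub-pod-name>` is a placeholder):

```sh
# Dump the tolerations of the hub pod as JSON
kubectl get pod <hub-pod-name> -n jhub -o jsonpath="{.spec.tolerations}"
```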
How can I add a hub.jupyter.org/dedicated=core:NoSchedule toleration to all the core pods, in order to prevent any other pods from getting scheduled on this jhub-core instance group?
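Concretely, I would expect every core pod to carry a toleration like this (standard pod-spec syntax, shown here for clarity):

```yaml
tolerations:
- key: hub.jupyter.org/dedicated
  operator: Equal
  value: core
  effect: NoSchedule
```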
Thank you!
I eventually ended up customizing the Helm chart and implemented the needed changes by taking the following steps.
**1. Create Instance Groups**

Create two instance groups, jhub-core and jhub-user, with the following taints.
```yaml
# jhub-core
...
spec:
  ...
  taints:
  - hub.jupyter.org/dedicated=core:NoSchedule
```

```yaml
# jhub-user
...
spec:
  ...
  taints:
  - hub.jupyter.org/dedicated=user:NoSchedule
```
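For completeness, the usual kops flow to apply these instance group changes looks roughly like this (the exact steps depend on your state store and whether the nodes already exist):

```sh
# Edit or create the instance groups, then roll the changes out
kops edit ig jhub-core
kops edit ig jhub-user
kops update cluster --yes
# Existing nodes must be replaced for new taints to take effect
kops rolling-update cluster --yes
```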
**2. Create Helm chart configuration**
```yaml
# config.yaml
proxy:
  # Generate random hex to use as security token: `openssl rand -hex 32`
  secretToken: "xxxxxxxxx"
singleuser:
  defaultUrl: "/lab"
hub:
  extraConfig:
    jupyterlab: |
      c.Spawner.cmd = ['jupyter-labhub']
image:
  name: jupyter/all-spark-notebook
  tag: latest
scheduling:
  corePods:
    nodeAffinity:
      matchNodePurpose: require
  userPods:
    nodeAffinity:
      matchNodePurpose: require
```
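As I understand it, `matchNodePurpose: require` only generates node *affinity* against the `hub.jupyter.org/node-purpose` label, along the lines of the sketch below; it does not add tolerations for the taints, which is why the chart customization in step 4 is needed.

```yaml
# Illustrative sketch (not verbatim chart output) of the affinity
# produced by matchNodePurpose: require for core pods
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: hub.jupyter.org/node-purpose
          operator: In
          values:
          - core
```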
**3. Fetch the chart**

Fetch whichever version you prefer:

```sh
helm fetch jupyterhub/jupyterhub --version=0.9-445a953 --untar
```
**4. Customize the chart**

For example, edit ./templates/hub/deployment.yaml and add the following tolerations to it.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hub
spec:
  ...
  template:
    spec:
      ...
      tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
        effect: NoSchedule
      # Same toleration under the underscore spelling of the key; the chart
      # also recognizes this form because some tooling (e.g. the gcloud CLI)
      # does not accept "/" in taint keys
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
        effect: NoSchedule
```
Similarly, add the tolerations to the other core deployment templates: ./templates/proxy/deployment.yaml and ./templates/scheduling/user-scheduler/deployment.yaml.
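Before packaging, it's worth sanity-checking that the edits render; `helm template` can do that locally, for example:

```sh
# Render the customized chart and confirm the dedicated-node tolerations appear
helm template ./jupyterhub -f ./config.yaml | grep -B1 -A3 "hub.jupyter.org/dedicated"
```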
**5. Package the chart**

Edit Chart.yaml and tag the chart with a new version.

```yaml
...
version: 0.9-445a953-1
```

Package the chart:

```sh
helm package .\jupyterhub\
```

This will package the customized chart and create the file jupyterhub-0.9-445a953-1.tgz.
**6. Deploy the chart**

Use Helm to deploy JupyterHub:

```sh
helm install -n jhub ./jupyterhub-0.9-445a953-1.tgz --namespace jhub -f ./config.yaml
```
**7. Verify the customization**

Get the nodes for the jhub-core instance group:

```
kubectl get nodes --show-labels | grep core
ip-172-20-311-23.us-west-2.compute.internal   Ready   node   15h   v1.12.10   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=t2.medium,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,hub.jupyter.org/node-purpose=core,kops.k8s.io/instancegroup=jhub-core,kubernetes.io/hostname=ip-172-20-311-23.us-west-2.compute.internal,kubernetes.io/role=node,node-role.kubernetes.io/node=
```
So ip-172-20-311-23.us-west-2.compute.internal is the node assigned to the jhub-core instance group.

Get the pods and their nodes:
```
k get po -n jhub -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP             NODE                                           NOMINATED NODE
continuous-image-puller-mkrxx     1/1     Running   0          4m44s   100.96.15.21   ip-172-20-492-121.us-west-2.compute.internal   <none>
hub-6c4f7bc9f7-8nc22              1/1     Running   0          4m43s   100.96.16.61   ip-172-20-311-23.us-west-2.compute.internal    <none>
proxy-676fbc695d-gmzf7            1/1     Running   0          4m43s   100.96.16.62   ip-172-20-311-23.us-west-2.compute.internal    <none>
user-scheduler-7c5b5bd699-2f8qx   1/1     Running   0          4m43s   100.96.16.63   ip-172-20-311-23.us-west-2.compute.internal    <none>
user-scheduler-7c5b5bd699-vv9cn   1/1     Running   0          4m43s   100.96.16.64   ip-172-20-311-23.us-west-2.compute.internal    <none>
```
All the core pods are now assigned to the ip-172-20-311-23.us-west-2.compute.internal node, while the continuous-image-puller, which is not a core pod, landed on a different node.
Describe the configuration of a pod:

```
k describe po hub-6c4f7bc9f7-8nc22 -n jhub
Namespace:       jhub
Labels:          app=jupyterhub
                 component=hub
                 hub.jupyter.org/network-access-proxy-api=true
                 hub.jupyter.org/network-access-proxy-http=true
                 hub.jupyter.org/network-access-singleuser=true
                 pod-template-hash=6c4f7bc9f7
                 release=jhub
...
Node-Selectors:  <none>
Tolerations:     hub.jupyter.org/dedicated=core:NoSchedule
                 hub.jupyter.org_dedicated=core:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
```
Note that the tolerations have been applied.
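To spot-check every pod in the namespace at once, a jsonpath one-liner along these lines can help:

```sh
# Print each pod's name followed by its toleration keys
kubectl get po -n jhub -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.tolerations[*].key}{"\n"}{end}'
```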
They should have been added automatically, so why weren't they? I will try to verify that this works in my deployment.
This still seems to be unsupported; I'm happy to open a PR for it, unless there's something I'm missing that makes this more difficult than it appears to be?
Thanks again @jaskiratr for your thorough description of the issue and what you did to resolve it.
@phoban01, a PR doing step 4 as described in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1486#issuecomment-557570792 makes sense! Currently the following configuration is not working as intended; tolerations should be added.
An update should be made in the following pod definitions.