I've created a new cluster (which validates) with kops on AWS, but when I install the z2jh helm chart:
helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version 0.7.0-beta.2 --values config.yaml
(with only secretToken in the config.yaml) the hub pod never gets created.
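For reference, the config.yaml is just the minimal secret token from the z2jh guide, roughly like this (a sketch; the actual token value is redacted):
proxy:
  secretToken: "<64-character hex string from openssl rand -hex 32>"
Checking the pods: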
kubectl -n jhub get pods
NAME READY STATUS RESTARTS AGE
hub-5ff7fcb7bf-sgz27 0/1 ContainerCreating 0 8m
proxy-7b4fd468c9-27h92 1/1 Running 0 8m
When I describe the pod, I see some FailedAttachVolume warnings:
kubectl -n jhub describe pod hub-5ff7fcb7bf-sgz27
Name: hub-5ff7fcb7bf-sgz27
Namespace: jhub
Node: ip-172-20-152-79.ec2.internal/172.20.152.79
Start Time: Thu, 23 Aug 2018 16:26:35 +0000
Labels: app=jupyterhub
component=hub
hub.jupyter.org/network-access-proxy-api=true
hub.jupyter.org/network-access-proxy-http=true
hub.jupyter.org/network-access-singleuser=true
pod-template-hash=1993976369
release=jhub
Annotations: checksum/config-map=55a5924b375f6b733949f0d8f7290957e3097fe9b364c6425b7022ad3c79722e
checksum/secret=3430b5b3781de0f84b057a70745a15c4f8d6b53e2032ab73ee1970693c9a436d
prometheus.io/path=/hub/metrics
prometheus.io/scrape=true
Status: Pending
IP:
Controlled By: ReplicaSet/hub-5ff7fcb7bf
Containers:
hub:
Container ID:
Image: jupyterhub/k8s-hub:0.7.0-beta.2
Image ID:
Port: 8081/TCP
Host Port: 0/TCP
Command:
jupyterhub
--config
/srv/jupyterhub_config.py
--upgrade-db
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 200m
memory: 512Mi
Environment:
SINGLEUSER_IMAGE: jupyterhub/k8s-singleuser-sample:0.7.0-beta.2
POD_NAMESPACE: jhub (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN: <set to the key 'proxy.token' in secret 'hub-secret'> Optional: false
Mounts:
/etc/jupyterhub/config/ from config (rw)
/etc/jupyterhub/secret/ from secret (rw)
/srv/jupyterhub from hub-db-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-fpxqd (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub-config
Optional: false
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub-secret
Optional: false
hub-db-dir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hub-db-dir
ReadOnly: false
hub-token-fpxqd:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-fpxqd
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9m (x2 over 9m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
Normal Scheduled 9m default-scheduler Successfully assigned hub-5ff7fcb7bf-sgz27 to ip-172-20-152-79.ec2.internal
Normal SuccessfulMountVolume 9m kubelet, ip-172-20-152-79.ec2.internal MountVolume.SetUp succeeded for volume "config"
Normal SuccessfulMountVolume 9m kubelet, ip-172-20-152-79.ec2.internal MountVolume.SetUp succeeded for volume "secret"
Normal SuccessfulMountVolume 9m kubelet, ip-172-20-152-79.ec2.internal MountVolume.SetUp succeeded for volume "hub-token-fpxqd"
Warning FailedAttachVolume 9m (x2 over 9m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6" : "Error attaching EBS volume \"vol-02eb1ad199ffd5a95\"" to instance "i-0ed6f7e287bfe728c" since volume is in "creating" state
Normal SuccessfulAttachVolume 9m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6"
Warning FailedMount 43s (x4 over 7m) kubelet, ip-172-20-152-79.ec2.internal Unable to mount volumes for pod "hub-5ff7fcb7bf-sgz27_jhub(4d3621d3-a6f1-11e8-9524-0e4bc75ccd0c)": timeout expired waiting for volumes to attach or mount for pod "jhub"/"hub-5ff7fcb7bf-sgz27". list of unmounted volumes=[hub-db-dir]. list of unattached volumes=[config secret hub-db-dir hub-token-fpxqd]
I'm guessing I missed or screwed up some step in the z2jh guide, but how do I debug this?
Hmmm a quick note from me on mobile:
You can use kubectl to describe and get --output yaml all three of these: pvc, pv, storageclass - perhaps that gives us more insight.
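For example (a sketch; resource names guessed from the chart defaults, and PVs/StorageClasses are cluster-scoped so the namespace only matters for the PVC):
kubectl -n jhub get pvc,pv,storageclass --output yaml
kubectl -n jhub describe pvc hub-db-dir
kubectl describe pv <pv-name>
kubectl describe storageclass gp2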
@consideRatio , I do have the PVC named hub-db-dir:
$ kubectl -n jhub get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
hub-db-dir Bound pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6 1Gi RWO gp2 1d
$ kubectl -n jhub describe pvc hub-db-dir
Name: hub-db-dir
Namespace: jhub
StorageClass: gp2
Status: Bound
Volume: pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6
Labels: app=jupyterhub
chart=jupyterhub-0.7.0-beta.2
component=hub
heritage=Tiller
release=jhub
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
Events: <none>
and here's the info on the Volume:
$ kubectl -n jhub describe pv pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6
Name: pvc-4d068ae7-a6f1-11e8-8a54-023b5bd89ff6
Labels: failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1d
Annotations: kubernetes.io/createdby=aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller=yes
pv.kubernetes.io/provisioned-by=kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: gp2
Status: Bound
Claim: jhub/hub-db-dir
Reclaim Policy: Delete
Access Modes: RWO
Capacity: 1Gi
Node Affinity: <none>
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://us-east-1d/vol-02eb1ad199ffd5a95
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
and the StorageClass:
[ec2-user@ip-172-31-29-161 ~]$ kubectl -n jhub describe StorageClass gp2
Name: gp2
IsDefaultClass: Yes
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"},"name":"gp2","namespace":""},"parameters":{"type":"gp2"},"provisioner":"kubernetes.io/aws-ebs"}
,storageclass.beta.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/aws-ebs
Parameters: type=gp2
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
Does that give any clues?
@jacobtomlinson, any ideas here?
@rsignell-usgs I'm not confident about this, but I think it is not related to this repository but rather to the cloud provider and/or the Kubernetes setup running on it.
Googling the errors from the event log in your describe output, I found some comments about the Kubernetes version and the kops version, for example:
"I am also facing the same issue after doing a kops upgrade, which moved the kubelet version to 1.9.6."
"I've upgraded all my nodes to 1.8.2 and the redis pod started and the volume seems to be mounted normally."
My suggested action: make sure the kubelet (set up by kops, I figure), your kops, and your kubectl are all the same version. I think kubectl can be a higher version and that's fine, but don't let it be lower. Thanks for a good summary of your logs etc. @rsignell-usgs! I hope you get past this troublesome issue.
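For example, to compare all three (a sketch; I think these commands exist on your versions):
kubectl version --short
kops version
kubectl describe node | grep "Kubelet Version"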
Oh hmm, btw, you may also want to try deleting the PVC. That should in turn make the cloud provider clean up the provisioned PV within a minute or so; verify that this happened. I don't know if this will help, but it's an additional way to reset the state.
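Something like this (a sketch; since the PV's reclaim policy is Delete, the backing EBS volume should be cleaned up automatically):
kubectl -n jhub delete pvc hub-db-dir
kubectl get pv   # verify the bound PV disappears within a minute or so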
To get the kubelet onto the same version as kops (assuming kops has an associated Kubernetes version at all, which I assumed), you may want to recreate the instances, upgrade your instance group, or something like that. Beware: I'm just guessing at terminology and available tooling, having never used the Amazon cloud.
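An untested sketch of what I mean, going by the kops docs:
kops upgrade cluster --yes
kops update cluster --yes
kops rolling-update cluster --yes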
I downgraded my kubectl client to be the same as the server, so I get:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
and kops is also same major/minor version:
$ kops version
Version 1.10.0 (git-8b52ea6d1)
I tried deleting and reinstalling the JH chart:
$ kubectl -n jhub get pods
$ helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version 0.7.0-beta.2 --values config.yaml
and got back:
Release "jhub" does not exist. Installing it now.
NAME: jhub
LAST DEPLOYED: Tue Aug 28 17:42:17 2018
NAMESPACE: jhub
STATUS: DEPLOYED
RESOURCES:
==> v1/ConfigMap
NAME DATA AGE
hub-config 36 1s
==> v1/PersistentVolumeClaim
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
hub-db-dir Pending gp2 1s
==> v1/ServiceAccount
NAME SECRETS AGE
hub 1 1s
==> v1beta1/Role
NAME AGE
hub 1s
==> v1beta1/RoleBinding
NAME AGE
hub 1s
==> v1/Secret
NAME TYPE DATA AGE
hub-secret Opaque 1 1s
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hub ClusterIP 100.69.215.85 <none> 8081/TCP 1s
proxy-public LoadBalancer 100.69.3.32 <pending> 80:31455/TCP 1s
proxy-api ClusterIP 100.66.150.224 <none> 8001/TCP 0s
==> v1beta2/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
hub 1 1 1 0 0s
proxy 1 0 0 0 0s
==> v1beta1/PodDisruptionBudget
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
hub 1 N/A 0 0s
proxy 1 N/A 0 0s
==> v1/Pod(related)
NAME READY STATUS RESTARTS AGE
hub-5ff7fcb7bf-hw64r 0/1 Pending 0 0s
proxy-7b4fd468c9-6vhrt 0/1 ContainerCreating 0 0s
NOTES:
Thank you for installing JupyterHub!
but unfortunately I have the same situation again where the hub never leaves the ContainerCreating state:
$ kubectl -n jhub get pods
NAME READY STATUS RESTARTS AGE
hub-5ff7fcb7bf-hw64r 0/1 ContainerCreating 0 6m
proxy-7b4fd468c9-6vhrt 1/1 Running 0 6m
What do you find if you run:
kubectl describe node | grep "Kubelet Version"
$ kubectl describe node | grep "Kubelet Version"
Kubelet Version: v1.10.3
Kubelet Version: v1.10.3
Kubelet Version: v1.10.3
Kubelet Version: v1.10.3
Kubelet Version: v1.10.3
Oh man. It's working now.
I was using m5.2xlarge instances in my cluster, but from this answer on github issues I discovered that m5 instances have their EBS volumes exposed as NVMe block devices.
So when I switched to m4.2xlarge instances, the problem went away:
kops create cluster kopscluster.k8s.local \
--zones us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f \
--authorization RBAC \
--master-size t2.small \
--master-volume-size 10 \
--node-size m4.2xlarge \
--master-count 3 \
--node-count 2 \
--node-volume-size 120 \
--yes
$ kubectl --namespace=jhub get pod
NAME READY STATUS RESTARTS AGE
hub-5ff7fcb7bf-lqfwl 1/1 Running 0 58s
proxy-7b4fd468c9-qk98v 1/1 Running 0 58s
Should something be added to the z2jh guide to prevent this from happening to others, or is this just something that users should know?
ah awesome!!! hmm nah, I think this issue may be enough; there is so much that would need documenting and keeping up to date. Or hmm, was it part of the guide to recommend the setting you had?
Or perhaps not; I don't know yet what the practical difference between an m4 and an m5 instance is, I need to learn more as usual :D
The issue with NVMe is a kops issue rather than a jupyter issue.
The workaround for it is to switch the kops OS image to Debian Stretch, where NVMe support has been added:
image: kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08
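For example, something like this (a sketch; "nodes" is the default kops instance group name, and a rolling update is needed for the new image to take effect):
kops edit ig nodes          # set spec.image to the stretch image above
kops update cluster --yes
kops rolling-update cluster --yes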
Thanks for documenting this @jacobtomlinson !
@rsignell-usgs the title of the issue is great, I bet others will find this if they run into the same issue!