I am trying to spin up JupyterHub using Helm. All resources start successfully, but after a short time the hub pod enters CrashLoopBackOff.
Installation was performed using the following command:
helm install jupyterhub/jupyterhub --version=v0.6 --name=jupyterhub --namespace jupyterhub -f ./jupyterhub/config.yaml --timeout=1000000
I've also tested version 0.5 and got the same results.
Logs:
$ kubectl logs po/hub-56d985bfb8-vb6pl --namespace jupyterhub
[I 2018-02-10 08:21:27.439 JupyterHub app:830] Loading cookie_secret from env[JPY_COOKIE_SECRET]
[W 2018-02-10 08:21:27.673 JupyterHub app:955] No admin users, admin interface will be unavailable.
[W 2018-02-10 08:21:27.673 JupyterHub app:956] Add any administrative users to `c.Authenticator.admin_users` in config.
[I 2018-02-10 08:21:27.673 JupyterHub app:983] Not using whitelist. Any authenticated user will be allowed.
[I 2018-02-10 08:21:28.025 JupyterHub app:1528] Hub API listening on http://0.0.0.0:8081/hub/
[I 2018-02-10 08:21:28.026 JupyterHub app:1538] Not starting proxy
[I 2018-02-10 08:21:28.026 JupyterHub app:1544] Starting managed service cull-idle
[I 2018-02-10 08:21:28.026 JupyterHub service:266] Starting service 'cull-idle': ['/usr/local/bin/cull_idle_servers.py', '--timeout=3600', '--cull-every=600', '--url=http://127.0.0.1:8081/hub/api']
[I 2018-02-10 08:21:28.053 JupyterHub service:109] Spawning /usr/local/bin/cull_idle_servers.py --timeout=3600 --cull-every=600 --url=http://127.0.0.1:8081/hub/api
[I 2018-02-10 08:21:28.263 JupyterHub log:122] 200 GET /hub/api/users ([email protected]) 25.95ms
[E 2018-02-10 08:21:48.064 JupyterHub app:1623]
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1621, in launch_instance_async
yield self.start()
File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1569, in start
yield self.proxy.check_routes(self.users, self._service_map)
File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 294, in check_routes
routes = yield self.get_all_routes()
File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 589, in get_all_routes
resp = yield self.api_request('', client=client)
tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds
Namespace status:
kubectl get all --namespace dljupyterhub
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/hub 1 1 1 0 22m
deploy/proxy 1 1 1 1 22m
NAME DESIRED CURRENT READY AGE
rs/hub-5479595c8d 1 1 0 22m
rs/proxy-6fbf784dbd 1 1 1 22m
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/hub 1 1 1 0 22m
deploy/proxy 1 1 1 1 22m
NAME DESIRED CURRENT READY AGE
rs/hub-5479595c8d 1 1 0 22m
rs/proxy-6fbf784dbd 1 1 1 22m
NAME READY STATUS RESTARTS AGE
po/hub-5479595c8d-7qhzb 0/1 CrashLoopBackOff 7 22m
po/proxy-6fbf784dbd-pt5q6 2/2 Running 0 22m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/glusterfs-dynamic-hub-db-dir ClusterIP 10.104.97.71 <none> 1/TCP 15m
svc/hub ClusterIP 10.106.231.99 <none> 8081/TCP 22m
svc/proxy-api ClusterIP 10.104.175.74 <none> 8001/TCP 22m
svc/proxy-http ClusterIP 10.106.46.200 <none> 8000/TCP 22m
svc/proxy-public LoadBalancer 10.109.72.153 <pending> 80:32500/TCP,443:31790/TCP 22m
config.yaml:
hub:
  cookieSecret: "aaa"
proxy:
  secretToken: "bbb"
singleuser:
  storage:
    capacity: 2Gi
    dynamic:
      storageClass: gluster-heketi
ingress:
  enabled: true
  hosts:
    - host1
Heya @kdubovikov! Thanks for filing this issue!
It looks like the hub pod cannot reach the proxy pod. Are pod networking and kube-proxy working properly? I suspect this is an OpenStack installation / bare-metal setup. Are other services on the cluster working fine? Does https://scanner.heptio.com/ find any issues?
Hey @yuvipanda, thanks for the response. All other services are working fine (we also run glusterfs). I ran the tests and no issues were found:
Ran 125 of 710 Specs in 3156.548 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 585 Skipped PASS
Also, I am able to run Jupyter Hub with KubeSpawner outside of the cluster without issues.
Hmm, in that case I'm at a loss about what is going on :(
Ping @minrk. Any thoughts?
Are you still seeing this issue @kdubovikov?
It does seem like a networking problem, but I'm not sure what the best way to debug it would be. You could change the Hub command to an idle loop (while true; do sleep 10; done), then kubectl exec into the hub pod with bash and check whether you can reach the proxy with curl or a similar tool.
You could also try communicating with the proxy from another context (e.g. outside the cluster, another pod, etc.) to be sure that the proxy pod is accepting connections.
Do you have any NetworkPolicy config on the cluster?
@minrk, I don't think any NetworkPolicy is present. The cluster was set up using kubeadm. Could you clarify where I need to change the Hub command?
You can edit the jupyterhub command with:
kubectl edit deployment hub
and change the command that looks like:
- command:
  - jupyterhub
  - --config
  - /srv/jupyterhub_config.py
  - --upgrade-db
to
- command:
  - sh
  - -c
  - while true; do sleep 10; done
This will create a new hub pod with the new command, which you can kubectl exec -it into.
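Once you have a shell in the idle hub pod, a plain TCP check is one way to see whether the proxy services accept connections at all. The following is a minimal sketch using only the Python standard library, assuming python3 is available in the hub image. The service names and ports are taken from the kubectl get all output earlier in this thread; the function names can_connect and check_all are made up for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Service names and ports as listed in the namespace status above.
# Adjust them if your release uses different names (assumption).
TARGETS = [("proxy-api", 8001), ("proxy-http", 8000), ("proxy-public", 80)]


def check_all(targets=TARGETS, timeout: float = 3.0) -> dict:
    """Map each 'host:port' to whether a TCP connection could be opened."""
    return {
        "%s:%d" % (host, port): can_connect(host, port, timeout)
        for host, port in targets
    }
```

Running print(check_all()) from a python3 prompt inside the pod shows which of the services are reachable. A False for proxy-api while the others are True would point at that one Service rather than at pod networking in general.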
Did you find a solution? I also have this problem. Can anyone help? Thanks to all of you.
Name: hub-86d676cf88-jw8ws
Namespace: jupyterhubtest
Node: 192.168.0.5/192.168.0.5
Start Time: Tue, 10 Apr 2018 11:34:12 +0800
Labels: app=jupyterhub
component=hub
heritage=Tiller
name=hub
pod-template-hash=4282327944
release=jupyterhubfork8s
Status: Running
IP: 172.18.0.26
Controllers: ReplicaSet/hub-86d676cf88
Containers:
hub-container:
Container ID: docker://550c1ae33d73c965a87a50bd87f2b87fcafa498f3b4a7e59b807828ef15cea63
Image: jupyterhub/k8s-hub:4b122ad
Image ID: docker-pullable://jupyterhub/k8s-hub@sha256:b1fb9dd9eec9a9aab583addd8f03fd035494681ac224cfaa55126de442eeecd3
Port: 8081/TCP
Command:
jupyterhub
--config
/srv/jupyterhub_config.py
Requests:
cpu: 200m
memory: 512Mi
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 10 Apr 2018 12:06:03 +0800
Finished: Tue, 10 Apr 2018 12:06:04 +0800
Ready: False
Restart Count: 11
Volume Mounts:
/etc/jupyterhub/config/ from config (rw)
/etc/jupyterhub/secret/ from secret (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-lwmrd (ro)
Environment Variables:
SINGLEUSER_IMAGE: jupyterhub/k8s-singleuser-sample:5d060de
JPY_COOKIE_SECRET:
POD_NAMESPACE: jupyterhubtest (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub-config
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub-secret
hub-token-lwmrd:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-lwmrd
QoS Class: Burstable
Tolerations:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
32m 32m 1 {default-scheduler } Normal Scheduled Successfully assigned hub-86d676cf88-jw8ws to 192.168.0.5
32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "config"
32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "secret"
32m 32m 1 {kubelet 192.168.0.5} Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "hub-token-lwmrd"
32m 32m 1 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulling pulling image "jupyterhub/k8s-hub:4b122ad"
32m 32m 1 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulled Successfully pulled image "jupyterhub/k8s-hub:4b122ad"
32m 31m 4 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Created Created container
32m 31m 4 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Started Started container
32m 31m 3 {kubelet 192.168.0.5} spec.containers{hub-container} Normal Pulled Container image "jupyterhub/k8s-hub:4b122ad" already present on machine
32m 17m 67 {kubelet 192.168.0.5} spec.containers{hub-container} Warning BackOff Back-off restarting failed container
32m 2m 135 {kubelet 192.168.0.5} Warning FailedSync Error syncing pod
I'm seeing this on GKE. We were running v0.6 and tried to upgrade to the latest chart. After some helm failures I reverted to v0.6 but ran into this. I've tried deleting the pods and deployments. I'll do some debugging.
There's no curl or wget in the pod. With python3+requests I can confirm the tornado.curl_httpclient.CurlError error that the proxy-api endpoint times out. proxy-public and proxy-http are responsive.
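For anyone who wants to repeat this check without installing requests, here is a rough standard-library sketch of the same kind of probe. It assumes the proxy API is reachable at http://proxy-api:8001 (the service name and port from the listing earlier in the thread) and that the token is available in the CONFIGPROXY_AUTH_TOKEN environment variable shown in the pod description. The function name probe_proxy_api is hypothetical.

```python
import urllib.error
import urllib.request


def probe_proxy_api(base_url: str, token: str, timeout: float = 5.0) -> str:
    """GET <base_url>/api/routes and describe the outcome as a short string."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/routes",
        headers={"Authorization": "token " + token},
    )
    # Ignore any http_proxy/https_proxy environment settings so the request
    # goes straight to the in-cluster service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=timeout) as resp:
            return "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The proxy answered, so the network path works (e.g. 403 for a bad token).
        return "HTTP %d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # No answer at all: timeout, refused connection, or DNS failure.
        return "unreachable: %s" % exc
```

From inside the hub pod this could be called as probe_proxy_api("http://proxy-api:8001", os.environ["CONFIGPROXY_AUTH_TOKEN"]) after import os. Any "HTTP ..." result means the Service is routing traffic; an "unreachable" result matches the timeout seen in the hub traceback.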
The cluster has:
addonsConfig:
networkPolicyConfig:
disabled: true
The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.
I ran into this again on Azure after a helm upgrade. Unlike last time, I couldn't access any of the service endpoints. I have a feeling this latest occurrence is due to the infrastructure and not z2jh, but I thought I'd leave a trail marker.
Hmmm, @ryanlovett wrote:
The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.
Does this mean that our proxy pod did not restart as it should have, or that it persisted some faulty state that needed to be refreshed? Any ideas on what state was outdated?
@ryanlovett we have now released 0.7.0, and any feedback on upgrading to it would be very welcome. If you do upgrade, make sure to follow the upgrade instructions in the changelog.md file.
Any thoughts on this?
I found that these errors happen when the hub and proxy are updated at the same time. The hub crashes if it fails to communicate with the proxy, but it only detects the failure 20 seconds later, and until then the hub can appear to be functional. The next time we bump the JupyterHub version we will get https://github.com/jupyterhub/jupyterhub/pull/2750, which will make the hub pod stay unavailable until it actually functions reliably.
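To illustrate the general idea of staying "not ready" until a dependency responds, here is a small generic sketch. It is not the change made in the linked pull request, only an assumption-laden illustration: wait_until is a made-up helper that polls any check function until it passes or a deadline expires.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 2.0) -> bool:
    """Poll check() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() + interval > deadline:
            # Another sleep would overshoot the deadline, so give up now.
            return False
        time.sleep(interval)
```

A startup script could call wait_until with a function that tests the connection to the proxy API and only report readiness once it returns True, so a hub that cannot reach the proxy is never marked as available.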
Perhaps we bump it along with #1422, or earlier.