Zero-to-jupyterhub-k8s: JupyterHub CrashLoopBackOff

Created on 10 Feb 2018 · 15 Comments · Source: jupyterhub/zero-to-jupyterhub-k8s

I am trying to spin up JupyterHub using Helm. All resources start successfully, but after a short time the hub pod enters CrashLoopBackOff.

Installation was performed using the following command:

helm install jupyterhub/jupyterhub --version=v0.6 --name=jupyterhub --namespace jupyterhub -f ./jupyterhub/config.yaml --timeout=1000000

I've also tested version 0.5 and got the same results.

Logs:

$ kubectl logs po/hub-56d985bfb8-vb6pl --namespace jupyterhub
[I 2018-02-10 08:21:27.439 JupyterHub app:830] Loading cookie_secret from env[JPY_COOKIE_SECRET]
[W 2018-02-10 08:21:27.673 JupyterHub app:955] No admin users, admin interface will be unavailable.
[W 2018-02-10 08:21:27.673 JupyterHub app:956] Add any administrative users to `c.Authenticator.admin_users` in config.
[I 2018-02-10 08:21:27.673 JupyterHub app:983] Not using whitelist. Any authenticated user will be allowed.
[I 2018-02-10 08:21:28.025 JupyterHub app:1528] Hub API listening on http://0.0.0.0:8081/hub/
[I 2018-02-10 08:21:28.026 JupyterHub app:1538] Not starting proxy
[I 2018-02-10 08:21:28.026 JupyterHub app:1544] Starting managed service cull-idle
[I 2018-02-10 08:21:28.026 JupyterHub service:266] Starting service 'cull-idle': ['/usr/local/bin/cull_idle_servers.py', '--timeout=3600', '--cull-every=600', '--url=http://127.0.0.1:8081/hub/api']
[I 2018-02-10 08:21:28.053 JupyterHub service:109] Spawning /usr/local/bin/cull_idle_servers.py --timeout=3600 --cull-every=600 --url=http://127.0.0.1:8081/hub/api
[I 2018-02-10 08:21:28.263 JupyterHub log:122] 200 GET /hub/api/users ([email protected]) 25.95ms
[E 2018-02-10 08:21:48.064 JupyterHub app:1623]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1621, in launch_instance_async
        yield self.start()
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/app.py", line 1569, in start
        yield self.proxy.check_routes(self.users, self._service_map)
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 294, in check_routes
        routes = yield self.get_all_routes()
      File "/usr/local/lib/python3.5/dist-packages/jupyterhub/proxy.py", line 589, in get_all_routes
        resp = yield self.api_request('', client=client)
    tornado.curl_httpclient.CurlError: HTTP 599: Connection timed out after 20000 milliseconds

Namespace status:

$ kubectl get all --namespace dljupyterhub

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/hub     1         1         1            0           22m
deploy/proxy   1         1         1            1           22m

NAME                  DESIRED   CURRENT   READY     AGE
rs/hub-5479595c8d     1         1         0         22m
rs/proxy-6fbf784dbd   1         1         1         22m

NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/hub     1         1         1            0           22m
deploy/proxy   1         1         1            1           22m

NAME                  DESIRED   CURRENT   READY     AGE
rs/hub-5479595c8d     1         1         0         22m
rs/proxy-6fbf784dbd   1         1         1         22m

NAME                        READY     STATUS             RESTARTS   AGE
po/hub-5479595c8d-7qhzb     0/1       CrashLoopBackOff   7          22m
po/proxy-6fbf784dbd-pt5q6   2/2       Running            0          22m

NAME                               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
svc/glusterfs-dynamic-hub-db-dir   ClusterIP      10.104.97.71    <none>        1/TCP                        15m
svc/hub                            ClusterIP      10.106.231.99   <none>        8081/TCP                     22m
svc/proxy-api                      ClusterIP      10.104.175.74   <none>        8001/TCP                     22m
svc/proxy-http                     ClusterIP      10.106.46.200   <none>        8000/TCP                     22m
svc/proxy-public                   LoadBalancer   10.109.72.153   <pending>     80:32500/TCP,443:31790/TCP   22m

Contents of `config.yaml`

hub:
  cookieSecret: "aaa"
proxy:
  secretToken: "bbb"
singleuser:
  storage:
    capacity: 2Gi
    dynamic:
      storageClass: gluster-heketi
ingress:
    enabled: true
    hosts:
     - host1

configuration question

kdubovikov

All 15 comments

Heya @kdubovikov! Thanks for filing this issue!

It looks like the hub pod cannot reach the proxy pod. Are pod networking and kube-proxy working properly? I suspect this is an OpenStack installation / bare-metal setup. Are other services on the cluster working fine? Does https://scanner.heptio.com/ find any issues?

yuvipanda on 13 Feb 2018

Hey @yuvipanda, thanks for the response. All other services are working fine (we also run glusterfs). I ran the tests and no issues were found:

Ran 125 of 710 Specs in 3156.548 seconds
SUCCESS! -- 125 Passed | 0 Failed | 0 Pending | 585 Skipped PASS

Also, I am able to run JupyterHub with KubeSpawner outside of the cluster without issues.

kdubovikov on 14 Feb 2018

Hmm, in that case I'm at a loss about what is going on :(

yuvipanda on 22 Feb 2018

Ping @minrk. Any thoughts?

Are you still seeing this issue @kdubovikov?

willingc on 27 Feb 2018

It does seem like a networking problem, but I'm not sure what the best way to debug it would be. You could edit the Hub command to run `while true; do sleep 10; done`, then run `kubectl exec -it hub-pod bash` (using your hub pod's name) and see whether you can communicate with the proxy via curl, etc.

You could also try communicating with the proxy from another context (e.g. outside the cluster, another pod, etc.) to be sure that the proxy pod is accepting connections.

Do you have any NetworkPolicy config on the cluster?

minrk on 28 Feb 2018

@minrk, I think no NetworkPolicy is present. The cluster was set up using kubeadm. Could you clarify where I need to change the Hub command?

kdubovikov on 1 Mar 2018

You can edit the jupyterhub command with:

kubectl edit deployment hub

and change the command that looks like:

      - command:
        - jupyterhub
        - --config
        - /srv/jupyterhub_config.py
        - --upgrade-db

to:

      - command:
        - sh
        - -c
        - while true; do sleep 10; done

This will create a new hub pod with the new command, which you can `kubectl exec -it` into.
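
For a non-interactive variant of the same edit, the command override could be expressed as a JSON Patch and handed to `kubectl patch`. The sketch below only builds and prints the command; nothing is executed. It assumes the hub container is the first container (index 0) in the pod template, and the helper names `build_sleep_patch` and `kubectl_patch_command` are hypothetical.

```python
import json
import shlex


def build_sleep_patch(container_index=0):
    """Return a JSON Patch that swaps the container command for an idle loop.

    Assumption: the hub container sits at `container_index` in the pod template.
    """
    path = "/spec/template/spec/containers/{}/command".format(container_index)
    return [
        {
            "op": "replace",
            "path": path,
            "value": ["sh", "-c", "while true; do sleep 10; done"],
        }
    ]


def kubectl_patch_command(namespace, deployment="hub", container_index=0):
    """Render the kubectl invocation as a shell-quoted string (not executed)."""
    patch = json.dumps(build_sleep_patch(container_index))
    args = [
        "kubectl", "patch", "deployment", deployment,
        "--namespace", namespace, "--type=json", "-p", patch,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(kubectl_patch_command("jupyterhub"))
```

Either way the change is temporary: a later `helm upgrade` or another `kubectl edit` should restore the original `jupyterhub` command.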

minrk on 2 Mar 2018

Did you find a solution? I also have this problem. Can anyone help? Thanks to all of you.

Name:           hub-86d676cf88-jw8ws
Namespace:      jupyterhubtest
Node:           192.168.0.5/192.168.0.5
Start Time:     Tue, 10 Apr 2018 11:34:12 +0800
Labels:         app=jupyterhub
                component=hub
                heritage=Tiller
                name=hub
                pod-template-hash=4282327944
                release=jupyterhubfork8s
Status:         Running
IP:             172.18.0.26
Controllers:    ReplicaSet/hub-86d676cf88
Containers:
  hub-container:
    Container ID:   docker://550c1ae33d73c965a87a50bd87f2b87fcafa498f3b4a7e59b807828ef15cea63
    Image:          jupyterhub/k8s-hub:4b122ad
    Image ID:       docker-pullable://jupyterhub/k8s-hub@sha256:b1fb9dd9eec9a9aab583addd8f03fd035494681ac224cfaa55126de442eeecd3
    Port:           8081/TCP
    Command:
      jupyterhub
      --config
      /srv/jupyterhub_config.py
    Requests:
      cpu:          200m
      memory:       512Mi
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 10 Apr 2018 12:06:03 +0800
      Finished:     Tue, 10 Apr 2018 12:06:04 +0800
    Ready:          False
    Restart Count:  11
    Volume Mounts:
      /etc/jupyterhub/config/ from config (rw)
      /etc/jupyterhub/secret/ from secret (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from hub-token-lwmrd (ro)
    Environment Variables:
      SINGLEUSER_IMAGE:         jupyterhub/k8s-singleuser-sample:5d060de
      JPY_COOKIE_SECRET:
      POD_NAMESPACE:            jupyterhubtest (v1:metadata.namespace)
      CONFIGPROXY_AUTH_TOKEN:
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  config:
    Type:       ConfigMap (a volume populated by a ConfigMap)
    Name:       hub-config
  secret:
    Type:       Secret (a volume populated by a Secret)
    SecretName: hub-secret
  hub-token-lwmrd:
    Type:       Secret (a volume populated by a Secret)
    SecretName: hub-token-lwmrd
QoS Class:      Burstable
Tolerations:
Events:
  FirstSeen  LastSeen  Count  From                   SubObjectPath                   Type     Reason                 Message
  ---------  --------  -----  ----                   -------------                   -------- ------                 -------
  32m        32m       1      {default-scheduler }                                   Normal   Scheduled              Successfully assigned hub-86d676cf88-jw8ws to 192.168.0.5
  32m        32m       1      {kubelet 192.168.0.5}                                  Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "config"
  32m        32m       1      {kubelet 192.168.0.5}                                  Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "secret"
  32m        32m       1      {kubelet 192.168.0.5}                                  Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "hub-token-lwmrd"
  32m        32m       1      {kubelet 192.168.0.5}  spec.containers{hub-container}  Normal   Pulling                pulling image "jupyterhub/k8s-hub:4b122ad"
  32m        32m       1      {kubelet 192.168.0.5}  spec.containers{hub-container}  Normal   Pulled                 Successfully pulled image "jupyterhub/k8s-hub:4b122ad"
  32m        31m       4      {kubelet 192.168.0.5}  spec.containers{hub-container}  Normal   Created                Created container
  32m        31m       4      {kubelet 192.168.0.5}  spec.containers{hub-container}  Normal   Started                Started container
  32m        31m       3      {kubelet 192.168.0.5}  spec.containers{hub-container}  Normal   Pulled                 Container image "jupyterhub/k8s-hub:4b122ad" already present on machine
  32m        17m       67     {kubelet 192.168.0.5}  spec.containers{hub-container}  Warning  BackOff                Back-off restarting failed container
  32m        2m        135    {kubelet 192.168.0.5}                                  Warning  FailedSync             Error syncing pod

yuandongfang on 10 Apr 2018

I'm seeing this on GKE. We were running v0.6 and tried to upgrade to the latest chart. After some helm failures I reverted to v0.6 but ran into this. I've tried deleting the pods and deployments. I'll do some debugging.

ryanlovett on 9 Jul 2018

There's no curl or wget in the pod. With python3 + requests I can confirm what the tornado.curl_httpclient.CurlError indicates: the proxy-api endpoint times out. proxy-public and proxy-http are responsive.

The cluster has:

addonsConfig:
  networkPolicyConfig:
    disabled: true
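
Because the hub image has no curl or wget, a reachability check of this kind can be done with Python alone. The following is a minimal sketch that uses only the standard library instead of requests. The service names and ports are taken from the `kubectl get all` output earlier in this issue, `can_connect` and `report` are hypothetical helper names, and the check only tests whether a TCP connection opens, not whether the proxy API answers correctly. It avoids f-strings so that it also runs on the Python 3.5 seen in the traceback.

```python
import socket

# Assumed targets: service names and ports as listed in the namespace status above.
DEFAULT_TARGETS = [("proxy-api", 8001), ("proxy-http", 8000), ("proxy-public", 80)]


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
    sock.close()
    return True


def report(targets=DEFAULT_TARGETS, timeout=5.0):
    """Print one line per target and return the list of unreachable ones."""
    unreachable = []
    for host, port in targets:
        ok = can_connect(host, port, timeout=timeout)
        print("{}:{} {}".format(host, port, "reachable" if ok else "NOT reachable"))
        if not ok:
            unreachable.append((host, port))
    return unreachable
```

Calling `report()` from a `python3` session inside the hub pod would then show which of the three services accept connections.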

ryanlovett on 9 Jul 2018

The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.
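
A mismatch like this could in principle be spotted by comparing a version-bearing label across the release's objects. The sketch below is only an illustration under stated assumptions: it expects the JSON produced by `kubectl get ... -o json` (a list with an `items` array), and the label key `chart` as well as the sample values are hypothetical, so substitute whatever label your chart version actually sets.

```python
import json


def find_mismatched(objects_json, expected, label_key="chart"):
    """Return (kind, name, value) for every item whose label differs from `expected`.

    `label_key` is an assumption; objects without the label are reported with None.
    """
    mismatched = []
    for item in json.loads(objects_json).get("items", []):
        meta = item.get("metadata") or {}
        value = (meta.get("labels") or {}).get(label_key)
        if value != expected:
            mismatched.append((item.get("kind"), meta.get("name"), value))
    return mismatched


if __name__ == "__main__":
    # Made-up sample data for illustration, not taken from a real cluster.
    sample = json.dumps({
        "items": [
            {"kind": "Service",
             "metadata": {"name": "hub", "labels": {"chart": "old-version"}}},
            {"kind": "Service",
             "metadata": {"name": "proxy-api", "labels": {"chart": "new-version"}}},
        ]
    })
    for kind, name, value in find_mismatched(sample, expected="old-version"):
        print("{} {} has {}".format(kind, name, value))
```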

ryanlovett on 9 Jul 2018

I ran into this again on Azure after a helm upgrade. Unlike last time, I couldn't access any of the service endpoints. I have a feeling this latest occurrence is due to the infrastructure and not z2jh, but I thought I'd leave a trail marker.

ryanlovett on 3 Aug 2018

Hmmm, @ryanlovett wrote:

The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade and now everything is working.

Does this mean that our proxy pod did not restart as it should have, or that it persisted some faulty state that needed to be refreshed? Any ideas on what state was outdated?

@ryanlovett we have now released 0.7.0; any feedback on upgrading to it would be very valuable. If you do upgrade, make sure to follow the upgrade instructions in the changelog.md file.

consideRatio on 4 Sep 2018

Any thoughts on this?

diegodorgam on 25 Jun 2019

I found that these errors happen when the hub and proxy get an update at the same time. The hub will crash if it fails to communicate with the proxy, but it only detects the failure 20 seconds later, and until then the hub can appear to be functional. The next time we bump the JupyterHub version, we will get to use https://github.com/jupyterhub/jupyterhub/pull/2750, which will make the hub pod stay unavailable until it actually functions reliably.

Perhaps we bump it along with #1422, or earlier.
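
The general idea of staying "not ready" until a dependency actually responds can be sketched as a small polling loop. This is not the code from the linked pull request, only an illustration; `wait_until_ready` is a hypothetical helper, and `check` stands for any callable that returns True once the proxy answers.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `check()` repeatedly until it returns True or `timeout` seconds pass.

    Returns True as soon as a check succeeds, False once the next attempt
    would fall after the deadline.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

A caller would report readiness only after `wait_until_ready(...)` returns True, so a hub that cannot reach its proxy never looks available in the first place.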

consideRatio on 30 Sep 2019
