Zero-to-jupyterhub-k8s: hub pod's readinessProbe too strict

Created on 8 Dec 2020 · 7 comments · Source: jupyterhub/zero-to-jupyterhub-k8s

Bug description

We use a custom authenticator for JupyterHub (on Kubernetes) that takes more than 10 seconds to authenticate. After upgrading from helm chart version 0.9.1 (where everything was working) to 0.10.6, the redirect to /hub/ after login times out after 1 minute, resulting in HTTP 503 (logs below).
A refresh of the browser (repeating the call to /hub) then works without problems.
It seems to be related to the time the authenticator takes: with 5 seconds everything is fine, with more than 10 seconds it is not.

Expected behaviour

The call to /hub succeeds (and triggers the spawn).

Actual behaviour

HTTP 503

How to reproduce

Using the dummy authenticator below reproduces the issue:

```python
import time

def authenticate(self, handler, data):
    print("enter authenticate")
    time.sleep(10)  # blocking sleep; 5 s is fine, 10 s is not
    return {'name': 'testuser'}
```
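Note that `time.sleep` is a synchronous call, so it blocks the hub's event loop for the full duration, and nothing else (including `/hub/health`) can be served in the meantime; the hub logs below show exactly such a 10 s gap between health responses. A minimal, self-contained asyncio sketch (no JupyterHub imports; the function names are illustrative) of why the blocking variant starves a concurrently scheduled health check while an awaited sleep does not:

```python
import asyncio
import time

async def blocking_auth():
    """Mimics the dummy authenticator: a synchronous sleep that blocks the loop."""
    time.sleep(0.5)
    return {'name': 'testuser'}

async def nonblocking_auth():
    """Same delay, but awaited, so the event loop stays responsive."""
    await asyncio.sleep(0.5)
    return {'name': 'testuser'}

async def health_latency(auth):
    """Schedule auth and a 'health check' together; report when health was served."""
    start = time.monotonic()

    async def health():
        return time.monotonic() - start

    _, served_after = await asyncio.gather(auth(), health())
    return served_after

if __name__ == '__main__':
    print(asyncio.run(health_latency(blocking_auth)))     # roughly 0.5: probe starved
    print(asyncio.run(health_latency(nonblocking_auth)))  # near 0: probe answered right away
```

With the blocking sleep, the readiness probe's requests pile up unanswered, which is how a strict probe ends up marking the pod unready.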

Your personal set up

JupyterHub (helm chart 0.10.6) on Kubernetes (EKS). On 0.9.1 it works fine.
Proxy logs:

```
6:42:45.269 [ConfigProxy] info: 200 GET /api/routes
16:43:43.427 [ConfigProxy] info: 200 GET /api/routes
16:43:43.437 [ConfigProxy] info: 200 GET /api/routes
16:43:51.066 [ConfigProxy] debug: PROXY WEB /hub/ to http://hub:8081
16:43:51.092 [ConfigProxy] debug: PROXY WEB /hub/login?next=%2Fhub%2F to http://hub:8081
16:44:01.129 [ConfigProxy] debug: PROXY WEB /hub/ to http://hub:8081
16:44:43.443 [ConfigProxy] info: 200 GET /api/routes
16:45:01.130 [ConfigProxy] error: 503 GET /hub/ socket hang up
16:45:01.437 [ConfigProxy] debug: PROXY WEB /favicon.ico to http://hub:8081
16:45:43.443 [ConfigProxy] info: 200 GET /api/routes
16:46:43.453 [ConfigProxy] info: 200 GET /api/routes
```

Hub logs:

```
[D 2020-12-08 16:43:50.526 JupyterHub log:181] 200 GET /hub/health (@xxxx) 0.73ms
[I 2020-12-08 16:43:51.070 JupyterHub log:181] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@x) 0.90ms
enter authenticate
[D 2020-12-08 16:44:01.108 JupyterHub base:519] Setting cookie jupyterhub-session-id: {'httponly': True}
[D 2020-12-08 16:44:01.109 JupyterHub base:523] Setting cookie for schaper: jupyterhub-hub-login
[D 2020-12-08 16:44:01.109 JupyterHub base:519] Setting cookie jupyterhub-hub-login: {'httponly': True, 'path': '/hub/'}
[I 2020-12-08 16:44:01.110 JupyterHub base:755] User logged in: testuser
[I 2020-12-08 16:44:01.110 JupyterHub log:181] 302 GET /hub/login?next=%2Fhub%2F -> /hub/ (@x) 10015.43ms
[D 2020-12-08 16:44:01.114 JupyterHub log:181] 200 GET /hub/health (@x) 2.52ms
[D 2020-12-08 16:44:01.116 JupyterHub log:181] 200 GET /hub/health (@x) 3.94ms
[D 2020-12-08 16:44:01.116 JupyterHub log:181] 200 GET /hub/health (@x) 4.06ms
...
[D 2020-12-08 16:44:34.527 JupyterHub log:181] 200 GET /hub/health (@x) 0.66ms
[D 2020-12-08 16:44:36.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.69ms
[D 2020-12-08 16:44:38.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.65ms
[D 2020-12-08 16:44:40.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.64ms
[D 2020-12-08 16:44:42.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.61ms
[D 2020-12-08 16:44:43.439 JupyterHub proxy:788] Proxy: Fetching GET http://proxy-api:8001/api/routes
[I 2020-12-08 16:44:43.444 JupyterHub proxy:320] Checking routes
[D 2020-12-08 16:44:44.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.75ms
[D 2020-12-08 16:44:46.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.81ms
...
[D 2020-12-08 16:45:00.526 JupyterHub log:181] 200 GET /hub/health (@x) 0.82ms
[D 2020-12-08 16:45:01.133 JupyterHub pages:654] No template for 503
[I 2020-12-08 16:45:01.169 JupyterHub log:181] 200 GET /hub/error/503?url=%2Fhub%2F (@x) 35.98ms
[D 2020-12-08 16:45:01.441 JupyterHub log:181] 200 GET /favicon.ico (@x) 0.98ms
```

bug

Most helpful comment

:) Now I know what it is. It is actually the helm chart's readiness probe. It fails while the authenticator is running: 3 failures * 2 seconds = 6 seconds until the pod is marked unready, which then causes the subsequent issues. That's why 5 seconds works and 10 seconds does not. So at least I have a workaround: change the readiness probe.

All 7 comments

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:

Welcome to the Jupyter community! :tada:

Upgrade with --set hub.networkPolicy.enabled=false or configure it in your passed config. If this still persists, I assume you have an issue with JupyterHub itself rather than this Helm chart deploying JupyterHub. If it goes away, I assume you had some custom logic that required network access to the hub pod directly instead of going through the proxy which was failing.

Btw, did you reproduce this with the dummy authenticator you exemplified, or do you believe it would reproduce with that authenticator?

Thanks for the very quick reply! It's not the network policy; that's the first thing I disabled in the upgrade. And yes, it reproduces with the dummy authenticator. Changing the sleep to 5 seconds makes it work.
When I try to manually connect from the proxy to the hub while a long authentication is ongoing, I see the same: it hangs and times out after a minute. Shortly before and after, everything works.

As further info: it really is the upgrade that caused this behaviour, and not something else that may have happened in our environment by chance at the same time. Just downgrading back to 0.9.1 (and undoing the minimal changes we needed in values to support 0.10.6: disabling the network policy and a different place for the imagePullSecret) works; upgrading again does not. All with the dummy authenticator above.
Looking further... but any hint on what change could relate to this would be appreciated.

:) Now I know what it is. It is actually the helm chart's readiness probe. It fails while the authenticator is running: 3 failures * 2 seconds = 6 seconds until the pod is marked unready, which then causes the subsequent issues. That's why 5 seconds works and 10 seconds does not. So at least I have a workaround: change the readiness probe.
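As a sketch of that workaround: recent chart versions expose the hub probe settings in values.yaml (the exact keys may vary by chart version; on 0.10.6 you may instead need to patch the hub Deployment directly, and the key names below are an assumption based on newer charts). Raising `failureThreshold` keeps a briefly unresponsive hub from being marked unready:

```
hub:
  readinessProbe:
    # Tolerate many consecutive probe failures before marking the hub pod
    # unready, so a handler that blocks the event loop for a while survives.
    periodSeconds: 2
    failureThreshold: 30   # 30 * 2 s = 60 s of tolerated unresponsiveness
```

With the chart's defaults of 3 failures at a 2-second period, any handler that blocks longer than about 6 seconds trips the probe, matching the observed 5 s vs 10 s boundary.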

Ah!!!! Nice catch isolating this!!! Thank you for your work on this @chrisbulgaria! This issue is related: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1732.
