Describe the bug:
NAME                                    READY   STATUS    RESTARTS   AGE   IP
cert-manager-webhook-5677b9b48d-jdz5l   1/1     Running   108        21d   10.32.0.16
Events:
Type     Reason     Age                     From             Message
----     ------     ----                    ----             -------
Warning  Unhealthy  47m (x71 over 6h8m)     kubelet, server  Liveness probe failed: Get http://10.32.0.16:6080/livez: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  4m25s (x220 over 6h8m)  kubelet, server  Readiness probe failed: Get http://10.32.0.16:6080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Expected behaviour:
The webhook-pod is only restarted if needed :wink:
Steps to reproduce the bug:
No special steps
Anything else we need to know?:
I can open a PR to increase the timeout if desired.
Environment details:
/kind bug
Hi @ckotzbauer, you should be able to open a PR that extends the Helm chart with configurable values for the probe timeouts. You can probably crib from another Helm chart with similar probe configuration, e.g. haproxy-ingress.
That said, 1s is a long time for a cluster-internal request. In my tests this health probe responds in ~10ms, so you might have something else going on.
NAME                                    READY   STATUS    RESTARTS   AGE
cert-manager-webhook-66976469d7-hhhp8   1/1     Running   0          50d
# time curl -i 10.20.55.43:6080/healthz
HTTP/1.1 200 OK
Date: Sun, 16 Aug 2020 18:02:24 GMT
Content-Length: 0
real 0m0.010s
user 0m0.004s
sys 0m0.004s
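A single `curl` can easily miss intermittent slow responses, so it may help to sample the endpoint repeatedly and count how many requests exceed the 1s probe timeout. The following is a minimal sketch, not a definitive diagnostic: `sample_probe` and `count_slow` are hypothetical helper names, the 5s `--max-time` cap is an arbitrary assumption, and the address is simply the one from the output above.

```shell
#!/bin/sh
# Hypothetical helper: read latencies (seconds, one per line) from stdin and
# print how many exceed the threshold given as $1 (default: 1 second).
count_slow() {
  awk -v limit="${1:-1}" '$1 + 0 > limit + 0 { n++ } END { print n + 0 }'
}

# Hypothetical helper: request the URL in $1 a total of $2 times (default 100)
# and print curl's total request time for each attempt, one per line.
# --max-time 5 is an assumed cap so that a hung request cannot block the loop.
sample_probe() {
  url="$1"
  n="${2:-100}"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# Usage (adjust the pod IP for your cluster):
#   sample_probe http://10.20.55.43:6080/healthz 100 | count_slow 1
```

A non-zero count would suggest the timeouts are real latency spikes rather than a kubelet-side problem; a count of zero from inside the cluster would point towards the node or network path the kubelet uses.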
Agreed with @whereisaaron here, a PR that makes the probe timeout configurable is welcome!
It might be worth investigating why the probe takes a second, as this will affect the speed of cert-manager and all deployments.
/help
/priority backlog
/area deploy
/remove-kind bug
/kind feature
@meyskens:
This request has been marked as needing help from a contributor.
Great, thanks @whereisaaron @meyskens for your feedback! I will open a PR for this.
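Until the chart exposes these values, one possible stopgap is to patch the probe timeouts directly on the Deployment. This is only a sketch under several assumptions: the Deployment is named `cert-manager-webhook` (inferred from the pod name), it lives in a `cert-manager` namespace, the webhook is the first container, and 5 seconds is an arbitrary value. `probe_timeout_patch` and `apply_probe_timeout` are hypothetical helper names. A later `helm upgrade` would likely overwrite the change.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON Patch that sets timeoutSeconds on the
# liveness and readiness probes of the first container to $1 seconds.
probe_timeout_patch() {
  seconds="$1"
  base="/spec/template/spec/containers/0"
  printf '[{"op":"add","path":"%s/livenessProbe/timeoutSeconds","value":%d},' "$base" "$seconds"
  printf '{"op":"add","path":"%s/readinessProbe/timeoutSeconds","value":%d}]\n' "$base" "$seconds"
}

# Hypothetical helper: apply the patch with kubectl.
# Namespace and Deployment name are assumptions; adjust them for your install.
apply_probe_timeout() {
  kubectl -n cert-manager patch deployment cert-manager-webhook \
    --type=json -p "$(probe_timeout_patch "$1")"
}

# Usage:
#   apply_probe_timeout 5
```

Raising the timeout only hides the symptom, so it is still worth finding out why the endpoint sometimes takes longer than a second to answer.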