Cert-manager: Helm-Chart: Increase timeouts of probes of webhook-deployment

Created on 16 Aug 2020 · 4 Comments · Source: jetstack/cert-manager

Describe the bug:

The livenessProbe and readinessProbe of the webhook deployment use the default timeout of 1s. This seems to be too short, and the pod gets restarted very often.

NAME                                       READY   STATUS    RESTARTS   AGE   IP
cert-manager-webhook-5677b9b48d-jdz5l      1/1     Running   108        21d   10.32.0.16

Events:
  Type     Reason     Age                     From             Message
  ----     ------     ----                    ----             -------
  Warning  Unhealthy  47m (x71 over 6h8m)     kubelet, server  Liveness probe failed: Get http://10.32.0.16:6080/livez: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  4m25s (x220 over 6h8m)  kubelet, server  Readiness probe failed: Get http://10.32.0.16:6080/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
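
For reference, Kubernetes sets the timeout per probe through the `timeoutSeconds` field, which defaults to 1 second when omitted. The fragment below is a minimal sketch of what a longer timeout could look like on the webhook container. The paths and port are taken from the events above; the 5-second value is an illustrative assumption, not what the chart ships, and the other timing fields are simply the Kubernetes defaults written out.

```yaml
# Sketch only: container-level probe settings for the webhook Deployment.
# Paths and port come from the kubelet events above; the timeout value
# is an assumption for illustration, not the chart's actual setting.
livenessProbe:
  httpGet:
    path: /livez
    port: 6080
    scheme: HTTP
  timeoutSeconds: 5    # Kubernetes default is 1
  periodSeconds: 10    # Kubernetes default
  failureThreshold: 3  # Kubernetes default
readinessProbe:
  httpGet:
    path: /healthz
    port: 6080
    scheme: HTTP
  timeoutSeconds: 5    # Kubernetes default is 1
  periodSeconds: 10    # Kubernetes default
  failureThreshold: 3  # Kubernetes default
```

With `failureThreshold: 3`, the kubelet restarts the container only after three consecutive liveness failures, so a higher timeout mainly helps when individual responses are slow rather than when the endpoint is down.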

Expected behaviour:
The webhook-pod is only restarted if needed :wink:

Steps to reproduce the bug:
No special steps

Anything else we need to know?:
I can do a PR if desired to increase the timeout.

Environment details:

Kubernetes version (e.g. v1.10.2): 1.18.8
Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): bare-metal
cert-manager version (e.g. v0.4.0): 0.16.0
Install method (e.g. helm or static manifests): helm

/kind bug

Labels: area/deploy · help wanted · kind/feature · priority/backlog

ckotzbauer

All 4 comments

Hi @ckotzbauer, you should be able to do a PR that extends the Helm chart with configurable values for the probe timeouts. You can probably crib from another Helm chart with similar probe configuration, e.g. haproxy-ingress.

That said, 1s is a long time for cluster-internal requests. In my tests this health probe takes ~10 ms, so you might have something else going on.

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-webhook-66976469d7-hhhp8      1/1     Running   0          50d

# time curl -i 10.20.55.43:6080/healthz
HTTP/1.1 200 OK
Date: Sun, 16 Aug 2020 18:02:24 GMT
Content-Length: 0


real    0m0.010s
user    0m0.004s
sys     0m0.004s

whereisaaron on 16 Aug 2020

Agreed with @whereisaaron here, a PR that makes the timeout configurable is welcome!
It might be worth investigating why the probe takes a second, as this will affect the speed of cert-manager and all deployments.

/help
/priority backlog
/area deploy
/remove-kind bug
/kind feature

meyskens on 17 Aug 2020

@meyskens:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
