cert-manager: webhook serving the wrong certificate

Created on 24 Mar 2020 · 5 comments · Source: jetstack/cert-manager

Describe the bug:
cert-manager logs:
E0324 12:17:12.963122 1 controller.go:140] cert-manager/controller/clusterissuers "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.sre.svc:443/mutate?timeout=30s: x509: certificate is valid for acme-cert-manager-webhook, acme-cert-manager-webhook.sre, acme-cert-manager-webhook.sre.svc, not cert-manager-webhook.sre.svc" "key"="letsencrypt-production"

I have deleted the certificate, but it keeps being regenerated with the wrong DNS names.
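
For reference, the DNS names actually present on the webhook's serving certificate can be checked directly; a minimal sketch, assuming the serving certificate is stored in a secret named cert-manager-webhook-tls in the sre namespace (adjust both to your install):

```sh
# Dump the Subject Alternative Names of the webhook serving certificate.
# Secret name "cert-manager-webhook-tls" and namespace "sre" are assumptions.
kubectl -n sre get secret cert-manager-webhook-tls -o jsonpath='{.data.tls\.crt}' \
  | base64 -d \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```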

Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.15
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): GKE
  • cert-manager version (e.g. v0.4.0): 0.14.0
  • Install method (e.g. helm or static manifests): Helm

/kind bug


All 5 comments

This is because, for the last few versions, the current Helm chart has made it difficult to deploy with a different namespace/release name, combined with a bug in Helm.
Some more info: https://github.com/helm/helm/issues/7735

I am seeing the same issue. The cluster is on AWS EKS, Kubernetes version 1.14.9.

I am getting this error when cert-manager tries to generate a certificate via the already-created Let's Encrypt ClusterIssuer.

I0325 14:56:10.310305 1 controller.go:129] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="featureflags/ffs-api-feature-flags-service-api"
E0325 14:56:10.316197 1 controller.go:131] cert-manager/controller/ingress-shim "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.slr-system.svc:443/mutate?timeout=30s: x509: certificate is valid for cert-manager-webhook, cert-manager-webhook.cert-manager, cert-manager-webhook.cert-manager.svc, not cert-manager-webhook.slr-system.svc" "key"="featureflags/ffs-api-feature-flags-service-api"

These are the webhook logs:
I0325 08:40:45.540230 1 main.go:64] "msg"="enabling TLS as certificate file flags specified"
I0325 08:40:45.540567 1 server.go:126] "msg"="listening for insecure healthz connections" "address"=":6080"
I0325 08:40:45.540612 1 server.go:138] "msg"="listening for secure connections" "address"=":10250"
I0325 08:40:45.540694 1 server.go:155] "msg"="registered pprof handlers"
I0325 08:40:45.541014 1 tls_file_source.go:144] "msg"="detected private key or certificate data on disk has changed. reloading certificate"
2020/03/25 14:50:55 http: TLS handshake error from 10.0.66.48:39660: remote error: tls: bad certificate
2020/03/25 14:51:00 http: TLS handshake error from 10.0.66.48:39750: remote error: tls: bad certificate
2020/03/25 14:51:10 http: TLS handshake error from 10.0.66.48:39820: remote error: tls: bad certificate
2020/03/25 14:51:30 http: TLS handshake error from 10.0.66.48:39974: remote error: tls: bad certificate
2020/03/25 14:52:10 http: TLS handshake error from 10.0.66.48:40328: remote error: tls: bad certificate
2020/03/25 14:53:30 http: TLS handshake error from 10.0.66.48:41022: remote error: tls: bad certificate
2020/03/25 14:56:10 http: TLS handshake error from 10.0.66.48:42490: remote error: tls: bad certificate
2020/03/25 15:01:10 http: TLS handshake error from 10.0.66.48:45270: remote error: tls: bad certificate
2020/03/25 15:06:10 http: TLS handshake error from 10.0.66.48:47966: remote error: tls: bad certificate

I tried deleting the certificate and restarting the webhook pod, but I am still getting these errors. I have been installing cert-manager and its components via the static manifests. The version is v0.13.1.
I have moved cert-manager to a different namespace, though. Could that be causing this? I didn't have this issue before.
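
For what it's worth, the mismatch can also be seen by checking which service and namespace the API server is told to call in the webhook configurations, and comparing that with the names on the serving certificate from the error above; a minimal check, assuming the default object name cert-manager-webhook from the static manifests:

```sh
# Show the service reference the API server uses when calling the webhook.
# The object name "cert-manager-webhook" is the default from the static
# manifests -- adjust it if yours differs.
kubectl get mutatingwebhookconfiguration cert-manager-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.service}'
kubectl get validatingwebhookconfiguration cert-manager-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.service}'
```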

Do you have any idea?

@bogdanalov-sw There are two solutions, I think (a rough sketch of both is after this list):

  • Simple solution: go with the Helm parameters the chart assumes (release name: cert-manager; namespace: cert-manager). I went for this option because I could, and it is much easier (I was using non-defaults before, which worked, but with the newer Helm manifests it doesn't).

  • If you can't: I think all the issues come from the static CRDs installed before the Helm deployment (assuming this has reached the Helm chart: https://github.com/jetstack/cert-manager/pull/2733 ). You have to patch the static CRD manifests, with sed for example; there are references to the namespace and the service name of the webhook (some are obvious, others aren't, so just grep for cert-manager and go through the hits case by case).
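
A rough sketch of both options (Helm 3 syntax assumed; the file name, the sre namespace and the acme-cert-manager-webhook service name below are placeholders taken from this thread, and the sed expressions are only an illustration, so grep first and review each hit):

```sh
# Option 1: install with the release name and namespace the chart assumes.
helm repo add jetstack https://charts.jetstack.io
kubectl create namespace cert-manager
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --version v0.14.0

# Option 2: keep a custom namespace/release name, but patch the static CRD
# manifests applied before the Helm release so every webhook namespace and
# service reference matches your deployment. The file name, "sre" and
# "acme-cert-manager-webhook" are placeholders from this thread.
grep -n 'cert-manager' cert-manager.crds.yaml
sed -e 's/namespace: cert-manager/namespace: sre/g' \
    -e 's/name: cert-manager-webhook/name: acme-cert-manager-webhook/g' \
    cert-manager.crds.yaml | kubectl apply -f -
```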

And in case it was not clear: #2752 has been fixed and is THE solution -> https://github.com/jetstack/cert-manager/issues/2752#issuecomment-618517062
