Cert-manager: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout

Created on 15 Feb 2020 · 63 Comments · Source: jetstack/cert-manager

Bugs should be filed for issues encountered whilst operating cert-manager.
You should first attempt to resolve your issues through the community support
channels, e.g. Slack, in order to rule out individual configuration errors.
Please provide as much detail as possible.

Describe the bug:

Cluster Issuer installation fails with TLS handshake timeout

kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager
Error from server (InternalError): error when creating "cert-issuer-letsencrypt-prd.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout

Expected behaviour:

kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager

works successfully and does not generate an error

Steps to reproduce the bug:

  1. Create cert-manager ns

    kubectl create ns cert-manager
    
  2. Install cert-manager using helm 3

    helm install cert-manager jetstack/cert-manager --namespace cert-manager
    NAME: cert-manager
    LAST DEPLOYED: Sat Feb 15 11:40:28 2020
    NAMESPACE: cert-manager
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    cert-manager has been deployed successfully!
    
  3. Add secret letsencrypt-prd

    kubectl -n cert-manager apply -f cert-cloudflare-api-key-secret.yml
    
  4. Create cluster-issuer

    kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager  
    

    cert-issuer-letsencrypt-prd.yml

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prd
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: xxxx
    privateKeySecretRef: 
      name: letsencrypt-prd
    solvers:
    - dns01:
        cloudflare:
          email: xxxx
          apiKeySecretRef:
            name: cloudflare-api-key-secret
            key: api-key
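
For what it's worth (an editorial note, not part of the original report): ClusterIssuer is a cluster-scoped resource, so the namespace field in its metadata is ignored, and for ClusterIssuers cert-manager looks up solver secrets in its own "cluster resource namespace" (by default the namespace cert-manager is deployed in). Once the apply succeeds, a quick sanity check of the issuer would be:

    kubectl get clusterissuer letsencrypt-prd
    kubectl describe clusterissuer letsencrypt-prd    # look for a Ready condition and ACME registration events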

Anything else we need to know?:

Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.17.2
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): bare-metal
  • cert-manager version (e.g. v0.4.0): 0.13.0
  • Install method (e.g. helm or static manifests): helm 3

/kind bug

Pods are running fine, no restarts

kubectl -n cert-manager get pods
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-c6cb4cbdf-djqt4              1/1     Running   0          37m
cert-manager-cainjector-76f7596c4-wsb4z   1/1     Running   0          37m
cert-manager-webhook-8575f88c85-xf7w2     1/1     Running   0          31m

CRDs are there

kubectl get crd | grep cert-manager
certificaterequests.cert-manager.io                     2020-02-15T10:39:37Z
certificates.cert-manager.io                            2020-02-15T10:39:38Z
challenges.acme.cert-manager.io                         2020-02-15T10:39:38Z
clusterissuers.cert-manager.io                          2020-02-15T10:39:39Z
issuers.cert-manager.io                                 2020-02-15T10:39:40Z
orders.acme.cert-manager.io                             2020-02-15T10:39:40Z

Logs of the cert-manager-webhook pod repeatedly show http: TLS handshake error from 10.42.152.128:5067: EOF

kubectl -n cert-manager logs cert-manager-webhook-8575f88c85-xf7w2
I0215 10:47:05.409158       1 main.go:64]  "msg"="enabling TLS as certificate file flags specified"  
I0215 10:47:05.409423       1 server.go:126]  "msg"="listening for insecure healthz connections"  "address"=":6080"
I0215 10:47:05.409471       1 server.go:138]  "msg"="listening for secure connections"  "address"=":10250"
I0215 10:47:05.409495       1 server.go:155]  "msg"="registered pprof handlers"  
I0215 10:47:05.409672       1 tls_file_source.go:144]  "msg"="detected private key or certificate data on disk has changed. reloading certificate"  
2020/02/15 10:48:46 http: TLS handshake error from 10.42.152.128:25427: EOF
2020/02/15 10:53:56 http: TLS handshake error from 10.42.152.128:48126: EOF
2020/02/15 10:59:06 http: TLS handshake error from 10.42.152.128:21683: EOF
2020/02/15 11:04:16 http: TLS handshake error from 10.42.152.128:9457: EOF
2020/02/15 11:09:26 http: TLS handshake error from 10.42.152.128:41640: EOF
2020/02/15 11:14:36 http: TLS handshake error from 10.42.152.128:56638: EOF
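
One way to narrow down where the handshake fails (a debugging sketch, not from the original report) is to hit the webhook service directly from a throwaway pod inside the cluster and see whether the TLS handshake completes from there:

    kubectl run webhook-probe --rm -it --restart=Never --image=curlimages/curl --command -- \
      curl -vk --max-time 10 https://cert-manager-webhook.cert-manager.svc:443/

If this also hangs mid-handshake, the problem is more likely in the pod network (CNI/MTU) than in cert-manager's certificates.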

Here are the logs from the cert-manager pod

kubectl -n cert-manager logs cert-manager-c6cb4cbdf-fzdmj
I0215 12:23:31.410690       1 start.go:76] cert-manager "msg"="starting controller"  "git-commit"="6d9200f9d" "version"="v0.13.0"
W0215 12:23:31.410750       1 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0215 12:23:31.411825       1 controller.go:167] cert-manager/controller/build-context "msg"="configured acme dns01 nameservers" "nameservers"=["10.43.0.10:53"] 
I0215 12:23:31.412082       1 controller.go:130] cert-manager/controller "msg"="starting leader election"  
I0215 12:23:31.412188       1 metrics.go:202] cert-manager/metrics "msg"="listening for connections on" "address"="0.0.0.0:9402" 
I0215 12:23:31.412836       1 leaderelection.go:242] attempting to acquire leader lease  kube-system/cert-manager-controller...
I0215 12:24:50.921660       1 leaderelection.go:252] successfully acquired lease kube-system/cert-manager-controller
I0215 12:24:50.922007       1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered"  "type"="selfsigned"
I0215 12:24:50.922139       1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered"  "type"="venafi"
I0215 12:24:50.922149       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-selfsigned" 
I0215 12:24:50.922214       1 controller.go:74] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="starting control loop"  
I0215 12:24:50.922245       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-venafi" 
I0215 12:24:50.922291       1 controller.go:74] cert-manager/controller/certificaterequests-issuer-venafi "msg"="starting control loop"  
I0215 12:24:50.922307       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="clusterissuers" 
I0215 12:24:50.922338       1 controller.go:74] cert-manager/controller/clusterissuers "msg"="starting control loop"  
I0215 12:24:50.922376       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="webhook-bootstrap" 
I0215 12:24:50.922407       1 controller.go:74] cert-manager/controller/webhook-bootstrap "msg"="starting control loop"  
I0215 12:24:50.922412       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="issuers" 
I0215 12:24:50.922474       1 controller.go:74] cert-manager/controller/issuers "msg"="starting control loop"  
I0215 12:24:50.922537       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="orders" 
I0215 12:24:50.922578       1 controller.go:74] cert-manager/controller/orders "msg"="starting control loop"  
I0215 12:24:50.922602       1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered"  "type"="acme"
I0215 12:24:50.922711       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-acme" 
I0215 12:24:50.922736       1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered"  "type"="vault"
I0215 12:24:50.922740       1 controller.go:74] cert-manager/controller/certificaterequests-issuer-acme "msg"="starting control loop"  
I0215 12:24:50.922855       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-vault" 
I0215 12:24:50.922904       1 controller.go:74] cert-manager/controller/certificaterequests-issuer-vault "msg"="starting control loop"  
I0215 12:24:50.922982       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificates" 
I0215 12:24:50.923031       1 controller.go:74] cert-manager/controller/certificates "msg"="starting control loop"  
I0215 12:24:50.923042       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="ingress-shim" 
I0215 12:24:50.923073       1 controller.go:74] cert-manager/controller/ingress-shim "msg"="starting control loop"  
I0215 12:24:51.025320       1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered"  "type"="ca"
I0215 12:24:51.025331       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="challenges" 
I0215 12:24:51.025385       1 controller.go:74] cert-manager/controller/challenges "msg"="starting control loop"  
I0215 12:24:51.025430       1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-ca" 
I0215 12:24:51.025473       1 controller.go:74] cert-manager/controller/certificaterequests-issuer-ca "msg"="starting control loop"  
I0215 12:24:51.122618       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-ca" 
I0215 12:24:51.122638       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cloudflare-api-key-secret" 
I0215 12:24:51.122650       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-tls" 
I0215 12:24:51.122669       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/sh.helm.release.v1.cert-manager.v1" 
I0215 12:24:51.122674       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cloudflare-api-key-secret" 
I0215 12:24:51.122705       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-token-5tdm7" 
I0215 12:24:51.122729       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-token-5tdm7" 
I0215 12:24:51.122780       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-token-6hpwz" 
I0215 12:24:51.122729       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/sh.helm.release.v1.cert-manager.v1" 
I0215 12:24:51.122805       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-token-6hpwz" 
I0215 12:24:51.122840       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/default-token-vlftn" 
I0215 12:24:51.122867       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/default-token-vlftn" 
I0215 12:24:51.122618       1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-cainjector-token-lzwbt" 
I0215 12:24:51.122903       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-cainjector-token-lzwbt" 
I0215 12:24:51.123241       1 controller.go:129] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="kube-system/dashboard-kubernetes-dashboard" 
I0215 12:24:51.123256       1 controller.go:197] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="ca certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-ca" "resource_namespace"="cert-manager" 
I0215 12:24:51.123281       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-ca" 
I0215 12:24:51.123420       1 controller.go:255] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="serving certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-tls" "resource_namespace"="cert-manager" 
I0215 12:24:51.123450       1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-tls"

Most helpful comment

I had an issue deploying a ClusterIssuer; the error was:

Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s: context deadline exceeded

Solved as:

$ helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v0.16.0 \
  --set installCRDs=true

$ kubectl delete mutatingwebhookconfiguration.admissionregistration.k8s.io cert-manager-webhook
$ kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cert-manager-webhook

All 63 comments

Hi @papanito, my configuration is the same, but with Kubernetes 1.16, and I tried to install cert-manager today using the static manifests instead of Helm.

I had exactly the same issue and solved it by following this page: https://cert-manager.io/docs/installation/compatibility/ . In particular, I used cert-manager-no-webhook.yaml instead of cert-manager.yaml. You can consider whether this option is suitable for you.

I have now finished my configuration and HTTPS works fine. I followed https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes . Note that I'm on bare metal.

output of kubectl get secret -n cert-manager

kubectl get secret -n cert-manager
NAME                                  TYPE                                  DATA   AGE
cert-manager-cainjector-token-lzwbt   kubernetes.io/service-account-token   3      42h
cert-manager-token-5tdm7              kubernetes.io/service-account-token   3      42h
cert-manager-webhook-ca               kubernetes.io/tls                     3      42h
cert-manager-webhook-tls              kubernetes.io/tls                     3      42h
cert-manager-webhook-token-6hpwz      kubernetes.io/service-account-token   3      42h
cloudflare-api-key-secret             Opaque                                1      41h
default-token-vlftn                   kubernetes.io/service-account-token   3      42h
sh.helm.release.v1.cert-manager.v1    helm.sh/release.v1                    1      42h

@laimison yep that worked.

Still not sure why it does not work with the webhook. Also not sure whether this is really the best approach, as

Doing this may expose your cluster to miss-configuration problems that in some cases could cause cert-manager to stop working altogether (i.e. if invalid types are set for fields on cert-manager resources).

Also interestingly, the webhook was working on the initial setup of my cluster back in January. I did add an additional node and updated the underlying OS. Not sure yet why it stopped working...

I am having the same symptom. And I am sure it is something with my Weave CNI, because it worked with AWS VPC CNI.

I even ran tcpdump on the cert-manager and cert-manager-webhook pods; surprisingly, there is no traffic on the webhook port.

I just ran into this with BKPR 1.4.0 on a new GKE 1.15 cluster. I tried recreating the node pool, but kept getting the same error during the BKPR install. Eventually I destroyed my cluster and built a new one.

Guys I am seeing the same issue. The cluster is in Amazon via EKS service. K8s version 1.14.9

I am getting this error when it tries to generate a certificate via the already-created Let's Encrypt ClusterIssuers.

I0325 14:56:10.310305 1 controller.go:129] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="featureflags/ffs-api-feature-flags-service-api"
E0325 14:56:10.316197 1 controller.go:131] cert-manager/controller/ingress-shim "msg"="re-queuing item due to error processing" "error"="Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.slr-system.svc:443/mutate?timeout=30s: x509: certificate is valid for cert-manager-webhook, cert-manager-webhook.cert-manager, cert-manager-webhook.cert-manager.svc, not cert-manager-webhook.slr-system.svc" "key"="featureflags/ffs-api-feature-flags-service-api"

These are the webhook logs:
I0325 08:40:45.540230 1 main.go:64] "msg"="enabling TLS as certificate file flags specified"
I0325 08:40:45.540567 1 server.go:126] "msg"="listening for insecure healthz connections" "address"=":6080"
I0325 08:40:45.540612 1 server.go:138] "msg"="listening for secure connections" "address"=":10250"
I0325 08:40:45.540694 1 server.go:155] "msg"="registered pprof handlers"
I0325 08:40:45.541014 1 tls_file_source.go:144] "msg"="detected private key or certificate data on disk has changed. reloading certificate"
2020/03/25 14:50:55 http: TLS handshake error from 10.0.66.48:39660: remote error: tls: bad certificate
2020/03/25 14:51:00 http: TLS handshake error from 10.0.66.48:39750: remote error: tls: bad certificate
2020/03/25 14:51:10 http: TLS handshake error from 10.0.66.48:39820: remote error: tls: bad certificate
2020/03/25 14:51:30 http: TLS handshake error from 10.0.66.48:39974: remote error: tls: bad certificate
2020/03/25 14:52:10 http: TLS handshake error from 10.0.66.48:40328: remote error: tls: bad certificate
2020/03/25 14:53:30 http: TLS handshake error from 10.0.66.48:41022: remote error: tls: bad certificate
2020/03/25 14:56:10 http: TLS handshake error from 10.0.66.48:42490: remote error: tls: bad certificate
2020/03/25 15:01:10 http: TLS handshake error from 10.0.66.48:45270: remote error: tls: bad certificate
2020/03/25 15:06:10 http: TLS handshake error from 10.0.66.48:47966: remote error: tls: bad certificate

I tried deleting the certificate and restarting the webhook pod, but I am still getting these errors. I have been installing cert-manager and its components via manifests. The version is v0.13.1.
I have moved cert-manager to a different namespace, though. Could that be causing this? I didn't have this issue before.

Do you have any ideas guys?

Switching it over to the "cert-manager" namespace works. But why doesn't it work with the other one?
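
For the namespace question above, one way to see exactly which DNS names the webhook's serving certificate covers (a debugging sketch, not from the original comment; the secret name matches the default install listed earlier in this thread, and slr-system is the namespace from the error message) is:

    kubectl -n slr-system get secret cert-manager-webhook-tls -o go-template='{{index .data "tls.crt"}}' \
      | base64 -d | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

If the SANs only list cert-manager-webhook.cert-manager[.svc], the apiserver will reject the connection when the service actually lives in another namespace, which is what the x509 error above reports.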

I am running into this issue as well using the recommended manifest installation:

kubectl apply \
  --validate=false \
  -f https://github.com/jetstack/cert-manager/releases/download/v0.14.0/cert-manager.yaml

Same here, having upgraded from v0.11 to 0.14.1. The mandatory webhook component seems to have borked. Our new webhook pod is accessible on cert-manager-webhook.our-namespace.svc:443, and I've tried the hostNetwork suggestion and waiting for the pod to come up before creating the ClusterIssuer resource. No dice.
Rolling back to < v0.14 until all the open issues about this are closed. May I suggest a patch to make the webhook optional again in the meantime?

I have a similar issue with non-standard namespace (core-l7)
Error: UPGRADE FAILED: release core-l7 failed, and has been rolled back due to atomic being set: failed to create resource: conversion webhook for cert-manager.io/v1alpha3, Kind=ClusterIssuer failed: Post https://cert-manager-webhook.core-l7.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority
helm.go:75: [debug] conversion webhook for cert-manager.io/v1alpha3, Kind=ClusterIssuer failed: Post https://cert-manager-webhook.core-l7.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority

Here https://github.com/jetstack/cert-manager/issues/2752#issuecomment-605966908 you can find the answer from @munnerz that explains the issue very well, the reason behind it, and a possible workaround.

Thank you!

I'm still seeing the problem,

everything is installed into the cert-manager namespace. First error on install:

Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority
http: TLS handshake error from 10.10.15.106:50016: remote error: tls: bad certificate

on a fresh upgrade to 0.14.1

immediate creation of the webhook gives

I0401 14:04:58.427632       1 main.go:79]  "msg"="enabling TLS as certificate file flags specified"
I0401 14:04:58.428254       1 server.go:131]  "msg"="listening for insecure healthz connections"  "address"=":6080"
I0401 14:04:58.428394       1 server.go:143]  "msg"="listening for secure connections"  "address"=":10250"
I0401 14:04:58.428518       1 server.go:165]  "msg"="registered pprof handlers"
I0401 14:04:58.428800       1 tls_file_source.go:144]  "msg"="detected private key or certificate data on disk has changed. reloading certificate"
2020/04/01 14:23:11 http: TLS handshake error from 10.10.20.196:52582: remote error: tls: bad certificate
2020/04/01 14:26:12 http: TLS handshake error from 10.10.20.196:53848: remote error: tls: bad certificate

and nothing works.

If I understand the workaround correctly (just deploy into the cert-manager namespace so that all the CRDs, which have hardcoded namespaces and are not controlled by Helm, still match), then we've done that and I'm still seeing the error.

I don't actually understand the error. Which certificate is bad?

EDIT: a clarification here: I was using older Helm chart values. It does seem that the webhook became mandatory, while the CA issuer is not. I had both set to off, so the webhook always comes up with no certs.

Probably an edge case, but if you got stuck here in an upgrade path, then ensure you've actually got a cainjector running at all. I did not and the webhook (unsurprisingly) didn't like it.

I've solved my problems with sed :))

namespace="not-cert-manager"
curl -s -L "https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml" 2>&1 | sed -e "s/namespace: cert-manager/namespace: ${namespace}/" -e "s/cert-manager.io\/inject-ca-from-secret: cert-manager\/cert-manager-webhook-tls/cert-manager.io\/inject-ca-from-secret: ${namespace}\/${namespace}-cert-manager-webhook-tls/" |  kubectl apply --validate=false -f -

But you should remove not only the CRDs but also

mutatingwebhookconfigurations.admissionregistration.k8s.io
validatingwebhookconfigurations.admissionregistration.k8s.io

if they were improperly configured before

@Antiarchitect Only your solution worked for me!

Steps taken:

kubectl delete mutatingwebhookconfiguration.admissionregistration.k8s.io cert-manager-webhook
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cert-manager-webhook
namespace="not-cert-manager"
curl -s -L "https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml" 2>&1 | sed -e "s/namespace: cert-manager/namespace: ${namespace}/" -e "s/cert-manager.io\/inject-ca-from-secret: cert-manager\/cert-manager-webhook-tls/cert-manager.io\/inject-ca-from-secret: ${namespace}\/${namespace}-cert-manager-webhook-tls/" |  kubectl apply --validate=false -f -
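
If you go down this route, a quick way to double-check that the conversion webhook reference inside the CRDs really ended up pointing at your namespace (an extra sanity check, not part of the steps above):

    kubectl get crd certificates.cert-manager.io -o yaml | grep -A 6 conversion

The service namespace shown under the conversion stanza should match where the webhook actually runs; grep is used here because the exact field path differs between apiextensions v1beta1 and v1.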

Hi @TylerIlunga and @Antiarchitect,

I have the same issue, and with that fix I was able to create an issuer. But when I try to describe the created issuer, it returns this error:

conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.not-cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found.

Hi @jarpsimoes. Seems like you should delete your issuer too. It is better to apply those sed filters on a clean setup. But the problem is that cert-manager is a rather complicated piece of software that creates some global entities.

@Antiarchitect Only your solution worked for me!

Steps taken:

kubectl delete mutatingwebhookconfiguration.admissionregistration.k8s.io cert-manager-webhook
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cert-manager-webhook
namespace="not-cert-manager"
curl -s -L "https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml" 2>&1 | sed -e "s/namespace: cert-manager/namespace: ${namespace}/" -e "s/cert-manager.io\/inject-ca-from-secret: cert-manager\/cert-manager-webhook-tls/cert-manager.io\/inject-ca-from-secret: ${namespace}\/${namespace}-cert-manager-webhook-tls/" |  kubectl apply --validate=false -f -

I followed the same action plan and it is working,
but after that I can't describe or delete the issuer; it gives me the following error:

conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.not-cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found.

Any idea?

I have been dealing with this issue for a couple of days now. After the 0.15.0 alpha came out today I thought this issue would be resolved, but I continue to suffer from the same issue.

Also, I don't think @Antiarchitect's solution is actually a real solution, since it necessitates deleting the webhook configurations, effectively disabling the webhook service. I think the issue is related to TLS connection establishment, but I am not sure why none of the ciphers work.

Hi @zzaareer. Did you install cert-manager into the not-cert-manager namespace? If not, you have to adjust the namespace variable according to your namespace.

I'm not sure if this is helpful, but as an FYI: attempting to apply this via kubectl -k (kustomize) failed, but kubectl -f succeeded. I don't know how to research this further.

EDIT: potentially very relevant, I was working w/ a possibly very bad mix of kubectl versions:

λ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-10T21:53:58Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:36:19Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

I ended up finding another unique solution to this problem, and all of cert-manager is working at full capacity for me now. My setup was:

Kubernetes 1.18
Calico CNI
Bare Metal
Cert-Manager 0.14.1

To fix it, for some reason I had to adjust the Calico network IP pool configuration away from the default. I downloaded the Calico setup YAML (https://docs.projectcalico.org/manifests/calico.yaml), and then I edited this snippet

- name: CALICO_IPV4POOL_IPIP
  value: "Always"

to

- name: CALICO_IPV4POOL_IPIP
  value: "Never"

After deleting the default created IP pool and restarting calico, I reinstalled cert-manager and it began working as intended.

I am not sure exactly why this change fixed all my problems.
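
If you want to confirm what encapsulation the running pool actually uses before and after a change like this (an illustrative check; it assumes calicoctl is installed, which the comment above does not mention):

    calicoctl get ippool -o yaml    # inspect the ipipMode of each pool

Note that CALICO_IPV4POOL_IPIP only affects pools created at startup, which is presumably why the existing default pool had to be deleted and Calico restarted, as described above.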

I am also badly bitten by this. :(

me too :(

I am having a similar problem ... using v0.14.2 ...
i.e. deploying cert-manager in the "cert-manager" namespace,
using Airship Armada to helm install, which forces passing "--name xyz" to helm, which changes the service names to xyz-cert-manager-webhook,
used 'Values.webhook.serviceName = xyz-cert-manager-webhook' to correct the DNS names in the certificate,
...
I can create the issuer ... but on get, describe, etc. of the issuer I get the following error:

Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

Why is the wrong webhook DNS name being used here?
(The correct webhook DNS name is used when the issuer is created.)

FYI ... 0.15-alpha.0 fixed my problem

@pierluigilenoci thanks for the hint. The main difference for me is that I use the cert-manager namespace.

We've made significant improvements to the way TLS is managed in the upcoming v0.15 release, as well as adding an installCRDs option to the Helm chart, which will correctly handle updating service names and the conversion webhook configuration when deploying into namespaces other than cert-manager or using a Helm release name other than cert-manager.

I think this issue can now be closed after this, and if anyone is still running into issues I'd advise trying the new v0.15.0-alpha.1 release and reporting back! (To be safe, it may be best to start 'fresh' in case you have a currently broken configuration!)

I just installed the new version of cert-manager, 0.15.0-alpha.1, and tried the test resources to create the issuer and certificate, but I'm still getting the same issue:

kubectl apply -f test-resources.yaml
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.99.49.145:443: i/o timeout
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.99.49.145:443: i/o timeout

I don't think your issue here is the same as the original poster's (which was a TLS handshake timeout) - in your case, it seems like there is some legitimate issue between your apiserver and pods running in your cluster. This is part of the Kubernetes conformance suite, so if that's the case, it is something that needs to be fixed within your Kubernetes deployment.

Has your webhook pod started? & are endpoints being registered behind the Service resource okay? kubectl get pod,service,endpoints -n cert-manager should help work that out :). Otherwise I'd investigate how/where you've deployed Kubernetes.

@munnerz Thank you for your help.
I deployed cert-manager using the kubectl command below:

kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.0-alpha.1/cert-manager.yaml

Everything is working fine, as you can see:

kubectl get pod,service,endpoints -n cert-manager
NAME READY STATUS RESTARTS AGE
pod/cert-manager-5bb5b9dcf8-sb52s 1/1 Running 0 28m
pod/cert-manager-cainjector-869f7868b7-rrrw2 1/1 Running 0 28m
pod/cert-manager-webhook-79d78c45cd-7fxfs 1/1 Running 0 28m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cert-manager ClusterIP 10.99.239.39 9402/TCP 28m
service/cert-manager-webhook ClusterIP 10.99.49.145 443/TCP 28m

NAME ENDPOINTS AGE
endpoints/cert-manager 10.244.4.60:9402 28m
endpoints/cert-manager-webhook 10.244.5.75:10250 28m

But when I try to create the issuer and certificate, I get the timeout and context deadline exceeded errors.

Using cert-manager v0.15.0, which was released yesterday. With installCRDs set to true, I am still getting the same error as above:

failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

Our scripts deploy another Helm chart which contains cert-manager resources just after the cert-manager Helm release reports ready, and Helm fails with the above error. However, if I try to create the resources after some time, I don't get any errors. So it looks like a timing issue, but I was not getting it with v0.15.0-alpha.0.

@zzaareer is your api-server able to access your nodes?

Timeout and context deadline exceeded errors sound very similar to issues with webhooks while using GKE private clusters:

I am getting the same errors. Does port 443 need to be allowed through to the node hosting the webhook pod? I am running an on-prem k8s cluster.

I have the same error here using v0.15.0

I am observing that the initial startup of cert-manager consistently fails with:

Failed to pull image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0": rpc error: code = Unknown desc = failed to pull and unpack image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0": failed to copy: httpReaderSeeker: failed open: failed to do request: Get https://quay.io/v2/jetstack/cert-manager-webhook/manifests/sha256:8596cc308466075565c54ec30fec2847bc357b741ddc70f0a8460d9e6e229edc: unexpected EOF

which increases startup time; this could be related to the root cause of why we started to get this error.

Full events:

Events:
  Type     Reason     Age                  From                              Message
  ----     ------     ----                 ----                              -------
  Normal   Scheduled  2m21s                default-scheduler                 Successfully assigned cert-manager/cert-manager-webhook-67545d6b46-q4zdj to local-dev-control-plane
  Warning  Failed     34s                  kubelet, local-dev-control-plane  Failed to pull image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0": rpc error: code = Unknown desc = failed to pull and unpack image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0": failed to copy: httpReaderSeeker: failed open: failed to do request: Get https://quay.io/v2/jetstack/cert-manager-webhook/manifests/sha256:8596cc308466075565c54ec30fec2847bc357b741ddc70f0a8460d9e6e229edc: unexpected EOF
  Warning  Failed     34s                  kubelet, local-dev-control-plane  Error: ErrImagePull
  Normal   BackOff    33s                  kubelet, local-dev-control-plane  Back-off pulling image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0"
  Warning  Failed     33s                  kubelet, local-dev-control-plane  Error: ImagePullBackOff
  Normal   Pulling    20s (x2 over 2m19s)  kubelet, local-dev-control-plane  Pulling image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0"
  Normal   Pulled     16s                  kubelet, local-dev-control-plane  Successfully pulled image "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.0"
  Normal   Created    16s                  kubelet, local-dev-control-plane  Created container cert-manager
  Normal   Started    15s                  kubelet, local-dev-control-plane  Started container cert-manager

My pods all come up fine without delay. I have tried following the Helm release notes as well as the kubectl method. Same error consistently...

Switching my pod network from flannel to Calico resolved my issue creating an issuer. I found some clues here: https://github.com/jetstack/cert-manager/issues/2811

Hi, I ran into the same issue as @zzaareer on a Rancher Kubernetes cluster. I have successfully deployed cert-manager via Helm v3:

helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v0.15.0 --set installCRDs=true --description "install cert-manager"

but when I try to install the test resources, I get the following error:

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded

I attached a sidecar to the cert-manager pod for debugging and it shows me that I can resolve cert-manager-webhook.cert-manager.svc, but the IP is not answering on a ping.

I've resolved the IP to 10.43.179.12 and this matches my svc/cert-manager-webhook service. When I do k port-forward service/cert-manager-webhook 9090:443 and call localhost:9090 in my browser, I see that the API is up. But why is my cert-manager not reaching the webhook pod?

@shibumi check out your CNI setup (flannel or calico). I think there is a flag to pass to flannel so that host-gw is used. I am no expert, but changing from flannel to calico worked for me.

@Trenthani We are already using Calico.
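
Building on the port-forward debugging a few comments up (a sketch, not something from the original comments), you can also check which certificate the webhook is actually serving over that tunnel:

    kubectl -n cert-manager port-forward service/cert-manager-webhook 9090:443 &
    openssl s_client -connect localhost:9090 -servername cert-manager-webhook.cert-manager.svc </dev/null 2>/dev/null \
      | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

If the certificate looks right through the port-forward but the apiserver still times out, that points back at pod-network reachability (CNI/MTU) rather than at cert-manager itself, which matches the flannel/Calico and MTU findings elsewhere in this thread.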

@turkenh I am seeing the same issue but no errors in my events. I am following the same approach as you, i.e., deploy cert-manager first and then the issuer with a separate Helm chart. Just as you had observed, I do not see the error if I deploy my Issuer after a few seconds (~60). Back-to-back installations of cert-manager and the Issuer consistently throw the following error:
_Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority_
@munnerz, I am certainly seeing the same error as the original issue with v0.15.0. Any thoughts on why I see the error when I deploy the Issuer immediately after the cert-manager deployment and NOT when I deploy it after a bit of a wait? This still appears to be a bug. Do you want me to open another issue to track this?

Fixed this problem on my hard upgrade from v0.10 to v0.15 by deleting the "cert-manager-webhook-ca" secret, because it's not updated automatically if it already exists.

I had a similar issue and found out that my kube-controller-manager and kube-apiserver pods had a wrongly configured NO_PROXY that did not exclude .svc from proxied traffic. I had to change /etc/kubernetes/manifests/*.yaml on the master node.

As a workaround to keep vxlan (because I can't use host-gw, the nodes are in different L2 networks), I disabled checksums on the flannel.1 interface (https://github.com/coreos/flannel/pull/1282):

ethtool --offload flannel.1 rx off tx off

And persisted it in systemd
/etc/systemd/system/xiaodu-flannel-txrx-off.service

[Unit]
Description=Turn off checksum offload on flannel.1
After=sys-devices-virtual-net-flannel.1.device

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=ethtool --offload flannel.1 rx off tx off
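
To have systemd pick up that unit (a small follow-up, assuming the file path given above):

    systemctl daemon-reload
    systemctl enable --now xiaodu-flannel-txrx-off.service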

I tried what @darthcorsair did and deleted the secret. It is immediately being recreated. Then, once it is back, I tried to install the ClusterIssuer again, with no success:

Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

Not sure how the CA creation works, but something is off.

EDIT: This is in a private GKE cluster. Cert-manager helm chart version v0.15.2. helm version v3.2.1.

@munnerz, can you please consider re-opening this? It keeps happening in 0.15.2.

Waiting a while as mentioned earlier seems to do the trick, so there is probably a timing issue somewhere. Tested on 0.15.2.

The following works for me (you might wanna tinker with the timer):

kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.yaml
kubectl -n cert-manager rollout status deploy/cert-manager-webhook
sleep 20

I had a 60-second wait built into my script and it still failed; I came back 10 minutes later, tried this, and it worked.
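
An alternative to a fixed sleep (a sketch, not taken from the comments above) is to block until the webhook Deployment reports Available and only then apply your issuer manifest:

    kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.2/cert-manager.yaml
    kubectl -n cert-manager wait --for=condition=Available deploy/cert-manager-webhook --timeout=180s
    kubectl apply -f cert-issuer-letsencrypt-prd.yml    # your own issuer manifest

Even then, a just-Available webhook can briefly reject requests with "certificate signed by unknown authority" until the CA bundle has been injected into the webhook configurations, which is the timing window several comments above describe, so a short retry loop around the final apply is the pragmatic fix.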

I had an issue deploying a ClusterIssuer; the error was:

Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": Post https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s: context deadline exceeded

Solved as:

$ helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v0.16.0 \
  --set installCRDs=true

$ kubectl delete mutatingwebhookconfiguration.admissionregistration.k8s.io cert-manager-webhook
$ kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cert-manager-webhook

Had the same error. My cause was that I accidentally turned on istio-injection in the cert-manager namespace...
After turning it off, everything was fine again.

Maybe my clumsiness will help someone ;)

This happens when you put any annotations on the Issuer or ClusterIssuer resources. It causes the validating webhook to fail.

Can you file a separate issue for that?

This problem may be caused by the CNI. After I modified the MTU of Calico, the problem was solved.

"mtu": 1440-> "mtu": 1420,

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "k3s-operator-1",
      "mtu": 1420,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}

"mtu": 1440-> "mtu": 1420,

Thanks for that. This fixed it for me on microk8s 1.19.

Using Hetzner cloud servers here, and the problem was indeed fixed by changing the MTU, not cert-manager.

Changing the Calico MTU from 1440 to 1400 or 1420 fixed the error when running test-resources.yaml:

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=10s": context deadline exceeded

MTU Change:

kubectl patch configmap/calico-config -n kube-system --type merge \
  -p '{"data":{"veth_mtu": "1400"}}'
kubectl rollout restart daemonset calico-node -n kube-system
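
To confirm the new value took effect (an extra check, not part of the comment above), read the key back from the ConfigMap that was just patched and make sure the calico-node pods have rolled:

    kubectl -n kube-system get configmap calico-config -o jsonpath='{.data.veth_mtu}'
    kubectl -n kube-system rollout status daemonset/calico-node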

Update from previous solution above
Steps taken:

kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/v1.0.3/cert-manager.yaml
kubectl delete mutatingwebhookconfiguration.admissionregistration.k8s.io cert-manager-webhook
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cert-manager-webhook
namespace="not-cert-manager"
curl -s -L "https://github.com/jetstack/cert-manager/releases/download/v1.0.3/cert-manager.crds.yaml" 2>&1 | sed -e "s/namespace: cert-manager/namespace: ${namespace}/" -e "s/cert-manager.io\/inject-ca-from-secret: cert-manager\/cert-manager-webhook-tls/cert-manager.io\/inject-ca-from-secret: ${namespace}\/${namespace}-cert-manager-webhook-tls/" |  kubectl apply --validate=false -f -

Potential resolution:

In our case, our cert-manager-webhook pod had been running for nearly a year. We suspect it was using some sort of out-of-date internal cluster cert. After deleting the webhook pod, the Deployment spun up a new one without the issue.

Here is the easiest solution to this quagmire. The problem stems from the webhook cert secret being outdated for the newer cert-manager. The easiest way to solve all this is just to scorch the earth: helm uninstall or kubectl delete all cert-manager resources and CRDs. If the CRDs get stuck, kill the finalizers. Then make sure all secrets and configmaps are cleared in the cert-manager namespace. Then you should be able to redeploy. Your existing certs live as secrets, so they will be unaffected, and on next boot cert-manager will find them and add them to its internal queue.
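
A rough sketch of that scorched-earth cleanup, assuming the default cert-manager namespace and the CRD names listed earlier in this thread (illustrative only; adapt it to your install method):

    helm uninstall cert-manager -n cert-manager
    kubectl delete crd \
      certificaterequests.cert-manager.io certificates.cert-manager.io \
      challenges.acme.cert-manager.io clusterissuers.cert-manager.io \
      issuers.cert-manager.io orders.acme.cert-manager.io
    # if a CRD hangs on deletion, clear its finalizers
    kubectl patch crd certificates.cert-manager.io --type merge -p '{"metadata":{"finalizers":[]}}'
    kubectl -n cert-manager delete secret --all
    kubectl -n cert-manager delete configmap --all

Issued certificates live as secrets in your application namespaces, so as noted above they are left untouched by clearing the cert-manager namespace.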

I had a similar problem.
My solution turned out to be quite involved.

  1. Delete everything related to cert-manager: CRDs, secrets.
  2. Download https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.crds.yaml
  3. Replace all occurrences of namespace cert-manager with kube-system in cert-manager.crds.yaml
  4. Apply cert-manager.crds.yaml
  5. Apply helm chart 1.1.0 to namespace kube-system
  6. Delete the cert-manager-webhook-ca secret from the kube-system namespace.
  7. Check the logs of the webhook pod; they should show
I1207 14:51:21.279796       1 logs.go:58] http: TLS handshake error from 10.209.112.25:54902: remote error: tls: bad certificate
I1207 14:51:21.280118       1 logs.go:58] http: TLS handshake error from 10.209.112.25:54900: remote error: tls: bad certificate
I1207 14:52:41.385457       1 dynamic_source.go:135] cert-manager/webhook "msg"="Detected root CA rotation - regenerating serving certificates"
I1207 14:52:41.409557       1 dynamic_source.go:199] cert-manager/webhook "msg"="Updated serving TLS certificate"
  8. Check the log of the cert-manager pod; it should be OK.

The CRDs included with the Helm chart will not work. They are missing some definitions.
