Cert-manager: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

Created on 26 Mar 2020 · 33 Comments · Source: jetstack/cert-manager

Describe the bug:

I installed cert-manager in the kube-system namespace with this command: helm install cert-manager jetstack/cert-manager --namespace kube-system -f values.yaml --wait, and everything seemed fine. While installing Prometheus on the same cluster, I ran into problems generating certificates for the Alertmanager ingress.

Here is the error I see; I only changed the Alertmanager domain for security.

kubectl get certificates le-prometheus-alertmanager-tls -o json

Error from server: conversion webhook for &{map[apiVersion:cert-manager.io/v1alpha2 kind:Certificate metadata:map[creationTimestamp:2020-03-25T16:18:11Z generation:1 labels:map[app:prometheus chart:prometheus-11.0.3 component:alertmanager heritage:Helm release:prometheus] name:le-prometheus-alertmanager-tls namespace:kube-system ownerReferences:[map[apiVersion:extensions/v1beta1 blockOwnerDeletion:true controller:true kind:Ingress name:prometheus-alertmanager uid:a225d045-1e39-4238-9352-06af00638c87]] uid:e73287ca-6db4-4e74-8e93-675eebfc8dcb] spec:map[dnsNames:[alertmanager.example.com] issuerRef:map[group:cert-manager.io kind:ClusterIssuer name:letsencrypt-production] secretName:le-prometheus-alertmanager-tls] status:map[conditions:[map[lastTransitionTime:2020-03-25T16:18:12Z message:Waiting for CertificateRequest "le-prometheus-alertmanager-tls-1268665689" to complete reason:InProgress status:False type:Ready]]]]} failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

From what I understand, the problem is that cert-manager tries to connect to the webhook in the default namespace (cert-manager) instead of the one in which it was installed (kube-system).
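For reference, one way to confirm this kind of mismatch is to compare where the webhook Service actually lives with what the CRD conversion configuration points at. A minimal sketch (grep is used because the exact field path differs between CRD API versions):

# Where does the webhook Service actually exist?
kubectl get svc --all-namespaces | grep cert-manager-webhook

# Which namespace does the Certificate CRD's conversion webhook reference?
kubectl get crd certificates.cert-manager.io -o yaml | grep -A 8 'conversion:'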

Expected behavior:
I expect cert-manager to call the webhook in the correct namespace.

Steps to reproduce the bug:

  • Install cert-manager in kube-system with this command:
    helm install cert-manager jetstack/cert-manager --namespace kube-system -f values.yaml --wait

Changed conf for values.yaml:

extraArgs:
  - --cluster-resource-namespace=kube-system
ingressShim:
  defaultIssuerName: "letsencrypt-production"
  defaultIssuerKind: "ClusterIssuer"

  • Install Prometheus in kube-system with this command:
    helm install prometheus stable/prometheus --namespace=kube-system -f values.yaml

Changed conf for values.yaml: all ingresses enabled with TLS.

Anything else we need to know?:

Environment details:

  • Kubernetes version (e.g. v1.15.10-eks-bac369):
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): EKS
  • cert-manager version (e.g. v0.4.0): v0.14.1
  • Install method (e.g. helm or static manifests): helm

/kind bug

area/deploy kind/bug

Most helpful comment

This has now been resolved with the new --set installCRDs=true option that can be used when installing the Helm chart in the latest (v0.15.0-alpha.X) versions. We are still in the process of writing new installation docs to cover this feature, but if you'd like to give it a go it should clear all this up 😄

All 33 comments

same

Currently this problem occurs because the namespace is hardcoded in the chart. As a workaround, you can fetch the YAML and manually change all the namespaces to the one you want. It would be nice if the chart supported this.

The chart is actually okay, and does not hardcode the namespace. The issue here is that the CRD manifests _also_ have to hardcode a namespace name, and the CRDs are not managed by the Helm chart and thus cannot be templated.

There are a number of places in the CRDs that reference the namespace cert-manager, including in spec.conversion as well as metadata.annotations - you'll need to make sure to adjust all of these in order for this to work. Alternatively, just deploying into the cert-manager namespace is easiest 😅
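For example, a quick way to locate every such reference in the static CRD manifest before editing it (a sketch; the file URL matches the v0.14.1 release discussed in this thread, and the grep patterns are the two kinds of reference mentioned above):

curl -sLO https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml

# List every line that pins the webhook/cainjector to the cert-manager namespace.
grep -n -e 'namespace: cert-manager' -e 'cert-manager/cert-manager-webhook-tls' cert-manager.crds.yaml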

We're hoping to be able to improve this in future, but we require https://github.com/helm/helm/issues/7735 to be addressed before we can.

Why can the CRDs not use Helm templates? Basically helm template | kubectl apply -f -, where we could provide the namespace?

Or a more extreme solution: put the CRDs in a separate chart?

It's made slightly trickier because we _also_ generate the OpenAPI schema for the CRDs, which is patched in automatically by the controller-gen tool from controller-tools: https://github.com/jetstack/cert-manager/blob/fbf2b3073da9622d30362fd054f9acd7a2dbb18f/hack/update-crds.sh#L42-L45

@meyskens worked on trying to get this generator to work against a CRD that includes Helm templating directives, and unfortunately it does not work (as it sees the YAML as misconfigured).

Either way, this is going to be an issue for us even if https://github.com/helm/helm/issues/7735 is addressed so we should come up with some solution to it. Potentially we could use Kustomize to 'overlay' the OpenAPI schema on top, as then the controller-gen tool could be pointed at an overlay that wouldn't contain any chart templating directives.

That would then mean part of the project is deployed with Kustomize and the other with Helm/kubectl, which is also not ideal. The other option would be to switch everything to a purely Kustomize based approach, but that's definitely a more drastic change and will probably cause quite a lot of pain for people, as there's a lot of Helm users out there 😅

Does anyone know a potential workaround or resolution for this error?

@holdenkilbride the only workaround that I can see is to manually edit the CRD before applying it, to match your desired namespace (ref: https://github.com/jetstack/cert-manager/issues/2752#issuecomment-605883456)

We're running into the same issue. If cert-manager is deployed to a different namespace, the CRDs are not deployed by the Helm chart and the namespace is hardcoded. This is a real problem for CI/CD.

I have cert manager deployed to the "cert-manager" namespace and this issue still exists. I am unable to delete and describe resources.

Same issue. Installed to kube-system and can't use kubectl describe or get.

Edit:
I've manually changed the CRD YAML config as follows:

Download file https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml and manually replace:

  1. Annotations

Replace all occurrences (there should be 6 of them):

cert-manager.io/inject-ca-from-secret: cert-manager/cert-manager-webhook-tls

With:

cert-manager.io/inject-ca-from-secret: kube-system/cert-manager-webhook-tls
  2. Namespace definitions

Replace all occurrences (there should be 6 of them):

namespace: cert-manager

With:

namespace: kube-system

and then apply kubectl apply --validate=false -f cert-manager.crds.yaml (cert-manager.crds.yaml being the local file you just edited) instead of the command provided in the official docs.

After applying everything works as expected.
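If you prefer not to edit the file by hand, the same substitutions can be scripted. A minimal sketch, assuming the same v0.14.1 manifest and the kube-system target namespace described above:

curl -sL https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml \
  | sed -e 's|cert-manager/cert-manager-webhook-tls|kube-system/cert-manager-webhook-tls|g' \
        -e 's|namespace: cert-manager|namespace: kube-system|g' \
  | kubectl apply --validate=false -f -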

I was never trying to deploy to the kube-system namespace; just the default cert-manager namespace.

I want to install cert-manager to
namespace cert-manager-stage,
namespace cert-manager-prod, and so on.

I think you should fix this quickly.

There should be only one cert-manager per cluster. You can, however, create multiple Issuers in different namespaces.

@boris-savic
"one cert-manager per cluster" is a restriction and should be documented.
And in any case, users can easily choose their own namespace.

@Hokwang just read the documentation

Warning: You should not install multiple instances of cert-manager on a single cluster. This will lead to undefined behavior and you may be banned from providers such as Let’s Encrypt.

Ref: https://cert-manager.io/docs/installation/kubernetes/

Thanks @boris-savic, updating the CRDs like this allowed a stuck namespace deletion to finally terminate in my case.

I have cert manager deployed to the "cert-manager" namespace and this issue still exists. I am unable to delete and describe resources.

I am having a similar problem ... using v0.14.2 ...
i.e. deploying cert-manager in the "cert-manager" namespace,
using Airship Armada to run helm install, which forces the use of "--name xyz" with Helm,
used 'Values.webhook.serviceName = xyz-cert-manager-webhook' to correct the DNS names in the certificate,
...
I can create an Issuer ... but on get, describe, etc. of the Issuer I get the following error:

Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

Why is the wrong webhook DNS name being used here?
(the correct one is used when the Issuer is created)

FYI ... 0.15-alpha.0 fixed my problem

This has now been resolved with the new --set installCRDs=true option that can be used when installing the Helm chart in the latest (v0.15.0-alpha.X) versions. We are still in the process of writing new installation docs to cover this feature, but if you'd like to give it a go it should clear all this up 😄
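For the kube-system setup described in this issue, that would look something like the sketch below (no chart version is pinned here; you need a chart release that exposes installCRDs, i.e. v0.15.0-alpha.X or newer):

helm repo add jetstack https://charts.jetstack.io
helm repo update

# Let the chart manage the CRDs so their namespace references are templated
# to match wherever the release is installed.
helm install cert-manager jetstack/cert-manager \
  --namespace kube-system \
  --set installCRDs=true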

Thank you @munnerz

vinay@pramukha:~$ kubectl get namespace
NAME STATUS AGE
cert-manager Active 9h
cert-manager-test Active 2d1h
container-registry Active 2d10h
default Active 2d10h
ingress Active 2d10h
kube-node-lease Active 2d10h
kube-public Active 2d10h
kube-system Active 2d10h
kube-verify Active 2d10h
kube-verify1 Active 2d5h
metallb-system Active 2d10h
monitoring Active 2d10h
nextcloud Active 2d1h
vinay@pramukha:~$

vinay@pramukha:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-66bbb47c56-lh229 1/1 Running 0 9h
cert-manager cert-manager-cainjector-5579468649-8jwvg 1/1 Running 0 9h
cert-manager cert-manager-webhook-746cf468-xzb24 1/1 Running 0 9h
container-registry registry-5d5b754549-m8zl6 1/1 Running 2 2d10h
default echo1-bfb568c8b-7fchh 0/1 CrashLoopBackOff 575 2d
default echo1-bfb568c8b-phc29 0/1 CrashLoopBackOff 575 2d
default echo2-5f44d9d965-hznqh 0/1 ImagePullBackOff 0 2d
default microbot-65bc8bdd7c-sgrd6 1/1 Running 0 8h
default microbot-65bc8bdd7c-zzdqt 1/1 Running 0 8h
default nginx-deployment-6b474476c4-5sm7t 1/1 Running 1 2d
default nginx-deployment-6b474476c4-btzdq 1/1 Running 1 2d
ingress nginx-ingress-microk8s-controller-247ws 1/1 Running 6 2d10h
ingress nginx-ingress-microk8s-controller-8gzgf 1/1 Running 1 2d3h
ingress nginx-ingress-microk8s-controller-qwtxr 1/1 Running 1 2d5h
kube-system coredns-588fd544bf-9nmds 1/1 Running 2 2d10h
kube-system dashboard-metrics-scraper-59f5574d4-wcrll 1/1 Running 1 2d1h
kube-system hostpath-provisioner-9b7695c6b-x5p27 1/1 Running 2 2d10h
kube-system kubernetes-dashboard-6d97855997-tmqmf 1/1 Running 1 2d1h
kube-system metrics-server-79749d858b-bhzl6 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-4vz5s 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-7fg52 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-tjl5v 1/1 Running 2 2d10h
kube-verify1 kube-verify-5f976b5474-6t7pv 1/1 Running 1 2d5h
kube-verify1 kube-verify-5f976b5474-7blz8 1/1 Running 1 2d5h
kube-verify1 kube-verify-5f976b5474-mprc4 1/1 Running 1 2d5h
metallb-system controller-5f98465b6b-gd82s 1/1 Running 2 2d10h
metallb-system speaker-6gs2w 1/1 Running 1 2d5h
metallb-system speaker-cml76 1/1 Running 1 2d3h
metallb-system speaker-ncs2p 1/1 Running 2 2d10h
monitoring alertmanager-main-0 2/2 Running 4 2d10h
monitoring arm-exporter-522xf 2/2 Running 2 2d3h
monitoring arm-exporter-mcw44 2/2 Running 4 2d10h
monitoring arm-exporter-ptshf 2/2 Running 2 2d5h
monitoring grafana-676bcb5687-d6hl2 1/1 Running 1 2d7h
monitoring kube-state-metrics-96bf99844-6tlrq 3/3 Running 6 2d10h
monitoring node-exporter-9gkhm 2/2 Running 4 2d10h
monitoring node-exporter-cjjgs 2/2 Running 2 2d5h
monitoring node-exporter-hbg8s 2/2 Running 2 2d3h
monitoring prometheus-adapter-f78c4f4ff-zvcr7 1/1 Running 2 2d10h
monitoring prometheus-k8s-0 3/3 Running 4 2d7h
monitoring prometheus-operator-6b8868d698-pt9rw 2/2 Running 4 2d10h
monitoring smtp-server-5c7c8d77f8-q592k 1/1 Running 2 2d10h
nextcloud nextcloud-857cd47bdf-xrnm4 0/1 Running 2 2d1h

vinay@pramukha:~$ cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: vinay.[email protected]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s": x509: certificate is valid for k8s-pi-master, not cert-manager-webhook.cert-manager.svc
vinay@pramukha:~$

The issue still persists even with the dedicated "cert-manager" namespace.

ubuntu@k8s-pi-master:~$ kubectl get namespace
NAME STATUS AGE
cert-manager Active 18h
cert-manager-test Active 2d10h
container-registry Active 2d19h
default Active 2d20h
ingress Active 2d19h
kube-node-lease Active 2d20h
kube-public Active 2d20h
kube-system Active 2d20h
kube-verify Active 2d19h
kube-verify1 Active 2d15h
metallb-system Active 2d19h
monitoring Active 2d19h
nextcloud Active 2d11h

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide --namespace cert-manager

No resources found in cert-manager namespace.

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide --namespace kube-system

No resources found in kube-system namespace.

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide -A

No resources found

ubuntu@k8s-pi-master:~$

There are no certificates in cert-manager or in any other namespace.

@boris-savic
Hi there,

I have tried your solution. The implementation went OK, but when I then try
kubectl get clusterissuers

I receive:

Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed: Post https://cert-manager-webhook.kube-system.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority

I'm wondering if this was also the case for you.
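One way to check whether the cainjector ever injected a CA bundle into the converted CRDs, and whether it is reporting errors, is sketched below (the resource and deployment names assume a default release name and an install in kube-system):

# A count of 0 means no CA bundle has been injected into the conversion config.
kubectl get crd clusterissuers.cert-manager.io -o yaml | grep -c caBundle

# Look for injection errors from the cainjector.
kubectl logs -n kube-system deploy/cert-manager-cainjector | tail -n 20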

Interestingly, I'm getting the same error despite having installed it in the cert-manager namespace.

For those struggling: it seems that if you use a Helm release name different from cert-manager, then the service will be named after your release (something like service/cert-manager-1595315110-webhook). Since the service is expected to be named cert-manager-webhook, it can't be found. This can be solved by using cert-manager as the name of the release when installing the chart (and probably cert-manager as the namespace too).

Anyway, that still didn't solve the problem for me, so what I did was the following (a quick check to confirm the fix is sketched after these steps):

  • Removed the chart: helm uninstall <release_name>
  • Removed the custom resource definitions: kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0-alpha.1/cert-manager.crds.yaml
  • Installed the chart again with: helm install jetstack/cert-manager --generate-name --set installCRDs=true
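A quick check that conversion works again after the reinstall (just listing the common cert-manager kinds; any of them previously triggered the conversion error):

kubectl get certificates,issuers,clusterissuers --all-namespaces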

I hit the same problem, but only on a cluster built with Kubespray; on managed Kubernetes from GCP, DO, etc. it works without problems.

What worked for me in the end was:

helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--version v0.15.2 \
--set installCRDs=true  

I believe --set installCRDs=true does the trick

--set installCRDs=true worked out for me as well!

When using helm upgrade --install rather than helm install for the initial deployment, we're seeing this error in any subsequent deployments:

Error: failed to create resource: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

AKS 1.16.10
cert-manager v0.15.2
helm/tiller v2.16.9

Try removing the secret cert-manager-webhook-ca. It will be regenerated; then restart the cert-manager-webhook pod (probably not necessary).
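In shell form that might look like the following sketch (the namespace and label selector are assumptions for a default install; adjust them to your deployment):

# Delete the CA secret; the webhook will regenerate it.
kubectl delete secret cert-manager-webhook-ca -n cert-manager

# Optionally bounce the webhook pod so it picks up the new CA immediately.
kubectl delete pod -n cert-manager -l app.kubernetes.io/name=webhook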

We deleted the entire namespace before attempting to install, so it's definitely using the CA secret created during the deployment. The exact same helm upgrade --install command failed one day and then worked the next, so it may have been a race condition though I'm not sure 🤷‍♂️

This problem may be caused by the CNI. After I modified the MTU of Calico, the problem was solved.

"mtu": 1440-> "mtu": 1420,

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "k3s-operator-1",
      "mtu": 1420,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
