Cert-manager: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

Created on 26 Mar 2020 · 33 Comments · Source: jetstack/cert-manager

Describe the bug:

I installed cert-manager in the kube-system namespace with this command: helm install cert-manager jetstack/cert-manager --namespace kube-system -f values.yaml --wait, and everything seemed fine. While installing Prometheus on the same cluster, I ran into problems generating certificates for the Alertmanager ingress.

Here is the error I see; I only changed the Alertmanager domain for security.

kubectl get certificates le-prometheus-alertmanager-tls -o json

Error from server: conversion webhook for &{map[apiVersion:cert-manager.io/v1alpha2 kind:Certificate metadata:map[creationTimestamp:2020-03-25T16:18:11Z generation:1 labels:map[app:prometheus chart:prometheus-11.0.3 component:alertmanager heritage:Helm release:prometheus] name:le-prometheus-alertmanager-tls namespace:kube-system ownerReferences:[map[apiVersion:extensions/v1beta1 blockOwnerDeletion:true controller:true kind:Ingress name:prometheus-alertmanager uid:a225d045-1e39-4238-9352-06af00638c87]] uid:e73287ca-6db4-4e74-8e93-675eebfc8dcb] spec:map[dnsNames:[alertmanager.example.com] issuerRef:map[group:cert-manager.io kind:ClusterIssuer name:letsencrypt-production] secretName:le-prometheus-alertmanager-tls] status:map[conditions:[map[lastTransitionTime:2020-03-25T16:18:12Z message:Waiting for CertificateRequest "le-prometheus-alertmanager-tls-1268665689" to complete reason:InProgress status:False type:Ready]]]]} failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

From what I understand, the problem is that cert-manager tries to connect to the webhook in the default namespace (cert-manager) instead of the one in which it was installed (kube-system).
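For reference, one way to confirm this kind of mismatch is to compare where the webhook Service actually lives with what the CRD conversion configuration points at. A minimal sketch (grep is used because the exact field path differs between CRD API versions):

# Where does the webhook Service actually exist?
kubectl get svc --all-namespaces | grep cert-manager-webhook

# Which namespace does the Certificate CRD's conversion webhook reference?
kubectl get crd certificates.cert-manager.io -o yaml | grep -A 8 'conversion:'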

Expected behavior:
I expect cert-manager to call the webhook in the correct namespace.

Steps to reproduce the bug:

  • Install cert-manager in kube-system with this command:
    helm install cert-manager jetstack/cert-manager --namespace kube-system -f values.yaml --wait

Changed conf for values.yaml:

extraArgs:
  - --cluster-resource-namespace=kube-system
ingressShim:
  defaultIssuerName: "letsencrypt-production"
  defaultIssuerKind: "ClusterIssuer"

  • Install Prometheus in kube-system with this command:
    helm install prometheus stable/prometheus --namespace=kube-system -f values.yaml

Changed conf for values.yaml: all ingresses enabled with TLS.

Anything else we need to know?:

Environment details:

  • Kubernetes version (e.g. v1.15.10-eks-bac369):
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): EKS
  • cert-manager version (e.g. v0.4.0): v0.14.1
  • Install method (e.g. helm or static manifests): helm

/kind bug

area/deploy kind/bug

Most helpful comment

This has now been resolved with the new --set installCRDs=true option that can be used when installing the Helm chart in the latest (v0.15.0-alpha.X) versions. We are still in the process of writing new installation docs to cover this feature, but if you'd like to give it a go it should clear all this up 😄

All 33 comments

same

Currently this problem occurs because the namespace is hardcoded in the chart. As a workaround, you can fetch the YAML and manually change all the namespaces to the one you want. It would be nice if the chart supported this.

The chart is actually okay, and does not hardcode the namespace. The issue here is that the CRD manifests _also_ have to hardcode a namespace name, and the CRDs are not managed by the Helm chart and thus cannot be templated.

There are a number of places in the CRDs that reference the namespace cert-manager, including in spec.conversion as well as metadata.annotations - you'll need to make sure to adjust all of these in order for this to work. Alternatively, just deploying into the cert-manager namespace is easiest 😅
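For example, a quick way to locate every such reference in the static CRD manifest before editing it (a sketch; the file URL matches the v0.14.1 release discussed in this thread, and the grep patterns are the two kinds of reference mentioned above):

curl -sLO https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml

# List every line that pins the webhook/cainjector to the cert-manager namespace.
grep -n -e 'namespace: cert-manager' -e 'cert-manager/cert-manager-webhook-tls' cert-manager.crds.yaml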

We're hoping to be able to improve this in future, but we require https://github.com/helm/helm/issues/7735 to be addressed before we can.

Why can the CRDs not use Helm templates? Basically helm template | kubectl apply -f -, where we could provide the namespace?

Or a more extreme solution: put the CRDs in a separate chart?

It's made slightly trickier because we _also_ generate the OpenAPI schema for the CRDs, which is patched in automatically by the controller-gen tool from controller-tools: https://github.com/jetstack/cert-manager/blob/fbf2b3073da9622d30362fd054f9acd7a2dbb18f/hack/update-crds.sh#L42-L45

@meyskens worked on trying to get this generator to work against a CRD that includes Helm templating directives, and unfortunately it does not work (as it sees the YAML as misconfigured).

Either way, this is going to be an issue for us even if https://github.com/helm/helm/issues/7735 is addressed so we should come up with some solution to it. Potentially we could use Kustomize to 'overlay' the OpenAPI schema on top, as then the controller-gen tool could be pointed at an overlay that wouldn't contain any chart templating directives.

That would then mean part of the project is deployed with Kustomize and the other with Helm/kubectl, which is also not ideal. The other option would be to switch everything to a purely Kustomize based approach, but that's definitely a more drastic change and will probably cause quite a lot of pain for people, as there's a lot of Helm users out there 😅

Does anyone know a potential workaround or resolution for this error?

@holdenkilbride the only workaround that I can see is to manually edit the CRD before applying it, to match your desired namespace (ref: https://github.com/jetstack/cert-manager/issues/2752#issuecomment-605883456)

We're running into the same issue. If cert-manager is deployed to a different namespace, the CRDs are not deployed by the Helm chart and the namespace is hardcoded. This is a real problem for CI/CD.

I have cert manager deployed to the "cert-manager" namespace and this issue still exists. I am unable to delete and describe resources.

Same issue. Installed to kube-system and can't use kubectl describe or get.

Edit:
I've manually changed the CRD YAML config as follows:

Download file https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml and manually replace:

  1. Annotations

Replace all occurrences (there should be 6 of them):

cert-manager.io/inject-ca-from-secret: cert-manager/cert-manager-webhook-tls

With:

cert-manager.io/inject-ca-from-secret: kube-system/cert-manager-webhook-tls
  2. Namespace definitions

Replace all occurrences (there should be 6 of them):

namespace: cert-manager

With:

namespace: kube-system

and then apply kubectl apply --validate=false -f cert-manager.crds.yaml (cert-manager.crds.yaml being the local file you just edited) instead of the command provided in the official docs.

After applying everything works as expected.
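If you prefer not to edit the file by hand, the same substitutions can be scripted. A minimal sketch, assuming the same v0.14.1 manifest and the kube-system target namespace described above:

curl -sL https://github.com/jetstack/cert-manager/releases/download/v0.14.1/cert-manager.crds.yaml \
  | sed -e 's|cert-manager/cert-manager-webhook-tls|kube-system/cert-manager-webhook-tls|g' \
        -e 's|namespace: cert-manager|namespace: kube-system|g' \
  | kubectl apply --validate=false -f -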

I was never trying to deploy to the kube-system namespace; just the default cert-manager namespace.

I want to install cert-manager to
namespace cert-manager-stage,
namespace cert-manager-prod, and so on.

I think you should fix this quickly.

There should be only one cert-manager per cluster. You can, however, create multiple Issuers in different namespaces.

@boris-savic
"one cert-manager per cluster" is a restriction and should be documented.
And in any case, users can easily choose their own namespace.

@Hokwang just read the documentation

Warning: You should not install multiple instances of cert-manager on a single cluster. This will lead to undefined behavior and you may be banned from providers such as Let’s Encrypt.

Ref: https://cert-manager.io/docs/installation/kubernetes/

Thanks @boris-savic, updating the CRDs like this allowed a stuck namespace deletion to finally terminate in my case.

I have cert manager deployed to the "cert-manager" namespace and this issue still exists. I am unable to delete and describe resources.

I am having a similar problem ... using v0.14.2 ...
i.e. deploying cert-manager in the "cert-manager" namespace,
using Airship Armada to run helm install, which forces the use of "--name xyz" with Helm,
used 'Values.webhook.serviceName = xyz-cert-manager-webhook' to correct the DNS names in the certificate,
...
I can create an Issuer ... but on get, describe, etc. of the Issuer I get the following error:

Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found

Why is the wrong webhook DNS name being used here?
(the correct one is used when the Issuer is created)

FYI ... 0.15-alpha.0 fixed my problem

This has now been resolved with the new --set installCRDs=true option that can be used when installing the Helm chart in the latest (v0.15.0-alpha.X) versions. We are still in the process of writing new installation docs to cover this feature, but if you'd like to give it a go it should clear all this up 😄
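For the kube-system setup described in this issue, that would look something like the sketch below (no chart version is pinned here; you need a chart release that exposes installCRDs, i.e. v0.15.0-alpha.X or newer):

helm repo add jetstack https://charts.jetstack.io
helm repo update

# Let the chart manage the CRDs so their namespace references are templated
# to match wherever the release is installed.
helm install cert-manager jetstack/cert-manager \
  --namespace kube-system \
  --set installCRDs=true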

Thank you @munnerz

vinay@pramukha:~$ kubectl get namespace
NAME STATUS AGE
cert-manager Active 9h
cert-manager-test Active 2d1h
container-registry Active 2d10h
default Active 2d10h
ingress Active 2d10h
kube-node-lease Active 2d10h
kube-public Active 2d10h
kube-system Active 2d10h
kube-verify Active 2d10h
kube-verify1 Active 2d5h
metallb-system Active 2d10h
monitoring Active 2d10h
nextcloud Active 2d1h
vinay@pramukha:~$

vinay@pramukha:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-66bbb47c56-lh229 1/1 Running 0 9h
cert-manager cert-manager-cainjector-5579468649-8jwvg 1/1 Running 0 9h
cert-manager cert-manager-webhook-746cf468-xzb24 1/1 Running 0 9h
container-registry registry-5d5b754549-m8zl6 1/1 Running 2 2d10h
default echo1-bfb568c8b-7fchh 0/1 CrashLoopBackOff 575 2d
default echo1-bfb568c8b-phc29 0/1 CrashLoopBackOff 575 2d
default echo2-5f44d9d965-hznqh 0/1 ImagePullBackOff 0 2d
default microbot-65bc8bdd7c-sgrd6 1/1 Running 0 8h
default microbot-65bc8bdd7c-zzdqt 1/1 Running 0 8h
default nginx-deployment-6b474476c4-5sm7t 1/1 Running 1 2d
default nginx-deployment-6b474476c4-btzdq 1/1 Running 1 2d
ingress nginx-ingress-microk8s-controller-247ws 1/1 Running 6 2d10h
ingress nginx-ingress-microk8s-controller-8gzgf 1/1 Running 1 2d3h
ingress nginx-ingress-microk8s-controller-qwtxr 1/1 Running 1 2d5h
kube-system coredns-588fd544bf-9nmds 1/1 Running 2 2d10h
kube-system dashboard-metrics-scraper-59f5574d4-wcrll 1/1 Running 1 2d1h
kube-system hostpath-provisioner-9b7695c6b-x5p27 1/1 Running 2 2d10h
kube-system kubernetes-dashboard-6d97855997-tmqmf 1/1 Running 1 2d1h
kube-system metrics-server-79749d858b-bhzl6 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-4vz5s 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-7fg52 1/1 Running 2 2d10h
kube-verify kube-verify-5f976b5474-tjl5v 1/1 Running 2 2d10h
kube-verify1 kube-verify-5f976b5474-6t7pv 1/1 Running 1 2d5h
kube-verify1 kube-verify-5f976b5474-7blz8 1/1 Running 1 2d5h
kube-verify1 kube-verify-5f976b5474-mprc4 1/1 Running 1 2d5h
metallb-system controller-5f98465b6b-gd82s 1/1 Running 2 2d10h
metallb-system speaker-6gs2w 1/1 Running 1 2d5h
metallb-system speaker-cml76 1/1 Running 1 2d3h
metallb-system speaker-ncs2p 1/1 Running 2 2d10h
monitoring alertmanager-main-0 2/2 Running 4 2d10h
monitoring arm-exporter-522xf 2/2 Running 2 2d3h
monitoring arm-exporter-mcw44 2/2 Running 4 2d10h
monitoring arm-exporter-ptshf 2/2 Running 2 2d5h
monitoring grafana-676bcb5687-d6hl2 1/1 Running 1 2d7h
monitoring kube-state-metrics-96bf99844-6tlrq 3/3 Running 6 2d10h
monitoring node-exporter-9gkhm 2/2 Running 4 2d10h
monitoring node-exporter-cjjgs 2/2 Running 2 2d5h
monitoring node-exporter-hbg8s 2/2 Running 2 2d3h
monitoring prometheus-adapter-f78c4f4ff-zvcr7 1/1 Running 2 2d10h
monitoring prometheus-k8s-0 3/3 Running 4 2d7h
monitoring prometheus-operator-6b8868d698-pt9rw 2/2 Running 4 2d10h
monitoring smtp-server-5c7c8d77f8-q592k 1/1 Running 2 2d10h
nextcloud nextcloud-857cd47bdf-xrnm4 0/1 Running 2 2d1h

vinay@pramukha:~$ cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: vinay.[email protected]
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s": x509: certificate is valid for k8s-pi-master, not cert-manager-webhook.cert-manager.svc
vinay@pramukha:~$

The issue still persists even with the dedicated "cert-manager" namespace.

ubuntu@k8s-pi-master:~$ kubectl get namespace
NAME STATUS AGE
cert-manager Active 18h
cert-manager-test Active 2d10h
container-registry Active 2d19h
default Active 2d20h
ingress Active 2d19h
kube-node-lease Active 2d20h
kube-public Active 2d20h
kube-system Active 2d20h
kube-verify Active 2d19h
kube-verify1 Active 2d15h
metallb-system Active 2d19h
monitoring Active 2d19h
nextcloud Active 2d11h

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide --namespace cert-manager

No resources found in cert-manager namespace.

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide --namespace kube-system

No resources found in kube-system namespace.

ubuntu@k8s-pi-master:~$ kubectl get certificate -o wide -A

No resources found

ubuntu@k8s-pi-master:~$

There are no certificates in cert-manager or in any other namespace.

@boris-savic
Hi there,

I have tried your solution. The implementation went OK, but when I then try
kubectl get clusterissuers

I receive:

Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=ClusterIssuer failed: Post https://cert-manager-webhook.kube-system.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority

I'm wondering if this was also the case for you.
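One way to check whether the cainjector ever injected a CA bundle into the converted CRDs, and whether it is reporting errors, is sketched below (the resource and deployment names assume a default release name and an install in kube-system):

# A count of 0 means no CA bundle has been injected into the conversion config.
kubectl get crd clusterissuers.cert-manager.io -o yaml | grep -c caBundle

# Look for injection errors from the cainjector.
kubectl logs -n kube-system deploy/cert-manager-cainjector | tail -n 20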

Interestingly, I'm getting the same error despite having installed it in the cert-manager namespace.

For those struggling: it seems that if you use a Helm release name different from cert-manager, then the service will be named after your release (something like service/cert-manager-1595315110-webhook). Since the service is expected to be named cert-manager-webhook, it can't be found. This can be solved by using cert-manager as the name of the release when installing the chart (and probably cert-manager as the namespace too).

Anyway, that still didn't solve the problem for me, so what I did was the following (a quick check to confirm the fix is sketched after these steps):

  • Removed the chart: helm uninstall <release_name>
  • Removed the custom resource definitions: kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/v0.16.0-alpha.1/cert-manager.crds.yaml
  • Installed the chart again with: helm install jetstack/cert-manager --generate-name --set installCRDs=true
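A quick check that conversion works again after the reinstall (just listing the common cert-manager kinds; any of them previously triggered the conversion error):

kubectl get certificates,issuers,clusterissuers --all-namespaces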

I hit the same problem, but only on a cluster built with Kubespray; on managed Kubernetes from GCP, DO, etc. it works without problems.

What worked for me in the end was:

helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--version v0.15.2 \
--set installCRDs=true  

I believe --set installCRDs=true does the trick

--set installCRDs=true worked out for me as well!

When using helm upgrade --install rather than helm install for the initial deployment, we're seeing this error in any subsequent deployments:

Error: failed to create resource: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

AKS 1.16.10
cert-manager v0.15.2
helm/tiller v2.16.9

Try removing the secret cert-manager-webhook-ca. It will be regenerated; then restart the cert-manager-webhook pod (probably not necessary).
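In shell form that might look like the following sketch (the namespace and label selector are assumptions for a default install; adjust them to your deployment):

# Delete the CA secret; the webhook will regenerate it.
kubectl delete secret cert-manager-webhook-ca -n cert-manager

# Optionally bounce the webhook pod so it picks up the new CA immediately.
kubectl delete pod -n cert-manager -l app.kubernetes.io/name=webhook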

We deleted the entire namespace before attempting to install, so it's definitely using the CA secret created during the deployment. The exact same helm upgrade --install command failed one day and then worked the next, so it may have been a race condition though I'm not sure 🤷‍♂️

This problem may be caused by the CNI. After I modified the MTU of Calico, the problem was solved.

"mtu": 1440-> "mtu": 1420,

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "k3s-operator-1",
      "mtu": 1420,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
