Cert-manager: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://rancher-cert-cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "cert-manager-webhook-ca")

Created on 1 Oct 2020 · 19 comments · Source: jetstack/cert-manager

Describe the bug:

I am using cert-manager to generate the certificate for Rancher, and I am deploying both with their Helm charts (cert-manager version 0.16.1 and Rancher version 2.4.8). Cert-manager deploys successfully, but while deploying Rancher I am facing issues generating the certificate.

Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://rancher-cert-cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "cert-manager-webhook-ca")

Expected Behavior:
It should deploy successfully.

Environment:
Kubernetes : "v1.15.11-eks-af3caf"
kubectl : v1.18.6
Install method: helm + kustomize (using Argo-CD)

There are no changes to the default values of either chart.

All 19 comments

Is this a fresh install of cert-manager?

/triage support

Hi @meyskens
Yes, it is a fresh installation of cert-manager.

Have you taken a look at https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions?

Hi @meyskens
First, I am not using any custom CNI. Second, the URL you provided is for cert-manager version 1.x, but I am using cert-manager 0.16.1.
I checked the v0.16 docs for compatibility issues but didn't find anything regarding EKS clusters.
Also, all ports are allowed within the cluster. I tried changing "webhook.securePort" to a different port, but still no luck.

The documentation for that is only present in v1.0.0, as the workaround wasn't in 0.16.
Have you looked into the caBundle part, since you are getting a certificate validation error?
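
A rough way to verify the injection (just a sketch; the webhook configuration and secret names below are assumed from the release name used in this thread, and the ca.crt key name may differ by version):

# CA bundle the API server uses when calling the webhook (empty output means nothing was injected)
kubectl get mutatingwebhookconfiguration rancher-cert-cert-manager-webhook \
  -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | head -c 64; echo

# CA stored in the webhook's serving-certificate secret; the two base64 values should normally match
kubectl get secret rancher-cert-cert-manager-webhook-ca -n cert-manager \
  -o jsonpath='{.data.ca\.crt}' | head -c 64; echo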

I checked the CA bundle on my side and everything seems fine. Could you please suggest everything that needs to be checked? I don't want to miss anything.
Also, while deploying cert-manager I got some warnings:

CustomResourceDefinition/certificaterequests.cert-manager.io is part of a different application: cert-manager
CustomResourceDefinition/certificates.cert-manager.io is part of a different application: cert-manager
CustomResourceDefinition/challenges.acme.cert-manager.io is part of a different application: cert-manager
CustomResourceDefinition/clusterissuers.cert-manager.io is part of a different application: cert-manager
CustomResourceDefinition/issuers.cert-manager.io is part of a different application: cert-manager
CustomResourceDefinition/orders.acme.cert-manager.io is part of a different application: cert-manager

Hi @meyskens
This issue got resolved for me, so I am closing it.

I get pissed off by people who say the issue is resolved without explaining what was done to solve it.

Hi @alexsorkin
Apologies for not mentioning the reason. Let me explain the scenario.
Since cert-manager was not working for me initially, I tried deploying it in a different namespace (rancher-certs). It didn't work there either, so I removed it from that namespace, but unfortunately the ValidatingWebhookConfiguration and MutatingWebhookConfiguration were not removed at that time.
I then deployed cert-manager again in the cert-manager namespace, and it also created both of those objects.
That is what caused the issue.

I debugged the whole process and found the duplicate objects. Once I deleted the old ones, the issue went away (see the cleanup sketch below).
I also installed the required CRDs with Helm afterwards (by setting installCRDs: true) rather than installing them from plain YAMLs.
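
For anyone hitting the same leftover-objects problem, a minimal cleanup sketch (the names below are examples; webhook configurations are cluster-scoped, so they are not removed when you delete the namespace):

# List every webhook configuration that belongs to cert-manager
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep cert-manager

# Delete the stale ones left over from the removed release
kubectl delete validatingwebhookconfiguration <old-release>-cert-manager-webhook
kubectl delete mutatingwebhookconfiguration <old-release>-cert-manager-webhook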

But now I am facing a new issue:

Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://rancher-cert-cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

I re-checked everything but still haven't found a solution. Please let me know if you are familiar with this issue.

Hi, the problem eventually heals itself... You should simply wait ~20 seconds after deploying cert-manager before creating the Issuer, to let the cainjector inject the CA certificate into the webhook configuration.
It's not really a bug, but this behaviour should be documented in the cert-manager "Getting Started" documentation.
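
If you would rather script the wait than sleep for a fixed time, a rough sketch (assuming the resource names from this thread):

# Wait for the webhook deployment to become Available
kubectl wait --for=condition=Available deployment/rancher-cert-cert-manager-webhook \
  -n cert-manager --timeout=120s

# Then poll until the cainjector has populated the caBundle on the webhook configuration
until kubectl get mutatingwebhookconfiguration rancher-cert-cert-manager-webhook \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' | grep -q .; do
  sleep 2
done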

Hi @alexsorkin
I tried this as well. I waited for 10 minutes after the cert-manager deployment, but I still hit the same issue.

Is the cainjector running? See #3338 (comment)

Hi @meyskens
The cainjector is also running.
Everything is healthy and in the Running state.

[ankit@rancher ~]$ kubectl get all -n cert-manager
NAME                                                        READY   STATUS    RESTARTS   AGE
pod/rancher-cert-cert-manager-5786f46d5b-lsm6s              1/1     Running   0          12h
pod/rancher-cert-cert-manager-cainjector-6894f9cbcf-jqg46   1/1     Running   0          12h
pod/rancher-cert-cert-manager-webhook-86df8cf76-z8f22       1/1     Running   0          12h

NAME                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/rancher-cert-cert-manager           ClusterIP   10.100.4.67     <none>        9402/TCP   2d4h
service/rancher-cert-cert-manager-webhook   ClusterIP   10.100.85.152   <none>        443/TCP    2d4h

NAME                                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/rancher-cert-cert-manager              1/1     1            1           2d4h
deployment.apps/rancher-cert-cert-manager-cainjector   1/1     1            1           2d4h
deployment.apps/rancher-cert-cert-manager-webhook      1/1     1            1           2d4h

NAME                                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/rancher-cert-cert-manager-5786f46d5b              1         1         1       2d4h
replicaset.apps/rancher-cert-cert-manager-cainjector-6894f9cbcf   1         1         1       2d4h
replicaset.apps/rancher-cert-cert-manager-webhook-86df8cf76       1         1         1       2d4h
[ankit@rancher ~]$ kubectl get secret -n cert-manager
NAME                                               TYPE                                  DATA   AGE
default-token-9v8sh                                kubernetes.io/service-account-token   3      2d4h
istio.default                                      istio.io/key-and-cert                 3      2d4h
rancher-cert-cert-manager-cainjector-token-5tgj5   kubernetes.io/service-account-token   3      2d4h
rancher-cert-cert-manager-token-48bd2              kubernetes.io/service-account-token   3      2d4h
rancher-cert-cert-manager-webhook-ca               Opaque                                3      2d4h
rancher-cert-cert-manager-webhook-token-hbw8c      kubernetes.io/service-account-token   3      2d4h

Hi @meyskens
I debugged a bit more, monitored the cainjector pod logs, and found the issue below.

E1014 10:27:24.861262       1 leaderelection.go:320] error retrieving resource lock kube-system/cert-manager-cainjector-leader-election-core: configmaps "cert-manager-cainjector-leader-election-core" is forbidden: User "system:serviceaccount:cert-manager:rancher-cert-cert-manager-cainjector" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
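
A quick way to confirm this is an RBAC gap, using the service account from the log line above (the ClusterRole name here is assumed from the release name):

# Should print "yes" once the cainjector's role covers the leader-election configmap
kubectl auth can-i get configmaps -n kube-system \
  --as=system:serviceaccount:cert-manager:rancher-cert-cert-manager-cainjector

# Inspect the rules actually granted to the cainjector
kubectl describe clusterrole rancher-cert-cert-manager-cainjector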

It seems to be Rancher-specific, possibly a missing PSP role.

Hi @meyskens
The issue is resolved now. The problem was with the ClusterRole that cert-manager creates for the cainjector service account (cert-manager:rancher-cert-cert-manager-cainjector). That service account also needs access to configmaps, but the configmaps rule is missing from the ClusterRole definition in the cert-manager Helm chart.
I edited the ClusterRole manually and the issue went away.
This needs to be fixed in the Helm chart itself.
Thanks for your help and support :)

I'm also seeing this using v1.0.3. I'm not familiar with the codebase, but do we need to patch clusterrole.rbac.authorization.k8s.io/v1/cert-manager-cainjector to include access to configmaps?

I just ran into this issue, and I patched the ClusterRole with the following rule to access configmaps:

  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - create
      - update

Note that if you're using Kustomize, you'll have to provide all the required permissions because the patch will replace the rules in the ClusterRole, instead of appending the new rule to the list. Here's a gist of the patch I'm using with Kustomize.
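
If you only need to fix the live object, an alternative that appends rather than replaces is a JSON patch (a sketch; the role name assumes a release where the ClusterRole is named cert-manager-cainjector, as in the comment above):

# "add" with path /rules/- appends a new rule to the existing list instead of replacing it
kubectl patch clusterrole cert-manager-cainjector --type=json -p='[
  {"op": "add", "path": "/rules/-", "value": {
    "apiGroups": [""],
    "resources": ["configmaps"],
    "verbs": ["get", "create", "update"]
  }}
]'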

Hi @johanbrandhorst
Yes, it is necessary to patch the ClusterRole, because the cainjector needs access to configmaps. To make it work smoothly, you have to add the configmaps rule to the ClusterRole.
