Is this a request for help?:
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT (maybe)
Version of Helm and Kubernetes:
→ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
→ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.6-gke.3", GitCommit:"04ad69a117f331df6272a343b5d8f9e2aee5ab0c", GitTreeState:"clean", BuildDate:"2019-01-10T00:39:15Z", GoVersion:"go1.10.3b4", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
cert-manager Version 0.6
What happened:
After the upgrade, the cert-manager pod's log is full of messages like:
controller.go:147] certificates controller: Re-queuing item "some-certificate" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
I've also done "helm delete --purge" and reinstalled the chart - same behaviour.
Therefore, I can reproduce the issue simply by installing the cert-manager chart.
Anything else we need to know:
I've followed the install/upgrade instructions at https://cert-manager.readthedocs.io/en/latest/admin/upgrading/index.html and the upgrade went smoothly without any problems. After the upgrade, all the pods are in the "running" state.
I think issue #10856 might indeed be related.
Before starting the "helm upgrade" command, I labeled the existing namespace as described in the cert-manager documentation and afterwards verified that the label is set correctly:
→ kubectl describe namespace cert-manager
Name: ingress
Labels: certmanager.k8s.io/disable-validation=true
Good to know! I have a helm-let's-encrypt to upgrade going forward as well, so I'm not going to do that until this is resolved. /cc @kragniz
Today I had to do a complete rollback to 0.5.2, as cert-manager 0.6.0 caused very strange side effects in my k8s GKE environment:
Today I upgraded to a new Kubernetes version and afterwards had huge problems with the cluster: the two pods "calico-typha-vertical-autoscaler" and "calico-node-vertical-autoscaler" didn't start anymore, and other cluster resources behaved strangely as well.
Pods restarted after error messages like
autoscaler.go:49] failed to discover apigroup for kind "DaemonSet": unable to retrieve the complete list of server APIs: admission.certmanager.k8s.io/v1beta1: an error on the server ("service unavailable") has prevented the request from succeeding
My first idea was that this might have been caused by the Kubernetes update - but that was not the case, as a second cluster that was not updated had the same strange issues.
So I completely uninstalled cert-manager (also removed all CRDs) and installed version 0.5.2 again.
After that procedure I also had to recreate the ClusterIssuer, as it was gone as well.
Now everything is up and running again... for the moment I'm gonna stay with the old cert-manager version - at least this version is working for me and not causing strange issues.
Did anyone already try with cert-manager 0.6.5?
@rmuehlbauer Trying with v0.6.5, same error... Any workarounds?
at least not to my knowledge...
Today I had some time to dig a bit deeper and could finally resolve the issue with the new cert-manager versions - maybe you can use this to sort things out on your side.
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
After working my way through cert-manager's "getting started guide" and "troubleshooting guide", I found a note (at the very bottom of the troubleshooting guide) saying: "If the job continues to fail, please read the Webhook docs for additional information."
In this webhook doc (which you can find here: https://cert-manager.readthedocs.io/en/latest/getting-started/webhook.html) I found an interesting piece of information regarding running cert-manager on private GKE clusters.
In GKE environments the K8s masters have only very limited access to their nodes. To be able to use cert-manager's webhook, you have to allow those connections as well.
This was the missing piece of information - now it was easy to work my way through the GKE docs (found here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules), gather all the little pieces together, and create a new firewall rule, which solved the issue for me.
Basically, I allowed the K8s master network to access the webhook pod on port 6443 (if you take a closer look at the webhook pod, you will see that it actually listens on that port; the webhook Service just maps its port 443 to the pod's 6443).
I hope this piece of information helps a bit to sort out situations on your side.
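For what it's worth, the port mapping described above can be sketched locally. The Service manifest below is a hypothetical, minimal stand-in for the real webhook Service (on a live cluster you would instead inspect it with kubectl -n cert-manager get svc cert-manager-webhook -o yaml):

```shell
# Hypothetical minimal stand-in for the webhook Service: clients reach it on
# port 443, and the Service forwards to the pod's 6443 - so the firewall rule
# must allow 6443, the port the pod actually listens on.
cat <<'EOF' > /tmp/webhook-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: cert-manager-webhook
spec:
  ports:
  - port: 443
    targetPort: 6443
EOF
# Extract the pod-side port (the one the firewall rule needs to allow):
grep 'targetPort' /tmp/webhook-svc.yaml | cut -d ':' -f 2 | tr -d ' '
```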
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
What is the gcloud command line to create this rule?
i would like to see that gke command as well please
For the record, I did not have this problem when setting up cert-manager.
From my notes, here is literally everything that was needed to set up the cert-manager on a new cluster.
# Install the CustomResourceDefinition resources separately
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.7/deploy/manifests/00-crds.yaml
# Create the namespace for cert-manager
kubectl create namespace cert-manager
# Label the cert-manager namespace to disable resource validation
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io
# Update your local Helm chart repository cache
helm repo update
# Install the cert-manager Helm chart
helm install \
--name cert-manager \
--namespace cert-manager \
--version v0.7.0 \
jetstack/cert-manager
kubectl get pods --namespace cert-manager
# Setup cluster issuer (using letsencrypt)
cat <<'EOF' | kubectl create -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: '[email protected]'
    privateKeySecretRef:
      name: letsencrypt-production
    http01: {}
EOF
# Set up certificate (replace with your details)
cat <<'EOF' | kubectl replace -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: queryalert-com
  namespace: default
spec:
  secretName: queryalert-com-tls
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  commonName: queryalert.com
  dnsNames:
  - queryalert.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - queryalert.com
EOF
Then just update Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ .Release.Name | quote }}
  labels:
{{- include "release_labels" . | indent 4 }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
+   certmanager.k8s.io/cluster-issuer: 'letsencrypt-production'
+   certmanager.k8s.io/acme-challenge-type: http01
+spec:
+  tls:
+  - hosts:
+    - queryalert.com
+    secretName: queryalert-com-tls
  rules:
  - host: queryalert.com
    http:
      paths:
      - path: /api
        backend:
          serviceName: {{ .Release.Name | quote }}
          servicePort: 8080
With 0.7.0 it works indeed, thanks!
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
What is the gcloud command line to create this rule?
please have a look at https://github.com/helm/charts/issues/10869#issuecomment-467015706
So, we are running a private cluster on GKE:
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.5", GitCommit:"2c44750044d8aeeb6b51386ddb9c274ff0beb50b", GitTreeState:"clean", BuildDate:"2019-02-01T23:53:25Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}
I created firewall
CLUSTER=staging
REGION=europe-west4
SOURCE=$(gcloud container clusters describe $CLUSTER --region $REGION | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
NETWORK=$(gcloud container clusters describe $CLUSTER --region $REGION | egrep '^network:' | cut -d ':' -f 2 | tr -d ' ')
TAGS=$(gcloud compute firewall-rules list --filter "name~^gke-$CLUSTER" --format 'value(targetTags.list():label=TARGET_TAGS)' | head -n 1)
gcloud compute firewall-rules create cert-manager-admission-webhook --action ALLOW --direction INGRESS --source-ranges $SOURCE --rules tcp:6443 --target-tags $TAGS --network $NETWORK
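As a sanity check, the masterIpv4CidrBlock pipeline above can be exercised against a sample line of describe output (the CIDR value below is hypothetical):

```shell
# Hypothetical one-line sample of `gcloud container clusters describe` output;
# the grep/cut/tr pipeline is the same one used to populate SOURCE above.
SAMPLE='masterIpv4CidrBlock: 172.16.0.0/28'
SOURCE=$(printf '%s\n' "$SAMPLE" | grep masterIpv4CidrBlock | cut -d ':' -f 2 | tr -d ' ')
echo "$SOURCE"
```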
Then I tried repeating the steps from https://github.com/helm/charts/issues/10869#issuecomment-473567532 and I'm getting the same error when trying to create the ClusterIssuer:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
Could you point me what else am I missing?
Hmm... your SOURCE and TAGS variables seem to be populated with the correct values - at least when I tried your commands in my environment.
Your "firewall-rules create" line also looks good.
That is exactly what finally got it working in my case...
What about the resulting firewall rule - did you check that it is effective for the GKE hosts in your cluster? (I don't know how to check this using the CLI, but you can easily see it in the web GUI, in the firewall rule's details at the very bottom of the page...)
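A possible CLI-side check, sketched with hypothetical sample values standing in for the real gcloud output (the describe commands in the comments are the ones you would run against an actual project):

```shell
# Hypothetical values; on a real project you would populate these with e.g.
#   gcloud compute firewall-rules describe cert-manager-admission-webhook \
#     --format 'value(targetTags.list())'
#   gcloud compute instances describe NODE_NAME --format 'value(tags.items)'
RULE_TAG='gke-staging-node'
NODE_TAGS='gke-staging-node;extra-tag'
# The rule is effective for a node when the node carries the rule's target tag:
case ";$NODE_TAGS;" in
  *";$RULE_TAG;"*) echo "rule applies" ;;
  *)               echo "rule does NOT apply" ;;
esac
```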
@rmuehlbauer thanks, good observation. I hadn't really checked whether the rules were applied, and they weren't. I am not sure why - maybe it has something to do with custom node pools or preemptible nodes.
UPDATE: It was the network, damn it - our cluster doesn't run on the default network. I've updated my commands.
One more update, I got a step further with connections, but now getting this:
I0404 15:31:48.644874 1 request.go:942] Request Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I0404 15:31:48.645004 1 round_trippers.go:419] curl -k -v -XPOST -H "Content-Type: application/json" -H "User-Agent: image.app_linux-amd64.binary/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Authorization: Bearer blalblalal" 'https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I0404 15:31:48.654331 1 round_trippers.go:438] POST https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 9 milliseconds
I0404 15:31:48.654376 1 round_trippers.go:444] Response Headers:
I0404 15:31:48.654392 1 round_trippers.go:447] Audit-Id: c5990609-2b9d-47ce-9bda-d15180940f1c
I0404 15:31:48.654397 1 round_trippers.go:447] Content-Type: application/json
I0404 15:31:48.654400 1 round_trippers.go:447] Content-Length: 294
I0404 15:31:48.654403 1 round_trippers.go:447] Date: Thu, 04 Apr 2019 15:31:48 GMT
I0404 15:31:48.654441 1 request.go:942] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false,"reason":"no RBAC policy matched"}}
I0404 15:31:48.654606 1 authorization.go:73] Forbidden: "/", Reason: "no RBAC policy matched"
I0404 15:31:48.654766 1 wrap.go:47] GET /: (10.220427ms) 403 [Go-http-client/2.0 172.16.0.10:54560]
I0404 15:31:49.774434 1 log.go:172] http: TLS handshake error from 172.16.0.11:56368: remote error: tls: bad certificate
I0404 15:31:50.357270 1 log.go:172] http: TLS handshake error from 172.16.0.10:58938: remote error: tls: bad certificate
I0404 15:31:52.656929 1 log.go:172] http: TLS handshake error from 172.16.0.10:58960: remote error: tls: bad certificate
I0404 15:31:55.179124 1 log.go:172] http: TLS handshake error from 172.16.0.10:58966: remote error: tls: bad certificate
I0404 15:31:56.677010 1 log.go:172] http: TLS handshake error from 172.16.0.10:58972: remote error: tls: bad certificate
I was updating from 0.5.2, so maybe something got messed up along those lines. I may try a clean install again a bit later.
@mlushpenko I think you are hitting some RBAC issues - have a look at https://docs.cert-manager.io/en/latest/getting-started/install.html, especially the note regarding RBAC and GKE... hopefully that fixes your problem.
@rmuehlbauer thanks for the suggestion, although it looks fine:
kubectl describe clusterrolebinding cluster-admin-binding
Name: cluster-admin-binding
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: cluster-admin
Subjects:
Kind Name Namespace
---- ---- ---------
User [email protected]
I was testing by running helm with my permissions, but it does look related to RBAC, as stated in the error log. It feels to me like whoever is calling the API (probably the webhook pod or cert-manager) is not running with a specific service account, because it tries to use the anonymous user:
"user":"system:anonymous","group":["system:unauthenticated"]}
Do you have any idea about the validation process? I read the "How it works" section, but didn't find relevant info.
On the other hand, I probably won't be spending much more time on it now, but maybe this will help other people if they encounter similar issues.
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
This hint helped me solve the problem.
Some notes here: for clusters created by kops, cross-subnet mode may need to be enabled explicitly on AWS when using Calico.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: k8s.local
spec:
  networking:
    # Ref: https://github.com/kubernetes/kops/blob/master/docs/networking.md#enable-cross-subnet-mode-in-calico-aws-only
    calico:
      crossSubnet: true
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
I don't actually have masterIpv4CidrBlock listed when I describe the cluster, likely because it isn't a private cluster. Also, masterIpv4CidrBlock is noted as deprecated in these docs?

https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.zones.clusters
Does anyone have a workaround here that doesn't involve all the madness with private clusters? I'm trying to set this up on a VPC-native GKE cluster.
@oshalygin CIDR is 0.0.0.0/0 on a public cluster
Credit to @skuro for the gcloud commands to add the firewall:
https://github.com/jetstack/cert-manager/issues/2109#issuecomment-535901422
In my case, I was deploying the cert-manager and cert-manager-issuer charts via Terraform in the same script.
I fixed it by making cert-manager-issuer depend on cert-manager; cert-manager should deploy first.