Is this a request for help?:
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT (maybe)
Version of Helm and Kubernetes:
→ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
→ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.6-gke.3", GitCommit:"04ad69a117f331df6272a343b5d8f9e2aee5ab0c", GitTreeState:"clean", BuildDate:"2019-01-10T00:39:15Z", GoVersion:"go1.10.3b4", Compiler:"gc", Platform:"linux/amd64"}
Which chart:
cert-manager Version 0.6
What happened:
After the upgrade, the cert-manager pod's log is full of messages like:
controller.go:147] certificates controller: Re-queuing item "some-certificate" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
I've also done "helm delete --purge" and reinstalled the chart - same behaviour.
Therefore, I can reproduce the issue simply by installing the cert-manager chart.
Anything else we need to know:
I've followed the install/upgrade instructions at https://cert-manager.readthedocs.io/en/latest/admin/upgrading/index.html and the upgrade went smoothly without any problems. After the upgrade, all the pods are in the "running" state.
I think issue #10856 might indeed be related.
Before starting the "helm upgrade" command, I labeled the existing namespace as described in the cert-manager documentation and afterwards verified that the label is set correctly:
→ kubectl describe namespace cert-manager
Name: ingress
Labels: certmanager.k8s.io/disable-validation=true
Good to know! I have a helm-let's-encrypt to upgrade going forward as well, so I'm not going to do that until this is resolved. /cc @kragniz
Today I had to do a complete rollback to 0.5.2, as cert-manager 0.6.0 caused very strange side effects in my k8s GKE environment:
Today I upgraded to a new Kubernetes version and afterwards had huge problems with the cluster: the two pods "calico-typha-vertical-autoscaler" and "calico-node-vertical-autoscaler" didn't start anymore, and other cluster resources behaved strangely as well.
Pods restarted after error messages like
autoscaler.go:49] failed to discover apigroup for kind "DaemonSet": unable to retrieve the complete list of server APIs: admission.certmanager.k8s.io/v1beta1: an error on the server ("service unavailable") has prevented the request from succeeding
My first idea was that this might have been caused by the Kubernetes update - but that was not the case, as a second cluster that was not updated had the same strange issues.
So I completely uninstalled cert-manager (also removed all CRDs) and installed version 0.5.2 again.
After that procedure I also had to recreate the ClusterIssuer, as it was gone as well.
Now everything is up and running again... for the moment I'm gonna stay with the old cert-manager version - at least this version is working for me and not causing strange issues.
Did anyone already try with cert-manager 0.6.5?
@rmuehlbauer Trying with v0.6.5, same error... Any workarounds?
at least not to my knowledge...
Today I had some time to dig a bit deeper and could finally resolve the issue with the new cert-manager versions - maybe you can use this to sort things out on your side.
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
After working my way through cert-manager's "getting started guide" and "troubleshooting guide", I found a note (at the very bottom of the troubleshooting guide) saying: "If the job continues to fail, please read the Webhook docs for additional information."
In this webhook doc (which you can find here: https://cert-manager.readthedocs.io/en/latest/getting-started/webhook.html) I found an interesting piece of information regarding running cert-manager on private GKE clusters.
In GKE environments the K8s masters have only very limited access to their nodes. To be able to use cert-manager's webhook, you have to allow those connections as well.
This was the missing piece of information - now it was easy to work my way through the GKE docs (found here: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules), gather all the little pieces together, and create a new firewall rule, which solved the issue for me.
Basically, I allowed the K8s master network to access the webhook pod on port 6443 (if you take a closer look at the webhook pod, you will see that it actually listens on that port; the webhook Service just maps its port 443 to the pod's 6443).
I hope this piece of information helps a bit to sort out situations on your side.
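For what it's worth, the port mapping described above can be sketched locally. The Service manifest below is a hypothetical, minimal stand-in for the real webhook Service (on a live cluster you would instead inspect it with kubectl -n cert-manager get svc cert-manager-webhook -o yaml):

```shell
# Hypothetical minimal stand-in for the webhook Service: clients reach it on
# port 443, and the Service forwards to the pod's 6443 - so the firewall rule
# must allow 6443, the port the pod actually listens on.
cat <<'EOF' > /tmp/webhook-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: cert-manager-webhook
spec:
  ports:
  - port: 443
    targetPort: 6443
EOF
# Extract the pod-side port (the one the firewall rule needs to allow):
grep 'targetPort' /tmp/webhook-svc.yaml | cut -d ':' -f 2 | tr -d ' '
```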
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
What is the gcloud command line to create this rule?
i would like to see that gke command as well please
For the record, I did not have this problem when setting up cert-manager.
From my notes, here is literally everything that was needed to set up the cert-manager on a new cluster.
# Install the CustomResourceDefinition resources separately
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.7/deploy/manifests/00-crds.yaml
# Create the namespace for cert-manager
kubectl create namespace cert-manager
# Label the cert-manager namespace to disable resource validation
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io
# Update your local Helm chart repository cache
helm repo update
# Install the cert-manager Helm chart
helm install \
--name cert-manager \
--namespace cert-manager \
--version v0.7.0 \
jetstack/cert-manager
kubectl get pods --namespace cert-manager
# Setup cluster issuer (using letsencrypt)
cat <<'EOF' | kubectl create -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: '[email protected]'
    privateKeySecretRef:
      name: letsencrypt-production
    http01: {}
EOF
# Set up certificate (replace with your details)
cat <<'EOF' | kubectl replace -f -
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: queryalert-com
  namespace: default
spec:
  secretName: queryalert-com-tls
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  commonName: queryalert.com
  dnsNames:
  - queryalert.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - queryalert.com
EOF
Then just update Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: {{ .Release.Name | quote }}
  labels:
{{- include "release_labels" . | indent 4 }}
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
+   certmanager.k8s.io/cluster-issuer: 'letsencrypt-production'
+   certmanager.k8s.io/acme-challenge-type: http01
+spec:
+  tls:
+  - hosts:
+    - queryalert.com
+    secretName: queryalert-com-tls
  rules:
  - host: queryalert.com
    http:
      paths:
      - path: /api
        backend:
          serviceName: {{ .Release.Name | quote }}
          servicePort: 8080
With 0.7.0 it works indeed, thanks!
issue was solved by allowing my k8s master to access its nodes on port 6443 - which is used on the cert manager webhook pod
What is the gcloud command line to create this rule?
please have a look at https://github.com/helm/charts/issues/10869#issuecomment-467015706
So, we are running a private cluster on GKE:
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.5", GitCommit:"2c44750044d8aeeb6b51386ddb9c274ff0beb50b", GitTreeState:"clean", BuildDate:"2019-02-01T23:53:25Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}
I created firewall
CLUSTER=staging
REGION=europe-west4
SOURCE=$(gcloud container clusters describe $CLUSTER --region $REGION | grep masterIpv4CidrBlock| cut -d ':' -f 2 | tr -d ' ')
NETWORK=$(gcloud container clusters describe $CLUSTER --region $REGION | egrep '^network:' | cut -d ':' -f 2 | tr -d ' ')
TAGS=$(gcloud compute firewall-rules list --filter "name~^gke-$CLUSTER" --format 'value(targetTags.list():label=TARGET_TAGS)' | head -n 1)
gcloud compute firewall-rules create cert-manager-admission-webhook --action ALLOW --direction INGRESS --source-ranges $SOURCE --rules tcp:6443 --target-tags $TAGS --network $NETWORK
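As a sanity check, the masterIpv4CidrBlock pipeline above can be exercised against a sample line of describe output (the CIDR value below is hypothetical):

```shell
# Hypothetical one-line sample of `gcloud container clusters describe` output;
# the grep/cut/tr pipeline is the same one used to populate SOURCE above.
SAMPLE='masterIpv4CidrBlock: 172.16.0.0/28'
SOURCE=$(printf '%s\n' "$SAMPLE" | grep masterIpv4CidrBlock | cut -d ':' -f 2 | tr -d ' ')
echo "$SOURCE"
```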
Then I tried repeating the steps from https://github.com/helm/charts/issues/10869#issuecomment-473567532 and I'm getting the same error when trying to create the ClusterIssuer:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
Could you point me what else am I missing?
Hmm... your SOURCE and TAGS variables seem to be populated with the correct values - at least when I tried your commands in my environment.
Your "firewall-rules create" line also looks good.
That is exactly what finally got it working in my case...
What about the resulting firewall rule - did you check that it is effective for the GKE hosts in your cluster? (I don't know how to check this using the CLI, but you can easily see it in the web GUI, in the firewall rule's details at the very bottom of the page...)
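A possible CLI-side check, sketched with hypothetical sample values standing in for the real gcloud output (the describe commands in the comments are the ones you would run against an actual project):

```shell
# Hypothetical values; on a real project you would populate these with e.g.
#   gcloud compute firewall-rules describe cert-manager-admission-webhook \
#     --format 'value(targetTags.list())'
#   gcloud compute instances describe NODE_NAME --format 'value(tags.items)'
RULE_TAG='gke-staging-node'
NODE_TAGS='gke-staging-node;extra-tag'
# The rule is effective for a node when the node carries the rule's target tag:
case ";$NODE_TAGS;" in
  *";$RULE_TAG;"*) echo "rule applies" ;;
  *)               echo "rule does NOT apply" ;;
esac
```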
@rmuehlbauer thanks, good observation. I hadn't really checked whether the rules were applied, and they weren't. I am not sure why - maybe it has something to do with custom node pools or preemptible nodes.
UPDATE: It was the network, damn it - our cluster doesn't run on the default network. I've updated my commands.
One more update, I got a step further with connections, but now getting this:
I0404 15:31:48.644874 1 request.go:942] Request Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false}}
I0404 15:31:48.645004 1 round_trippers.go:419] curl -k -v -XPOST -H "Content-Type: application/json" -H "User-Agent: image.app_linux-amd64.binary/v0.0.0 (linux/amd64) kubernetes/$Format" -H "Accept: application/json, */*" -H "Authorization: Bearer blalblalal" 'https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews'
I0404 15:31:48.654331 1 round_trippers.go:438] POST https://10.125.192.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 9 milliseconds
I0404 15:31:48.654376 1 round_trippers.go:444] Response Headers:
I0404 15:31:48.654392 1 round_trippers.go:447] Audit-Id: c5990609-2b9d-47ce-9bda-d15180940f1c
I0404 15:31:48.654397 1 round_trippers.go:447] Content-Type: application/json
I0404 15:31:48.654400 1 round_trippers.go:447] Content-Length: 294
I0404 15:31:48.654403 1 round_trippers.go:447] Date: Thu, 04 Apr 2019 15:31:48 GMT
I0404 15:31:48.654441 1 request.go:942] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"nonResourceAttributes":{"path":"/","verb":"get"},"user":"system:anonymous","group":["system:unauthenticated"]},"status":{"allowed":false,"reason":"no RBAC policy matched"}}
I0404 15:31:48.654606 1 authorization.go:73] Forbidden: "/", Reason: "no RBAC policy matched"
I0404 15:31:48.654766 1 wrap.go:47] GET /: (10.220427ms) 403 [Go-http-client/2.0 172.16.0.10:54560]
I0404 15:31:49.774434 1 log.go:172] http: TLS handshake error from 172.16.0.11:56368: remote error: tls: bad certificate
I0404 15:31:50.357270 1 log.go:172] http: TLS handshake error from 172.16.0.10:58938: remote error: tls: bad certificate
I0404 15:31:52.656929 1 log.go:172] http: TLS handshake error from 172.16.0.10:58960: remote error: tls: bad certificate
I0404 15:31:55.179124 1 log.go:172] http: TLS handshake error from 172.16.0.10:58966: remote error: tls: bad certificate
I0404 15:31:56.677010 1 log.go:172] http: TLS handshake error from 172.16.0.10:58972: remote error: tls: bad certificate
I was updating from 0.5.2, so maybe something got messed up along those lines. I may try a clean install again a bit later.
@mlushpenko I think you are hitting some RBAC issues - have a look at https://docs.cert-manager.io/en/latest/getting-started/install.html, especially the note regarding RBAC and GKE... hopefully that fixes your problem.
@rmuehlbauer thanks for the suggestion, although it looks fine:
kubectl describe clusterrolebinding cluster-admin-binding
Name: cluster-admin-binding
Labels: <none>
Annotations: <none>
Role:
Kind: ClusterRole
Name: cluster-admin
Subjects:
Kind Name Namespace
---- ---- ---------
User [email protected]
I was testing by running helm with my permissions, but it does look related to RBAC, as stated in the error log. It feels to me like whoever is calling the API (probably the webhook pod or cert-manager) is not running with a specific service account, because it tries to use the anonymous user:
"user":"system:anonymous","group":["system:unauthenticated"]}
Do you have any idea about the validation process? I read the "How it works" section, but didn't find relevant info.
On the other hand, I probably won't be spending much more time on it now, but maybe this will help other people if they encounter similar issues.
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
This hint helped me solve the problem.
Some notes here: for clusters created by kops, cross-subnet mode may need to be enabled explicitly on AWS when using Calico.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  name: k8s.local
spec:
  networking:
    # Ref: https://github.com/kubernetes/kops/blob/master/docs/networking.md#enable-cross-subnet-mode-in-calico-aws-only
    calico:
      crossSubnet: true
TL;DR:
There was a firewall rule missing. Allow Kubernetes master (network) to access the cert-manager-webhook pod on port 6443.
I don't actually have masterIpv4CidrBlock listed when I describe the cluster, likely because it isn't a private cluster. Also, masterIpv4CidrBlock is noted as deprecated in these docs?

https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.zones.clusters
Does anyone have a workaround here that doesn't involve all the madness with private clusters? I'm trying to set this up on a VPC-native GKE cluster.
@oshalygin CIDR is 0.0.0.0/0 on a public cluster
Credit to @skuro for the gcloud commands to add the firewall:
https://github.com/jetstack/cert-manager/issues/2109#issuecomment-535901422
In my case, I was deploying the cert-manager and cert-manager-issuer charts via Terraform in the same script.
I fixed it by making cert-manager-issuer depend on cert-manager; cert-manager should deploy first.