Bugs should be filed for issues encountered whilst operating cert-manager.
You should first attempt to resolve your issues through the community support
channels, e.g. Slack, in order to rule out individual configuration errors.
Please provide as much detail as possible.
Describe the bug:
Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: Service Unavailable
This happens when testing the functionality with the test-resources manifest.
Expected behaviour:
A certificate should be issued.
Steps to reproduce the bug:
Apply the test-resources manifest from the documentation.
Anything else we need to know?:
Environment details:
/kind bug
I have a similar issue. I installed cert-manager (v0.13) with the regular manifest, and when I tried to apply the test resources I got the same error message, except for the 'Service Unavailable' part.
Here is the error message:
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: notresolvable
OK. So I figured out why and made a correction to our Kubernetes setup.
https://cert-manager-webhook.cert-manager.svc:443
This domain is short for cert-manager-webhook.cert-manager.svc.cluster.local; cluster.local is in the search path, but Linux truncates the search list past a certain limit, so it was being dropped.
To fix it, we added "cluster.local" to Docker's search path, since we use docker_dns for resolution on the cluster. The fix will depend on how you maintain your Kubernetes cluster, of course.
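If you want to check this on your own cluster first, here is a rough sketch (the pod name and image are just illustrative):
# Inspect the DNS search path a pod actually sees:
kubectl run dns-test --rm -it --image=busybox --restart=Never -- cat /etc/resolv.conf
# Look for a line like:
#   search <ns>.svc.cluster.local svc.cluster.local cluster.local <corporate domains ...>
# Resolvers historically honour only a limited number of search domains
# (glibc long capped it at 6), so a long corporate list can push
# cluster.local off the end.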
We use Kubespray to deploy all of our clusters, so we simply added the following to our values.yaml when building the cluster:
searchdomains:
  - cluster.local
By default this is not in the Kubespray config; the searchdomains parameter is concatenated onto Docker's search path.
I used the standard helm chart after that with no special configurations and it works perfectly now.
Thought I would share this because it took some time to dig into the problem and find the correct solution.
Enjoy....
After some testing and rebuilding, I don't believe the above actually worked; rather, the webhook was simply not enabled.
Can someone explain how this URL is formed: https://cert-manager-webhook.cert-manager.svc:443
Shouldn't it really be referenced as:
https://cert-manager-webhook.cert-manager.svc.cluster.local:443
or
https://cert-manager-webhook.cert-manager:443
Any guidance would be appreciated.
Also, I ran Sonobuoy and the cluster passed 100%.
@munnerz Can you possibly provide some quick feedback? I would prefer to run this with the webhook enabled.
I spoke to a colleague and he suggested this might be related to 1.16 and broken API paths there.
It looks like
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook
and
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook
have indeed changed and I believe the service is simply not registering properly.
@rsliotta what do you think has changed? I checked the changelog for 1.16, and the only thing is that they moved it from v1beta1 to v1 and changed some default values. Nothing that should affect the API paths.
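One quick sanity check, if it helps: during the deprecation window 1.16 should still serve both admissionregistration versions, which you can confirm with:
kubectl api-versions | grep admissionregistration
# Expected on 1.16:
#   admissionregistration.k8s.io/v1
#   admissionregistration.k8s.io/v1beta1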
I actually cannot figure it out; it just does not work. I see the same behavior due to version changes on a few other items, such as Helm Tiller (had a workaround), Kubefed (have not dug into it yet), and a few other containers.
If I do a kubectl get apiservice, should I see something for cert-manager-webhook?
I do see the admissionwebhook and mutatingwebhook...
$ k get apiservice v1beta1.webhook.cert-manager.io
NAME                              SERVICE                             AVAILABLE                      AGE
v1beta1.webhook.cert-manager.io   cert-manager/cert-manager-webhook   False (FailedDiscoveryCheck)   11m
is what I have. Running describe on it gives me the following message:
failing or missing response from https://10.254.80.38:443/apis/webhook.cert-manager.io/v1beta1: bad status from https://10.254.80.38:443/apis/webhook.cert-manager.io/v1beta1: 403
I get nothing for that, so mine does not even register. That's what I thought; my earlier ramblings were me digging to get to this point.
Did you use Helm or the manifest? Can you share yours? You seemed to get farther, so maybe I missed something. If I disable the webhook, everything else works.
I'm setting up Kubeflow, so they have their own setup via Kustomize. The file looks like this:
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: cert-manager-webhook
  labels:
    app: webhook
  annotations:
    cert-manager.io/inject-apiserver-ca: "true"
webhooks:
  - name: webhook.certmanager.k8s.io
    rules:
      - apiGroups:
          - "cert-manager.io"
        apiVersions:
          - v1alpha2
        operations:
          - CREATE
          - UPDATE
        resources:
          - certificates
          - issuers
          - clusterissuers
          - certificaterequests
    failurePolicy: Fail
    sideEffects: None
    clientConfig:
      service:
        name: kubernetes
        namespace: default
        path: /apis/webhook.cert-manager.io/v1beta1/validations
      caBundle: ""
I even tried applying yours and no luck. No APIService registered.
Sent you the wrong one:
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.webhook.cert-manager.io
  labels:
    app: webhook
  annotations:
    cert-manager.io/inject-ca-from-secret: "cert-manager/cert-manager-webhook-tls"
spec:
  group: webhook.cert-manager.io
  groupPriorityMinimum: 1000
  versionPriority: 15
  service:
    name: cert-manager-webhook
    namespace: cert-manager
  version: v1beta1
OK. So this is missing from the cert-manager.yaml, I guess.
I tried it and had to enable hostNetwork: true, but now I get a 404 with a discovery error, as the path is wrong.
The APIService resource was removed in v0.12 onwards which is likely the
‘change’ you referred to in your earlier comment:
https://cert-manager.io/docs/installation/upgrading/upgrading-0.11-0.12/#removal-of-the-webhook-api-service
What version of cert-manager is being deployed here?
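If this cluster was upgraded from pre-v0.12, it may also be worth checking whether a stale APIService resource was left behind and removing it, along these lines:
kubectl get apiservice | grep cert-manager
# The webhook no longer uses an APIService from v0.12 onwards, so if
# v1beta1.webhook.cert-manager.io still exists it can be deleted:
kubectl delete apiservice v1beta1.webhook.cert-manager.io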
k describe apiservice v1beta1.webhook.cert-manager.io
Name:         v1beta1.webhook.cert-manager.io
Namespace:
Labels:       app=webhook
Annotations:  cert-manager.io/inject-ca-from-secret: cert-manager/cert-manager-webhook-tls
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{"cert-manager.io/inject-ca-from-secret":"cert-man...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2020-03-10T17:25:50Z
  Resource Version:    84623046
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.webhook.cert-manager.io
  UID:                 bc456065-0b13-44c0-8e4b-8981b5f7a895
Spec:
Ca Bundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURRakNDQWlxZ0F3SUJBZ0lRVG1FWEhnby8zTy9lcU9ubEpsaXBVREFOQmdrcWhraUc5dzBCQVFzRkFEQkEKTVJ3d0dnWURWUVFLRXhOalpYSjBMVzFoYm1GblpYSXVjM2x6ZEdWdE1TQXdIZ1lEVlFRREV4ZGpaWEowTFcxaApibUZuWlhJdWQyVmlhRzl2YXk1allUQWVGdzB5TURBek1UQXhOekkyTlRkYUZ3MHlOVEF6TURreE56STJOVGRhCk1FQXhIREFhQmdOVkJBb1RFMk5sY25RdGJXRnVZV2RsY2k1emVYTjBaVzB4SURBZUJnTlZCQU1URjJObGNuUXQKYldGdVlXZGxjaTUzWldKb2IyOXJMbU5oTUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQwpBUUVBdFNDMmIxNS94MEtOZzhpbDAxSktDcTVxM2NxMlhjQ0MwSURhSkdBS3NFTkpLZnRHYWxDbnhLQk5VS0c2CkNwek1UL1dYVzZ0Qlk4dmhObFI3UnpSRVRBT0J5QzlsdzZOR2tFelVncm5sQlEycEdNMU82azQ0WUt1YVJPZVAKR0JxVEcrdzdGL1poWWtiRmxLUUVpOHpYSHBhOWZLbkhGcENHVTN6M3dRYnpHU09TQUNmT25rcllzWWNPQ2EyegpPSjQ3Mm9hNXl2c0IvZkpIc2Z3VE42YmhDcEZmM1AwZWsxd1VyaDUvdElJKy9DNFB2SkpuRmhicFpBWWpsZFYyCmtjaEFINzhsb1hGQ1VWVW5QTTZoRFg5cXJUd0xMOTZGdEJVMnhlN2NzNndlNmhPK3piQjQ4T1FmQnlKYlgzaFUKSHpwenRVVDMrYnJHQVBLTHdEb1Y1WVRyOHdJREFRQUJvemd3TmpBT0JnTlZIUThCQWY4RUJBTUNBcVF3RXdZRApWUjBsQkF3d0NnWUlLd1lCQlFVSEF3RXdEd1lEVlIwVEFRSC9CQVV3QXdFQi96QU5CZ2txaGtpRzl3MEJBUXNGCkFBT0NBUUVBWkE4L01MaTlFVUoyakc3aFEvV3NRaUgzbHlrclJmeHNweEJ2SGMwVTN4bU8ySng1WXhxSXQzYVQKSjMxRk15dlRsUHlabWh5Nlp6NmZXT1l1cFV5ZGdYeWo5a283M2ZWRkZ6WW9PZVJjQnlQaGtSYTA2TDc4KzlaSApvYkZsTUxXdHYxZDU3V3dRUlpPTVhQT1J5dWlNRlBvRUo2NXhEb2FIZUVZSTB0SzVqYlRYMk9VdVQvcVlwTDhQCnQvQWJ3RElSQXlJQ0tIWHlUOVkwUHdkTVBmd3FrM1RTcjR6KzNRbEJEUzkyZE1jOGRXdzc3UFJyM25pNmdWNzMKL1VoK3p4UXlvVk5kbEhIV29hZmYwSnlWWDJOWXJyOGRTL1RNQlNqbU5kaDhjTjRLQjJ2aG90Y1BWcEVVY25xVQpUNlpRVkN3UnhtWVRsMGxMUThwYUhYYzAwb2hQakE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Group:                   webhook.cert-manager.io
  Group Priority Minimum:  1000
  Service:
    Name:                  cert-manager-webhook
    Namespace:             cert-manager
    Port:                  443
  Version:                 v1beta1
  Version Priority:        15
Status:
  Conditions:
    Last Transition Time:  2020-03-10T17:25:50Z
    Message:               failing or missing response from https://10.233.78.240:11250/apis/webhook.cert-manager.io/v1beta1: Get https://10.233.78.240:11250/apis/webhook.cert-manager.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:
Hello.
I tried 0.13 and 0.14.
If I go back to the vanilla standalone install, we get this after installing:
Error from server (InternalError): error when creating "/tmp/test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: Service Unavailable
That was with the test example. The same happens for Issuer and ClusterIssuer.
That service is reachable if I fully qualify the domain to .svc.cluster.local.
Then I'm guessing there is something wrong with your service discovery, as app.namespace is just short for app.namespace.svc, which in turn is short for app.namespace.svc.cluster.local. All three of these should work. Just app should also work if you're in the same namespace.
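A quick way to sanity-check all three forms from a throwaway pod (pod name and image are just illustrative):
kubectl run dns-check --rm -it --image=busybox --restart=Never -- \
  sh -c 'nslookup cert-manager-webhook.cert-manager;
         nslookup cert-manager-webhook.cert-manager.svc;
         nslookup cert-manager-webhook.cert-manager.svc.cluster.local'
# All three should return the same ClusterIP if service discovery is healthy.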
I have this same behavior on all the 1.16 clusters; it worked on 1.15. I agree, but I believe something changed in 1.16. I have rechecked it: cluster.local is in the search path but seems to get ignored on 1.16.
I figured out what is going on: it looks like nodelocaldns is not resolving .svc, but the cluster DNS is. I think it may be misconfigured by Kubespray, and I am looking into that.
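To confirm, you can query nodelocaldns and the cluster DNS directly; a rough sketch, assuming Kubespray's default nodelocaldns address of 169.254.25.10 (adjust for your cluster):
# From a node, ask nodelocaldns for the fully qualified name:
nslookup cert-manager-webhook.cert-manager.svc.cluster.local 169.254.25.10
# Find the in-cluster DNS ClusterIP and compare:
kubectl -n kube-system get svc -l k8s-app=kube-dns
nslookup cert-manager-webhook.cert-manager.svc.cluster.local <cluster-dns-ip>
# Note the short .svc form only works where the search path is applied.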
Can someone kindly run ps -ef | grep dockerd and post the output? I believe mine is pointing at the wrong DNS.
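For comparison, with docker_dns I would expect flags roughly like the following (addresses illustrative, not from a real cluster):
ps -ef | grep dockerd
# e.g. /usr/bin/dockerd --dns 169.254.25.10 \
#        --dns-search default.svc.cluster.local --dns-search svc.cluster.local ...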
I have gone through a full cluster validation:
I found a few things regarding system DNS, and discovered that Kubespray does not enable admission plugins by default. They are now enabled, as shown by:
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,Priority,ResourceQuota
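A quick way to verify the flag made it onto the running apiserver (label as used by standard kubeadm-style static pods):
kubectl -n kube-system get pod -l component=kube-apiserver -o yaml \
  | grep enable-admission-plugins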
Also, the API looks to be registered:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  creationTimestamp: "2020-03-19T11:57:45Z"
  labels:
    kube-aggregator.kubernetes.io/automanaged: "true"
  name: v1alpha3.cert-manager.io
  resourceVersion: "88739394"
  selfLink: /apis/apiregistration.k8s.io/v1/apiservices/v1alpha3.cert-manager.io
  uid: 0f86b652-4b42-4ae1-b8f6-146f552b6b2c
spec:
  group: cert-manager.io
  groupPriorityMinimum: 1000
  service: null
  version: v1alpha3
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2020-03-19T11:57:45Z"
    message: Local APIServices are always available
    reason: Local
    status: "True"
    type: Available
And I made sure the API server can reach the service:
k exec -n kube-system kube-apiserver-karin-burns -- /bin/ping cert-manager-webhook.cert-manager.svc
64 bytes from cert-manager-webhook.cert-manager.svc.cluster.local (10.233.10.25): icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from cert-manager-webhook.cert-manager.svc.cluster.local (10.233.10.25): icmp_seq=2 ttl=64 time=0.126 ms
64 bytes from cert-manager-webhook.cert-manager.svc.cluster.local (10.233.10.25): icmp_seq=3 ttl=64 time=0.197 ms
64 bytes from cert-manager-webhook.cert-manager.svc.cluster.local (10.233.10.25): icmp_seq=4 ttl=64 time=0.114 ms
I also created a CentOS pod and can curl the service; it is running.
Sadly, I still get a 503 back. Any other ideas?
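In case it helps anyone else stuck at this point, the webhook's own logs can rule out the pod itself (resource names as in the standard manifest):
kubectl -n cert-manager get pods -l app=webhook
kubectl -n cert-manager logs deploy/cert-manager-webhook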
So finally I figured out what was going on.
It turns out that although normal pods do not inherit the system proxy settings, kube-apiserver does.
The final issue was that I had to add '.svc' and '.cluster.local' to the no_proxy for the Kubespray deployment.
I found it by exec'ing into kube-apiserver and checking its environment, as follows:
k exec -n kube-system kube-apiserver-nodename -- env | grep -i proxy
I cannot show the output, but no_proxy was present in the environment and was missing the two entries above.
In summary I had to do only two things:
values.yaml:
kube_apiserver_enable_admission_plugins:
  - "NamespaceLifecycle"
  - "LimitRanger"
  - "ServiceAccount"
  - "DefaultStorageClass"
  - "DefaultTolerationSeconds"
  - "MutatingAdmissionWebhook"
  - "ValidatingAdmissionWebhook"
  - "Priority"
  - "ResourceQuota"
my inventory file, on the host record:
additional_no_proxy=.domain,corp,.company.com,.svc,.svc.cluster.local
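After rebuilding with those two changes, verification is quick (node name illustrative):
# Confirm the apiserver environment now excludes the cluster suffixes:
k exec -n kube-system kube-apiserver-nodename -- env | grep -i no_proxy
# Then re-run the smoke test:
kubectl apply -f test-resources.yaml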
I had very similar issues to this @rsliotta
The no_proxy additions of .svc,.svc.cluster.local were what got me working. Fantastic writeup of your troubleshooting steps and process. You're a life saver!! 💯
@rsliotta Thank you so much for sharing that!
You are welcome. I spent about a week figuring it out. It was hair-pulling…
This sounds like it was a tough one to debug! To help others in future, would anyone be able to open a PR documenting errors like this, explaining that if you run in an environment with a proxy configured on your nodes, you should make sure the svc.cluster.local suffix skips proxying? It could save someone multiple hours/days in future! 😄
@rsliotta where are you setting the proxy settings in the Kubeflow manifests, and what command were you using to do this? I am struggling to find the spec where I need to add the settings you figured out. Should these proxy settings go in the kfdef.yaml next to the webhook APIService, or elsewhere?
Please expand.
@rsliotta Would you mind expanding a bit more on @tdigangi5's question?
Tony, did you figure it out? I would appreciate some help. I'm using kubeadm instead of Kubespray, and when I grep the apiserver's env vars for additional_no_proxy, nothing returns.
AFAIK kubeadm checks your current environment variables when initializing. So before running kubeadm init, you should
export no_proxy=$no_proxy,.svc,.svc.cluster.local, and kubeadm should then pick it up. Worked for me.
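A slightly fuller sketch of that, assuming a fresh cluster (for an existing cluster you would instead have to add the variables to the kube-apiserver manifest under /etc/kubernetes/manifests):
# Set both spellings and preserve the environment through sudo:
export no_proxy=$no_proxy,.svc,.svc.cluster.local
export NO_PROXY=$NO_PROXY,.svc,.svc.cluster.local
sudo -E kubeadm init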
Thank you, @rsliotta. A real life saver.