Cert-manager: i/o timeout from apiserver when connecting to webhook on k3s

Created on 16 Apr 2020 · 16 Comments · Source: jetstack/cert-manager


Describe the bug:
I was following the steps here: https://cert-manager.io/docs/installation/kubernetes/

Expected behaviour:
The test resources should be created successfully, but instead I got the following error when testing the installation:

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: dial tcp 10.43.18.211:443: i/o timeout

Steps to reproduce the bug:

I followed the steps in the link above and got the error when testing the installation.

Anything else we need to know?:
The installation step was successful, as I verified as follows:

kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-cainjector-79f4496665-7gptd   1/1     Running   0          11m
cert-manager-57cdd66b-7xvj2                1/1     Running   0          11m
cert-manager-webhook-6d57dbf4f-28zjc       1/1     Running   0          11m

I'm using VMs from Google Cloud.
Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.17.4
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): k3s
  • cert-manager version (e.g. v0.4.0): v0.14.0
  • Install method (e.g. helm or static manifests): Helm

/kind bug

area/deploy triage/support

Most helpful comment

Same issue here using k3s. Resolved it using the flag --flannel-backend host-gw during the k3s setup. So it looks like something is wrong in the default flannel setup, but I didn't investigate further

All 16 comments

I have the same scenario. I installed the latest version of cert-manager (0.15-alpha.1). When trying to create the issuer and certificate from test-resources.yaml I get the following errors. The status of everything is up and running, but I am still facing this:


Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded

Hey @sangnguyen7

How have you deployed k3s/what environment is it deployed into? This error indicates that your apiserver is unable to route traffic to the cert-manager webhook pod, which is a required component.

This is something that is required to be working in order for Kubernetes conformance tests to pass as far as I'm aware, so this indicates that somewhere along the line your cluster is not configured correctly.

Are you able to run Sonobuoy to check and ensure your cluster is set up properly? This will hopefully help you to pinpoint what's going on 😄
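For reference, a conformance check with Sonobuoy looks roughly like this (a sketch; the sonobuoy CLI is installed separately and these are its standard subcommands):

sonobuoy run --wait            # run the conformance plugins and block until they finish
results=$(sonobuoy retrieve)   # download the results tarball, capture its filename
sonobuoy results $results      # print a pass/fail summary
sonobuoy delete --wait         # clean up the sonobuoy namespace when done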

/triage support
/area deploy
/remove-kind bug

Thanks @munnerz. No, I have not tried the tool yet. I will double-check with it.

I had the same issue in k8s, not k3s. I resolved it by switching from flannel to calico.

Same issue here using k3s. Resolved it using the flag --flannel-backend host-gw during the k3s setup. So it looks like something is wrong in the default flannel setup, but I didn't investigate further
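For anyone trying this, the flag is passed when installing k3s; a minimal sketch using the official install script (single-server setup assumed):

# install k3s with the flannel backend switched from vxlan to host-gw
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=host-gw" sh -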

Are there any related issues in the k3s repository that this could be linked to?

@munnerz, it seems the issue is related to the network setup and not related to cert-manager, if there is nobody else having other issues, you can close this issue. Thanks!

@alecunsolo, same issue in k3s. I changed the flannel backend from vxlan to host-gw, but it doesn't work.

  • Method to change the flannel backend:
vim /etc/systemd/system/k3s.service
  • What I modified:
ExecStart=/usr/local/bin/k3s server --flannel-backend host-gw
  • Check my k3s net-conf:
[root@bowser1704 ~]# cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
{
    "Network": "10.42.0.0/16",
    "Backend": {
        "Type": "host-gw"
    }
}
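(For the unit-file edit to take effect, the service also has to be reloaded and restarted; a minimal sketch in case anyone is following along:)

systemctl daemon-reload   # pick up the edited k3s.service unit
systemctl restart k3s     # restart k3s with the new flannel backend flag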

But it still doesn't work.

[root@bowser1704 ~]# kubectl apply -f cert-manager/cluster-issuer.yaml
Error from server (InternalError): error when creating "cert-manager/cluster-issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded

Do you have any suggestions?
thanks.


Maybe related to rancher/k3s#1266 and rancher/k3s#1958.

It took a few days to fix this problem.
It may be related to coreos/flannel#1268, or to vxlan bugs.

  1. Super slow access to the service IP from the host, maybe a 60s delay
  2. Can't reach pods via the service cluster IP unless you access them from the node the pod is running on
  • So, one method to resolve this issue is to schedule cert-manager-webhook onto the node you are on by using spec.nodeSelector (see the sketch after this list).
  • Another method is changing the flannel backend from vxlan to host-gw, but that doesn't seem to work for me.
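A minimal sketch of the nodeSelector workaround, patching the webhook deployment directly (the node name k3s-node-1 is a placeholder; a manual patch like this will also be reverted on the next Helm upgrade unless the same value is set in the chart's values):

# find the node you want to pin the webhook to, then patch its nodeSelector
kubectl get nodes -o wide
kubectl -n cert-manager patch deployment cert-manager-webhook --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"k3s-node-1"}}}}}'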

Maybe related to rancher/k3s#1638.
I fixed it with this command:

ethtool -K flannel.1 tx-checksum-ip-generic off
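To confirm the offload flag actually changed (a quick check, not specific to cert-manager):

ethtool -k flannel.1 | grep tx-checksum
# expect: tx-checksum-ip-generic: off

Note that ethtool -K settings do not survive a reboot, so the workaround has to be reapplied (for example from a systemd unit or udev rule) to be permanent.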

Partly comment, partly question:

We're running k3s (v1.17.7+k3s1, flannel, AWS AMI2), but the master is separated from the cluster by a firewall that allows port 6443 from the workers towards the master (EDIT: the master runs no pods and is not part of the cluster workload). Until now we had been using the "no-webhook" version of cert-manager (up to 0.13) without problems, but after upgrading yesterday we get the same error when we try to list certs using kubectl on the master:

# /usr/local/bin/kubectl get cert --all-namespaces
Error from server: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: context deadline exceeded 

From a pod running on the cluster I can retrieve the certs (I tested the pod on different nodes, it always works):

curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  https://kubernetes.default.svc/apis/cert-manager.io/v1alpha2/certificates

Any idea why this would happen?
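One way to narrow this down is to confirm the webhook Service has endpoints and is reachable from inside the cluster (a sketch using the chart's default names; the HTTP status code doesn't matter here, only that the TLS connection succeeds):

kubectl -n cert-manager get svc,endpoints cert-manager-webhook
# throwaway pod; -k skips TLS verification since we only test connectivity
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://cert-manager-webhook.cert-manager.svc:443/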

UPDATE: I moved my master server into the same security group and started the k3s agent on the master node, and it started working. So for anyone Googling this: if the master node is separated from the cluster and not running the k3s agent, it doesn't appear to be able to contact the webhook server.

Having the same issue with our EKS cluster. Wondering if there is a fix to stop the pod from restarting on the timeout and instead just log an error gracefully?

Error from server (InternalError): error when creating "clusterIssuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded
I'm getting the above error.

This problem may be caused by the CNI. After I modified the MTU of calico, the problem was solved.

"mtu": 1440-> "mtu": 1420,

{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "log_file_path": "/var/log/calico/cni/cni.log",
      "datastore_type": "kubernetes",
      "nodename": "k3s-operator-1",
      "mtu": 1420,
      "ipam": {
          "type": "calico-ipam"
      },
      "policy": {
          "type": "k8s"
      },
      "kubernetes": {
          "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    },
    {
      "type": "bandwidth",
      "capabilities": {"bandwidth": true}
    }
  ]
}
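One note on the MTU change: the CNI configuration is read when a pod's network is set up, so pods that already exist keep the old MTU until they are recreated. Recreating the webhook pod after the edit picks up the new value (a sketch, assuming the default deployment name):

kubectl -n cert-manager rollout restart deployment cert-manager-webhook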
