Cert-manager: ErrVerifyACMEAccount: dial tcp: i/o timeout in v0.3.0

Created on 6 Jun 2018 · 16 Comments · Source: jetstack/cert-manager

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
Got a ErrVerifyACMEAccount when creating an issuer.

$ kubectl describe issuer
Name:       v02-letsencrypt-staging
Namespace:  default
Labels:     <none>
Annotations:    <none>
API Version:    certmanager.k8s.io/v1alpha1
Kind:       Issuer
Metadata:
  Cluster Name:     
  Creation Timestamp:   2018-06-06T05:15:22Z
  Generation:       0
  Resource Version: 74067579
  Self Link:        /apis/certmanager.k8s.io/v1alpha1/namespaces/default/issuers/v02-letsencrypt-staging
  UID:          9ccfbd86-6948-11e8-93be-0a49fe81f092
Spec:
  Acme:
    Email:  email@somewhere
    Http 01:
    Private Key Secret Ref:
      Key:  
      Name: v02-letsencrypt-staging
    Server: https://acme-staging-v02.api.letsencrypt.org/directory
Status:
  Conditions:
    Last Transition Time:   2018-06-06T05:15:50Z
    Message:            Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
    Reason:         ErrRegisterACMEAccount
    Status:         False
    Type:           Ready
Events:
  FirstSeen LastSeen    Count   From        SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----        -------------   --------    ------          -------
  34m       34m     1   cert-manager            Warning     ErrVerifyACMEAccount    Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  34m       34m     1   cert-manager            Warning     ErrInitIssuer       Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  34m       11m     15  cert-manager            Warning     ErrVerifyACMEAccount    Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  34m       11m     15  cert-manager            Warning     ErrInitIssuer       Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout

What you expected to happen:
The issuer should be initialised.

How to reproduce it (as minimally and precisely as possible):

  1. Install the latest stable/cert-manager (v0.3.1):
helm install --name cert-manager stable/cert-manager --namespace kube-system
  2. Create an issuer. issuer.yaml file content:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: v02-letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory

    # Email address used for ACME registration
    email: "email@somewhere"

    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: v02-letsencrypt-staging

    # Enable the HTTP-01 challenge provider
    http01: {}
$ kubectl create -f /tmp/issuer.yaml

Anything else we need to know?:
Logs from 'cert-manager' pod:

$ kubectl logs cert-manager-XYZ --namespace kube-system
...
I0606 06:09:33.874679       1 acme.go:159] getting private key (v02-letsencrypt-staging->tls.key) for acme issuer default/v02-letsencrypt-staging
I0606 06:09:33.874744       1 setup.go:46] v02-letsencrypt-staging: generating acme account private key "v02-letsencrypt-staging"
I0606 06:09:34.092759       1 logger.go:67] Calling GetAccount
I0606 06:09:39.093191       1 helpers.go:69] Setting lastTransitionTime for Issuer "v02-letsencrypt-staging" condition "Ready" to 2018-06-06 06:09:39.093172006 +0000 UTC m=+462.291870771
I0606 06:09:39.093261       1 sync.go:40] Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
E0606 06:09:39.100548       1 controller.go:145] issuers controller: Re-queuing item "default/v02-letsencrypt-staging" due to error processing: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
I0606 06:09:39.100588       1 controller.go:136] issuers controller: syncing item 'default/v02-letsencrypt-staging'
I0606 06:09:39.100626       1 acme.go:159] getting private key (v02-letsencrypt-staging->tls.key) for acme issuer default/v02-letsencrypt-staging
...

Environment:

  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T19:01:12Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.8", GitCommit:"c138b85178156011dc934c2c9f4837476876fb07", GitTreeState:"clean", BuildDate:"2018-05-21T18:53:18Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS

Most helpful comment

I found that the kube-dns deployment was missing! Indeed not an issue with cert-manager 👏

All 16 comments

It sounds like you've got some firewall configuration/setup incorrect, which is causing requests to the LE API to fail. Are you able to curl https://acme-staging-v02.api.letsencrypt.org/directory from within the cert-manager pod?

I cannot see any reported recent outages on the LE status page (https://letsencrypt.status.io/), so my guess is there is some networking issue in your cluster 😄
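For anyone following along, a minimal sketch of that check (it assumes the default Helm install in kube-system with the app=cert-manager label; the cert-manager image is minimal and may not ship curl, so a throwaway busybox pod is shown as a fallback):

# try the ACME directory endpoint from inside the cert-manager pod
$ POD=$(kubectl --namespace kube-system get pod -l app=cert-manager -o jsonpath='{.items[0].metadata.name}')
$ kubectl --namespace kube-system exec -it $POD -- wget -qO- https://acme-staging-v02.api.letsencrypt.org/directory

# or test DNS and connectivity from a disposable pod on the cluster network
$ kubectl run -it --rm dns-debug --image=busybox --restart=Never -- nslookup acme-staging-v02.api.letsencrypt.org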

Thanks for checking that.

Yeah. I am closing this for now. I will reopen it if it turns out to be something else.

I found that the kube-dns deployment was missing! Indeed not an issue with cert-manager 👏

Hi, @huyph
I'm facing exactly the same issue, but I have no idea how to fix kube-dns to get it working.
Could you give me some advice?

Hi @Glorf, the issue I ran into was related to my kube-dns deployment, which was mysteriously down for some reason. (This deployment handles DNS lookups for all pods in the cluster; while it was down, pods, including the cert-manager pod, could not reach the outside world.)

I had to re-deploy kube-dns using the yaml file for kube-dns from the k8s state store (in S3 in my set-up).

I hope that you will be able to fix your problem soon ...
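A quick way to check whether the kube-dns deployment is present and its pods are healthy (a minimal sketch; kube-dns in kube-system with the k8s-app=kube-dns label is the default layout, adjust if your cluster differs):

$ kubectl --namespace kube-system get deployment kube-dns
$ kubectl --namespace kube-system get pods -l k8s-app=kube-dns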

Turned out to be a problem with the kube-dns upstreamServers setting. Fixed it, thanks!
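For anyone hitting the same setting, the upstream resolvers for kube-dns are normally configured through the kube-dns ConfigMap in kube-system; a minimal sketch, with 8.8.8.8 and 8.8.4.4 as placeholder resolvers (replace with your own):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # JSON array of resolvers used for names outside the cluster domain
  upstreamNameservers: |
    ["8.8.8.8", "8.8.4.4"]

kube-dns watches this ConfigMap and normally picks up the change after a short delay; deleting the kube-dns pods forces a reload.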

Name:         letsencrypt-production
Namespace:    default
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Issuer
Spec:
  Acme:
    Email:  [email protected]
    Private Key Secret Ref:
      Name:  letsencrypt-production
    Server:  https://acme-v02.api.letsencrypt.org/directory
Status:
  Acme:
    Uri:  
  Conditions:
    Last Transition Time:  2018-08-14T14:27:34Z
    Message:               Failed to verify ACME account: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
    Reason:                ErrRegisterACMEAccount
    Status:                False
    Type:                  Ready
Events:
  Type     Reason                Age                From          Message
  ----     ------                ----               ----          -------
  Warning  ErrVerifyACMEAccount  19m (x19 over 1h)  cert-manager  Failed to verify ACME account: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrInitIssuer         19m (x19 over 1h)  cert-manager  Error initializing issuer: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrInitIssuer         6m (x12 over 8m)   cert-manager  Error initializing issuer: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrVerifyACMEAccount  1m (x16 over 8m)   cert-manager  Failed to verify ACME account: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout

I redeployed cert-manager and coredns; it didn't help :(

Feel free to take a look at how I fixed it; I wrote it down in a Stack Overflow question here.

I have coredns, which has a slightly different syntax. I had:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system

I added nameservers from /run/systemd/resolve/resolv.conf:

upstream 195.182.118.53 195.182.169.53

Then I redeployed cert-manager and coredns, but it didn't make a difference.
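For what it's worth, in a Corefile like the one above the resolvers used for names outside the cluster are the ones on the proxy line (proxy . /etc/resolv.conf); the upstream option inside the kubernetes block only affects how external CNAME targets of Services are resolved. A minimal sketch of pointing external lookups directly at those resolvers instead (same example IPs, purely illustrative):

    # replace the existing proxy line in the Corefile above
    proxy . 195.182.118.53 195.182.169.53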

I had the same problem, and restarting kube-dns solved it. I deleted one of the two pods, and Kubernetes auto-restarted it.
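For anyone who wants to do the same, a minimal sketch (it assumes the default k8s-app=kube-dns label, which both kube-dns and coredns deployments usually carry; the deleted pods are recreated by their Deployment):

$ kubectl --namespace kube-system delete pod -l k8s-app=kube-dns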

@stieler-it I tried, didn't work :(

@kojiDev Can you open a shell in a container (cert-manager) and try to

curl https://acme-v02.api.letsencrypt.org/directory
nslookup acme-v02.api.letsencrypt.org
nslookup acme-v02.api.letsencrypt.org (your kube-dns virtual IP)

to see if it's actually a DNS problem you are facing?

@stieler-it
well, it doesn't have curl and I don't have permission to install it :(

/ $ curl https://acme-v02.api.letsencrypt.org/directory
/bin/sh: curl: not found
/ $ nslookup acme-v02.api.letsencrypt.org
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'acme-v02.api.letsencrypt.org': Try again
/ $ nslookup acme-v02.api.letsencrypt.org 10.96.0.10
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'acme-v02.api.letsencrypt.org': Try again

This might be useful:

/ $ printenv 
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_DASHBOARD_PORT=tcp://10.107.232.27:443
KUBERNETES_DASHBOARD_SERVICE_PORT=443
KUBE_DNS_SERVICE_PORT_DNS_TCP=53
TILLER_DEPLOY_SERVICE_HOST=10.96.100.78
HOSTNAME=cert-manager-6b47fc5fc-8nlm9
SHLVL=1
HOME=/home/certmanager
KUBE_DNS_SERVICE_HOST=10.96.0.10
TILLER_DEPLOY_SERVICE_PORT=44134
TILLER_DEPLOY_PORT=tcp://10.96.100.78:44134
TILLER_DEPLOY_PORT_44134_TCP_ADDR=10.96.100.78
KUBE_DNS_PORT=udp://10.96.0.10:53
TILLER_DEPLOY_PORT_44134_TCP_PORT=44134
KUBE_DNS_SERVICE_PORT=53
TILLER_DEPLOY_PORT_44134_TCP_PROTO=tcp
CALICO_TYPHA_SERVICE_PORT_CALICO_TYPHA=5473
CALICO_TYPHA_SERVICE_HOST=10.99.48.101
TERM=xterm
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
TILLER_DEPLOY_SERVICE_PORT_TILLER=44134
KUBERNETES_DASHBOARD_PORT_443_TCP_ADDR=10.107.232.27
CALICO_TYPHA_PORT_5473_TCP_ADDR=10.99.48.101
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBE_DNS_PORT_53_TCP_ADDR=10.96.0.10
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_DASHBOARD_PORT_443_TCP_PORT=443
KUBE_DNS_PORT_53_UDP_ADDR=10.96.0.10
CALICO_TYPHA_PORT_5473_TCP_PORT=5473
KUBERNETES_DASHBOARD_PORT_443_TCP_PROTO=tcp
KUBE_DNS_PORT_53_TCP_PORT=53
CALICO_TYPHA_SERVICE_PORT=5473
TILLER_DEPLOY_PORT_44134_TCP=tcp://10.96.100.78:44134
CALICO_TYPHA_PORT_5473_TCP_PROTO=tcp
CALICO_TYPHA_PORT=tcp://10.99.48.101:5473
KUBE_DNS_PORT_53_TCP_PROTO=tcp
KUBE_DNS_PORT_53_UDP_PORT=53
KUBE_DNS_SERVICE_PORT_DNS=53
KUBE_DNS_PORT_53_UDP_PROTO=udp
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_SERVICE_PORT_HTTPS=443
POD_NAMESPACE=kube-system
PWD=/
KUBERNETES_DASHBOARD_PORT_443_TCP=tcp://10.107.232.27:443
CALICO_TYPHA_PORT_5473_TCP=tcp://10.99.48.101:5473
KUBERNETES_SERVICE_HOST=10.96.0.1
KUBE_DNS_PORT_53_TCP=tcp://10.96.0.10:53
KUBERNETES_DASHBOARD_SERVICE_HOST=10.107.232.27
KUBE_DNS_PORT_53_UDP=udp://10.96.0.10:53

Ok, so it's actually a DNS problem. I had two of them today, but could solve them with either a) restarting kube-dns pods or b) restarting the networking pod (in my case canal). Maybe you can try a kubectl describe of the kube-dns pods and see if they have problems?

I can't help you beyond that, as I'm still in my first K8s days, and I used Rancher 2 to set up most of the cluster.
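If it helps, a minimal sketch of option b) for a canal-based cluster (the k8s-app=canal label is an assumption from the upstream canal manifests; check with kubectl get pods -n kube-system --show-labels first):

$ kubectl --namespace kube-system delete pod -l k8s-app=canal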

@stieler-it thanks for your time. I basically reinstalled the whole cluster, but as a single node this time (before I had a master and a worker node), and now it is working.

Restarting one of the coredns pods and recreating the letsencrypt-prod ClusterIssuer solved the issue for me.
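A minimal sketch of those two steps (it assumes the coredns pods carry the default k8s-app=kube-dns label and that the ClusterIssuer manifest is saved as clusterissuer.yaml; both are assumptions to adjust for your setup):

$ kubectl --namespace kube-system delete pod -l k8s-app=kube-dns
$ kubectl delete clusterissuer letsencrypt-prod
$ kubectl apply -f clusterissuer.yaml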
