Cert-manager: Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout

Created on 15 Oct 2018 · 16 Comments · Source: jetstack/cert-manager

Describe the bug:
I am trying to deploy Hyperledger Fabric on Kubernetes on my local machine. While doing this, my cert-manager pod logs the error in the title. As far as I can tell, the pod cannot reach https://acme-staging-v02.api.letsencrypt.org/directory. My Kubernetes cluster is on the network.
Expected behaviour:
No error log in cert-manager pod.

Can you please look into it?
Environment details:

Kubernetes version: v1.11.2
Local system: Ubuntu 16.04
cert-manager version (e.g. v0.4.0):
Install method: helm

akshay27395

All 16 comments

I have the same issue with k8s v1.12.1.
I got a shell in the cert-manager pod and ran this:

/ # ping acme-staging-v02.api.letsencrypt.org
PING acme-staging-v02.api.letsencrypt.org (184.85.118.160): 56 data bytes
64 bytes from 184.85.118.160: seq=0 ttl=52 time=22.715 ms
64 bytes from 184.85.118.160: seq=1 ttl=52 time=23.009 ms
64 bytes from 184.85.118.160: seq=2 ttl=52 time=22.611 ms
64 bytes from 184.85.118.160: seq=3 ttl=52 time=22.844 ms
/ # wget https://acme-staging-v02.api.letsencrypt.org/directory
Connecting to acme-staging-v02.api.letsencrypt.org (184.85.118.160:443)
ssl_client: acme-staging-v02.api.letsencrypt.org: TLS connect failed
wget: error getting response: Connection reset by peer
/ # apk add curl
(1/3) Installing libssh2 (1.8.0-r1)
(2/3) Installing libcurl (7.61.1-r0)
(3/3) Installing curl (7.61.1-r0)
Executing busybox-1.26.2-r11.trigger
OK: 6 MiB in 17 packages
/ # curl https://acme-staging-v02.api.letsencrypt.org/directory
{
  "keyChange": "https://acme-staging-v02.api.letsencrypt.org/acme/key-change",
  "meta": {
    "caaIdentities": [
      "letsencrypt.org"
    ],
    "termsOfService": "https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf",
    "website": "https://letsencrypt.org/docs/staging-environment/"
  },
  "newAccount": "https://acme-staging-v02.api.letsencrypt.org/acme/new-acct",
  "newNonce": "https://acme-staging-v02.api.letsencrypt.org/acme/new-nonce",
  "newOrder": "https://acme-staging-v02.api.letsencrypt.org/acme/new-order",
  "revokeCert": "https://acme-staging-v02.api.letsencrypt.org/acme/revoke-cert",
  "wB_oZ6nkiPA": "https://community.letsencrypt.org/t/adding-random-entries-to-the-directory/33417"
}
/ # apk add wget
(1/1) Installing wget (1.19.5-r0)
Executing busybox-1.26.2-r11.trigger
OK: 6 MiB in 18 packages
/ # wget https://acme-staging-v02.api.letsencrypt.org/directory
--2018-10-17 10:09:35--  https://acme-staging-v02.api.letsencrypt.org/directory
Resolving acme-staging-v02.api.letsencrypt.org... 184.85.118.160, 2405:4800:10b:191::3a8e, 2405:4800:10b:190::3a8e
Connecting to acme-staging-v02.api.letsencrypt.org|184.85.118.160|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 724 [application/json]

BusyBox's built-in wget fails during the TLS connection, while curl and GNU wget (both installed via apk) work.
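The transcript above separates three steps that can each fail on their own: name resolution, the TCP connection, and the TLS handshake. A minimal Python sketch of the same staged check follows, assuming a Python interpreter is available in a debug container on the same network; `check_endpoint` is a hypothetical helper name, and it only tries the first address the resolver returns.

```python
import socket
import ssl


def check_endpoint(host, port=443, timeout=5.0):
    """Return (stage, error); stage is "ok" or the first step that failed."""
    stage = "dns"
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        family, socktype, proto, _canonname, sockaddr = infos[0]
        stage = "tcp"
        with socket.socket(family, socktype, proto) as raw:
            raw.settimeout(timeout)
            raw.connect(sockaddr)
            stage = "tls"
            context = ssl.create_default_context()
            # wrap_socket runs the handshake on an already-connected socket.
            with context.wrap_socket(raw, server_hostname=host):
                return "ok", None
    except OSError as exc:
        # socket.gaierror, timeouts, resets and ssl.SSLError are OSError subclasses.
        return stage, exc


if __name__ == "__main__":
    print(check_endpoint("acme-staging-v02.api.letsencrypt.org"))
```

A result of `("dns", ...)` points at the resolver, `("tcp", ...)` at routing or a firewall, and `("tls", ...)` at something interfering after the connection is established.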

ntcong on 17 Oct 2018

Here are the ClusterIssuer's events:

  Warning  ErrInitIssuer         21m (x12 over 22m)  cert-manager-controller  Error initializing issuer: Head : unsupported protocol scheme ""
  Warning  ErrInitIssuer         19m (x7 over 21m)   cert-manager             Error initializing issuer: Post https://acme-staging-v02.api.letsencrypt.org/acme/acct/7150558: dial tcp: i/o timeout
  Warning  ErrVerifyACMEAccount  11m (x12 over 21m)  cert-manager             Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrInitIssuer         6m (x20 over 21m)   cert-manager             Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrVerifyACMEAccount  1m (x573 over 22m)  cert-manager-controller  Failed to verify ACME account: Head : unsupported protocol scheme ""
  Warning  ErrVerifyACMEAccount  56s (x29 over 21m)  cert-manager             Failed to verify ACME account: Post https://acme-staging-v02.api.letsencrypt.org/acme/acct/7150558: dial tcp: i/o timeout

ntcong on 17 Oct 2018

I'm seeing similar issues with cert-manager 0.5.0 running in a GKE 1.10.x cluster:

Events:
  Type     Reason                Age   From          Message
  ----     ------                ----  ----          -------
  Warning  ErrVerifyACMEAccount  20m   cert-manager  Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: net/http: TLS handshake timeout
  Warning  ErrInitIssuer         20m   cert-manager  Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: net/http: TLS handshake timeout
  Warning  ErrVerifyACMEAccount  20m   cert-manager  Failed to verify ACME account: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
  Warning  ErrInitIssuer         20m   cert-manager  Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout

... but in this case it appears to be non-fatal, as the ACME account registration was eventually successful. Perhaps the acme-staging-v02 endpoint is just slow sometimes?

benley on 17 Oct 2018

Looks the same to me. After a few hours my ingress got the cert, but the ClusterIssuer still isn't ready.

I see the same problem with the Let's Encrypt production endpoint, so I don't think it's the staging API being slow.

ntcong on 18 Oct 2018

One problem I noticed is cert-manager dialing an IPv6 address in an IPv4-only cluster:

I1018 07:13:16.938931 1 sync.go:72] Error initializing issuer: Post https://acme-v02.api.letsencrypt.org/acme/acct/44059068: dial tcp [2405:4800:10b:191::3a8e]:443: i/o timeout
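To see which address families the resolver hands back for the ACME host, the results can be grouped by IP version. This is a rough Python sketch, not anything cert-manager itself runs; `resolve` and `split_by_family` are hypothetical helper names, and it assumes Python is available in a pod on the same cluster network.

```python
import ipaddress
import socket


def resolve(host, port=443):
    """Return the unique IP strings the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen = []
    for info in infos:
        address = info[4][0]
        if address not in seen:
            seen.append(address)
    return seen


def split_by_family(addresses):
    """Split IP strings into (ipv4, ipv6) lists, keeping their order."""
    v4, v6 = [], []
    for text in addresses:
        ip = ipaddress.ip_address(text)
        (v4 if ip.version == 4 else v6).append(text)
    return v4, v6


if __name__ == "__main__":
    v4, v6 = split_by_family(resolve("acme-v02.api.letsencrypt.org"))
    print("IPv4:", v4)
    print("IPv6:", v6)
```

If the IPv6 list is non-empty but the cluster has no IPv6 route, a client that picks one of those addresses would be expected to time out as in the log line above.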

ntcong on 18 Oct 2018

I have exactly the same issue. I'm using Kubernetes 1.11.3 on Azure. It has been running for almost 12 hours straight and the error still occurs...

pedrorochaorg on 31 Oct 2018

Hey guys, I have the same problem 😞 Did anyone solve this?
I checked the IPv6-related network settings and IPv6 seems to be properly disabled in my cluster.
But cert-manager still prefers IPv6 over IPv4 when connecting to the Let's Encrypt server, and fails to connect.

ugurerkan on 2 Nov 2018

I'm seeing similar issues with cert-manager-v0.5.0

jwthanh on 9 Nov 2018

Seeing the same on cert-manager-v0.5.2

isamuelson on 25 Nov 2018

@isamuelson @jwthanh were you two able to solve this?

cdrage on 18 Dec 2018

@cdrage nope, I'm using kube-lego now

jwthanh on 19 Dec 2018

I also ran into this same issue.

I use AWS EKS, which comes with CoreDNS.

My output is:

I0104 15:50:56.417580       1 setup.go:159] letsencrypt-prod: Failed to verify ACME account: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
I0104 15:50:56.417623       1 sync.go:71] Error initializing issuer: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout
E0104 15:50:56.417654       1 controller.go:149] clusterissuers controller: Re-queuing item "letsencrypt-prod" due to error processing: Get https://acme-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout

and when I try to resolve or ping the host:

$ cat /etc/resolv.conf 
  nameserver 10.100.0.10
  search kube-system.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal us-west-2.compute.internal
  options ndots:5

$ ping acme-v02.api.letsencrypt.org
  ping: bad address 'acme-v02.api.letsencrypt.org'

$ wget https://acme-v02.api.letsencrypt.org/directory
  wget: bad address 'acme-v02.api.letsencrypt.org'

When I enter another container I can access the URL and ping normally. Any suggestions on how to debug/solve?

It is the same for versions 0.5.0 and 0.5.2.
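One thing the resolv.conf above implies: with `options ndots:5`, a name with fewer than five dots is first tried against every search domain before it is tried as-is. The Python sketch below lists the lookups in the conventional glibc-style order; this ordering is an assumption, and other resolvers (for example musl on Alpine) differ in detail. `candidate_queries` is a hypothetical helper name.

```python
def candidate_queries(name, search, ndots):
    """List the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name]
    absolute = name + "."
    with_search = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + with_search
    return with_search + [absolute]


if __name__ == "__main__":
    search = [
        "kube-system.svc.cluster.local",
        "svc.cluster.local",
        "cluster.local",
        "eu-west-1.compute.internal",
        "us-west-2.compute.internal",
    ]
    for query in candidate_queries("acme-v02.api.letsencrypt.org", search, 5):
        print(query)
```

For `acme-v02.api.letsencrypt.org` (three dots) this gives five search-domain lookups before the real name, all of which go to the cluster DNS server at 10.100.0.10, so an unreachable or slow cluster DNS can consume the whole timeout.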

johan-smits on 4 Jan 2019

My issue got solved. In EKS the pod got scheduled on the EKS cluster. This cluster can't access the internet. Scheduling the pod on the real worker nodes got it working.

johan-smits on 5 Jan 2019

👍1

We have a little workaround using dnsPolicy and dnsConfig to force name resolution through other nameservers on the cert-manager Pod (deployed with helm).

Helm values example (these set the Pod's dnsPolicy and dnsConfig):

podDnsPolicy: "None"
podDnsConfig:
  nameservers:
    - "1.1.1.1"
    - "8.8.8.8"
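With dnsPolicy "None", the Pod ignores the cluster's DNS settings and uses only what dnsConfig provides. The Python sketch below approximates the resolv.conf that results; the exact file is written by the container runtime, so line order may differ, and `render_resolv_conf` is a hypothetical helper, not a Kubernetes API.

```python
def render_resolv_conf(dns_config):
    """Approximate the resolv.conf produced from a Pod dnsConfig mapping."""
    lines = [f"nameserver {ns}" for ns in dns_config.get("nameservers", [])]
    searches = dns_config.get("searches", [])
    if searches:
        lines.append("search " + " ".join(searches))
    options = []
    for option in dns_config.get("options", []):
        name, value = option["name"], option.get("value")
        options.append(name if value is None else f"{name}:{value}")
    if options:
        lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_resolv_conf({"nameservers": ["1.1.1.1", "8.8.8.8"]}), end="")
```

Note that the result has no search domains and no cluster DNS server, so names that only the cluster DNS knows would no longer resolve from that Pod.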

gemoya on 10 Jan 2019

👍2 🎉1

This issue seems to be related to networking configuration and not specifically cert-manager.

If you are able to produce some kind of minimal reproduction so that we can attempt to recreate the issue and work on a fix, please open a new issue describing your problem 😄 I'm going to close this for now as it has become an umbrella for user support questions, which should be created separately.

munnerz on 7 Feb 2019

I'm running into this exact same issue with 0.8.0.
The strange thing is that it sometimes works as expected and sometimes I get the i/o timeout.
I didn't observe any other networking issues in my cluster besides the cert-manager one.
I suspected an IPv6 problem, and if I exec into the cert-manager pod and run

getent hosts acme-staging-v02.api.letsencrypt.org
2a02:26f0:cf:290::3a8e  e14990.dscx.akamaiedge.net  e14990.dscx.akamaiedge.net acme-staging-v02.api.letsencrypt.org

I do in fact get an IPv6 address back. But the error message on the ClusterIssuer shows an IPv4 address:

  Warning  ErrInitIssuer         16m (x8 over 19m)  cert-manager  Error initializing issuer: Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp 104.66.96.217:443: i/o timeout

I'm sorry this is not really the "minimal reproduction" case you asked for, but still this issue exists and as of now I have no idea on how to debug this further or break it down to a minimal example.
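The errors quoted in this thread come in three shapes: `dial tcp: i/o timeout` with no address, `dial tcp 104.66.96.217:443` with an IPv4 address, and `dial tcp [2405:4800:10b:191::3a8e]:443` with an IPv6 address. A small Python sketch for pulling the dialed address out of such messages, so events can be grouped by address family, might look like this; `dialed_address` is a hypothetical helper and the pattern is based only on the messages shown here.

```python
import re

# Matches "dial tcp 1.2.3.4:443" and "dial tcp [2001:db8::1]:443".
_DIAL = re.compile(r"dial tcp (?:\[(?P<v6>[^\]]+)\]|(?P<v4>[0-9.]+)):(?P<port>\d+)")


def dialed_address(message):
    """Return the IP address from a 'dial tcp' error, or None if absent."""
    match = _DIAL.search(message)
    if match is None:
        return None
    return match.group("v6") or match.group("v4")


if __name__ == "__main__":
    samples = [
        "Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp: i/o timeout",
        "Get https://acme-staging-v02.api.letsencrypt.org/directory: dial tcp 104.66.96.217:443: i/o timeout",
        "Post https://acme-v02.api.letsencrypt.org/acme/acct/44059068: dial tcp [2405:4800:10b:191::3a8e]:443: i/o timeout",
    ]
    for sample in samples:
        print(dialed_address(sample))
```

Comparing these addresses with what `getent hosts` returns at the same moment would show whether the failing connections use the address family the resolver reports.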