Cert-manager: http01 challenge succeeds, but authorization fails

Created on 27 Feb 2018 · 4 Comments · Source: jetstack/cert-manager

/kind bug

What happened:

Disclaimer: I'm a relative newbie at k8s, so I'm very likely doing something wrong and just didn't RTFM correctly.

I'm super close to getting http01 ACME challenges working with letsencrypt-staging, but I'm running into a strange authorization failure I can't seem to pin down. First, I installed cert-manager v0.2.3 via `helm install --name cert-manager --namespace kube-system contrib/charts/cert-manager`. My ClusterIssuer and Certificate config looks like this:

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [email protected]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    http01: {}
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: domain-com-staging
  namespace: default
spec:
  secretName: domain-com-tls-staging
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: domain.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - domain.com

After applying it, according to the logs cert-manager picks it up and begins the process:

I0227 06:23:20.413039       1 sync.go:107] Error checking existing TLS certificate: secret "domain-com-tls-staging" not found
I0227 06:23:20.413065       1 sync.go:238] Preparing certificate with issuer
I0227 06:23:20.413506       1 prepare.go:239] Compare "" with "https://acme-staging.api.letsencrypt.org/acme/reg/<redacted>"

I see a mini challenge-response pod pop up called cm-domain-com-staging-*. Looks like the logs show the ACME server actually does perform the challenge correctly:

2018/02/27 06:22:15 [domain.com] Validating request. basePath=/.well-known/acme-challenge, token=HOlOtteBrz8A_<redacted>
2018/02/27 06:22:15 [domain.com] Comparing actual host 'domain.com' against expected 'domain.com'
2018/02/27 06:22:15 [domain.com] Got successful challenge request, writing key...

Unfortunately something between then and cert-manager retrieving the certificate fails:

I0227 06:23:35.254098       1 sync.go:242] Error preparing issuer for certificate: error waiting for authorization for domain "domain.com": acme: authorization error for :

Anything else we need to know?:

The first thing that jumps out at me is the blank entry in prepare.go which appears to come from auth.Account. I'm assuming that comes from the letsencrypt-staging secret (???), but all I see there is tls.key:
```
$ k -n kube-system describe secret letsencrypt-staging
Name: letsencrypt-staging
Namespace: kube-system
Labels:
Annotations:

Type: Opaque

Data

tls.key: 1675 bytes
```

The resulting blank error is probably related to #12, but unfortunately I can't figure out how to dump the `wireAuthz` structure (my go-fu is ... bad).

One last thing that might be related is that I've used LetsEncrypt to generate certificates for this domain before. Sadly I don't have the private key for the staging environment, but I _do_ have the private_key.json that certbot spits out. I've looked high and low, but I can't seem to find out exactly what the structure of privateKeySecretRef is, so I don't know how to stuff the private_key.json from the prod instance into a secret. I'd try to generate a new cert de-novo on the prod instance to see if it's something related to staging, but I've been burned by the rate limit before and would like to avoid that week-long penalty box if I can.
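
Going by the `describe` output above, the account key appears to sit under a single `tls.key` entry in an Opaque secret. The Python sketch below builds such a secret by hand. It assumes the key is already PEM-encoded (certbot's `private_key.json` is a JWK and would need converting first, which is not shown here). The helper name `account_key_secret` is made up for illustration, and whether cert-manager v0.2.3 will pick up a pre-populated secret this way is an assumption, not something confirmed in this thread.

```python
import base64
import json


def account_key_secret(name, namespace, pem_bytes):
    """Build a Secret manifest that stores an account key under `tls.key`.

    Hypothetical helper: the layout mirrors the `describe` output in this
    issue (type Opaque, one `tls.key` entry); it is not taken from the
    cert-manager documentation.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under `data` in a Secret must be base64-encoded.
        "data": {"tls.key": base64.b64encode(pem_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder bytes only; substitute the real PEM-encoded key.
    placeholder = b"-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----\n"
    manifest = account_key_secret("letsencrypt-staging", "kube-system", placeholder)
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML. Let's Encrypt staging and production keep separate account registrations, so a production key would most likely only be of use with the production server URL.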

Let me know if there's anything else I can provide.

Environment:

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: bare-metal
Install tools: kubeadm

kind/bug

Source

timblakely

Most helpful comment

OH! You know, I think I was messing with my iptables forwarding and dnsmasq rules around that time. tl;dr: cluster's network switch didn't support hairpin routing, so I had dnsmasq point to the internal IP address. That made the domain accessible from intra-cluster (hence the successful test challenges), but it didn't occur to me that the FORWARDING rules might be borked.

@cpu Thanks a bunch! :+1:

timblakely on 27 Feb 2018

❤2 👍1

All 4 comments

Hm. Well, tried again today and it appeared to succeed...? Seems like it might have been an intermittent issue with the letsencrypt staging server. I'll dig around a bit more and see if I find anything, but I'll close this in the meantime.

timblakely on 27 Feb 2018

> Seems like it might have been an intermittent issue with the letsencrypt staging server.

@timblakely I was able to use that snippet of the challenge token you put in the issue body (token=HOlOtteBrz8A_<redacted>) to work backwards through the server-side validation authority logs (I work for Let's Encrypt/ISRG :wave:) to find the HTTP-01 validation request result.

I don't believe this was a staging server issue, it looks like the HTTP-01 challenge request was refused. The API response given to cert-manager said as much too:
"Error":"connection :: Fetching http://<censored>/.well-known/acme-challenge/HOlOtteBrz8A_<censored>: Connection refused"

Hope that helps!

cpu on 27 Feb 2018
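
A quick way to reproduce the kind of failure described above is to request the challenge URL yourself from a machine outside the cluster's network. The sketch below is a rough stand-in for that check, not what the Let's Encrypt validation authority actually runs. The function names (`challenge_url`, `classify_failure`, `probe`) are invented for this example; the URL path is the `/.well-known/acme-challenge` base path shown in the pod logs.

```python
import socket
import urllib.error
import urllib.request


def challenge_url(domain, token):
    """Build the HTTP-01 URL that a validator would fetch for this token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def classify_failure(exc):
    """Map an exception raised by urlopen to a short, readable label."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http status {exc.code}"
    # URLError wraps the underlying socket error in `.reason`.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, ConnectionRefusedError):
        return "connection refused"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "timed out"
    if isinstance(reason, socket.gaierror):
        return "dns lookup failed"
    return f"other: {reason}"


def probe(domain, token, timeout=10.0):
    """Fetch the challenge URL once and report what happened."""
    try:
        with urllib.request.urlopen(challenge_url(domain, token), timeout=timeout) as resp:
            return f"http status {resp.status}"
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return classify_failure(exc)


if __name__ == "__main__":
    # Placeholder values; use the real domain and the token from the pod logs.
    print(probe("domain.com", "example-token"))
```

Where the probe runs matters: a request made from inside the cluster, or from a host whose DNS resolves the domain to an internal address, can succeed even though the same request from the public internet is refused.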

These exact errors also happened to me. It was also caused by a refused connection. The firewall was blocking the connection.

If you have set up a website with a whitelist that only accepts connections from a select list of IPs, then you might be able to access the challenge URLs, but Let's Encrypt will not. Adding the following Let's Encrypt IPs to my firewall fixed the problem:

34.213.106.112
66.133.109.36
52.29.173.72
13.58.30.69

expz on 10 Oct 2018
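
To see whether a given source address would get past an allowlist like the one described above, a small check along these lines can help. This is only a sketch: the addresses are the ones reported in the comment from 2018 and should not be treated as a stable or complete list of Let's Encrypt validation sources, and `is_allowed` is a hypothetical helper, not part of any firewall tool.

```python
import ipaddress

# Addresses quoted in the comment above (October 2018); may be out of date.
REPORTED_VALIDATION_IPS = [
    "34.213.106.112",
    "66.133.109.36",
    "52.29.173.72",
    "13.58.30.69",
]


def is_allowed(source_ip, allowlist):
    """Return True if source_ip matches any address or CIDR range in allowlist."""
    addr = ipaddress.ip_address(source_ip)
    # A bare address such as "13.58.30.69" parses as a single-host network.
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )


if __name__ == "__main__":
    for candidate in ("66.133.109.36", "203.0.113.7"):
        print(candidate, is_allowed(candidate, REPORTED_VALIDATION_IPS))
```

If validation requests arrive from an address that is not on the list, the challenge will fail in the same way, so opening port 80 for the challenge path more broadly may be the more robust option.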
