/kind bug
What happened:
Disclaimer: I'm a relative newbie at k8s, so I'm very likely doing something wrong and just didn't RTFM correctly.
Super close to getting http01 ACME challenges working with letsencrypt-staging, but running into a strange authorization failure I can't seem to pin down. First, I installed cert-manager v0.2.3 via `helm install --name cert-manager --namespace kube-system contrib/charts/cert-manager`. My Issuer and Certificate config looks like this:
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: [email protected]
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    http01: {}
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: domain-com-staging
  namespace: default
spec:
  secretName: domain-com-tls-staging
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  commonName: domain.com
  acme:
    config:
    - http01:
        ingressClass: nginx
      domains:
      - domain.com
After applying it, according to the logs cert-manager picks it up and begins the process:
I0227 06:23:20.413039 1 sync.go:107] Error checking existing TLS certificate: secret "domain-com-tls-staging" not found
I0227 06:23:20.413065 1 sync.go:238] Preparing certificate with issuer
I0227 06:23:20.413506 1 prepare.go:239] Compare "" with "https://acme-staging.api.letsencrypt.org/acme/reg/<redacted>"
I see a mini challenge-response pod pop up called cm-domain-com-staging-*. Looks like the logs show the ACME server actually does perform the challenge correctly:
2018/02/27 06:22:15 [domain.com] Validating request. basePath=/.well-known/acme-challenge, token=HOlOtteBrz8A_<redacted>
2018/02/27 06:22:15 [domain.com] Comparing actual host 'domain.com' against expected 'domain.com'
2018/02/27 06:22:15 [domain.com] Got successful challenge request, writing key...
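For readers following along, the three log lines above map onto the checks an HTTP-01 responder has to make: the request path must be `/.well-known/acme-challenge/<token>`, the `Host` header must match the domain being validated, and the response body is the key authorization. Below is a minimal Python sketch of that logic. It is an illustration only, not cert-manager's actual solver code; `EXPECTED_HOST`, `TOKEN`, `KEY_AUTH`, the port, and the function names are all hypothetical placeholders.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BASE_PATH = "/.well-known/acme-challenge"
EXPECTED_HOST = "domain.com"                     # hypothetical: domain under validation
TOKEN = "example-token"                          # hypothetical: token issued by the ACME server
KEY_AUTH = "example-token.example-thumbprint"    # hypothetical: token + "." + account key thumbprint


def check_request(host, path):
    """Return (status, body) for a challenge request, mirroring the log lines."""
    host_only = host.split(":")[0]               # drop an optional ":port" suffix
    if host_only != EXPECTED_HOST:               # "Comparing actual host ... against expected ..."
        return 404, b"host mismatch"
    if path != f"{BASE_PATH}/{TOKEN}":           # "Validating request. basePath=..., token=..."
        return 404, b"unknown token"
    return 200, KEY_AUTH.encode()                # "Got successful challenge request, writing key..."


class ChallengeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = check_request(self.headers.get("Host", ""), self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)


def serve(port=8089):
    """Run the responder until interrupted (not called automatically)."""
    HTTPServer(("0.0.0.0", port), ChallengeHandler).serve_forever()
```

A log line from such a responder only proves that some client reached it; it does not identify who the client was.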
Unfortunately something between then and cert-manager retrieving the certificate fails:
I0227 06:23:35.254098 1 sync.go:242] Error preparing issuer for certificate: error waiting for authorization for domain "domain.com": acme: authorization error for :
Anything else we need to know?:
The first thing that jumps out at me is the blank entry in prepare.go which appears to come from auth.Account. I'm assuming that comes from the letsencrypt-staging secret (???), but all I see there is tls.key:
```
$ k -n kube-system describe secret letsencrypt-staging
Name: letsencrypt-staging
Namespace: kube-system
Labels:
Annotations:
Type: Opaque
tls.key: 1675 bytes
```
The resulting blank error is probably related to #12, but unfortunately I can't figure out how to dump the `wireAuthz` structure (my go-fu is ... bad).
One last thing that might be related is that I've used LetsEncrypt to generate certificates for this domain before. Sadly I don't have the private key for the staging environment, but I _do_ have the private_key.json that certbot spits out. I've looked high and low, but I can't seem to find out exactly what the structure of privateKeySecretRef is, so I don't know how to stuff the private_key.json from the prod instance into a secret. I'd try to generate a new cert de-novo on the prod instance to see if it's something related to staging, but I've been burned by the rate limit before and would like to avoid that week-long penalty box if I can.
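On the format question: the `tls.key: 1675 bytes` entry shown earlier is consistent with a PEM-encoded 2048-bit RSA private key, while certbot's `private_key.json` is a JWK. Assuming the secret really does hold a PKCS#1 PEM key under `tls.key` (an assumption based only on that `describe` output), a conversion could be sketched in plain Python as below. The helper names are hypothetical, and whether cert-manager would accept a pre-populated secret is not confirmed anywhere in this thread.

```python
import base64
import json


def b64url_to_int(value):
    """Decode an unpadded base64url JWK field into an integer."""
    padded = value + "=" * (-len(value) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


def der_length(n):
    """Encode a DER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def der_integer(value):
    """Encode a non-negative integer as a DER INTEGER."""
    # bit_length() // 8 + 1 bytes keeps the top bit clear, so the value stays positive.
    raw = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return b"\x02" + der_length(len(raw)) + raw


def jwk_to_pkcs1_pem(jwk):
    """Convert an RSA private JWK (dict) to a PKCS#1 'RSA PRIVATE KEY' PEM string."""
    if jwk.get("kty") != "RSA":
        raise ValueError("only RSA JWKs are handled in this sketch")
    fields = ["n", "e", "d", "p", "q", "dp", "dq", "qi"]
    # RSAPrivateKey ::= SEQUENCE { version(0), n, e, d, p, q, dp, dq, qi }
    body = der_integer(0) + b"".join(der_integer(b64url_to_int(jwk[f])) for f in fields)
    der = b"\x30" + der_length(len(body)) + body
    b64 = base64.b64encode(der).decode()
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN RSA PRIVATE KEY-----\n" + "\n".join(lines) + "\n-----END RSA PRIVATE KEY-----\n"


def load_jwk(path):
    """Read a JWK file such as certbot's private_key.json."""
    with open(path) as fh:
        return json.load(fh)
```

Usage would be along the lines of `print(jwk_to_pkcs1_pem(load_jwk("private_key.json")))`, with the output stored under the `tls.key` key of the secret. Keep in mind that Let's Encrypt staging and production keep separate accounts, so a production account key is only meaningful against the production endpoint.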
Let me know if there's anything else I can provide.
Environment:
kubectl version):
Hm. Well, tried again today and it appeared to succeed...? Seems like it might have been an intermittent issue with the letsencrypt staging server. I'll dig around a bit more and see if I find anything, but I'll close this in the meantime.
> Seems like it might have been an intermittent issue with the letsencrypt staging server.
@timblakely I was able to use that snippet of the challenge token you put in the issue body (token=HOlOtteBrz8A_<redacted>) to work backwards through the server-side validation authority logs (I work for Let's Encrypt/ISRG :wave:) to find the HTTP-01 validation request result.
I don't believe this was a staging server issue; it looks like the HTTP-01 challenge request was refused. The API response given to cert-manager said as much too:
"Error":"connection :: Fetching http://<censored>/.well-known/acme-challenge/HOlOtteBrz8A_<censored>: Connection refused"
Hope that helps!
OH! You know, I think I was messing with my iptables forwarding and dnsmasq rules around that time. tl;dr: cluster's network switch didn't support hairpin routing, so I had dnsmasq point to the internal IP address. That made the domain accessible from intra-cluster (hence the successful test challenges), but it didn't occur to me that the FORWARDING rules might be borked.
@cpu Thanks a bunch! :+1:
These exact errors also happened to me. It was also caused by a refused connection. The firewall was blocking the connection.
If you have set up a website with a whitelist that only accepts connections from a select list of IPs, then you might be able to access the challenge URLs, but Let's Encrypt will not. Adding the following Let's Encrypt IPs to my firewall fixed the problem:
34.213.106.112
66.133.109.36
52.29.173.72
13.58.30.69
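Since both cases in this thread came down to the challenge URL being reachable from some networks but not from Let's Encrypt's, one way to narrow it down is to request the URL from a machine outside the cluster and outside any allowlist. A minimal sketch using only the Python standard library follows; `challenge_url` and `probe_challenge` are hypothetical helper names, and the domain and token are whatever the pending challenge uses.

```python
import urllib.error
import urllib.request


def challenge_url(domain, token):
    """Build the HTTP-01 URL that the ACME server requests."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe_challenge(domain, token, timeout=5.0):
    """Return a short description of what a direct GET of the challenge URL yields."""
    # Empty ProxyHandler: ignore any proxy settings so the connection is direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(challenge_url(domain, token), timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        return f"HTTP {exc.code}"              # server answered, but with an error status
    except urllib.error.URLError as exc:
        return f"unreachable: {exc.reason}"    # e.g. connection refused, DNS failure
    except OSError as exc:
        return f"unreachable: {exc}"           # e.g. timeout while reading the response
```

An `unreachable: ... Connection refused` result from outside, combined with a working request from inside the cluster, points at forwarding or firewall rules rather than at the challenge responder.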