Cert-manager: Self-check fails in 0.7.0 due to CAA check that is inconsistent with Let's Encrypt Issuance Process

Created on 3 Apr 2019  Â·  12Comments  Â·  Source: jetstack/cert-manager

Describe the bug:
Cert-manager self checks fail with message "CAA self-check failed: Could not validate CAA: Unexpected response code 'SERVFAIL' for [host]" after upgrading to 0.7.0. We were able to issue certs fine without any CAA prior to upgrading. Currently we have to create a CAA record in Route53 in order for it to pass, which is not ideal for all of our circumstances and we want to avoid that.

It appears that cert-manager 0.7.0 requires CAA self-check to pass if the issuer declares a CAA identity. Let's Encrypt will check CAAs if they exist, but LE does not mandate a CAA record as a part of the issuance process.

Per https://letsencrypt.org/documents/isrg-cps-v2.4/#4-2-1-performing-identification-and-authentication-functions
4.2.1 Performing identification and authentication functions

ISRG performs all identification and authentication functions in accordance with the ISRG CP. This includes validation per CPS Section 3.2.2.

ISRG checks for relevant CAA records prior to issuing certificates. The CA acts in accordance with CAA records _if present_. The CA’s CAA identifying domain is ‘letsencrypt.org’.

It appears that the code starting at following line makes an incorrect assumption. The TODO comment hints at this possibly being an issue as well.
https://github.com/jetstack/cert-manager/blob/05517af6cbe96eb07b61efc4839e6e3f1bec49f0/pkg/controller/acmechallenges/sync.go#L146

Expected behaviour:
Self-check passes without having to create a CAA record in DNS as was the case in earlier versions.

Steps to reproduce the bug:
Create NGINX ingress with tls spec to trigger cert creation automation. Do not specify a CAA. Look at logs for cert-manager and or describe the challenge.

Environment details::

  • Kubernetes version (e.g. v1.10.2): 1.13.5
  • Cloud-provider/provisioner: self hosted
  • cert-manager version: 0.7.0
  • Install method: helm

/kind bug

areacme kinbug prioritcritical-urgent

Most helpful comment

Using the helm values:

podDnsPolicy: "None"
podDnsConfig:
  nameservers:
    - "1.1.1.1"
    - "8.8.8.8"

Fixes this issue

All 12 comments

We have more info on this. Cert-manager is using our internal dns servers to do the self check for the CAA. When they get down to checking "com.", because there are no CAAs, our windows DNS server gives a SERVFAIL. https://serverfault.com/questions/353589/windows-dns-server-2008-r2-fallaciously-returns-servfail

Would it make more sense to use some well known name servers for these sort of checks since Let's Encrypt would not be using our internal servers?

Looks like https://github.com/helm/charts/blob/master/stable/cert-manager/values.yaml#L71 may be an answer to our problems. I'm going to try it out

Using the helm values:

podDnsPolicy: "None"
podDnsConfig:
  nameservers:
    - "1.1.1.1"
    - "8.8.8.8"

Fixes this issue

So it appears we may need to 'allow' SERVFAIL errors then, or more concisely perhaps, we need to only validate CAA records iff the domain we are validating actually responds with an OK response to a CAA record lookup.

Marking as urgent for the v0.7 milestone, as it's a regression that will break renewals for users who had a previously working configuration in v0.6.

/cc @DanielMorsing
/milestone v0.7
/priority critical-urgent
/area acme

If the #1521 fix is cool, can we expect a release soon? (perhaps https://github.com/jetstack/cert-manager/releases/tag/v0.7.1 )

Similar issue with wildcard certificate, nginx-ingress and digitalocean dns.

Re-opening as v0.7.1 hasn't been cut yet

v0.7.1 is out 😄

Hm not solved for me. However, the error message changed between 0.7 and 0.71 to

CAA self-check failed: Could not validate CAA record: Could not determine the zone for "com.": Unexpected response code 'SERVFAIL' for com.

Further up it was mentioned to try different values for podDNS... But how can I set those?

I am trying to use cert-manager on Azure Kubernetes Service.

Any hints?

@maddylliieeee I was trying to figure out the same thing, I fixed it doing the following

helm del --purge cert-manager
helm install --name cert-manager --namespace cert-manager --version v0.7.1 jetstack/cert-manager --set podDnsPolicy=None --set podDnsConfig.nameservers[0]=1.1.1.1 --set podDnsConfig.nameservers[1]=8.8.8.8

It looks like AKS was using an internal azure DNS server which couldnt handle "com" returning SERVFAIL.

@mshorrosh Works perfectly, thank you!

@mshorrosh Thanks a lot! I've been looking for it for days.

Was this page helpful?
0 / 5 - 0 ratings