Is this a BUG REPORT or FEATURE REQUEST?: Feature Request
/kind feature
I need to be able to specify a specific DNS server to wait for DNS01 Auth updates. I am operating an internal cluster with its own internal authority nameserver. I'm actually using AWS Route53 for public DNS separately. So there are two authorities. The external (only for letsencrypt DNS01 auth) and internal (where the actual cert will be installed and used).
If there were a place I could specify another nameserver for the solver's propagation-check lookups, this would work perfectly. For now, I'm forcing cert-manager to use a different nameserver by manipulating /etc/resolv.conf.
Environment:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-13T22:27:55Z", GoVersion:"go1.9.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:14:26Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
I need this too... I was actually surprised that split horizon was broken out of the box, as being able to use TLS/SSL internally is pretty much the main benefit of split horizon.
If anyone does find a workaround, that would be great. I'm going to try bind-mounting a fudged resolv.conf before giving up. Changing feature gates or upgrading to k8s 1.10 isn't an option, sadly...
EDIT:
The workaround I went for was to enable the feature gate for custom pod DNS in KOPS:
kubeAPIServer:
  featureGates:
    CustomPodDNS: "true" # Remove with K8S v1.10
kubelet:
  featureGates:
    CustomPodDNS: "true" # Remove with K8S v1.10
The feature gate is enabled by default in K8S v1.10, so the above would not be required there.
Then use the following in the cert-manager deployment spec:
dnsPolicy: "None"
dnsConfig:
  nameservers:
    - 8.8.8.8
    - 8.8.4.4
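For anyone following along, here is a minimal sketch of where these fields sit in the cert-manager Deployment's pod template. The deployment name, namespace, and image tag below are assumptions, and the nameservers are just the public resolvers from the snippet above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cert-manager                  # name/namespace are assumptions
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cert-manager
  template:
    metadata:
      labels:
        app: cert-manager
    spec:
      dnsPolicy: "None"               # skip the pod's default resolv.conf entirely
      dnsConfig:
        nameservers:                  # resolvers the solver will use for its lookups
          - 8.8.8.8
          - 8.8.4.4
      containers:
        - name: cert-manager
          image: quay.io/jetstack/cert-manager-controller:v0.3.0   # image tag is an assumption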
Here's my take at this: https://github.com/jetstack/cert-manager/pull/522
@fgrehm It's very helpful to have the chart be able to configure the dnsConfig, as @rlees85 pointed out. But I'm leaning ever so slightly towards the view that it's a workaround for the original request.
I believe users might want to couple a specific ClusterIssuer with a specific DNS Authority Endpoint and/or DNS Resolver. That said, I'm having a hard time coming up with use cases where there would be anything beyond "internal DNS" vs "external DNS". So maybe this is _good enough_?
Maybe if a company exposes its own ACME authority as well as using the letsencrypt authority? You might have "internal DNS", "internal ACME Authority visible DNS" and "external DNS"? Seems pretty contrived.
Just feels wrong to force all DNS queries out of the pod to use a specific DNS server vs _just_ for validation that the TXT record was created. What if the pod needed to resolve DNS records internally but had to validate TXT records externally?
I'm having the same issue, as my cluster is in an almost air-gapped environment. The servers have access to the AWS and Let's Encrypt APIs but not to Google DNS. Because of this, the default cert-manager fails at the prepare step.
As a workaround I'm using a modified cert-manager that does not stop the process if the check step fails, and everything is working perfectly.
It's my opinion that cert-manager should not fail at all if the DNS check request fails.
It should instead continue the process and try to validate with Let's Encrypt.
If the DNS really has not propagated, then it will fail anyway at that point.
That's a good point. An option to disable the optional check entirely would be useful.
@brokenmass are you able to reach the AWS NS servers for your zone? You could point the nameservers for the pod directly at the zone's NS servers perhaps.
No, I cannot reach those either. In my environment only some specific URLs are allowed, and the internal DNS does not get the TXT entries.
That check is like applying a bandage before getting injured.
Its role should just be to shorten a default propagation waiting time (like 2 minutes) in case the check succeeds earlier.
I think this issue and the associated PR (#522) are related to #446 and trying to solve a similar problem (restrictive firewalls).
cc @redbaron
@euank : regarding --acme-dns01-propogation-time=60s, why isn't this option just the default?
What is the DNS check actually trying to achieve? Would a PR that allows disabling the DNS checks be accepted?
We have to perform the DNS self check, or else Let's Encrypt will fail validation if the DNS propagation takes longer than a few minutes, and you will exhaust your API quotas.
There are now a few PRs that attempt to work around issues with split-horizon DNS, and plenty of good ideas on how to do it.
My 2c:
1) Allowing users to skip the DNS01 self check: the check itself is required afaik, as otherwise Let's Encrypt may fail the order before the record has propagated. I think this request has come about because DNS self checks are failing in split-horizon environments (a valid complaint!), and I believe we can do better to remedy this than a simple 'off switch', which could potentially harm users in future who do not understand the reasoning behind the option.
2) Allowing dns-over-https instead of standard port 53 (#446): This effectively remediates the issue because it no longer relies on the pod's resolv.conf for DNS configuration, meaning the local DNS resolvers are skipped. I think this is a great feature addition to cert-manager, and incidentally solves the issue, but I do not think it should be a requirement for a user to use dns-over-https in a split-brain environment.
3) Allow specifying a custom dns policy on the Helm chart (#522): This is a great and clean solution to the problem - although has the downside of requiring more recent versions of Kubernetes. I think we should accept this change as an option as it will be helpful in the future, and is a simple change to our chart.
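As a rough illustration of option 3), a Helm values file could end up looking something like this. The podDnsPolicy / podDnsConfig value names are an assumption about what the chart might expose, not confirmed chart values:

podDnsPolicy: "None"
podDnsConfig:
  nameservers:
    - 8.8.8.8
    - 8.8.4.4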
I think it does allow a user to override the DNS servers used, as an option to cert-manager itself. So far, I've seen 3 different proposals on where the option should live:
I don't think it should be per DNS provider personally, as whether the self check passes is a property of the ACME server, and not of a DNS provider. We should attempt to view the DNS zone of the domain from the perspective of the ACME server (i.e. in the Let's Encrypt case, the public DNS hierarchy).
Initially, allowing a user to specify this as a 'default' flag on cert-manager seems like the easiest and quickest way forward. This would involve a []string passed to the CLI, which if specified would cause cert-manager to skip reading resolv.conf altogether.
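As a sketch of that 'default flag' approach, the Deployment could pass the nameservers straight to the controller. The flag name below is hypothetical, used only to illustrate the []string idea:

containers:
  - name: cert-manager
    image: quay.io/jetstack/cert-manager-controller:v0.3.0   # image tag is an assumption
    args:
      # hypothetical flag: comma-separated host:port resolvers for the self check,
      # which would bypass reading the pod's resolv.conf
      - --dns01-self-check-nameservers=8.8.8.8:53,8.8.4.4:53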
Allowing it to be specified on a per-Issuer basis would let users consume certificates from two completely isolated ACME servers, i.e. in a highly multi-tenant environment where two organisations share a single instance of cert-manager, and each org manages its own ACME server that validates challenges using a private DNS zone.
This does seem like a fairly valid use case, but I don't know if it's worth complicating our API surface at this time if nobody actually requires this feature.
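Purely to illustrate the per-Issuer idea, a hypothetical ClusterIssuer might carry the resolvers on the ACME stanza. The selfCheckNameservers field does not exist in the current API; it is only a sketch of the proposal:

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # address is a placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    selfCheckNameservers:             # hypothetical field, not part of the current API
      - 8.8.8.8:53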
cc @mikebryant
We can work with specifying it as a command line option :)
This one does not fully resolve the firewall issue.
In wait.go, the code checks against the authoritative DNS servers instead of using the specified ones.