Describe the bug:
We have a wildcard domain pointing at an nginx ingress controller, which means that the wildcard record is a CNAME to an AWS Elastic Load Balancer.
When cert-manager tries to create the _acme-challenge record under the wildcarded domain, it sees the CNAME to the ELB and instead tries to update DNS in the ELB's domain (us-west-2.elb.amazonaws.com).
```
I0817 02:04:31.075401 1 logger.go:73] Calling GetAuthorization
I0817 02:04:31.203160 1 logger.go:98] Calling DNS01ChallengeRecord
I0817 02:04:31.203193 1 prepare.go:279] Cleaning up old/expired challenges for Certificate staging/staging-phoenix-my-tls
I0817 02:04:31.203206 1 logger.go:68] Calling GetChallenge
I0817 02:04:31.436572 1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.075401 1 logger.go:73] Calling GetAuthorization
I0817 02:04:31.203160 1 logger.go:98] Calling DNS01ChallengeRecord
I0817 02:04:31.203193 1 prepare.go:279] Cleaning up old/expired challenges for Certificate staging/staging-wildcard-tls
I0817 02:04:31.203206 1 logger.go:68] Calling GetChallenge
I0817 02:04:31.436572 1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.436589 1 dns.go:93] Checking DNS propagation for "example.com" using name servers: [100.64.0.10:53]
I0817 02:04:31.472333 1 dns.go:100] DNS record for "example.com" not yet propagated
I0817 02:04:31.472460 1 dns.go:83] Presenting DNS01 challenge for domain "example.com"
I0817 02:04:31.481949 1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.841210 1 helpers.go:201] Found status change for Certificate "staging-wildcard-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-17 02:04:31.841201695 +0000 UTC m=+9743.366825451
I0817 02:04:31.841235 1 sync.go:276] Error preparing issuer for certificate staging/staging-wildcard-tls: Failed to determine Route 53 hosted zone ID: Zone us-west-2.elb.amazonaws.com. not found in Route 53 for domain ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
E0817 02:04:31.841254 1 sync.go:197] [staging/staging-wildcard-tls] Error getting certificate 'staging-wildcard-tls': secret "staging-wildcard-tls" not found
E0817 02:04:31.854121 1 controller.go:180] certificates controller: Re-queuing item "staging/staging-wildcard-tls" due to error processing: Failed to determine Route 53 hosted zone ID: Zone us-west-2.elb.amazonaws.com. not found in Route 53 for domain ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
```
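To make the failure mode concrete, here is a minimal Go sketch of the lookup sequence the logs suggest: _acme-challenge.example.com has no record of its own, so the wildcard *.example.com answers with the ELB's CNAME, the TXT target is rewritten to the ELB hostname, and the hosted-zone lookup then fails. The in-memory maps and the function names (lookupCNAME, challengeFQDN, findZone) are hypothetical stand-ins, not cert-manager's code, and wildcard handling is simplified to a single label.

```go
package main

import (
	"fmt"
	"strings"
)

// cnames is a hypothetical in-memory stand-in for DNS: owner name -> CNAME
// target. "*.example.com." models the wildcard pointing at the ELB.
var cnames = map[string]string{
	"*.example.com.": "ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.",
}

// hostedZones lists the zones the Route 53 account can edit (assumption).
var hostedZones = []string{"example.com."}

// lookupCNAME returns the CNAME target for name, falling back to the
// wildcard one level up when there is no explicit record (simplified).
func lookupCNAME(name string) (string, bool) {
	if target, ok := cnames[name]; ok {
		return target, true
	}
	if i := strings.Index(name, "."); i >= 0 {
		if target, ok := cnames["*"+name[i:]]; ok {
			return target, true
		}
	}
	return "", false
}

// challengeFQDN models the CNAME-following step: if the challenge name has
// a CNAME, the TXT record is written at the CNAME target instead.
func challengeFQDN(domain string) string {
	fqdn := "_acme-challenge." + domain + "."
	if target, ok := lookupCNAME(fqdn); ok {
		return target
	}
	return fqdn
}

// findZone returns the longest hosted zone that fqdn falls under.
func findZone(fqdn string) (string, error) {
	best := ""
	for _, z := range hostedZones {
		if (fqdn == z || strings.HasSuffix(fqdn, "."+z)) && len(z) > len(best) {
			best = z
		}
	}
	if best == "" {
		return "", fmt.Errorf("no hosted zone found for %s", fqdn)
	}
	return best, nil
}

func main() {
	fqdn := challengeFQDN("example.com")
	fmt.Println("TXT record would be written at:", fqdn)
	if _, err := findZone(fqdn); err != nil {
		fmt.Println("error:", err)
	}
}
```

Running it prints the ELB hostname as the TXT target followed by a zone-lookup error, which is the same shape as the Route 53 error above.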
Expected behaviour:
The _acme-challenge TXT record is created in the wildcarded domain (example.com in the example above).
Steps to reproduce the bug:
```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-challenge-type: dns01
    certmanager.k8s.io/acme-dns01-provider: route53
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
    kubernetes.io/ingress.class: nginx-external
  name: staging-frontend
spec:
  rules:
  - host: '*.example.com'
    http:
      paths:
      - backend:
          serviceName: staging-frontend
          servicePort: http
  tls:
  - hosts:
    - '*.example.com'
    secretName: staging-wildcard-tls
```
Applying this with a suitable nginx ingress controller exposed through an AWS ELB should reproduce the issue.
Anything else we need to know?:
The CNAME behaviour was introduced in #670. The commit message explains the motivation behind the change well, and there's plenty of support for it within #670, so I don't know how best to fix this so that my use case is supported without breaking the use case that motivated #670.
cc: @gurvindersingh
Environment details:
/kind bug
It looks like #670 isn't in any actual release yet (I'm using canary, as that's what the default static manifests pointed me at), so I'll revert to v0.4.1 for now.
I had this issue as well testing master.
Using the CNAME to create the TXT record should really be some kind of option, or at least cert-manager should test which domain it can create records in and use that one.
@willthames I think we can use a config option to enable or disable the CNAME support. The default can be disabled to keep the behavior the same as before. This code (https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/util/dns.go#L23-L29) can be put under that condition check.
Hm, so from my understanding, we should be following CNAMEs for _acme-challenge.example.com instead of for example.com itself.
This would, in turn, resolve your issue.
I don't think the validation process will even work if we're resolving CNAMEs for example.com in your example, as we'd not be proving ownership of the domain.
@gurvindersingh would you mind clarifying the intent of the original PR? 😀
@cpu do you have any idea how we should handle CNAMEs? I assume only CNAME records set on _acme-challenge.example.com should be followed?
On Tue, 21 Aug 2018 at 10:39, Gurvinder Singh <notifications@github.com> wrote:
> @willthames I think we can use a config option to enable or disable the CNAME support. The default can be disabled to keep the behavior same as earlier. This code (https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/util/dns.go#L23-L29) can be put under that condition check.
@munnerz the code changes in PR #670 follow the CNAME for _acme-challenge.example.com, not example.com, so I'm not sure that is the problem.
@munnerz reading the bug report more carefully, it seems to me the current code is doing what it is supposed to do. @willthames has the wildcard domain CNAMEd to an AWS LB, so the code sees a CNAME even for _acme-challenge and updates the FQDN to use it. So the solution is a config option to enable or disable this feature depending on your cluster setup.
Thanks for taking a look @gurvindersingh.
That makes sense then - so you have *.example.com pointed at an ELB, which implies _acme-challenge.example.com is CNAMEd too.
This is a tricky one - from what I can see, there's no way for us to detect this, as wildcards are a DNS provider feature and not part of the DNS spec.
I think it's also fair that some users may want some domains to follow the CNAME, and some to not.
One workaround, I'd guess, would be to provide an explicit CNAME for _acme-challenge.example.com pointing at some other domain (e.g. acme-challenge.acme.example.com).
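A rough Go model of why an explicit record helps: in DNS a wildcard only answers for names that have no records of their own (RFC 1034, clarified in RFC 4592), so an explicit CNAME at _acme-challenge.example.com takes precedence over *.example.com, while other hosts keep using the wildcard. The zone type and resolveCNAME method are hypothetical, and the model only tracks CNAME records one label deep.

```go
package main

import (
	"fmt"
	"strings"
)

// zone is a hypothetical in-memory zone: owner name -> CNAME target.
type zone map[string]string

// resolveCNAME returns the CNAME target for name. An explicit CNAME for the
// exact name wins; only if there is none does the wildcard at the parent
// ("*.<parent>") apply. Simplified to a single level of wildcard matching.
func (z zone) resolveCNAME(name string) (string, bool) {
	if target, ok := z[name]; ok {
		return target, true
	}
	if i := strings.Index(name, "."); i >= 0 {
		target, ok := z["*"+name[i:]]
		return target, ok
	}
	return "", false
}

func main() {
	elb := "ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com."
	z := zone{"*.example.com.": elb}

	// Without an explicit record, the wildcard answers for _acme-challenge.
	t, _ := z.resolveCNAME("_acme-challenge.example.com.")
	fmt.Println("wildcard only:  ", t)

	// With an explicit CNAME, the wildcard no longer applies to that name.
	z["_acme-challenge.example.com."] = "acme-challenge.acme.example.com."
	t, _ = z.resolveCNAME("_acme-challenge.example.com.")
	fmt.Println("explicit record:", t)

	// Other hosts still resolve through the wildcard.
	t, _ = z.resolveCNAME("www.example.com.")
	fmt.Println("other host:     ", t)
}
```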
How do you think we can best represent this configuration option to users?
I was thinking we should keep things simple. Since people are used to the earlier behavior in some setups, we can have a config option, e.g. enable-acme-cname, which users can enable if they want CNAME replacement for the _acme-challenge part; otherwise the earlier behavior is kept.
If at a later stage people want more granular control, with different behavior for different domains, then we can think about adding domain-specific CNAME logic.
I tried v0.5.0 and canary (master-5602) and it didn't work in either. I guess, as mentioned above, we have to wait for #670 to make it into a release.
I've reverted to v0.4.1 using the same Helm chart without any changes other than the tag, just doing a helm upgrade (fingers crossed I've not put myself in a world of hurt longer term).
I am no longer seeing the CNAME-related "cannot find ZoneID for xyzxyz.elb.amazon.com domain" error message.
Same worked for me as well! 🤔
Just to be clear, I am using the nginx ingress controller for now, but I'm not even using wildcards in my ingresses yet. I feel like people with a working setup are going to either upgrade or, if already on 0.5.0, have someone add a DNS entry that causes their certs not to be re-issued.
Please let me know if I'm way off on this.
I agree with @keithlayne on this one. The change from #670 effectively breaks cert-manager when the user has a wildcard CNAME record. See also #1035.
I completely see why #670 was needed, but I'd argue that a wildcard CNAME record is at least as common a use case, if not more so. Therefore, an option like the one @gurvindersingh proposed is imperative.
We were tripped up by this bug as well after upgrading to 0.5.0. The cluster.example.com DNS zone is hosted in Azure DNS and has a wildcard CNAME pointing to cluster-example.trafficmanager.net.
After creating a certificate for foo.cluster.example.com we see this:
| Name | Type | Value |
|---|---|---|
| * | CNAME | cluster-example.trafficmanager.net |
| _acme-challenge.foo | TXT | Zm9vYmFy... |

| Name | Type | Value |
|---|---|---|
| * | CNAME | cluster-example.trafficmanager.net |
| cluster-example.trafficmanager.net | TXT | Zm9vYmFy... |
trafficmanager.net is Azure's global load balancer, so we can't create the _acme-challenge records on that domain.
I also downgraded to v0.4.1 for now. It would be awesome if this could be toggled as suggested.
So I've done some research here and found that if someone has a CNAME record configured for _acme-challenge.example.com (including one matched by a wildcard *.example.com), the ACME server will check both example.com and acme.insecure.com (the domain that the CNAME points at) for the TXT record, and if either has one, it will consider the challenge successfully validated.
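A minimal Go sketch of that validation behaviour, assuming the validating resolver simply follows the CNAME chain from _acme-challenge.example.com and looks for the TXT value wherever the chain ends. Standard DNS does not allow other data alongside a CNAME at the same name, so in practice the TXT record lives at the CNAME target. The records type, the hop limit of 8 and the function names are illustrative assumptions, not the ACME server's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// records is a hypothetical in-memory view of DNS used for illustration.
type records struct {
	cname map[string]string   // owner -> CNAME target
	txt   map[string][]string // owner -> TXT values
}

// lookupTXT follows up to maxHops CNAMEs starting at name and returns the
// TXT values found at the end of the chain, as a recursive resolver would.
func (r records) lookupTXT(name string, maxHops int) ([]string, error) {
	for hop := 0; hop <= maxHops; hop++ {
		target, isAlias := r.cname[name]
		if !isAlias {
			return r.txt[name], nil
		}
		name = target
	}
	return nil, errors.New("CNAME chain too long")
}

// validate reports whether the expected key authorization is published
// for domain, following any CNAME on the challenge name.
func (r records) validate(domain, key string) bool {
	vals, err := r.lookupTXT("_acme-challenge."+domain+".", 8)
	if err != nil {
		return false
	}
	for _, v := range vals {
		if v == key {
			return true
		}
	}
	return false
}

func main() {
	r := records{
		cname: map[string]string{"_acme-challenge.example.com.": "acme.insecure.com."},
		txt:   map[string][]string{"acme.insecure.com.": {"token-123"}},
	}
	fmt.Println(r.validate("example.com", "token-123"))
}
```

This is also why cert-manager's self check would need to look at the CNAME target when it writes the record there.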
This implies to me that we need to allow users to configure how cert-manager behaves.
I see two options going forward, and I'd love to hear feedback on either:
1) add a followCNAME option to issuer.spec.acme.dns01.solvers[] (defaults to false). If set to true, when cert-manager encounters a CNAME record it will traverse the CNAME and update the zone it points at (and check that domain during self checking).
2) utilise the certificate.spec.acme.config.domains[] field to allow users to configure this:
```yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: testcrt-acme
spec:
  acme:
    config:
    - domains:
      - example.com
      dns01:
        provider: cloudflare
  dnsNames:
  - example.com
```
The above would cause cert-manager to update _acme-challenge.example.com directly with a TXT record.
```yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: testcrt-acme
spec:
  acme:
    config:
    - domains:
      - acme.insecure.com
      dns01:
        provider: cloudflare
  dnsNames:
  - example.com
```
To achieve option (2), we'll need to attempt a CNAME lookup for every domain listed in dnsNames in order to determine which domains are valid substitutions for example.com.
Over time, we want to remove the certificate.spec.acme configuration from Certificate resources anyway, which makes this simpler for end-users (they will only need to request dnsNames: ["example.com"] and not have to think about how that'll be solved).
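Option (2) hinges on that per-name CNAME lookup. A hedged Go sketch: the ConfigEntry type is a hypothetical stand-in for one certificate.spec.acme.config[] item, the lookup function is faked, and only a single CNAME hop is considered. A config entry applies to a dnsName either directly or because _acme-challenge.<dnsName> is a CNAME to one of its domains.

```go
package main

import (
	"fmt"
	"strings"
)

// ConfigEntry is a hypothetical model of one certificate.spec.acme.config[]
// item: the domains it covers and the DNS01 provider used to solve them.
type ConfigEntry struct {
	Domains  []string
	Provider string
}

// Lookup returns the CNAME target of name, if any (faked in this sketch).
type Lookup func(name string) (string, bool)

// matchConfig picks the config entry for dnsName and reports which domain
// matched: dnsName itself, or the target of _acme-challenge.<dnsName>.
func matchConfig(dnsName string, entries []ConfigEntry, lookup Lookup) (ConfigEntry, string, bool) {
	candidates := []string{dnsName}
	if target, ok := lookup("_acme-challenge." + dnsName + "."); ok {
		candidates = append(candidates, strings.TrimSuffix(target, "."))
	}
	for _, c := range candidates {
		for _, e := range entries {
			for _, d := range e.Domains {
				if d == c {
					return e, c, true
				}
			}
		}
	}
	return ConfigEntry{}, "", false
}

func main() {
	lookup := func(name string) (string, bool) {
		if name == "_acme-challenge.example.com." {
			return "acme.insecure.com.", true
		}
		return "", false
	}
	entries := []ConfigEntry{{Domains: []string{"acme.insecure.com"}, Provider: "cloudflare"}}
	e, via, ok := matchConfig("example.com", entries, lookup)
	fmt.Println(e.Provider, via, ok)
}
```

Note that this makes the solver choice depend on live DNS state, which is part of why it reads as implicit compared with an explicit switch.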
I'd rather go with option (1), since followCNAME is an explicit option, whereas option (2) looks more like implicit, derived functionality to me.
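For comparison, the explicit followCNAME switch from option (1) fits in a few lines of Go. The CNAME is only traversed when the solver opts in, and the default (false) keeps the earlier behaviour of writing directly to _acme-challenge.<domain>. The Solver struct, the FollowCNAME field and the Resolver function type are hypothetical illustrations, not the API that eventually shipped, and only one CNAME hop is followed.

```go
package main

import "fmt"

// Solver is a hypothetical, trimmed-down model of a DNS01 solver entry
// carrying the proposed followCNAME option.
type Solver struct {
	Provider    string
	FollowCNAME bool // zero value false, matching the proposed default
}

// Resolver abstracts the CNAME lookup so it can be faked in tests.
type Resolver func(name string) (target string, ok bool)

// challengeRecordName decides where the TXT record is written. The CNAME
// is only followed when the solver has opted in via FollowCNAME.
func challengeRecordName(s Solver, domain string, lookup Resolver) string {
	fqdn := "_acme-challenge." + domain + "."
	if !s.FollowCNAME {
		return fqdn
	}
	if target, ok := lookup(fqdn); ok {
		return target
	}
	return fqdn
}

func main() {
	lookup := func(name string) (string, bool) {
		if name == "_acme-challenge.example.com." {
			return "acme-challenge.acme.example.com.", true
		}
		return "", false
	}
	fmt.Println(challengeRecordName(Solver{Provider: "route53"}, "example.com", lookup))
	fmt.Println(challengeRecordName(Solver{Provider: "route53", FollowCNAME: true}, "example.com", lookup))
}
```

With the default, a wildcard CNAME like the ELB one in this issue is simply ignored; users who delegate _acme-challenge to another zone opt in explicitly.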
I agree with @timuthy here, I saw you also already took that approach @munnerz.
This is now fixed as part of #1136 😄