I'm hitting issues using the failover annotation with Route53.
These are the annotations on the ingress:
external-dns.alpha.kubernetes.io/set-identifier: us-east-1
external-dns.alpha.kubernetes.io/aws-failover: PRIMARY
I added a bit of debugging to dump the interface; here is the output:
ChangeBatch:
[{
Action: "CREATE",
ResourceRecordSet: {
AliasTarget: {
DNSName: "01234567-default-mysite-0123456789.us-east-1.elb.amazonaws.com",
EvaluateTargetHealth: true,
HostedZoneId: "ZZZZZDOTZZZZZZ"
},
Failover: "PRIMARY",
Name: "hostname.mysite.com",
SetIdentifier: "us-east-1",
Type: "A"
}
} {
Action: "CREATE",
ResourceRecordSet: {
Failover: "PRIMARY",
Name: "prefix.hostname.mysite.com",
ResourceRecords: [{
Value: "\"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,external-dns/resource=ingress/default/mysite-prod\""
}],
SetIdentifier: "us-east-1",
TTL: 300,
Type: "TXT"
}
}]
The error I get is:
A non-alias primary ResourceRecordSet must have an associated health check. No changes made.
From looking into this, the issue is that the TXT record can't use a failover routing policy unless it has a health check or is an ALIAS record. The A record doesn't need a health check because it is an ALIAS that already evaluates the target's health (EvaluateTargetHealth: true).
For external-dns to be able to store multiple TXT records for this failover A/ALIAS record, I think the TXT record should be stored with a multi-value answer routing policy instead, as that gives more flexibility.
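To make the failure concrete, here is a minimal Go sketch of the rule quoted in the error message. It is a local model only: `recordSet` and `validateFailover` are hypothetical names, not the AWS SDK types or external-dns code, and the sketch covers just the "non-alias primary needs a health check" condition, not the rest of Route 53's validation.

```go
package main

import (
	"errors"
	"fmt"
)

// recordSet is a hypothetical, stripped-down stand-in for a Route 53
// ResourceRecordSet; it is not the AWS SDK type.
type recordSet struct {
	Name          string
	Type          string
	IsAlias       bool   // true when the record carries an AliasTarget
	Failover      string // "", "PRIMARY" or "SECONDARY"
	HealthCheckID string // empty when no health check is attached
}

// validateFailover models only the rule from the error message:
// a non-alias PRIMARY record must have an associated health check.
func validateFailover(rs recordSet) error {
	if rs.Failover == "PRIMARY" && !rs.IsAlias && rs.HealthCheckID == "" {
		return errors.New("a non-alias primary ResourceRecordSet must have an associated health check")
	}
	return nil
}

func main() {
	// The two changes from the dumped ChangeBatch, reduced to the
	// fields that matter for this rule.
	batch := []recordSet{
		{Name: "hostname.mysite.com", Type: "A", IsAlias: true, Failover: "PRIMARY"},
		{Name: "prefix.hostname.mysite.com", Type: "TXT", Failover: "PRIMARY"},
	}
	for _, rs := range batch {
		if err := validateFailover(rs); err != nil {
			fmt.Printf("%s %s: rejected: %v\n", rs.Type, rs.Name, err)
			continue
		}
		fmt.Printf("%s %s: accepted\n", rs.Type, rs.Name)
	}
}
```

Under this model the A/ALIAS change passes and the TXT change is the one that gets rejected, which matches the behaviour described above. Because the whole batch is submitted together, neither record is created.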
I have also encountered this failure when attempting to set the aws-failover annotation on an ingress in AWS:
external-dns.alpha.kubernetes.io/set-identifier: "my-cluster-us-east-1"
external-dns.alpha.kubernetes.io/aws-failover: "PRIMARY"
The error reported:
A non-alias primary ResourceRecordSet must have an associated health check. No changes made.
Were you able to solve this? I'm running into the same issue.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
I am seeing this issue as well. The external-dns.alpha.kubernetes.io/aws-failover feature is effectively useless until this is fixed, because external-dns attempts to create the TXT record with a failover policy in addition to the actual DNS record.
Same issue here :(
Same issue for me.
/remove-lifecycle stale
/kind bug
Can someone validate that this is still an issue with external-dns v0.7.3?
Still a bug, I tested with 0.7.3.
time="2020-09-08T11:37:48Z" level=error msg="InvalidChangeBatch: [A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made.]\n\tstatus code: 400, request id: 217f681f-5edd-4389-8150-c7c6bce67833"
time="2020-09-08T11:37:48Z" level=error msg="failed to submit all changes for the following zones:
Has anyone found a workaround for this?
The only one I've found was to prevent the TXT record from being created entirely. However, if the TXT record is not created, external-dns fails to update/delete the A record when the service is updated/deleted.
I am using Terraform, so I end up removing the entire hosted zone.
Any other solutions you guys have found? Thanks
In my own case, I didn't want to keep using a fork (#1423), so I had external-dns create regional records (e.g. service-{us-east-1,us-west-2}.domain.tld), and I created an additional set of failover records which aliased those. So far, so good, although not having to manage that top-level entry out-of-band would be nice.
Thanks @ameir for the response. Would you be so kind as to share a little more information on your approach?
If I understand correctly, you were looking to use a failover routing policy, in other words, have primary and secondary alias records.
Were these the ones you created through external-dns?
@gonzalobarbitta sure; due to this bug, I wasn't able to have external-dns create the failover records directly. Instead, it creates standard A (as alias) records for each service, but with the region in the hostname. Then, on top of that, I have failover records that I manage separately that point to these regional records. external-dns is not in this flow. I still get the benefits of failover here, but it requires additional setup. If that doesn't answer your question, let me know and I'll try to elaborate.
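For anyone trying the same workaround, the layout can be sketched roughly as follows in Go. This only builds and prints the record names; it does not call Route 53. The `failoverAlias` type and both helpers are hypothetical, and the top-level name `service.domain.tld` is an assumption, since the comment above gives only the regional pattern.

```go
package main

import "fmt"

// failoverAlias is a hypothetical description of one manually managed
// failover alias record; it is not an AWS SDK or external-dns type.
type failoverAlias struct {
	Name          string // shared top-level name that clients resolve
	SetIdentifier string
	Failover      string // "PRIMARY" or "SECONDARY"
	AliasTarget   string // regional record created by external-dns
}

// regionalName builds the per-region hostname that external-dns manages,
// following the service-<region>.domain.tld pattern described above.
func regionalName(service, region, domain string) string {
	return fmt.Sprintf("%s-%s.%s", service, region, domain)
}

// failoverPair returns the two out-of-band records that alias the
// regional names. The top-level name service.domain is an assumption.
func failoverPair(service, domain, primary, secondary string) []failoverAlias {
	top := fmt.Sprintf("%s.%s", service, domain)
	return []failoverAlias{
		{Name: top, SetIdentifier: primary, Failover: "PRIMARY", AliasTarget: regionalName(service, primary, domain)},
		{Name: top, SetIdentifier: secondary, Failover: "SECONDARY", AliasTarget: regionalName(service, secondary, domain)},
	}
}

func main() {
	for _, r := range failoverPair("service", "domain.tld", "us-east-1", "us-west-2") {
		fmt.Printf("%s [%s/%s] -> alias %s\n", r.Name, r.Failover, r.SetIdentifier, r.AliasTarget)
	}
}
```

The point of the split is that external-dns only ever owns the plain regional records and their TXT registry entries, so no TXT record is ever created with a failover policy. The failover pair lives outside external-dns and has to be managed by hand or with another tool.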
Has anyone tried out the new health-check-id annotation, released in external-dns v0.7.4?
By associating the failover TXT records with a health check via health-check-id, Route 53 should no longer complain that it cannot create the record.
I have created a Route 53 health check and specified its ID in the annotation on the ingress: