External-dns: Route 53 Failover setup issues - TXT record type

Created on 11 Feb 2020 · 14 Comments · Source: kubernetes-sigs/external-dns

I'm hitting issues using the failover annotation with Route53.

These are the annotations on the ingress:

    external-dns.alpha.kubernetes.io/set-identifier: us-east-1
    external-dns.alpha.kubernetes.io/aws-failover: PRIMARY

I added a bit of debugging to dump the interface; here is the output:

ChangeBatch:
[{
  Action: "CREATE",
  ResourceRecordSet: {
    AliasTarget: {
      DNSName: "01234567-default-mysite-0123456789.us-east-1.elb.amazonaws.com",
      EvaluateTargetHealth: true,
      HostedZoneId: "ZZZZZDOTZZZZZZ"
    },
    Failover: "PRIMARY",
    Name: "hostname.mysite.com",
    SetIdentifier: "us-east-1",
    Type: "A"
  }
} {
  Action: "CREATE",
  ResourceRecordSet: {
    Failover: "PRIMARY",
    Name: "prefix.hostname.mysite.com",
    ResourceRecords: [{
        Value: "\"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,external-dns/resource=ingress/default/mysite-prod\""
      }],
    SetIdentifier: "us-east-1",
    TTL: 300,
    Type: "TXT"
  }
}]

The error I get is:

A non-alias primary ResourceRecordSet must have an associated health check. No changes made.

From looking into this, the issue is that Route 53 rejects a TXT record with a failover routing policy unless the record has a health check or is an ALIAS. A health check should not be needed in this case, since the A record is an ALIAS that already checks the target's health (EvaluateTargetHealth: true).
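To make the failure concrete, here is a minimal sketch of the rule as the error message states it, applied to the two record sets from the dump above. The function name `needs_health_check` is made up for illustration, and the check only models the wording of the error; it is not Route 53's actual validation logic.

```python
# Models the rule quoted in the error message; NOT Route 53's real validation code.
# needs_health_check is a hypothetical helper name used only for this sketch.

def needs_health_check(record_set: dict) -> bool:
    """Return True if the record set would trip the 'non-alias primary' rule."""
    is_primary = record_set.get("Failover") == "PRIMARY"
    is_alias = "AliasTarget" in record_set
    has_health_check = "HealthCheckId" in record_set
    return is_primary and not is_alias and not has_health_check


# The A/ALIAS record from the ChangeBatch dump.
a_record = {
    "Name": "hostname.mysite.com",
    "Type": "A",
    "SetIdentifier": "us-east-1",
    "Failover": "PRIMARY",
    "AliasTarget": {
        "DNSName": "01234567-default-mysite-0123456789.us-east-1.elb.amazonaws.com",
        "EvaluateTargetHealth": True,
        "HostedZoneId": "ZZZZZDOTZZZZZZ",
    },
}

# The ownership TXT record from the same dump: failover policy, no alias, no health check.
txt_record = {
    "Name": "prefix.hostname.mysite.com",
    "Type": "TXT",
    "SetIdentifier": "us-east-1",
    "Failover": "PRIMARY",
    "TTL": 300,
    "ResourceRecords": [{
        "Value": '"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,'
                 'external-dns/resource=ingress/default/mysite-prod"'
    }],
}

rejected = [r["Name"] for r in (a_record, txt_record) if needs_health_check(r)]
print(rejected)  # ['prefix.hostname.mysite.com']
```

Only the TXT record is flagged, which matches the observation that the A/ALIAS record on its own would be fine.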

In order for external-dns to be able to store multiple TXT records for this failover A/ALIAS record, I think we need to have the TXT record be stored with a multi-value answer routing policy instead, as that gives more flexibility.
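As a rough sketch of what that proposal could look like, the snippet below builds the ownership TXT record with the same set identifier but with `MultiValueAnswer` set instead of `Failover`. The helper `registry_txt_for` is hypothetical and is not how external-dns currently builds its records. The sketch also assumes that Route 53 accepts a multivalue answer TXT record without a health check, which would need to be verified against the real API.

```python
# Sketch of the proposal only; registry_txt_for is a hypothetical helper,
# not an external-dns function.

def registry_txt_for(record_set: dict, txt_name: str, owner_value: str, ttl: int = 300) -> dict:
    """Build an ownership TXT record that reuses the set identifier of the
    DNS record it tracks, but uses multivalue answer routing instead of failover."""
    return {
        "Name": txt_name,
        "Type": "TXT",
        "TTL": ttl,
        "SetIdentifier": record_set["SetIdentifier"],
        "MultiValueAnswer": True,  # replaces Failover: "PRIMARY"
        "ResourceRecords": [{"Value": owner_value}],
    }


# Minimal stand-in for the failover A/ALIAS record from the dump.
a_record = {
    "Name": "hostname.mysite.com",
    "Type": "A",
    "SetIdentifier": "us-east-1",
    "Failover": "PRIMARY",
}

txt = registry_txt_for(
    a_record,
    txt_name="prefix.hostname.mysite.com",
    owner_value='"heritage=external-dns,external-dns/owner=mysite-com-prod-us-east-1,'
                'external-dns/resource=ingress/default/mysite-prod"',
)

print(sorted(txt))
```

Because the set identifier is kept, each cluster could still own its own TXT record under the same name, which is the flexibility the proposal is after.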

kind/bug lifecycle/stale provider/aws

ameir

👍1

All 14 comments

I have also hit this failure when setting the aws-failover annotation on an ingress in AWS:

    external-dns.alpha.kubernetes.io/set-identifier: "my-cluster-us-east-1"
    external-dns.alpha.kubernetes.io/aws-failover: "PRIMARY"

The error reported:

A non-alias primary ResourceRecordSet must have an associated health check. No changes made.

gdrudy on 21 Feb 2020

Were you able to solve this? I'm running into the same issue.

jtai-omniex on 30 Mar 2020

👀1

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 28 Jun 2020

I am seeing this issue as well. The external-dns.alpha.kubernetes.io/aws-failover feature is absolutely useless until this is fixed, because external-dns attempts to create the TXT record with a failover policy in addition to the actual DNS record.

fimbulvetr on 1 Jul 2020

Same issue here :(

semoac on 7 Jul 2020

Same issue for me.

leodamasceno on 20 Jul 2020

/remove-lifecycle stale
/kind bug

Can someone validate whether this is still an issue with external-dns v0.7.3?

seanmalloy on 17 Aug 2020

Still a bug, I tested with 0.7.3.

time="2020-09-08T11:37:48Z" level=error msg="InvalidChangeBatch: [A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made., A non-alias primary ResourceRecordSet must have an associated health check. No changes made.]\n\tstatus code: 400, request id: 217f681f-5edd-4389-8150-c7c6bce67833"
time="2020-09-08T11:37:48Z" level=error msg="failed to submit all changes for the following zones:

jahpola-futurice on 8 Sep 2020

👍1

Has anyone found a workaround for this?

The only one I've found is to prevent the TXT record from being created entirely. However, if the TXT record is not created, external-dns fails to update or delete the A record when the service is updated or deleted.
I am using Terraform, so I end up removing the entire hosted zone.

Any other solutions you guys have found? Thanks

gonzalobarbitta on 16 Sep 2020

In my own case, I didn't want to keep using a fork (#1423), so I had external-dns create regional records (e.g. service-{us-east-1,us-west-2}.domain.tld), and I created an additional set of failover records which aliased those. So far, so good, although not having to manage that top-level entry out-of-band would be nice.

ameir on 16 Sep 2020

Thanks @ameir for the response. Would you be so kind as to share a little more information on your approach?

If I understand correctly, you were looking to use a failover routing policy, in other words, have primary and secondary alias records.
Were these the ones you created through external-dns?

gonzalobarbitta on 17 Sep 2020

@gonzalobarbitta sure; due to this bug, I wasn't able to have external-dns create the failover records directly. Instead, it creates standard A (as alias) records for each service, but with the region in the hostname. Then, on top of that, I have failover records that I manage separately that point to these regional records. external-dns is not in this flow. I still get the benefits of failover here, but it requires additional setup. If that doesn't answer your question, let me know and I'll try to elaborate.
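A minimal sketch of that layering, assuming the regional names from the earlier comment (service-us-east-1.domain.tld and service-us-west-2.domain.tld) are the records external-dns manages, and the top-level failover pair is created out-of-band. The zone ID, the top-level name, the helper `failover_alias_change`, and the choice of us-east-1 as primary are all hypothetical placeholders, not the actual configuration.

```python
# Sketch only: builds the out-of-band failover records as plain dicts in the
# shape of a Route 53 ChangeBatch. All names and IDs below are placeholders.

def failover_alias_change(name: str, role: str, regional_name: str,
                          zone_id: str, set_identifier: str) -> dict:
    """Build one CREATE change for a failover A/ALIAS record that points at a
    regional record in the same hosted zone."""
    return {
        "Action": "CREATE",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_identifier,
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "AliasTarget": {
                "DNSName": regional_name,
                "HostedZoneId": zone_id,  # same zone as the regional records
                "EvaluateTargetHealth": True,
            },
        },
    }


ZONE_ID = "ZEXAMPLE123"  # hypothetical hosted zone ID

change_batch = {
    "Changes": [
        failover_alias_change("service.domain.tld", "PRIMARY",
                              "service-us-east-1.domain.tld", ZONE_ID, "us-east-1"),
        failover_alias_change("service.domain.tld", "SECONDARY",
                              "service-us-west-2.domain.tld", ZONE_ID, "us-west-2"),
    ]
}

for change in change_batch["Changes"]:
    rrs = change["ResourceRecordSet"]
    print(rrs["Failover"], "->", rrs["AliasTarget"]["DNSName"])
```

Both top-level records are aliases, so neither needs its own health check, and no failover TXT record is involved because external-dns only sees the plain regional records. A batch like this could be submitted with any Route 53 client, for example boto3's `change_resource_record_sets`.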

ameir on 17 Sep 2020

👍1

Has anyone tried out the new health-check-id annotation, released in external-dns v0.7.4?
By associating the failover TXT records with a health check ID, it should no longer complain that it cannot create the record.

I have created a Route 53 health check and specified its ID in an annotation on the ingress:

    external-dns.alpha.kubernetes.io/health-check-id: "health-check-id"

But it looks like the annotation is being ignored, and I end up with the same TXT error.

pirteac on 2 Nov 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 31 Jan 2021
