external-dns cannot automatically change record type in Google DNS

Created on 4 Jan 2018 · 15 Comments · Source: kubernetes-sigs/external-dns

I'm migrating from managing DNS records using CNAMEs

i.e.

external-dns.alpha.kubernetes.io/target:

to a more standard setup of using an A record to point at a LoadBalancer.

When I removed the target annotation, I got this error:

time="2018-01-04T16:28:41Z" level=info msg="Del records: test-consul.dev.foo.io. CNAME [helm.dev.foo.io.]"
time="2018-01-04T16:28:41Z" level=info msg="Add records: test-consul.dev.foo.io. CNAME [10.1.1.8]"
time="2018-01-04T16:28:42Z" level=error msg="googleapi: Error 400: Invalid value for 'entity.change.additions[0].rrdata[0]': '10.1.1.8', invalid"

It looks like it's trying to replace the CNAME value with an IP, rather than delete the CNAME record and create a new A record.
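The behaviour expected here can be sketched as follows. This is an illustrative Python model, not ExternalDNS code: `record_type_for` and `plan_type_change` are hypothetical helpers, and the rule "IPv4 address means A, anything else means CNAME" is a simplifying assumption that ignores AAAA and other record types.

```python
import ipaddress


def record_type_for(target: str) -> str:
    """Guess a record type from the target: IPv4 address -> A, otherwise CNAME."""
    try:
        ipaddress.IPv4Address(target)
        return "A"
    except ValueError:
        return "CNAME"


def plan_type_change(name, current_type, current_target, desired_target):
    """Return (action, name, type, target) tuples for moving a record to a new target.

    If the inferred type differs from the existing one, the old record is
    deleted and a new one is created instead of being updated in place.
    """
    desired_type = record_type_for(desired_target)
    if desired_type == current_type:
        return [("update", name, desired_type, desired_target)]
    return [
        ("delete", name, current_type, current_target),
        ("create", name, desired_type, desired_target),
    ]


changes = plan_type_change(
    "test-consul.dev.foo.io.", "CNAME", "helm.dev.foo.io.", "10.1.1.8"
)
for change in changes:
    print(change)
```

With the values from the log above, the sketch plans a delete of the CNAME followed by a create of an A record, rather than an add of a CNAME whose value is an IP address.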

Manually removing the old CNAME DNS record allows external-dns to do the right thing.

time="2018-01-04T16:31:44Z" level=info msg="Add records: test-consul.dev.foo.io. A [10.1.1.8]"
time="2018-01-04T16:31:44Z" level=info msg="Add records: helm-dev-test-consul.dev.foo.io. TXT ["heritage=external-dns,external-dns/owner=helm-dev"]"

Any hints on how to avoid a manual migration here?

closing-soon-if-no-response help wanted lifecycle/frozen

james-masson

All 15 comments

Up to and including the current release, we deliberately keep the record type unchanged once a record has been created, to avoid flapping behaviour: https://github.com/kubernetes-incubator/external-dns/blob/v0.4.8/plan/plan.go#L79-L81

However, it looks like this functionality has been removed during our latest (non-released) Plan refactorings: https://github.com/kubernetes-incubator/external-dns/blob/ec07f45c8e8d54bd6b6ed982fd4ce58acfe2f556/plan/plan.go#L99-L103

@ideahitme can you confirm this?

@james-masson therefore, you could try a version from master if you're brave: docker pull registry.opensource.zalan.do/teapot/external-dns:v0.4.8-13-g1ed025a

linki on 5 Jan 2018

@james-masson In the currently released version you need to force ExternalDNS to recreate the record instead of updating it. The Del+Add dance is just an implementation detail on Google CloudDNS but for ExternalDNS it's still just an update.

Since you're modifying your Services anyway by removing the external-dns.alpha.kubernetes.io/target annotation, you might also be able to set the external-dns.alpha.kubernetes.io/hostname annotation to another value so it becomes a different desired hostname. Alternatively, if you're using the --fqdn-template feature, you can just change the template.

Then let ExternalDNS do one more sync which will drop your existing records and create some bogus ones. Then change your Services/--fqdn-template back to the original values and you'll get back your desired records but with the correct A type set. This will only work if you're using the sync policy (default).
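The two-step workaround above can be illustrated with a small simulation. It is a sketch under stated assumptions, not the actual planner: `sync` and `inferred_type` are hypothetical, the model assumes that an existing hostname is updated with its record type preserved while unknown hostnames are created fresh, and `tmp-test-consul.dev.foo.io.` is a made-up temporary hostname.

```python
import ipaddress


def inferred_type(target):
    """IPv4 address -> A, anything else -> CNAME (simplifying assumption)."""
    try:
        ipaddress.IPv4Address(target)
        return "A"
    except ValueError:
        return "CNAME"


def sync(current, desired):
    """One reconciliation pass.

    current: {hostname: (record_type, target)}, desired: {hostname: target}.
    Assumed model of the released behaviour: an existing hostname is updated
    and keeps its record type; new hostnames are created with a type inferred
    from the target; hostnames that are no longer desired are deleted.
    """
    result = {}
    for host, target in desired.items():
        if host in current:
            kept_type, _ = current[host]
            result[host] = (kept_type, target)  # update: type is preserved
        else:
            result[host] = (inferred_type(target), target)  # create
    return result  # hosts missing from `desired` are dropped (sync policy)


zone = {"test-consul.dev.foo.io.": ("CNAME", "helm.dev.foo.io.")}

# Removing only the target annotation: same hostname, so the CNAME type sticks.
broken = sync(zone, {"test-consul.dev.foo.io.": "10.1.1.8"})

# Workaround step 1: point the Service at a temporary hostname.
step1 = sync(zone, {"tmp-test-consul.dev.foo.io.": "10.1.1.8"})
# Workaround step 2: switch back to the original hostname.
step2 = sync(step1, {"test-consul.dev.foo.io.": "10.1.1.8"})

print(broken)
print(step1)
print(step2)
```

In this model the direct change leaves a CNAME pointing at an IP address, which is the combination the provider rejects, while the detour through a temporary hostname ends with an A record under the original name.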

linki on 5 Jan 2018

@ideahitme can you confirm this?

yes, that's right

However, it looks like this functionality has been removed during our latest (non-released) Plan refactorings:

This seemed like a bug to me rather than a feature: we "hard" forced preserving the record type without checking the actual target (whether it is a hostname or an IP), which leads to the bug the OP described. This is already fixed in the current master.

ideahitme on 9 Jan 2018

Our team just ran into this with the Google Provider.
The behavior is definitely buggy.

The use case was changing from Google Internal LoadBalancers to a single ILB fronting a shared Ingress Controller.

This meant Services previously having annotations for A records got removed and changed to Ingress Objects with annotations for CNAME records of the same hostname.

What we found is that when you have an A record in CloudDNS that collides with a requested CNAME target on the Ingress object, external-dns sometimes does not do the Delete and Re-add of the record. (Sometimes it works with no human intervention.)

What's concerning is that when it does not re-add, external-dns is unable to create new CNAME records. The controller seems to stop doing work, but the loop is still running and outputs logs like so:

time="2018-04-10T18:28:34Z" level=error msg="googleapi: Error 400: Invalid value for 'entity.change.additions[14].rrdata[0]': 'internal-ingress.companyci.com'
More details:
Reason: invalid, Message: Invalid value for 'entity.change.additions[14].rrdata[0]': 'internal-ingress.companyci.com'
Reason: invalid, Message: Invalid value for 'entity.change.additions[15].rrdata[0]': 'internal-ingress.companyci.com'

Manually deleting the colliding A records from CloudDNS seems to resolve the issue, and the controller picks up the necessary work and makes the new CNAME records.
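To see which additions are being rejected, the error lines above can be parsed with a short script. This is only a sketch: `rejected_additions` is a hypothetical helper, and the regular expression is written against the message format seen in this log, which may differ in other versions. The indices appear to refer to positions in the submitted change's additions list, so mapping them back to hostnames would need the change set itself, which the log does not show.

```python
import re

# Matches the per-record messages Cloud DNS returned in the log above.
PATTERN = re.compile(
    r"Invalid value for 'entity\.change\.additions\[(\d+)\]\.rrdata\[(\d+)\]': '([^']*)'"
)


def rejected_additions(log_text):
    """Return sorted, de-duplicated (addition_index, rrdata_index, value) tuples."""
    found = {(int(a), int(r), value) for a, r, value in PATTERN.findall(log_text)}
    return sorted(found)


log = """time="2018-04-10T18:28:34Z" level=error msg="googleapi: Error 400: Invalid value for 'entity.change.additions[14].rrdata[0]': 'internal-ingress.companyci.com'
More details:
Reason: invalid, Message: Invalid value for 'entity.change.additions[14].rrdata[0]': 'internal-ingress.companyci.com'
Reason: invalid, Message: Invalid value for 'entity.change.additions[15].rrdata[0]': 'internal-ingress.companyci.com'
"""

for addition, rrdata, value in rejected_additions(log):
    print(f"additions[{addition}].rrdata[{rrdata}] rejected: {value}")
```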

This might be a race condition in the controller code or a bug in the provider specific implementation.
I haven't dug into it.

App Version: 0.4.8
Chart Version: 0.5.1

stealthybox on 10 Apr 2018

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 23 Apr 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 23 May 2019

/remove-lifecycle rotten
This doesn't appear to have been fixed.
( I could be wrong. )

I'd consider this a requirement for GA software.
IMHO, external-dns surpasses requirements for beta with its very wide adoption.

casual bump.

/help

stealthybox on 10 Jun 2019

@stealthybox:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot on 10 Jun 2019

@stealthybox unfortunately we have limited to no access to Google Cloud (read: that would cost us money that we don't have), which makes it really hard to reproduce and fix those issues. I'm unsure if @linki or @njuettner can get a cluster on GKE to test that.

I totally second the "help wanted" label, please feel free to jump in.

One of the goals of #540 is to get ExternalDNS to a state where we can have resources to run such tests. I'm slowly working on it, hoping to get it to a final state by the end of the month.

Raffo on 10 Jun 2019

The same issue happens in Azure DNS when the Service has an AWS ELB as its EXTERNAL-IP.
ExternalDNS sets it as a CNAME record in Azure DNS.
After I delete and re-apply the Service, the AWS ELB name changes, but ExternalDNS seems unable to update the CNAME record accordingly.
I had to manually delete the CNAME record in Azure DNS, after which ExternalDNS registered the CNAME record correctly.
This seems like a bug to me.

stuarthu on 6 Sep 2019

/lifecycle stale

fejta-bot on 5 Dec 2019

solved this problem by adding --txt-prefix

stuarthu on 5 Dec 2019

/lifecycle rotten

fejta-bot on 4 Jan 2020

/lifecycle frozen

stealthybox on 17 Jan 2020

This happens in AWS as well: https://github.com/kubernetes-sigs/external-dns/issues/1852.

From an earlier comment in here, it sounds like this was intentional behavior. But I'm not quite sure why.