external-dns 🚀 - ExternalDNS deleting and then creating records. Constantly. Azure.

We're seeing the same behaviour on GKE (Google).

eyvind on 1 Feb 2019

What version of ExternalDNS are you currently running?

njuettner on 1 Feb 2019

Seems related to #879

jhohertz on 1 Feb 2019

v0.5.10 has the problem, we have reverted to v0.5.9 which does not.

eyvind on 1 Feb 2019

Exactly the same here. v0.5.9 works fine, v0.5.10 does this constantly.

gurumaia on 1 Feb 2019

We are having the same issue, I posted an example in #543
We will try to revert to v0.5.9 for now.

lucaghersi on 5 Feb 2019

I've had the same issue this morning. Thankfully you guys already reported this as I was aware of the loop but did not know the cause... I've also reverted to v0.5.9 (running AKS 1.11.3 in Azure by the way)

leonvandebroek on 6 Feb 2019

yep same issue here, we were saved by keeping a lock on our resource groups in azure for delete :)

toutougabi on 6 Feb 2019

We are facing the same issue starting with 0.5.10. 0.5.9 works fine.

jonesbusy on 7 Feb 2019

Same issue, but only on 0.5.10, reverting to 0.5.9 works perfectly fine:

The following loop happens every minute.
Logs from external-dns (debug level):

level=debug msg="Retrieving Azure DNS zones."
level=debug msg="Found 1 Azure DNS zone(s)."
level=debug msg="Retrieving Azure DNS records for zone 'fulldomain.com'."
level=debug msg="Found A record for 'test-app.fulldomain.com' with target 'XX.XX.XX.XX'."
level=debug msg="Found TXT record for 'test-app.fulldomain.com' with target '\"heritage=external-dns,external-dns/owner=prod,external-dns/resource=ingress/test-app/test-app\"'."
level=debug msg="Endpoints generated from ingress: test-app/test-app: [test-app.fulldomain.com 300 IN A XX.XX.XX.XX [] test-app.fulldomain.com 300 IN A XX.XX.XX.XX []]"
level=debug msg="Removing duplicate endpoint test-app.fulldomain.com 300 IN A XX.XX.XX.XX []"
level=debug msg="Retrieving Azure DNS zones."
level=debug msg="Found 1 Azure DNS zone(s)."
level=info msg="Deleting A record named 'test-app' for Azure DNS zone 'fulldomain.com'."
level=info msg="Deleting TXT record named 'test-app' for Azure DNS zone 'fulldomain.com'."
level=info msg="Updating A record named 'test-app' to 'XX.XX.XX.XX' for Azure DNS zone 'fulldomain.com'."
level=info msg="Updating TXT record named 'test-app' to '\"heritage=external-dns,external-dns/owner=prod,external-dns/resource=ingress/test-app/test-app\"' for Azure DNS zone 'fulldomain.com'."

uritau on 7 Feb 2019

Thanks for all the other reports. I tried to downgrade to 0.5.9 and in Azure I'm now getting an API version error.

I then tried 0.5.8, same problem. Went back to 0.5.10, same problem.

I'm really confused now because up until 10 minutes ago, my External DNS was running the :latest tag and was constantly recycling DNS records.

I deleted that deployment (kubectl delete -f external-dns-manifest.yaml), and then created it. And now for some reason I'm getting API errors.

Wondering if somehow Azure is rate limiting these requests which just coincided with me trying to downgrade?

level=error msg="dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"InvalidApiVersionParameter\" Message=\"The api-version '2016-04-01' is invalid. The supported versions are '2018-11-01,2018-09-01,2018-08-01,2018-07-01,2018-06-01,2018-05-01,2018-02-01,2018-01-01,2017-12-01,2017-08-01,2017-06-01,2017-05-10,2017-05-01,2017-03-01,2016-09-01,2016-07-01,2016-06-01,2016-02-01,2015-11-01,2015-01-01,2014-04-01-preview,2014-04-01,2014-01-01,2013-03-01,2014-02-26,2014-04'.\""

PirateBread on 7 Feb 2019

@PirateBread

Could you try this build for Azure to see if it addresses your issue?

registry.opensource.zalan.do/teapot/external-dns:v0.5.10-16-gfe39b46

jhohertz on 8 Feb 2019

@jhohertz

Just deployed v0.5.10-16-gfe39b46 and I'm still seeing the following:

time="2019-02-08T16:05:52Z" level=info msg="Created Kubernetes client https://xxxxxxx-2b0c5b7a.hcp.uksouth.azmk8s.io:443"
time="2019-02-08T16:05:52Z" level=info msg="Using client_id+client_secret to retrieve access token for Azure API."
time="2019-02-08T16:05:52Z" level=error msg="dns.ZonesClient#ListByResourceGroup: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code=\"InvalidApiVersionParameter\" Message=\"The api-version '2016-04-01' is invalid. The supported versions are '2018-11-01,2018-09-01,2018-08-01,2018-07-01,2018-06-01,2018-05-01,2018-02-01,2018-01-01,2017-12-01,2017-08-01,2017-06-01,2017-05-10,2017-05-01,2017-03-01,2016-09-01,2016-07-01,2016-06-01,2016-02-01,2015-11-01,2015-01-01,2014-04-01-preview,2014-04-01,2014-01-01,2013-03-01,2014-02-26,2014-04'.\""

If I get a chance this weekend I'm going to try and reproduce this in a completely fresh environment in my own subscription to rule out some kind of configuration issue but at this point I can't see what would be wrong?

PirateBread on 8 Feb 2019

I can confirm that v0.5.10-16-gfe39b46 solves the eternal delete/update loop of doom on GKE.

eyvind on 11 Feb 2019

Thanks for the feedback, we will work on an official release which will probably land tomorrow.

Raffo on 11 Feb 2019

I have a similar problem, but on AWS with version 0.5.11.
ExternalDNS is constantly updating the same record every two minutes (--interval=2m).

time="2019-02-19T14:21:45Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: af6f41c7-3451-11e9-bb90-1939f5de72e5"
time="2019-02-19T14:21:52Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: b3bb1bbc-3451-11e9-92a8-118f2457694e"
time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:22:10Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"
time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:24:06Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"
time="2019-02-19T14:26:25Z" level=error msg="getting records failed: Throttling: Rate exceeded\n\tstatus code: 400, request id: 5676a7c3-3452-11e9-b59c-ddd6f4af4826"
time="2019-02-19T14:26:25Z" level=info msg="Desired change: UPSERT *.mydomain.com A"
time="2019-02-19T14:26:25Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"
time="2019-02-19T14:26:25Z" level=info msg="2 record(s) in zone incapsula-qa.de. were successfully updated"

My arguments:

      --log-level=info
      --policy=upsert-only
      --provider=aws
      --registry=txt
      --interval=2m
      --source=service

omegarus on 19 Feb 2019
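
One way to quantify this kind of churn from the logs is to count how often each record is re-applied. The following is a minimal Python sketch, assuming the `Desired change: UPSERT <name> <type>` message format shown in the log lines above. The helper name `count_upserts` is hypothetical and is not part of ExternalDNS.

```python
import re
from collections import Counter

# Matches messages like: msg="Desired change: UPSERT *.mydomain.com A"
# (format assumed from the AWS log lines quoted in this thread).
UPSERT_RE = re.compile(r'msg="Desired change: UPSERT (\S+) (\w+)"')


def count_upserts(log_lines):
    """Count UPSERT changes per (record name, record type)."""
    counts = Counter()
    for line in log_lines:
        match = UPSERT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    'time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com A"',
    'time="2019-02-19T14:22:10Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"',
    'time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com A"',
    'time="2019-02-19T14:24:06Z" level=info msg="Desired change: UPSERT *.mydomain.com TXT"',
]

# Print one line per record: name, type, number of UPSERTs seen.
for (name, rtype), n in sorted(count_upserts(sample).items()):
    print(name, rtype, n)
```

A record whose count grows by one on every `--interval` while its source is unchanged is probably being rewritten needlessly.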

Also same behavior on 0.5.9.

omegarus on 19 Feb 2019

I have the same issue as @omegarus.

FridaGo on 19 Feb 2019

I'm not seeing the needless updates on AWS as others are experiencing, but one difference may be that I don't have any cases of trying to publish wildcard DNS records, so I am wondering if the issue is somewhat specific to the wildcard?

jhohertz on 19 Feb 2019

@jhohertz The DNS records I'm trying to publish don't contain wildcards, they are configured for different ingresses that contain different service host names (for ex. service.internal.domain, app.internal.domain), and I'm still experiencing this issue (I've tried to downgrade as far as v0.5.7 and it still happens).

FridaGo on 19 Feb 2019

I'm sorry @FridaGo I'm not sure what you're experiencing. This issue and the ones I have recently posted about are all relating to a problem that was introduced in v0.5.10.

All I can suggest is try watching the status field of the services you are attaching the DNS records to, to see if something is causing updates you aren't expecting to that status, which external-dns might be picking up on. I've seen some ingress configurations cause things like that to occur.

jhohertz on 19 Feb 2019

Can we close this issue as v0.5.11 was released?

hjacobs on 19 Feb 2019

@jhohertz Status field is constant and not changing.

status:
  loadBalancer:
    ingress:
    - hostname: x8076o593986511e9b2dc86r8d247u18-9901230772.us-west-1.elb.amazonaws.com

omegarus on 19 Feb 2019

dnslog.txt
I'm seeing this same behavior with Infoblox after upgrading from 0.5.9 to 0.5.11. I'm going to try downgrading to 0.5.9 to see if that resolves it. There was so much churn in the recycling bin that it blew up the Infoblox DB.
Sample logs attached.

aminGwork on 27 Feb 2019

Have the same issue on v0.5.11 on GKE

aslimacc on 2 Apr 2019

For me, on AWS, running both v0.5.9 and v0.5.11, I haven't seen such a problem. Maybe it has something to do with what @jhohertz mentioned?

Raffo on 5 Apr 2019

Found a solution to the problem.
If you have another ExternalDNS instance that uses the same TXT record owner value, the first instance will delete the records of the second and vice versa.
You should set a different "txtOwnerId" value for each ExternalDNS deployment.

medanasslim on 8 Apr 2019
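
To illustrate the ownership conflict described above, here is a minimal Python sketch. It is not ExternalDNS's actual code. It assumes the TXT registry value has the `heritage=external-dns,external-dns/owner=...` form seen in the logs earlier in this thread, and that an instance only manages records whose owner label equals its own `--txt-owner-id`. The function names `parse_owner` and `records_to_delete`, the owner IDs, and the `example.com` names are hypothetical.

```python
def parse_owner(txt_value):
    """Return the owner label from a TXT registry value, or None if absent."""
    labels = {}
    for part in txt_value.strip('"').split(","):
        key, sep, value = part.partition("=")
        if sep:
            labels[key] = value
    # Records without the external-dns heritage marker are not managed at all.
    if labels.get("heritage") != "external-dns":
        return None
    return labels.get("external-dns/owner")


def records_to_delete(zone_txt, desired_names, owner_id):
    """Names this instance would remove: owned by owner_id but not desired by it."""
    return sorted(
        name
        for name, txt in zone_txt.items()
        if parse_owner(txt) == owner_id and name not in desired_names
    )


# Hypothetical zone where two clusters publish with the same owner ID.
shared = {
    "app-a.example.com": '"heritage=external-dns,external-dns/owner=default"',
    "app-b.example.com": '"heritage=external-dns,external-dns/owner=default"',
}
# Cluster A only knows app-a, so it considers app-b stale and would delete it.
print(records_to_delete(shared, {"app-a.example.com"}, "default"))

# Same zone, but each cluster uses its own owner ID.
distinct = {
    "app-a.example.com": '"heritage=external-dns,external-dns/owner=cluster-a"',
    "app-b.example.com": '"heritage=external-dns,external-dns/owner=cluster-b"',
}
# Cluster A now leaves cluster B's record alone.
print(records_to_delete(distinct, {"app-a.example.com"}, "cluster-a"))
```

With a shared owner ID each instance sees the other's records as its own stale entries, which would match the delete/recreate loop. With distinct IDs neither touches the other's records.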

@medanasslim great, thanks for posting an update.

Ping to @PirateBread and @aslimacc: do you have additional info to share, and/or are you still experiencing this issue?

Raffo on 11 Apr 2019

Works for me

aslimacc on 11 Apr 2019

Experiencing the same issue with Cloudflare and both registry.opensource.zalan.do/teapot/external-dns:v0.5.9 and registry.opensource.zalan.do/teapot/external-dns:v0.5.12.

...
    spec:
      containers:
      - args:
        - --source=ingress
        - --domain-filter=my-domain.com
        - --provider=cloudflare
        - --cloudflare-proxied
        env:
        - name: CF_API_KEY
          value: 
        - name: CF_API_EMAIL
          value: 
        image: registry.opensource.zalan.do/teapot/external-dns:v0.5.9
        imagePullPolicy: Always
...

MiniJerome on 18 Apr 2019

I am on Cloudflare and as I said above, you should add "txt-owner-id"

Example below:

args:
- --log-level=info
- --registry=txt
- --interval=1m
- --txt-owner-id=instance1

medanasslim on 19 Apr 2019

I am on Cloudflare and as I said above, you should add "txt-owner-id"

Example below:

args:
- --log-level=info
- --registry=txt
- --interval=1m
- --txt-owner-id=instance1

Thank you for the advice but this doesn't fix the issue.
This is useful if you have multiple clusters using the same DNS zone.

MiniJerome on 22 Apr 2019

Can you share your logs, please, so we can see the behavior of the app?

aslimacc on 22 Apr 2019

Can you share your logs, please, so we can see the behavior of the app?

Sure, you can see logs here https://github.com/kubernetes-incubator/external-dns/issues/992

MiniJerome on 23 Apr 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 22 Jul 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot on 21 Aug 2019

dnslog.txt
I'm seeing this same behavior with Infoblox after upgrading from 0.5.9 to 0.5.11. I'm going to try downgrading to 0.5.9 to see if that resolves it. There was so much churn in the recycling bin that it blew up the Infoblox DB.
Sample logs attached.

I'm also seeing this with the infoblox provider running v0.5.15. Removing my TTL annotations as per a previous comment resolved this issue.

anguswilliams on 16 Sep 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 16 Oct 2019

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot on 16 Oct 2019

Hi, sorry to open up this ticket again, but I've faced the same issue. Once I removed all sources other than istio-gateway, the problem ~disappeared~.

Edit: actually it didn't. I'm investigating it further.

heprotecbuthealsoattac on 16 Oct 2019

Seeing this as well with Istio gateways and TransIP provider. We do have two instances of external-DNS for the same zone but with different txt-owner-id so that shouldn't be a problem.

mlushpenko on 26 Oct 2019

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta-bot on 25 Nov 2019

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot on 25 Nov 2019

/remove-lifecycle rotten

Xnyle on 24 Feb 2020

/reopen

Xnyle on 24 Feb 2020

@Xnyle: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

k8s-ci-robot on 24 Feb 2020

txt-owner-id

works for me

valery-zhurbenko on 4 May 2020
