External-dns: External DNS unnecessarily upserting the same record constantly in Route 53

Created on 11 Feb 2020 · 21 Comments · Source: kubernetes-sigs/external-dns

While testing external-dns, we noticed that during its check loop it runs an upsert on all hosts even if the records already exist. This is currently happening with a private zone.

kind/bug provider/aws

jmcclell

All 21 comments

I was also seeing the same thing with Cloudflare. I rolled back to image registry.opensource.zalan.do/teapot/external-dns:v0.5.16 and it stopped.

stewiezc on 11 Feb 2020

This needs a bit more clarification. Did you already turn on debug logging?
Could you add some more details, including logs?

njuettner on 12 Feb 2020

@njuettner We were only evaluating yesterday, so I'm new to the project. Is there just a --debug flag on the binary or some other method for enabling debug-level logging?

jmcclell on 12 Feb 2020

There is --log-level=debug.
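
As a minimal sketch of where that flag would go, assuming external-dns runs as a Deployment like the ones posted in this thread: the flag is appended to the container's args list. The container name here is illustrative, and the other args are placeholders copied from the thread's manifests.

```yaml
# Fragment of a Deployment's pod spec; only the last arg is new.
containers:
- name: external-dns # illustrative name, not taken from a specific manifest
  image: registry.opensource.zalan.do/teapot/external-dns:latest
  args:
  - --source=service
  - --provider=aws
  - --log-level=debug # enables the debug-level lines requested for the bug report
```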

njuettner on 12 Feb 2020

Manifests

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns-provider
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns-public
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns-public
  template:
    metadata:
      labels:
        app: external-dns-public
    spec:
      serviceAccountName: external-dns-provider
      containers:
      - name: external-dns-public
        image: registry.opensource.zalan.do/teapot/external-dns:latest
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=dev.foo.com # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
        - --provider=aws
        #- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
        - --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
        - --annotation-filter=kubernetes.io/ingress.class=external-ingress
        - --registry=txt
        - --txt-owner-id=XXXXXX
      securityContext:
        fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns-private
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: external-dns-private
  template:
    metadata:
      labels:
        app: external-dns-private
    spec:
      serviceAccountName: external-dns-provider
      containers:
      - name: external-dns-private
        image: registry.opensource.zalan.do/teapot/external-dns:latest
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=dev.foo.com # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
        - --provider=aws
        #- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
        - --aws-zone-type=private # only look at private hosted zones (valid values are public, private or no value for both)
        - --annotation-filter=kubernetes.io/ingress.class=internal-ingress
        - --registry=txt
        - --txt-owner-id=XXXXXX
      securityContext:
        fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files

Logs

  time="2020-02-12T13:57:43Z" level=debug msg="pod added"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: us-east-1.dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: mycluster.dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Endpoints generated from service: elastic-stack/efk-kibana-lb: [logs.mycluster.dev.foo.com 60 IN CNAME  internal-xxx-xxx.us-east-1.elb.amazonaws.com []]"
  time="2020-02-12T13:57:44Z" level=debug msg="Endpoints generated from service: monitoring/grafana-lb: [metrics.mycluster.dev.foo.com 60 IN CNAME  internal-xxx-xxx.us-east-1.elb.amazonaws.com []]"
  time="2020-02-12T13:57:44Z" level=debug msg="Endpoints generated from service: kubernetes-dashboard/kubernetes-dashboard: [dashboard.mycluster.dev.foo.com 60 IN CNAME  internal-xxx-xxx.us-east-1.elb.amazonaws.com []]"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: mycluster.dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: us-east-1.dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Considering zone: /hostedzone/xxx (domain: dev.foo.com.)"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding metrics.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding dashboard.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding logs.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding metrics.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding dashboard.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=debug msg="Adding logs.mycluster.dev.foo.com. to zone mycluster.dev.foo.com. [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT metrics.mycluster.dev.foo.com A [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT dashboard.mycluster.dev.foo.com A [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT logs.mycluster.dev.foo.com A [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT metrics.mycluster.dev.foo.com TXT [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT dashboard.mycluster.dev.foo.com TXT [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="Desired change: UPSERT logs.mycluster.dev.foo.com TXT [Id: /hostedzone/xxx]"
  time="2020-02-12T13:57:44Z" level=info msg="6 record(s) in zone mycluster.dev.foo.com. [Id: /hostedzone/xxx] were successfully updated"

That log just repeats over and over.

Service definition example

apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
  selector:
    app: grafana
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    external-dns.alpha.kubernetes.io/hostname: metrics.mycluster.dev.foo.com
    external-dns.alpha.kubernetes.io/ttl: "60"
    kubernetes.io/ingress.class: external-ingress
  name: grafana
  namespace: monitoring
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: grafana

jmcclell on 12 Feb 2020

Using tag: 0.6.0-debian-10-r0 I see this in the log files constantly:

time="2020-02-12T14:21:27Z" level=info msg="Changing record." action=CREATE record=test.example.com targets=1 ttl=120 type=CNAME zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:21:27Z" level=info msg="Changing record." action=CREATE record=test.example.com targets=1 ttl=1 type=TXT zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:22:27Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=120 type=CNAME zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:22:29Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=1 type=TXT zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:23:26Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=120 type=CNAME zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:23:27Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=1 type=TXT zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:24:28Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=120 type=CNAME zone=0060f5ee41789fc2dd77dacda3fa3845
time="2020-02-12T14:24:29Z" level=info msg="Changing record." action=UPDATE record=test.example.com targets=1 ttl=1 type=TXT zone=0060f5ee41789fc2dd77dacda3fa3845

Doesn't seem like it should be "Changing record." every minute if nothing's different.

keslerm on 12 Feb 2020

👍6

I guess the problem occurs when you use an ALIAS record and also set a TTL annotation: the TTL can't be applied, so external-dns tries to update the record again and again. You won't get any errors from AWS.

By default you can't specify a TTL on ALIAS records that point to an ALB/ELB.

njuettner on 13 Feb 2020

👍4 ❤1

Ah, that makes perfect sense. Thanks for looking at it.

jmcclell on 13 Feb 2020

(I'm going to leave this open in case you want to address it by adding some logic to your check loop to ignore this case, but if you want to close it as a "won't fix" then I won't complain)

jmcclell on 13 Feb 2020

@jmcclell I'll create a PR to circumvent this problem and will link it to this issue. Thank you for sharing your information and raising this problem 👍.

njuettner on 14 Feb 2020

I guess this is a duplicate of https://github.com/kubernetes-sigs/external-dns/issues/992#issuecomment-585579081

davidkarlsen on 15 Feb 2020

I'm seeing the same error with Cloudflare. We are not using "ALIAS" records and there are no ALB/ELBs involved; we are using Cloudflare with Google Cloud Platform. When using 0.5.18 or higher, the external-dns controller deletes and creates DNS records every minute, continuously.

@njuettner does your fix also resolve that problem?

AaronFriel on 25 Feb 2020

👍2

Hi,

We have this on AWS with 0.5.14, 0.5.15, 0.5.16, 0.5.17, 0.5.18 and 0.6.0, except that it is a Create rather than an upsert. On 0.5.9 this does not happen, but there 2 calls to listHostedZones occur every second, which is going to flood our API call quota.

awouda on 25 Feb 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot on 25 May 2020

/remove-lifecycle stale

Bluesboy on 25 May 2020

This should be fixed at least for cloudflare as of 0.7.2, could you check?

sheerun on 4 Jun 2020

It seems to work well with 0.7.2 - thanks 🎉 I tested using the cloudflare provider.

davidkarlsen on 6 Jun 2020

OK, I'll consider this fixed. If not, please create another issue with steps to reproduce, or ideally a test in cloudflare_test.go or other affected providers. Tests for Cloudflare provider are really easy to write :)

/close

sheerun on 27 Jun 2020

@sheerun: Closing this issue.

k8s-ci-robot on 27 Jun 2020

Just for the record, removing external-dns.alpha.kubernetes.io/ttl from my manifest worked. It seems that if you don't try to force the TTL, it doesn't try to change the record again and again.
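
A sketch of that workaround, assuming the LoadBalancer Service posted earlier in this thread: the manifest is unchanged except that the external-dns.alpha.kubernetes.io/ttl annotation is dropped. This is an illustration of the reported workaround, not a confirmed fix for every setup.

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
    external-dns.alpha.kubernetes.io/hostname: metrics.mycluster.dev.foo.com
    # external-dns.alpha.kubernetes.io/ttl removed: per the comments above, a TTL
    # cannot be set on an ALIAS record that points to an ALB/ELB.
    kubernetes.io/ingress.class: external-ingress
  name: grafana
  namespace: monitoring
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: http
  selector:
    app: grafana
```

After applying a change like this, the "Desired change: UPSERT" lines should stop repeating on every loop if the TTL annotation was the cause.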

julian3xl on 28 Jul 2020

@sheerun I'm seeing this issue on AWS. Removing external-dns.alpha.kubernetes.io/ttl from the LoadBalancer Service fixed it for that service, but it remains for the Istio Gateway, which, AFAIK, has no TTL setting.