External-dns: GCP error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet

Created on 15 Feb 2018 · 11 Comments · Source: kubernetes-sigs/external-dns

I am seeing this in the logs:

"Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

I am trying to use a dirty zone, and it worked well in some initial tests, but eventually I started seeing this. I am going to work around the problem by clearing my zone, but it would be good to understand what this error really means and why it works sometimes and fails at other times.

kind/support provider/google

All 11 comments

I thought I should provide more background on my use-case.

I have a managed zone in GCP called training.weave.works.

I spin up a few clusters, where each cluster is called training-user-<N>, and it has one service that sets the following annotations:

external-dns.alpha.kubernetes.io/hostname: "training-user-<N>.training.weave.works"
external-dns.alpha.kubernetes.io/ttl: "5"

So for each cluster I have a DNS record that points at the service inside that cluster.

I have configured external-dns like this:

          - name: external-dns
            image: registry.opensource.zalan.do/teapot/external-dns:v0.4.8
            args:
            - --source=service
            - --source=ingress
            - --policy=upsert-only
            - --provider=google
            - --registry=txt
            - --txt-owner-id=dx-training-external-dns
            - --domain-filter=training.weave.works
            - --google-project=dx-training

I wonder whether I should try tweaking --policy, --domain-filter or --txt-owner-id to assign each controller more specifically to its own subset of records?
E.g., I suppose I could use --domain-filter="training-user-0.training.weave.works" and set the policy to delete anything under that. Should I then add a subdomain (like app.training-user-0.training.weave.works), or is that not essential and it's okay to make the filter this narrow?

Besides, it'd be good to understand why that error happens in the first place, because it didn't occur in my earlier tests.

Currently a single ExternalDNS instance is designed to manage a single Kubernetes cluster, similar to e.g. an autoscaler, ingress-controller etc.

Therefore, for each of your training clusters you'll want to deploy a dedicated ExternalDNS instance. In a simple world each cluster would have its own dedicated subdomain and you'd use --domain-filter so that every attempt to declare a DNS name outside of this domain is ignored. The whole subdomain would be managed by ExternalDNS, hence there'd be no conflicts.

If multiple clusters share the same DNS namespace the different ExternalDNS instances need to coordinate themselves a bit. This is achieved in two ways:

  • --domain-filter, which instructs ExternalDNS to ignore desired DNS names that do not end in a particular suffix
  • --txt-owner-id, which is a view on a DNS domain that hides any existing DNS records that don't belong to this particular instance of ExternalDNS. The goal is that multiple ExternalDNS instances can happily sync their records in the very same DNS domain without removing each other's records. (A multi-tenant DNS zone where ExternalDNS is the tenant, if you will)

What I would suggest:

  • For each cluster deploy a dedicated ExternalDNS instance in that cluster
  • For each instance use a different value for --txt-owner-id, such as training-user-<N>
  • For each instance use the --domain-filter=training.weave.works (like you did) so that ExternalDNS ignores any annotations stating something else, e.g. bad.prod.weave.works.

With that setup users of cluster training-user-<N> could still create services with annotations that instruct its ExternalDNS instance to create, e.g. training-user-<N+1>.training.weave.works. However, the --txt-owner-id at least ensures that either cluster <N> or <N+1> would manage that record but never both.

If you want to ensure that even those cases are not possible, you could use a different --domain-filter for each ExternalDNS instance. The domain filter is a simple suffix match, so you could use --domain-filter=-<N>.training.weave.works for cluster <N> and so on. Since this looks a little odd, you may also consider giving each cluster a full subdomain so your domain filter looks more like --domain-filter=".cluster-<N>.training.weave.works".
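The suffix-match behaviour described above can be sketched roughly as follows. This is a simplified model based only on the "simple suffix match" description in this comment, not the actual ExternalDNS implementation; the function name `matches_domain_filter` is hypothetical and the cluster numbers are placeholders.

```python
def matches_domain_filter(dns_name: str, domain_filter: str) -> bool:
    """Toy model: a name is accepted if it ends with the filter suffix."""
    # Ignore trailing dots and case so "Foo.example.org." and "foo.example.org" compare equal.
    name = dns_name.rstrip(".").lower()
    suffix = domain_filter.rstrip(".").lower()
    return name.endswith(suffix)


# Shared filter: every training cluster name is accepted, other zones are ignored.
shared = "training.weave.works"
accepted_shared = matches_domain_filter("training-user-1.training.weave.works", shared)
rejected_prod = matches_domain_filter("bad.prod.weave.works", shared)

# Per-cluster filter: cluster 1 can no longer claim the name of cluster 2.
per_cluster = "-1.training.weave.works"
accepted_own = matches_domain_filter("training-user-1.training.weave.works", per_cluster)
rejected_other = matches_domain_filter("training-user-2.training.weave.works", per_cluster)
```

With the shared filter, only the owner ID keeps clusters apart; with the per-cluster suffix, a name belonging to another cluster is ignored before ownership is even considered.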

Finally, it looks to me like your clusters are short-lived, which raises the question of cleanup. If you just terminate your cluster, your DNS records will survive and will still be owned by that cluster's ExternalDNS instance, so you will never be able to reuse them in another cluster (they are claimed, and you just terminated the only instance that can unclaim them, besides your manual hands of course).

Either delete all Services and Ingresses from your cluster and wait for ExternalDNS to do another synchronization before you terminate it, or, after you have terminated the cluster, manually delete all records that belong to this particular --txt-owner-id to unclaim them.
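For the manual cleanup route, one way to find the records claimed by a given owner is to parse the ownership TXT values. The sketch below assumes the TXT format that appears in the logs later in this thread (`heritage=external-dns,external-dns/owner=...,external-dns/resource=...`); the function names and the sample record names are hypothetical, and the actual deletion in Cloud DNS is deliberately left out.

```python
from typing import Dict, List


def parse_txt_labels(txt_value: str) -> Dict[str, str]:
    """Split an ownership TXT value into its key=value labels."""
    labels = {}
    for part in txt_value.strip('"').split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that are not key=value pairs
            labels[key] = value
    return labels


def names_owned_by(txt_records: Dict[str, str], owner_id: str) -> List[str]:
    """Return the TXT record names whose ownership labels name owner_id."""
    owned = []
    for name, value in txt_records.items():
        labels = parse_txt_labels(value)
        if labels.get("heritage") == "external-dns" and labels.get("external-dns/owner") == owner_id:
            owned.append(name)
    return sorted(owned)


# Hypothetical zone contents: TXT record name -> TXT value.
zone_txt = {
    "training-user-1.training.weave.works.": '"heritage=external-dns,external-dns/owner=training-user-1,external-dns/resource=service/default/app"',
    "training-user-2.training.weave.works.": '"heritage=external-dns,external-dns/owner=training-user-2,external-dns/resource=service/default/app"',
    "unrelated.training.weave.works.": '"v=spf1 -all"',
}
stale = names_owned_by(zone_txt, "training-user-1")
```

The names in `stale` are the candidates to delete (together with the matching A records) once the cluster that owned them is gone.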

Regarding the error: afaik, this precondition error occurs when you try to delete a DNS record that doesn't exist. I believe that multiple concurrent ExternalDNS instances make conflicting changes because they share the same --txt-owner-id but, since they manage different clusters, see different Services.

  • ExternalDNS instance <N> constantly creates training-user-<N> and drops training-user-<N+1>
  • ExternalDNS instance <N+1> constantly creates training-user-<N+1> and drops training-user-<N>

Using different values for --txt-owner-id solves that issue.
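The tug-of-war described above can be reproduced with a toy model. This is only a sketch under stated assumptions: the zone is a plain dictionary mapping record name to owner ID, `sync` is a hypothetical stand-in for one reconciliation pass, and it assumes a policy that deletes owned records which are no longer desired. It is not how ExternalDNS is implemented.

```python
from typing import Dict, List, Set, Tuple


def sync(zone: Dict[str, str], owner_id: str, desired: Set[str]) -> Tuple[List[str], List[str]]:
    """One reconciliation pass; returns (created, deleted) record names."""
    # The owner ID acts as a view: only records with our ID are visible as "ours".
    owned = {name for name, owner in zone.items() if owner == owner_id}
    created = sorted(name for name in desired if name not in zone)
    deleted = sorted(owned - desired)
    for name in deleted:
        del zone[name]
    for name in created:
        zone[name] = owner_id
    return created, deleted


# Two clusters sharing one owner ID: each pass removes the other cluster's record.
shared_zone: Dict[str, str] = {}
sync(shared_zone, "dx-training-external-dns", {"training-user-1"})
shared_pass = sync(shared_zone, "dx-training-external-dns", {"training-user-2"})

# Two clusters with distinct owner IDs: neither can see, or delete, the other's record.
split_zone: Dict[str, str] = {}
sync(split_zone, "training-user-1", {"training-user-1"})
split_pass = sync(split_zone, "training-user-2", {"training-user-2"})
```

In the shared case the second pass deletes `training-user-1`; in the split case both records stay in the zone.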

On a side note, you can also have DNS names generated automatically, without having to add annotations, by using the --fqdn-template feature.

@errordeveloper What you are trying looks interesting. Please let us know about your progress. 😃

Martin,

Thanks for your input! I already have a working setup. I will try adding a
unique TXT owner ID; I didn't do that for some reason. If you are interested in
this training setup of ours, I have most of the bits under
https://github.com/errordeveloper/k9c/blob/master/README.md, but there is
also an internal repo that has more glue scripts that I am not ready to
share publicly yet (however, I'm happy to walk through it in private).

Unfortunately I am currently running into the same issue. I am using the current master as of today. external-dns had already created some A records from ingress resources. After that I added an annotation specifying the TTL of that resource, but external-dns is not able to update the records in Google Cloud DNS. I get the following log messages:

{"level":"info","msg":"Change zone: my-zone","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: api.my.zone. A [37.137.52.2 35.90.152.2] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Del records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: api.my.zone. A [37.137.52.2 35.90.152.2] 60","time":"2018-03-06T14:33:25Z"}
{"level":"info","msg":"Add records: external-dnsapi.my.zone. TXT [\"heritage=external-dns,external-dns/owner=external-dns,external-dns/resource=ingress/default/api-ingress\"] 300","time":"2018-03-06T14:33:25Z"}
{"level":"error","msg":"googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet","time":"2018-03-06T14:33:26Z"}

My first thought was that it specified the wrong (new) TTL when trying to delete the records, but as the info log shows, the TTL is correctly the old one.
Is this a known problem? If not, any tips on how I can see the raw requests against the Cloud DNS API? A quick look didn't reveal an easy way to print the requests to the log.

I just realised that for me this error didn't appear until I tweaked the TTL to the lowest possible value (5s, IIRC). Perhaps this is a more general issue to do with low TTLs? I also noticed that the TTL doesn't apply to TXT records, which could be related, but I don't know.

I think the change in TTL is the problem, not the length of the TTL. The records were probably first created without the TTL annotation, and the annotation was added later to modify the TTL. At least that is what I was doing.
After that, updates to the records are no longer possible. In fact all updates fail, because they are batched together in one DNS change request: since the delete portion fails, the creates are never executed either (which is probably very sane behavior from the Cloud DNS backend).
But right now I don't really have an explanation for why this is happening. Records you want to delete need to match the existing records exactly, and looking at the implementation this should be the case.
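The all-or-nothing behaviour described above can be illustrated with a toy in-memory model. This is only a sketch, not the Cloud DNS API: `RecordSet`, `PreconditionNotMet` and `apply_change` are hypothetical names, and the exact-match rule is taken from the description in this thread.

```python
from typing import Dict, List, NamedTuple, Tuple


class RecordSet(NamedTuple):
    name: str
    rtype: str
    ttl: int
    rrdatas: Tuple[str, ...]


class PreconditionNotMet(Exception):
    """Raised when a deletion does not match the stored record exactly."""


Zone = Dict[Tuple[str, str], RecordSet]


def apply_change(zone: Zone, deletions: List[RecordSet], additions: List[RecordSet]) -> None:
    """Apply a batched change atomically: validate every deletion before touching the zone."""
    for index, record in enumerate(deletions):
        # Name, type, TTL and data must all match; a differing TTL is enough to fail.
        if zone.get((record.name, record.rtype)) != record:
            raise PreconditionNotMet(f"deletions[{index}]")
    for record in deletions:
        del zone[(record.name, record.rtype)]
    for record in additions:
        zone[(record.name, record.rtype)] = record


# The zone already holds the record with the annotated TTL of 60 ...
current = RecordSet("api.my.zone.", "A", 60, ("37.137.52.2", "35.90.152.2"))
zone: Zone = {(current.name, current.rtype): current}

# ... but the change tries to delete a copy with the default TTL of 300.
stale_copy = current._replace(ttl=300)
try:
    apply_change(zone, deletions=[stale_copy], additions=[current])
    failed = False
except PreconditionNotMet:
    failed = True
```

Because validation happens before any mutation, the failed deletion also prevents the additions in the same batch from being applied, which matches the "creates also are never executed" observation.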

Just encountered this: the issue is that when deleting a record after a change of TTL via annotations, external-dns tries to delete a record with the newly specified TTL, so GCP can't find it and throws an error (since the existing record has the previous TTL).

I'm seeing this issue even after completely cleaning out all A and TXT records and having external-dns recreate them. As soon as it's finished creating the new ones with my annotated 60s ttl, it fails again with the "Precondition not met" error and refuses to do anything more. I've had to remove the ttl annotations to move forward.

Just to confirm the above, I've experienced a similar issue with records not being cleaned up when the TTL has been specified via an annotation:

external-dns.alpha.kubernetes.io/ttl: "30"

After removing the service, the record it then tries to delete has a TTL of 300:

time="2018-05-30T21:14:25Z" level=info msg="Del records: record.mydomain.tld. A [10.193.96.17] 300"

Version: 0.5.1

Args:

        - --source=ingress
        - --source=service
        - --domain-filter=mydomain.tld
        - --provider=google
        - --policy=sync
        - --google-project=my-project
        - --registry=txt
        - --txt-owner-id=kubernetes
        - --log-level=debug

I can confirm that when I remove external-dns.alpha.kubernetes.io/ttl I'm not seeing this issue on external-dns-0.6.0 and external-dns-0.6.1.

When I change the TTL manually, this does not cause an issue until an update needs to happen.
When I set external-dns.alpha.kubernetes.io/ttl, the record is unable to be updated.
When I leave everything default, updates work correctly.

Same issue here. From what I can tell, even though the record is updated correctly in the first place, external-dns then tries to delete a record with the default TTL (300), which doesn't exist:

time="2018-07-31T12:10:18Z" level=info msg="Change zone: my-dns-zone"
time="2018-07-31T12:10:18Z" level=info msg="Del records: record.mydomain.tld. A [10.22.22.7] 300"
time="2018-07-31T12:10:18Z" level=info msg="Add records: record.mydomain.tld. A [10.22.22.7] 30"
time="2018-07-31T12:10:19Z" level=error msg="googleapi: Error 412: Precondition not met for 'entity.change.deletions[0]', conditionNotMet"

That shouldn't happen, since a record with the correct TTL is already there.
