External-dns: Cache ListRecords result

Created on 27 Apr 2017 · 11 Comments · Source: kubernetes-sigs/external-dns

Currently, each iteration of the synchronisation loop requires external-dns to fetch the list of all records from the DNS provider. In the general case this can be avoided by caching the records in memory. We can define a cache with a lease period, which is refreshed in two scenarios:

  1. The lease period expires.
  2. Creating records fails because of a name overlap.

In the general case this should greatly reduce the API request rate, especially where the create API rarely fails (which never happens if the DNS provider is used solely by a single instance of external-dns). In a stable and moderately active cluster (with few ingresses/services being created or modified), external-dns will be able to reduce its interaction with the DNS provider to the bare minimum (on average, one request per lease period).
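A rough sketch of what such a cache could look like, in Go since that is what external-dns is written in. All names here (`Record`, `Provider`, `ErrNameOverlap`, `CachedProvider`, `fakeProvider`) are hypothetical and made up for illustration; they are not the actual external-dns types, and a real provider would report a name overlap in its own way.

```go
package main

import (
	"errors"
	"time"
)

// Record is a simplified, hypothetical stand-in for a DNS record.
type Record struct {
	Name   string
	Target string
}

// Provider is a hypothetical minimal provider interface, not the real external-dns one.
type Provider interface {
	ListRecords() ([]Record, error)
	CreateRecords(records []Record) error
}

// ErrNameOverlap is a hypothetical error for a create that collides with an existing name.
var ErrNameOverlap = errors.New("record name already exists")

// CachedProvider serves ListRecords from memory while the lease is valid.
// It is not safe for concurrent use.
type CachedProvider struct {
	upstream  Provider
	lease     time.Duration
	records   []Record
	fetchedAt time.Time
	valid     bool
}

func NewCachedProvider(upstream Provider, lease time.Duration) *CachedProvider {
	return &CachedProvider{upstream: upstream, lease: lease}
}

// ListRecords calls the upstream provider only if nothing is cached, the cache
// was invalidated, or the lease period has expired (scenario 1).
func (c *CachedProvider) ListRecords() ([]Record, error) {
	if c.valid && time.Since(c.fetchedAt) < c.lease {
		return c.records, nil
	}
	records, err := c.upstream.ListRecords()
	if err != nil {
		return nil, err
	}
	c.records, c.fetchedAt, c.valid = records, time.Now(), true
	return c.records, nil
}

// CreateRecords creates without listing first. On a name overlap (scenario 2)
// the cache is dropped, so the next ListRecords refetches from upstream.
func (c *CachedProvider) CreateRecords(records []Record) error {
	if err := c.upstream.CreateRecords(records); err != nil {
		if errors.Is(err, ErrNameOverlap) {
			c.valid = false
		}
		return err
	}
	if c.valid {
		c.records = append(c.records, records...)
	}
	return nil
}

// fakeProvider is an in-memory provider for trying the cache out; it counts list calls.
type fakeProvider struct {
	records   []Record
	listCalls int
}

func (f *fakeProvider) ListRecords() ([]Record, error) {
	f.listCalls++
	return append([]Record(nil), f.records...), nil
}

func (f *fakeProvider) CreateRecords(records []Record) error {
	for _, r := range records {
		for _, existing := range f.records {
			if existing.Name == r.Name {
				return ErrNameOverlap
			}
		}
	}
	f.records = append(f.records, records...)
	return nil
}
```

Wrapping a `fakeProvider` in `NewCachedProvider(p, time.Hour)` and calling `ListRecords` repeatedly should leave `p.listCalls` at 1 until a create collides with an existing name. A lease of 0 disables caching in this sketch, which matches the current behaviour.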

This behaviour can be made pluggable via a command-line flag, e.g. --enable-cache.

kind/feature lifecycle/rotten

All 11 comments

/cc @linki @justinsb @iterion @hjacobs thoughts? I think this can be potentially useful in the view of AWS rate limiting issues

In general 👍 But how does this differ from setting --interval to a higher value?

Setting --interval to a higher value would mean the user has to wait longer for the record to be created. With the cache enabled we will "try our luck" by creating (not upserting) and, if that fails, update the cache. With a lease period of 1 hour and an interval of 1 min, each minute we would only need to send a ChangeRecords request to the DNS provider if something has changed according to the cache (without actually polling for records first). Compared to the current implementation, within one hour we could make as few as 1/60 of the requests.

sgtm 👍

@ideahitme what would you use as the default TTL for the cache? I think 1 hour is far too long, as it would mean that "manually" changed or deleted records would only be "restored" after one hour. IMHO the system should strive for correctness, i.e. the real DNS state should reflect the Kubernetes state. I guess something in the range of minutes is good enough, e.g. we could reduce the default interval to 30s and have a cache TTL of 300s (5 minutes):

  • the reduced default interval makes sure that any changes in Kubernetes (mostly triggered by users) are quickly synced to DNS → users get quick feedback
  • the 5 min cache makes sure that we stay within (AWS) API rate limits

@hjacobs yes, 1 hour is just an example :D But even a TTL of 5 min would be a huge win: it minimises the number of potential clashes (after manual changes) and significantly reduces the number of AWS API requests.
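To put rough numbers on the settings discussed above, here is a small Go sketch that simulates the sync loop and counts how many iterations would need a real list call for a given cache TTL. `ListCalls` is a hypothetical helper written for this comparison, not part of external-dns. It counts only list requests and assumes no create ever fails, so change requests and cache invalidations come on top.

```go
package main

import "time"

// ListCalls simulates the sync loop over a window and counts how many
// iterations would need a real list call for the given cache lease.
// A lease of 0 models the current behaviour (no cache).
func ListCalls(window, interval, lease time.Duration) int {
	if interval <= 0 {
		return 0 // guard against a loop that never advances
	}
	calls := 0
	cached := false
	var fetchedAt time.Duration // simulated time of the last real fetch
	for t := time.Duration(0); t < window; t += interval {
		if !cached || t-fetchedAt >= lease {
			calls++
			fetchedAt, cached = t, true
		}
	}
	return calls
}
```

Under these assumptions, `ListCalls(time.Hour, time.Minute, 0)` gives 60 and `ListCalls(time.Hour, time.Minute, time.Hour)` gives 1, which is the 1/60 figure mentioned earlier. With a 30s interval and a 5 min TTL, `ListCalls(time.Hour, 30*time.Second, 5*time.Minute)` gives 12 list calls per hour, against 120 for the same interval without a cache.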

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close

@fejta-bot: Closing this issue.

/reopen
