Envoy: Respecting DNS TTL for STRICT_DNS

Created on 9 May 2019 · 5 comments · Source: envoyproxy/envoy

Rather than setting the dns_refresh_rate for a cluster, it would be nice to be able to simply respect the TTL returned in the DNS response. I was wondering whether this is intentionally unsupported or just hasn't been needed yet. If the latter, would there be interest in adding this feature?

If there is nothing blocking such a change, please let me know and I can consider working on this feature.

Background: We get many reports from Istio users about the high volume of DNS requests sent by Envoy. The cause is clear: we use the default dns_refresh_rate of 5s, so every STRICT_DNS cluster re-resolves its hostname every 5 seconds.

Our proposed mitigation is to change the default to 5min to match a typical TTL. However, this isn't a perfect solution: for some services we would actually want a lower refresh rate, and we would prefer not to have to set it for each service individually.
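For scale, a 5 s refresh means 86,400 / 5 = 17,280 refreshes per day per cluster, versus 86,400 / 300 = 288 at 5 min. Respecting the TTL instead could look roughly like the minimal C sketch below. `next_refresh_ms` and its floor/ceiling parameters are hypothetical, not part of Envoy's API; the sketch assumes the resolver has already returned per-record TTLs in seconds, and it falls back to the configured refresh rate when the answer is empty.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an Envoy API): choose the next refresh interval
 * for a STRICT_DNS cluster from the TTLs of the records just received.
 * - No records: fall back to the configured dns_refresh_rate.
 * - Otherwise: use the shortest TTL, clamped to [min_ms, max_ms] so that a
 *   0 s TTL cannot cause a query storm and a huge TTL cannot pin stale hosts. */
uint32_t next_refresh_ms(const uint32_t *ttls_s, size_t n,
                         uint32_t fallback_ms,
                         uint32_t min_ms, uint32_t max_ms) {
    if (n == 0) {
        return fallback_ms;
    }
    /* The shortest TTL bounds how long the whole host set stays valid. */
    uint32_t min_ttl_s = ttls_s[0];
    for (size_t i = 1; i < n; i++) {
        if (ttls_s[i] < min_ttl_s) {
            min_ttl_s = ttls_s[i];
        }
    }
    uint64_t ms = (uint64_t)min_ttl_s * 1000u; /* seconds -> milliseconds */
    if (ms < min_ms) ms = min_ms;
    if (ms > max_ms) ms = max_ms;
    return (uint32_t)ms;
}
```

With TTLs of 300, 60 and 120 s and a clamp of 1 s to 1 h, this returns 60,000 ms. The floor and ceiling are illustrative values, not Envoy defaults.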

More background: https://github.com/istio/istio/issues/13710

Labels: enhancement, help wanted

All 5 comments

Sounds like a very useful optional feature to me.

@jplevyak @silentdai @PiotrSikora ^

Yes, this would be great, ideally with some prefetching so that we never have to block requests waiting for the DNS response.
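One way to read the prefetch suggestion: keep serving the cached hosts and start a background re-resolve once a fixed fraction of the TTL has elapsed, so the new answer normally arrives before the old one expires. The C sketch below is a hypothetical illustration; `dns_entry`, `should_prefetch`, `on_resolved` and the 80% threshold are assumptions, not Envoy code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry for one resolved hostname. */
typedef struct {
    uint64_t fetched_at_ms;  /* when the last answer arrived */
    uint64_t ttl_ms;         /* TTL of that answer */
    bool refresh_in_flight;  /* a background query is already running */
} dns_entry;

/* Start refreshing after 4/5 (80%) of the TTL; an arbitrary illustrative choice. */
#define PREFETCH_NUM 4u
#define PREFETCH_DEN 5u

/* Returns true if the caller should start an asynchronous re-resolve now.
 * The caller keeps routing to the cached hosts either way, so no request
 * waits on DNS. */
bool should_prefetch(dns_entry *e, uint64_t now_ms) {
    if (e->refresh_in_flight) {
        return false; /* avoid issuing duplicate queries */
    }
    uint64_t prefetch_at = e->fetched_at_ms + e->ttl_ms * PREFETCH_NUM / PREFETCH_DEN;
    if (now_ms >= prefetch_at) {
        e->refresh_in_flight = true;
        return true;
    }
    return false;
}

/* Called from the resolver callback when the new answer arrives. */
void on_resolved(dns_entry *e, uint64_t now_ms, uint64_t new_ttl_ms) {
    e->fetched_at_ms = now_ms;
    e->ttl_ms = new_ttl_ms;
    e->refresh_in_flight = false;
}
```

The `refresh_in_flight` flag keeps a slow resolver from triggering a second query for the same entry while the first is still outstanding.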

Hi @mattklein123 @jplevyak @silentdai @PiotrSikora, I am implementing this feature and I have some questions.

Getting the record TTL is difficult. Envoy uses c-ares to query DNS, and the current code resolves names with ares_gethostbyname. Unfortunately, ares_gethostbyname doesn't pass the record TTL to its callback function.

As far as I know, the alternatives that do give access to the record TTL are ares_query and ares_search. In fact, ares_gethostbyname calls ares_query and ares_search internally. Compared with those two functions, however, ares_gethostbyname provides some optimizations: for example, before querying the DNS server it looks up the hosts file, which is useful when resolving localhost.
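The difference lies in the callback types: ares_gethostbyname invokes an `ares_host_callback` that receives only a `struct hostent`, which has no TTL field, whereas ares_query and ares_search invoke an `ares_callback` that receives the raw response buffer (`abuf`, `alen`). c-ares can extract TTLs from that buffer itself; for instance, `ares_parse_a_reply` accepts an optional `struct ares_addrttl` array. To show where the TTL actually sits in the response, the sketch below walks the DNS wire format by hand. `dns_min_a_ttl` is a hypothetical helper that handles only A records and is not a c-ares function.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned rd16(const unsigned char *p) { return ((unsigned)p[0] << 8) | p[1]; }
static uint32_t rd32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Skip a possibly compressed domain name starting at off.
 * Returns the offset just past the name, or -1 on malformed input. */
static long skip_name(const unsigned char *buf, size_t len, size_t off) {
    while (off < len) {
        unsigned char label = buf[off];
        if (label == 0) return (long)(off + 1);           /* root label ends name */
        if ((label & 0xC0) == 0xC0)                       /* 2-byte compression pointer */
            return off + 2 <= len ? (long)(off + 2) : -1;
        if (label & 0xC0) return -1;                      /* reserved label type */
        off += 1u + label;                                /* length byte + label bytes */
    }
    return -1;
}

/* Hypothetical helper: smallest TTL (seconds) among the A records in the
 * answer section of abuf (as handed to an ares_callback), or -1 if there are
 * none or the buffer is malformed. */
long dns_min_a_ttl(const unsigned char *abuf, size_t alen) {
    if (alen < 12) return -1;                 /* fixed 12-byte header */
    unsigned qdcount = rd16(abuf + 4);
    unsigned ancount = rd16(abuf + 6);
    size_t off = 12;
    for (unsigned i = 0; i < qdcount; i++) {  /* question: name, type, class */
        long next = skip_name(abuf, alen, off);
        if (next < 0 || (size_t)next + 4 > alen) return -1;
        off = (size_t)next + 4;
    }
    long min_ttl = -1;
    for (unsigned i = 0; i < ancount; i++) {  /* answer RR: name, type, class, ttl, rdlength, rdata */
        long next = skip_name(abuf, alen, off);
        if (next < 0 || (size_t)next + 10 > alen) return -1;
        const unsigned char *rr = abuf + next;
        unsigned type = rd16(rr);
        uint32_t ttl = rd32(rr + 4);
        if (ttl > 0x7FFFFFFFu) ttl = 0;       /* RFC 2181: treat top-bit TTLs as 0 */
        off = (size_t)next + 10 + rd16(rr + 8);
        if (off > alen) return -1;
        if (type == 1 /* A; CNAMEs etc. are skipped */ &&
            (min_ttl < 0 || (long)ttl < min_ttl)) {
            min_ttl = (long)ttl;
        }
    }
    return min_ttl;
}
```

In a real resolver the parsing would be left to c-ares; the point here is only that the TTL survives in the buffer passed to ares_query/ares_search callbacks but not in the `hostent` passed to ares_gethostbyname callbacks.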

A possible solution is to move the optimization code (such as the hosts-file lookup) into the function where ares_query and ares_search are called. I don't like this solution because I don't think Envoy should be maintaining its own DNS-resolution optimizations.

Do you have any suggestions?

This is done.
