Title: DNS Filter - high latency when using upstream resolvers
Description:
When using the DNS filter's upstream resolvers, there is an additional ~5 s of latency on resolution.
host and dig don't appear to be affected, but when performing an HTTP request where the client first performs a DNS lookup, I'm seeing a consistent additional 5 s of latency. The latency isn't there if I define a static domain for Envoy to return.
A tcpdump shows two upstream DNS requests. Both requests that Envoy forwards upstream resolve successfully. For some reason the client doesn't pick up the first response; it then makes a second request, whose response it does appear to pick up. Once the second lookup resolves, the HTTP client continues with the request using the returned IP. The gap between the first and second DNS request is where the additional latency comes from.
Repro steps:
Run curl, wget or even Python requests against a non-static entry so that the upstream resolver is used.
Config:
https://gist.github.com/skiptomyliu/0ae0959d5f2d6b6c225b393ed145fb73
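A minimal Python sketch for timing a single lookup, assuming the host's system resolver (e.g. /etc/resolv.conf) points at the Envoy DNS filter listener. It calls socket.getaddrinfo(), which is typically the path Python requests (via urllib3) goes through to resolve a hostname. time_lookup is a hypothetical helper name, not part of Envoy or requests.

```python
import socket
import time


def time_lookup(host: str, port: int = 80) -> float:
    """Return the wall-clock seconds taken by one getaddrinfo() call for host.

    Assumes the system resolver is configured to send queries to the
    Envoy DNS filter; nothing here talks to Envoy directly.
    """
    start = time.monotonic()
    # Default family (AF_UNSPEC) asks for both IPv4 and IPv6 results,
    # the same kind of lookup an HTTP client usually performs.
    socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return time.monotonic() - start
```

Calling time_lookup() on a hostname that is not a static entry in the filter config would be expected, with the bug present, to take roughly 5 s longer than the same call for a static entry.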
@abaptiste do you mind taking a look at this?
I am able to reproduce the delay. Using dig or another tool (e.g. python3-dnspython) to query the filter directly does not show the issue.
Let me dig into this a bit and I'll let you know what I find out.
/assign abaptiste
It turns out that the client sends two queries with different query IDs. When the filter generates the response to the first query, it erroneously uses the ID of the second query. The client keeps waiting until it receives a response whose ID matches the ID of its query.
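To make the failure mode concrete, here is a minimal, self-contained Python sketch (no networking) that builds two DNS queries with distinct IDs, answers them once with the ID mix-up described above and once correctly, and reports which query IDs never get a matching response. The helper names, the query IDs, and the assumption that the two queries are an A and an AAAA lookup (as glibc's getaddrinfo() commonly sends in parallel) are illustrative only. Real stub resolvers also compare the question section, which this sketch skips. If the client is glibc, its default per-query timeout is 5 s (the resolv.conf timeout option), which would plausibly account for the observed delay; that link is an inference, not something confirmed in this thread.

```python
import struct
from typing import List

# Header flags: standard recursive query (RD=1) and a successful
# response (QR=1, RD=1, RA=1, RCODE=0).
QUERY_FLAGS = 0x0100
RESPONSE_FLAGS = 0x8180


def build_query(query_id: int, name: str, qtype: int) -> bytes:
    """Build a minimal DNS query: 12-byte header plus one class-IN question."""
    header = struct.pack("!HHHHHH", query_id, QUERY_FLAGS, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def message_id(message: bytes) -> int:
    """Read the 16-bit ID from the first two bytes of a DNS message."""
    return struct.unpack("!H", message[:2])[0]


def make_response(query: bytes, response_id: int) -> bytes:
    """Echo the query's counts and question back as an answerless response."""
    return struct.pack("!HH", response_id, RESPONSE_FLAGS) + query[4:]


def buggy_responder(queries: List[bytes]) -> List[bytes]:
    # Hypothetical model of the reported bug: every response carries
    # the ID of the most recently received query.
    latest_id = message_id(queries[-1])
    return [make_response(q, latest_id) for q in queries]


def fixed_responder(queries: List[bytes]) -> List[bytes]:
    # Each response echoes the ID of the query it answers.
    return [make_response(q, message_id(q)) for q in queries]


def unanswered_ids(queries: List[bytes], responses: List[bytes]) -> List[int]:
    """IDs of queries with no matching response; a client keeps waiting on these."""
    outstanding = {message_id(q) for q in queries}
    for r in responses:
        outstanding.discard(message_id(r))
    return sorted(outstanding)


# Hypothetical IDs; qtype 1 = A, 28 = AAAA.
queries = [build_query(0x1111, "example.com", 1), build_query(0x2222, "example.com", 28)]
print([hex(i) for i in unanswered_ids(queries, buggy_responder(queries))])  # ['0x1111']
print(unanswered_ids(queries, fixed_responder(queries)))  # []
```

With the buggy responder, the first query (0x1111) never sees a response carrying its own ID, so a client that matches on ID alone keeps waiting for it; echoing each query's own ID clears both.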
I am working on tests for this and will create a PR with the fix.