Got: DNS cache

Created on 14 Nov 2018 · 20 Comments · Source: sindresorhus/got

I understand that maybe this isn't the purpose of this lib (feel free to close if so).
But adding a DNS cache layer would be useful.
It's a common problem nowadays with a huge number of requests. It doesn't cache DNS lookups or anything. Maybe it's worth just mentioning this in the Readme.

What's the cleanest way to deal with it?

enhancement ✭ help wanted ✭

roccomuso

👍1

Most helpful comment

I think the ideal implementation would be an _automatic_ TTL cache by simply respecting the TTL of the DNS record itself. If the upstream DNS server(s) are respecting TTL to spec, they will be caching for the same TTL meaning a value lower than set on the DNS record is unnecessary.
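
A minimal sketch of such an automatic TTL cache, assuming a plain in-memory Map and records shaped like the `{ address, ttl }` objects that Node's `dns.resolve4(hostname, { ttl: true }, callback)` returns. The `TtlDnsCache` name and its methods are hypothetical and not part of got:

```javascript
// Hypothetical in-memory cache that expires entries according to the
// TTL (in seconds) carried by each DNS record.
class TtlDnsCache {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  // records: [{ address: '93.184.216.34', ttl: 60 }, ...]
  set(hostname, records) {
    if (records.length === 0) return;
    // Expire the whole entry when the shortest-lived record expires.
    const minTtl = Math.min(...records.map(record => record.ttl));
    this.entries.set(hostname, {
      addresses: records.map(record => record.address),
      expiresAt: this.now() + minTtl * 1000
    });
  }

  get(hostname) {
    const entry = this.entries.get(hostname);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(hostname); // lazily evict the expired entry
      return undefined;
    }
    return entry.addresses;
  }
}
```

Expiring on the shortest TTL is one possible design choice; tracking each record's expiry separately would also work.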

As for a size limit, I don't think it's necessary. Absolute worst case, a domain (63 characters) and IPv6 address (128 bit) pair are represented in 79 bytes. That means you can store 12,658 cached DNS entries per megabyte of memory. In practice, you'd get 25,000+ since domains are shorter. For long running processes, we _could_ implement an asynchronous tick (e.g. setInterval) to clean out expired entries every few minutes if it becomes absolutely necessary.
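
The arithmetic above, and the optional periodic cleanup, can be sketched as follows. This takes a megabyte as 1,000,000 bytes and counts only the raw bytes of the pair, so real JavaScript object overhead would lower the figure. `sweepExpired` and `startSweeper` are hypothetical helper names, and entries are assumed to carry an `expiresAt` timestamp in milliseconds:

```javascript
// Worst-case size of one entry, following the figures in the comment above.
const DOMAIN_BYTES = 63;     // figure used above for the domain name
const IPV6_BYTES = 128 / 8;  // a 128-bit address is 16 bytes
const ENTRY_BYTES = DOMAIN_BYTES + IPV6_BYTES;

const ENTRIES_PER_MEGABYTE = Math.floor(1000000 / ENTRY_BYTES);
console.log(ENTRY_BYTES, ENTRIES_PER_MEGABYTE); // 79 12658

// Remove every entry whose expiry time has passed; returns how many were removed.
function sweepExpired(entries, now = Date.now()) {
  let removed = 0;
  for (const [hostname, entry] of entries) {
    if (entry.expiresAt <= now) {
      entries.delete(hostname); // deleting during Map iteration is safe
      removed++;
    }
  }
  return removed;
}

// Run the sweep every few minutes without keeping the process alive.
function startSweeper(entries, intervalMs = 5 * 60 * 1000) {
  const timer = setInterval(() => sweepExpired(entries), intervalMs);
  timer.unref();
  return timer;
}
```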

If we implement it similar to request caching, the user can bring their own cache if they need to manage it more closely. For instance, passing in a custom instance of keyv enabling them to clear entries as necessary.

We should however offer an option to disable it. This leaves us with two modes:

On, fully automated, no limit
Off

Just my $0.02.

brandon93s on 21 Nov 2018

👍4

All 20 comments

Node.js has the ability to get the TTL of DNS lookups: https://github.com/nodejs/node/pull/9296 So we could potentially implement a simple automatic cache based on that.
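
As a rough illustration, `dns.resolve4()` accepts a `{ ttl: true }` option and then yields `{ address, ttl }` objects, with the TTL in seconds. The `resolveWithTtl` wrapper below is a hypothetical helper that turns each TTL into an absolute expiry time. Its `resolve4` and `now` parameters exist only so the sketch can be exercised without network access:

```javascript
const dns = require('dns');

// Resolve IPv4 records and attach an absolute expiry timestamp (ms) to each.
function resolveWithTtl(hostname, callback, options = {}) {
  const resolve4 = options.resolve4 || ((...args) => dns.resolve4(...args));
  const now = options.now || Date.now;

  resolve4(hostname, { ttl: true }, (error, records) => {
    if (error) return callback(error);
    const fetchedAt = now();
    // records look like [{ address: '93.184.216.34', ttl: 299 }]
    callback(null, records.map(record => ({
      address: record.address,
      ttl: record.ttl,
      expiresAt: fetchedAt + record.ttl * 1000
    })));
  });
}
```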

Some prior art (which all seems to require setting TTL manually):

sindresorhus on 14 Nov 2018

👍2

@sindresorhus I would like to work on that.
Also, don't you want to be able to set the TTL and size manually?
Or maybe, for hosts that are requested often, we could automatically increase the TTL.

morozRed on 21 Nov 2018

btw can anyone assign this issue to me, because I'm already working on it (if it's necessary)

morozRed on 22 Nov 2018

ok, so the cache is almost ready, but I don't know what to do about resolving localhost-type addresses. Please check the Node docs on the implementation of dns.lookup/dns.resolve*.
My plan was to use dns.resolve4/dns.resolve6, but these methods do not check /etc/hosts, and I'm worried that if someone sets custom hosts there, they will not be resolved. Any ideas?
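
One possible approach, sketched here under assumptions and not taken from got: try `dns.resolve4()` first to obtain TTLs, and fall back to `dns.lookup()` (which consults /etc/hosts) when the DNS query fails. The `resolveOrLookup` name is hypothetical. Note the limitation: a hosts-file entry that overrides a name the DNS server can also resolve would not be honoured, because the DNS answer wins.

```javascript
const dns = require('dns');

// Resolve via DNS (with TTLs) and fall back to the system resolver on failure.
function resolveOrLookup(hostname, callback, options = {}) {
  const resolve4 = options.resolve4 || ((...args) => dns.resolve4(...args));
  const lookup = options.lookup || ((...args) => dns.lookup(...args));

  resolve4(hostname, { ttl: true }, (error, records) => {
    if (!error && records.length > 0) {
      return callback(null, records);
    }

    // The DNS query failed, e.g. for a name that only exists in /etc/hosts.
    lookup(hostname, { family: 4 }, (lookupError, address) => {
      if (lookupError) return callback(lookupError);
      // ttl 0 marks "no TTL known", so a cache should not store this answer.
      callback(null, [{ address, ttl: 0 }]);
    });
  });
}
```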

morozRed on 26 Nov 2018

Wouldn't it make sense to simply accept a lookup function like the Node APIs do? Then you can use one of the existing DNS cache implementations like redns?

pietermees on 27 Nov 2018

@pietermees the default dns.lookup method doesn't expose the TTL provided by the DNS record, which we want to support.
Also, we want to support different types of storage (i.e. keyv).

About dns.resolve(), dns.resolve*()

These functions are implemented quite differently than dns.lookup(). They do not use getaddrinfo(3) and they always perform a DNS query on the network. This network communication is always done asynchronously, and does not use libuv's threadpool.
As a result, these functions cannot have the same negative impact on other processing that happens on libuv's threadpool that dns.lookup() can have.
They do not use the same set of configuration files than what dns.lookup() uses. For instance, they do not use the configuration from /etc/hosts.

For more info please refer to this doc.

And there is definitely no point in adding another dependency, because it isn't much code to write.

morozRed on 27 Nov 2018

@morozRed sorry, I should have been clearer. I meant to ask why DNS caching should be part of got rather than an external thing you can plug in.

The reason I ask is that the Node APIs already allow overriding DNS lookups on TCP/HTTP/HTTPS/H2 connects by providing a custom lookup function. Existing packages like redns and others leverage that to provide DNS caching capabilities similar to what you're trying to implement here.

My main point is to add a lookup option to got, similar to the lookup option in socket.connect. That would allow you to easily plug in any of the existing DNS cache packages.

pietermees on 28 Nov 2018

@pietermees oh ok, well, the idea was to provide an automatic, pre-initialized DNS cache, but still, you are already able to do this:

await got('google.com', { lookup: CUSTOM_DNS_CACHE.lookup });
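
For illustration, here is a minimal sketch of what such a lookup function could look like, assuming it only needs to match the `dns.lookup(hostname, options, callback)` signature. `createCachedLookup` is a hypothetical helper, not an existing package. Because it wraps `dns.lookup()`, it keeps /etc/hosts behaviour but cannot see the record's TTL, so it uses a fixed one. Options other than `family` and `all` are not part of the cache key in this sketch.

```javascript
const dns = require('dns');

// Wrap a dns.lookup-compatible function with a fixed-TTL in-memory cache.
function createCachedLookup({ ttlMs = 60000, lookup = dns.lookup, now = Date.now } = {}) {
  const cache = new Map();

  return function cachedLookup(hostname, options, callback) {
    // dns.lookup allows the options to be omitted or given as a family number.
    if (typeof options === 'function') {
      callback = options;
      options = {};
    } else if (typeof options === 'number') {
      options = { family: options };
    }

    const key = `${hostname}|${options.family || 0}|${Boolean(options.all)}`;
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) {
      // Stay asynchronous, as callers of dns.lookup expect.
      process.nextTick(callback, null, ...hit.args);
      return;
    }

    lookup(hostname, options, (error, ...args) => {
      if (error) return callback(error); // failures are not cached
      cache.set(key, { args, expiresAt: now() + ttlMs });
      callback(null, ...args);
    });
  };
}
```

The returned function could then be passed in place of `CUSTOM_DNS_CACHE.lookup` in the snippet above.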

morozRed on 28 Nov 2018

👍1

Having played around with some of the existing DNS cache implementations, I think you'll find that you can't easily provide a default DNS cache that is fully transparent (i.e. handles DNS exactly the same way as the system would).

As you've indicated you have to use the dns.resolve API which behaves differently than getaddrinfo. This impacts /etc/hosts but also IPv6 vs IPv4 preferences etc.

With that in mind, I don't think it's possible to provide an implementation that's enabled by default and does not change semantics for all got users.

Hence my suggestion to use the lookup option (I didn't know that already worked, thanks!).

It's already available
You can use an existing DNS cache package without having to write anything
If it can't be enabled by default, you'd have to provide an option to activate it anyway, so it's the same amount of work as just providing a custom lookup function

pietermees on 28 Nov 2018

ok so @sindresorhus @brandon93s what do you guys think about it? Maybe I can update the docs here at least?

morozRed on 28 Nov 2018

What should we do when the DNS server offers us multiple IP addresses? Should we cache them?

szmarczak on 24 Dec 2018

@szmarczak you can cache all the addresses and then use a round-robin strategy. But I'm no longer sure we need to implement a custom cache if you can simply pass a package or custom lookup function to got.
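
A minimal sketch of that round-robin idea, assuming the cached value is simply an array of addresses. `createRoundRobin` is a hypothetical helper name:

```javascript
// Return a function that cycles through the cached addresses in order.
function createRoundRobin(addresses) {
  if (addresses.length === 0) {
    throw new Error('No addresses to rotate through');
  }

  let index = 0;
  return function next() {
    const address = addresses[index];
    index = (index + 1) % addresses.length; // wrap around at the end
    return address;
  };
}

// Example with three made-up addresses.
const next = createRoundRobin(['10.0.0.1', '10.0.0.2', '10.0.0.3']);
console.log(next(), next(), next(), next()); // 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.1
```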

morozRed on 24 Dec 2018

you can simply pass a package or custom lookup function to got

Well, there isn't one which will set TTL automatically... We need to create it :)

szmarczak on 24 Dec 2018

@szmarczak this package supports the TTL provided by DNS. I've been working on a cache for got, but then I got stuck because of this discussion.

morozRed on 24 Dec 2018

Well, it sounds good... but it isn't. It's way too bloated. Is there any way to manage the database? I'll try to make a prototype now :)

szmarczak on 24 Dec 2018

It's way too bloated.

To be fair, that's mostly because it pulls in async (to get IPv4 and IPv6 stuff in parallel) and lodash (to do input validation). If the maintainer's willing, it wouldn't be that hard to pull those out.