Runtime: Dns.GetHostAddressesAsync is blocking a ThreadPool thread

Created on 7 May 2016  路  25Comments  路  Source: dotnet/runtime

The implementation of Dns.GetHostAddressesAsync is just a wrapper around the blocking implementation and is queued on the ThreadPool.

https://github.com/dotnet/corefx/blob/1baa9ad06a466a5d6399e1e0a7a7a02564ab51b0/src/System.Net.NameResolution/src/System/Net/DNS.cs#L274

This can lead to a ThreadPool starvation when there are many concurrent calls of GetHostAddressAsync in an environment where the DNS is slow.

The code in .Net 4.6 is slightly different but suffers from the same issue:
http://referencesource.microsoft.com/#System/net/System/Net/DNS.cs,744

Windows and Linux both provides real asynchronous calls to resolve a DNS. The asynchronous implementation could use these.

area-System.Net.Sockets bug up-for-grabs

Most helpful comment

Could formalize it?

// Queues to secondary thread-pool so as not to cause starvation in main pool by holding threads
Task Task.RunBlockingAsAsync(Action action);
Task Task.RunBlockingAsAsync<TState>(Action<TState> action, TState state);
Task<T> Task.RunBlockingAsAsync<T>(Func<T> func)
Task<T> Task.RunBlockingAsAsync<TState, T>(Func<TState, T> func, TState state)

All 25 comments

cc: @CIPop

@JeffCyr, you're right that if Windows and Linux provide real async DNS options, these should be used to implement Dns.GetHostAddressesAsync.

However, how would that fix thread pool starvation? A real async API would also execute its callback in a threadpool thread, leading to starvation in cases where there are many concurrent calls, slow DNS and a pool without many threads (e.g. ramp-up time). In other words, the threadpool starvation issue seems like an inherent issue with any async call...

The callbacks have almost no work to do. They basically need to execute the user-provided callback and update some state (maybe complete a task). Whereas the IO might run for 100ms the callbacks run for 0.001ms and are purely CPU work.

@GSPP how do you know that the callbacks have almost no work to do? The user-provided callback my be anything at all, including some lengthy operation, computational or otherwise?

That's not the frameworks problem. If users want non-blocking IO they should not execute Sleep(1000) in their callbacks. The framework _enables_ non-blocking IO for callers where that was not possible before.

Even if users block they will at least save the lengthy DNS IO (here assumed to be 100ms). Alleviating thread pool starvation is not about solving _all_ blocking. You can make do with solving the IO that has the highest (frequency times latency) product. That's the code that actually uses the pool just waiting there.

@roji I don't understand why you think proper async calls make thread pool starvation a problem, where it's not a problem for sync calls.

If you make those calls from thread pool threads, then using async clearly decreases thread pool pressure.

If you make those calls from many non-thread pool threads, then you can (and probably should) use TaskScheduler or SynchronizationContext to execute your long callbacks back on those threads and not on the thread pool. This still leaves library callbacks on the thread pool, but those are going to be very fast. So this does increase thread pool pressure, but only by a small amount.

I never anything here was the framework's problem... Rereading the original post I agree that the current framework implementation of shoving a blocking I/O operation to the pool is highly problematic.

@gspp and @svick I admit I'm continuing the discussion in dotnet/runtime#15963 - this isn't really relevant to this issue. @svick, the point is a sync blocking operation has the (small!) advantage of not relying on the thread pool in any way - the resource needed (the thread) is "reserved" during the I/O operation and will be there in a 100% reliable way one the operation completes to continue processing. With async I/O, on the other hand, we're not sure what the TP state will be when the I/O operation completes, and cases of load coupled with a currently small TP could lead to starvation etc.

@svick I agree there are ways to deal with this, i.e. schedule long operations in your own threads etc. but this is more complicated and requires a good understanding of what's running where. All I'm saying is that async programming requires a lot more attention from the programmer (compared to sync I/O) and is not _always_ justified by the scalability it brings to the table. My problem is that the current approach seems to be to _eliminate_ sync I/O operations as if to say "always use async no matter what".

But this is me hijacking this thread, better to continue this in dotnet/runtime#15963.

Assuming you are not blocking in the threadpool threads (e.g. paused thread not executing anything) and only calling other async methods for pauses, then it should be fairly hard to starve it without maxing CPU before hand.

However if you are going to block in the continuation then you can fire it off to a thread:

await Task.Factory.StartNew( ... , TaskCreationOptions.LongRunning);

From https://github.com/dotnet/corefx/issues/5044#issuecomment-212883399

I'm the maintainer of Npgsql, the PostgreSQL driver for .NET. We've frequently received complaints that under load, and frequently at program startup, DNS queries performed by Npgsql time out. Now, the current stable Npgsql does DNS resolution with Dns.{Begin/End}GetHostAddresses. What seems to be happening, is that when the program starts the thread pool has relatively few threads. If a large amount of requests come in and a lot of async operations are in progress, the thread pool starts allocating new threads - but this process takes time; and during these time it would seem that the operation times out.

@roji the dns issue you have have encountered is as a result of you doing the right thing and the framework doing the wrong thing, which is unfortunate and will undoubtedly give a poor impression of using async methods :disappointed:

@benaadams you're probably right about this specific case. My more general point (in dotnet/runtime#15963) is that I don't necessarily want to possible TP starvation state to determine when my callback runs (regardless of what it's going to do).

@roji this is a slightly trolly response, so apologies but you could add to Main

#if net451
    int workerThreads, ioThreads;
    ThreadPool.GetMaxThreads(out workerThreads, out ioThreads);
    ThreadPool.SetMinThreads(workerThreads, ioThreads);
#endif 

for dnx based coreclr set the environment variable

ComPlus_ThreadPool_ForceMaxWorkerThreads=10000

and for dotnet cli add to [yourappname].runtimeconfig.json

{
    "runtimeOptions": {
        "configProperties": {
            "System.GC.Server": true,
            "System.GC.Concurrent": true,
            "System.Threading.ThreadPool.MinThreads": 10000,
            "System.Threading.ThreadPool.MaxThreads": 10000
        },
    }
}

It will ramp up the thread count really really fast when any are blocked

@benaadams that's not trolly at all :) I've recommended this in some cases. But having to do this kind of thing does demonstrate this one problematic aspect of async.

Again, I'm not saying async is bad or problematic in itself, just that there are situations where sync might make more sense. That's all. In other words, don't remove sync APIs.

@JeffCyr Thanks for reporting this. I agree that we should provide true async wrappers where the OS provides the support for such implementations.

Unfortunately there are probably other APIs in the same situation so we should probably revisit the implementations and search for other cases.

@roji @benaadams While I like the brainstorming regarding sync vs async, I think it would be more useful to have it all in the original thread. Otherwise some of the good ideas and concerns will be lost in multiple places. I'll try to answer on the original thread.

@CIPop you're absolutely right, sorry for hijacking this. For the record I totally agree that the framework should provide truly async DNS where that's possible, regardless of the discussion in dotnet/runtime#15963. Will be looking forward to see your answer there.

Suffer by this issue. See dotnet/coreclr#8383

I investigated the fix for this issue.

Windows

Windows 7

There is no windows API to make async dns resolving.

Windows 8 & up

There is GetAddrInfoEx and DnsQueryEx. These methods can be used for async dns resolving, but the API is a bit complex, an implementation in System.Native might be more robust than straight PInvoke.

Linux

There is getaddrinfo_a, but it's implemented by calling the synchronous getaddrinfo on another thread. c-ares seems to be a popular alternative.

Proposal

The main issue with the current implementation is that blocking calls are queued on the default ThreadPool and this cause starvation. A simple fix could be to implement a custom thread pool specialized for blocking operations.

This specialized ThreadPool could have these properties:

  • Only one permanent thread in the pool
  • New threads are added to the pool immediately if no threads are available when a workitem is enqueued
  • Except the permanent thread, other threads will close if they are unused for 1 second
  • The pool is bounded (e.g. max 64 thread)
  • Threads are started with a small stack (e.g. 128KB)

A custom TaskScheduler could wrap this ThreadPool and minimal code changes would be required to fix this issue. We would just need to replace the TaskScheduler.Default usage in DNS.cs.

This solution is less efficient than true async operations, but it might be good enough. A specialized external dns library could be implemented outside of the framework for the small percentage of applications that needs massive dns query concurrency.

Thoughts?

Could formalize it?

// Queues to secondary thread-pool so as not to cause starvation in main pool by holding threads
Task Task.RunBlockingAsAsync(Action action);
Task Task.RunBlockingAsAsync<TState>(Action<TState> action, TState state);
Task<T> Task.RunBlockingAsAsync<T>(Func<T> func)
Task<T> Task.RunBlockingAsAsync<TState, T>(Func<TState, T> func, TState state)

The main issue with the current implementation is that blocking calls are queued on the default ThreadPool and this cause starvation

The right answer to address that is to a) queue less things that do I/O blocking and/or have them do less I/O blocking, and b) for what remains, help the ThreadPool better deal with blocking workloads, e.g. similar to how pools based on I/O completion ports are able to better adjust to blocking when blocking actually occurs.

A simple fix could be to implement a custom thread pool

This is a very complicated road. If the pool has a fixed upper bound, you have an even greater risk of starvation than you do today. If it's unbounded, such policies could result in potentially huge thread growth that would cause significant problems for the process and potentially the machine.

ps There's already support for using the ThreadPool to ammortize the cost of blocking on something: ThreadPool.RegisterWaitForSingleObject. It uses as few threads as possible to wait for as many objects as are registered, and queues a work item to the pool when the object is signaled.

@stephentoub
I agree this isn't a general solution for background blocking I/O, like you said it's a complicated road.

But the current implementation of Dns.GetHostEntryAsync is quite disastrous for a high-performance app that rely on the ThreadPool. Any usage of Dns.GetHostEntryAsync or even Socket.BeginConnect(string host, int port, AsyncCallback requestCallback, object state) will queue a blocking I/O workload on the ThreadPool. At best this will add random [0-500]ms latency to the thread pool, at worst it will completely starve it.

The ThreadPool max thread is bounded at 32767, but thread spawning is throttled at 1 per second, so in practice the bound will be much lower than 32767.

There is only a short/mid-term fix for Windows 8+ because other platform don't have proper async dns resolving built-in. Even with Windows 8+, GetAddrInfoEx and DnsQueryEx requires complex native interop with managed callback.

Having a separate ThreadPool exlusively used by DNS.cs (or potentially other internal stuff) seems to be a quick win.

We can set lower stack size to reduce resources, because we know they threads will only use a small stack. We can set a sensible upper bound to not be worst than the current implementation. With current impl, if there were 100 concurrent async dns resolve, it would have taken more than 1 minute to create all the threads, so it's already impracticable.

With current impl, if there were 100 concurrent async dns resolve, it would have taken more than 1 minute to create all the threads, so it's already impracticable.

You're talking about at the very beginning of a process, before the thread pool has ramped up. Let's say we set the limit on this special pool you're talking about. Pick a number, say 100. Now a few minutes in to a fully loaded server, the thread pool is likely already at more than that, say 200 threads. At that point using this custom pool is now actually slowing things down.

I agree there's an issue here to be addressed, but I don't believe creating a dedicated thread pool is as quick a win as you suggest. We would need real data from real workloads to demonstrate that it actually is a net win.

but thread spawning is throttled at 1 per second

That's the starvation mechanism. There's also the unrelated hill climbing mechanism.
cc: @kouvel

There is only a short/mid-term fix for Windows 8+ because other platform don't have proper async dns resolving built-in. Even with Windows 8+, GetAddrInfoEx and DnsQueryEx requires complex native interop with managed callback.

Complex interop isn't a problem. We do it all the time, especially in these networking libraries. I'm also not clear on why you say it's a short/mid-term fix for Win8+... it seems like it's _the_ fix for Win8+.

I'm also not clear on why you say it's a short/mid-term fix for Win8+... it seems like it's the fix for Win8+.

I mispoke there, I meant the fix can be done in the short/mid term, this is indeed the long term fix.

Now a few minutes in to a fully loaded server, the thread pool is likely already at more than that, say 200 threads. At that point using this custom pool is now actually slowing things down.

This special thread pool would be implemented to shrink back quickly to one thread if there is no request. The .Net ThreadPool shrinks too but it takes minutes to readjusts.

Even with the optimal Win8+ fix, there still need a fallback solution for unsupported platforms.

This issue has been opened for a while and it's still flagged as "future", I hoped to make some progress on it. What is the next step to make this happen? Are you more comfortable with doing only the Win8+ fix now and leave the current implementation as fallback?

What is the next step to make this happen? Are you more comfortable with doing only the Win8+ fix now and leave the current implementation as fallback?

Yes, using the better platform-provided API on platforms where it's available is a good thing to do.

@JeffCyr, would you want to submit a PR for using the async APIs on Windows when they're available?

@stephentoub I created the PR, waiting for review

Fixed on Windows by PR dotnet/corefx#26850

Was this page helpful?
0 / 5 - 0 ratings

Related issues

btecu picture btecu  路  3Comments

aggieben picture aggieben  路  3Comments

sahithreddyk picture sahithreddyk  路  3Comments

bencz picture bencz  路  3Comments

yahorsi picture yahorsi  路  3Comments