It was great to see the 0.3.0 release the other day.
I think this won't be a problem with Zig, but I just want to double-check: can fs tasks and cpu tasks use separate threadpools? i.e. Zig won't lump everything together into the same threadpool when using language primitives like async/await? Will the user be able to indicate which task should go into which threadpool?
Otherwise the faster cpu tasks would get stuck behind the slower fs tasks, kind of like racing the Dakar and the Grand Prix on the same track.
To be clear, this is a question about the standard library, not the language. Zig has no runtime, so these kinds of decisions are made in userland. One could always create an alternate event loop implementation and make that its own package.
How it works today:
There is a thread pool sized to the number of logical CPUs. The main thread (whatever thread calls event_loop.run()) is not special; it is one of the members of the pool. If you only create one event loop (which is the intended usage) then there is only one thread pool.
I'm not sure exactly what you mean by "fs tasks", but I don't think those exist in this model. It uses non-blocking I/O APIs. When an I/O operation is pending, it goes into the epoll set / kqueue / IOCP. All the thread pool workers are either crunching cpu tasks or waiting for an I/O event from the OS. When an I/O event completes, the OS chooses a worker to wake up (if no worker is available, then the first worker to finish its cpu task will get it). That worker then completes the await of whatever I/O was pending and continues executing the code from that point on.
I don't understand how it could be any more efficient than this, but, my mind is open that there is something I didn't consider.
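To make that concrete, here is a minimal sketch of that worker-loop shape in present-day Zig (these std.Thread primitives did not exist at the time of this thread). The Task and Queue types are stand-ins, not the actual std.event implementation: in the real loop the waiting step is an epoll_wait / kevent / GetQueuedCompletionStatus call rather than a condition variable, and tasks are resumed async frames rather than function pointers.

```zig
const std = @import("std");

// Stand-in for a pending unit of work: either a CPU task or a resumed
// I/O completion. In the real event loop this would be an async frame.
const Task = struct {
    next: ?*Task = null,
    run: *const fn (*Task) void,
};

// Stand-in for the OS readiness mechanism. The real event loop blocks in
// epoll_wait/kevent/GetQueuedCompletionStatus here, not on a condition var.
const Queue = struct {
    mutex: std.Thread.Mutex = .{},
    cond: std.Thread.Condition = .{},
    head: ?*Task = null,
    closed: bool = false,

    fn push(q: *Queue, t: *Task) void {
        q.mutex.lock();
        defer q.mutex.unlock();
        t.next = q.head;
        q.head = t;
        q.cond.signal();
    }

    fn pop(q: *Queue) ?*Task {
        q.mutex.lock();
        defer q.mutex.unlock();
        while (q.head == null and !q.closed) q.cond.wait(&q.mutex);
        const t = q.head orelse return null;
        q.head = t.next;
        return t;
    }

    fn close(q: *Queue) void {
        q.mutex.lock();
        defer q.mutex.unlock();
        q.closed = true;
        q.cond.broadcast();
    }
};

// Every worker (including the thread that called run()) executes the same
// loop: run CPU tasks when available, otherwise sleep until an I/O event
// completes, then resume the code that was awaiting it.
fn worker(q: *Queue) void {
    while (q.pop()) |task| task.run(task);
}

fn printTask(_: *Task) void {
    std.debug.print("task executed by a worker thread\n", .{});
}

pub fn main() !void {
    var queue = Queue{};
    var task = Task{ .run = printTask };

    // One worker per logical CPU, mirroring the description above.
    const n = try std.Thread.getCpuCount();
    const threads = try std.heap.page_allocator.alloc(std.Thread, n);
    defer std.heap.page_allocator.free(threads);
    for (threads) |*t| t.* = try std.Thread.spawn(.{}, worker, .{&queue});

    queue.push(&task);
    queue.close();
    for (threads) |t| t.join();
}
```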
Thanks for the explanation.
I was thinking of one event loop dispatching tasks to multiple threadpools, according to the performance profile of the tasks dispatched.
And in particular Linux, where non-blocking I/O for disk reads and writes (as opposed to network requests such as DNS) would be implemented with a threadpool rather than AIO (which is what libuv does, if I am not mistaken).
In this environment, it's possible for I/O tasks to have wildly different performance profiles, which can lead to head-of-line blocking.
For example:
Disk I/O might be on the order of hundreds of milliseconds, especially for large reads or writes (if these are not partitioned).
Whereas crypto, dispatched to the threadpool to avoid blocking the event loop and to get multi-core throughput, might be on the order of tens of milliseconds.
In this case, since the disk I/O does not run hot, it makes sense to run it in a threadpool sized larger than the number of cores, for greater concurrency (e.g. 4 threads on a 4-core system would struggle to saturate a single SSD).
It would also be good to keep the disk I/O threadpool separate from the async crypto threadpool, to avoid crypto calls sitting behind disk calls. The crypto (or CPU-intensive) threadpool would obviously be sized to the number of cores, since these tasks run hot. A sketch of this two-pool shape follows below.
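Here is one way that two-pool shape could look, using today's std.Thread.Pool (which also did not exist when this thread was written). The pool names, the 4x oversubscription factor, and the task functions are illustrative assumptions, not an existing Zig API:

```zig
const std = @import("std");

fn hashFile(path: []const u8) void {
    // CPU-bound work (e.g. crypto): runs hot, one thread per core is right.
    std.debug.print("hashing {s}\n", .{path});
}

fn readBlob(path: []const u8) void {
    // Blocking disk I/O: mostly waits, so oversubscribing the pool is fine.
    std.debug.print("reading {s}\n", .{path});
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const cores: u32 = @intCast(try std.Thread.getCpuCount());

    // CPU pool: sized to the number of cores, since these tasks run hot.
    var cpu_pool: std.Thread.Pool = undefined;
    try cpu_pool.init(.{ .allocator = allocator, .n_jobs = cores });
    defer cpu_pool.deinit();

    // Disk pool: oversized relative to cores (factor chosen arbitrarily
    // here), so the SSD stays saturated while most of these threads sit
    // in blocking read()/write() calls.
    var disk_pool: std.Thread.Pool = undefined;
    try disk_pool.init(.{ .allocator = allocator, .n_jobs = cores * 4 });
    defer disk_pool.deinit();

    // One dispatcher, two pools: route by the task's performance profile.
    try disk_pool.spawn(readBlob, .{"data.bin"});
    try cpu_pool.spawn(hashFile, .{"data.bin"});
}
```

The point is only the routing policy: the dispatcher knows each task's profile and picks the pool accordingly, so a slow disk read never occupies a slot that a hot crypto task needs.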
I know that Node.js and libuv have had numerous head-of-line blocking issues (e.g. a few misbehaving DNS lookups block the entire threadpool) and are currently experimenting with ways to mitigate this.
I just wanted to give you a heads-up. Hopefully, this is something you can keep in mind, so that Zig's standard library will make it possible to dispatch from a single loop into multiple threadpools, depending on the performance profile (e.g. slow I/O, fast I/O, CPU).
Obviously, this is less necessary where platforms support true AIO, but not all do.
I think we're still not quite on the same page. The way it works in status quo is that the thread pool is for CPU tasks only. I/O tasks cannot block CPU tasks. I don't think the head-of-line situation you are describing is possible.
Currently, for Linux and macOS (Windows has true async file system I/O), there is a dedicated thread, outside the thread pool, for doing blocking file system operations.
Sadly, Linux and macOS provide no way to determine the "hardware id" that a file descriptor belongs to, which would let us determine how many parallel blocking file system operations would be appropriate. For some hard drives, 2 threads would be slower than one. So there is only one. This will probably have to be configurable, because there is no way the Zig standard library can query the OS to find out the appropriate amount of parallelism for file system I/O. Potentially the API could let the user provide that information when doing async fs I/O.
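To illustrate the current arrangement, here is a minimal sketch of a dedicated fs thread performing one blocking operation and signalling completion back to the awaiter. The FsRequest type and field names are hypothetical, and the real implementation loops over a queue of requests and resumes async frames; the single thread here is exactly the knob that would become configurable.

```zig
const std = @import("std");

// Hypothetical request handed to the dedicated fs thread.
const FsRequest = struct {
    path: []const u8,
    done: std.Thread.ResetEvent = .{},
    result: anyerror!usize = 0,
    buf: [4096]u8 = undefined,
};

// The one dedicated fs thread: plain blocking I/O is fine here, because
// only this thread (never a CPU worker) blocks on it. The real version
// would loop over a queue of requests instead of serving a single one.
fn fsThread(req: *FsRequest) void {
    req.result = blk: {
        const file = std.fs.cwd().openFile(req.path, .{}) catch |e| break :blk e;
        defer file.close();
        break :blk file.read(&req.buf);
    };
    req.done.set(); // wake whoever was awaiting this operation
}

pub fn main() !void {
    // Arbitrary example path; any readable file works.
    var req = FsRequest{ .path = "build.zig" };
    const fs_thread = try std.Thread.spawn(.{}, fsThread, .{&req});
    // In the real event loop this wait is an async await: the caller's
    // worker thread goes back to running CPU tasks instead of blocking.
    req.done.wait();
    fs_thread.join();
    const n = try req.result;
    std.debug.print("read {d} bytes from {s}\n", .{ n, req.path });
}
```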