Runtime: Use io_uring instead of epoll when supported

Created on 11 Dec 2019 · 29 Comments · Source: dotnet/runtime

io_uring is a new method to perform efficient I/O on Linux systems. It provides a completion model (rather than a readiness model), similar to what IOCP on Windows provides, and unlike the standard poll-like interfaces, it can be used to request I/O from regular files as well (and, unlike the old/broken AIO in Linux, it doesn't require files to be opened in O_DIRECT mode).
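To make the readiness-versus-completion distinction concrete, here is a minimal sketch of the readiness side using plain epoll on a pipe. The helper name `readiness_read_demo` is made up for illustration and is not part of any runtime code. The kernel only reports that the descriptor is readable, and the application then issues the `read()` itself; in a completion model the read would be submitted up front and the byte count would arrive with the completion.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: readiness-model read of one message from a pipe.
 * Returns the number of bytes read, or -1 on error. */
static long readiness_read_demo(char *buf, size_t len)
{
    int fds[2];
    long result = -1;

    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep >= 0) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };

        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
            write(fds[1], "ping", 4) == 4) {
            struct epoll_event ready;

            /* Step 1: the kernel only tells us that fds[0] is readable. */
            if (epoll_wait(ep, &ready, 1, 1000) == 1 &&
                ready.data.fd == fds[0] && (ready.events & EPOLLIN)) {
                /* Step 2: the application performs the read itself.
                 * A completion model would hand back the result of an
                 * already-submitted read instead of a readiness flag. */
                result = (long)read(fds[0], buf, len);
            }
        }
        close(ep);
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling `readiness_read_demo(buf, sizeof buf)` with a 16-byte buffer should report the four bytes written to the pipe.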

It is a recent development, but reports of it being used by servers are very promising, often yielding gains exceeding 2 or 4x in throughput. Here's a talk by its main author with details, including benchmarks.

In addition to I/O (read/write/poll), it's also possible to handle connections (accept/connect) and a bunch of other things.

It should be possible to enable this and have both io_uring and epoll (as a fallback) in pal_networking.

area-System.Net.Sockets enhancement os-linux tenet-performance

All 29 comments

Going by the pdf, it seems that polled IO might be the best-suited option for PAL networking, because it is efficient, closer to the epoll implementation, and does not require elevated privileges (unlike the 'kernel side polling' option, which does). A few questions:

  • Should the implementation take a dependency on liburing (which comes from package managers), or should it carry some boilerplate code, drop the liburing dependency, and make the kernel calls directly?
  • Should it be added as a shim to support a runtime check (as there are shims for libssl, libnuma, etc.)? That way the same portable Linux build would switch to epoll when running on a kernel older than 5.1.
  • The implementation improved significantly in kernel v5.4, so should the PAL implementation take 5.4 as the baseline for switching from epoll to io_uring, or keep 5.1 (where io_uring was first implemented) as the baseline?
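One way to picture the runtime check from the second question is a probe that calls the `io_uring_setup` syscall directly, with no liburing involved, and falls back to epoll when the kernel rejects it. This is only a sketch: `choose_io_backend` and the `io_backend` enum are hypothetical names, the fallback syscall number is an assumption for toolchains whose headers predate io_uring, and a production check would likely go further and create a real ring.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
/* Assumption: 425 is the number used on most architectures; only needed
 * when the build headers are older than the io_uring syscalls. */
#define __NR_io_uring_setup 425
#endif

/* Hypothetical names, for illustration only. */
enum io_backend { BACKEND_EPOLL, BACKEND_IO_URING };

static enum io_backend choose_io_backend(void)
{
    /* A NULL params pointer can never produce a usable ring, so the call
     * is expected to fail. The interesting part is which error comes back. */
    long rc = syscall(__NR_io_uring_setup, 0L, NULL);

    if (rc >= 0) {
        /* Not expected; do not leak the descriptor and stay on epoll. */
        close((int)rc);
        return BACKEND_EPOLL;
    }

    /* ENOSYS: the kernel has no such syscall (too old, or filtered).
     * EPERM:  the syscall exists but policy forbids it.
     * Any other error (typically EFAULT for the NULL pointer) suggests
     * that the syscall is implemented. */
    if (errno == ENOSYS || errno == EPERM)
        return BACKEND_EPOLL;

    return BACKEND_IO_URING;
}
```

The same portable build could call `choose_io_backend()` once at startup and keep the epoll path as the default whenever the probe is inconclusive.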

I'd say depend on liburing. Doing stuff by hand is possible but we would be essentially replicating it inside the runtime; better to stick with something that's been debugged and tested already. I don't know how much they care about API and ABI compatibility at this point, so using it as a shim might not be a good idea; maybe using a git submodule?

As for the minimum kernel requirement: for io_uring, we should support 5.4+ only, falling back to epoll on older kernel versions. There were many improvements in the 5.5 series too, so eventually we might even bump the requirements if we end up taking the advantage of these features, just to simplify how we implement stuff -- for instance, async file I/O and not only sockets. (This kernel is still not common in most distributions but would be nice if the performance just appeared out of the blue after a kernel upgrade.)
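A version gate like the 5.4 baseline could be sketched with `uname(2)` as below. The helper names are hypothetical. Note that a version comparison is only a heuristic, since distribution kernels may backport or disable features, so it is probably best combined with an actual feature probe.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 if a release string such as
 * "5.4.0-42-generic" is at least major.minor, 0 otherwise
 * (including when the string cannot be parsed). */
static int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return 0;

    return maj > major || (maj == major && min >= minor);
}

/* Hypothetical helper: applies the same test to the running kernel. */
static int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;

    return release_at_least(u.release, major, minor);
}
```

With this, `running_kernel_at_least(5, 4)` would select io_uring and anything older would stay on epoll; bumping the baseline later is a one-line change.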

Possible dupe of:

https://github.com/dotnet/coreclr/issues/24441

This situation with the issues not yet ported is starting to generate noise...

Indeed it's a dupe, @damageboy. (I'll keep this issue open here as it might be easier to reference it and it's unlikely a lot of folks will keep a close eye on the coreclr repo after the consolidation.)

@lpereira Aren't the issues moving? Has anything changed?

They're moving, but it should take a month or so. I can close this one once the move is complete (can't easily mark as dupe in different repos.)

It should be possible to enable this and have both io_uring and epoll (as a fallback) in pal_networking.

I think pal_networking, coming from corefx, deserves a separate issue, as there is a defined, finite surface area currently using epoll where io_uring can be incorporated. It can be tracked here.

The coreclr issue is a broader discussion on how to make use of io_uring in a variety of scenarios, which coreclr's PAL currently handles without epoll and friends, in a kernel-agnostic manner, afaict.

Another thing I think we can use io_uring -- maybe not right now, but we could contribute a patch to the Linux kernel -- is to implement WaitForMultipleObjectsEx() using futexes directly, and have a command in io_uring to perform operations in multiple futexes at the same time.

@lpereira, I'm speculating, but would a new futex opcode with already implemented linked commands and timeouts suffice you?
Someone already mentioned supporting futex(2) axboe/liburing#39

epoll bare minimum echo server

50 clients, running 512 bytes, 60 sec.

Speed: 189185 request/sec, 189185 response/sec
Requests: 11351122
Responses: 11351122

io_uring bare minimum echo server (Linux 5.4 needed, lower versions don't return the right amount of bytes read from io_uring_prep_readv in cqe->res.) https://github.com/frevib/io_uring-echo-server

Benchmarking: localhost:5555
50 clients, running 512 bytes, 60 sec.

Speed: 368368 request/sec, 368368 response/sec
Requests: 22102112
Responses: 22102110

The difference looks good, even though it can do even better. E.g. io_uring allows registered buffers and fds, supports IORING_OP_ACCEPT, etc. (or get rid of callocs in the loop...)
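As a quick sanity check on the figures above, using only the totals reported in the two runs (60 seconds each):

```latex
\[
\frac{11\,351\,122}{60\ \text{s}} \approx 189\,185\ \text{req/s},
\qquad
\frac{22\,102\,112}{60\ \text{s}} \approx 368\,368\ \text{req/s},
\qquad
\frac{22\,102\,112}{11\,351\,122} \approx 1.95
\]
```

So the io_uring server handled roughly 1.95 times as many requests as the epoll one in this run. The io_uring totals also show two requests without a matching response (22102112 versus 22102110).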

edit removed links as author has decided on GPL v3.0

@benaadams changed it to MIT, sorry for the inconvenience. @isilence it definitely needs some optimizations and I think there are some tiny bugs. If you want/like/have time to issue a PR, I'm happy to merge.

edit author changed to MIT so put link back https://github.com/frevib/io_uring-echo-server :)

It's a networking example using liburing, which is LGPL, so it can be linked to but not derived from in MIT-licensed code. So don't look at the liburing source in case we do our own implementation on io_uring, which must be clean and not derived from LGPL code.

Though I don't know the dotnet policy on linking to LGPL and whether it's allowed? /cc @jkotas

There's a very detailed document from the author of liburing @axboe who is also one of the authors of io_uring https://kernel.dk/io_uring.pdf on the motivation for io_uring and what it achieves, as well as how to use it (including considerations around memory barriers).

That then leads to the motivations for liburing and how to use that (it simplifies all the boilerplate setup and tear down for io_uring and handles all the memory barriers etc)

To quote

With the inner details of the io_uring out of the way, you'll now be relieved to learn that there's a simpler way to do much of the above. The liburing library serves two purposes:

  • Remove the need for boiler plate code for setup of an io_uring instance.
  • Provide a simplified API for basic use cases.

Also a LWN.net article about io_uring

As noted above, I think that at least the use case in pal_networking.c in this repository, where the implementation currently uses epoll, does not require linking to liburing (a convenience library). It is more work, yes, but IMO worth it for the dotnet runtime. Taking a dependency on another runtime library comes with a packaging cost as well. For example, liburing is not readily available as a package in Alpine Linux and many other package management systems, see Absent in repositories.

Notwithstanding library availability -- because we could use git submodules, for instance, and statically link with liburing -- there's a bigger issue: linking with LGPL would require us to also distribute .o files in addition to the binaries for .NET.

So I agree that it would be better to reimplement what liburing does; it's a thin wrapper around the kernel API. It mostly reduces a lot of the boilerplate necessary to map the queues and provides a bunch of auxiliary functions and whatnot.

If we're unsure how to use the API, though, it's possible to read from other implementations; for instance, there's a dual-licensed Apache 2/MIT library for Rust that could be used for studying purposes.

Also, the libuv PR for io_uring could be something to look at: https://github.com/libuv/libuv/pull/2322 (libuv uses a Joyent attribution licence). They also state there that they can't look at the source for liburing as it's LGPL: https://github.com/libuv/libuv/pull/2322#issuecomment-500455185

FWIW, I'd be willing to change the liburing license to dual MIT/GPL. There's really nothing fancy in the library, it's mostly just helpers, and a simplified interface should the application wish to use that. But it'd be a shame to have some of this code duplicated just because of licensing constraints.

@axboe That would be appreciated; it would indeed help a lot with io_uring adoption, given that GPL family of licenses aren't, unfortunately (in my personal opinion), that popular these days.

I like GPL for applications, and I still use it, but it makes less sense for libraries. And in particular for something like liburing, which isn't really a lot of smarts, it's mostly just setup and helper code. I'm doing some due diligence by emailing folks that have more than a few commits in liburing, then I'll change it provided nobody objects (can't see why they would).

This has now been done.

For the record, here's an ASP.NET transport by @tkp1n that reimplements liburing in C#: https://github.com/tkp1n/IoUring

@lpereira, I'm speculating, but would a new futex opcode with already implemented linked commands and timeouts suffice you?
Someone already mentioned supporting futex(2) axboe/liburing#39

Going back to the ignored question... Guys, what's your use case and what would you need to integrate io_uring? Support for futex(2)? Something else?

Yeah, futex support for io_uring would be very welcome, especially if it had the FUTEX_WAIT_MULTIPLE command that was proposed a while ago (the use case is for Wine's implementation of WaitForMultipleObjects(), which is currently using polled eventfds, but we also have an implementation in our PAL that could benefit from this.)
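For readers unfamiliar with the "polled eventfds" approach mentioned here, the sketch below shows the general idea of a wait-any over several eventfds using `poll(2)`. It is an illustration only, not Wine's or the PAL's actual code; `wait_any`, `wait_any_demo`, and `MAX_WAIT_OBJECTS` are made-up names.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_WAIT_OBJECTS 64 /* hypothetical limit for this sketch */

/* Hypothetical helper: block until any eventfd in efds is signaled.
 * Returns the index of the first signaled one, or -1 on timeout/error. */
static int wait_any(const int *efds, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT_OBJECTS];

    if (count <= 0 || count > MAX_WAIT_OBJECTS)
        return -1;

    for (int i = 0; i < count; i++) {
        pfds[i].fd = efds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN) {
            uint64_t value;

            /* Reading resets the counter, which consumes the signal. */
            if (read(efds[i], &value, sizeof value) != (ssize_t)sizeof value)
                return -1;
            return i;
        }
    }
    return -1;
}

/* Demo: create three events, signal the second one, then wait. */
static int wait_any_demo(void)
{
    int efds[3];
    int created = 0;
    int index = -1;

    for (; created < 3; created++) {
        efds[created] = eventfd(0, EFD_CLOEXEC);
        if (efds[created] < 0)
            break;
    }

    if (created == 3) {
        uint64_t one = 1;

        if (write(efds[1], &one, sizeof one) == (ssize_t)sizeof one)
            index = wait_any(efds, 3, 1000);
    }

    for (int i = 0; i < created; i++)
        close(efds[i]);
    return index;
}
```

Every wait and every signal here is a syscall, which is the overhead a futex-based wait with a userspace fast path would aim to avoid.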

Great, I'll try to take a look. I'm concerned about not having a fast path for in-userspace locking, but it should still be better than eventfd + epoll. I haven't seen FUTEX_WAIT_MULTIPLE, but it will need to be merged first.

This article about using io_uring in modern C++ (with coroutines et al) is a pretty good read and gives some API insights, too: https://cor3ntin.github.io/posts/iouring/

A general update:

All prototyping is being done on https://github.com/tmds/Tmds.LinuxAsync, together with other experiments from #14304. We hope to see some numbers soon. After that we can think about the productization of the changes.

Is it possible to dupe-close one of these two issues, so that there is one main tracking issue?
https://github.com/dotnet/runtime/issues/12650
