Runtime: Proposal: Zero allocation connectionless sockets

Created on 9 Sep 2019 · 49 Comments · Source: dotnet/runtime

This proposal eliminates allocations for connectionless use of Socket. It augments the SocketAddress class to allow reuse across operations, becoming a high-perf alternative to EndPoint.

Rationale and Usage

APIs which need to translate between IPEndPoint and native sockaddr structures are performing a large amount of defensive copying and layering workarounds.

This affects UDP performance and contributes to excessive GC. A simple example is:

```c#
Socket socket = ...;
byte[] buffer = ...;
// ReceiveFrom takes "ref EndPoint", so the variable must be typed as EndPoint.
EndPoint remoteEndPoint = new IPEndPoint(IPAddress.Any, 0);

socket.ReceiveFrom(buffer, 0, buffer.Length, SocketFlags.None, ref remoteEndPoint);
socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteEndPoint);
```

These two calls allocate 12 times:
* 3x `IPEndPoint`
* 3x `IPAddress`
* 3x `SocketAddress`
* 3x `byte[]` ...6x if IPv6

See also: dotnet/runtime#30196 

New usage has 0 allocations:

```c#
Socket socket = ...;
byte[] buffer = ...;
var remoteAddress = new SocketAddress(AddressFamily.InterNetwork);

socket.ReceiveFrom(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);
socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);
```

Proposed API

```c#
class Socket
{
    public int ReceiveFrom(Span<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
    public int SendTo(ReadOnlySpan<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
}

class SocketTaskExtensions
{
    public static ValueTask<int> ReceiveFromAsync(this Socket socket, Memory<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
    public static ValueTask<int> SendToAsync(this Socket socket, ReadOnlyMemory<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
}

class SocketAsyncEventArgs
{
    // Only one of RemoteEndPoint or RemoteAddress must be specified.
    public SocketAddress RemoteAddress { get; set; }
}

class SocketAddress
{
    // If we can merge System.Net.Primitives and System.Net.Sockets, these two methods are unnecessary. That would be ideal.
    public static void GetBuffer(SocketAddress address, out byte[] buffer);
    public static void SetSockaddrSize(SocketAddress address, int size);
}

class EndPoint
{
    // Already has "SocketAddress Serialize()"; default would be to call that and copy.
    public virtual void SerializeTo(SocketAddress socketAddress);
}
```

Details

  • It is intended that UDP servers will use SocketAddress as a dictionary key to lookup client state, to avoid first converting to EndPoint.

    • It is assumed that users will only rarely care to actually get the IP/port/etc. from the SocketAddress. This duty continues to be delegated to EndPoint.

    • If users ever want to deserialize a SocketAddress into an EndPoint, they can already use EndPoint.Create.

    • Need to ensure only the actual sockaddr structure is compared/hashed, not the entire byte buffer.

  • SocketAddress is currently duplicated in System.Net.Primitives and System.Net.Sockets to avoid exposing its internal buffer. This change will allow avoiding duplication.
  • This relies on users not using a SocketAddress until the I/O is finished.

    • This is a bit safer with EndPoint as we can take defensive copies before methods return.

  • We can currently do some optimizations to avoid all allocations for SendTo and SendToAsync IPv4/IPv6 with some special casing, so this API would primarily be to optimize ReceiveFrom variants as well as (less important) allowing non-IPv4/IPv6 protocols to benefit. Still, if we were to add an API for ReceiveFrom we would probably want an API on SendTo for symmetry.
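The dictionary-key idea from the first bullet can be sketched with APIs that already exist. This is only an illustration: the proposed `ReceiveFrom` overload is stood in for by `IPEndPoint.Serialize()`, and `CopyOf` is a hypothetical helper, not part of .NET. It assumes that `SocketAddress` keeps its existing value-based `Equals`/`GetHashCode`. Because the proposal reuses one `SocketAddress` across receives, the sketch stores a copy as the key rather than the instance that would be overwritten.

```c#
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: snapshot an address so the dictionary key is not
// the instance that a later receive would overwrite.
static SocketAddress CopyOf(SocketAddress source)
{
    var copy = new SocketAddress(source.Family, source.Size);
    for (int i = 0; i < source.Size; i++)
        copy[i] = source[i];
    return copy;
}

var clients = new Dictionary<SocketAddress, string>();

// Stand-in for the address a receive call would have filled in.
SocketAddress received = new IPEndPoint(IPAddress.Loopback, 5000).Serialize();

// First packet from this peer: remember it under a private copy of the key.
if (!clients.TryGetValue(received, out var state))
    clients.Add(CopyOf(received), "client #1");

// A later packet from the same peer arrives in a different instance,
// but the lookup still succeeds because equality is by content.
SocketAddress receivedAgain = new IPEndPoint(IPAddress.Loopback, 5000).Serialize();
bool known = clients.TryGetValue(receivedAgain, out var knownState);
Console.WriteLine($"{known}: {knownState}");
```

Only the first packet from a new peer pays for the copy; steady-state lookups would reuse the receive-side instance directly.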

Open Questions

  • We've put some effort into not doing something like this before. It would be great to understand why. Currently:

    • We duplicate the SocketAddress class in multiple assemblies to avoid exposing its buffer, and have a step to marshal (byte-by-byte) between the two implementations.

    • Tons of APIs take EndPoint, it's a nice abstraction that we wanted here despite performance implications.

  • It isn't immediately obvious from the API surface that socketAddress is written to by ReceiveFrom. Is there a better way we can indicate this?
  • The two new methods on SocketAddress exist purely because System.Net.Sockets needs access to internals in System.Net.Primitives. Any thoughts on how to avoid exposing these "pubternal" bits?

    • One option is to merge System.Net.Primitives and System.Net.Sockets; I don't see harm in this but that is a much larger discussion :)

  • If we merge the Primitives and Sockets assemblies, we can get rid of some of the allocations for ReceiveFrom without making any API changes. It's not a perfect solution but might be good enough.

Related Issues

There are two additional issues to update our APIs with ValueTask/Span/Memory that this will need to be consistent with:

  • Update Socket class dotnet/runtime#938
  • Update UdpClient class dotnet/runtime#864
api-suggestion area-System.Net.Sockets

All 49 comments

If the core issue is repeated conversions from EndPoint to SocketAddress, why not cache it on the EndPoint instead?

And aren't some of the allocations you mention a side-effect of what used to be one implementation being split between multiple assemblies? If that split is resulting in a perf impact, we should consider moving types into the same assembly to avoid the overhead, rather than adding new APIs.

Also, we separately want ValueTask/Memory-based APIs for UDP. Are you proposing that these be them? Would we not also have EndPoint-based APIs?

This feels to me like it's breaking an abstraction that shouldn't be broken, and we should have exhausted every other avenue before doing so.

> If the core issue is repeated conversions from EndPoint to SocketAddress, why not cache it on the EndPoint instead?

Yes, I considered this option. We've been careful to make defensive copies when using EndPoint, which indicates to me that this could be a subtly breaking behavior change if sendto was suddenly not doing this. Also, given that EndPoint has Serialize and Create to enable EndPoint / sockaddr translation, it looks like a decision was made to explicitly not bake things into a single class -- I'd love some background knowledge on this if you've got it.

> And aren't some of the allocations you mention a side-effect of what used to be one implementation being split between multiple assemblies? If that split is resulting in a perf impact, we should consider moving types into the same assembly to avoid the overhead, rather than adding new APIs.

Agreed, this is one of my open questions. I don't really see a downside to merging at least System.Net.Primitives and System.Net.Sockets, but I don't have the background on why they were split in the first place. If you have any knowledge here it would be helpful. Doing so would help us cut out many of these allocations.

> Also, we separately want ValueTask/Memory-based APIs for UDP. Are you proposing that these be them?

No, that is not part of this proposal. We should of course make both proposals consistent if it moves forward.

> This feels to me like it's breaking an abstraction that shouldn't be broken, and we should have exhausted every other avenue before doing so.

Agreed, this is one of my open questions. Hoping to see some collaboration on those potential avenues.

Related:

  • span-based UDP using EndPoint dotnet/runtime#938.
  • span-based UdpClient dotnet/runtime#864

User may not care about the port, but you always have to specify one for sending.
`socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);`
will not work unless you call Connect() first. And when you do, you do not need the remoteAddress/remoteEndPoint at all. Did I miss something?

> If you have any knowledge here it would be helpful.

I think there are two historical things going on here. @stephentoub may have more context.

(1) These APIs were originally designed a long time ago and at the time, minimizing allocations was not a major priority, unfortunately.
(2) During the assembly refactoring for .NET Core, some classes that were originally all in the same assembly got moved into separate assemblies. In some cases this has led to unfortunate code duplication and/or inefficiencies introduced to avoid private access to certain classes.

We can and should fix these issues. In other words, there's some bad legacy here we can improve on, I think.

It seems to me that we need to consider the send case and the receive case separately here.

On the send side, there's no reason we need to do any copies at all. All we need to do is produce the native sockaddr bytes when we do the native call. Currently, that means we need to call IPEndPoint.Serialize, which will allocate a SocketAddress etc. But if we just had something like IPEndPoint.Serialize(Span<byte> sockAddrBuffer), then I think we could avoid allocation in this path entirely without changing the top-level API. I'm glossing over details here, so please correct me if this is more complicated than I'm describing.
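A rough sketch of what serializing straight into a caller-supplied span could look like for IPv4. `WriteIPv4SockAddr` is a hypothetical helper, not a .NET API, and the layout is an assumption: a 16-byte `sockaddr_in` as found on Linux and Windows on a little-endian host (BSD-derived systems such as macOS use a length byte plus a one-byte family instead).

```c#
using System;
using System.Buffers.Binary;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper: writes an assumed Linux/Windows sockaddr_in layout
// into the destination span and returns the number of bytes written.
static int WriteIPv4SockAddr(IPEndPoint endPoint, Span<byte> destination)
{
    const int SockAddrInSize = 16;
    if (endPoint.AddressFamily != AddressFamily.InterNetwork)
        throw new ArgumentException("IPv4 endpoint expected.", nameof(endPoint));
    if (destination.Length < SockAddrInSize)
        throw new ArgumentException("Destination too small.", nameof(destination));

    destination.Slice(0, SockAddrInSize).Clear();

    // sin_family: host byte order (little-endian assumed); AF_INET is 2.
    BinaryPrimitives.WriteUInt16LittleEndian(destination, (ushort)AddressFamily.InterNetwork);
    // sin_port: network byte order.
    BinaryPrimitives.WriteUInt16BigEndian(destination.Slice(2), (ushort)endPoint.Port);
    // sin_addr: the four address bytes, already in network order.
    if (!endPoint.Address.TryWriteBytes(destination.Slice(4), out int addressBytes) || addressBytes != 4)
        throw new InvalidOperationException("Could not write the IPv4 address.");

    return SockAddrInSize;
}

// The sockaddr bytes live on the stack; no SocketAddress or byte[] is allocated.
Span<byte> sockaddr = stackalloc byte[16];
int written = WriteIPv4SockAddr(new IPEndPoint(IPAddress.Parse("192.0.2.10"), 4242), sockaddr);
Console.WriteLine(Convert.ToHexString(sockaddr.Slice(0, written)));
```

A real implementation would have to go through the platform abstraction layer rather than hard-code one layout, which is part of the detail glossed over above.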

On the receive side, we ultimately need to return the endpoint info somehow. In your proposal we're mutating an existing SocketAddress, so you are correct that this API does not allocate; but presumably every user of this API will convert the SocketAddress into an IPEndPoint, which will cause allocation at that point, right? I'm not sure what to do about this, but if the goal is truly zero allocation then we need to consider the common usage, not just the API itself.

> User may not care about the port, but you always have to specify one for sending.

@wfurt SocketAddress is an entire sockaddr_in/etc. structure, it includes a port.

> presumably every user of this API will convert the SocketAddress into an IPEndPoint, which will cause allocation at that point, right?

@geoffkizer I don't think this needs to be a common use. Really you'd use the address to lookup a client's state in a dictionary -- you don't need an EndPoint to do this, but can instead use SocketAddress directly to avoid any conversions. Outside of this, the main reason I can think of to want ip/port is for display purposes... in which case, yes, you'd need to convert to IPEndPoint to get.

OK, that would make sense. I looked at the .NET API and did not see a port there. Would you envision adding a Port property to it as well?

> Would you envision adding a Port property to it as well?

Note it doesn't have an IP property either.

My vision is to keep SocketAddress opaque, and force users to use EndPoint.Serialize and EndPoint.Create to shuffle to/from EndPoint if they need to manipulate the IP/port. My expectation is that this would not be done once per op, but rather once per endpoint at the start of a "connection", so it's not needed.

> if we just had something like IPEndPoint.Serialize(Span<byte> sockAddrBuffer)

@geoffkizer yea, I thought of that. I think keeping the APIs symmetrical is worthwhile, though. Plus, there's some (probably ultimately inconsequential, but...) benefit to avoiding a SerializeTo(span) overhead on every call to SendTo. Less important now if we end up taking another direction with QUIC, but that's a fair amount of redundant calls for such a packet-heavy thing.

A couple related questions that I don't quite understand:

(1) Why does SocketAddress even exist today? Are there any APIs that take it? Is it just so a user can get the bytes for the native sockaddr structure? When would they even need to do this?
(2) Why is the IPEndPoint on ReceiveFrom passed by ref, instead of out?

> Really you'd use the address to lookup a client's state in a dictionary -- you don't need an EndPoint to do this, but can instead use SocketAddress directly to avoid any conversions.

Yeah, that's a good point.

> I think keeping the APIs symmetrical is worthwhile, though.

I agree but I'm not quite sure the best way to do this.

> (1) Why does SocketAddress even exist today? Are there any APIs that take it? Is it just so a user can get the bytes for the native sockaddr structure? When would they even need to do this?

I think it is to allow custom implementations of EndPoint to serialize random non-IP sockaddr without 1st party support.

@scalablecory Yeah that makes sense.

> (2) Why is the IPEndPoint on ReceiveFrom passed by ref, instead of out?

Looks like it's to work within EndPoint's design:

  • To know how much to allocate for a sockaddr.
  • To then call EndPoint.Create (instance method, that knows how to deserialize for a specific sockaddr format) to create a new EndPoint based on the sockaddr you get from recvfrom.
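The two roles of the by-ref endpoint can be illustrated with public APIs alone. This is a sketch of the pattern, not the actual `Socket` internals: the native receive is replaced by serializing a known peer, and the variable names are made up for the example.

```c#
using System;
using System.Net;

// The endpoint passed by ref is only a template: its address and port are
// ignored, but its type and address family drive sizing and deserialization.
EndPoint template = new IPEndPoint(IPAddress.IPv6Any, 0);

// Role 1: find out how large a sockaddr buffer this address family needs.
SocketAddress scratch = template.Serialize();
Console.WriteLine($"{scratch.Family}: {scratch.Size} bytes");

// Stand-in for the OS filling in the sender's address during a receive.
SocketAddress fromWire = new IPEndPoint(IPAddress.IPv6Loopback, 4242).Serialize();

// Role 2: Create is an instance method, so the template decides how the
// raw sockaddr is turned back into a (newly allocated) EndPoint.
EndPoint sender = template.Create(fromWire);
Console.WriteLine(sender);
```

Every step here allocates, which is the cost the proposal tries to avoid on the receive path.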

Ok, but you already had to bind the socket to a LocalEndPoint, right? So why can't you figure that out from the LocalEndPoint?

Good point. I can't account for that.

Perhaps just an oversight in the design, or LocalEndPoint was made after ReceiveFrom was made.

Has it been measured what performance impact these allocations have? I wonder if it matters with real network communication, even just from a CPU usage standpoint.

@GSPP unfortunately we only have synthetic benchmarks to look at right now. Having good real-world benchmarks for Sockets is a long-term goal. Realistically, high-frequency protocols like QUIC, uTP, and latency-sensitive games (Unity engine) will see excessive gen-0 GC.

@GSPP In https://github.com/dotnet/corefx/issues/39317 I said that in our game server 40% of our allocations come from the UDP Socket. It is a disaster for us because we use GC.TryStartNoGCRegion and maintenance mode.

@scalablecory I do think this is the right general approach. That is, have APIs that take/return a "sockaddr"-like argument.

I think my main concern here is that the way SocketAddress works is a little weird. One example (that you pointed out) is it's not obvious that ReceiveFrom would fill in the provided SocketAddress. But I'm not sure what to do about this.

One other thought.

It seems like there's some low-hanging fruit here in terms of allocation that we could improve without any new actual API. This would help issues like dotnet/runtime#30196. Seems like we should try to make this better first before we introduce new API; that will provide benefit to existing customers.

@geoffkizer thanks, I appreciate the thought you're putting into this.

> One example (that you pointed out) is it's not obvious that ReceiveFrom would fill in the provided SocketAddress. But I'm not sure what to do about this.

Yea I'm not sure either. I wonder if we have any similar APIs we can lean on for inspiration... I'll have to go looking. @bartonjs can you lend us your API design muscle? If we have these two APIs, can you think of a clean way to help us distinguish between SendTo which only reads a SocketAddress, and ReceiveFrom which writes to it?

`public int ReceiveFrom(Span<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);`
`public int SendTo(ReadOnlySpan<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);`

> It seems like there's some low-hanging fruit here in terms of allocation that we could improve without any new actual API. This would help issues like dotnet/runtime#30196. Seems like we should try to make this better first before we introduce new API; that will provide benefit to existing customers.

I've created dotnet/corefx#41039 to merge Primitives and Sockets, which would allow us to get rid of most of these allocations without an API change (and without using the InternalsVisibleTo escape hatch).

Let's decide on that one before moving forward with this one.

Can you help me understand why merging the assemblies helps?

Just looking at the SendTo code....

(1) We always make a "snapshot", i.e. copy, of the IPEndPoint -- which means a copy of the IPAddress as well. We only actually use the snapshot when _rightEndPoint is null, which I believe is uncommon. (I'm actually kind of unclear why we need _rightEndPoint at all here, but let's assume we do.) So at the very least we could just avoid the copy unless _rightEndPoint is null. (There's also some weirdness around dual mode that I'm going to ignore for now.)
(2) We then create an Internals.SocketAddress from the endpoint. But it looks like we special case IPEndPoint here already so we don't need to create a "real" SocketAddress first, anyway. So it doesn't seem like assembly merging actually helps here.
(3) We can and should avoid allocating an Internals.SocketAddress here entirely by reworking the code to generate the appropriate native sockaddr directly on the stack. This could work a couple different ways but it seems doable.

So I think for SendTo at least, we could get to 0 allocation without any API changes at all.

ReceiveFrom is harder, and as the API works it will still require at least an allocation for the returned IPEndPoint and IPAddress. But it seems like we could do better there too.

I gotta say, I really don't understand why we set _rightEndPoint in SendTo.

The main purpose of _rightEndPoint seems to be to store the local endpoint. But in SendTo, we're setting it to the remote endpoint. This seems really weird.

> I gotta say, I really don't understand why we set _rightEndPoint in SendTo.
>
> The main purpose of _rightEndPoint seems to be to store the local endpoint. But in SendTo, we're setting it to the remote endpoint. This seems really weird.

We're storing the EndPoint into _rightEndPoint for use in both LocalEndPoint and RemoteEndPoint, but don't actually use the specific address it holds. _rightEndPoint.Serialize() is used to create a SocketAddress, which has a buffer of the correct size to use for getsockname and getpeername. We can then use _rightEndPoint.Create to deserialize that sockaddr into a new EndPoint.

> it looks like we special case IPEndPoint here already so we don't need to create a "real" SocketAddress first

We do, inside of IPEndPointExtensions.Serialize.

> We can and should avoid allocating an Internals.SocketAddress here entirely by reworking the code to generate the appropriate native sockaddr directly on the stack. This could work a couple different ways but it seems doable.

Yes, we could do this. EndPoint -> SocketAddress -> stack. You still have that allocation in the middle, and if we're async we can't use stack so one more allocation. But we'd still save from having an Internals.SocketAddress allocation.

One alternate to avoid all allocation for (sync) sendto is to add a public EndPoint.SerializeTo(Span<byte>). But, I would only want to do this if we didn't go forward with a full reusable SocketAddress support because it would be redundant at that point. It would also not help for async.

> Yes, we could do this. EndPoint -> SocketAddress -> stack. You still have that allocation in the middle,

We are already special-casing IPEndPoint, so it seems to me we could just special case it directly from IPEndPoint -> stack.

> if we're async we can't use stack so one more allocation.

We can still use the stack for async. It gets a little weird on Linux because we need to hold on to the IPEndPoint and generate the actual sockaddr on the stack each time we call into the OS, but I think that's fine.

> We can still use the stack for async.

Hrm for some reason I had it in my mind that WSASendTo required the address to stay alive, but it appears I'm wrong, or at least I can't find documentation supporting this. That solves one problem.

> Can you help me understand why merging the assemblies helps?

For SendTo and SendToAsync we can add an EndPoint.SerializeTo(Span<byte>) as an internal API to avoid allocating any SocketAddress.

For ReceiveFrom we can add an internal EndPoint.CreateFrom(ReadOnlySpan<byte>). We'd still need to allocate an EndPoint and its innards, but it's still better than what we're currently doing.

For ReceiveFromAsync we can write directly to the internal buffer of SocketAddress, saving us from doing the SocketAddress -> Internals.SocketAddress -> SocketAddress shuffle.

> We are already special-casing IPEndPoint, so it seems to me we could just special case it directly from IPEndPoint -> stack.

Good point. We'd have to make sure nobody is doing any weird overriding of IPEndPoint.Serialize, but it seems that with SendTo we can do without any allocation. As you say, it is only ReceiveFrom to worry about then, and I don't think that is solvable without a new API because we always need to allocate an EndPoint, an IPAddress, and, if IPv6, a byte[].

Ah, I see.

Still, if those are internal, then 3rd parties can't implement them. So why is that preferable to just special casing IPEndPoint?

(edit: sorry, missed your last reply; sounds like we are in agreement here)

A central problem is that Socket needs to create EndPoint instances to receive UDP packets. Maybe API users could opt into a cache for those endpoint instances. Here is a raw sketch:

class EndPointCache
{
    ConcurrentDictionary<sockaddr_in, EndPoint> items; //cache contents

    internal EndPoint CreateEndPoint(sockaddr_in addr);
}

class Socket
{
    public EndPointCache EndPointCache { get; set; } //set this property to opt in
}

Callers can create one EndPointCache and share it across Socket instances. This design would preserve the EndPoint abstraction. It would not eliminate allocations. But it seems that the number of distinct endpoints in UDP communication typically is far lower than the number of packets. If this achieves a 10x reduction in allocations, those remaining allocations likely would not matter much.

I think an endpoint cache would be a great approach here, if EndPoint objects were immutable. Unfortunately, they are not, so it's not really possible to cache them.

I wish EndPoint objects were immutable because it would solve other problems too. I suppose we could introduce a whole new set of immutable EndPoint classes and APIs that take them, but that seems like a whole lot of churn.
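A minimal illustration of why mutability gets in the way of caching. The cache here is a plain dictionary with a made-up string key, purely for demonstration, and is not the `EndPointCache` sketched above.

```c#
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical cache that hands out shared IPEndPoint instances.
var cache = new Dictionary<string, IPEndPoint>
{
    ["peer-a"] = new IPEndPoint(IPAddress.Parse("192.0.2.10"), 4242),
};

// One caller receives the cached instance and changes it;
// IPEndPoint.Address and IPEndPoint.Port both have public setters.
IPEndPoint handedToFirstCaller = cache["peer-a"];
handedToFirstCaller.Port = 9999;

// Every later caller now observes the modified endpoint.
IPEndPoint handedToSecondCaller = cache["peer-a"];
Console.WriteLine(handedToSecondCaller.Port); // prints 9999
```

Handing out defensive copies would keep the cache consistent, but that reintroduces the per-packet allocation the cache was meant to remove.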

I didn't realize (or expect) that these classes were mutable. There's too much mutable state in the framework. I think the value of immutability and functional programming patterns was less well understood 20 years ago (by everyone, not just the framework designers).

I wonder how many people assign to IPEndPoint.Address and IPEndPoint.Port. A bit of an extreme option, but we could always make these throw and enforce the class as immutable.

If we had an opaque readonly struct, immutable, we'd need to make it ~32 bytes large for IPv6, or 128 bytes large if we want to support all protocols and match sockaddr_storage. 32 bytes wouldn't be terrible from a copying perspective, but 128 bytes are a bit much.

I'm not strongly against it but I think it would be a shame to have three separate types -- EndPoint, SocketAddress, and this new ImmutableSocketAddress to accomplish more or less the same thing.

Triage: Does not seem to be critical - Future.

@scalablecory What's required to progress this proposed API?

@macaba it will likely be required if we get a managed QUIC implementation. Otherwise, I'd love to see some good customer stories here.

I'm the architect of a virtual VHF radio simulation backend (for virtual air traffic control).

We typically get around 20k UDP packets per second (20-30Mbps).

There is a future feature upgrade coming in that will probably push this up to 50-70k UDP packets per second.

I'm looking to improve the efficiency as I want to keep the server cost increase to a minimum when packet counts go up.

We have a service acting as a video stream redirector, receiving a stream from some network interface via some transport, and retransmitting it over another, possibly changing the transport.

Currently (on .NET Framework 4.8), we are fully limited by GC throughput at around 800 Mbps worth of stream redirection bandwidth. We spend more than 50% of the time paused in GC, and we can't keep up with the incoming data, which causes packet loss. We have pooled pretty much everything under our control, and 95%+ of the remaining allocations are caused by Socket. In our case we currently rely on the ReceiveFromAsync and BeginSend codepaths.

I understand some work was done in .NET Core that will certainly reduce those allocations, but the big per-receive allocations are still present: SocketAddress, IPEndPoint, etc. Having a better way to do high-throughput transmission on UDP sockets would help us a lot.

Same thing here, we are developing a video game.

I am implementing QUIC in C#, and most of the allocated memory really does come from the SendTo and ReceiveFrom methods. Having allocation-free send/receive methods would really help.

Our use case - we have a managed implementation of SCTP - we've managed to reduce allocations down to the bare minimum, and the remaining allocations are almost all within Socket (which we're using in raw mode).

If you want a temporary solution for UDP sockets, here are our abstractions over a portable native API.
