This proposal eliminates allocations for connectionless use of `Socket`. It augments the `SocketAddress` class to allow reuse across operations, becoming a high-perf alternative to `EndPoint`.
APIs which need to translate between `IPEndPoint` and native `sockaddr` structures are performing a large amount of defensive copying and layering workarounds. This affects UDP performance and contributes to excessive GC. A simple example is:
```c#
Socket socket = ...;
byte[] buffer = ...;
// Declared as EndPoint because ReceiveFrom takes a `ref EndPoint`.
EndPoint remoteEndPoint = new IPEndPoint(IPAddress.Any, 0);
socket.ReceiveFrom(buffer, 0, buffer.Length, SocketFlags.None, ref remoteEndPoint);
socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteEndPoint);
```
These two calls allocate 12 times:
* 3x `IPEndPoint`
* 3x `IPAddress`
* 3x `SocketAddress`
* 3x `byte[]` ...6x if IPv6
See also: dotnet/runtime#30196
New usage has 0 allocations:
```c#
Socket socket = ...;
byte[] buffer = ...;
var remoteAddress = new SocketAddress(AddressFamily.InterNetwork);
socket.ReceiveFrom(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);
socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);
```
```c#
class Socket
{
    public int ReceiveFrom(Span<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
    public int SendTo(ReadOnlySpan<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
}
class SocketTaskExtensions
{
    public static ValueTask
    public static ValueTask
}
class SocketAsyncEventArgs
{
    // Only one of RemoteEndPoint or RemoteAddress must be specified.
    public SocketAddress RemoteAddress { get; set; }
}
class SocketAddress
{
    // If we can merge System.Net.Primitives and System.Net.Sockets, these two methods are unnecessary. That would be ideal.
    public static void GetBuffer(SocketAddress address, out byte[] buffer);
    public static void SetSockaddrSize(SocketAddress address, int size);
}
class EndPoint
{
    // Already has "SocketAddress Serialize()"; default would be to call that and copy.
    public virtual void SerializeTo(SocketAddress socketAddress);
}
```
* […] `SocketAddress` as a dictionary key to lookup client state, to avoid first converting to `EndPoint`.
* […] `SocketAddress`. This duty continues to be delegated to `EndPoint`.
* […] `SocketAddress` into an `EndPoint`, they can already use `EndPoint.Create`.
* `SocketAddress` is currently duplicated in System.Net.Primitives and System.Net.Sockets to avoid exposing its internal buffer. This change will allow avoiding duplication.
* […] `SocketAddress` until the I/O is finished.
* […] `EndPoint` as we can take defensive copies before methods return.
* […] `SendTo` and `SendToAsync` […] IPv4/IPv6 with some special casing, so this API would primarily be to optimize `ReceiveFrom` variants as well as (less important) allowing non-IPv4/IPv6 protocols to benefit. Still, if we were to add an API for `ReceiveFrom` we would probably want an API on `SendTo` for symmetry.
* […] `SocketAddress` class in multiple assemblies to avoid exposing its buffer, and have a step to marshal (byte-by-byte) between the two implementations.
* […] `EndPoint`, it's a nice abstraction that we wanted here despite performance implications.
* `socketAddress` is written to by `ReceiveFrom`. Is there a better way we can indicate this?
* […] `SocketAddress` exist purely because System.Net.Sockets needs access to internals in System.Net.Primitives. Any thoughts on how to avoid exposing these "pubternal" bits?
* […] `ReceiveFrom` without making any API changes. It's not a perfect solution but might be good enough.

There are two additional issues to update our APIs with `ValueTask`/`Span`/`Memory` that this will need to be consistent with:
* `Socket` class dotnet/runtime#938
* `UdpClient` class dotnet/runtime#864

If the core issue is repeated conversions from EndPoint to SocketAddress, why not cache it on the EndPoint instead?
And aren't some of the allocations you mention a side-effect of what used to be one implementation being split between multiple assemblies? If that split is resulting in a perf impact, we should consider moving types into the same assembly to avoid the overhead, rather than adding new APIs.
Also, we separately want ValueTask/Memory-based APIs for UDP. Are you proposing that these be them? Would we not also have EndPoint-based APIs?
This feels to me like it's breaking an abstraction that shouldn't be broken, and we should have exhausted every other avenue before doing so.
> If the core issue is repeated conversions from EndPoint to SocketAddress, why not cache it on the EndPoint instead?

Yes, I considered this option. We've been careful to make defensive copies when using `EndPoint`, which indicates to me that this could be a subtly breaking behavior change if `sendto` was suddenly not doing this. Also, given that `EndPoint` has `Serialize` and `Create` to enable `EndPoint` / `sockaddr` translation, it looks like a decision was made to explicitly not bake things into a single class -- I'd love some background knowledge on this if you've got it.
> And aren't some of the allocations you mention a side-effect of what used to be one implementation being split between multiple assemblies? If that split is resulting in a perf impact, we should consider moving types into the same assembly to avoid the overhead, rather than adding new APIs.

Agreed, this is one of my open questions. I don't really see a downside to merging at least System.Net.Primitives and System.Net.Sockets, but I don't have the background on why they were split in the first place. If you have any knowledge here it would be helpful. Doing so would help us cut out many of these allocations.
> Also, we separately want ValueTask/Memory-based APIs for UDP. Are you proposing that these be them?

No, that is not part of this proposal. We should of course make both proposals consistent if it moves forward.
> This feels to me like it's breaking an abstraction that shouldn't be broken, and we should have exhausted every other avenue before doing so.

Agreed, this is one of my open questions. Hoping to see some collaboration on those potential avenues.
Related:
* `EndPoint` dotnet/runtime#938
* `UdpClient` dotnet/runtime#864

User may not care about port but you always have to specify one for sending.
```c#
socket.SendTo(buffer, 0, buffer.Length, SocketFlags.None, remoteAddress);
```
will not work unless you use Connect() before. And when you do, you do not need the remoteAddress/remoteEndPoint at all. Did I miss something?
> If you have any knowledge here it would be helpful.

I think there are two historical things going on here. @stephentoub may have more context.
(1) These APIs were originally designed a long time ago and at the time, minimizing allocations was not a major priority, unfortunately.
(2) During the assembly refactoring for .NET Core, some classes that were originally all in the same assembly got moved into separate assemblies. In some cases this has led to unfortunate code duplication and/or inefficiencies introduced to avoid private access to certain classes.
We can and should fix these issues. In other words, there's some bad legacy here we can improve on, I think.
It seems to me that we need to consider the send case and the receive case separately here.
On the send side, there's no reason we need to do any copies at all. All we need to do is produce the native sockaddr bytes when we do the native call. Currently, that means we need to call `IPEndPoint.Serialize`, which will allocate a `SocketAddress` etc. But if we just had something like `IPEndPoint.Serialize(Span<byte> sockAddrBuffer)` […]
On the receive side, we ultimately need to return the endpoint info somehow. In your proposal we're mutating an existing SocketAddress, so you are correct that this API does not allocate; but presumably every user of this API will convert the SocketAddress into an IPEndPoint, which will cause allocation at that point, right? I'm not sure what to do about this, but if the goal is truly zero allocation then we need to consider the common usage, not just the API itself.
> User may not care about port but you always have to specify one for sending.

@wfurt `SocketAddress` is an entire `sockaddr_in`/etc. structure, it includes a port.
> presumably every user of this API will convert the SocketAddress into an IPEndPoint, which will cause allocation at that point, right?

@geoffkizer I don't think this needs to be a common use. Really you'd use the address to lookup a client's state in a dictionary -- you don't need an `EndPoint` to do this, but can instead use `SocketAddress` directly to avoid any conversions. Outside of this, the main reason I can think of to want ip/port is for display purposes... in which case, yes, you'd need to convert to `IPEndPoint` to get them.
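
A minimal sketch of the dictionary-lookup pattern described above, using only members `SocketAddress` already exposes (`Family`, `Size`, the byte indexer, and its `Equals`/`GetHashCode` overrides). `ClientState`, `ClientTable`, and `GetOrAdd` are hypothetical names for illustration only. Because the proposal has receive calls overwrite a reused `SocketAddress`, the sketch assumes the table must store its own copy of the key on a miss.

```c#
using System.Collections.Generic;
using System.Net;

// Hypothetical per-client state; not part of any .NET API.
sealed class ClientState
{
    public long PacketsSeen;
}

// Hypothetical helper that maps a peer's SocketAddress to its state.
sealed class ClientTable
{
    private readonly Dictionary<SocketAddress, ClientState> _clients =
        new Dictionary<SocketAddress, ClientState>();

    public int Count => _clients.Count;

    // 'scratch' stands in for the reusable instance a receive call would fill in.
    public ClientState GetOrAdd(SocketAddress scratch)
    {
        // Hit: SocketAddress compares by content, so no conversion to EndPoint is needed.
        if (_clients.TryGetValue(scratch, out ClientState state))
            return state;

        // Miss: store a private copy, because 'scratch' will be overwritten later.
        var key = new SocketAddress(scratch.Family, scratch.Size);
        for (int i = 0; i < scratch.Size; i++)
            key[i] = scratch[i];

        state = new ClientState();
        _clients.Add(key, state);
        return state;
    }
}
```

With this shape, only the first packet from a new peer allocates (the key copy and the state object); later packets from the same peer are a plain dictionary hit.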
OK, that would make sense. I looked at the .NET API and did not see a port there. Would you envision adding a Port property to it as well?
> Would you envision adding Port property to it as well?

Note it doesn't have an IP property either.
My vision is to keep `SocketAddress` opaque, and force users to use `EndPoint.Serialize` and `EndPoint.Create` to shuffle to/from `EndPoint` if they need to manipulate the IP/port. My expectation is that this would not be done once per op, but rather once per endpoint at the start of a "connection", so it's not needed.
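
For reference, the existing round trip between `EndPoint` and `SocketAddress` looks roughly like the sketch below. `EndPoint.Serialize()` and `EndPoint.Create(SocketAddress)` are existing APIs; `EndPointRoundTrip` and `RoundTrip` are hypothetical names used only for this illustration. Both directions allocate, which fits the expectation above that the conversion happens once per peer rather than once per packet.

```c#
using System.Net;

static class EndPointRoundTrip
{
    // Hypothetical helper: EndPoint -> SocketAddress -> EndPoint.
    public static IPEndPoint RoundTrip(IPEndPoint endPoint)
    {
        // Allocates a new SocketAddress holding the sockaddr bytes.
        SocketAddress address = endPoint.Serialize();

        // Create is an instance method: the receiver only supplies the
        // knowledge of how to decode this address family. It returns a
        // new EndPoint built from the bytes in 'address'.
        return (IPEndPoint)endPoint.Create(address);
    }
}
```

A caller that needs the IP/port for display would do this conversion on demand and otherwise keep passing the opaque `SocketAddress` around.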
> if we just had something like IPEndPoint.Serialize(Span<byte> sockAddrBuffer)

@geoffkizer yea, I thought of that. I think keeping the APIs symmetrical is worthwhile, though. Plus, there's some (probably ultimately inconsequential, but...) benefit to avoiding a `SerializeTo(span)` overhead on every call to `SendTo`. Less important now if we end up taking another direction with QUIC, but that's a fair amount of redundant calls for such a packet-heavy thing.
A couple related questions that I don't quite understand:
(1) Why does SocketAddress even exist today? Are there any APIs that take it? Is it just so a user can get the bytes for the native sockaddr structure? When would they even need to do this?
(2) Why is the IPEndPoint on ReceiveFrom passed by ref, instead of out?
> Really you'd use the address to lookup a client's state in a dictionary -- you don't need an EndPoint to do this, but can instead use SocketAddress directly to avoid any conversions.

Yeah, that's a good point.
> I think keeping the APIs symmetrical is worthwhile, though.

I agree but I'm not quite sure the best way to do this.
> (1) Why does SocketAddress even exist today? Are there any APIs that take it? Is it just so a user can get the bytes for the native sockaddr structure? When would they even need to do this?

I think it is to allow custom implementations of `EndPoint` to serialize random non-IP `sockaddr` without 1st party support.
@scalablecory Yeah that makes sense.
> (2) Why is the IPEndPoint on ReceiveFrom passed by ref, instead of out?

Looks like it's to work within `EndPoint`'s design:
* […] `sockaddr`.
* […] `EndPoint.Create` (instance method, that knows how to deserialize for a specific `sockaddr` format) to create a new `EndPoint` based on the `sockaddr` you get from `recvfrom`.

Ok, but you already had to bind the socket to a LocalEndPoint, right? So why can't you figure that out from the LocalEndPoint?
Good point. I can't account for that.
Perhaps just an oversight in the design, or `LocalEndPoint` was made after `ReceiveFrom` was made.
Has it been measured what performance impact these allocations have? I wonder if it matters with real network communication, even just from a CPU usage standpoint.
@GSPP unfortunately we only have synthetic benchmarks to look at right now. Having good real-world benchmarks for Sockets is a long-term goal. Realistically, high-frequency protocols like QUIC, uTP, and latency-sensitive games (Unity engine) will see excessive gen-0 GC.
@GSPP In https://github.com/dotnet/corefx/issues/39317 I said that in our game server 40% of our allocations are from the UDP `Socket`. It is a disaster for us because we use `GC.TryStartNoGCRegion` and maintenance mode.
@scalablecory I do think this is the right general approach. That is, have APIs that take/return a "sockaddr"-like argument.
I think my main concern here is that the way SocketAddress works is a little weird. One example (that you pointed out) is it's not obvious that ReceiveFrom would fill in the provided SocketAddress. But I'm not sure what to do about this.
One other thought.
It seems like there's some low-hanging fruit here in terms of allocation that we could improve without any new actual API. This would help issues like dotnet/runtime#30196. Seems like we should try to make this better first before we introduce new API; that will provide benefit to existing customers.
@geoffkizer thanks, I appreciate the thought you're putting into this.
> One example (that you pointed out) is it's not obvious that ReceiveFrom would fill in the provided SocketAddress. But I'm not sure what to do about this.

Yea I'm not sure either. I wonder if we have any similar APIs we can lean on for inspiration... I'll have to go looking. @bartonjs can you lend us your API design muscle? If we have these two APIs, can you think of a clean way to help us distinguish between `SendTo` which only reads a `SocketAddress`, and `ReceiveFrom` which writes to it?
```c#
public int ReceiveFrom(Span<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
public int SendTo(ReadOnlySpan<byte> buffer, SocketFlags socketFlags, SocketAddress socketAddress);
```
> It seems like there's some low-hanging fruit here in terms of allocation that we could improve without any new actual API. This would help issues like dotnet/runtime#30196. Seems like we should try to make this better first before we introduce new API; that will provide benefit to existing customers.

I've created dotnet/corefx#41039 to merge Primitives and Sockets, which would allow us to get rid of most of these allocations without an API change (and without using the `InternalsVisibleTo` escape hatch).
Let's decide on that one before moving forward with this one.
Can you help me understand why merging the assemblies helps?
Just looking at the SendTo code....
(1) We always make a "snapshot", i.e. copy, of the IPEndPoint -- which means a copy of the IPAddress as well. We only actually use the snapshot when _rightEndPoint is null, which I believe is uncommon. (I'm actually kind of unclear why we need _rightEndPoint at all here, but let's assume we do.) So at the very least we could just avoid the copy unless _rightEndPoint is null. (There's also some weirdness around dual mode that I'm going to ignore for now.)
(2) We then create an Internals.SocketAddress from the endpoint. But it looks like we special case IPEndPoint here already so we don't need to create a "real" SocketAddress first, anyway. So it doesn't seem like assembly merging actually helps here.
(3) We can and should avoid allocating an Internals.SocketAddress here entirely by reworking the code to generate the appropriate native sockaddr directly on the stack. This could work a couple different ways but it seems doable.
So I think for SendTo at least, we could get to 0 allocation without any API changes at all.
ReceiveFrom is harder, and as the API works it will still require at least an allocation for the returned IPEndPoint and IPAddress. But it seems like we could do better there too.
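
One way the "generate the native sockaddr directly on the stack" idea from point (3) could look for IPv4 is sketched below. This is an illustration under stated assumptions, not the runtime's implementation: `SockAddrWriter` and `TryWriteIPv4` are hypothetical names, and the layout assumed here (16 bytes total, a big-endian port at offset 2, the 4 address bytes at offset 4) follows the common `sockaddr_in` shape. The address-family bytes at offsets 0-1 differ by platform and are left zeroed in this sketch.

```c#
using System;
using System.Buffers.Binary;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper; not an existing .NET API.
static class SockAddrWriter
{
    // Assumed size of an IPv4 sockaddr_in.
    public const int IPv4Size = 16;

    // Writes port and address into a caller-provided buffer without allocating.
    public static bool TryWriteIPv4(IPEndPoint endPoint, Span<byte> destination)
    {
        if (endPoint.AddressFamily != AddressFamily.InterNetwork ||
            destination.Length < IPv4Size)
        {
            return false;
        }

        Span<byte> target = destination.Slice(0, IPv4Size);
        target.Clear(); // family bytes (0-1) and padding stay zero in this sketch

        // Port is stored in network byte order (big-endian).
        BinaryPrimitives.WriteUInt16BigEndian(target.Slice(2, 2), (ushort)endPoint.Port);

        // IPAddress.TryWriteBytes emits the address in network order.
        return endPoint.Address.TryWriteBytes(target.Slice(4, 4), out _);
    }
}
```

A synchronous caller could pass `stackalloc byte[SockAddrWriter.IPv4Size]` as the destination, so nothing is allocated on the heap for the send.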
I gotta say, I really don't understand why we set _rightEndPoint in SendTo.
The main purpose of _rightEndPoint seems to be to store the local endpoint. But in SendTo, we're setting it to the remote endpoint. This seems really weird.
> I gotta say, I really don't understand why we set _rightEndPoint in SendTo.
> The main purpose of _rightEndPoint seems to be to store the local endpoint. But in SendTo, we're setting it to the remote endpoint. This seems really weird.

We're storing the `EndPoint` into `_rightEndPoint` for use in both `LocalEndPoint` and `RemoteEndPoint`, but don't actually use the specific address it holds. `_rightEndPoint.Serialize()` is used to create a `SocketAddress`, which has a buffer of the correct size to use for `getsockname` and `getpeername`. We can then use `_rightEndPoint.Create` to deserialize that `sockaddr` into a new `EndPoint`.
> it looks like we special case IPEndPoint here already so we don't need to create a "real" SocketAddress first

We do, inside of `IPEndPointExtensions.Serialize`.
> We can and should avoid allocating an Internals.SocketAddress here entirely by reworking the code to generate the appropriate native sockaddr directly on the stack. This could work a couple different ways but it seems doable.

Yes, we could do this. `EndPoint` -> `SocketAddress` -> stack. You still have that allocation in the middle, and if we're async we can't use stack so one more allocation. But we'd still save from having an `Internals.SocketAddress` allocation.
One alternate to avoid all allocation for (sync) `sendto` is to add a public `EndPoint.SerializeTo(Span<byte>)`. But, I would only want to do this if we didn't go forward with a full reusable `SocketAddress` support because it would be redundant at that point. It would also not help for async.
> Yes, we could do this. EndPoint -> SocketAddress -> stack. You still have that allocation in the middle,

We are already special-casing IPEndPoint, so it seems to me we could just special case it directly from IPEndPoint -> stack.
> if we're async we can't use stack so one more allocation.

We can still use the stack for async. It gets a little weird on Linux because we need to hold on to the IPEndPoint and generate the actual sockaddr on the stack each time we call into the OS, but I think that's fine.
> We can still use the stack for async.

Hrm for some reason I had it in my mind that `WSASendTo` required the address to stay alive, but it appears I'm wrong, or at least I can't find documentation supporting this. That solves one problem.
> Can you help me understand why merging the assemblies helps?

For `SendTo` and `SendToAsync` we can add an `EndPoint.SerializeTo(Span<byte>)` as an internal API to avoid allocating any `SocketAddress`.
For `ReceiveFrom` we can add an internal `EndPoint.CreateFrom(ReadOnlySpan<byte>)`. We'd still need to allocate an `EndPoint` and its innards, but it's still better than what we're currently doing.
For `ReceiveFromAsync` we can write directly to the internal buffer of `SocketAddress`, saving us from doing the `SocketAddress` -> `Internals.SocketAddress` -> `SocketAddress` shuffle.
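
As a rough illustration of the receive-side idea, an `EndPoint.CreateFrom(ReadOnlySpan<byte>)`-style helper for IPv4 might look like the sketch below. `SockAddrReader` and `CreateIPv4From` are hypothetical names, and the byte layout (big-endian port at offset 2, address at offset 4, 16 bytes total) is an assumption based on the common `sockaddr_in` shape. It still allocates the `IPEndPoint` and its `IPAddress`, but no intermediate `SocketAddress`.

```c#
using System;
using System.Buffers.Binary;
using System.Net;

// Hypothetical helper; not an existing .NET API.
static class SockAddrReader
{
    // Assumed size of an IPv4 sockaddr_in.
    public const int IPv4Size = 16;

    // Builds an IPEndPoint straight from raw sockaddr bytes.
    public static IPEndPoint CreateIPv4From(ReadOnlySpan<byte> sockAddr)
    {
        if (sockAddr.Length < IPv4Size)
            throw new ArgumentException("Buffer too small for an IPv4 sockaddr.", nameof(sockAddr));

        // Port is stored in network byte order (big-endian).
        int port = BinaryPrimitives.ReadUInt16BigEndian(sockAddr.Slice(2, 2));

        // A 4-byte span is interpreted as an IPv4 address; this allocates.
        var address = new IPAddress(sockAddr.Slice(4, 4));

        // The remaining unavoidable allocation: the EndPoint itself.
        return new IPEndPoint(address, port);
    }
}
```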
> We are already special-casing IPEndPoint, so it seems to me we could just special case it directly from IPEndPoint -> stack.

Good point. We'd have to make sure nobody is doing any weird overriding of `IPEndPoint.Serialize`, but it seems with `SendTo` we can do without any allocation. As you say, it is only `ReceiveFrom` to worry about then, and I don't think that is solvable without a new API because we always need to allocate an `EndPoint`, `IPAddress`, and if IPv6 a `byte[]`.
Ah, I see.
Still, if those are internal, then 3rd parties can't implement them. So why is that preferable to just special casing IPEndPoint?
(edit: sorry, missed your last reply; sounds like we are in agreement here)
A central problem is that `Socket` needs to create `EndPoint` instances to receive UDP packets. Maybe API users could opt into a cache for those endpoint instances. Here is a raw sketch:
```c#
class EndPointCache
{
    ConcurrentDictionary<sockaddr_in, EndPoint> items; //cache contents
    internal EndPoint CreateEndPoint(sockaddr_in addr);
}
class Socket
{
    public EndPointCache EndPointCache { get; set; } //set this property to opt in
}
```
Callers can create one `EndPointCache` and share it across `Socket` instances. This design would preserve the `EndPoint` abstraction. It would not eliminate allocations. But it seems that the number of distinct endpoints in UDP communication typically is far lower than the number of packets. If this achieves a 10x reduction in allocations, those remaining allocations likely would not matter much.
I think an endpoint cache would be a great approach here, if EndPoint objects were immutable. Unfortunately, they are not, so it's not really possible to cache them.
I wish EndPoint objects were immutable because it would solve other problems too. I suppose we could introduce a whole new set of immutable EndPoint classes and APIs that take them, but that seems like a whole lot of churn.
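
To make the mutability concern concrete: `IPEndPoint.Address` and `IPEndPoint.Port` both have public setters, so any caller holding a cached instance can change it for every other caller. A small sketch follows; `NaiveEndPointCache` and its string key are hypothetical and chosen only to keep the example short.

```c#
using System.Collections.Generic;
using System.Net;

// Hypothetical cache that hands out shared IPEndPoint instances.
sealed class NaiveEndPointCache
{
    private readonly Dictionary<string, IPEndPoint> _items =
        new Dictionary<string, IPEndPoint>();

    public IPEndPoint GetOrAdd(IPAddress address, int port)
    {
        string key = address + ":" + port;
        if (!_items.TryGetValue(key, out IPEndPoint endPoint))
        {
            endPoint = new IPEndPoint(address, port);
            _items.Add(key, endPoint);
        }

        // The same mutable instance is returned to every caller.
        return endPoint;
    }
}
```

If one caller does `cache.GetOrAdd(IPAddress.Loopback, 5000).Port = 6000;`, the next lookup for port 5000 returns an endpoint whose `Port` is 6000, which is why caching only works safely with immutable endpoints (or defensive copies, which bring the allocations back).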
I didn't realize (or expect) that these classes were mutable. There's too much mutability in the framework. I think the value of immutability and functional programming patterns was less well understood 20 years ago (by everyone, not just the framework designers).
I wonder how many people assign to `IPEndPoint.Address` and `IPEndPoint.Port`. A bit of an extreme option, but we could always make these throw and enforce the class as immutable.
If we had an opaque readonly struct, immutable, we'd need to make it ~32 bytes large for IPv6, or 128 bytes large if we want to support all protocols and match `sockaddr_storage`. 32 bytes wouldn't be terrible from a copying perspective, but 128 bytes are a bit much.
I'm not strongly against it but I think it would be a shame to have three separate types -- `EndPoint`, `SocketAddress`, and this new `ImmutableSocketAddress` to accomplish more or less the same thing.
Triage: Does not seem to be critical - Future.
@scalablecory What's required to progress this proposed API?
@macaba it will likely be required if we get a managed QUIC implementation. Otherwise, I'd love to see some good customer stories here.
I'm the architect of a virtual VHF radio simulation backend (for virtual air traffic control).
We typically get around 20k UDP packets per second (20-30Mbps).
There is a future feature upgrade coming in that will probably push this up to 50-70k UDP packets per second.
I'm looking to improve the efficiency as I want to keep the server cost increase to a minimum when packet counts go up.
We have a service acting as a video stream redirector, receiving a stream from some network interface via some transport, and retransmitting it over another, possibly changing the transport.
Currently (on .Net Framework 4.8), we are fully limited by GC throughput at around 800 Mbps worth of stream redirection bandwidth. We are getting above the 50% time paused in GC, and we can't keep up with the incoming data, introducing packet losses. We have pooled pretty much everything under our control and 95%+ of the remaining allocations are caused by Socket. In our case we currently rely on the ReceiveFromAsync and BeginSend codepaths.
I understand some work in .Net Core was done that will certainly reduce those allocations, but the big per-receive allocations are still present: SocketAddress, IPEndPoint etc. Having a better way to do high throughput transmission on UDP sockets would help us a lot.
Same thing here, we are developing a video game.
I am implementing QUIC in C# and, truly, most of the allocated memory occurs during the `SendTo` and `ReceiveFrom` methods. Having allocation-free send/receive methods would really help.
Our use case - we have a managed implementation of SCTP - we've managed to reduce allocations down to the bare minimum, and the remaining allocations are almost all within Socket (which we're using in raw mode).
If you want a temporary solution for UDP sockets, here's our abstractions over portable native API.