Runtime: Binding with ReuseAddress not working with UdpClient on Linux

Created on 30 Aug 2018 · 40 Comments · Source: dotnet/runtime

_From @olijf on August 30, 2018 11:13_

Binding multiple clients on Linux with .NET Core 2.1 does not work as expected.

I am binding to a socket with SO_REUSEADDR (MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);) but on Linux this gives me an address already in use exception.
In dotnet core 2.0 this was working fine.

General

I have the following relevant piece of code:
```c#
...
MultiCastClient = new UdpClient();

MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
var EndPoint = new IPEndPoint(IPAddress.Any, _listenPort);
MultiCastClient.JoinMulticastGroup(IPAddress.Parse(multicastAddress));

MultiCastClient.Client.Bind(EndPoint); // <--- this is where the bind exception happens.

try
{
    MultiCastClient.BeginReceive(RecieveCallBack, null);
}
...
```

I have a project targeting netcoreapp2.0

When I am running this with ``dotnet-hosting-2.0.8`` everything is fine. However when I am running this with the newer CLR ``aspnetcore-runtime-2.1`` (all on Debian 9) I am getting a bind exception:

```
Application startup exception: System.Net.Sockets.SocketException (98): Address already in use
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Bind(EndPoint localEP)
   at UDPNMEAMessageReciever.UdpMessageProcessor.Start()
...
```

I haven't looked into it much further, but I would like to be able to use the newer CLR.
Thanks for helping me out here, I really appreciate your efforts.

_Copied from original issue: dotnet/coreclr#19765_

area-System.Net.Sockets os-linux tenet-compatibility

All 40 comments

This may be caused by changes in https://github.com/dotnet/corefx/pull/24809.

With 2.0 ReuseAddress did:

[pid  5814] setsockopt(23, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid  5814] setsockopt(23, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0

With 2.1 it does:

[pid  5921] setsockopt(24, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
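
The difference between the two traces can also be checked from managed code by reading the options back with `getsockopt`. The following is a minimal sketch, not part of the original report: the class name `ReuseOptionProbe` and the helper `IsSet` are hypothetical, and the numeric constants are the Linux values for common architectures such as x86-64 and ARM (an assumption; other platforms use different numbers).

```c#
using System;
using System.Net.Sockets;
using System.Runtime.InteropServices;

static class ReuseOptionProbe
{
    // Assumed Linux values (x86/x64/ARM); not portable to other platforms.
    const int SOL_SOCKET = 1;
    const int SO_REUSEADDR = 2;
    const int SO_REUSEPORT = 15;

    [DllImport("libc", SetLastError = true)]
    static extern int getsockopt(int socket, int level, int optionName, out int optionValue, ref uint optionLen);

    // Returns true when the kernel reports the given SOL_SOCKET option as enabled.
    public static bool IsSet(Socket socket, int optionName)
    {
        uint len = sizeof(int);
        if (getsockopt(socket.Handle.ToInt32(), SOL_SOCKET, optionName, out int value, ref len) != 0)
            throw new InvalidOperationException("getsockopt failed, errno " + Marshal.GetLastWin32Error());
        return value != 0;
    }

    static void Main()
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        {
            // The managed option whose native mapping changed between 2.0 and 2.1.
            socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            Console.WriteLine("SO_REUSEADDR: " + IsSet(socket, SO_REUSEADDR));
            Console.WriteLine("SO_REUSEPORT: " + IsSet(socket, SO_REUSEPORT));
        }
    }
}
```

Going by the traces above, one would expect the `SO_REUSEADDR` line to differ between the 2.0 and 2.1 runtimes; the sketch only reports what the kernel says and has not been run against either runtime here.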

I'm trying to reproduce this, but I'm missing something. I can run two instances of this program concurrently:

static void Main(string[] args)
{
    UdpClient MultiCastClient = new UdpClient();
    MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
    var EndPoint = new IPEndPoint(IPAddress.Any, 5000);
    MultiCastClient.JoinMulticastGroup(IPAddress.Parse("239.0.0.1"));
    Console.Read();
}

@olijf does this happen with two instances of your own program? Or is another program also using the port?

Hi Tom,
Tnx for looking into this. I'm running socat and another java client all on the same binding.
I haven't tried running multiple instances but will try tomorrow.
Hope this helps

I can reproduce this. The issue occurs when two applications each use a different option: one does SO_REUSEADDR and the other SO_REUSEPORT.

using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;

namespace console
{
    class Program
    {
        static unsafe void Main(string[] args)
        {
            bool reuseAddr = args.Length > 0;
            Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
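            if (reuseAddr)
            {
                System.Console.WriteLine("reuse address");
                int value = 1;
                // Direct P/Invoke: level 1 = SOL_SOCKET, option 2 = SO_REUSEADDR (Linux values).
                setsockopt(s.Handle.ToInt32(), 1, 2, &value, sizeof(int));
            }
            else
            {
                System.Console.WriteLine("reuse port");
                // On .NET Core 2.1 for Linux this sets only SO_REUSEPORT (see the strace output above).
                s.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            }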
            s.Bind(new IPEndPoint(IPAddress.Parse("0.0.0.0"), 5000));
            Console.Read();
        }


        [DllImport("libc", SetLastError = true)]
        private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);
    }
}

The fix will be to go back to setting both options for SocketOptionName.ReuseAddress.

@davidsh you can assign this to me.

Hi @tmds ,
I see you've already figured out how to reproduce this and have pushed a fix to mitigate my issue. Can you give me an indication when I can expect this into the regular release? I'm very happy with how fast you've resolved this. Thank you.

@karelz when the PR is merged on master, will it become part of the 2.2 release? Can we consider this for 2.1?

@olijf by adding the setsockopt(s.Handle.ToInt32(), 1, 2, &value, sizeof(int)); call you should be able to unblock yourself. Please give it a try and confirm that it resolves the issue.

Pseudocode:

UdpClient MultiCastClient = new UdpClient();
MultiCastClient.Client.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    // set SO_REUSEADDR (https://github.com/dotnet/corefx/issues/32027)
    int value = 1;
    setsockopt(MultiCastClient.Client.Handle.ToInt32(), 1, 2, &value, sizeof(int));
}

[DllImport("libc", SetLastError = true)]
private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);
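
For readers who want something that compiles as-is, the pseudocode above can be fleshed out roughly as follows. This is a sketch under assumptions rather than the exact fix that went into the runtime: the class `ReuseAddressWorkaround` and the method `EnableReuse` are hypothetical names, the P/Invoke signature uses `ref int` instead of a pointer to avoid `unsafe`, and the constants are the Linux values already used in this thread.

```c#
using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;

static class ReuseAddressWorkaround
{
    // Linux values, as used in the snippets in this thread.
    const int SOL_SOCKET = 1;
    const int SO_REUSEADDR = 2;

    [DllImport("libc", SetLastError = true)]
    static extern int setsockopt(int socket, int level, int optionName, ref int optionValue, uint optionLen);

    // Sets the managed ReuseAddress option and, on Linux, also sets SO_REUSEADDR natively.
    public static void EnableReuse(Socket socket)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
            return;

        int value = 1;
        if (setsockopt(socket.Handle.ToInt32(), SOL_SOCKET, SO_REUSEADDR, ref value, sizeof(int)) != 0)
            throw new InvalidOperationException("setsockopt(SO_REUSEADDR) failed, errno " + Marshal.GetLastWin32Error());
    }

    static void Main()
    {
        using (var first = new UdpClient())
        using (var second = new UdpClient())
        {
            // Both options must be set before Bind is called.
            EnableReuse(first.Client);
            EnableReuse(second.Client);

            var endPoint = new IPEndPoint(IPAddress.Any, 5000);
            first.Client.Bind(endPoint);
            second.Client.Bind(endPoint);
            Console.WriteLine("Both sockets bound to " + endPoint);
        }
    }
}
```

Checking the return value makes a failed native call visible instead of surfacing later as a bind exception. The second bind can still fail if another process already holds port 5000 without any reuse option set.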

@tmds Your workaround works fine in one scenario, but in another one I'm still getting the bind exception. I'm still investigating why this happens.

@olijf When setting both SO_REUSEADDR (via setsockopt) and SO_REUSEPORT (via SocketOptionName.ReuseAddress), I only got a bind exception when a previous socket was bound that didn't set any option.

@tmds master flows into 3.0. 2.2 is almost-servicing bar. Can you sum up the impact of this problem? If it is wide-impact, or if there is not a good workaround, we could consider it for 2.2 or 2.1.x.

Based on reviewing the PR in master, it seems to be a rather rare corner case, right?
Plus, the workaround sounds reasonable (although kind of ugly).

Given that we have only 1 report so far, I recommend to NOT port it to 2.2/2.1.x, until we get more developers hitting the problem.

Hi @karelz , imho I think this is actually a pretty big issue. Because you can not distinguish between SO_REUSEPORT and SO_REUSEADDR in dotnet core this is a major problem if you have multiple clients running created in different programming languages on the same binding (which all have different ways of doing the same thing). Plus it's a regression compared to the previous 2.0 runtime. I hope my opinion helps to decide what's best.

@olijf the key question is: How common is such a setup?
I understand that if someone needs it, then it is bad, although there is a workaround.

> but in another one I'm still getting the bind exception. I'm still investigating why this happens.

@olijf have you found the reason for the exception?

Closing as fixed in dotnet/corefx#32046

Copying over from dotnet/corefx#37044:

@karelz I have an app with a large number of users affected by this. Any Dlna media app will be affected by this. A back-port would be much appreciated. Thanks.

> _Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-480127404_

@LukePulverenti did you validate your problem is indeed the same root cause and is fixed in .NET Core 3.0? (and that it is not just same symptom)
There seems to be enough +1s to justify backport, we just need to be sure it is the right fix ... first step would be to validate on 3.0. Then we can cherry pick and ask for private validation on 2.2/2.1 build.

> _Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-480349496_

For us, this is the one that we need:
https://github.com/dotnet/corefx/pull/32046/files

_Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-494950738_


@LukePulverenti did you confirm that particular change helps your case? Or did you use latest .NET Core 3.0 to validate that?

_Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495010883_

Repro Scenario for UDP Bug

@karelz - you wrote:

> @LukePulverenti did you validate your problem is indeed the same root cause and is fixed in .NET Core 3.0? (and that it is not just same symptom)
> There seems to be enough +1s to justify backport, we just need to be sure it is the right fix ... first step would be to validate on 3.0. Then we can cherry pick and ask for private validation on 2.2/2.1 build.

and

> We still need someone to help us track this down:
> Anyone has an environment where it happens on somewhat regular basis, where we could work with you to collect more logs and experiment? It would be great help. Thanks!

Following up on your chat with @LukePulverenti about backporting the fix to 2.2, I have created a reproduction scenario for you: https://github.com/softworkz/ReuseBug

The solution contains a native Linux app and a netcore console app, multi-targeting netcore 2.0, 2.2 and 3.0

This demonstrates:

  • works in 2.0
  • fails in 2.2
  • works again in 3.0

I hope this helps getting the fix backported to 2.2...

_Originally posted by @softworkz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495592262_

@softworkz thank you !

@karelz Yes, it would be great to get this back-ported, because ever since the 2.1 release we've had to tell users to shut down all other UPnP or DLNA software on the machine in order to prevent this from happening.

> _Originally posted by @LukePulverenti in https://github.com/dotnet/corefx/issues/37044#issuecomment-495726635_

@softworkz @LukePulverenti I think we may be dealing with multiple problems here as some people on this thread said that 3.0 does not fix it for them.
Either way, we have a repro now, so let's try it -- @tmds or @wfurt will you have time to try it out and reproduce? If we can reproduce in-house, it should be easier for us to track it down. I'd be also interested in the repro result on 2.1.

Thanks @softworkz for repro!!! That is a HUGE step towards root cause and solution. Let's hope we can reproduce it too :)

> _Originally posted by @karelz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495762650_

> Thanks @softworkz for repro!!! That is a HUGE step towards root cause and solution. Let's hope we can reproduce it too :)

@karelz @softworkz is talking about a UDP issue https://github.com/dotnet/corefx/issues/32027 which was decided not to be backported: https://github.com/dotnet/corefx/issues/32027#issuecomment-418447086.

The main issue reported here is a TCP issue observed when using HttpClient.

_Originally posted by @tmds in https://github.com/dotnet/corefx/issues/37044#issuecomment-495904766_

> @karelz @softworkz is talking about a UDP issue dotnet/corefx#32027 which was decided not to be backported: #32027 (comment).

And still we're asking for it. It's a bug - not a "corner case".

> The main issue reported here is a TCP issue observed when using HttpClient.

Not quite. We're not the only ones referring to the UDP bug here.

_Originally posted by @softworkz in https://github.com/dotnet/corefx/issues/37044#issuecomment-495927124_

@karelz - I was able to copy over our parts of the UdpClient issue, but not the ones from the others having the same problem.

Thanks @softworkz!
So far it seems we have 2 customers confirmed to hit UdpClient problem -- @olijf and @softworkz. If anyone else hit the UdpClient problem, please reply here and tell us so.

@softworkz I wonder if it would be acceptable for you to wait for 3.0 RC in July (see roadmap). If there are more customers hitting it, impacting their production, we could consider backporting to 2.1/2.2.

We are only one customer but we do bring a lot of users across every OS and NAS device that we can deploy the runtime to. Right now this is creating enough troubleshooting for us that in order to save face we are passing this information onto users and saying that we'll just have to wait for an updated runtime. We would prefer to not have to start building our own fork of the runtime from source, but it looks like that's where we're going to be headed if nothing changes here.

Once we discovered the problem we thought we could wait until March which was the original roadmap date for netcore 3.0.
But we're getting increasing pressure from customers as nobody wants to stick to our old version based on netcore 2.0 anymore (where it was still working).
Also, migrating to a new framework version is not a trivial task, because in fact we're delivering to the widest range of platforms one could think of: Windows, Linux (7 different distributions), macOS, FreeBSD and Android.

(funny: Luke just wrote about the same...)

You should be able to work around the issue as described here: https://github.com/dotnet/corefx/issues/32027#issuecomment-418082355

Just to clarify @softworkz @LukePulverenti: Are you from the same company?

@softworkz AFAIK, March was never original roadmap for .NET Core 3.0.
.NET Core 2.0 is out of support, so I understand your desire to not use it. We would recommend the same.

Can you please try workaround from @tmds? Would that be acceptable until you are able to upgrade to 3.0?

> Just to clarify @softworkz @LukePulverenti: Are you from the same company?

More 'for' than 'from', but yes.

> @softworkz AFAIK, March was never original roadmap for .NET Core 3.0.

Well, now I'm confused because you're the one who edited the roadmap document:
https://github.com/dotnet/core/blob/1b9b75a242b09f85a6dd7916ff08e7c28154f2b5/roadmap.md

Q1 2019 was the announced ship date from May 24, 2018 until Nov 6, 2018.
The last month in Q1 2019 is March 2019.

> Can you please try workaround from @tmds? Would that be acceptable until you are able to upgrade to 3.0?

We'll try and report back, thanks.

> you're the one who edited the roadmap document

Fair point. It was so long ago that I forgot :), sorry!

Quick update: The workaround was successful in case of my repro scenario. We're currently adding this to a new beta and then I'll report back..

I'm experiencing a similar issue.
I have a custom Azure IoT Edge module which runs in a Linux container. The IoT Edge module uses the Zeroconf library to discover devices in the network via mDNS. ZeroConf uses UdpClient to bind to a certain socket. When the IoT Edge module starts, ZeroConf is called to discover the devices.
I get the following exception:

<06/04/2019 10:03:23> Address already in use
<06/04/2019 10:03:23>    at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Bind(EndPoint localEP)
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, NetworkInterface adapter, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 107
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, NetworkInterface adapter, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 169
   at Zeroconf.NetworkInterface.NetworkRequestAsync(Byte[] requestBytes, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`2 onResponse, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\NetworkInterface.cs:line 34
   at Zeroconf.ZeroconfResolver.ResolveInternal(ZeroconfOptions options, Action`2 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.cs:line 79
   at Zeroconf.ZeroconfResolver.ResolveAsync(ResolveOptions options, Action`1 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.Async.cs:line 98
   at Zeroconf.ZeroconfResolver.ResolveAsync(IEnumerable`1 protocols, TimeSpan scanTime, Int32 retries, Int32 retryDelayMilliseconds, Action`1 callback, CancellationToken cancellationToken) in D:\a\1\s\Zeroconf\ZeroconfResolver.Async.cs:line 69

When I look into the relevant ZeroConf code, I notice that UdpClient is used like this:

using (var client = new UdpClient())
 {
    for (var i = 0; i < retries; i++)
    {
        try
        {
            var socket = client.Client;

            if (socket.IsBound) continue;

            socket.SetSocketOption(SocketOptionLevel.IP,
                        SocketOptionName.MulticastInterface,
                        IPAddress.HostToNetworkOrder(ifaceIndex));

            client.ExclusiveAddressUse = false;
            socket.SetSocketOption(SocketOptionLevel.Socket,
                                                      SocketOptionName.ReuseAddress,
                                                      true);
            socket.SetSocketOption(SocketOptionLevel.Socket,
                                                      SocketOptionName.ReceiveTimeout,
                                                      (int)scanTime.TotalMilliseconds);
            client.ExclusiveAddressUse = false;


            var localEp = new IPEndPoint(IPAddress.Any, 5353);

            Debug.WriteLine($"Attempting to bind to {localEp} on adapter {adapter.Name}");
            socket.Bind(localEp);

(The exception is thrown on the socket.Bind() call).

Was this ever backported to 2.2/2.1?

> Was this ever backported to 2.2/2.1?

It was not. There are no plans to backport it. In general, not all fixes get backported to previous releases.

Check out the latest .NET Core 3.0 preview. It is suitable for 'go-live' scenarios:

https://devblogs.microsoft.com/dotnet/announcing-net-core-3-0-preview-9/

@davidsh Thanks, was hoping to not have to use the workaround. I am writing a library, users could be on any version.

@matthew798 Not sure if this can help you but I posted a hack that works, it was tested on .NET Core 2.1. It could be improved, see https://github.com/QTimort/bind-reuse-port

@QTimort Thanks. I'll have a look!

Alternatively, you can use the code from https://github.com/dotnet/corefx/issues/32027#issuecomment-417395637

if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
    // set SO_REUSEADDR (https://github.com/dotnet/corefx/issues/32027)
    int value = 1;
    setsockopt(MultiCastClient.Client.Handle.ToInt32(), 1, 2, &value, sizeof(int));
}

[DllImport("libc", SetLastError = true)]
private unsafe static extern int setsockopt(int socket, int level, int option_name, void* option_value, uint option_len);

We had the same problem and I can confirm that this has fixed it for us.

I'm not sure this is related to this issue, so if need be, I'll start another. I have observed what I think is a discrepancy in how .NET Core handles sockets compared to native code (on Linux, at least).

I am talking about the case where two sockets are bound to the same endpoint, but only one is connected to a remote endpoint. I have an SO question on the subject, and it seems that the behaviour is "undefined", yet the code here works perfectly, and does exactly that.

Specifically, the code I linked creates 2 sockets, one to listen for incoming dtls "connection" requests, and another for a connected client. Both of these sockets are bound to the same endpoint, and the second is connected to the client's endpoint. The result is that all traffic originating from the "connected" client is forwarded to the socket created specifically for them, and all other traffic is forwarded to the unconnected socket.

I tried to replicate this behaviour in C# with no luck. As I mentioned, it seems that in this specific case, the behavior is undefined and the data seems to be forwarded to a socket at random.

My code is as follows:

var localEp = new IPEndPoint(IPAddress.Loopback, 1114);

var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
socket.Bind(localEp);
...
Setting up SSL
...
var clientSocket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
clientSocket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
clientSocket.Bind(localEp);
clientSocket.Connect(bioAddr);

At this point, there is no way to guarantee that the client's dgrams will make it to clientSocket. This does not match the behavior of the code I linked above.

Is this a bug in .net? I am using .net core 3.0, so I know that both SO_REUSEADDR and SO_REUSEPORT are being set. I'm not sure what I am missing...

> Is this a bug in .net? I am using .net core 3.0, so I know that both SO_REUSEADDR and SO_REUSEPORT are being set. I'm not sure what I am missing...

In the code you linked to: SO_REUSEPORT is not set on Linux: https://github.com/nplab/DTLS-Examples/blob/226f222e528858b3a8c5fa3326b0599d25d3ef1c/src/dtls_udp_echo.c#L652-L654
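
One way to test that observation from C# is to mirror the native example and set only SO_REUSEADDR through P/Invoke, without calling `SetSocketOption(..., ReuseAddress, ...)` at all. The following is an untested sketch for Linux only: the names `ConnectedUdpProbe` and `CreateReuseAddrOnlySocket` are hypothetical, the constants are the Linux values used earlier in this thread, and which socket ends up receiving the datagram depends on the kernel, so no particular result is claimed.

```c#
using System;
using System.Net;
using System.Net.Sockets;
using System.Runtime.InteropServices;
using System.Text;

static class ConnectedUdpProbe
{
    // Linux values, as used earlier in this thread.
    const int SOL_SOCKET = 1;
    const int SO_REUSEADDR = 2;

    [DllImport("libc", SetLastError = true)]
    static extern int setsockopt(int socket, int level, int optionName, ref int optionValue, uint optionLen);

    // Creates a UDP socket with only SO_REUSEADDR set (no SO_REUSEPORT) and binds it.
    public static Socket CreateReuseAddrOnlySocket(IPEndPoint localEp)
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        int value = 1;
        if (setsockopt(socket.Handle.ToInt32(), SOL_SOCKET, SO_REUSEADDR, ref value, sizeof(int)) != 0)
            throw new InvalidOperationException("setsockopt failed, errno " + Marshal.GetLastWin32Error());
        socket.Bind(localEp);
        return socket;
    }

    static void Main()
    {
        var localEp = new IPEndPoint(IPAddress.Loopback, 1114);
        using (var peer = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        using (var listener = CreateReuseAddrOnlySocket(localEp))
        using (var clientSocket = CreateReuseAddrOnlySocket(localEp))
        {
            // The peer stands in for the remote client; the second socket connects to it.
            peer.Bind(new IPEndPoint(IPAddress.Loopback, 0));
            clientSocket.Connect(peer.LocalEndPoint);

            peer.SendTo(Encoding.ASCII.GetBytes("ping"), localEp);

            // Wait up to 500 ms on the connected socket, then check the listener without waiting.
            Console.WriteLine("connected socket readable: " + clientSocket.Poll(500000, SelectMode.SelectRead));
            Console.WriteLine("listener socket readable:  " + listener.Poll(0, SelectMode.SelectRead));
        }
    }
}
```

Running the same probe with the managed `ReuseAddress` option added back would show whether SO_REUSEPORT changes which socket receives the datagram on a given kernel.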

Yes, you are correct. My code doesn't work in either dotnet core 2.2 (where SO_REUSEPORT is not set) or 3.0 (where it is). So it seems that SO_REUSEPORT has no effect on the result.

The bottom line is that the behavior I described is achievable in native code, but there seems to be no way in dotnet.

I'll open another issue.
