Runtime: Implement portable support for TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL socket options

Created on 3 Nov 2017  ·  26 comments  ·  Source: dotnet/runtime

Allow for configuring TCP Keepalive in a portable manner.

Rationale

TCP keepalive is an optional TCP feature implemented by most widely used operating systems, and it can already be enabled in .NET through SocketOptionName.KeepAlive.
However, there is currently no standard .NET API for setting the individual keepalive options.

Beyond simply enabling TCP keepalive, most platforms expose some subset of the following three options:

  • Keepalive Time
  • Keepalive Interval
  • Keepalive Retry Count

Since Windows 2000, it has been possible to set both Keepalive Time and Keepalive Interval by using SIO_KEEPALIVE_VALS with the Winsock IOCTL (exposed in .NET via Socket.IOControl and IOControlCode.KeepAliveValues).
Under Linux, TCP Keepalive can be configured with setsockopt under the SOL_TCP level. The allowed parameters are TCP_KEEPCNT, TCP_KEEPIDLE and TCP_KEEPINTVL.

OSX also seems to have supported the feature since OS X Lion, using slightly different names than Linux:
https://lists.apple.com/archives/macnetworkprog/2012/Jul/msg00005.html
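
For reference, here is a minimal C++ sketch of the POSIX calls described above. The helper name enable_keepalive is hypothetical, and the macro alias assumes a platform that defines TCP_KEEPALIVE but not TCP_KEEPIDLE. IPPROTO_TCP is used as the option level; on Linux it has the same value as SOL_TCP.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Assumption: where only TCP_KEEPALIVE exists, it plays the role of Linux's TCP_KEEPIDLE.
#if !defined(TCP_KEEPIDLE) && defined(TCP_KEEPALIVE)
#define TCP_KEEPIDLE TCP_KEEPALIVE
#endif

// Hypothetical helper: enables keepalive on fd and sets the three tuning options.
// Times are in seconds. Returns true only if every setsockopt call succeeded.
bool enable_keepalive(int fd, int idle_seconds, int interval_seconds, int retry_count)
{
    const int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_seconds, sizeof(idle_seconds)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_seconds, sizeof(interval_seconds)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &retry_count, sizeof(retry_count)) == 0;
}
```

Because the calls are chained with &&, the first failing option stops the sequence and leaves errno set for the caller to inspect.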

According to the docs, Windows 10 versions 1703 and 1709 introduced settings that are code-compatible with Linux and OSX:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms738596(v=vs.85).aspx

As explained in https://github.com/dotnet/corefx/issues/14237, it is impossible today to configure TCP keepalive on platforms other than Windows, where Socket.IOControl can do the trick.

Proposed API

enum SocketOptionName
{
//…
    #region SocketOptionLevel.Tcp
//…
    // TCP KeepAlive options
    TcpKeepAliveRetryCount = 16, // TCP_KEEPCNT value from Ws2ipdef.h
    TcpKeepAliveTime = 3, // TCP_KEEPIDLE = TCP_KEEPALIVE value from Ws2ipdef.h
    TcpKeepAliveInterval = 17, // TCP_KEEPINTVL value from Ws2ipdef.h
    #endregion
//…
}

Example

void EnableKeepAlive(Socket socket, byte retryCount, int keepAliveTimeInSeconds, int keepAliveIntervalInSeconds)
{
    socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
    socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveRetryCount, (int)retryCount);
    socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveTime, keepAliveTimeInSeconds);
    socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAliveInterval, keepAliveIntervalInSeconds);
}

Details

The PAL for each supported platform would translate the call to Socket.SetSocketOption(SocketOptionLevel.Tcp, SocketOptionName.TcpKeepAlive*, *) into the appropriate call for the platform.

  • On Linux, and OSX ≥ 10.7, .NET enumeration values would be translated to corresponding system values for setsockopt().
  • On Windows 10 version 1709 and newer, the flags would be transferred as-is to the underlying Winsock API, as they already are today.
  • Ideally, TcpKeepAliveTime and TcpKeepAliveInterval would be marshalled to Socket.IOControl(IOControlCode.KeepAliveValues, KeepAliveValues, null) for previous versions of Windows, while TcpKeepAliveRetryCount would simply be ignored.

⚠️ Beware that commonly used socket option names TCP_KEEPIDLE (or TCP_KEEPALIVE) and TCP_KEEPINTVL are expressed in seconds, while SIO_KEEPALIVE_VALS expresses durations in milliseconds.
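
To make the unit mismatch concrete, here is a small C++ sketch of the seconds-to-milliseconds conversion that a fallback to SIO_KEEPALIVE_VALS would need. The struct mirrors the three 32-bit fields of Winsock's tcp_keepalive structure; the type name KeepAliveVals, the _ms suffixes and the helper to_keepalive_vals are hypothetical, and rejecting out-of-range values is just one possible policy.

```cpp
#include <cstdint>
#include <limits>

// Mirrors the three 32-bit fields of Winsock's tcp_keepalive structure (mstcpip.h),
// which SIO_KEEPALIVE_VALS takes as input. The _ms suffixes are added here for clarity.
struct KeepAliveVals
{
    std::uint32_t onoff;
    std::uint32_t keepalivetime_ms;
    std::uint32_t keepaliveinterval_ms;
};

// Hypothetical helper: converts second-based values to the millisecond-based structure.
// Returns false (leaving out untouched) if a value is negative or would overflow 32 bits.
bool to_keepalive_vals(int time_seconds, int interval_seconds, KeepAliveVals& out)
{
    constexpr std::uint32_t max_seconds = std::numeric_limits<std::uint32_t>::max() / 1000u;
    if (time_seconds < 0 || interval_seconds < 0)
        return false;
    const auto time = static_cast<std::uint32_t>(time_seconds);
    const auto interval = static_cast<std::uint32_t>(interval_seconds);
    if (time > max_seconds || interval > max_seconds)
        return false;
    out = KeepAliveVals{1u, time * 1000u, interval * 1000u};
    return true;
}
```

With these assumptions, a keepalive time of 7200 seconds becomes 7200000 milliseconds, and anything above 4294967 seconds is rejected rather than silently wrapped.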

Questions

The TCP keepalive feature being optional, it is not required that all platforms support it or provide any specific way of configuring the feature.

  • Should a call to an unsupported TCP keepalive option throw an exception?
  • How should a caller determine whether a given option is supported on the current platform?
  • Should an additional socket property, similar to Socket.LingerState, also be added (e.g. KeepAliveState of type KeepAliveOption), in order to group the various options and handle everything in one place?

    • The feature could be used with something like socket.KeepAliveState = new KeepAliveOption(true, 15, 7200, 1)

    • This would also be more consistent with the way the feature is configured on legacy Windows systems.

Hackathon api-approved area-System.Net.Sockets up-for-grabs

All 26 comments

Sounds like a reasonable proposal to us (@karelz @Priya91 @wfurt) -- @davidsh @DavidGoll any opinions?

BTW: We already have protocol specific options (e.g. IpTimeToLive) on the type: https://apisof.net/catalog/System.Net.Sockets.SocketOptionName.IpTimeToLive

Should we throw an exception if isKeepAliveSupported is false?
To add an exception message is it sufficient to edit the resx file?

OS X version >= 10.7.0, i.e. Darwin kernel version >= 11.0.0:

var isKeepAliveSupported = Environment.OSVersion.Version.Major >= 11;

Windows 10 version >= 1709 (build 10.0.16299):

var os = Environment.OSVersion;
var version = os.Version;
var isKeepAliveSupported = os.PlatformID == PlatformID.Win32NT && (version.Major > 10 || (version.Major == 10 && version.Build >= 16299));

.NET Core is supported on OSX 10.12 and above. Even if we don't add an explicit check, the socket call should fail and return an error.

On versions below Windows 10 1709 it is not possible to retrieve the keep-alive values set via IOControl and an exception will be thrown when calling the getter of the KeepAliveOption 😞

I created a branch and implemented the features requested to close this issue, but would like to know if there is a way to test them better or to try them (e.g. the F5 way like in simple .NET projects), before opening a PR

The first step is to get the API approved, @luigiberrettini. I did not see that in the discussion. Is this something you can take on, @karelz, and pitch to the committee?
Having a working prototype is certainly a good step.

Sadly, it is too late for 2.1. We are busy finishing the release for next Preview. Let's open the API discussion next week for post-2.1 release.

I am sorry it took me too long to implement it 😞

@luigiberrettini while it would be useful to have it, it is not the end of the world if it lands in the next release. The priority of this API was not very high. When it came to our attention over the last few weeks, it was already fairly late for 2.1 anyway.

I've created some tests to check if setting socket options related to keep-alive works.
Unfortunately, even though both the Linux and OS X kernels support the option name constants I used, the tests are failing:

Do you have an idea of why the SocketPal.SetSockOpt is throwing a System.Net.Sockets.SocketException: Operation not supported?

I think you are reproducing the problem from https://github.com/dotnet/corefx/issues/14237 in these tests. (Also the reason why I opened this issue)

The values used in SocketOptionName were obviously derived from Windows headers, thus allowing the current Windows SocketPal to simply forward the SocketOptionName value to the underlying API. (But we can expect this to change someday)
However, for non Windows platforms, the SocketOptionName values will always have to be converted into their platform-specific equivalent. This means that any SocketOptionName value that is not specifically supported by the SocketPal will not work.

Calling the native setsockopt API via P/Invoke would still work, however.

I found the problem is TryGetPlatformSocketOption which is called passing option level (SocketOptionLevel.Socket in my code but SocketOptionLevel.Tcp in this issue proposal) and name:

extern "C"
Error SystemNative_SetSockOpt(intptr_t socket, int32_t socketOptionLevel, int32_t socketOptionName, uint8_t* optionValue, int32_t optionLen)
{
    // ...
    int optLevel, optName;
    if (!TryGetPlatformSocketOption(socketOptionLevel, socketOptionName, optLevel, optName))
    {
        return PAL_ENOTSUP;
    }
    // ...
    int err = setsockopt(fd, optLevel, optName, optionValue, static_cast<socklen_t>(optionLen));
    // ...
}

It has misleading parameter names (its first two parameters are declared in the opposite order from the call above) and returns false for options not present in the SocketOptionName enum:

static bool TryGetPlatformSocketOption(int32_t socketOptionName, int32_t socketOptionLevel, int& optLevel, int& optName)
{
    // ...
}

This means that the keep-alive:

  • retry count TCP_KEEPCNT (Linux = 0x6, OS X = 0x102) generates a PAL_ENOTSUP error and does not call the setsockopt syscall
  • time TCP_KEEPIDLE (Linux = 0x4) is equal to PAL_SO_REUSEADDR and PAL_SO_IP_TTL, and therefore generates an error calling the setsockopt syscall
  • time TCP_KEEPALIVE (OS X = 0x10) is equal to PAL_SO_DONTROUTE and PAL_SO_IP_DROP_SOURCE_MEMBERSHIP, and therefore generates an error calling the setsockopt syscall
  • interval TCP_KEEPINTVL (Linux = 0x5, OS X = 0x101) generates a PAL_ENOTSUP error and does not call the setsockopt syscall
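
As a rough illustration of the missing translation step, the sketch below maps the proposed managed values to the native constants of whichever platform it is compiled on. It is not the actual corefx code: the SKETCH_PAL_* names and TryGetPlatformKeepAliveOption are hypothetical, and the numeric values are simply those of SocketOptionLevel.Tcp and the three enum members proposed in this issue.

```cpp
#include <stdint.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

// Assumption: where only TCP_KEEPALIVE exists, it plays the role of Linux's TCP_KEEPIDLE.
#if !defined(TCP_KEEPIDLE) && defined(TCP_KEEPALIVE)
#define TCP_KEEPIDLE TCP_KEEPALIVE
#endif

// Hypothetical PAL-side constants, numerically equal to the managed enum values
// (SocketOptionLevel.Tcp and the three option names proposed in this issue).
enum : int32_t
{
    SKETCH_PAL_SOL_TCP = 6,
    SKETCH_PAL_TCP_KEEPALIVE_TIME = 3,
    SKETCH_PAL_TCP_KEEPALIVE_RETRY_COUNT = 16,
    SKETCH_PAL_TCP_KEEPALIVE_INTERVAL = 17,
};

// Translates a managed (level, name) pair into native setsockopt arguments.
// Returns false for anything other than the three keepalive tuning options.
static bool TryGetPlatformKeepAliveOption(int32_t socketOptionLevel, int32_t socketOptionName, int& optLevel, int& optName)
{
    if (socketOptionLevel != SKETCH_PAL_SOL_TCP)
        return false;

    switch (socketOptionName)
    {
        case SKETCH_PAL_TCP_KEEPALIVE_TIME:        optName = TCP_KEEPIDLE;  break;
        case SKETCH_PAL_TCP_KEEPALIVE_RETRY_COUNT: optName = TCP_KEEPCNT;   break;
        case SKETCH_PAL_TCP_KEEPALIVE_INTERVAL:    optName = TCP_KEEPINTVL; break;
        default:
            return false;
    }

    optLevel = IPPROTO_TCP;
    return true;
}
```

The point of the table is that the managed value is never forwarded as-is, so a managed 3 can no longer collide with an unrelated native option.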

Thanks @GoldenCrystal!

It now works with P/Invoke:

Here is the code:

API discussion with Networking team:

  • We should think about how to expose it in HttpClientHandler/SocketsHttpHandler (I think there is already a related issue)
  • If the platform does not support the option, we should not throw (it's annoying), just do a noop + document it + log a diagnostic entry
  • The KeepAliveOption is problematic if we cannot get the information on all platforms. Also, it is weird if the setter is a noop (or even worse, throws an exception) on some platforms for some values (e.g. RetryCount on pre-Win10). Until we have confirmation that it is implementable on all OSs in a reasonable way, we should skip this part and focus on the 3 enum values first.
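
One way the "noop instead of throw" idea could look at the native level is sketched below in C++. The helper try_set_int_option and the SetOptionResult type are hypothetical, and treating ENOPROTOOPT and EOPNOTSUPP as "unsupported" is an assumption about which error codes a platform would report for an option it does not know.

```cpp
#include <cerrno>
#include <sys/socket.h>

enum class SetOptionResult { Applied, Unsupported, Failed };

// Hypothetical helper: tries to set an integer socket option and classifies the outcome.
// ENOPROTOOPT and EOPNOTSUPP are treated as "platform does not support this option",
// which a caller could turn into a logged no-op instead of an exception.
SetOptionResult try_set_int_option(int fd, int level, int name, int value)
{
    if (setsockopt(fd, level, name, &value, sizeof(value)) == 0)
        return SetOptionResult::Applied;
    if (errno == ENOPROTOOPT || errno == EOPNOTSUPP)
        return SetOptionResult::Unsupported;
    return SetOptionResult::Failed;
}
```

Keeping Unsupported separate from Failed lets real errors, such as a bad descriptor, still surface to the caller.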

Summary: Networking team is fine with the first part of the API. We are going to bring it to BCL API review.

Video

Looks good.

Can we double check what happens if options are no-ops or are unsupported. Do we throw? If so, should the name of the option indicate that things might be unsupported or are no-ops?

Next step: Validate other options.

The change should be fairly straightforward. Adding new API is the trickiest part.

Please let me know if I should do something since the code is ready on my side

@luigiberrettini if you can match the scoped API shape, feel free to put up a PR for review of your changes. Let me know if you plan to do that - I will assign the issue to you in such case to signal it is taken. Thanks!

I have worked on this at the end of March, commented many times and asked support to understand if my approach was correct before submitting a PR.
You told me you were going to open the API discussion and probably it's me not understanding, but these comments look like the discussion is still open:

  • "Networking team is fine with the first part of the API. We are going to bring it to BCL API review."
  • "Can we double check what happens if options are no-ops or are unsupported. Do we throw? If so, should the name of the option indicate that things might be unsupported or are no-ops?"
  • "Next step: Validate other options. The change should be fairly straightforward. Adding new API is the trickiest part."

I was waiting for an update to everybody on the outcome of the discussion, as usually happens to people offering their contribution to open-source projects, but it was only looking at the issue that I discovered it had been labeled as up-for-grabs and Hackathon...

I am ready to submit a PR but I honestly do not know where the scoped API shape you are talking about is defined.

Yeah, the issue is not super-clear, my apologies.
The final API review happened https://github.com/dotnet/corefx/issues/25040#issuecomment-391097721 (it was switched to api-approved).

The remaining work is to research the other options and how they behave when they are not supported on specific OSes/versions. We should match their behavior (no-op vs. exception) to be consistent.
Does it make sense?

Makes sense, I'll start working on this in two days.

On Linux/OS X, an unsupported option generates an exception: SystemNative_SetSockOpt uses TryGetPlatformSocketOption to check for supported options.
I could not find any case in which a specific version of an OS was checked in the code (only in tests).

Keep alive is supported on all versions of Linux/OS X on which .NET core is supported.

On Windows, the keep-alive retry count is set to 10 and cannot be changed via *SocketOption methods before Windows 10 version 1703: in the Socket.KeepAliveState property I used value 10 in the getter and ignored setting it in the setter.

Do you think that is OK, or would you prefer not to allow setting KeepAliveState when even one of its properties cannot be set?

@luigiberrettini I sent you collaborator request, so that we can assign the issue to you. Ping me once you accept (assigning to myself temporarily).
Pro-tip: Being a collaborator will subscribe you to all repo notifications (500+ per day). We recommend switching it to "Not Watching", which will limit notifications to your mentions, assignments and explicit subscriptions.

I joined the repo as a collaborator and checked that the Watch setting was "Not Watching"

These 3 options are not in .netstandard 2.1 right now, do you have a plan to add them into netstandard?

  • TcpKeepAliveInterval
  • TcpKeepAliveRetryCount
  • TcpKeepAliveTime

Is this feature fixed? If so, in which versions of .NET Core and .NET Standard should I expect the fix?

I am most interested in knowing whether it is fixed in the following versions:
netstandard1.5, .netcoreapp2.1, .netstandard2.1, .net4.5.2, .netstandard2.0

Can someone please provide some insight?
