While testing PowerShell Core 6.1, I found that I could not authenticate to a Kerberized REST API running on Linux unless I disabled SocketsHttpHandler.
https://github.com/PowerShell/PowerShell/issues/7801
It seems that when the server responds with both Negotiate and NTLM, SocketsHttpHandler picks NTLM, which in my case results in a 401 because the service expects Negotiate / SPNEGO and does not work with NTLM.
As requested by @karelz in https://github.com/dotnet/corefx/issues/30166
I've reproduced it on the daily builds without PowerShell Core involved and got the same results, so I'm submitting a new issue for this.
When the server sends multiple auth schemes such as Negotiate and NTLM, the client should pick the strongest one, which in this case is Negotiate.
```
C:\dev\test\httpclient-spnego\test2>dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.1.403-servicing-009270
 Commit:    def6c5f48d

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.15063
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\2.1.403-servicing-009270\

Host (useful for support):
  Version: 2.1.5-servicing-26911-03
  Commit:  efdba896f7

.NET Core SDKs installed:
  2.1.201-preview-007614 [C:\Program Files\dotnet\sdk]
  2.1.202 [C:\Program Files\dotnet\sdk]
  2.1.400 [C:\Program Files\dotnet\sdk]
  2.1.402 [C:\Program Files\dotnet\sdk]
  2.1.403-servicing-009270 [C:\Program Files\dotnet\sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.5-rtm-31008 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.4 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.5-rtm-31008 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.0.7 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.9 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.4 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.5-servicing-26911-03 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download
```
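```c#
// uri is the Kerberos-protected endpoint (here a site on mykerberossite.lab.local).
// On .NET Core 2.1, HttpClientHandler uses SocketsHttpHandler by default.
var handler = new HttpClientHandler
{
    UseDefaultCredentials = true,
    AllowAutoRedirect = true,
};
using (var client = new HttpClient(handler))
{
    var res = client.SendAsync(new HttpRequestMessage(HttpMethod.Get, uri)).GetAwaiter().GetResult();
    System.Console.WriteLine(res);
}
```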
Result: 401
```
GET / HTTP/1.1
Host: mykerberossite.lab.local

HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:31:42 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
Content-Length: 0

GET / HTTP/1.1
Authorization: Negotiate **
Host: mykerberossite.lab.local

HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:31:42 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: NTLM
Content-Length: 0
```
```c#
// Opt out of SocketsHttpHandler before any HttpClient is created.
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);
var handler = new HttpClientHandler
{
    UseDefaultCredentials = true,
    AllowAutoRedirect = true,
};
using (var client = new HttpClient(handler))
{
    var res = client.SendAsync(new HttpRequestMessage(HttpMethod.Get, uri)).GetAwaiter().GetResult();
    System.Console.WriteLine(res);
}
```
Result: 200
```
GET / HTTP/1.1
Connection: Keep-Alive
Host: mykerberosite.lab.local

HTTP/1.1 401 Unauthorized
Date: Mon, 17 Sep 2018 21:30:27 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
Content-Length: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

GET / HTTP/1.1
Connection: Keep-Alive
Host: mykerberossite.lab.local
Authorization: Negotiate *

HTTP/1.1 200 OK
Date: Mon, 17 Sep 2018 21:30:27 GMT
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate *
Cache-Control: no-cache
Expires: -1
Content-Type: text/plain;charset=UTF-8
Content-Length: 103
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
```
> It seems like that when the server responds with both Negotiate and NTLM, the SocketsHttpHandler picks NTLM which in my case results in a 401 as the service in question is really expecting Negotiate / SPNego and is not working with NTLM.

Your attached output from the wire trace doesn't show that. The client-side HTTP is picking Negotiate.
```
GET / HTTP/1.1
Authorization: Negotiate ****
Host: mykerberossite.lab.local
```
However, the server is still not authenticating and returning back a 401.

> Your attached output from the wire trace doesn't show that. The client-side HTTP is picking Negotiate.

Hmm, could be a copy paste error. Will re-test on Monday and update.
Hey, sorry for the slow response...
So I've checked again and it is indeed using NTLM inside that Negotiate response (NTLM_NEGOTIATE).
It was just not obvious from my *-ed out HTTP traffic, because that hid the size of the blob.
An NTLM token is much smaller than a Kerberos one, so you can just eyeball it.
Wireshark can actually dissect the token and show that it is NTLMSSP_NEGOTIATE, as shown below.
[Screenshot: Wireshark dissection of the Authorization header showing NTLMSSP_NEGOTIATE]
And here is the snip when I disable SocketsHttpHandler
[Screenshot: the same request with SocketsHttpHandler disabled]
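For anyone without Wireshark at hand, the same check can be approximated in code. This is only a rough sketch: `NegotiateTokenSketch` and `LooksLikeNtlm` are hypothetical names, and the heuristic simply relies on NTLM messages starting with the 8-byte signature `NTLMSSP\0`, whether they are sent raw or wrapped in SPNEGO. It takes the un-redacted `Authorization` header value from a trace.

```c#
using System;

static class NegotiateTokenSketch
{
    // ASCII "NTLMSSP" followed by a NUL byte: the signature that starts every NTLM message.
    static readonly byte[] NtlmSignature = { 0x4E, 0x54, 0x4C, 0x4D, 0x53, 0x53, 0x50, 0x00 };

    // Hypothetical helper: returns true if a "Negotiate <base64>" header value
    // carries an NTLM message anywhere inside the decoded token.
    public static bool LooksLikeNtlm(string authorizationHeaderValue)
    {
        const string prefix = "Negotiate ";
        if (authorizationHeaderValue == null ||
            !authorizationHeaderValue.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        byte[] token;
        try
        {
            token = Convert.FromBase64String(authorizationHeaderValue.Substring(prefix.Length).Trim());
        }
        catch (FormatException)
        {
            return false; // Not valid base64 (for example a redacted trace).
        }

        // Naive byte search; tokens are small, so this is cheap enough.
        for (int i = 0; i + NtlmSignature.Length <= token.Length; i++)
        {
            int j = 0;
            while (j < NtlmSignature.Length && token[i + j] == NtlmSignature[j]) j++;
            if (j == NtlmSignature.Length) return true;
        }
        return false;
    }
}
```

A raw NTLM negotiate message base64-encodes to something starting with `TlRMTVNTUAAB`, so such a header is flagged, while a token that contains no NTLM signature is not.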
> So I've checked again and it's indeed using NTLM with that Negotiate response ( NTLM_NEGOTIATE).

So, what you're really saying is that the HTTP stack correctly responded with Negotiate scheme. But Negotiate ended up "negotiating" NTLM instead of Kerberos.
That happens a lot when the requirements for a valid Kerberos infrastructure don't exist. For example, if the client machine is not joined to the Windows Active Directory (or Linux Kerberos) domain of the server, or timestamps aren't matching etc.
In your example, the server is a Linux machine using Kerberos. I'm assuming that the Linux machine is also the Kerberos ticketing server, or is there a separate machine for that?
Usually, this kind of problem is a configuration problem with the machines and not a problem with the client-side HTTP stack since it is picking Negotiate scheme correctly.
cc: @wfurt @geoffkizer @karelz
TLDR: Unlike the rest of the Windows web clients (browsers, .NET Framework, etc.), SocketsHttpHandler does not canonicalize the given host when requesting the SPN, which breaks Kerberos if the URL uses a CNAME and the SPN is registered only for the host's DNS A record.
Details
While writing a long reply linking to the MSDN docs on how Kerberos is implemented on Windows (SPNEGO's selection of auth mechanisms, etc.) and preparing the example traces, I found why it's not working.
phew.
This is what happens, at a very high level, when it works as expected (after purging Kerberos tickets and flushing the DNS cache, which actually helped me nail it down; always purge and always flush!)
And this is what happens with SocketsHttpHandler
So SocketsHttpHandler fails because it tries to find the SPN for the CNAME instead of canonicalizing it with a forward lookup.
While Windows SSPI does not canonicalize, applications do: IE, Firefox, Chrome, the .NET Framework web clients, etc. all canonicalize.
I.e., coming from .NET Framework this is a breaking change.
So SocketsHttpHandler would need to fall in line and stay consistent.
Perhaps allow a flag to not canonicalize, but have the default do whatever IE, Chrome, and the .NET Framework web clients do.
Linux
On Linux this is configured via krb5.conf, so SocketsHttpHandler should honor that setting.
rdns = true|false
> TLDR: Unlike the rest of the Windows web clients (browsers, .NET full, etc) SocketsHTTPHandler is not canonicalizing the given host when trying to request the SPN which breaks Kerberos if the Url has a CNAME and the SPN is only on the DNS A record of the host.

Thanks for the additional details.
SocketsHttpHandler on Windows uses the Windows SSPI libraries for doing Negotiate and NTLM protocols. I do not think it is something that is directly controllable by SocketsHttpHandler. It would need to be investigated further.
We have tested Linux clients against a Windows server/ActiveDirectory domain. We've demonstrated that Negotiate scheme will use Kerberos if all the machines are configured properly. See: dotnet/runtime#26418.
But we have not extensively tested a Windows client using Negotiate (Kerberos) into a Linux server environment.
In Linux, canonicalization is controlled by configuration knobs in krb5.conf. Applications are explicitly not supposed to do any kind of canonicalization there and rely on the GSSAPI implementation and its configuration.
In Windows, SSPI does not canonicalize, and AFAIK, there are no global knobs to turn. But many applications, including all of the major browsers, WinHTTP (and thus legacy .NET HTTP clients), and most (but not all) OS components do forward canonicalize CNAMEs prior to calling into SSPI. People/things now generally expect that behavior, witness the outcry when Chrome accidentally stopped pre-canonicalizing recently: https://bugs.chromium.org/p/chromium/issues/detail?id=872665
> But we have not extensively tested a Windows client using Negotiate (Kerberos) into a Linux server environment.

My guess is that this will reproduce on Windows -> Windows as well (haven't tested it yet).
I.e.: set up IIS with Negotiate:Kerberos as the only Windows auth provider, configure the Restrict NTLM GPO, then register an SPN for the A record and use a CNAME to access it.
If configured correctly, SocketsHttpHandler will break because it will request the wrong SPN and then cannot fall back to NTLM, while the legacy .NET client will work as is.
> But many applications, including all of the major browsers, WinHTTP (and thus legacy .NET HTTP clients), and most (but not all) OS components do forward canonicalize CNAMEs prior to calling into SSPI.

So, this scenario does work using WinHttpHandler (WinHTTP) on the client-side? So, did you turn off SocketsHttpHandler (via AppContext switch for example) and demonstrate that the scenario works?
See: https://github.com/dotnet/core/blob/master/release-notes/2.1/2.1.0.md
Networking Performance
You can use one of the following mechanisms to configure a process to use the older HttpClientHandler:
From code, use the AppContext class:
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);
The AppContext switch can also be set by config file.
The same can be achieved via the environment variable DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER. To opt out, set the value to either false or 0.
If this works with WinHttpHandler, then it should be possible to fix it for SocketsHttpHandler. WinHttpHandler uses native WinHTTP which uses the same Windows SSPI libraries as SocketsHttpHandler for doing Negotiate and NTLM.
@stephentoub
> So, this scenario does work using WinHttpHandler (WinHTTP) on the client-side? So, did you turn off SocketsHttpHandler (via AppContext switch for example) and demonstrate that the scenario works?

Yep, see my very first post with the below workaround to make this work:
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);
I think we just need to use the canonical host name here (as returned by Dns.GetHostEntry) instead of the hostname in the uri. Correct?
Almost. That would handle CNAMEs and partially qualified names of As (that become fully qualified when the OS resolver appends one of the configured search suffixes). But...
@mattpwhite Thank you for the added details regarding CNAMEs and the issues with LLMNR, NETBIOS, etc.
Since you observe the correct behavior in .NET Framework, we will look at that implementation to see where it differs from .NET Core SocketsHttpHandler. That will give us more insight into the correct implementation.
I've done some research into why this works on .NET Framework.
.NET Framework does make sure to do canonicalization when it is computing the proper SPN to use:
AuthenticationState.GetComputeSpn
https://github.com/Microsoft/referencesource/blob/master/System/net/System/Net/_AuthenticationState.cs#L114
which then calls internal method Dns.TryInternalResolve
https://github.com/Microsoft/referencesource/blob/master/System/net/System/Net/DNS.cs#L553
So, we would need to use similar DNS resolution logic in SocketsHttpHandler.
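A minimal sketch of that idea, assuming forward-only canonicalization through `Dns.GetHostEntry` as suggested earlier in the thread. `SpnSketch`, `BuildSpn` and the injectable resolver are hypothetical names used for illustration; this is not the .NET Framework code nor the eventual fix.

```c#
using System;
using System.Net;
using System.Net.Sockets;

static class SpnSketch
{
    // Hypothetical helper: builds "HTTP/<host>" after forward-canonicalizing the host.
    // The resolver is injected so the logic can be exercised without real DNS.
    public static string BuildSpn(Uri uri, Func<string, string> resolveCanonicalName)
    {
        string host = uri.IdnHost;

        // Only DNS names are canonicalized; IP literals are used as-is (no reverse lookup).
        if (uri.HostNameType == UriHostNameType.Dns)
        {
            try
            {
                string canonical = resolveCanonicalName(host);
                if (!string.IsNullOrEmpty(canonical))
                {
                    host = canonical;
                }
            }
            catch (SocketException)
            {
                // Resolution failed: fall back to the name taken from the URI.
            }
        }

        return "HTTP/" + host;
    }

    // Default resolver: a forward lookup whose HostName is the name the OS resolver
    // reports for the host (expected to be the A record name when given a CNAME).
    public static string ResolveWithDns(string host)
    {
        return Dns.GetHostEntry(host).HostName;
    }
}
```

Usage would look like `SpnSketch.BuildSpn(new Uri("http://iis-server/test/NegotiateTest.ashx"), SpnSketch.ResolveWithDns)`; what comes back depends entirely on the local DNS setup.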
I was able to research this problem with a Windows-Windows setup in our separate Enterprise Testing environment.
Given an IIS server called "corefx-net-iis" on a domain called "corefx-net.contoso.com", we are able to get Negotiate to use Kerberos using any of the following URIs.
```c#
// Use A record of server
string server = "http://corefx-net-iis/test/NegotiateTest.ashx";
string server = "http://corefx-net-iis.corefx-net.contoso.com/test/NegotiateTest.ashx";
// Use CNAME of server
string server = "http://iis-server/test/NegotiateTest.ashx";
string server = "http://iis-server.corefx-net.contoso.com/test/NegotiateTest.ashx";
```
"iis-server.corefx-net.contoso.com" is a CNAME.
But for .NET Core 2.1.5, Negotiate will only use Kerberos when using the original FQDN of the server (A record):
```c#
string server = "http://corefx-net-iis/test/NegotiateTest.ashx";
string server = "http://corefx-net-iis.corefx-net.contoso.com/test/NegotiateTest.ashx";
```
Any of the DNS names using the CNAME results in Negotiate using NTLM.
Thanks @davidsh for your research and the effort to reproduce this issue!
(Hopefully your setup could later be used to run Kerberos integration tests!)
If NTLM is disabled for security reasons (which can be the case in sensitive environments), then calls using a CNAME on .NET Core 2.1.5 will fail to authenticate, so that can be a good test for the fix.
Here is how to disable NTLM:
https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/network-security-restrict-ntlm-ntlm-authentication-in-this-domain#security-considerations
Or if NTLM is not supported by the target server running on Linux in a trusted Kerberos realm then again the auth will fail ( my original use-case, which is much more involved to have a lab for).
> Linux
> On Linux this is configured via krb5.conf so SocketsHTTPHandler should honor that setting.
> rdns = true|false

Just to be clear on that particular Linux Kerberos setting, it only affects whether REVERSE DNS is done on IP addresses. It doesn't change how FORWARD normalization is done with respect to traversing CNAME records.
Service principal canonicalization
MIT Kerberos clients currently always do forward resolution (looking up the IPv4 and possibly IPv6
addresses using getaddrinfo()) of the hostname part of a host-based service principal to canonicalize the
hostname. They obtain the “canonical” name of the host when doing so. By default, MIT Kerberos clients
will also then do reverse DNS resolution (looking up the hostname associated with the IPv4 or IPv6
address using getnameinfo()) of the hostname. Using the krb5.conf setting:
[libdefaults]
rdns = false
will disable reverse DNS lookup on clients. The default setting is “true”.
Operating system bugs may prevent a setting of rdns = false from disabling reverse DNS lookup. Some
versions of GNU libc have a bug in getaddrinfo() that cause them to look up PTR records even when not
required. MIT Kerberos releases krb5-1.10.2 and newer have a workaround for this problem, as does the
krb5-1.9.x series as of release krb5-1.9.4.
Reverse DNS mismatches
Sometimes, an enterprise will have control over its forward DNS but not its reverse DNS. The reverse
DNS is sometimes under the control of the Internet service provider of the enterprise, and the enterprise
may not have much influence in setting up reverse DNS records for its address space. If there are
difficulties with getting forward and reverse DNS to match, it is best to set rdns = false on client
machines.
.NET Framework has never done any reverse DNS lookup checks. And in general, it won't do any normalization of the SPN name (from the hostname in the Uri specified in the http request) if the hostname is actually an IP address, i.e. "http://10.0.0.5/NegotiateEndpoint"
I am working on a fix for this issue. But the PR will likely only match existing .NET Framework behavior and won't do any reverse DNS checks nor any specific Linux kerberos config file lookups.
> Just to be clear on that particular Linux kerberos setting, it only affects whether REVERSE DNS is done on ip addresses. It doesn't change how FORWARD normalization is done with respect to traversing CNAME records.

Thanks for pointing this out! The correct setting is actually dns_canonicalize_hostname which is a "relatively" recent ( khmm 5y old, but that's recent in Krb) addition.
https://github.com/krb5/krb5/commit/60edb321af64081e3eb597da0256faf117c9c441
Its default value is true.
Indicate whether name lookups will be used to canonicalize hostnames for use in service principal names. Setting this flag to false can improve security by reducing reliance on DNS, but means that short hostnames will not be canonicalized to fully-qualified hostnames. The default value is true.
https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html#libdefaults
So a correct implementation would need to honor that setting on Linux, and if it does not, that should at least be documented somewhere to avoid confusion (and probably save a few hours of debugging).
Just to be clear, I am already a happy person with your proposed fix of having the existing .NET framework behavior, but for the sake of completeness wanted to mention that Linux krb5.conf knob.
Thanks!
> I am working on a fix for this issue. But the PR will likely only match existing .NET Framework behavior and won't do any reverse DNS checks nor any specific Linux kerberos config file lookups.

Well, under no circumstances should .NET be trying to parse a krb5.conf. On non-Windows systems, my understanding is that no name canonicalization should need to be performed before calling into a GSSAPI/Kerberos implementation because that library already takes care of this (or not, depending on how someone chose to configure it for their environment). My understanding is also that the typical configuration is forward canonicalization on, reverse off. Forward canonicalization was just implicitly enabled by MIT for some time, though as @csharmath points out, it's now a knob.
The Windows case is different because SSPI does not do canonicalization on behalf of applications. There is no way for a developer/administrator to express how they would like all applications to behave, so the most reasonable thing to do is to just do what IE did way back when and the other browsers subsequently emulated - forward canonicalize, no reverse. FWIW, browsers did eventually add knobs to customize this behavior on Windows (https://www.chromium.org/developers/design-documents/http-authentication, https://blogs.technet.microsoft.com/askds/2009/06/22/internet-explorer-behaviors-with-kerberos-authentication/). The reason these are application knobs on Windows is because the application actually controls it; disabling CNAME resolution wouldn't work if SSPI did it for you in the way that MIT does in a default configuration.
> Well, under no circumstances should .NET be trying to parse a krb5.conf.
> ..
> no name canonicalization should need to be performed before calling into a GSSAPI/Kerberos implementation because that library already takes care of this

Makes sense, thanks for this!

> so the most reasonable thing to do is to just do what IE did way back when and the other browsers subsequently emulated - forward canonicalize, no reverse

This is what the current .NET Framework behavior is. And this is what the fix for .NET Core will be also.
fyi. I'll be OOF for about a week+. So, I'll be submitting the PR for this fix as soon as I get back.
thanks for the fix @davidsh !
So this will ship with 3.0 ?
> So this will ship with 3.0 ?

Yes, the fix is in the master branch for 3.0.
Thank you for your fix @davidsh !
Just wondering if it would be possible to backport this change to either 2.2 or one of its servicing releases, so that the fix becomes available for 2.2 as well and for PowerShell Core 6.2?
@csharmath we do not port changes to servicing branches unless there is a very good reason - i.e. impact on larger set of customers, without reasonable workaround. Is that the case here?
It's hard for me to tell the impact of this, but it surely impacts all shops using Negotiate with CNAMEs.
Plus, this is a breaking change for Negotiate auth introduced with SocketsHttpHandler, and the workaround of disabling it means losing out on perf.
The workaround is to either use the DNS A record or disable SocketsHttpHandler (preferable in cases where CNAMEs can change).
AppContext.SetSwitch("System.Net.Http.UseSocketsHttpHandler", false);
or set the env var
$env:DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0
Once set, these settings can easily be forgotten by dev teams after moving to 3.0, who would then miss out on the new perf improvements (unless these settings are ignored in 3.0).
@csharmath correct, disabling SocketsHttpHandler is not something we recommend. However, porting every fix into servicing would basically make the servicing branch new master (incl. instability, lower quality, higher chance of other regressions, etc.)
If we hear from more customers, we can consider porting it back.
Does it have specific impact on your environment(s)? Is the first workaround reasonable / acceptable in the meantime for you?
To be clear I wasn't trying to propose to port all fixes, but I would consider anything security related as important to evaluate for backporting consideration.
The impact of this issue could be a downgrade from Kerberos to NTLM.
If NTLM is configured and allowed, then it's very unlikely that most of your customers will notice this unless they explicitly audit NTLM ( which we do) as it's still authenticating, but not with the recommended and desired mechanism.
If the webserver is running on Linux or in an environment where NTLM is explicitly disabled, then it breaks auth so those customers will notice it immediately.
If a company has already invested in setting up and using Kerberos, but for some reason did not completely disable NTLM, then this change weakens their security, potentially without them even realizing it.
If you put your security hat on, then it's a less optimal situation.
So my angle on this is security and would recommend to review your position based on that.
Let me paste from MSDN:
NTLM and NTLMv2 authentication is vulnerable to a variety of malicious attacks, including SMB replay, man-in-the-middle attacks, and brute force attacks. Reducing and eliminating NTLM authentication from your environment forces the Windows operating system to use more secure protocols, such as the Kerberos version 5 protocol, or different authentication mechanisms, such as smart cards.
There is another workaround/mitigation that can be considered.
This problem only occurs if a CNAME is used in the URI and that CNAME is not registered as an SPN in Kerberos. So, the workaround is to register this additional CNAME SPN in Windows Active Directory / Kerberos environment.
> @csharmath correct, disabling SocketsHttpHandler is not something we recommend....
> If we hear from more customers, we can consider porting it back.

I will be forced to deploy the current project I am working on with SocketsHttpHandler disabled. I'm delivering a Web API that needs to pass through delegated auth to a SAP OData service.
> > @csharmath correct, disabling SocketsHttpHandler is not something we recommend....
> > If we hear from more customers, we can consider porting it back.
>
> I will be forced to deploy the current project I am working on with SocketsHttpHandler disabled. I'm delivering a Web API that needs to pass through delegated auth to a SAP OData service.

Hit this issue in another project for another client. Forced to disable SocketsHttpHandler. :/
So practically, 1-2 affected projects per year so far.
Did you consider updating to .NET Core 3.0 or 3.1 (if you prefer LTS versions)? It is fixed there.
Fair enough. Small numbers. For me, that represents the last two projects I've worked on.
I'm not making the call on version for this one. And the workaround is easy enough once you finally figure out (remember) where the problem lies. Just seems weird to leave the bug there.
Isn't every bug weird to have? That is their definition: they are bugs, unexpected behaviors. Fixes are usually prioritized based on how widespread the impact is.
You can use existence of this bug as reason to upgrade to newer version - take it up with decision makers. If they don't care ... the bug is likely not such high priority for them, or get info from them why they cannot upgrade.
I guess it's partly frustration on my part. Both projects have burnt many hours over days going back and forth with infrastructure people to try to get the magic soup of SPNs, etc right for Kerberos to work in their environment. Something that certainly isn't my specialty and in these cases don't have the level of access to directly tinker with myself.
Bugs like this one are fun because they don't (to me, at least) immediately point to the code. Authentication just doesn't work. So you go back and beg people to triple-check SPNs and firewalls and who-knows-what-else to figure out what part of the environment is not configured right. Until someone finally ran across this issue in GitHub, it never occurred to me that there might be a new network handler in play (by default) that just didn't do the same thing that the old one did.
Anyway, the bug has been dealt with. It just requires upgrading. Or a workaround if that is not feasible.