KestrelHttpServer 🚀 - Configure header parsing to allow non-compliant headers

In practice I expect this to become an FAQ where most people turn on UTF-8, so we should consider making that the default.

Have we tested how these headers forward through IIS? WebListener?

Tratcher on 4 Oct 2016

I like how we say "we". Nope, please try it :p

blowdart on 4 Oct 2016

@halter73 please work with @cesarbs, based on timing this may be load balanced to you.

cc @davidfowl

muratg on 7 Oct 2016

A big change (essentially a rewrite in header processing) so moving to 1.2.0.

@Tratcher could you file a corresponding bug on WebListener repo?

muratg on 31 Oct 2016

Should also fix: https://github.com/aspnet/KestrelHttpServer/issues/1125

muratg on 31 Oct 2016

Investigating this. Will check how those headers forward through IIS, and also what other servers do.

cesarblum on 17 Jan 2017

Here's what I found so far:

Server | Behavior
-- | --
IIS | accepts UTF-8 in header value, don't know if immediately decoded
IIS running ASP.NET 4 app | accepts UTF-8, decodes it as such
IIS with ANCM | rejects, not yet sure why, but the request is not forwarded at all
WebListener | accepts UTF-8, decodes it as such
nginx | accepts non-ASCII, haven't checked yet what it does with it
node.js | accepts non-ASCII, it's up to the app to decode it
Apache | accepts non-ASCII, it's up to the app to decode it

A relevant bit from the RFC:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding.  In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets.  A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

I think the most relevant part here is:

A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

So it's not forbidding chars above 0x7F. obs-text is actually defined in the next section as

obs-text       = %x80-FF

The most correct behavior seems to be to accept characters in the 0x80 - 0xFF range (which we reject at present), but to let the app decide the encoding. Http.sys appears to deviate from this though, by decoding as UTF-8.
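A minimal C# sketch of what "accept 0x80-0xFF but treat it as opaque" could mean at the byte level: each octet of a raw field value is checked against RFC 7230's field-vchar rule (VCHAR / obs-text, with SP and HTAB allowed), so obs-text passes while control octets are still refused. The helper names are hypothetical, not Kestrel APIs, and OWS trimming and obs-fold handling are deliberately left out.

```csharp
using System;

// Hypothetical helpers, not Kestrel APIs.
// RFC 7230 section 3.2: field-vchar = VCHAR / obs-text, with SP / HTAB allowed
// between them. obs-text (%x80-FF) is accepted and left as opaque octets.
static bool IsAllowedFieldValueOctet(byte b) =>
    b == 0x09                     // HTAB
    || (b >= 0x20 && b <= 0x7E)   // SP and VCHAR
    || b >= 0x80;                 // obs-text

static bool IsValidFieldValue(ReadOnlySpan<byte> value)
{
    foreach (byte b in value)
    {
        // Anything else is a control octet (NUL, CR, LF, DEL, ...) and is rejected.
        if (!IsAllowedFieldValueOctet(b))
        {
            return false;
        }
    }
    return true;
}

// 'a' followed by the UTF-8 octets of "ö": accepted as opaque data.
Console.WriteLine(IsValidFieldValue(new byte[] { 0x61, 0xC3, 0xB6 })); // True
```

Note that validation says nothing about decoding: the app still has to decide which charset, if any, to apply to the accepted octets.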

cesarblum on 18 Jan 2017

For reference, which header did you test? You may get different results for a common header vs a custom header. E.g. Host and Location are often special cased.

Tratcher on 18 Jan 2017

How does Apache leave it up to the app to decode it? Does it expose the raw header bytes?

Tratcher on 18 Jan 2017

@Tratcher I was testing with Referer, as in #1125.

I tested Apache with a PHP app, which saw the header as raw bytes. You get a "string" for it, but I'd have to manually decode it as UTF-8 to get the right chars.

cesarblum on 18 Jan 2017

@blowdart We're currently rejecting if the header is UTF8, so I don't think this has security implications. Moving to 2.1.0

muratg on 7 Apr 2017

Won't this still reject UTF8 cookies? if so that's a problem

blowdart on 7 Apr 2017

FYI, if you care: SNI says the host name is UTF-8:

https://tools.ietf.org/html/rfc3546#section-3.1

The hostname is represented as a byte
string using UTF-8 encoding [UTF8], without a trailing dot.

Now, a good server should probably check that the Host header and the SNI name match, which might be impossible if you can't have a UTF-8 header.

There is even a spec on how to deal with hostnames

https://tools.ietf.org/html/rfc3490#section-4

But a lot is left as an exercise for the reader.

Drawaes on 7 Apr 2017

Won't this still reject UTF8 cookies? if so that's a problem

Why is that a problem? We've had 2 complaints overall so far.

davidfowl on 7 Apr 2017

If you provide a service for customers, and they are allowed to insert JavaScript,
and that JavaScript accidentally inserts a Unicode character,
then their customers will have problems forever.

Critical problem. Please hotfix.

vinhhrv on 4 Aug 2017

IE doesn't encode the referer header. As a result, this breaks our .NET Core APIs when requests are coming from IE users on some of our sites with e.g. Cyrillic characters in the URL. Would be good to have support for this.

jlandersen on 1 Sep 2017

@davidfowl I think you are not getting many complaints because it takes so long to track down what the issue is. Many different applications might set cookies with UTF-8 characters on the root of the domain (.example.com), so some developers might blame it on user error, because the encoding error does not show up in the logs and may be hard to reproduce.

KLuuKer on 4 Sep 2017

We could really use this. We have a site running on a subdomain (site.bigcompany.com), and users navigate to our site through our company site (bigcompany.com). Our company site uses cookies with non-ASCII characters that are also sent to the subdomains, which causes users who reach our site through the company site to get Bad Request responses. We're now working with the team behind our company site to have them fix their cookies.

I believe this can also be misused for DoS attacks: if you manage to set a cookie in the user's browser, your site (running ANC) will no longer function.

avaneerd on 4 Sep 2017

As others have mentioned, the error might originate from JavaScript cookies and not your own code, so you might not be able to correct it.

In our situation, Google Analytics set the malformed/UTF-8 cookie.

Our marketing department used the utm_campaign query string parameter to name their campaigns (containing Danish characters), so when they made a URL to our site and posted it on Facebook or other places, all users who clicked that link were no longer able to view our website after the first page view. The only solution was to clear cookies (or wait the 6 months until the cookie expired), but not many users told us we had a problem; they just went to our competitor instead.

User clicks link http://website/page?utm_campaign=MinKampagneUndersøgelse
Page loads and user sees it fine
Google Analytics kicks in on the client side, sees the utm_campaign query string, and stores a __utmz cookie on our domain containing the UTF-8 string
The user's next page load can't be processed because of the UTF-8 cookie, and the user sees a blank page with no error.

It took us ages to track down, because the cookie was actually set on a parent domain and not the Kestrel site (which, at the time, was running on a subdomain). And it was not even set by our own code but by third-party JavaScript, so searching our codebase for the cookie name returned 0 results.

Since we discovered the error, a newer version of Google Analytics has changed its behaviour to store all its data on Google's side and only store a user-id cookie on the client side, so the malformed cookie is no longer set. See this for an explanation: https://stackoverflow.com/questions/18604715/google-analytics-missing-utmz-cookie

I think you might be able to reproduce the error by using the old Google Analytics script (ga.js) on your Kestrel website instead of the newer analytics.js, and simply loading a URL with a utm_campaign query string containing international characters (we have switched to the newer version, so I can't test it myself anymore).

It would be great to at least have some kind of option to ignore UTF-8 cookies, so we can work around it in code instead of losing users forever.

netranger on 20 Sep 2017

We also had this problem, which was extremely difficult to narrow down. We use Shibboleth, which injects request headers into our application. One of our staff had í in their name, which resulted in them getting a 400 error and us being confused for a long time.

TommyRush on 16 Oct 2017

I also had an issue reported in context of Server-Timing header. The definition of this header allows using quoted-string for server-timing-param-value and quoted-string allows broader set of characters:

quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
obs-text = %x80-FF

So this header would require an option of relaxing the ascii-only check also for responses.
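To show why a quoted Server-Timing parameter can legitimately carry non-ASCII octets, here is a C# sketch that parses a quoted-string per the grammar above from raw bytes: qdtext (including obs-text) is copied through and each quoted-pair is replaced by the octet after the backslash. It returns the content octets, leaving any charset interpretation to the caller, or null when the input is malformed; `Unquote` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text
static bool IsQdText(byte b) =>
    b == 0x09 || b == 0x20 || b == 0x21
    || (b >= 0x23 && b <= 0x5B)
    || (b >= 0x5D && b <= 0x7E)
    || b >= 0x80;

// quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )
static bool IsQuotedPairOctet(byte b) =>
    b == 0x09 || (b >= 0x20 && b <= 0x7E) || b >= 0x80;

static byte[]? Unquote(ReadOnlySpan<byte> input)
{
    if (input.Length < 2 || input[0] != (byte)'"' || input[input.Length - 1] != (byte)'"')
    {
        return null;
    }

    var content = new List<byte>();
    for (int i = 1; i < input.Length - 1; i++)
    {
        byte b = input[i];
        if (b == (byte)'\\')
        {
            // A quoted-pair may not consume the closing DQUOTE.
            if (i + 1 >= input.Length - 1 || !IsQuotedPairOctet(input[i + 1]))
            {
                return null;
            }
            content.Add(input[++i]); // keep the octet after the backslash
        }
        else if (IsQdText(b))
        {
            content.Add(b);
        }
        else
        {
            return null; // unescaped DQUOTE or a control octet
        }
    }
    return content.ToArray();
}
```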

tpeczek on 14 Nov 2017

rfc7230

3.2.6.  Field Value Components

   Most HTTP header field values are defined using common syntax
   components (token, quoted-string, and comment) separated by
   whitespace or specific delimiting characters.  Delimiters are chosen
   from the set of US-ASCII visual characters not allowed in a token
   (DQUOTE and "(),/:;<=>?@[\]{}").

     token          = 1*tchar

     tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                    / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                    / DIGIT / ALPHA
                    ; any VCHAR, except delimiters

   A string of text is parsed as a single value if it is quoted using
   double-quote marks.

     quoted-string  = DQUOTE *( qdtext / quoted-pair ) DQUOTE
     qdtext         = HTAB / SP /%x21 / %x23-5B / %x5D-7E / obs-text
*    obs-text       = %x80-FF

   Comments can be included in some HTTP header fields by surrounding
   the comment text with parentheses.  Comments are only allowed in
   fields containing "comment" as part of their field value definition.

     comment        = "(" *( ctext / quoted-pair / comment ) ")"
     ctext          = HTAB / SP / %x21-27 / %x2A-5B / %x5D-7E / obs-text

   The backslash octet ("\") can be used as a single-octet quoting
   mechanism within quoted-string and comment constructs.  Recipients
   that process the value of a quoted-string MUST handle a quoted-pair
   as if it were replaced by the octet following the backslash.

     quoted-pair    = "\" ( HTAB / SP / VCHAR / obs-text )

   A sender SHOULD NOT generate a quoted-pair in a quoted-string except
   where necessary to quote DQUOTE and backslash octets occurring within
   that string.  A sender SHOULD NOT generate a quoted-pair in a comment
   except where necessary to quote parentheses ["(" and ")"] and
   backslash octets occurring within that comment.
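A short C# sketch of the token rule quoted above. Because RFC 7230 also defines field-name = token, the same check rejects any header name containing an octet at or above 0x80. The helper names are hypothetical.

```csharp
using System;

// tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
//         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
static bool IsTChar(byte b) =>
    (b >= (byte)'0' && b <= (byte)'9')
    || (b >= (byte)'A' && b <= (byte)'Z')
    || (b >= (byte)'a' && b <= (byte)'z')
    || "!#$%&'*+-.^_`|~".IndexOf((char)b) >= 0; // octets >= 0x80 never match

// token = 1*tchar
static bool IsToken(ReadOnlySpan<byte> value)
{
    if (value.IsEmpty)
    {
        return false;
    }
    foreach (byte b in value)
    {
        if (!IsTChar(b))
        {
            return false;
        }
    }
    return true;
}
```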

However

3.2.4.  Field Parsing
...
   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
*  Newly defined header fields SHOULD limit their field values to
*  US-ASCII octets.  A recipient SHOULD treat other octets in field
*  content (obs-text) as opaque data.

As string is UTF-16, that would suggest the correct approach is to reject 0x00 and simply widen all other chars, converting ISO-8859-1 -> UTF-16; which would also mean any UTF-8 outside the ASCII range would not be interpreted correctly?

benaadams on 14 Nov 2017

e.g. treat opaque data as 8-bit data, converting (byte)0xDD -> (char)0x00DD
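The widening described here is the same as ISO-8859-1 decoding. A C# sketch under that reading (the helper name is hypothetical): every octet becomes the char with the same value and NUL is refused. As anticipated, UTF-8 input then does not come out as the intended text: the octets C3 B6 for "ö" become "Ã¶".

```csharp
using System;

// Hypothetical helper: "opaque octet" decoding, i.e. ISO-8859-1, with NUL rejected.
static string? WidenOctets(ReadOnlySpan<byte> value)
{
    var chars = new char[value.Length];
    for (int i = 0; i < value.Length; i++)
    {
        if (value[i] == 0x00)
        {
            return null; // NUL is never valid in a field value
        }
        chars[i] = (char)value[i]; // 0xDD -> U+00DD
    }
    return new string(chars);
}
```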

benaadams on 14 Nov 2017

@muratg you said

Moving to 2.1.0

It's still attached to the Backlog milestone while a 2.1.0 milestone exists.
Is this going to get fixed in 2.1.0 or not?

KLuuKer on 7 Dec 2017

@KLuuKer bringing this back to triage.

muratg on 7 Dec 2017

@shirhatti @DamianEdwards what are your thoughts on this one?

muratg on 11 Jan 2018

Backlogging this, no work planned in 2.x.

muratg on 14 Feb 2018

Do you have any timeline on this? For now, I need to look for an alternative server, since I cannot call my API from JavaScript in IE when the URL contains non-ASCII characters. Every HTTP request contains those characters in the Referer header and the server returns 400 (Bad Request).

jusper-dk on 24 Apr 2018

Let's investigate this for 2.2

DamianEdwards on 24 Apr 2018

My issue was related to a cookie set on the client and sent to the server. I had this issue with not encoding cookies on the server, and they were set to empty. If a cookie is set on the client side and posted, it shouldn't produce a Bad Request page with no way to handle it in code. Another gotcha with cookies in ASP.NET Core.

joetherod on 26 Apr 2018

I have a similar problem. Our ASP.NET Core application runs behind a "kind of" reverse proxy, which adds additional headers (for whatever purpose). These headers can contain German umlauts (e.g. "ö"), which leads to "Malformed request: invalid headers." and the request ends immediately (400).

Is there any workaround to still serve these requests?

axelheer on 14 May 2018

No, there's no workaround when the request can't be decoded like this.

Tratcher on 14 May 2018

These headers can contain German umlauts (e.g. "ö"), which leads to "Malformed request: invalid headers." and the request ends immediately (400).

Can you URL-encode the header? e.g. ö is %C3%B6
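A C# sketch of the URL-encoding workaround, assuming the sender can be changed to percent-encode and the application decodes explicitly: Uri.EscapeDataString writes non-ASCII characters as their percent-encoded UTF-8 octets, and Uri.UnescapeDataString reverses that. Both ends have to agree on the scheme.

```csharp
using System;

// "ö" is U+00F6; its UTF-8 octets are C3 B6.
string encoded = Uri.EscapeDataString("\u00F6");

// The receiving application has to decode explicitly to get the original text back.
string decoded = Uri.UnescapeDataString(encoded);

Console.WriteLine(encoded); // %C3%B6
```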

benaadams on 14 May 2018

https://www.punycoder.com/ 😄

davidfowl on 14 May 2018

@benaadams the application setting these headers isn't under my control; I could decode anything. 🤷‍♂️

But I'm going to ask its vendor; getting an answer there is generally a painful process and often not very helpful, though.

axelheer on 14 May 2018

I am facing the same problem as @TommyRush. I am using Shibboleth which injects headers. I neither have control over the Shibboleth installation nor over the dataset used by the shibboleth installation. So now MOST users can work without any problems, BUT the poor ones with umlauts in their name or other data can't do anything.

Would appreciate this being fixed soon.

My application is running behind IIS. Is there any way IIS could parse the headers and "fix" them?

@TommyRush did you ever find a workaround?

DaBeSoft on 2 Jul 2018

@amrmahdi you can check your logs for info level logs of the form Connection id "<ID>" bad request data: "Invalid request target: <Invalid Chars>" and Connection id "<ID>" bad request data: "Malformed request: invalid headers."

halter73 on 14 Jul 2018

RFC 5987 required support for ISO-8859-1. In the RFC that obsoletes 5987 the requirement was removed, although supporting ISO-8859-1 is still encouraged for backward compatibility. Why doesn't Kestrel support ISO-8859-1?

Also, given that HttpConnection in corefx allows it, isn't it strange that the server doesn't support it?
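For context: RFC 5987's ISO-8859-1 requirement applies to its ext-value syntax for header parameters (for example title*=iso-8859-1'en'%A3%20rates), where the charset is named explicitly, rather than to raw header octets. A C# sketch of decoding such a value, assuming .NET 5 or later for Encoding.Latin1; `DecodeExtValue` is a hypothetical helper, it only handles the two charsets RFC 5987 requires, and it is lenient about which ASCII characters appear in value-chars.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// ext-value = charset "'" [ language ] "'" value-chars   (RFC 5987 section 3.2)
static int HexValue(char c) =>
    c >= '0' && c <= '9' ? c - '0'
    : c >= 'a' && c <= 'f' ? c - 'a' + 10
    : c >= 'A' && c <= 'F' ? c - 'A' + 10
    : -1;

static string? DecodeExtValue(string extValue)
{
    int first = extValue.IndexOf('\'');
    int second = first < 0 ? -1 : extValue.IndexOf('\'', first + 1);
    if (second < 0) return null;

    string charset = extValue.Substring(0, first);
    string valueChars = extValue.Substring(second + 1); // the language tag is ignored

    // Percent-decode value-chars into raw octets.
    var octets = new List<byte>();
    for (int i = 0; i < valueChars.Length; i++)
    {
        char c = valueChars[i];
        if (c == '%')
        {
            if (i + 2 >= valueChars.Length) return null;
            int hi = HexValue(valueChars[i + 1]);
            int lo = HexValue(valueChars[i + 2]);
            if (hi < 0 || lo < 0) return null;
            octets.Add((byte)(hi * 16 + lo));
            i += 2;
        }
        else if (c > 0x7F)
        {
            return null; // value-chars are ASCII only
        }
        else
        {
            octets.Add((byte)c);
        }
    }

    // The charset name is case-insensitive.
    Encoding? encoding = charset.ToUpperInvariant() switch
    {
        "UTF-8" => new UTF8Encoding(false, true), // throws on invalid octets
        "ISO-8859-1" => Encoding.Latin1,
        _ => null,
    };
    if (encoding is null) return null;

    try
    {
        return encoding.GetString(octets.ToArray());
    }
    catch (DecoderFallbackException)
    {
        return null;
    }
}
```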

amrmahdi on 16 Jul 2018

We're going to drop the entire header if the header value is not a valid UTF-8 sequence.

davidfowl on 19 Jul 2018

@DaBeSoft I did, but it's outside of IIS, which I don't think helps you. Shibboleth can URL encode the values. This of course means all of our applications have to be aware and decode any values, which also caused a lot of problems.

In shibboleth2.xml we added encoding=URL like this

TommyRush on 19 Jul 2018

We're going to drop the entire header if the header value is not a valid UTF-8 sequence.

Do you mean ASCII? UTF8 has never been a valid header encoding for HTTP1.x. Closest is treating as opaque bytes, but don't think that helps with them as string

benaadams on 19 Jul 2018

Do you mean ASCII? UTF8 has never been a valid header encoding for HTTP1.x. Closest is treating as opaque bytes, but don't think that helps with them as string

We know, but people keep insisting on sending us UTF-8 headers. Some servers allow it.

Tratcher on 19 Jul 2018

Do you mean ASCII? UTF8 has never been a valid header encoding for HTTP1.x. Closest is treating as opaque bytes, but don't think that helps with them as string

We know, but people keep insisting on sending us UTF-8 headers. Some servers allow it.

Do they? Or are they sending Latin-1 (ISO/IEC 8859-1), as called out by the spec:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].

For example, \xD6\xD0\xCE\xC4 from https://github.com/aspnet/KestrelHttpServer/issues/2647: in UTF-8 this is an invalid code sequence resulting in ����, whereas in Latin-1/ISO-8859-1 it is ÖÐÎÄ

benaadams on 20 Jul 2018

Not sure ÖÐÎÄ makes much more sense, but ���� is a one-way transform to gibberish and would fail the UTF-8 test

benaadams on 20 Jul 2018

Looking at the various encodings:

```csharp
// UTF8
Encoding.UTF8.GetString(new byte[] {0xD6,0xD0,0xCE,0xC4})
// Output: ����
```

```csharp
// UTF16LE
Encoding.Unicode.GetString(new byte[] {0xD6,0xD0,0xCE,0xC4})
// Output: 탖쓎
```

```csharp
// Widen to UTF16 (i.e. Latin1)
new string(new char[] {(char)0xD6,(char)0xD0,(char)0xCE,(char)0xC4})
// Output: ÖÐÎÄ
```

benaadams on 20 Jul 2018

@davidfowl you say

We're going to drop the entire header if the header value is not a valid UTF-8 sequence.

But what about the case when some ~~stupid~~ piece of script (usually not changeable because of ~~even more stupid~~ reasons) is inserting incorrect values in the cookies?

Sometimes people don't care about the incorrect headers, but we still need that cookie.

KLuuKer on 20 Jul 2018

There's no way to represent that data so the choices are:

Drop the header
Garble the value
Reject the request.
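The three choices can be sketched in C# as three small decoding helpers for a raw header value, each using UTF-8 and differing only in what happens to invalid input. The helper names are hypothetical and not Kestrel APIs; a real server would plug the chosen behaviour into its header parser.

```csharp
using System;
using System.Text;

// 1. Drop the header: null tells the caller to omit it from the header collection.
static string? DecodeOrDrop(byte[] value)
{
    // UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
    var strictUtf8 = new UTF8Encoding(false, true);
    try
    {
        return strictUtf8.GetString(value);
    }
    catch (DecoderFallbackException)
    {
        return null;
    }
}

// 2. Garble the value: invalid sequences become U+FFFD, which cannot be undone.
static string DecodeReplacing(byte[] value) => Encoding.UTF8.GetString(value);

// 3. Reject the request: throw, and let the caller turn that into a 400 response.
static string DecodeOrReject(byte[] value) =>
    DecodeOrDrop(value) ?? throw new FormatException("Header value is not valid UTF-8.");
```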

davidfowl on 21 Jul 2018

for most headers I would choose: drop the header
for some selective headers (like cookie) I would choose: garble the value, and maybe have some way of trying to get the bits I need parsed out manually (at your own risk)

KLuuKer on 23 Jul 2018

There's no way to represent that data so the choices are ...

I'd suggest applying this when the header is outside the printable ASCII range, rather than when it isn't valid UTF-8, since decoding anything outside ASCII with a single assumed encoding will likely garble the value if that encoding isn't the correct one.

Or... allow an option to specify fallback Encoding to be used when ascii fast-path fails (while still rejecting control codes)
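A hedged C# sketch of that option: an ASCII fast path, control octets rejected before any decoding, and a caller-supplied fallback Encoding for everything else. `DecodeHeaderValue` and its parameter are hypothetical, and Encoding.Latin1 in the usage lines assumes .NET 5 or later.

```csharp
using System;
using System.Text;

static string? DecodeHeaderValue(byte[] value, Encoding fallback)
{
    bool allAscii = true;
    foreach (byte b in value)
    {
        // Control codes (other than HTAB) and DEL are rejected whatever the encoding.
        if ((b < 0x20 && b != 0x09) || b == 0x7F)
        {
            return null;
        }
        if (b >= 0x80)
        {
            allAscii = false;
        }
    }

    // Fast path: pure ASCII decodes identically under ASCII, UTF-8 and Latin-1.
    return allAscii ? Encoding.ASCII.GetString(value) : fallback.GetString(value);
}

byte[] umlaut = { 0xC3, 0xB6 }; // UTF-8 octets of "ö"
string? asUtf8 = DecodeHeaderValue(umlaut, Encoding.UTF8);     // "ö"
string? asLatin1 = DecodeHeaderValue(umlaut, Encoding.Latin1); // "Ã¶"
```

The two usage lines show why the fallback has to be chosen deliberately: the same octets produce different text depending on the configured encoding.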

benaadams on 23 Jul 2018

Update on the investigation and recommendations. I've tested the following with the Referer and Cookie headers:

A few notes on the behaviours:

IIS + Managed Handler (ASP.NET 4)
It seems like the managed handler will first attempt to parse all header values as UTF-8, but if that fails, it will fall back to ASCII
IIS + ANCM In-Proc
The encoding is hard-coded as UTF-8 https://github.com/aspnet/HttpSysServer/blob/eba2d7e380fef9f75203bb9a1f4d30827451d512/shared/Microsoft.AspNetCore.HttpSys.Sources/RequestProcessing/HeaderEncoding.cs#L10-L13
IIS + ANCM Out-of-proc
This scenario will go through the WinHttp stack on its way to the out of proc server. During this process what happens will depend on the actual header values instead of only the encoding. Extended ASCII will be passed onto the server correctly. If all the headers are UTF-8 and can be re-encoded as Extended ASCII (for example Franç°isë), the headers will be re-encoded as Extended ASCII and passed onto the server. If all the headers are UTF-8 but they cannot be re-encoded as Extended ASCII (for example 你好世界), the request will fail with a 502.3 from ANCM. If the headers contain a mix of Extended ASCII and UTF-8, there will be no re-encoding attempted and the raw bytes are passed onto the server.
Apache + PHP
PHP may be able to parse different charsets via configuration and utilities such as mbstring. I wasn't able to get it to work in a reasonable amount of time, so I didn't verify it, as we were more interested in the out-of-box default behaviours here.
Node.js
All strings are parsed as opaque Extended ASCII: https://github.com/nodejs/node/issues/17390
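The ANCM out-of-proc note hinges on whether decoded text fits in a single-byte charset. A C# sketch of that test, treating "Extended ASCII" as ISO-8859-1 (an assumption) and assuming .NET 5 or later for Encoding.Latin1. The helper names are hypothetical; this models the observed outcome only and is not WinHttp's code.

```csharp
using System;
using System.Text;

// Text can be re-encoded as ISO-8859-1 only if every UTF-16 char is U+00FF or below.
static bool FitsInLatin1(string value)
{
    foreach (char c in value)
    {
        if (c > '\u00FF')
        {
            return false;
        }
    }
    return true;
}

// Returns the Latin-1 octets for a UTF-8 header value, or null when that is
// impossible (the case observed above to fail with a 502.3).
static byte[]? ReencodeUtf8AsLatin1(byte[] utf8Value)
{
    string text = Encoding.UTF8.GetString(utf8Value);
    return FitsInLatin1(text) ? Encoding.Latin1.GetBytes(text) : null;
}
```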

Based on the comparisons, here are the recommendations and proposals:

We should, by default, have Kestrel accept and parse headers as UTF-8. UTF-8 encoding is the same as US ASCII for characters 127 and below. In the comparisons above, we are the only server that rejects UTF-8 headers.
We could, by default, fall back to Extended ASCII if UTF-8 decoding fails. This will emulate the Managed Handler behaviour. Extended ASCII is a safe fallback since it will parse each byte as an 8-bit character.
We could consider making the fallback encoding for failed UTF-8 parsing configurable.

These 3 points are independent and I think it's reasonable to start with 1 and 2. We can look into 3 if there is enough demand.
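Points 1 and 2 together could look roughly like this C# sketch: strict UTF-8 first, then a fallback to "Extended ASCII", taken here to mean ISO-8859-1 so every octet still maps to exactly one char. It assumes .NET 5 or later for Encoding.Latin1, and the helper name is hypothetical; it illustrates the proposal and is not the code that was merged.

```csharp
using System;
using System.Text;

static string DecodeUtf8WithLatin1Fallback(byte[] value)
{
    // UTF-8 is identical to US-ASCII for octets 0x00-0x7F, so ASCII values are unaffected.
    var strictUtf8 = new UTF8Encoding(false, true); // throwOnInvalidBytes: true
    try
    {
        return strictUtf8.GetString(value);
    }
    catch (DecoderFallbackException)
    {
        // Not valid UTF-8: map each octet to one char (ISO-8859-1), which cannot fail.
        return Encoding.Latin1.GetString(value);
    }
}
```

One trade-off: a Latin-1 value whose octets happen to form valid UTF-8 (for example C3 B6) is read as UTF-8, so the fallback is a heuristic rather than a detection of the sender's real encoding.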

Other alternatives that have been proposed:

expose headers as raw bytes or opaque string
drop headers with non US ASCII characters
reject the request (current behaviour)

But I don't think any of these alternatives are as desirable.

JunTaoLuo on 20 Aug 2018

@JunTaoLuo Can you look at response headers as well?

davidfowl on 21 Aug 2018

I was hoping we could address response headers separately. Although related, apps have more control over the response headers whereas they cannot control what request headers are sent by clients.

JunTaoLuo on 22 Aug 2018

We ran into this issue as well.
Is there any temporary fix to ignore invalid request headers? (Our issue is in the Referer header.)

effyteva on 25 Aug 2018

Parsing of request headers with UTF-8 encoded values has been merged and will be available in 2.2.0-preview2.

JunTaoLuo on 31 Aug 2018

I've continued to do some additional investigation in how servers handle header names, path and query strings with non-ascii characters and this is what I found:

Header names with non-ascii characters are almost universally rejected, other than nginx. We should follow the same pattern and continue to reject these requests. There's less consensus in how to treat requests with non-ascii characters in path and query string but these should be URL encoded. I think we can continue to reject these requests too, unless we have a compelling reason to do otherwise.

JunTaoLuo on 5 Sep 2018

I think it's highly critical to address this issue, at least in terms of "removing" non-ASCII headers.
Currently, requests fail if any header has a non-ASCII value, which in many cases can be caused by an external factor, such as a Referer pointing to a UTF-8 URL.

effyteva on 5 Sep 2018

@effyteva I have already made the changes to accept UTF-8 encoded characters in header values, which means UTF-8 encoded urls in the Referer header will now be accepted. However, we are still planning to reject requests containing non-ascii characters in the header names as well as path and query string values.

JunTaoLuo on 5 Sep 2018

@JunTaoLuo sorry, I didn't realize you had already committed those changes for the 2.2 release.
Thanks, and keep up the great work!

effyteva on 5 Sep 2018

We decided to not pursue accepting non-ASCII characters in header names, path and query string at this time. If there is compelling reason to enable these scenarios, please file another issue and we will re-prioritize. A follow up issue to address UTF-8 values in response header values has been filed at https://github.com/aspnet/KestrelHttpServer/issues/2884.

JunTaoLuo on 5 Sep 2018

Thanks for this fix. Is there an ETA for 2.2.0 release? @JunTaoLuo

sandeepchauhan on 11 Oct 2018

See the roadmap.

Tratcher on 12 Oct 2018

We can encode. That's okay, but what if we are migrating an old application to ASP.NET Core that already has many customers whose cookies contain non-ASCII characters?

For now, there is a workaround if you are using nginx in front of your ASP.NET Core application.

To remove all non-ASCII characters from the request header:

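```nginx
# set_by_lua_block requires the lua-nginx-module (bundled with OpenResty).
server {
    set_by_lua_block $cookie_ascii {
        local cookie = ngx.var.http_cookie
        if cookie == nil or cookie == '' then return cookie end
        -- Strip every byte outside 0x00-0x7F from the Cookie header.
        local cookie_ascii, n, err = ngx.re.gsub(cookie, "[^\\x00-\\x7F]", "")
        return cookie_ascii
    }

    listen 80;
    server_name example.com;

    location / {
        proxy_pass          http://localhost:5000;
        ...
        # Forward the stripped value instead of the original Cookie header.
        proxy_set_header    Cookie $cookie_ascii;
        ...
    }
}
```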

turgayozgur on 23 Nov 2018

We still suffer from the issue, after upgrading to ASP.NET Core 2.2.
Is there any setting required to enable this?

effyteva on 8 Dec 2018

Comments on closed issues are not tracked, please open a new issue with the details for your scenario.

Tratcher on 8 Dec 2018
