Httpx: Header keys are lowercased before being sent?

Created on 18 Nov 2019 · 11 Comments · Source: encode/httpx

Is this normal behavior?
Per RFC 2616, header names are case-insensitive, but why do httpx/httpcore always lowercase them?

It could be problematic for servers/proxies/tools that don't respect RFC 2616 and expect case-sensitive headers, no?

I understand the lowercasing for HTTP/2, but not for HTTP/1 ;-(

Labels: external, http/1.1, interop, requests-compat

manatlan

Most helpful comment

I've come here at florimond's suggestion to explain a possible "specific use case" issue with h11's lowercasing of headers when sending HTTP/1.1 requests through HTTPX:

The issue is linked to Cloudflare, the popular and widely used service that provides _(as Wikipedia puts it)_ content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.

Through its security services, Cloudflare tries to block bots and suspicious connections from accessing a website by returning an HTTP 403 Forbidden with a captcha challenge, while letting regular web browsers through without much hassle.
This security seems to rely mostly on two criteria:

The request's IP (as the website's admin can specify which countries should be considered suspicious and challenged with a captcha)
The request's headers (including most importantly the User-Agent)

You may see where I'm going here: as it turns out, Cloudflare does NOT respect RFC 2616 - "Hypertext Transfer Protocol -- HTTP/1.1", Section 4.2, "Message Headers", which states:

Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive.

Indeed, Cloudflare requires field names to be properly capitalized.

This is where the issue with h11's lowercasing of headers comes from. When HTTPX connects to a website that uses Cloudflare's security, the request is met with an HTTP 403 Forbidden response and a captcha challenge, no matter the headers, no matter the IP. Here is a simple code example that sends a GET request to a website that uses Cloudflare:

import trio
import httpx
from collections import OrderedDict

# Here we made sure to declare properly ordered and capitalized headers to avoid triggering Cloudflare's antibot security
headers = OrderedDict({'Accept-Encoding': 'identity','Host': 'grimaldis.myguestaccount.com', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0', 'Connection': 'close'})
async def asks_worker():
    async with httpx.AsyncClient(trust_env=True, headers=headers) as s:
        r = await s.get('https://grimaldis.myguestaccount.com/guest/accountlogin')
        print(r.status_code)
        print(r.text)
async def run_task():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(asks_worker)
trio.run(run_task)

When we look at the trace output of HTTPX's request:

TRACE [2020-07-09 18:56:54] httpcore._async.http11 - send_request method=b'GET' url=(b'https', b'grimaldis.myguestaccount.com', 443, b'/guest/accountlogin') headers=[(b'accept', b'*/*'), (b'accept-encoding', b'identity'), (b'host', b'grimaldis.myguestaccount.com'), (b'user-agent', b'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0'), (b'connection', b'close')]

DEBUG [2020-07-09 18:56:54] httpx._client - HTTP Request: GET https://grimaldis.myguestaccount.com/guest/accountlogin "HTTP/1.1 403 Forbidden"

We can see that by lower-casing the set headers' field names, the request is met with that infamous 403 Forbidden response.

This behaviour is NOT replicated by other HTTP libraries such as urllib.request:

import urllib.request
# Here the headers are lowercased to try and mimic the headers sent by h11
headers = {'accept-encoding': 'identity','host': 'grimaldis.myguestaccount.com', 'user-agent': 'mozilla/5.0 (windows nt 10.0; win64; x64; rv:77.0) gecko/20100101 firefox/77.0', 'connection': 'close'}
request = urllib.request.Request("https://grimaldis.myguestaccount.com/guest/accountlogin", headers=headers)
r = urllib.request.urlopen(request).read()
print(r.decode('utf-8'))

In the example above, we sent a request to the same URL with the same headers, purposefully lowercased to mimic h11's behaviour. However, when we look at the raw request:

send: b'GET /guest/accountlogin HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: grimaldis.myguestaccount.com\r\nUser-Agent: mozilla/5.0 (windows nt 10.0; win64; x64; rv:77.0) gecko/20100101 firefox/77.0\r\nConnection: close\r\n\r\n'

reply: 'HTTP/1.1 200 OK\r\n'

We notice that urllib.request automatically capitalizes the headers' field names, and as a result gets a 200 OK response (meaning it did not trigger Cloudflare's bot detection).
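The capitalization can be observed without sending anything over the network. The following is a minimal sketch, not part of the original report: the URL and header values are placeholders, and the final title-casing step only mirrors what the "send:" line above shows (as far as I can tell, urllib's HTTP handler title-cases the names just before sending, but treat that as an assumption).

```python
import urllib.request

# Placeholder headers, all lowercase, similar in spirit to the example above.
lowercased = {"accept-encoding": "identity", "user-agent": "demo", "connection": "close"}

# Building a Request does not open a connection; the URL is a placeholder.
request = urllib.request.Request("https://example.invalid/", headers=lowercased)

# Request.add_header() stores each name via str.capitalize().
stored_names = [name for name, _ in request.header_items()]
print(stored_names)

# Title-casing the stored names gives the form seen in the "send:" line
# (assumption: this mirrors what the HTTP handler does before sending).
wire_style_names = [name.title() for name in stored_names]
print(wire_style_names)
```

Either way, the lowercase names passed in by the caller never reach the wire unchanged, which matches the raw request shown above.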

In conclusion, because HTTPX and h11 do not allow a header's field name to keep its capitalization, HTTPX is incompatible with any website that uses Cloudflare's security: any HTTP/1.1 request from HTTPX to a "Cloudflare-enabled" website ends up being blocked.

Spyder-exe on 9 Jul 2020

👍5

All 11 comments

It could be problematic for server/proxy/tools that don't respect rfc2616

It's feasible that we could alter this behavior if we'd determined that in real-world usage it was problematic. Have you observed an issue with some specific tooling related to this?

Any idea what urllib3 / requests behavior is wrt. preserving header casing?

tomchristie on 18 Nov 2019

Sadly, I don't know of any public real-world case with that problem, but I'm pretty sure some exist ;-(

At my work, on our intranet, I know of (at least) one ...

BTW, I maintain a tool based on httpx (httpcore), and I will need to find another async HTTP lib that sends HTTP/1.x headers as is ;-(

manatlan on 18 Nov 2019

BTW, requests seems to be ok

import asyncio
import httpx
import requests

asynk=httpx.AsyncClient()

url="http://headers.jsontest.com/"
myheaders={"idSPECIAL":"yo"}

r=requests.get(url,headers=myheaders)
print(r.content.decode())


async def test():
    r=await asynk.get(url,headers=myheaders)
    print(r.content.decode())

asyncio.run( test() )

This displays (requests output first, then httpx):

{
   "X-Cloud-Trace-Context": "e9c4e43c5980315155c2537d6e594d83/11304935074059934945",
   "Accept": "*/*",
   "idSPECIAL": "yo",
   "User-Agent": "python-requests/2.22.0",
   "Host": "headers.jsontest.com"
}

{
   "X-Cloud-Trace-Context": "c6119f95ab6f5e65fae2a10b5dcbea84/7193114510141344900",
   "host": "headers.jsontest.com",
   "idspecial": "yo",
   "user-agent": "python-httpx/0.7.7",
   "accept": "*/*"
}
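The comparison above depends on an external echo service. A local check is also possible with only the standard library; the sketch below is illustrative (the names `EchoHeaders` and `received_header_names` are made up here), and it uses `http.client` as the sender. To test another client, point it at the same local server.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply with the request's header names exactly as they arrived."""

    def do_GET(self):
        names = [name for name, _ in self.headers.items()]
        body = json.dumps(names).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def received_header_names(headers):
    """Send one GET to a throwaway local server; return the names it saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoHeaders)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/", headers=headers)
        names = json.loads(conn.getresponse().read())
        conn.close()
        return names
    finally:
        server.shutdown()
        server.server_close()


names = received_header_names({"idSPECIAL": "yo"})
print(names)
```

With `http.client` as the sender, the custom name should come back with its casing intact, alongside the headers `http.client` adds on its own.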

manatlan on 18 Nov 2019

Here is a reqman test.
Now it's OK: since version 2.2.0, reqman uses aiohttp in place of httpx.
(I hope to come back to httpx when it is more mature.)

manatlan on 18 Nov 2019

aiohttp does the job too ! (I will go with this one !)

import asyncio
import aiohttp

url="http://headers.jsontest.com/"
myheaders={"idSPECIAL":"yo"}

async def test():
    async with aiohttp.ClientSession() as session:
        r=await session.get(url,headers=myheaders)
        t=await r.content.read()
        print(t.decode())

asyncio.run( test() )

manatlan on 18 Nov 2019

Any idea what urllib3 / requests behavior is wrt. preserving header casing?

Did a bit of digging:

There was a similar issue on the Requests repo, opened in 2013: https://github.com/psf/requests/issues/1561. It was eventually attributed to urllib3, which used to make response headers case-insensitive too. The urllib3 issue (https://github.com/urllib3/urllib3/issues/236) was opened in 2013, and solved some time in 2017, though it's unclear when/how it was solved exactly. The rationale was also that some services don't properly comply with RFC2616.

Anyway, it seems urllib3 does not normalize case on response headers — unless we replace them once the response was received, e.g. res.headers["KeyThatWillBeLowerCased"] = "value".

florimondmanca on 19 Nov 2019

Dropping the bug label from this, since our behavior is absolutely within spec here.
Tho from a UX perspective I'd be happy enough to see header casing preserved for HTTP/1.1.
I don't recall if h11 lowercases both incoming and outgoing headers?

tomchristie on 28 Nov 2019

👎2

Some research led me to discover that this strictly follows the HTTP/2 spec[1] for headers, whereas in the HTTP/1.x spec[2] header names are case-insensitive. Implementations should work with normalized header keys, but some do not and expect specific casing for the headers.

I considered for about 5 minutes adding an http2 flag to the Headers object and making the .lower() calls conditional on it, but then realized it would take more work to keep it a case-insensitive multi-dict. Cribbing from requests' CaseInsensitiveDict would probably be a good start, using .lower_items() for HTTP/2 requests and regular items() for HTTP/1.x.

[1] https://tools.ietf.org/html/rfc7540#section-8.1.2
[2] https://tools.ietf.org/html/rfc2616#section-4.2
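The idea above can be sketched roughly as follows. This is a hypothetical illustration, not httpx's `Headers` class and not requests' `CaseInsensitiveDict`; the class name `CasePreservingHeaders` and its methods are made up, with `lower_items()` named after the requests method mentioned above.

```python
class CasePreservingHeaders:
    """Case-insensitive multi-dict that remembers the casing it was given."""

    def __init__(self, pairs=()):
        # Keep a list of pairs so repeated names (multi-dict) are allowed.
        self._pairs = [(str(name), str(value)) for name, value in pairs]

    def get_all(self, name):
        """Return every value stored under `name`, ignoring case."""
        wanted = name.lower()
        return [value for key, value in self._pairs if key.lower() == wanted]

    def __getitem__(self, name):
        values = self.get_all(name)
        if not values:
            raise KeyError(name)
        return ", ".join(values)  # repeated fields joined with a comma

    def __contains__(self, name):
        return bool(self.get_all(name))

    def items(self):
        """Pairs with original casing, e.g. for HTTP/1.x."""
        return list(self._pairs)

    def lower_items(self):
        """Pairs with lowercased names, e.g. for HTTP/2."""
        return [(key.lower(), value) for key, value in self._pairs]


headers = CasePreservingHeaders([("idSPECIAL", "yo"), ("Accept", "*/*")])
print(headers["IDspecial"])
print(headers.items())
print(headers.lower_items())
```

Lookups stay case-insensitive, while the sender picks `items()` or `lower_items()` depending on the protocol version.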

StephenBrown2 on 6 Jan 2020

❤2

Thanks @Spyder-exe, that one's a really useful data point.

tomchristie on 13 Jul 2020

👍1

I've opened a proposal in h11 for this... https://github.com/python-hyper/h11/pull/103

On our side this would mean:

Header casing would be preserved over-the-wire.
The Header model would continue to be case-insensitive, but headers.raw would provide access to the raw bytewise name/value pairs for cases such as terminal or debug output that wishes to preserve the casing information.
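As a rough illustration of the first point, assuming headers are held as raw bytewise name/value pairs: the function names below are hypothetical and this is neither h11's nor httpx's actual code, only a sketch of preserving casing over the wire for HTTP/1.1 while still lowercasing for HTTP/2.

```python
def http11_header_block(raw):
    """Serialize raw (name, value) byte pairs for HTTP/1.1, casing untouched."""
    lines = [name + b": " + value + b"\r\n" for name, value in raw]
    return b"".join(lines) + b"\r\n"  # a blank line ends the header block


def http2_header_list(raw):
    """HTTP/2 requires lowercase field names, so lowercase them here."""
    return [(name.lower(), value) for name, value in raw]


# Placeholder raw pairs, as a client might hold them internally.
raw = [(b"Host", b"example.org"), (b"idSPECIAL", b"yo")]
print(http11_header_block(raw))
print(http2_header_list(raw))
```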

tomchristie on 25 Aug 2020

👍1
