Is this normal behavior?
According to RFC 2616, header names are case-insensitive... but why do httpx/httpcore always lowercase them?
That could be a problem for servers/proxies/tools that don't respect RFC 2616 and expect case-sensitive headers... no?
I understand the lowercasing for HTTP/2. But for HTTP/1... no ;-(
It could be problematic for server/proxy/tools that don't respect rfc2616
It's feasible that we could alter this behavior if we'd determined that in real-world usage it was problematic. Have you observed an issue with some specific tooling related to this?
Any idea what urllib3 / requests behavior is wrt. preserving header casing?
Sadly, I can't point to a public real-world case with that problem, but I'm pretty sure one exists ;-(
At work, on our intranet, I know of (at least) one...
BTW, I maintain a tool based on httpx (httpcore)... and I will need to find another async HTTP library ;-( ... one that sends HTTP/1.x headers as is ;-(
BTW, requests seems to be OK:
import asyncio
import httpx
import requests

asynk = httpx.AsyncClient()
url = "http://headers.jsontest.com/"
myheaders = {"idSPECIAL": "yo"}

# requests: the server echoes the header name with its original casing
r = requests.get(url, headers=myheaders)
print(r.content.decode())


async def test():
    # httpx: the server echoes the header name in lowercase
    r = await asynk.get(url, headers=myheaders)
    print(r.content.decode())


asyncio.run(test())
This displays:
{
"X-Cloud-Trace-Context": "e9c4e43c5980315155c2537d6e594d83/11304935074059934945",
"Accept": "*/*",
"idSPECIAL": "yo",
"User-Agent": "python-requests/2.22.0",
"Host": "headers.jsontest.com"
}
{
"X-Cloud-Trace-Context": "c6119f95ab6f5e65fae2a10b5dcbea84/7193114510141344900",
"host": "headers.jsontest.com",
"idspecial": "yo",
"user-agent": "python-httpx/0.7.7",
"accept": "*/*"
}
Here is a reqman test.
Now it's OK... since version 2.2.0, reqman uses aiohttp in place of httpx.
(I hope to come back to httpx when it is more mature.)
aiohttp does the job too! (I will go with this one!)
import asyncio
import aiohttp

url = "http://headers.jsontest.com/"
myheaders = {"idSPECIAL": "yo"}


async def test():
    async with aiohttp.ClientSession() as session:
        # Send the same mixed-case header name as in the httpx test
        r = await session.get(url, headers=myheaders)
        t = await r.content.read()
        print(t.decode())


asyncio.run(test())
Any idea what urllib3 / requests behavior is wrt. preserving header casing?
Did a bit of digging:
There was a similar issue on the Requests repo, opened in 2013: https://github.com/psf/requests/issues/1561. It was eventually attributed to urllib3, which used to make response headers case-insensitive too. The urllib3 issue (https://github.com/urllib3/urllib3/issues/236) was opened in 2013, and solved some time in 2017, though it's unclear when/how it was solved exactly. The rationale was also that some services don't properly comply with RFC2616.
Anyway, it seems urllib3 does not normalize case on response headers — unless we replace them once the response was received, e.g. res.headers["KeyThatWillBeLowerCased"] = "value".
Dropping the bug label from this, since our behavior is absolutely within spec here.
Though from a UX perspective I'd be happy enough to see header casing preserved for HTTP/1.1.
I don't recall if h11 lowercases both incoming and outgoing headers?
Some research led me to discover that this strictly follows the HTTP/2 spec[1] for headers. The HTTP/1.x spec[2] says header names are case-insensitive, so implementations should work with normalized header keys, but some do not and expect specific casing for the headers.
I considered for about 5 minutes adding an http2 flag to the Headers object and removing the .lower()ing on everything switched on that, but then realized it would take more work to keep it a case-insensitive multi-dict. Cribbing from requests' CaseInsensitiveDict would probably be a good start, using .lower_items for http2 requests and regular items() for http/1.x.
[1] https://tools.ietf.org/html/rfc7540#section-8.1.2
[2] https://tools.ietf.org/html/rfc2616#section-4.2
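To make the idea above concrete, here is a minimal sketch of a header container that keeps the caller's casing, looks names up case-insensitively, and can still emit lowercased names for HTTP/2. The class name CasePreservingHeaders and its get_list method are hypothetical, invented for this illustration; only the lower_items name is borrowed from requests' CaseInsensitiveDict. This is not HTTPX's actual implementation.

```python
from typing import Iterator, List, Tuple


class CasePreservingHeaders:
    """Hypothetical sketch: case-insensitive lookup, original casing kept."""

    def __init__(self, items: List[Tuple[str, str]]) -> None:
        # Store (name as given, value) pairs; order and duplicates are kept.
        self._items = list(items)

    def get_list(self, name: str) -> List[str]:
        # Compare lowercased names so the lookup ignores casing.
        wanted = name.lower()
        return [v for k, v in self._items if k.lower() == wanted]

    def items(self) -> Iterator[Tuple[str, str]]:
        # HTTP/1.x: emit names exactly as the caller wrote them.
        return iter(self._items)

    def lower_items(self) -> Iterator[Tuple[str, str]]:
        # HTTP/2: emit lowercased field names.
        return ((k.lower(), v) for k, v in self._items)


headers = CasePreservingHeaders([("idSPECIAL", "yo"), ("Accept", "*/*")])
print(headers.get_list("IDspecial"))
print(list(headers.items()))
print(list(headers.lower_items()))
```

The point of the design is that lowercasing happens only at comparison time or when serializing for HTTP/2, never at storage time.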
I've come here on suggestion of florimond to explain a possible "specific use case" issue with h11's case lowering of headers when sending HTTP/1.1 requests through HTTPX:
The issue is linked to Cloudflare, the popular and widely used service that delivers _(as wikipedia puts it)_ content-delivery-network services, DDoS mitigation, Internet security, and distributed domain-name-server services.
Cloudflare, through its security services, tries to block bots and suspicious connections from accessing a website by returning an HTTP 403 Forbidden with a captcha challenge, while letting regular web browsers through without much hassle.
This security seems to rely mostly on two criteria:
The request's IP (as the website's admin can specify which countries should be considered suspicious and challenged with a captcha)
The request's headers (including most importantly the User-Agent)
As it turns out, Cloudflare does NOT respect RFC 2616 - "Hypertext Transfer Protocol -- HTTP/1.1", Section 4.2, "Message Headers", which states:
Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive.
Indeed, Cloudflare requires field names to be properly capitalized.
This is where the issue with h11's lowercasing of headers comes from. When HTTPX connects to a website that uses Cloudflare's security, it is met with an HTTP 403 Forbidden response and a captcha challenge, no matter the headers, no matter the IP. Here is a simple code example that sends a GET request to a website that uses Cloudflare:
import trio
import httpx
from collections import OrderedDict

# Here we made sure to declare properly ordered and capitalized headers to avoid triggering Cloudflare's antibot security
headers = OrderedDict({'Accept-Encoding': 'identity','Host': 'grimaldis.myguestaccount.com', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0', 'Connection': 'close'})


async def asks_worker():
    async with httpx.AsyncClient(trust_env=True, headers=headers) as s:
        r = await s.get('https://grimaldis.myguestaccount.com/guest/accountlogin')
        print(r.status_code)
        print(r.text)


async def run_task():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(asks_worker)


trio.run(run_task)
When we look at the trace log of HTTPX's request:
TRACE [2020-07-09 18:56:54] httpcore._async.http11 - send_request method=b'GET' url=(b'https', b'grimaldis.myguestaccount.com', 443, b'/guest/accountlogin') headers=[(b'accept', b'*/*'), (b'accept-encoding', b'identity'), (b'host', b'grimaldis.myguestaccount.com'), (b'user-agent', b'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0'), (b'connection', b'close')]
DEBUG [2020-07-09 18:56:54] httpx._client - HTTP Request: GET https://grimaldis.myguestaccount.com/guest/accountlogin "HTTP/1.1 403 Forbidden"
We can see that, with the header field names lowercased, the request is met with that infamous 403 Forbidden response.
This behaviour is NOT replicated by other HTTP libraries such as urllib.request:
import urllib.request
# Here the headers are lowercased to try and mimic the headers sent by h11
headers = {'accept-encoding': 'identity','host': 'grimaldis.myguestaccount.com', 'user-agent': 'mozilla/5.0 (windows nt 10.0; win64; x64; rv:77.0) gecko/20100101 firefox/77.0', 'connection': 'close'}
request = urllib.request.Request("https://grimaldis.myguestaccount.com/guest/accountlogin", headers=headers)
r = urllib.request.urlopen(request).read()
print(r.decode('utf-8'))
In the example above, we sent a request to the same URL with the same headers, purposefully lowercased to mimic h11's behaviour. However, when we look at the raw request:
send: b'GET /guest/accountlogin HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: grimaldis.myguestaccount.com\r\nUser-Agent: mozilla/5.0 (windows nt 10.0; win64; x64; rv:77.0) gecko/20100101 firefox/77.0\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
We notice that urllib.request automatically capitalizes the header field names, and so it is met with a 200 OK response (meaning it did not trigger Cloudflare's bot detection).
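The capitalization can be observed without sending anything. The sketch below builds a Request object and inspects the stored header names; the URL and header values are placeholders chosen for this illustration. Request.add_header() stores names via str.capitalize(). The second step applies str.title() by hand, which reproduces the casing seen in the raw request above; as far as I can tell this mirrors what urllib's HTTP handler does just before sending, but treat that part as an assumption.

```python
import urllib.request

# Placeholder URL and values; constructing a Request opens no connection.
lowercased = {"accept-encoding": "identity", "user-agent": "demo"}
request = urllib.request.Request("http://example.invalid/", headers=lowercased)

# Request.add_header() stores each name through str.capitalize().
stored = dict(request.header_items())
print(stored)

# str.title() capitalizes every dash-separated word, giving the
# casing seen in the raw request (Accept-Encoding, User-Agent).
title_cased = {name.title(): value for name, value in stored.items()}
print(title_cased)
```

So with urllib.request the caller's casing is not preserved either; it is simply normalized to a form that happens to match what browsers send.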
In conclusion, because HTTPX and h11 do not allow capitalized header field names, HTTPX is incompatible with any website that uses Cloudflare's security: any HTTP/1.1 request from HTTPX to a "Cloudflare-enabled" website ends up blocked by Cloudflare's security.
Thanks @Spyder-exe, that one's a really useful data point.
I've opened a proposal in h11 for this... https://github.com/python-hyper/h11/pull/103
On our side this would mean:
headers.raw would provide access to the raw bytewise name/value pairs for cases such as terminal or debug output that wishes to preserve the casing information.
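As a rough illustration of what raw bytewise name/value pairs allow, the sketch below formats such pairs for debug output with their casing intact, while still supporting case-insensitive lookup. It does not use HTTPX: the plain list stands in for what headers.raw is described as exposing, and the helper names format_raw_headers and find_header are hypothetical.

```python
from typing import List, Tuple

RawHeaders = List[Tuple[bytes, bytes]]


def format_raw_headers(raw: RawHeaders) -> str:
    """Hypothetical debug helper: render raw pairs with their original casing."""
    return "\n".join(
        f"{name.decode('ascii')}: {value.decode('latin-1')}" for name, value in raw
    )


def find_header(raw: RawHeaders, name: bytes) -> List[bytes]:
    """Case-insensitive lookup over the same raw pairs."""
    wanted = name.lower()
    return [value for key, value in raw if key.lower() == wanted]


# Stand-in for the raw pairs; the values are taken from the examples in this thread.
raw = [(b"Accept-Encoding", b"identity"), (b"idSPECIAL", b"yo")]
print(format_raw_headers(raw))
print(find_header(raw, b"idspecial"))
```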