Httpx: Skip HSTS preloading for single-label domains

Created on 16 Jun 2020  路  10Comments  路  Source: encode/httpx

Checklist

  • [x] The bug is reproducible against the latest release and/or master.
  • [x] There are no similar issues or pull requests to fix it yet.

Describe the bug

I run two services in docker-compose. One of them named "app" which is Django application.
During startup, another service makes HTTP call to "app" service:
GET http://app:8000

The request then fails with ConnectionReset error because it was rewritten to HTTPS.

I dug into the issue and found that httpx._client.BaseClient.merge_url checks against hstspreload.in_hsts_preload and, if succeeds, replaces http with https.

Inside of hstspreload.in_hsts_preload I found _GTLD_INCLUDE_SUBDOMAINS constant that includes app value and the check I have mentioned before is always True.

This means, that any request to any host that is listed in _GTLD_INCLUDE_SUBDOMAINS will be rewritten to HTTPS.

To reproduce

Dockerfile, docker-compose.yml:
https://gist.github.com/alex-oleshkevich/6695124d41208c5dc692a30ea817ea82

Expected behavior

The scheme should not be rewritten.

Actual behavior


URL http://app:8000 was changed to https://app:8000 by httpx.

Debugging material

Environment

  • OS: Arch Linux
  • Python version: 3.8
  • HTTPX version: 0.13
  • Async environment: asyncio
  • HTTP proxy: no
  • Custom certificates: no

Additional context

_GTLD_INCLUDE_SUBDOMAINS = {
    b"android",
    b"app",
    b"bank",
    b"chrome",
    b"dev",
    b"foo",
    b"gle",
    b"gmail",
    b"google",
    b"hangout",
    b"insurance",
    b"meet",
    b"new",
    b"page",
    b"play",
    b"search",
    b"youtube",
}
# hstspreload.in_hsts_preload

@functools.lru_cache(maxsize=1024)
def in_hsts_preload(host: typing.AnyStr) -> bool:
    """Determines if an IDNA-encoded host is on the HSTS preload list"""

    if isinstance(host, str):
        host = host.encode("ascii")
    labels = host.lower().split(b".")

    # Fast-branch for gTLDs that are registered to preload all sub-domains.
    if labels[-1] in _GTLD_INCLUDE_SUBDOMAINS:
        return True

# rest omitted
# httpx._client.BaseClient.merge_url
    def merge_url(self, url: URLTypes) -> URL:
        """
        Merge a URL argument together with any 'base_url' on the client,
        to create the URL used for the outgoing request.
        """
        url = self.base_url.join(relative_url=url)
        if url.scheme == "http" and hstspreload.in_hsts_preload(url.host):
            port = None if url.port == 80 else url.port
            url = url.copy_with(scheme="https", port=port)
        return url
bug good first issue user-experience

All 10 comments

Yup, small standalone script to reproduce:

import asyncio
import httpx


async def app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200})
    await send({"type": "http.response.body", "body": b"Hello, world!"})


async def main():
    async with httpx.AsyncClient(app=app) as client:
        r = await client.get("http://app")
        assert r.request.url == "https://app"


asyncio.run(main())

The meat of it being that:

>>> import hstspreload
>>> hstspreload.in_hsts_preload("app")
True

Note that as per #896 allow_redirects=False won't disable HSTS either, so I'm not aware of a workaround for forcibly disabling HSTS. Maybe we need allow_hsts=...?

Removing the bug label since I'm not convinced _we_ are doing doing anything wrong here.

Maybe hstspreload could skip applying HSTS if the labels is of length 1 (eg http:/:app, http://android, etc), but I'm not sure about it at all (don't have much background about how the HSTS preload list should be used). Any case this should probably discussed over its repo.

@florimondmanca It looks like a bug to me because that rewrite was totally unexpected. As name _GTLD_INCLUDE_SUBDOMAINS states, it is about subdomains and I think httpx should check if the host contains a subdomain and then call hstspreload,

Not saying this is not buggy, but I'm not sure it'd be _us_ that should do something, or whether hstspreload is being wrong reporting single-label domains (i.e. w/o subdomains) as needing HSTS if they're one of the subdomains in the list.

Should we perhaps reach out over https://github.com/sethmlarson/hstspreload about this? 鈥ctually I might directly cc @sethmlarson if he's got a quick insight about whether this is expected. :-)

HSTS preloading rules don't require an additional label when includeSubdomain is provided. An example is paypal.com which is on the list with includeSubdomain which means both paypal.com and www.paypal.com are covered by HSTS requirements.

You can tell that browsers (in my case Firefox) have something going on with single-label domain names triggering HSTS as well. If you type this in your URL bar:

  • http://app will be transformed to https://app and fail on DNS.
  • http://brussels will retain it's HTTP when DNS fails, Firefox automatically tries a .com, and an HTTP request is sent to brussels.com. This automatic retry doesn't happen for the above case or any HTTPS URL. (.brussels is a valid gTLD)

But you're right there's definitely a "usability" issue here when it comes to things like Docker networks. Excluding single-label domain names from HSTS preload checking almost certainly isn't a security issue as under all normal circumstances they wouldn't resolve to anything on their own. I don't think this is an issue with the hstspreload library.

Thanks for the clarifications @sethmlarson :)

So I think we can proceed with flagging this as a bug, and the fix is most likely to be to skip HSTS preloading for single-label hostnames (making sure we skip the case of an IP):

I think httpx should check if the host contains a subdomain and then call hstspreload

Is there someone tackling this? If not, I would be happy to work on it 馃檪

I was thinking of checking if there was at least one dot in the hostname, but it sounds a bit naive to me. I'm open for better suggestions 馃檪

I think this kind of issue should make us reconsider whether enforcing HSTS from a server-side client is a sensible thing to do at all. @tomchristie and I chatted privately about this today. From a UX perspective it can lead to ambiguous behaviors (#896) and cases when you really don't want HSTS preload to kick in (like in this case). Some options are: make HSTS preload opt-in, make it opt-out, drop it entirely (?).

But in the meantime, while this discussion is being held, I guess it's still valuable to have a fix in master for this?

The way I'd have gone about it is to check how many parts the hostname contains, yes. There might be a util in urllib to do this, otherwise let's just len(host.split(".")) > 0?

Ok, it sounds also to me that HSTS is quite heavy for a server-side client (from my experience, I always make sure to use the HTTPS URL if available).

Anyway, I'll make this simple fix with a dedicated unit test that'll be easy to remove if needed.

Closed via #1074

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kde713 picture kde713  路  3Comments

sethmlarson picture sethmlarson  路  5Comments

tomchristie picture tomchristie  路  3Comments

szelenka picture szelenka  路  4Comments

florimondmanca picture florimondmanca  路  3Comments