Httpx: Finalising the Transport API for 1.0.

Created on 10 Sep 2020 · 4Comments · Source: encode/httpx

There's a few bits of thinking around the Transport API that we'd like to be completely set on before we roll with a 1.0 release.

We should consider splitting the sync/async interface as .request/.arequest. This would allow transport implementations that support both sync+async usage, which would be a really nice usability for some use-cases.
There's a very strong case to be made for .request being a context manager. See https://github.com/encode/httpcore/issues/145 - Needs some very careful investigation.
We need to do a bit of thinking around making custom transports easier to use. Eg from a POV of not having to specify defaults verbosely as soon as dropping down to that level. The docs at https://www.python-httpx.org/advanced/#custom-transports are a good example of where we could be making some usability improvements.
We ought to consider exposing something like httpx.HTTPTransport(...) rather than forcing users to drop into httpcore once they're configuring transports directly.
The work on exposing the content stream API is also somewhat related... See #1253

To give a few examples of the effect of some of these decisions...

Expose low-level configuration options on the transport but not the client, without having gnarly extra configuration needed...

client = httpx.Client(transport=httpx.HTTPTransport(local_address="..."))

Mount a customised transport for one specific domain, without having gnarly extra configuration needed...

client = httpx.Client(..., mounts={"all://www.example.com": httpx.HTTPTransport(verify=False, retries=1)})

Transport API using .request and .arequest...

# A custom transport class, that always redirects to https URLs, and supports both sync and async API,
# meaning it can be used with either kind of client.
client = httpx.Client(..., mounts={"http": HTTPSRedirect()})
client = httpx.AsyncClient(..., mounts={"http": HTTPSRedirect()})

user-experience

Source

tomchristie

Most helpful comment

I've been doing some thinking around this, and I do think we'll want to re-jig the Transport API slightly.
Here's where I'm at...

1. Supporting both sync & async on a single class

I think we really want .request() and .arequest(). Being able to have classes such as a single httpx.MockTransport(handler) that can work with either case is a really good argument in favour of this. Additionally I can imagine midleware-like functionality such as HTTPSOnlyTransport, HSTSTransport that'd benefit from being able to work with both async and sync clients. Switching to .request/.arequest is also a really easy one for us to do.

The next set is a bit more complex to talk through. So...

2. Optional context / extensions

We're aware that we might at some point want to add functionality such as trailing headers #1149 or server push, that don't currently fit in to the existing API. The proposal for trailing headers was that when we got there we could essentially just bolt it on to the content streams as an additional method.

We also have some really nice functionality where we can point httpx at a WSGI or ASGI app. Really I'd like to replace starlette's requests-based test client with httpx. One area where the transport API doesn't fully support us here is that for templated responses, you'd really like to be able to inspect eg response.template == 'index.html' and response.context == {...}, rather than only having the raw rendered HTML to work with. The Transport API doesn't have any way of passing additional out-of-bounds information through it, so there's no great way to do this. (Even though the ASGI spec could end up with an extension that does support it at the ASGI layer - https://github.com/django/asgiref/issues/135)

As a result, here's what I'm thinking...

def request(method, url, headers, stream, context):
    return (status_code, headers, stream, context)

Here, the context is used for transport context, that provides optional additional information.

Once you start digging there's actually quite a lot of areas where you might want to dig into additional extentions/context...

Response context

http_version - Optional.
reason_phrase - Optional. Not used for HTTP/2. Most clients will just want the status code, but if you want to expose exactly what comes back from the network, then this is useful. Feels rather nice that we can drop this out into an optional, since many transports won't care about providing it at all.
Server push
Trailing headers
Sendfile
Test info including template name, template context
Raw connection, for CONNECT/Upgrade requests

Request context

timeout - Feels a bit nicer having this in the context.
http_version/client_addr/mount_path - If the Transport API was being called into by a server rather than a client, we would want to populate these, in order to encompass the full set of info that WSGI/ASGI provides.

The raw connection one if particularly interesting, since it would allow us to support CONNECT & Upgrade & HTTP/2 bi-directional streaming https://github.com/encode/httpx/issues/1150 without needing a new kind of method on the transport API in order to do so. Eg...

with transport.request(method, url, headers, stream, context) as response:
    status_code, headers, stream, context = response
    if status_code >= 400:
        ...  # request was declined

    # CONNECT requests, Upgrade requests, and HTTP/2 requests may provide
    # an interface onto the raw connection. For HTTP/2 the "connection" will represent
    # a single data stream. Once we have a connection we can perform
    # raw I/O on it.
    connection = context["connection"]
    while True:
        outgoing = input()
        print(f">>> {outgoing}")
        connection.send(outgoing.encode("utf-8")
        incoming = connection.recv()
        print(f"<<< {incoming}")

This allows third parties to build components such as websockets or gRPC on top of httpx. It might also allow us to revisit our proxy implementations, so that they're properly compose with the connection pool, rather than subclassing it and hooking into bit of otherwise private behaviour.

3. Transport API as a context manager

I think I'm fairly sold on #145. Note that 2 together with 3 mean that stream really can just by Iterable[bytes]/AsyncIterable[bytes], since we don't need a close() method on the byte stream, and we wouldn't ever be planning on stuffing extra methods onto the stream interface, such as .get_trailing_headers.

tomchristie on 18 Sep 2020

👍2

All 4 comments

I've been doing some thinking around this, and I do think we'll want to re-jig the Transport API slightly.
Here's where I'm at...

1. Supporting both sync & async on a single class

The next set is a bit more complex to talk through. So...

2. Optional context / extensions

As a result, here's what I'm thinking...

def request(method, url, headers, stream, context):
    return (status_code, headers, stream, context)

Here, the context is used for transport context, that provides optional additional information.

Once you start digging there's actually quite a lot of areas where you might want to dig into additional extentions/context...

Response context

http_version - Optional.
reason_phrase - Optional. Not used for HTTP/2. Most clients will just want the status code, but if you want to expose exactly what comes back from the network, then this is useful. Feels rather nice that we can drop this out into an optional, since many transports won't care about providing it at all.
Server push
Trailing headers
Sendfile
Test info including template name, template context
Raw connection, for CONNECT/Upgrade requests

Request context

timeout - Feels a bit nicer having this in the context.
http_version/client_addr/mount_path - If the Transport API was being called into by a server rather than a client, we would want to populate these, in order to encompass the full set of info that WSGI/ASGI provides.

with transport.request(method, url, headers, stream, context) as response:
    status_code, headers, stream, context = response
    if status_code >= 400:
        ...  # request was declined

    # CONNECT requests, Upgrade requests, and HTTP/2 requests may provide
    # an interface onto the raw connection. For HTTP/2 the "connection" will represent
    # a single data stream. Once we have a connection we can perform
    # raw I/O on it.
    connection = context["connection"]
    while True:
        outgoing = input()
        print(f">>> {outgoing}")
        connection.send(outgoing.encode("utf-8")
        incoming = connection.recv()
        print(f"<<< {incoming}")

3. Transport API as a context manager

tomchristie on 18 Sep 2020

👍2

I think we probably ought to get a bit more lenient on the allowable types.

For instance... h11 is really careful about it's types too, but allows either bytes or ascii-only strings on the inputs.

We could do the same thing and...

Allow either bytes or ascii-only strings for method
Allow either bytes or ascii-only strings for the headers key/values.
Allow URL to be either a raw parsed URL tuple or a str.

The nice thing together with httpx.HTTPTransport is that we end up in a place where we've got a really tightly defined low-level transport interface, but it's also really nice and user friendly.

We could??? even consider adding mandatory headers automatically for the case where headers = None...

Default headers=None to include an automatic Host header.
Default headers=None to include an automatic Transfer-Encoding: chunked if stream is not None.

# Make a request using the transport API directly.
transport = httpx.HTTPTransport()
with transport.request("GET", "https://www.example.com") as response:
    status_code, headers, stream, ext = response
    body = b"".join(stream)

tomchristie on 24 Sep 2020

@tomchristie Just to confirm something I'm not yet sure about…

Eventually, are we meaning to:

Have a single httpcore.HTTPTransport class.
Or make sure httpcore.SyncHTTPTransport and httpcore.AsyncHTTPTransport have a different request/close entrypoint (request + close, vs arequest + aclose) (which is done now) so that an httpx.HTTPTransport can be added, implementing both interfaces into a single class API?

In case 1/ this would lead us to reconsider a big bunch of things, such as our unasync approach, and the general way we'd expect people to build transports, not thinking about breaking API changes, so… Just wanted to clarify that aspect. :)

florimondmanca on 28 Sep 2020

I don't expect us to have a single httpcore.HTTPTransport class, nope.
It's more making sure that they have different entry points, yup, so that...

We can have a single httpx.MockTransport, that works for both cases.
Intermediaries such as CacheControl can have implementations that cover both cases.
We could potentially have a single httpx.HTTPTransport, that works for both cases, dispatching to whichever is required. (Tho I don't know if we want that for sure vs. httpx.HTTPTransport + httpx.AsyncHTTPTransport, or not.)