Httpx: Reviewing the streaming API

Created on 3 Dec 2019  Â·  7Comments  Â·  Source: encode/httpx

Right now we issue streaming requests using...

response = await client.get(url, stream=True)

And have the following stream-specific methods...

  • .read()
  • .raw()
  • .stream()
  • .stream_text()
  • .stream_lines()
  • .close()

For the 1.0 release I think we should consider? droping stream=True, as it makes it exceptionally easy to leave unclosed connections hanging around.

I'm interested in if we might want to use the following instead...

async with client.stream('GET', url) as response:  # Can only be used as a context manager.
    ...

That way users never have to think about "did I call close on the response", or leave open connections hanging around accidentally.

This is related to #393, but it's a stricter API, that doesn't rely on "users might use a response as a context manger". And also makes it far clearer where and when the streaming methods should be used. Ie. they only make sense inside a .stream() block.

I also think we could rationalize the streaming methods, so...

  • .read()
  • .stream_raw()
  • .stream_bytes()
  • .stream_text()
  • .stream_lines()
  • .close()
user-experience

Most helpful comment

Explicit request methods, for the streaming case...

That’s the most reasonable option, IMO.

All 7 comments

Thanks for bringing this up! I remember we mentioned it briefly elsewhere but it's good it has its own issue now.

I agree with the idea of a dedicated method, because passing stream=True induces an important change in behavior that should be better reflected in the API.

I also agree on enforcing context manager usage.

Thinking about alternatives, the fact that we have "one method per HTTP verb" for the non-streaming case and "one method accepting the HTTP verb" for the streaming case might look a bit odd.

(Strictly speaking, users can use .request("GET", ...) for the non-streaming case, but we don't really advertise that possibility e.g. in the docs.)

The only constructive idea I might have regarding this would be…

async with client.stream.get(url) as response: ...

But a) flat is better tha nested and b) that would require us to implement yet another host of methods for each HTTP verb.

So, I think .stream("<method", url, **kwargs) is fine; at least much better than .get(url, stream=True).

I also think we could rationalize the streaming methods

:100:, yes.

Thinking about alternatives, the fact that we have "one method per HTTP verb" for the non-streaming case and "one method accepting the HTTP verb" for the streaming case might look a bit odd.

Indeed yeah.

Options I can see here are...

Provide nested .get() and pals...

async with client.stream.get(url) as response:
    ...

Explicit request methods, for the streaming case...

async with client.stream("GET", url) as response:
    ...

Support either one or two postional args for the streaming case...

async with client.stream(url) as response:  # Defaults to GET
    ...

async with client.stream("POST", url) as response:
    ...

Another take could be that we enforce that responses must be used as a context manager when stream=True is used, so...

response = await client.get(url, stream=True)
response.<whatever>  # Raises an error.

async with client.get(url, stream=True) as response:
    ...  # Happy days

Although I. think that feels a bit weirdly ambiguous to me.

Explicit request methods, for the streaming case...

That’s the most reasonable option, IMO.

Okay, so here's something super nice that the more constrained variant of the streaming API can potentially give us...

  • We can have .request(), .get(), .post() and pals return a Response instance, which doesn't actually include the streaming methods at all.
  • We can have the context managed .stream() return an AsyncResponse instance, which does.

Why is that a nice thing? Well, for one, we're more tightly scoping the "do some I/O with this thing" to a very particular case. You can only perform I/O on an AsyncResponse - a Response is just a plain old bit of state+interface.

This also means that Response is pickleable, where AsyncResponse is not, which is a really clear boundary-line.

The biggest thing tho' is that once we've got a SyncClient and an AsyncClient they actually return the same type for almost all operations, except the more snazzy "I need to control the I/O directly" case. (We won't need SyncRequest and AsyncRequest, but I won't go into that now.)

There's one other bit of API that would need finessing in order to achieve that, which would be to drop the one other bit of I/O on the response, which is response.next(), for calling the next redirect in a chain. Instead that ought to just be something plainer, which returns a redirect response instance, that can be sent explicitly.

We won't go into any of that initially - Let's just get the streaming API squared away first - but wanted to share what I'm thinking there.

AsyncResponse

Or... StreamingResponse? If « async » there really just means « this response may still do I/O » then this would make even more sense to me.

But yeah; at the moment we have a rather strange « read the response content, unless we’re steaming » bit in the client. Switching to a clear distinction between the two will definitely make things cleaner. We might even just get the entire content from inside the actual connections, and pass plain bytes the Response?

There's one other bit of API that would need finessing in order to achieve that, which would be to drop the one other bit of I/O on the response, which is response.next(), for calling the next redirect in a chain. Instead that ought to just be something plainer, which returns a redirect response instance, that can be sent explicitly.

@tomchristie FYI, I'm looking into this at the moment. :-)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

szelenka picture szelenka  Â·  4Comments

FlorianREGAZ picture FlorianREGAZ  Â·  4Comments

njsmith picture njsmith  Â·  3Comments

sethmlarson picture sethmlarson  Â·  5Comments

tomchristie picture tomchristie  Â·  3Comments