Httpx: Allow setting `chunk_size` for `Response.iter_bytes()` etc...

Created on 26 Sep 2019 · 11 comments · Source: encode/httpx

Requests allows setting chunk_size in .iter_content(), which is currently not an option for our alternatives, .stream() and .stream_text().

For .stream_text() we should go a step further and fix an issue users sometimes run into with this feature: chunk_size should measure the decoded text, not the raw bytes.
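To show what "measure the decoded text" means, here is a minimal sketch of a text iterator whose chunk_size counts characters after decoding rather than raw bytes. It assumes the raw byte chunks and the encoding are already known; iter_text_chunks is a hypothetical helper, not httpx's API.

```python
import codecs
from typing import Iterable, Iterator


def iter_text_chunks(
    byte_chunks: Iterable[bytes], encoding: str, chunk_size: int
) -> Iterator[str]:
    """Hypothetical helper: decode byte chunks, then re-chunk by character count."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # The incremental decoder keeps partial multi-byte sequences between calls,
    # so a character split across two byte chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    buffer = ""
    for chunk in byte_chunks:
        buffer += decoder.decode(chunk)
        # chunk_size is applied to decoded characters, not to bytes.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    # Flush anything the decoder is still holding, then emit the remainder.
    buffer += decoder.decode(b"", final=True)
    for start in range(0, len(buffer), chunk_size):
        yield buffer[start:start + chunk_size]
```

For example, feeding it "héllo wörld" encoded as UTF-8 and split in the middle of "é" and "ö" (b"h\xc3", b"\xa9llo w\xc3", b"\xb6rld") with chunk_size=4 yields "héll", "o wö", "rld": every chunk is 4 characters except the last, even though "é" and "ö" are 2 bytes each.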

requests-compat


sethmlarson


Most helpful comment

@b0g3r Careful: we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Because of this I wouldn't advise relying on any existing urllib3 functionality, also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on each side.

I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…

florimondmanca on 13 Mar 2020


All 11 comments

I seem to remember this playing into some primitives around streaming bytes vs. text that we never ended up digging into?

tomchristie on 27 Sep 2019

A good first pass at this would be to change the decoder interface slightly, so that instead of, e.g., yielding a byte chunk, they yield a list of byte chunks.

On the first refactoring pass, we don't need to change the internal implementation much: the decoders can just always yield a list with a single item.

We'd then be able to add a chunk_size argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.
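A minimal sketch of that interface, assuming a hypothetical DeflateDecoder class (httpx's real decoders live in httpx/_decoders.py and may look different): decode() and flush() return a list of chunks, and an optional chunk_size makes them return 0, 1, or many chunks of exactly that size, holding back the remainder until more data arrives or flush() is called.

```python
import zlib
from typing import List, Optional


class DeflateDecoder:
    """Hypothetical decoder whose decode()/flush() return lists of chunks."""

    def __init__(self, chunk_size: Optional[int] = None) -> None:
        if chunk_size is not None and chunk_size <= 0:
            raise ValueError("chunk_size must be positive or None")
        self.chunk_size = chunk_size
        self._decompressor = zlib.decompressobj()
        self._buffer = bytearray()

    def _split(self, data: bytes, final: bool) -> List[bytes]:
        if self.chunk_size is None:
            # First refactoring pass: pass decoded data through as a
            # single-item list (an empty list when there is no output).
            return [data] if data else []
        self._buffer.extend(data)
        chunks = []
        # Emit 0, 1, or many chunks of exactly chunk_size bytes.
        while len(self._buffer) >= self.chunk_size:
            chunks.append(bytes(self._buffer[: self.chunk_size]))
            del self._buffer[: self.chunk_size]
        # Only the final call may emit a short trailing chunk.
        if final and self._buffer:
            chunks.append(bytes(self._buffer))
            self._buffer.clear()
        return chunks

    def decode(self, data: bytes) -> List[bytes]:
        return self._split(self._decompressor.decompress(data), final=False)

    def flush(self) -> List[bytes]:
        return self._split(self._decompressor.flush(), final=True)
```

Keeping the chunking inside the decoder means every caller (sync or async) gets correctly sized chunks from the same code path, which is the point of changing the return type to a list first.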

tomchristie on 20 Dec 2019

Updated the issue title to reflect the current Response.aiter_* API :-) (see #610).

florimondmanca on 21 Dec 2019


How could I help with this issue?

b0g3r on 12 Mar 2020

Hi @b0g3r! I think this is still something we’d like to have, and given discussions in https://github.com/python-gitlab/python-gitlab/pull/1036 it seems like some folks would like to see it too. :)

Ways to move forward would be:

Propose an API for this, with context on the existing API on Requests
Investigate implementation details (i.e., how are we going to split chunks: buffering, something else? Looking at how Requests/urllib3 do this can help)
Draft a PR :)

florimondmanca on 12 Mar 2020


Do I understand correctly that we will need to forward chunk_size here?
https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_dispatch/urllib3.py#L112-L115

b0g3r on 13 Mar 2020

I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…

florimondmanca on 13 Mar 2020


@b0g3r So, as with comment https://github.com/encode/httpx/issues/394#issuecomment-567899958, the right place to start would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders so that they return a list of bytes rather than bytes.

(And correspondingly, changing the places where the response calls the decoder such as https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_models.py#L915 to deal with a list of bytes as a return result.)

I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)
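On the response side, the change to where the response calls the decoder is mostly mechanical: instead of yielding the decoder's result directly, iterate over the returned list. A sketch, using a hypothetical IdentityDecoder and a free function standing in for the Response method (names are illustrative, not httpx's).

```python
from typing import Iterable, Iterator, List


class IdentityDecoder:
    """Hypothetical pass-through decoder using the list-returning interface."""

    def decode(self, data: bytes) -> List[bytes]:
        return [data] if data else []

    def flush(self) -> List[bytes]:
        return []


def iter_bytes(raw_stream: Iterable[bytes], decoder: IdentityDecoder) -> Iterator[bytes]:
    # Before the refactor: `yield decoder.decode(part)`.
    # After: the decoder returns a list, so yield each chunk in it.
    for part in raw_stream:
        yield from decoder.decode(part)
    # Emit anything the decoder was still holding back.
    yield from decoder.flush()
```

Here list(iter_bytes([b"ab", b"", b"cd"], IdentityDecoder())) gives [b"ab", b"cd"]; once decoders accept a chunk_size, this loop does not need to change.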

tomchristie on 13 Mar 2020

(a)iter_raw(self, chunk_size=1)

chunk_size=1, because requests.Response.iter_content uses the same default
Instead of:

for part in self._raw_stream:
    yield part

let's use a bytestring as a buffer:

buffer = b""
for part in self._raw_stream:
    buffer += part
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
if buffer:
    yield buffer

(a)iter_bytes(self, chunk_size=ITER_CHUNK_SIZE)

chunk_size=ITER_CHUNK_SIZE (512), because Requests uses that constant 🌚
calls (a)iter_raw

(a)iter_text(self, chunk_size=ITER_CHUNK_SIZE)

calls (a)iter_bytes

(a)iter_lines(self, chunk_size=ITER_CHUNK_SIZE)

calls (a)iter_text
the current code expects each chunk to contain full line(s), but that's not true (UPD: found the code for splitting in LineDecoder)
Requests has an elegant solution
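The pitfall is that a line can straddle two chunks. A minimal sketch of the usual fix, similar in spirit to Requests' iter_lines but not its code (and not httpx's LineDecoder): keep the trailing partial line in a buffer until the next chunk arrives. For simplicity it assumes only "\n" and "\r\n" line endings; iter_lines here is a hypothetical free function.

```python
from typing import Iterable, Iterator


def iter_lines(text_chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical line splitter that handles lines spanning chunk boundaries."""
    buffer = ""
    for chunk in text_chunks:
        buffer += chunk
        # Everything before the last "\n" is a complete line; the last
        # piece may be incomplete, so it stays in the buffer.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            # Handles "\r\n" even when "\r" and "\n" arrived in different chunks.
            yield line.removesuffix("\r")  # str.removesuffix needs Python 3.9+
    # A final line without a trailing newline is still emitted.
    if buffer:
        yield buffer.removesuffix("\r")
```

With chunks "ab", "c\r", "\nde\n", "f", this yields "abc", "de", "f": the first line is reassembled from three chunks and the split "\r\n" is recognised.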

b0g3r on 13 Mar 2020

@tomchristie As far as I can see, (a)iter_raw doesn't use any decoder 🤔

b0g3r on 13 Mar 2020

It would be good to have a chunk_size=None option so that httpx can return chunks at the HTTP chunk boundaries, as the requests library does. This is useful for apps that require timely delivery.
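A sketch of how a chunk_size=None option could sit alongside fixed-size chunking, assuming a hypothetical rechunk helper applied to whatever chunks the lower layer produces. Whether those chunks line up with HTTP/1.1 chunked-encoding boundaries depends on the transport underneath; the sketch only shows that the None path adds no extra buffering or delay.

```python
from typing import Iterable, Iterator, Optional


def rechunk(chunks: Iterable[bytes], chunk_size: Optional[int] = None) -> Iterator[bytes]:
    """Hypothetical helper: chunk_size=None passes chunks through unchanged."""
    if chunk_size is None:
        # Yield each chunk as soon as it arrives, at whatever boundaries
        # the lower layer produced, without waiting to fill a buffer.
        for chunk in chunks:
            if chunk:
                yield chunk
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer or None")
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)
```

For [b"ab", b"", b"cdefg"], the default yields b"ab" and b"cdefg" as they arrive, while chunk_size=3 yields b"abc", b"def", b"g".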