Httpx: Allow setting `chunk_size` for `Response.iter_bytes()` etc...

Created on 26 Sep 2019 · 11 Comments · Source: encode/httpx

Requests allows setting chunk_size in .iter_content(), which is currently not an option for our alternatives, .stream() and .stream_text().

For .stream_text(), we should go the extra step and fix an issue users sometimes run into with this feature: chunk_size should measure the decoded text, not the raw bytes.
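Here is a small standard-library sketch of the byte-vs-text problem. It assumes UTF-8, and text_chunks_from_byte_chunks is a hypothetical helper. When the body is cut into fixed-size byte chunks and decoded incrementally, a multi-byte character can straddle a chunk boundary, so the decoded chunks come out with varying lengths.

```python
import codecs
from typing import List


def text_chunks_from_byte_chunks(data: bytes, chunk_size: int, encoding: str = "utf-8") -> List[str]:
    """Cut `data` into chunk_size-byte pieces, decode each one incrementally,
    and return the non-empty text pieces."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pieces = []
    for start in range(0, len(data), chunk_size):
        # An incomplete multi-byte sequence at the end of a piece is held back
        # by the decoder until the next piece arrives.
        pieces.append(decoder.decode(data[start : start + chunk_size]))
    pieces.append(decoder.decode(b"", final=True))
    return [p for p in pieces if p]


body = "héllo wörld".encode("utf-8")  # 13 bytes, 11 characters
print([len(p) for p in text_chunks_from_byte_chunks(body, 3)])  # → [2, 3, 2, 3, 1]
```

Every byte chunk is 3 bytes long except the last, yet the text chunks contain 2 or 3 characters. That mismatch is what a text-measured chunk_size would remove.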

requests-compat

I seem to remember this playing into some primitives around streaming bytes vs. text that we never ended up digging into?

A good first pass at this would be to change the decoder interface slightly, so that instead of, e.g., yielding a byte chunk, they yield a list of byte chunks.

On the first refactoring pass, we don't need to change the internal implementation much: the decoders can just always yield a list with a single item.

We'd then be able to add a chunk_size argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.
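The following sketches the decoder interface described above. It assumes that decode() and flush() return a list of byte chunks. ChunkedGZipDecoder is a hypothetical name, not a class in httpx/_decoders.py, and it uses the standard library's zlib in gzip mode. With chunk_size=None it returns at most one item per call, like the first refactoring pass. With a chunk_size it returns 0, 1, or many full-sized chunks per call, and flush() emits the short remainder.

```python
import gzip
import zlib
from typing import List, Optional


class ChunkedGZipDecoder:
    """Hypothetical decoder whose decode()/flush() return a list of chunks.

    With chunk_size=None it returns at most one item per call; otherwise
    every chunk it returns is exactly chunk_size bytes, except the final
    remainder emitted by flush().
    """

    def __init__(self, chunk_size: Optional[int] = None) -> None:
        # wbits = MAX_WBITS | 16 makes zlib expect a gzip header and trailer.
        self._decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
        self._chunk_size = chunk_size
        self._buffer = bytearray()

    def _split(self, data: bytes) -> List[bytes]:
        if self._chunk_size is None:
            return [data] if data else []
        self._buffer += data
        chunks = []
        while len(self._buffer) >= self._chunk_size:
            chunks.append(bytes(self._buffer[: self._chunk_size]))
            del self._buffer[: self._chunk_size]
        return chunks

    def decode(self, data: bytes) -> List[bytes]:
        # One input may produce 0, 1, or many output chunks.
        return self._split(self._decompressor.decompress(data))

    def flush(self) -> List[bytes]:
        chunks = self._split(self._decompressor.flush())
        if self._buffer:
            # Emit whatever is left over, which is shorter than chunk_size.
            chunks.append(bytes(self._buffer))
            self._buffer.clear()
        return chunks


payload = gzip.compress(b"x" * 1000)
decoder = ChunkedGZipDecoder(chunk_size=256)
out: List[bytes] = []
for i in range(0, len(payload), 10):  # feed the compressed body in 10-byte pieces
    out.extend(decoder.decode(payload[i : i + 10]))
out.extend(decoder.flush())
print([len(c) for c in out])  # → [256, 256, 256, 232]
```

Because the chunk sizing happens after decompression, the sizes refer to the decoded content rather than to how the compressed bytes happened to arrive.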

Updated the issue title to reflect the current Response.aiter_* API :-) (see #610).

How could I help with this issue?

Hi @b0g3r! I think this is still something we’d like to have, and given discussions in https://github.com/python-gitlab/python-gitlab/pull/1036 it seems like some folks would like to see it too. :)

Ways to move forward would be:

  • Propose an API for this, with context on the existing API on Requests
  • Investigate implementation details (i.e. how are we going to split chunks: buffering, something else? Looking at how Requests/urllib3 do this can help)
  • Draft a PR :)

Do I understand correctly that we will need to forward chunk_size here?
https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_dispatch/urllib3.py#L112-L115

@b0g3r Careful that we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Due to this I wouldn't advise relying on any existing urllib3 functionality — also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on both sides.

I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…

@b0g3r So, as per comment https://github.com/encode/httpx/issues/394#issuecomment-567899958, the right place to start with this would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders so that they return a list of bytes rather than bytes.

(And correspondingly, changing the places where the response calls the decoder such as https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_models.py#L915 to deal with a list of bytes as a return result.)

I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)

(a)iter_raw(self, chunk_size=1)

for part in self._raw_stream:
    yield part

Let's use a bytestring as the buffer:

buffer = b""
for part in self._raw_stream:
    buffer += part
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
if buffer:
    yield buffer

(a)iter_bytes(self, chunk_size=ITER_CHUNK_SIZE)

  • chunk_size=ITER_CHUNK_SIZE (512) because requests has it 🌚
  • calls (a)iter_raw

(a)iter_text(self, chunk_size=ITER_CHUNK_SIZE)

  • calls (a)iter_bytes
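(a)iter_text could honor a chunk_size measured in decoded characters instead of bytes. The sketch below shows this with a hypothetical standalone helper, iter_text_chunks, rather than the actual response method. It decodes incrementally with codecs, then re-chunks the resulting text. It assumes the encoding is already known (UTF-8 here).

```python
import codecs
from typing import Iterable, Iterator


def iter_text_chunks(
    byte_chunks: Iterable[bytes], chunk_size: int, encoding: str = "utf-8"
) -> Iterator[str]:
    """Hypothetical helper: decode incrementally, then re-chunk the decoded
    text so that chunk_size counts characters rather than bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    buffer = ""
    for chunk in byte_chunks:
        # Incomplete multi-byte sequences stay inside the decoder for now.
        buffer += decoder.decode(chunk)
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    # Flush the decoder; with errors="replace", a truncated trailing sequence
    # becomes U+FFFD rather than raising.
    buffer += decoder.decode(b"", final=True)
    while len(buffer) >= chunk_size:
        yield buffer[:chunk_size]
        buffer = buffer[chunk_size:]
    if buffer:
        yield buffer


data = "héllo wörld".encode("utf-8")
pieces = [data[i : i + 3] for i in range(0, len(data), 3)]  # 3-byte input chunks
print(list(iter_text_chunks(pieces, 4)))  # → ['héll', 'o wö', 'rld']
```

The input chunks split "ö" across two byte pieces, but every output chunk except the last is exactly 4 characters.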

(a)iter_lines(self, chunk_size=ITER_CHUNK_SIZE)

  • calls (a)iter_text
  • the current code expects each chunk to contain full line(s), but that isn't guaranteed (UPD: found the splitting code in LineDecoder)
  • Requests has an elegant solution
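Below is a minimal sketch of splitting lines that may straddle chunk boundaries. The trailing partial line is kept as pending and prefixed to the next chunk, similar in spirit to how Requests' iter_lines() carries a pending partial line between chunks. This is not Requests' or httpx's actual code, and iter_lines_from_chunks is a hypothetical name. For simplicity, it only treats LF and CRLF as line endings.

```python
from typing import Iterable, Iterator


def iter_lines_from_chunks(text_chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield complete lines (without their LF or CRLF
    ending) from text that may be split at arbitrary points."""
    pending = ""
    for chunk in text_chunks:
        pending += chunk
        # Everything before the last "\n" is complete; the tail is a partial
        # line that may continue in the next chunk.
        *lines, pending = pending.split("\n")
        for line in lines:
            yield line[:-1] if line.endswith("\r") else line
    if pending:
        # Final line without a trailing newline.
        yield pending


chunks = ["first li", "ne\r", "\nsecond\nthi", "rd"]
print(list(iter_lines_from_chunks(chunks)))  # → ['first line', 'second', 'third']
```

The CRLF split across the second and third chunks still produces a single line, because the "\r" is only stripped once the "\n" completing the line has arrived.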

@tomchristie As far as I can see, (a)iter_raw doesn't use any decoder 🤔

It would be good to have a chunk_size=None option so that httpx can return chunks at the HTTP chunk boundaries, as the requests library does. This is useful for apps that require timely delivery.
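The sketch below shows one possible chunk_size=None option. It assumes the option simply passes through whatever pieces the underlying stream produces; depending on the transport, those pieces won't necessarily line up with HTTP chunked-encoding boundaries. iter_sized, run and fake_stream are hypothetical names. The fake stream logs when it sends each event, so the output shows the latency difference: with a fixed chunk_size, a small event waits in the buffer until more data (or the end of the stream) arrives.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sized(parts: Iterable[bytes], chunk_size: Optional[int] = None) -> Iterator[bytes]:
    """Hypothetical helper. chunk_size=None: pass each non-empty piece through
    as soon as the underlying stream produces it. Otherwise: buffer and
    re-split into chunk_size pieces (the last one may be shorter)."""
    if chunk_size is None:
        for part in parts:
            if part:
                yield part
        return
    buffer = bytearray()
    for part in parts:
        buffer += part
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        yield bytes(buffer)


def run(chunk_size: Optional[int]) -> List[Tuple[str, bytes]]:
    """Record the order in which a fake stream sends events and the
    consumer receives chunks."""
    log: List[Tuple[str, bytes]] = []

    def fake_stream() -> Iterator[bytes]:
        for event in [b"data: 1\n\n", b"data: 2\n\n"]:
            log.append(("sent", event))
            yield event

    for chunk in iter_sized(fake_stream(), chunk_size):
        log.append(("received", chunk))
    return log


print([kind for kind, _ in run(None)])  # → ['sent', 'received', 'sent', 'received']
print([kind for kind, _ in run(512)])  # → ['sent', 'sent', 'received']
```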
