Requests allowed setting chunk_size within .iter_content(), which is currently not an option for our alternatives .stream() and .stream_text().
For .stream_text() we should go the extra step and fix an issue that users sometimes run into with this feature: chunk_size should measure the decoded text, not the raw bytes.
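To make the "measure the decoded text" point concrete, here's a rough sketch of what that could look like. The function name `iter_text_chunks` and its signature are hypothetical, not part of httpx; it just uses the standard library's incremental decoder so that multi-byte characters split across raw chunks are handled, and `chunk_size` counts characters rather than bytes.

```python
import codecs
from typing import Iterable, Iterator


def iter_text_chunks(
    byte_stream: Iterable[bytes], chunk_size: int, encoding: str = "utf-8"
) -> Iterator[str]:
    """Hypothetical helper: yield str chunks of chunk_size characters.

    Only the final chunk may be shorter than chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")

    # The incremental decoder keeps incomplete multi-byte sequences
    # internally until the remaining bytes arrive.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    buffer = ""

    for part in byte_stream:
        buffer += decoder.decode(part)
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]

    # Flush whatever the decoder is still holding, then emit the remainder.
    buffer += decoder.decode(b"", final=True)
    for start in range(0, len(buffer), chunk_size):
        yield buffer[start : start + chunk_size]
```

Note that the raw parts can be split anywhere, including in the middle of a character, and the output sizes are still counted in characters.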
I seem to remember this playing into some primitives around streaming bytes vs. text that we never ended up digging into?
A good first pass at this would be to change the decoder interface slightly, so that instead of e.g. yielding a byte chunk, the decoders yield a list of byte chunks.
On the first refactoring pass, we don't need to actually change the internal implementation much - the decoders can just always yield a list with a single item.
We'd then be able to add a chunk_size argument to the decoders, which would return 0, 1, or many properly-sized chunks on each yield.
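As a sketch of the "0, 1, or many properly-sized chunks" idea: the class below is hypothetical (`ByteChunker` is not an existing httpx name), and assumes a `decode()`/`flush()` style interface where both methods return a list of bytes.

```python
from typing import List


class ByteChunker:
    """Hypothetical helper: re-slices incoming bytes into chunk_size pieces."""

    def __init__(self, chunk_size: int) -> None:
        if chunk_size < 1:
            raise ValueError("chunk_size must be a positive integer")
        self._chunk_size = chunk_size
        self._buffer = bytearray()

    def decode(self, data: bytes) -> List[bytes]:
        # Returns zero chunks if not enough data has accumulated yet,
        # or one or more full-sized chunks otherwise.
        self._buffer += data
        chunks = []
        while len(self._buffer) >= self._chunk_size:
            chunks.append(bytes(self._buffer[: self._chunk_size]))
            del self._buffer[: self._chunk_size]
        return chunks

    def flush(self) -> List[bytes]:
        # Emit any trailing partial chunk at the end of the stream.
        rest = bytes(self._buffer)
        self._buffer.clear()
        return [rest] if rest else []
```

Returning a list means the caller never has to care whether a given input produced nothing, one chunk, or several.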
Updated the issue title to reflect the current Response.aiter_* API :-) (see #610).
How could I help with this issue?
Hi @b0g3r! I think this is still something we’d like to have, and given discussions in https://github.com/python-gitlab/python-gitlab/pull/1036 it seems like some folks would like to see it too. :)
Ways to move forward would be:
Do I understand correctly that we will need to forward chunk_size here?
https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_dispatch/urllib3.py#L112-L115
@b0g3r Careful that we're in a sort of transition state w.r.t. urllib3 usage due to #804 (we'll soon use our own sync implementation, though keeping urllib3 as an option). Due to this I wouldn't advise relying on any existing urllib3 functionality — also because we'd want to provide chunk sizing on the async layer too, and it'd be odd to have a different implementation on both sides.
I think we want to look at controlling the chunk size directly from response.iter_bytes()/response.aiter_bytes(), instead…
@b0g3r So, as with comment https://github.com/encode/httpx/issues/394#issuecomment-567899958 - the right place to start with this would be a pull request to https://github.com/encode/httpx/blob/master/httpx/_decoders.py that changes the interface of the decoders, so that they return a list of bytes rather than bytes.
(And correspondingly, changing the places where the response calls the decoder such as https://github.com/encode/httpx/blob/a82adcc933345c6b8cb1623b031eb85723e7665b/httpx/_models.py#L915 to deal with a list of bytes as a return result.)
I'd start with that as a foundational pull request, which will then make the remaining work much easier. (Adding chunk sizes to the decoder interface, and through to the response methods.)
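For illustration, the foundational change could look roughly like this. All names here (`ListIdentityDecoder`, `GzipListDecoder`, `iter_decoded`) are made up for the sketch and are not the actual contents of `_decoders.py` or `_models.py`; the assumption is only that decoders expose `decode()` and `flush()`, both now returning a list of bytes.

```python
import zlib
from typing import Iterable, Iterator, List


class ListIdentityDecoder:
    """Hypothetical pass-through decoder: always a single-item list."""

    def decode(self, data: bytes) -> List[bytes]:
        return [data]

    def flush(self) -> List[bytes]:
        return []


class GzipListDecoder:
    """Hypothetical gzip decoder using the list-returning interface."""

    def __init__(self) -> None:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        self._obj = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)

    def decode(self, data: bytes) -> List[bytes]:
        chunk = self._obj.decompress(data)
        return [chunk] if chunk else []

    def flush(self) -> List[bytes]:
        chunk = self._obj.flush()
        return [chunk] if chunk else []


def iter_decoded(raw_stream: Iterable[bytes], decoder) -> Iterator[bytes]:
    """Sketch of the calling side: iterate over each returned list."""
    for part in raw_stream:
        yield from decoder.decode(part)
    yield from decoder.flush()
```

With the calling side already written against lists, adding chunk sizing later only changes what the decoders put in the list, not the loop that consumes it.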
chunk_size=1 because requests.Response.iter_content has it
    for part in self._raw_stream:
        yield part
let's use bytestring as buffer
    buffer = b""
    for part in self._raw_stream:
        buffer += part
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        yield buffer
chunk_size=ITER_CHUNK_SIZE (512) because requests has it 🌚
(a)iter_raw
(a)iter_bytes
(a)iter_text
@tomchristie As I see, (a)iter_raw doesn't use any decoder 🤔
Would be good to have chunk_size=None option so that httpx can return chunks at the HTTP chunk boundaries as per the requests library - this is useful for apps that require timely delivery.
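A rough sketch of how a `chunk_size=None` option might behave, assuming `None` means "pass chunks through exactly as they arrive". The function name `iter_chunks` is hypothetical and this isn't a proposal for the final signature.

```python
from typing import Iterable, Iterator, Optional


def iter_chunks(
    raw_stream: Iterable[bytes], chunk_size: Optional[int] = None
) -> Iterator[bytes]:
    """Hypothetical helper: re-chunk a byte stream, or pass it through."""
    if chunk_size is None:
        # No buffering: each chunk is delivered as soon as it arrives,
        # at whatever boundaries the underlying stream produced.
        yield from raw_stream
        return

    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer or None")

    buffer = bytearray()
    for part in raw_stream:
        buffer += part
        while len(buffer) >= chunk_size:
            yield bytes(buffer[:chunk_size])
            del buffer[:chunk_size]
    if buffer:
        # Trailing partial chunk at the end of the stream.
        yield bytes(buffer)
```

The trade-off is that any fixed `chunk_size` has to wait until enough bytes have accumulated, whereas `None` never delays data that has already been received.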