🐞 Describe the bug
If the response data is fed into `HttpPayloadParser.feed_data` in a particular way, the parser is unable to successfully parse chunked data. Specifically, this happens when one call to the function contains the last `0\r\n` chunk but the following `\r\n` is supplied in a separate call.
💡 To Reproduce
```python
import io

import aiohttp.http_parser


class Payload:  # A minimal payload implementation
    def __init__(self):
        self.data = io.BytesIO()
        self.exc = None

    def feed_data(self, data, size):
        self.data.write(data)

    def feed_eof(self):
        pass

    def set_exception(self, exc):
        self.exc = exc

    def begin_http_chunk_receiving(self):
        pass

    def end_http_chunk_receiving(self):
        pass


payload = Payload()
parser = aiohttp.http_parser.HttpPayloadParser(
    payload, length=None, chunked=True, compression=None, code=200, method='GET')
print(repr(parser.feed_data(b'4\r\nasdf\r\n0\r\n')))
eof, data = parser.feed_data(b'\r\n')
print(repr((eof, data)))
if not eof:
    print(repr(parser.feed_eof()))
print(repr(payload.data.getvalue()))
```
I added print statements to show exactly what aiohttp returns, compared to a single `feed_data(b'4\r\nasdf\r\n0\r\n\r\n')` call, which works fine.
💡 Expected behavior
The parser processes the data successfully, and the last line prints `b'asdf'`.
📋 Logs/tracebacks
```console
(False, b'')
(False, b'')
Traceback (most recent call last):
  File "aiohttp-test.py", line 32, in <module>
    print(repr(parser.feed_eof()))
  File ".../lib/python3.6/site-packages/aiohttp/http_parser.py", line 575, in feed_eof
    "Not enough data for satisfy transfer length header.")
aiohttp.http_exceptions.TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'
```
📋 Your version of the Python
```console
$ python --version
Python 3.6.10
```
📋 Your version of the aiohttp/yarl/multidict distributions
```console
$ python -m pip show aiohttp | grep Version
Version: 3.6.2
```
```console
$ python -m pip show multidict | grep Version
Version: 4.7.5
```
```console
$ python -m pip show yarl | grep Version
Version: 1.4.2
```
📋 Additional context
Discovered with aiohttp 2.3.10 due to errors in qwarc, which admittedly uses aiohttp in weird, undocumented, and unsupported ways. But I believe the error could also occur in normal aiohttp usage if the data returned by the server is just the right size, namely two bytes over a multiple of the internal buffer size.
This happens on the split `b'0\r\n\r'` + `b'\n'` as well.
Also note that while `feed_data` may return remaining data in the second element of its return value, it actually only does so when reaching EOF (i.e. when the data stream contains multiple responses), so the fact that I am not handling that properly here is not relevant for this bug.
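To illustrate why input split like this trips up streaming parsers, here is a minimal stdlib-only incremental chunked decoder. This is a hypothetical sketch, not aiohttp's implementation (and it ignores trailer fields); the point is that the decoder must keep unconsumed bytes buffered so a terminator split across calls, e.g. `b'0\r\n'` + `b'\r\n'`, is still recognised:

```python
# Minimal incremental chunked-transfer decoder (illustrative sketch only,
# not aiohttp's implementation; trailer fields are not supported).
class ChunkedDecoder:
    def __init__(self):
        self._buf = b''   # unconsumed bytes carried across feed() calls
        self.body = b''   # decoded payload so far
        self.eof = False  # True once the final 0-chunk terminator is seen

    def feed(self, data: bytes) -> None:
        self._buf += data
        while not self.eof:
            # Need a complete "<size-in-hex>\r\n" line before proceeding.
            idx = self._buf.find(b'\r\n')
            if idx == -1:
                return  # incomplete size line; wait for more data
            size = int(self._buf[:idx], 16)
            if size == 0:
                # Last chunk: also wait for the closing CRLF, which may
                # arrive in a later feed() call (the bug scenario above).
                if len(self._buf) < idx + 4:
                    return
                self._buf = self._buf[idx + 4:]
                self.eof = True
                return
            # Wait until the chunk payload and its trailing CRLF are complete.
            if len(self._buf) < idx + 2 + size + 2:
                return
            self.body += self._buf[idx + 2:idx + 2 + size]
            self._buf = self._buf[idx + 2 + size + 2:]


d = ChunkedDecoder()
d.feed(b'4\r\nasdf\r\n0\r\n')  # last-chunk marker without the closing CRLF
assert not d.eof                # decoder correctly waits instead of failing
d.feed(b'\r\n')                 # closing CRLF arrives in a separate call
assert d.eof and d.body == b'asdf'
```

The same buffering also handles a split between the final CR and LF (`b'0\r\n\r'` + `b'\n'`), since the partial terminator simply stays in the buffer until the next call.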
Could you please submit a PR with a test for that? Also, are you relying on the C extension version of `http_parser` or the pure-Python one?
I'd love to, but it isn't immediately obvious how this should be fixed: the parser is hard to follow because of its state transitions and is entirely undocumented, and I don't currently have time to dig into this deeper.
The `HttpPayloadParser` is Python-only and does not exist in the extension. I haven't tested what the extension `HttpResponseParser` would do with such a payload.
@JustAnotherArchivist Could you provide a PR with the test only, without fixing the bug?
@socketpair Done: #4651
While writing this, I realised that the error does not occur if there is a trailer field and the split happens before the closing CRLF, but it does happen if the split is between that CR and LF. I also added test cases for both of these.
FTR the PR with tests is in. Once the defect is fixed, pytest will report them as XPASS.
Fixed by #4846
Thanks, @rhdxmr !