I've got a Flask application doing natural language processing (NLP), and it accepts a request body consisting of a JSON array of tokens (words). Thus, a document 400 words long becomes an array of length 400, and so on.
When I pass in an array longer than about 8K characters (including the commas and such), I get no error whatsoever in the gunicorn log, but the request body that reaches the app is completely empty. If I pass in a smaller document, I have no problem. If I run the app with werkzeug.serving.run_simple rather than gunicorn, I have no problems with long documents.
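For reference, the working development-server setup is just the stock Werkzeug runner; a minimal sketch, where the module name and port are placeholders for my setup:

import werkzeug.serving
from myapp import app  # hypothetical module containing the Flask app

# Serving the same app this way, long documents come through intact.
werkzeug.serving.run_simple("127.0.0.1", 8000, app)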
I've had a look at #376 and #1659, and judging by the response to #1659, the request body shouldn't be limited at all. Nonetheless, I added the following to my config file:
limit_request_line = 0
limit_request_fields = 32768
limit_request_field_size = 0
The new lines in the config don't help. That suggests this is a different issue from #1704, since I can't get my app to work even with a value of 0 (unlimited) for the two settings that allow it, and at any rate I'm not getting the "Bad Request" error I've seen referenced elsewhere (I don't recall where).
Thanks for any help you can provide.
OK, apparently the problem is Flask's inability to handle chunked data: https://medium.com/@DJetelina/python-and-chunked-transfer-encoding-11325245a532
As noted in #1264, Django has a similar problem.
Ugh.
EDIT: And more to the point, see #1653.
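For anyone trying to reproduce this: a client that streams the body without a known length sends Transfer-Encoding: chunked instead of a Content-Length header. A sketch using the Python requests library (the URL and token list are made up):

import json
import requests

tokens = ["Call", "me", "Ishmael"] * 200  # stand-in for a long tokenized document

def body():
    # Handing requests a generator makes it omit Content-Length and send
    # the payload with Transfer-Encoding: chunked.
    yield json.dumps(tokens).encode("utf-8")

resp = requests.post(
    "http://127.0.0.1:8000/tokens",  # hypothetical endpoint
    data=body(),
    headers={"Content-Type": "application/json"},
)
print(resp.status_code, len(resp.text))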
For anyone else experiencing this problem with a Flask app, adding this function to the app code (note the "before_request" decorator) fixes it:
from flask import Flask, request

app = Flask(__name__)

@app.before_request
def handle_chunking():
    """
    Sets the "wsgi.input_terminated" environment flag, thus enabling
    Werkzeug to pass chunked requests as streams. The gunicorn server
    should set this, but it hasn't been implemented there yet.
    """
    transfer_encoding = request.headers.get("Transfer-Encoding", None)
    if transfer_encoding == "chunked":
        request.environ["wsgi.input_terminated"] = True
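An equivalent approach, if you'd rather keep this out of the request hooks, is a tiny WSGI middleware that sets the same flag (a sketch; the class name is mine, not anything gunicorn or Werkzeug ships):

class ChunkedTerminationMiddleware:
    """Mark chunked request bodies as terminated so Werkzeug will stream them."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        if environ.get("HTTP_TRANSFER_ENCODING", "").lower() == "chunked":
            environ["wsgi.input_terminated"] = True
        return self.wsgi_app(environ, start_response)

app.wsgi_app = ChunkedTerminationMiddleware(app.wsgi_app)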
I believe the content of this ticket is different from the title. Rename?
Also, chunked data support is handled in https://github.com/benoitc/gunicorn/issues/1653. Close this one?
On the other hand, I would like to ask for clarification on body size:
Actually, I also asked about this in https://github.com/benoitc/gunicorn/issues/1659, but I'm not sure I understand. In a nutshell, my conclusion there was that (using Flask):
- The body is streamed all the way through (gunicorn -> werkzeug -> code). It's not possible to know the stream size beforehand, so the application must read it up to a point, and then kill the connection if it's just too large. There's no built-in support for this - you have to check it manually.
- If you check Content-Length in your code (or in werkzeug via MAX_CONTENT_LENGTH), the whole request has already been buffered, and you can only prevent code from _processing_ an overly large request, not prevent _reading_ it? If someone sends a 1 GB payload, it helps little to check Content-Length after everything has already been downloaded.

If my second bullet is correct, then a limit_request_body option would make sense (to have gunicorn check Content-Length as early as possible, before buffering the whole request)?
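To make the "read it up to a point" idea concrete, here is the kind of manual check I mean - a sketch inside a Flask view with an arbitrary cap, not anything gunicorn or werkzeug provides:

from flask import Flask, request, abort

app = Flask(__name__)
MAX_BODY = 10 * 1024 * 1024  # arbitrary 10 MiB cap for the example

@app.route("/tokens", methods=["POST"])
def tokens():
    total = 0
    chunks = []
    # Read the body incrementally so an oversized (or chunked) payload
    # can be rejected before it is buffered in full.
    while True:
        chunk = request.stream.read(64 * 1024)
        if not chunk:
            break
        total += len(chunk)
        if total > MAX_BODY:
            abort(413)  # Payload Too Large
        chunks.append(chunk)
    body = b"".join(chunks)
    return {"bytes_received": len(body)}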
OK, so if gunicorn doesn't buffer it, _werkzeug_ (or Flask) still might. I thought I could _assume_ that werkzeug is smart enough to check MAX_CONTENT_LENGTH on the fly, before actually downloading the content, but after looking at the docs/code I'm not so sure anymore. Anyway, I might open a ticket on their tracker. Thanks for answering this repeat question again, @tilgovi.
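For reference, the Flask-side knob mentioned above is just a config key (the limit value here is arbitrary):

from flask import Flask

app = Flask(__name__)
# Ask werkzeug to reject bodies over 16 MiB with 413 Request Entity Too Large.
# Whether this is enforced before or after buffering is the open question above,
# and it compares against the declared Content-Length, so a chunked request
# may slip past it.
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024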
We now support wsgi.input_terminated.