Hello,
I would like to bring this issue to your attention: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801506
The problem shows up with requests 2.10.0.
This is my test script:
```python
import requests

URL = 'https://contributors.debian.org/contributors/test_post'
FILE = 'test.json.xz'
# FILE = 'test-small.json.xz'  # This works

if __name__ == '__main__':
    payload = {'source': 'bugs.debian.org', 'data_compression': 'xz'}
    with open(FILE, 'rb') as f:
        r = requests.post(URL, files={'data': f}, data=payload)
    print(r.text)
```
test-small.json.xz is the same as test.json.xz, only smaller: in my test it contains only 10 objects.
curl is able to post correctly:
```shell
curl https://contributors.debian.org/contributors/test_post -F source=bugs.debian.org -F data=@test.json.xz -F data_compression=xz
```
GitHub doesn't allow uploading a .xz file, so please take it from the Debian BTS: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801506;filename=test.json.xz;att=1;msg=15
Any suggestions?
So after digging around I have checked some things. Requests does at least seem to be in agreement with itself: the Content-Length it declares matches the length of the body it _actually_ sends, at least according to Python. So that's a good start.
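That self-consistency can be checked locally without sending anything. This is a sketch, with dummy bytes standing in for test.json.xz; the URL and field names are taken from the script above:

```python
import requests

# Build the same kind of multipart POST locally and prepare it
# without sending it (dummy bytes stand in for test.json.xz).
req = requests.Request(
    'POST', 'https://contributors.debian.org/contributors/test_post',
    files={'data': ('test.json.xz', b'\x00' * 100000)},
    data={'source': 'bugs.debian.org', 'data_compression': 'xz'},
)
prepared = req.prepare()

# The Content-Length requests declares matches the body it will write.
assert int(prepared.headers['Content-Length']) == len(prepared.body)
```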
However, an interesting problem occurs. If I use mitmproxy to try to spy on the upload, curl starts to get a 413! It weirdly seems like if anything touches Python code it explodes, which seems just _so_ unlikely. On top of that, mitmproxy doesn't seem to be able to load the request/response: it just hangs. That's extremely perplexing.
Yup, even mitmdump sees this problem. What. The. Hell.
Got it.
curl by default sends the Expect: 100-Continue header. Requests does not send this header. That appears to be affecting Apache's decision-making logic here: for large bodies it clearly wants that to be set so that it can validate that the request is actually wanted.
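That difference is easy to confirm locally: a prepared Requests request carries no Expect header at all (sketch with dummy values):

```python
import requests

# Prepare a multipart POST without sending it. Unlike curl's default
# behaviour, requests never adds an Expect: 100-continue header.
req = requests.Request(
    'POST', 'https://example.org/upload',
    files={'data': ('test.json.xz', b'\x00' * 100000)},
)
prepared = req.prepare()
assert 'Expect' not in prepared.headers
```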
If you prevent curl from sending that header by running `curl https://contributors.debian.org/contributors/test_post -F source=bugs.debian.org -F data=@test.json.xz -F data_compression=xz -H "Expect:"`, curl sees the 413 as well.
Requests cannot, in its current form, support the 100-Continue response, so there is nothing we can do about this: if you'd like to use requests here you'll have to adjust your Apache configuration appropriately.
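For anyone landing here with the same symptom: if the 413 comes from mod_ssl buffering the body for a client-certificate renegotiation (an assumption; check your Apache error log), the relevant knob is `SSLRenegBufferSize`, which defaults to 128 KiB. A hypothetical fragment:

```apache
# Hypothetical Apache vhost fragment: raise mod_ssl's renegotiation
# body buffer so large POSTs to this path are not rejected with 413.
<Location "/contributors/test_post">
    SSLRenegBufferSize 10486000
</Location>
```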
@Lukasa many thanks for the investigation!
For the record, this is relevant, and they give a rationale and a workaround:
> But you should really design your site to ensure that the first request to a
> client-cert-protected area is not a POST request with a large body; make it a
> GET or something. Any request body has to be buffered into RAM to handle this
> case, so represents an opportunity to DoS the server.
I confirm that the workaround works, as long as the GET and POST happen on the same session:
```python
import requests

s = requests.Session()
res = s.get("https://example.org")
res.raise_for_status()
res = s.post("https://example.org", **args)
res.raise_for_status()
```