Got a strange case of a timeout not firing on a particular URL. When I try it in a browser, the page never seems to finish loading...
The expected behavior would be for the request to time out, but that is not happening:
this times out fine:
```python
requests.get("http://www.iva.net/blog/", timeout=0.1)
```
but this doesn't:
```python
requests.get("http://www.iva.net/blog/", timeout=1)
```
I'm on Python 2.7.5 and requests 1.2.3.
This is expected behaviour; see http://docs.python-requests.org/en/latest/user/quickstart/#timeouts:
> Note: timeout only affects the connection process itself, not the downloading of the response body.
The connection is established successfully, but the page just keeps sending the response body indefinitely.
Thanks for raising this issue @laruellef!
The timeout parameter does not work the way most people seem to expect it to. Whether that is actually a bug is up for discussion. =)
The actual behaviour is that timeout is set as the socket timeout. That timeout applies to each individual blocking socket operation, _not_ to the request as a whole. This means you only trigger the timeout if connecting takes longer than a second, or if any single read from the socket takes longer than a second, so a server that keeps trickling data never trips it.
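To make that concrete, here's a minimal, self-contained sketch (not from this thread; it spins up a throwaway local server that trickles the body one byte at a time):

```python
import socket
import threading
import time

import requests


def trickle(server_sock):
    # Accept one connection, then send a 20-byte body one byte every
    # half second, so no single client-side read waits longer than ~0.5 s.
    conn, _ = server_sock.accept()
    conn.recv(1024)  # Read (and ignore) the HTTP request.
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 20\r\n\r\n")
    for _ in range(20):
        conn.sendall(b"x")
        time.sleep(0.5)
    conn.close()


server_sock = socket.socket()
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
threading.Thread(target=trickle, args=(server_sock,), daemon=True).start()
port = server_sock.getsockname()[1]

start = time.time()
# timeout=1 bounds each blocking socket operation, not the whole request:
# connecting is instant and every read returns within ~0.5 s, so this call
# succeeds after roughly 10 seconds without ever raising a Timeout.
requests.get(f"http://127.0.0.1:{port}/", timeout=1)
print(f"finished after {time.time() - start:.1f} s with no timeout")
```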
Cool, thanks for the prompt response.
If this is expected behavior, how can I catch and recover from it in code?
Because at the moment, my script is stuck indefinitely :-(
You have a few options:
I much more strongly favour (2) than (1). =) Let me know if you need help with 2 and I can show you some sample code.
Yes, sample code would be wonderful.
However, I'm using requests.get(url, timeout=90, verify=False, stream=False).
I've found that setting stream to False solves dangling-connection problems, especially on servers where many requests are being processed in parallel.
So I'd much rather keep stream=False if possible.
If you set stream=False, Requests will endeavour to download the entire response body before it returns, which will fail here. =) What specific dangling-connection problems do you have? (Additionally, stream=False is the default behaviour, so you shouldn't need to set it explicitly.)
As for sample code, try:
```python
import time

import requests

url = "http://www.iva.net/blog/"
timeout = 90
body = []

start = time.time()
r = requests.get(url, verify=False, stream=True)

for chunk in r.iter_content(1024):  # Adjust this value to provide more or less granularity.
    body.append(chunk)
    if time.time() > (start + timeout):
        break  # You can set an error value here as well if you want.

content = b''.join(body)
```
On my machine this downloads roughly 60MB of data before breaking out. Don't make the mistake I did and print the whole thing out. It takes a while. :grin:
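A slightly more reusable shape of the same idea (a sketch, not from the original comment; `TotalTimeout` and `get_with_total_timeout` are names made up for illustration), which raises instead of silently truncating and closes the half-read connection:

```python
import time

import requests


class TotalTimeout(Exception):
    """Raised when the download exceeds its wall-clock budget."""


def get_with_total_timeout(url, total_timeout, chunk_size=1024, **kwargs):
    """Fetch url, capping total elapsed time rather than per-read time."""
    deadline = time.time() + total_timeout
    r = requests.get(url, stream=True, **kwargs)
    body = []
    try:
        for chunk in r.iter_content(chunk_size):
            body.append(chunk)
            if time.time() > deadline:
                raise TotalTimeout(url)
    finally:
        r.close()  # Don't hand a half-read connection back to the pool.
    return b''.join(body)
```

Then get_with_total_timeout("http://www.iva.net/blog/", 90, verify=False) fails loudly instead of hanging (mostly: as noted further down the thread, a read that never returns any data at all can still block inside iter_content).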
Tks.
Ha! I didn't know stream=False was the default. I was having strange errors that I couldn't explain and suspected that lots of connections were being left open; when I set stream=False, the problems went away.
I understand this isn't very helpful from your standpoint, but it's the honest truth... ;-)
How about creating another timeout param to address this case,
so peeps don't need the sample code you just provided... ;-)
@laruellef Creating another timeout parameter is something I'm considering. It suffers from creating two parameters that sound similar but do different things. If we redefined the current timeout parameter, that would break a _lot_ of working code, which would be bad. But most importantly, we can't easily redefine the timeout parameter without making stream=True the default (I think).
So, @laruellef, it looks like @kevinburke is doing some work on the urllib3 side to add better timeout control. When that gets sorted we'll probably try to plumb it through to Requests. I think waiting for that issue (shazow/urllib3#231) to be resolved is the correct next step here.
Thanks for raising this, and keep track of the urllib3 issue!
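For readers arriving later: that urllib3 work eventually surfaced in Requests itself (2.4.0 and newer) as a separate connect/read timeout pair, e.g.:

```python
import requests

# (connect timeout, read timeout): the connect phase must finish within
# 3.05 s and each individual socket read within 27 s. The read timeout
# is still per read, so it does not cap the total download time.
requests.get("http://www.iva.net/blog/", timeout=(3.05, 27))
```

It still isn't a total-time budget, so the stream-and-check approach above remains the way to bound a whole download.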
Using streaming mode with iter_content doesn't seem to solve the problem, though. At least not my problem: I have a socket stuck in the ESTABLISHED state, but it is reading no more data. The socket has blocked a celery worker for over 72 hours now (I only got around to checking the worker this morning).
As far as I can tell, the workaround suggested above would not help in this case: since no data is available for reading, iter_content would block indefinitely.
Edit: socket.setdefaulttimeout fixes the problem, which is odd; I was under the impression that the timeout parameter would do the same.
@blubber - sorry for the blast from the past. I'm seeing sockets stuck in ESTABLISHED even though the other side has closed the socket, and a timeout (the new implementation, apparently) is in place.
You mentioned socket.setdefaulttimeout. Where exactly did you change it?
Anywhere is fine, it's a global setting.
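A sketch of that workaround, for anyone who wants it (the 30-second value is arbitrary; note the setting is process-wide):

```python
import socket

import requests

# Process-wide default for every socket created from here on, including
# the ones urllib3/Requests opens. A read that stalls longer than this
# raises socket.timeout instead of blocking forever. An explicit
# timeout= passed to Requests should override it on the sockets it
# configures.
socket.setdefaulttimeout(30)

requests.get("http://www.iva.net/blog/")
```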
What is the final solution?
Replace requests with urllib3 if you want to set a timeout?
@sunshusunshf No, that's not the final solution. If you want help with your code, though, go to Stack Overflow.