Got a strange case of a timeout not firing on a particular URL. When I try it in a browser, the page never seems to finish loading...
The expected behavior would be for the request to time out, but that is not happening:
this times out fine:
```python
requests.get("http://www.iva.net/blog/", timeout=0.1)
```
but this doesn't:
```python
requests.get("http://www.iva.net/blog/", timeout=1)
```
I'm on Python 2.7.5 and requests 1.2.3.
This is expected behaviour; see http://docs.python-requests.org/en/latest/user/quickstart/#timeouts:
> Note: timeout only affects the connection process itself, not the downloading of the response body.
The connection is established successfully, but the page just keeps sending the response body indefinitely.
Thanks for raising this issue @laruellef!
The timeout parameter does not work the way most people seem to expect it to. Whether that is actually a bug is up for discussion. =)
The actual behaviour is that timeout is set as the socket timeout. That timeout applies to each individual blocking socket operation, _not_ to the request as a whole. This means you only trigger the timeout if connecting takes longer than a second, or if any single read from the socket takes longer than a second, so a server that keeps trickling data never trips it.
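To make that concrete, here's a minimal, self-contained sketch (not from this thread; it spins up a throwaway local server that trickles the body one byte at a time):

```python
import socket
import threading
import time

import requests


def trickle(server_sock):
    # Accept one connection, then send a 20-byte body one byte every
    # half second, so no single client-side read waits longer than ~0.5 s.
    conn, _ = server_sock.accept()
    conn.recv(1024)  # Read (and ignore) the HTTP request.
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 20\r\n\r\n")
    for _ in range(20):
        conn.sendall(b"x")
        time.sleep(0.5)
    conn.close()


server_sock = socket.socket()
server_sock.bind(("127.0.0.1", 0))
server_sock.listen(1)
threading.Thread(target=trickle, args=(server_sock,), daemon=True).start()
port = server_sock.getsockname()[1]

start = time.time()
# timeout=1 bounds each blocking socket operation, not the whole request:
# connecting is instant and every read returns within ~0.5 s, so this call
# succeeds after roughly 10 seconds without ever raising a Timeout.
requests.get(f"http://127.0.0.1:{port}/", timeout=1)
print(f"finished after {time.time() - start:.1f} s with no timeout")
```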
Cool, thanks for the prompt response.
If this is expected behavior, how can I catch and recover from it in code?
Because at the moment, my script is stuck indefinitely :-(
You have a few options:
I much more strongly favour (2) than (1). =) Let me know if you need help with 2 and I can show you some sample code.
Yes, sample code would be wonderful.
However, I'm using requests.get(url, timeout=90, verify=False, stream=False).
I've found that setting stream to False solves dangling-connection problems, especially on servers where many requests are being processed in parallel.
So I'd much rather keep stream=False if possible.
If you set stream=False, Requests will endeavour to download the entire response body before it returns, which will fail here. =) What specific dangling-connection problems do you have? (Additionally, stream=False is the default behaviour, so you shouldn't need to set it explicitly.)
As for sample code, try:
```python
import time

import requests

url = "http://www.iva.net/blog/"
timeout = 90
body = []

start = time.time()
r = requests.get(url, verify=False, stream=True)

for chunk in r.iter_content(1024):  # Adjust this value to provide more or less granularity.
    body.append(chunk)
    if time.time() > (start + timeout):
        break  # You can set an error value here as well if you want.

content = b''.join(body)
```
On my machine this downloads roughly 60MB of data before breaking out. Don't make the mistake I did and print the whole thing out. It takes a while. :grin:
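A slightly more reusable shape of the same idea (a sketch, not from the original comment; `TotalTimeout` and `get_with_total_timeout` are names made up for illustration), which raises instead of silently truncating and closes the half-read connection:

```python
import time

import requests


class TotalTimeout(Exception):
    """Raised when the download exceeds its wall-clock budget."""


def get_with_total_timeout(url, total_timeout, chunk_size=1024, **kwargs):
    """Fetch url, capping total elapsed time rather than per-read time."""
    deadline = time.time() + total_timeout
    r = requests.get(url, stream=True, **kwargs)
    body = []
    try:
        for chunk in r.iter_content(chunk_size):
            body.append(chunk)
            if time.time() > deadline:
                raise TotalTimeout(url)
    finally:
        r.close()  # Don't hand a half-read connection back to the pool.
    return b''.join(body)
```

Then get_with_total_timeout("http://www.iva.net/blog/", 90, verify=False) fails loudly instead of hanging (mostly: as noted further down the thread, a read that never returns any data at all can still block inside iter_content).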
Tks.
Ha! I didn't know stream=False was the default. I was having strange errors that I couldn't explain and suspected that lots of connections were being left open; when I set stream=False, the problems went away.
I understand this isn't very helpful from your standpoint, but it's the honest truth... ;-)
How about creating another timeout param to address this case,
so peeps don't need the sample code you just provided... ;-)
@laruellef Creating another timeout parameter is something I'm considering. It suffers from creating two parameters that sound similar but do different things. If we redefined the current timeout parameter, that would break a _lot_ of working code, which would be bad. But most importantly, we can't easily redefine the timeout parameter without making stream=True the default (I think).
So, @laruellef, it looks like @kevinburke is doing some work on the urllib3 side to add better timeout control. When that gets sorted we'll probably try to plumb it through to Requests. I think waiting for that issue (shazow/urllib3#231) to be resolved is the correct next step here.
Thanks for raising this, and keep track of the urllib3 issue!
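For readers arriving later: that urllib3 work eventually surfaced in Requests itself (2.4.0 and newer) as a separate connect/read timeout pair, e.g.:

```python
import requests

# (connect timeout, read timeout): the connect phase must finish within
# 3.05 s and each individual socket read within 27 s. The read timeout
# is still per read, so it does not cap the total download time.
requests.get("http://www.iva.net/blog/", timeout=(3.05, 27))
```

It still isn't a total-time budget, so the stream-and-check approach above remains the way to bound a whole download.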
Using streaming mode with iter_content doesn't seem to solve the problem, though. At least not my problem: I have a socket stuck in the ESTABLISHED state, but it is reading no more data. The socket has blocked a celery worker for over 72 hours now (I only got around to checking the worker this morning).
As far as I can tell, the workaround suggested above would not help in this case: since no data is available for reading, iter_content would block indefinitely.
Edit: socket.setdefaulttimeout fixes the problem, which is odd; I was under the impression that the timeout parameter would do the same.
@blubber - sorry for the blast from the past. I'm seeing sockets stuck in ESTABLISHED even though the other side has closed the socket, and a timeout (the new implementation, apparently) is in place.
You mentioned socket.setdefaulttimeout. Where exactly did you change it?
Anywhere is fine, it's a global setting.
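A sketch of that workaround, for anyone who wants it (the 30-second value is arbitrary; note the setting is process-wide):

```python
import socket

import requests

# Process-wide default for every socket created from here on, including
# the ones urllib3/Requests opens. A read that stalls longer than this
# raises socket.timeout instead of blocking forever. An explicit
# timeout= passed to Requests should override it on the sockets it
# configures.
socket.setdefaulttimeout(30)

requests.get("http://www.iva.net/blog/")
```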
What is the final solution?
Replace requests with urllib3 if you want to set a timeout?
@sunshusunshf No, that's not the final solution. If you want help with your code, though, go to Stack Overflow.