Aiohttp: Client timeout refactoring

Created on 26 Feb 2018 · 31Comments · Source: aio-libs/aiohttp

There are many different timeout strategies for waiting a response from client:

uber timeout for total elapsed seconds
timeout for waiting for available connection from a pool
connection timeout for waiting a response from a new connection to server
timeout for getting HTTP headers from response
timeout for every HTTP body chunk read
maybe you can add your case

As the solution I propose collapsing all timeouts into the single timeout class.
E.g. Timeout.after(30) is an uber timeout, Timeout.await_pool(5) is a timeout for waiting for connection pool. Timeout.after(30).await_pool(5) composes both.

Do you like the idea? Objections?

outdated

Source

asvetlov

👍2

Most helpful comment

That was just one example on how one possible timeout could be visualised. ;)

Just for reference here's how requests handles basic and advanced timeouts.
In a nutshell, requests only knows two types of timeouts:

The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect()) call on the socket. It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.

Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte).

So to summarize all options on timeouts I found up to now in this thread (excluding anything on how the timeout might actually be specified in code):

| # timeout | description | affected states |
| ------------------------------- | ------------------------------------------------------------------ | ------------------------------------------------------------------------------ |
| 1. uber timeout | for total elapsed seconds, current (read) timeout | from request start to end |
| 2. pool queue timeout | for waiting for an available connection from the pool | from queued_start to queued_end |
| 3. connection create timeout | for waiting for connection establishment | from create_start to create_end |
| 4. DNS resolution timeout | probably only relevant on cache_miss, as a hit should take no time | from resolve_start to resolve_end |
| 5. socket connect timeout | for plain TCP connection establishment, requests' connect timeout | for sock_connect |
| 6. connection acquiring timeout | current connection timeout, for whole connection acquiring process | from connection acquiring begin to end |
| 7. new connection timeout | for waiting for first response from a new connection | from sock_connect to first byte received (i.e. headers_received) |
| 8. HTTP header timeout | for getting HTTP headers of response after request was sent | from before headers_sent (or after last chunk_sent?) to headers_received |
| 9. read timeout | requests' read timeout for detecting slow server / connection | after headers_sent and between each byte received from the server |
| 10. response body timeout | for limiting the time it takes to receive the response body | from after headers_received, around all chunk_received, until end |

Here are my comments:

Do we really need timeout 6? It's just the sum of timeouts 2 and potentially 3.
Timeout 2 may be 0 seconds to never wait if the pool is empty and directly raise a specific exception.
Timeout 10 might also be specified in relation to the number of bytes received, to abort downloads taking ages over a (suddenly) slow connection, but to still allow huge downloads over a normal connection.
Some timeouts include others. We should think about the defaults in each individual case once the exact list of timeouts is settled.
Special care needs to be taken if the time required for sending data also should be taken into account.

I tried putting all the timeouts and states into a single diagram. Unfortunately, blockdiag doesn't allow a group / rectangle spanning multiple other groups, so I couldn't include all of them (which was the reason why I merged all diagrams in the first place). If you want to fiddle with it, you'll find the source code here.
aiohttp

N-Coder on 14 Mar 2018

👍4

All 31 comments

I agree the current situation is confusing, especially for beginners.

Maybe keyword arguments would be more canonical than chaining?

On 26 Feb 2018 19:08, "Andrew Svetlov" notifications@github.com wrote:

There are many different timeout strategies for waiting a response from
client:

uber timeout for total elapsed seconds

timeout for waiting for available connection from a pool

connection timeout for waiting a response from a new connection to
server

timeout for getting HTTP headers from response

timeout for every HTTP body chunk read

maybe you can add your case

As the solution I propose collapsing all timeouts into the single
timeout class.
E.g. Timeout.after(30) is an uber timeout, Timeout.await_pool(5) is a
timeout for waiting for connection pool. Timeout.after(30).await_pool(5)
composes both.

Do you like the idea? Objections?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/aio-libs/aiohttp/issues/2768, or mute the thread
https://github.com/notifications/unsubscribe-auth/AD2jGaeXk6Kx62Bsb4JuE8EXW46YAR1oks5tYwBjgaJpZM4STwZi
.

samuelcolvin on 27 Feb 2018

Are you talking about keyword arguments in Timeout constructor? I don't know, let's see how things will go.
What I definitely don't want is adding all these settings as keyword arguments to client API. Should be the only timeout which contains all required information internally.

asvetlov on 27 Feb 2018

When a number is given, It should be converted to uber timeout.

Keyword-arguments for constructor may be better. Though chaining is cool, for the real usage, many applications may use configuration dict to store the settings, so keyword arguments would be more convient.

Timeout object should work like dict, so a request timeout object can be merged into the default settings.

hubo1016 on 27 Feb 2018

No. Disabling a waiting for available connection is preferable in many cases, aiobotocore wants it.
@thehesiod please confirm if I understand your needs properly.
It means that uber timeout is not always applicable. Different timeout types are not always composable, sometimes they are mutually exclusive.
Dict-like idea just doesn't work.

asvetlov on 27 Feb 2018

Maybe use False or a specified constant for "explicit disable"? Allow inheriting from default is always useful I think.

hubo1016 on 27 Feb 2018

I'm totally missed what default are you talking about and what inheriting means in this context?

asvetlov on 27 Feb 2018

I think there should be a default timeout settings on the session? And for each request, a different timeout can be specified.

hubo1016 on 27 Feb 2018

Yes. Like now a client session has default timeouts which could be overridden by request.
But I want to shrink these values into opaque timeout object that could be passed both into ClientSession constructor and request methods.

asvetlov on 27 Feb 2018

another possible timeout:

DNS resolution timeout (typically tied into "connect" timeout)

I'm not sure if the "send" of the headers blocks, if so I guess that could be a timeout, but more of a "write" timeout.

aiobotocore want's "normal" timeouts, that is timeouts should encapsulate actual network lag. So:

connect timeout is time to connect to endpoint (includes DNS resolve)
read timeout is waiting for any data from an endpoint

thehesiod on 27 Feb 2018

Definitely, this is something that has appeared many times and needs to be tackled sooner than later. My 2cents on that:

Use kw arguments in the constructor is gonna pollute the ClientSession signature and/or the requests methods, taking into account the current timeouts and future timeouts that might be implemented. I will go for a specific class object.
How are we gonna deal with backward compatibility?

Will be appreciated a snippet showing the use case.

Regarding the

timeout for waiting for available connection from a pool

We could give a new option that would allow the developer to get notified via an Exception if and only if all connections are busy because the limit of the pool was reached.

pfreixes on 27 Feb 2018

@pfreixes ya I think there are two categories, those that should be specifiable, and those that are traceable. For now we're only interested in tracing timeout for waiting for available connection from a pool.

thehesiod on 27 Feb 2018

@asvetlov When we choose a timeout object for the session which contains uber timeout and connect timeout, and then start a request with a timeout object that cancels the uber timeout, usually we still want to keep the connect timeout. So timeout object should be able to merge.

I suggest that the timeout object has following interfaces:

timeout object is always immutable. We may consider implement it from NamedTuple.
Timeout(uber_timeout = None, *, connect_timeout = None, ...) constructor. Only the uber timeout can be specified as positional arguments, all other arguments are keyword-only. The default values are None. Accepted values are None / False / . None means "use default", False means "force disable", other numbers are timeout values.
copy(**kwargs) Copy an existing timeout object and update the timeout values. Equivalent to convert the timeout to a dict, merge the dict with kwargs and use the dict to call Timeout(**args).
merge(timeout) Copy an existing timeout object and merge the settings from another timeout object. Only False/ are merged, None values are ignored. Equivalent to .copy(**timeout.todict())
todict() Convert the timeout object to a dict containing settings that are not None. Can be used to implement __repr__ and for logging.

NOTICE: copy(**kwargs) is different from merge(Timeout(**kwargs)). Timeout(connect_timeout=1).copy(connect_timeout=None) returns a timeout object with connect_timeout=None, but Timeout(connect_timeout=1).merge(Timeout(connect_timeout=None)) returns a timeout object with connect_timeout=1.

hubo1016 on 28 Feb 2018

I dislike timeout merging and dict exposing, it brings unnecessary complexity.

asvetlov on 28 Feb 2018

@pfreixes I don't see backward compatibility problems. For transition period a timeout is constructed from old-styled parameters if not provided explicitly.

asvetlov on 28 Feb 2018

@thehesiod tracing is very reach word with many contexts.
What do you mean be tracing timeout for waiting for available connection from a pool?

asvetlov on 28 Feb 2018

@asvetlov sorry I didn't quite understand your message, but to clarify, in regards to the subject of the time to acquire a connector from the pool, I'm only interested in that time for tracing (ddtrace, aws-xray, etc), I don't need to specify a timeout for that, nor do I want it included in other fine-grained timeouts. Hope that helps

thehesiod on 28 Feb 2018

In my mind tracing is not related to timeouts problem discussed here.
You should use client tracing API http://aiohttp.readthedocs.io/en/stable/tracing_reference.html to measure different request lifecycle times.

asvetlov on 28 Feb 2018

you already have signals for that on_connnection_queued_strart and
on_connection_queued_end

El 28/02/2018 07:54, "Andrew Svetlov" notifications@github.com escribió:

In my mind tracing is not related to timeouts problem discussed here.
You should use client tracing API http://aiohttp.readthedocs.io/
en/stable/tracing_reference.html to measure different request lifecycle
times.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/aio-libs/aiohttp/issues/2768#issuecomment-369139966,
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABK1iWvwGAnJNXNLPEnCywd6dvv4XLqaks5tZPhBgaJpZM4STwZi
.

pfreixes on 28 Feb 2018

👍1

I'd really love to see more fine granular timeouts implemented, especially for not enforcing a timeout while waiting for a pooled connection, but only for actually establishing the TCP connection. This issue also seems to supersede a few other open issues, including #2648 and #2538.

I do see a strong relation to the aiohttp tracing functionality as both tracing and timeouts hook into certain points of the request lifecycle. I think it might help to use the new block diagrams in the trace docs to discuss and later document which (groups of) states / transitions are counted towards which timeouts (see attached example for the timeout for establishing a new connection).

N-Coder on 14 Mar 2018

To be clear: I have no estimation when I can find a time for the issue personally.
If somebody want to pick it up and make a PR -- you are welcome.

asvetlov on 14 Mar 2018

@N-Coder ya that's why I brought it up. If I get any chance I'll try to come up with something.

thehesiod on 14 Mar 2018

btw @N-Coder want to update your diagram for "read" timeouts as well? I'm not sure if that accounts for the header read, or just the body read(s). There's also redirects if you want to capture that as well from "request" to response"

thehesiod on 14 Mar 2018

That was just one example on how one possible timeout could be visualised. ;)

Just for reference here's how requests handles basic and advanced timeouts.
In a nutshell, requests only knows two types of timeouts:

The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect()) call on the socket. It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.

Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte).

So to summarize all options on timeouts I found up to now in this thread (excluding anything on how the timeout might actually be specified in code):

Here are my comments:

Do we really need timeout 6? It's just the sum of timeouts 2 and potentially 3.
Timeout 2 may be 0 seconds to never wait if the pool is empty and directly raise a specific exception.
Timeout 10 might also be specified in relation to the number of bytes received, to abort downloads taking ages over a (suddenly) slow connection, but to still allow huge downloads over a normal connection.
Some timeouts include others. We should think about the defaults in each individual case once the exact list of timeouts is settled.
Special care needs to be taken if the time required for sending data also should be taken into account.

N-Coder on 14 Mar 2018

👍4

@N-Coder wow, thank you so much for that information! u rock!

for 6 I'm guessing it's useful if you just want to specify an encompassing timeout and don't want to break up the timeout between the sub-parts, so basically either you set 6, or the sub-parts, but not both?

btw for redirect, that could be to a different server so I'm assuming there are two branches?

thehesiod on 14 Mar 2018

@N-Coder you are god for blockdiag.
I've split the picture into three blocks because was unable to merge everything into single picture.

Some timeouts include others. We should think about the defaults in each individual case once the exact list of timeouts is settled.

This is exactly my main trouble: how to respect all possible timeouts without falling into combinatoric mess?

asvetlov on 14 Mar 2018

@N-Coder you are god for blockdiag.

Thanks! :sweat_smile:
Some of the edges are collapsed together, which makes the diagram hard to read at some points (and for which reason I made some edges thick). Also, there seem to be weird encoding problems with the titles containing brackets and some of the labels are cut off. I couldn't find fixes for any of those problems, so we might need a little more advanced solution for the diagram if we want to make something official out of my version. Or at least I should clean up the code, it's quite messy....

for 6 I'm guessing it's useful if you just want to specify an encompassing timeout and don't want to break up the timeout between the sub-parts, so basically either you set 6, or the sub-parts, but not both?

You're right, it might be worth to just keep it for convenience's sake. There might be some weird cases (although I can't come up with one right now), where somebody wants to set any 2 (or all 3) of timeouts 6, 2 and 3. Additionally, I can't come up with anything that speaks against keeping 6.

This is exactly my main trouble: how to respect all possible timeouts without falling into combinatoric mess?

I'd say keep it simple and don't try to guess what the developer actually wants when setting any weird timeouts. Better make it really easy to understand which timeout affects what and don't involve any magic at all. I'd recommend only setting the two timeouts requests is also using (possibly to the same values) and leave the more advanced values to the more arcane use-cases, but without any complicated combinatoric interference.

N-Coder on 14 Mar 2018

I just found #2310, #2537 and potentially also #2007 which are related to timeouts, but for the case of websockets. I'm not sure how far we should take websockets into account here, as at least the establishment of a connection takes at least one normal HTTP request and some of the timeouts for the websocket itself (e.g. a receive/read timeout for the TCP connection) might be similar.

As I need working timeouts for another project soon and already got kind of an idea how they could be implemented properly, I'll try implementing and example we can discuss.

N-Coder on 16 Mar 2018

👍1

Let's put websocket timeouts out of the issue scope for simplicity.
We can work on websocket timeouts in a separate issue.

asvetlov on 16 Mar 2018

I condensed my thoughts to code in PR #2842. Please don't see this as a complete PR ready for merging, it's just a draft intended as material for discussion. I had to make a PR to allow inline comments within the code to allow for easier reviewing. As soon as everything is settled, I'll make a proper, cleaned-up PR that makes the CI tools happy.

I'm looking forward to your comments, idea, opinions, objections,...

N-Coder on 16 Mar 2018

Done by #2972

asvetlov on 22 May 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].