Spun up from https://github.com/encode/httpx/issues/411
Currently, the behavior of passing None to the timeout parameter in the various request entrypoints is not consistent:
import httpx
url = "https://httpbin.org/delay/10"
# (1)
httpx.get(url, timeout=None) # Times out after 5s
# (2)
client = httpx.Client(timeout=None)
client.get(url) # Does not timeout, returns after 10s
# (3)
client = httpx.Client()
client.get(url, timeout=None) # Times out after 5s
We should make sure passing None always results in the behavior (2), i.e. no timeout should be applied, as opposed to (1) and (3) where the default timeout configuration is used.
The implementation suggested in #411 is to use a sentinel object:
# models.py
UNSET = object()
# api.py
def get(..., timeout: TimeoutTypes = UNSET, ...):
...
We need to switch to using UNSET in all relevant places. For example (though these may not end up being relevant, as I have not solved this myself):
httpx/api.py.BaseClient, Client and AsyncClient in httpx/client.py.httpx/dispatch/, esp. .send().Next, we need to check for that sentinel instead of None when deciding whether to use the default timeout config.
Notable places affected by this switch are:
.send() methods in clients and dispatchers (only use the client's or dispatcher's self.timeout if timeout is UNSET, instead of None)..connect(), .read() and .write() methods on the concurrency backend classes:I'd like to work on this.
@florimondmanca I don't understand the reason to use the sentinel value. Wouldn't be the same to keep DEFAULT_TIMEOUT_CONFIG as the default timeout value for httpx.Client() and all the top-level methods like httpx.get()? That way we would avoid having to check if timeout is UNSET.
@jcugat Good question, thanks, As it turns out, we already default DEFAULT_TIMEOUT_CONFIG in the constructors for Client, AsyncClient, ConnectionPool, HTTPConnection, etc. I'll edit the issue description based on this.
But there's a few design decisions about clients and dispatchers (which all expose a .send() method) that hint at using a sentinel value:
DEFAULT_TIMEOUT_CONFIG as a default in their constructors..send(). To do this, clients and dispatchers need to distinguish between "the user did not pass a value" and "the user explicitly doesn't want any timeout".Right now .send() cannot make that distinction, but it can be achieved if we switch to UNSET as a default for timeout in .send() and its sub-methods (e.g. .connect() on HTTPConnection).
This issue is more subtle than I initially thought it would be and will require to go through various places of the code base. Still a nice first issue, but more challenging than usual, so I removed the label. :)
@florimondmanaca can you assign it to me please?
I don't understand the reason to use the sentinel value.
The only place we need to use an UNSET value is on the client .request(...) methods.
That's because we need to differentiate between two difference cases:
timeout is not set (Defer to whatever is set on the Client instance)timeout=None (Don't timeout, regardless of whatever is set on the Client instance)We don't need it on the client value, and we don't need it on the top-level API methods.
The only place we need to use an
UNSETvalue is on the client.request(...)methods.
We don't need it on the client value, and we don't need it on the top-level API methods.
Except the top-level API methods default to None and pass that value along, making the top-level API default to no timeout. I don't think that's desired. And that doesn't help the fact that the dispatcher cannot currently distinguish between "the user did not pass a value" and "the user explicitly doesn't want any timeout", as Florimond said earlier. Doing this would leave the behavior inconsistent across the library just as now.
Or am I just getting my mind in a twist this morning? Or is this with the decision that the defaults in those places would be DEFAULT_TIMEOUT_CONFIG?
Ok, so if I got it right, in summary we want:
Client instance: leave default DEFAULT_TIMEOUT_CONFIG.request(...) methods: change it to have the UNSET sentinelhttpx/api.py: change it to have default DEFAULT_TIMEOUT_CONFIG.send() methods in clients and dispatchers: change it to have the UNSET sentinelIs this correct?
@thebigmunch Ah yeah, not sure what I was thinking there. Easiest, most consistent then would be for the top-level ones to use the UNSET sentinal too, then. (Having them be DEFAULT_TIMEOUT_CONFIG would also work just fine, but I guess we may as well make them mirror their equivelent methods)
So, @jcugat I'd probably suggest the last three in that list should all be "use UNSET".
I think we've got that right, then?
What about the default in AsyncDispatcher.request and Dispatcher.request? What about the default here? I may be missing some spots; I just took a quick skim, so I don't have a good idea about their use and interaction with everything.
@tomchristie Considering #463 turned out to make the UNSET sentinel leak across the code base, I opened a bunch of PRs the to explore the option of using timeout=False to disable timeouts. They result in a rigorous consistency of what timeout=None/timeout=False means across the code base and for our users, which I find quite pleasing. They need to be reviewed/merged in order:
timeout=None as "use defaults".timeout=False as "don't timeout".Anyone interested is much welcome to take a look at and drop comments on those PRs as well. :)
@florimondmanca
1 == True. So, True is technically a valid input, just not in a way people would likely expect.timeout=False does not really read as what you'd expect for "no timeout" when compared to timeout=None.httpx has to be beholden to requests' API or any other library, it should be considered that httpx would be changing the meaning of timeout=None as compared to requests and, IIRC, aiohttp. Can't say I've had enough experience with other http client libraries to determine if that's true of them or not.I certainly didn't really like the way UNSET looked going throughout the library either. My only semi-reasonable suggestion for a builtin constant that would mean don't timeout would be ellipsis. The problem there is that most people don't actually know that exists. And any prior http client library usage would probably lead them to assume they should use None.
Another option could be to just set the default parameter values to 5, leaving timeout=None to mean no timeouts.
@thebigmunch Thanks for your comments.
Your first point makes me think — what if we switched from Noneto True as the default, and disallowed None altogether? Then, we'd be able to raise a nice and clean exception to our users:
httpx.get(url, timeout=True) # Default behavior: apply default timeouts.
httpx.get(url, timeout=False) # *Don't* apply timeouts.
httpx.get(url, timeout=None) # ERROR: Please use 'timeout=False' to disable timeouts
I mean, if we decided to break backwards compatibility w.r.t. to timeouts (Requests applies no timeouts by default, but HTTPX does), it'd be nice to make it loud a clear in the code, right?
Another option could be to just set the default parameter values to
5, leavingtimeout=Noneto mean no timeouts.
Applying DEFAULT_TIMEOUT_CONFIG internally whenever relevant still seems like a valid option, yes. @sethmlarson also suggested this in https://github.com/encode/httpx/pull/493#issuecomment-544265312. Upside: we'd be able to keep timeout=None as "no timeout". Downside: bunch of changes required to the code, but I guess we can live up to that. :)
Another option could be to just set the default parameter values to 5, leaving timeout=None to mean no timeouts.
Seems like a very sensible suggestion, right? Not sure why it took us such a winding road to get there. 😌
Seems like a very sensible suggestion, right? Not sure why it took us such a winding road to get there.
Sometimes its difficult to see the wood for the trees ^_^
So, unless I'm really missing something here, I think the change to timeout=5 is not as trivial as we'd like it to be. Consider this example:
import asyncio
import httpx
async def main(url: str) -> None:
async with httpx.ConnectionPool(timeout=1e-2) as http:
r = await http.request("GET", url)
asyncio.run(main("http://localhost:8000"))
I guess we're expecting timeout=1e-2 to propagate to the .request() call, right?
Except… .request() has its own timeout parameter, so defaulting it to timeout=5 too means we have no way to tell if a timeout was passed, or if we should fallback to self.timeout.
And this pattern is present in virtually the entire codebase: in .send(), .connect(), .read(), .write(), etc.
I understand the initial goal of having timeout parameters present everywhere is to be able to use small components of HTTPX independently — and I think Starlette has proved that this approach is very beneficial. But it seems to me having a well-defined timeout everywhere and allowing components to be used independently or as part of a wider system is surprisingly not easy to achieve.
Again, am I just missing something really obvious? 😕
I think this wouldn't be a problem if we were using **kwargs. We'd basically be able to use everywhere:
def send(self, **kwargs):
timeout = kwargs.get("timeout", self.timeout)
...
But obviously **kwargs goes against our efforts of providing good editor and type checking support, so, yeah… At this point, I can't find of any other way to check for "this kwarg was not given" other than using a sentinal value. (Or it seems like we'd need to abandon per-call timeout config, e.g. on ConnectionPool and other dispatchers — and then maybe even Client, because how are we then supposed to pass the overidden timeout to the dispatcher? By setting its timeout attribute (provided it has one)? — which is obviously not acceptable.)
If anyone is up for tackling this (and give the problem another POV), I'd be super happy. I feel like I'm going crazy over the issues that arise when actually trying to make this change, and don't have the resources to dig any deeper right now unfortunately.
Again, am I just missing something really obvious?
I definitely mentioned it in regard to Client and its methods in your PR. I hadn't thought past that to see if it cropped up even deeper though.
Or it seems like we'd need to abandon per-call timeout config, e.g. on
ConnectionPooland other dispatchers — and then maybe evenClient, because how are we then supposed to pass the overiddentimeoutto the dispatcher?
In the prior art of requests, they quite specifically kept timeout out of Session because they wanted to keep a semantic separation between the session and the adapter layers (client vs dispatcher in httpx).
I think this wouldn't be a problem if we were using
**kwargs
Again, probably the solution requests used for the same problem.
But obviously
**kwargsgoes against our efforts of providing good editor and type checking support, so, yeah…
This would also go against the decision to remove unsupported parameters for some of the methods.
If anyone is up for tackling this (and give the problem another POV), I'd be super happy. I feel like I'm going crazy over the issues that arise when actually trying to make this change, and don't have the resources to dig any deeper right now unfortunately.
Yeah, I hadn't yet dove deep enough to see the full extent of the rabbit hole... until just now. Ouch.