I think our tack onto this should probably be to match requests API wherever possible (so no built-in retry functionality). What we should defiantly do tho would be to do a good job of documenting how to implement this or other functionality with a custom dispatcher. That way we can make sure we're allowing for a range of differing behaviors, without having to extend our API surface area.
I'm not absolute on this one, so perhaps could reassess it at a later date, but for the time being I'd like to treat it as out of scope.
I'm actually of the opposite opinion here. Every requests usage I've seen in the wild uses urllib3's Retry, let's not repeat that lapse of API. :)
Oh right, I was under the impression that it disables it, but I must be misremembering?
Okay, looks like it's disabled by default, but can be enabled in the HTTPAdapter API... https://github.com/kennethreitz/requests/blob/master/requests/adapters.py#L113
Not documented in the QuickStart or Advanced guides there, but it is part of the API reference. https://2.python-requests.org/en/master/api/#requests.adapters.HTTPAdapter
So, sure, let's treat that as in-scope.
I have used backoff atm, but it would be nice to have some native solution 馃憤
Should the urllib3 team be pinged on implementation details? I could potentially work on porting it straight over, but I'm unsure about where it would lie in this library, and if someone else knows better on how it would interact with Sync vs Async they would probably be better to implement.
I'd love to be put on the review if you're willing to take a stab at implementation. :) No worries on getting it right the first time.
Also want to give attribution where it's due, so would probably start with a copy and reference comment, then work on async-ifying it... Will put some effort into it over the next couple weeks. I don't have much free time, so I'm not opposed to duplicate work on it.
It might be worth tackling it API-first, before jumping in on the implementation.
How many controls and dials do we really want the retry API to provide? What critical bits of control over that are actually needed/used in the wild? Whatever is the most minimal possible sounds good to me - we can always extend it out further in time if needed.
We may also want to think about allowing for more complex cases through customization rather than configuration. Eg. only provide for specifying "allow N retries", but also provide an implementation hook that folks can build more complex cases off of.
That way you keep the API surface area nice and low, while still allowing flexiblity, or third party packages.
I'll start by enumerating urllib3's existing functionality for Retry:
read Number of times that a read operation can failwrite Number of times that a write operation can failconnect Number of times that an operation related to creating a connection can fail (Doesn't apply to read timeouts on TLS, those are under read)redirect Number of times that a redirect can be followedstatus Number of times that we can retry a request based on the received responses status code being in the status_forcelist and the request method being in method_whitelist.raise_on_redirect and raise_on_status returns whether or not we should raise an error or just return the response.respect_retry_after is whether a retry based on a response should respect he Retry-After header by sleeping.remove_headers_on_redirect is a set of headers that should be removed from subsequent requests when a redirect is issued.IMO the raise_on_redirect and raise_on_status are things that don't need to be attached to the Retry class?
Here's a typed skeleton of the current Retry object's public interface:
https://gist.github.com/StephenBrown2/2fc6bab18b30037488deb0f4db92e001
So I definitely want to make the new Retry object have one critical feature that actually makes sub-classing useful: Have a method that is subclassable that passes in the Request and the Response and then by some interface is allowed to decide whether a retry occurs and in addition also allows modifying the Request that will be emitted for that retry.
This exact functionality is tough to implement in urllib3 because we don't allow you to modify headers and we don't give the user everything they might want to know (such as the whole request that was sent)
Doing this allows so many things to be implemented simply by sub-classing the Retry. It's pretty critical actually that we get this interface right because there's a lot of functionality related to HTTP that involves multiple requests (Authentication, stripping headers on redirect, retry-after, resumable uploads, caching) that users have been panging for but is tough to implement without the right interface.
I'd rather Retry was an ABC and could be implemented without subclassing
Could you explain the benefit of not needing to implement with sub-classing in this situation? I'm not seeing one right away.
There鈥檚 some great stuff to dig into here, tho I don鈥檛 think it鈥檚 on the critical path.
The API will be retries=int|RetryConfig available either per-client or per-request.
Finer grained control and/or a method override will then exist on the RetryConfig.
Personally, I don't see a need to support the int option, as all the options could have good defaults and one can simply use retries=Retries() to get those.
What I would like to see, as the minimum for my needs right now, is:
total: int, Need this to be an overall limit, read/write specificity not needed, but I can see it would be useful as reads would be retryable much more than writes, though that can be handled with...3status_codes: set, A set of status codes (Or HTTPStatus values?) to retry on, as the API I'm working with has no real concept of the difference between GET and POST, so also good would be...frozenset({429, 503})http_methods: set, A set of HTTP methods that can be retried, would take care of the read/write specificity as mentioned above.frozenset({"HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"})sleep/backoff: int, I would assume respect_retry_after to be True by default, and specifying the sleep option would be a constant time to wait between retries if there is no Retry-After header, while backoff would be an exponential backoff factor.0https://pypi.org/project/backoff/ as mentioned by @Hellowlol might be something to look at for inspiration as well.
Noting here that having respect_retry_after on by default can cause a DoS attack on the HTTP client by setting excessive wait times. If it's on by default we need a reasonably short max_retry_after value.
And I'm of the opinion that having great defaults for the RetryConfig makes having an int -> total being acceptable being ever more desirable rather than less.
And I'm of the opinion that having great defaults for the RetryConfig makes having an int -> total being acceptable being ever more desirable rather than less.
100%. No reason folks should have to think about this.
Noting here that having
respect_retry_afteron by default can cause a DoS attack on the HTTP client by setting excessive wait times. If it's on by default we need a reasonably shortmax_retry_aftervalue.And I'm of the opinion that having great defaults for the
RetryConfigmakes having anint->totalbeing acceptable being ever more desirable rather than less.
How about just a max_retry_after which can be set to 0 to not respect Retry-After
I'm okay with having max_retry_after being zero mean we don't respect Retry-After and I guess None being no limit? We just have to be careful designing APIs where specific values have special meanings because it makes extending those APIs tougher and more confusing for the user and makes code using our library harder to read. In this case I think that Retry-After is very well defined and unlikely to be extended so can be treated this way.
Since 0 is Falsey, that would work, though I might prefer an explicit False being documented, the check would remain the same if respect_retry_after and max_retry_after: or similar.
Per #134 we're going to hold off on implementing this feature.
Most helpful comment
Not documented in the QuickStart or Advanced guides there, but it is part of the API reference. https://2.python-requests.org/en/master/api/#requests.adapters.HTTPAdapter
So, sure, let's treat that as in-scope.