With support for concurrent challenges (i.e. running multiple calls to Certificate.Obtain()
or Certificate.Renew()
at the same time -- different certs of course) I think it would be a good idea to make sure that lego doesn't hammer any the CA.
For example, if a client (user of lego) needs to obtain 100 certificates, it should be able to call Obtain()
100 times at once, but lego should be sure to space out the individual ACME API requests by some time.
So, a configurable rate limit such as MinAPIRequestInterval
would be useful.
The only alternative right now is for the user/importer of lego to do their own throttling. But this is not very effective... The difference between what I am proposing and just leaving it up to the user of lego do their own rate limiting is that rate limiting implemented into lego would apply directly to individual ACME API calls, which lego hides away from the user. The only thing the user can throttle themselves is the number of issuances at once, which is less useful because issuances are comprised of more than just a single API call. In other words, Certificate.Obtain() does not correspond to fixed amount of load on CA
. This detail is hidden from the user, and so the rate limit needs to be applied within lego.
For example: setting MinAPIRequestInterval
to 2 * time.Second
would allow a user to call Obtain()
100 times all at once, but lego would make sure that only 1 request goes to _each endpoint_ once every 2 seconds. (I think it is better for the rate limit to apply _per endpoint_ instead of all endpoints in the same bucket.)
This would speed up certificate issuances while respecting the CA's infrastructure.
What do you think?
I'm in favour of this: I was recently rate-limited because I needed to renew a bunch of certificates (N domains Ă— M subdomains Ă— Y ACME API calls; where N is monotonically increasing, M constant, and Y out of my control). Upsy-daisy :-)
On this particular host, N is controlled by customers adding new domains at any point in time.
My current mitigations are:
(1) already happens, apart from the spreading-out over time. (2) is work-in-progress.
@mholt: Do you think it's necessary to rate-limit calls to challenge-provider endpoints as well? Cloudflare for example, limits requests to 1200/5 min. This is large enough for Lego (which currently requires at most 3 requests/domain, but other providers might need more requests and/or enforce draconian limits.
I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).
This proposal can be linked to the rqs/s limit of LE:
The “new-reg”, “new-authz” and “new-cert” endpoints have an Overall Requests limit of 20 per second. The “/directory” endpoint and the “/acme” directory & subdirectories have an Overall Requests limit of 40 requests per second.
@ldez
As a lib, for me, the app using the lib must manage that because:
- I don't want that lego to have a state
- the app can create multiple client and introduce concurrent access
- Global mutable variables are never a good idea.
I've explained why the app cannot do this. Lego hides away the individual ACME calls, so even if the app is managing concurrent Obtain or Renew calls, it cannot throttle the requests to individual API endpoints.
I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).
Yes, I believe that's correct. My proposal is for throttling requests to individual API endpoints, regardless of how many concurrent Renew or Obtain are happening, or how frequently they are called. In other words, rate limits that have nothing to do with the actual certificate details.
This proposal can be linked to the rqs/s limit of LE:
Yep! Let's Encrypt has per-endpoint rate limits, and calls to Renew and Obtain don't necessarily correspond to the underlying API calls.
I am going to work on a PR that does the following:
map[string]TokenBucket
or something)The throttles will be global, so we'll probably have a mutex to protect the throttles map.
Some rate limits are per-account (e.g. Let's Encrypt's "300 new orders per account per 3 hours" rate limit) so perhaps we'd augment the endpoint with the account name in some way to make that a unique key.
I will try to submit a PR this weekend. Let me know if you have any objections that would prevent it from being accepted!
Well, what do you suggest then? The rate limits apply globally, whether we like it or not.
The variables must be linked to the client.
That's no good though, because multiple clients can be created, thus multiplying the limit for throttling. The point here is to prevent abuse (both accidental and intentional) of CA resources.
Global variables have a valid purpose, and this is one of them.
It's possible to use multiple clients with multiple CAs.
You can provide a rate limit strategy/manager to the client and re-use it between clients.
Hmm, I suppose, however this still relies on the user doing it correctly. One major reason for this change is to prevent accidentally hammering the CA. Besides, that rate limiting manager still has to be stored somewhere, and if it's not global, then more than one could exist, which again defeats the throttling.
I'm not sure we can get around the need for a global variable to do this correctly.
It seems like net/http
provides a good example here: There is a default HTTP client that is used in most cases, but you have the option of instantiating your own HTTP client that has different properties (and a different connection pool). By analogy, you could define a default limit manager that is used by most lego client instances, but allow the user to override the limit manager on their lego client, for instance if they are using a non-default CA.
That could work, but the default rate limiter (if one is not provided) still needs to be a singular instance somewhere (this is not a bad thing).
Most helpful comment
This proposal can be linked to the rqs/s limit of LE:
https://letsencrypt.org/fr/docs/rate-limits/