Lego: Throttle individual API requests

Created on 28 Sep 2019  Â·  13Comments  Â·  Source: go-acme/lego

With support for concurrent challenges (i.e. running multiple calls to Certificate.Obtain() or Certificate.Renew() at the same time -- different certs of course) I think it would be a good idea to make sure that lego doesn't hammer any the CA.

For example, if a client (user of lego) needs to obtain 100 certificates, it should be able to call Obtain() 100 times at once, but lego should be sure to space out the individual ACME API requests by some time.

So, a configurable rate limit such as MinAPIRequestInterval would be useful.

The only alternative right now is for the user/importer of lego to do their own throttling. But this is not very effective... The difference between what I am proposing and just leaving it up to the user of lego do their own rate limiting is that rate limiting implemented into lego would apply directly to individual ACME API calls, which lego hides away from the user. The only thing the user can throttle themselves is the number of issuances at once, which is less useful because issuances are comprised of more than just a single API call. In other words, Certificate.Obtain() does not correspond to fixed amount of load on CA. This detail is hidden from the user, and so the rate limit needs to be applied within lego.

For example: setting MinAPIRequestInterval to 2 * time.Second would allow a user to call Obtain() 100 times all at once, but lego would make sure that only 1 request goes to _each endpoint_ once every 2 seconds. (I think it is better for the rate limit to apply _per endpoint_ instead of all endpoints in the same bucket.)

This would speed up certificate issuances while respecting the CA's infrastructure.

What do you think?

arelib

Most helpful comment

This proposal can be linked to the rqs/s limit of LE:

The “new-reg”, “new-authz” and “new-cert” endpoints have an Overall Requests limit of 20 per second. The “/directory” endpoint and the “/acme” directory & subdirectories have an Overall Requests limit of 40 requests per second.

https://letsencrypt.org/fr/docs/rate-limits/

All 13 comments

I'm in favour of this: I was recently rate-limited because I needed to renew a bunch of certificates (N domains Ă— M subdomains Ă— Y ACME API calls; where N is monotonically increasing, M constant, and Y out of my control). Upsy-daisy :-)

On this particular host, N is controlled by customers adding new domains at any point in time.

My current mitigations are:

  1. Create a separate SAN for each domain, so I can spread out the renewal across a longer time period (this creates N bursts of MĂ—Y ACME calls).
  2. Switch to wildcard certifcates (which eliminates M).

(1) already happens, apart from the spreading-out over time. (2) is work-in-progress.

@mholt: Do you think it's necessary to rate-limit calls to challenge-provider endpoints as well? Cloudflare for example, limits requests to 1200/5 min. This is large enough for Lego (which currently requires at most 3 requests/domain, but other providers might need more requests and/or enforce draconian limits.

I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).

https://letsencrypt.org/fr/docs/rate-limits/

This proposal can be linked to the rqs/s limit of LE:

The “new-reg”, “new-authz” and “new-cert” endpoints have an Overall Requests limit of 20 per second. The “/directory” endpoint and the “/acme” directory & subdirectories have an Overall Requests limit of 40 requests per second.

https://letsencrypt.org/fr/docs/rate-limits/

@ldez

As a lib, for me, the app using the lib must manage that because:

  • I don't want that lego to have a state
  • the app can create multiple client and introduce concurrent access
  • Global mutable variables are never a good idea.

I've explained why the app cannot do this. Lego hides away the individual ACME calls, so even if the app is managing concurrent Obtain or Renew calls, it cannot throttle the requests to individual API endpoints.

I think that the proposal of @mholt will not solve your issue @dmke because it's a throttling on API calls, not a way to spreading-out over time to allow to overflow the LE rate limits (like "Certificates per Registered Domain (50 per week)" or "maximum of 300 New Orders per account per 3 hours" or "Duplicate Certificate limit of 5 per week", etc.).

Yes, I believe that's correct. My proposal is for throttling requests to individual API endpoints, regardless of how many concurrent Renew or Obtain are happening, or how frequently they are called. In other words, rate limits that have nothing to do with the actual certificate details.

This proposal can be linked to the rqs/s limit of LE:

Yep! Let's Encrypt has per-endpoint rate limits, and calls to Renew and Obtain don't necessarily correspond to the underlying API calls.

I am going to work on a PR that does the following:

  • Defines a global throttle per ACME endpoint (map[string]TokenBucket or something)
  • All ACME calls honor the throttle (if any) associated with its endpoint

The throttles will be global, so we'll probably have a mutex to protect the throttles map.

Some rate limits are per-account (e.g. Let's Encrypt's "300 new orders per account per 3 hours" rate limit) so perhaps we'd augment the endpoint with the account name in some way to make that a unique key.

I will try to submit a PR this weekend. Let me know if you have any objections that would prevent it from being accepted!

  • no global variables

Well, what do you suggest then? The rate limits apply globally, whether we like it or not.

The variables must be linked to the client.

That's no good though, because multiple clients can be created, thus multiplying the limit for throttling. The point here is to prevent abuse (both accidental and intentional) of CA resources.

Global variables have a valid purpose, and this is one of them.

It's possible to use multiple clients with multiple CAs.

You can provide a rate limit strategy/manager to the client and re-use it between clients.

Hmm, I suppose, however this still relies on the user doing it correctly. One major reason for this change is to prevent accidentally hammering the CA. Besides, that rate limiting manager still has to be stored somewhere, and if it's not global, then more than one could exist, which again defeats the throttling.

I'm not sure we can get around the need for a global variable to do this correctly.

It seems like net/http provides a good example here: There is a default HTTP client that is used in most cases, but you have the option of instantiating your own HTTP client that has different properties (and a different connection pool). By analogy, you could define a default limit manager that is used by most lego client instances, but allow the user to override the limit manager on their lego client, for instance if they are using a non-default CA.

That could work, but the default rate limiter (if one is not provided) still needs to be a singular instance somewhere (this is not a bad thing).

Was this page helpful?
0 / 5 - 0 ratings