Go-github: Where is the best layer to add throttling to avoid being marked for abuse?

Created on 22 Sep 2016 · 3 comments · Source: google/go-github

GitHub provides some guidelines to follow when accessing its API, to avoid hitting the abuse rate limit:

https://developer.github.com/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits

An application should follow those rules. Some of them can be enforced in a general/automated way via throttling, I think, and so they can be factored out into a library (rather than having to be done manually inside the application layer). Specifically:

  1. > Make requests for a single user or client ID serially. Do not make requests for a single user or client ID concurrently.

If we can assume a single *github.Client maps to a single user or client ID, then this can be done by using a mutex or semaphore or similar.

  2. > If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

Again, if we can assume a single *github.Client maps to a single user or client ID, then this can be done by keeping track of the last POST, PATCH, PUT, or DELETE request time, and sleeping if necessary to make the next one happen at least one second later.

  3. > When you have been limited, wait the number of seconds specified in the Retry-After response header.

We can detect an abuse rate limit response (it's documented here), read the number of seconds specified in the Retry-After response header, and not allow making network requests until that time has passed (probably returning a *RateLimitError, or an abuse-specific version thereof, right away for all requests).

I see two places where this can be done:

  1. Possibly inside go-github library. Inside Client itself. Probably as a configurable option.
  2. Or outside of this library. As a higher-level wrapper layer around go-github library.

So far, go-github has been a relatively "low-level" API client: it helps you out with type safety and provides structured output, but aside from that, it maps relatively closely, 1:1, to making HTTP requests to the GitHub API.

It has taken some steps to be a smarter, higher-level client where it could do so transparently, without blocking. For example, #277 added the RateLimitError type, and after #347, go-github is smart enough to track the rate limit reset time and not make network calls when the rate limit is known to still be exceeded, returning RateLimitError right away.

So far it has also been a "return-right-away" API: none of the calls are artificially throttled on the client side; they just return an error if there's a rate limit problem. So adding throttling to the go-github client itself may seem out of place, and would belong in a higher-level wrapper around go-github.

On the other hand, if done as a controllable option, perhaps the client is the best place for this, since it means more people can use the GitHub API correctly, by default, with less work on their side. The user of the github client could still make calls concurrently, but the client would block and serialize them so as to follow the GH API guidelines. It would effectively shift some of the throttling that would otherwise be done on the GitHub server side to the go-github library client side.

@willnorris, @gmlewis, what are your thoughts on this? Where is the best place for this functionality?

Related issues: #152, #153, #277, #347, #304.

question

All 3 comments

To be honest, every time I write a big API-dependent job, I do this:

githubDelay = 720 * time.Millisecond  // 5000 QPH = 83.3 QPM = 1.39 QPS = 720ms/query

and then call time.Sleep(githubDelay) before each and every API call. That way, no matter how long it takes to run my batch job, I never hit the API quota limit. The downside to this is if you have any other jobs running as that same GitHub owner (of the Personal Access Token), the two jobs might interfere with each other, which Will and I have been talking about a lot. We are hoping that the new Integrations API will help with this situation.

I know this doesn't really help much... but is just another data point.

Shortly after I wrote the comment above, I switched to using a time.Ticker:

const githubRate = 720 * time.Millisecond
...
ticker := time.NewTicker(githubRate)
defer ticker.Stop()
...
<-ticker.C // Don't hammer GitHub API
// call to GitHub API

which allows my tool to run for potentially hours and not abuse the rate limit.

Here's how we implemented linked best practices in our repository:
https://github.com/terraform-providers/terraform-provider-github/blob/fa73654b66e37b1fd8d886141d9c2974e24ba42f/github/transport.go#L42-L109

I'd be more than happy to raise a PR if there's any interest in bringing these transports upstream, along with tests. I don't expect everyone will want or need to use them, but IMO it would be nice to keep them upstream.
