Conda-forge.github.io: GitHub API rate limit exceeded for conda-forge-admin

Created on 13 Apr 2016 · 44Comments · Source: conda-forge/conda-forge.github.io

Impressive!

https://travis-ci.org/conda-forge/conda-forge.github.io/jobs/122659551

Running command: ['python', '/home/travis/build/conda-forge/conda-forge.github.io/scripts/update_teams.py', './feedstocks_repo/feedstocks']
Traceback (most recent call last):
  File "/home/travis/build/conda-forge/conda-forge.github.io/scripts/update_teams.py", line 105, in <module>
    team._requester.requestJsonAndCheck("PUT", url, input={"permission": "push"})
  File "/home/travis/conda/tmp_envs/0a2f51c4c88b9dcaf16e/lib/python3.5/site-packages/github/Requester.py", line 171, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, cnx))
  File "/home/travis/conda/tmp_envs/0a2f51c4c88b9dcaf16e/lib/python3.5/site-packages/github/Requester.py", line 179, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {'documentation_url': 'https://developer.github.com/v3/#rate-limiting', 'message': 'API rate limit exceeded for conda-forge-admin.'}
CalledProcessError: Command '['python', '/home/travis/build/conda-forge/conda-forge.github.io/scripts/update_teams.py', './feedstocks_repo/feedstocks']' returned non-zero exit status 1

Looks like that is over 5000 requests an hour! https://developer.github.com/v3/#rate-limiting

Might be that the teams script needs to be a little smarter.... 😄

Source

pelson

All 44 comments

Haha, yeah, I saw this once and I just assumed it was a fluke. So, restarted the script and it went away. Unfortunately, I forgot to add an issue, thanks for doing this. Are there multiple tokens already in use for the different automated services? If not, that seems like a relatively easy first step to address this.

jakirkham on 13 Apr 2016

See also this: http://itsalocke.com/using-travis-make-sure-use-github-pat/ ->only applies if you have multiple repos/accounts? running the scripts.

jankatins on 14 Apr 2016

Yeah, @ocefpaf and I are seeing that at staged-recipes now too. See this comment and the log.

jakirkham on 16 Apr 2016

...and again at this repo. ( https://github.com/conda-forge/conda-forge.github.io/pull/97#issuecomment-210698029 ) Alright, let's see what we can do.

jakirkham on 16 Apr 2016

Now that I think about it. I think the issue here is that the linter bot makes the bulk of the requests. We should make the linter bot a separate user so it doesn't interfere with the feedstock pipeline.

jakirkham on 16 Apr 2016

The linter makes 3 requests per check. I don't have numbers on how many checks it does an hour, but I've added a new @conda-forge-linter user who does all of our linting for us 😄

That is probably only a sticking plaster though - I suspect the feedstock team management is causing a good number of these rate limit issues.

pelson on 20 Apr 2016

👍1

I don't know. Before we started having the linter, we didn't see these problems. Now I love the linter and definitely want it to stay, but I think what you have already done might be enough to help the issue.

This all being said, we do need to figure out if we can combine multiple requests into one so that we don't have as many requests on our account at one time.

Not sure how or where to leverage caching yet, but maybe what we have done already improves the situation significantly and batching a few things makes this mostly unnecessary. The trick with caching is we potential have stale information and it is a bit trickier to get right. So, I don't want to tackle this until we are sure that we need it.

jakirkham on 20 Apr 2016

The other thing I would add is if the linter gets rate limited that is unfortunate as we love the linter, but it doesn't stop our feedstock pipeline.

The real thing we need to keep an eye on is if our feedstock pipeline is getting rate limited. When this happens, it really becomes an all hands on deck situation :ship:. So, if we need to split docs, feedstocks, and staged recipes into 3 automated users, that is something that should be doable without too much thought (hopefully) and keep things functioning smoothly :boat:.

jakirkham on 20 Apr 2016

I don't know. Before we started having the linter, we didn't see these problems.

In that time we have also grown massively - we probably have 2x as many packages, each with their own set of maintainers, given the team management script simply queries for every single user in every single package, I can quite reasonably understand the rate limit excess. The good news: should be easy enough to fix with caching. The bad news: I haven't done it yet :smile:.

pelson on 20 Apr 2016

If each user is a single query, that is bad. Fortunately we have gone with the slow growth rate model. :laughing:

jakirkham on 20 Apr 2016

If each user is a single query, that is bad.

It is worse than that. It is each _maintainer_ in each package. Way more than the number of users...

pelson on 20 Apr 2016

Ouch! So, recounting the same users in various ways.

jakirkham on 20 Apr 2016

👍1

Looks like one pass of the team management script is making ~1200 GitHub API calls.

pelson on 20 Apr 2016

Wow! Yep, that is totally a problem.

jakirkham on 20 Apr 2016

I've got some changes that drop that to ~500. I'm still looking for optimizations.

pelson on 20 Apr 2016

Well that is still a huge win, but yes lower is almost certainly needed. What sort of optimization(s) have you done so far?

jakirkham on 20 Apr 2016

What sort of optimization(s) have you done so far?

Pulled calls into the appropriate scope, but mainly only check member lists if they don't have the right number of members. Some deeper thought on team strategy is probably necessary.

pelson on 20 Apr 2016

I see. So if the members have changed, but the total is the same this won't catch that case. Though since we do not do automated removal this shouldn't be such a problem hopefully.

jakirkham on 20 Apr 2016

Yeah, this is getting pretty rough. Every time we merge a few recipes we are throttled. We have to really look at how we are doing things and assess more how to optimize them.

jakirkham on 3 May 2016

OK. Given we need to checkout the feedstock in order to get the list of names, we can check how old the last commit was on the repo to determine whether the team needs updating. We can then periodically (daily, weekly or even monthly) run the full process to determine who needs to be removed, or if there are any stragglers which missed the cut.

pelson on 6 May 2016

👍1

Has anyone requested a higher rate limit from github?

From their docs:

Staying within the rate limit

If you are using Basic Authentication or OAuth, and you are exceeding your rate limit, you can likely fix the issue by caching API responses and using conditional requests.

If you're using conditional requests and still exceeding your rate limit, please contact us to request a higher rate limit for your OAuth application.

ericdill on 26 May 2016

I'm writing something to them now, but I would appreciate some feedback. I'll put it in a Google Doc and link it here so you guys can take a look and see if there is anything you want to add or if I messed something up.

jakirkham on 26 May 2016

👍1

Here's the link ( https://docs.google.com/document/d/19HLtYPwg6IKAwmxPwL7Vd3AX0n47ANP-ZTpZROn-Cwc/edit?usp=sharing ). The language is a bit flowery so bare with me if I strayed way too far on that part. I opened it to editing for everyone so feel free to comment, add, edit, etc. as you feel appropriate.

jakirkham on 26 May 2016

cc @ocefpaf

jakirkham on 26 May 2016

Hey that's awesome @jakirkham ! Only significant suggestion is to not "bury the lead" and just start off with what we're asking for. Other than that its all just minor suggestions. (I don't feel comfortable just editing your document, so I left these in the "suggestion" mode where you can accept or reject :smile: )

ericdill on 26 May 2016

That's great, @ericdill! Thanks for giving this some thought.

Only significant suggestion is to not "bury the lead"...

Yeah, I have a bad habit of doing that.

I don't feel comfortable just editing your document, so I left these in the "suggestion" mode where you can accept or reject 😄

Sounds good. I'll leave this up here for a bit so some of the conference goers/vacationers have some time to look at it.

jakirkham on 26 May 2016

Looks good to me. Thanks for leading this effort John!
Em 26/05/2016 18:05, "jakirkham" [email protected] escreveu:

That's great, @ericdill https://github.com/ericdill! Thanks for giving
this some thought.

Only significant suggestion is to not "bury the lead"...

Yeah, I have a bad habit of doing that.

I don't feel comfortable just editing your document, so I left these in
the "suggestion" mode where you can accept or reject 😄

Sounds good. I'll leave this up here for a bit so some of the conference
goers/vacationers have some time to look at it.

—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/conda-forge/conda-forge.github.io/issues/88#issuecomment-221994634

ocefpaf on 26 May 2016

@jakirkham - have you sent that over? I have found the GitHub support folks to be super friendly and responsive in the past, so see no reason to delay.

pelson on 1 Jun 2016

Not yet. I have been out Monday and Tuesday. Will try to look at it again later today.

jakirkham on 1 Jun 2016

Though I should add all of @ericdill's changes looked sensible to me. I'm sure we could tweak things more, but don't want to enter the bikesheding domain with this.

jakirkham on 1 Jun 2016

👍1

Basically just accepted all of your changes, @ericdill. Would like to bring up the letter at the meeting tomorrow. Wouldn't mind a few more people making edits on it before sending.

jakirkham on 3 Jun 2016

Something else we should consider (though this may be very controversial idk), is breaking out the scripts/CI from the docs/webpage here. One issue that we are seeing is the CI pushes a commit after regenerating the webpage, which results in retriggering the CI. This gets a bit noisy. Also, the scripts we have accumulated here are really cool, but they are a bit buried currently. I don't have the perfect solution in mind, but these are some ideas that might be worth kicking around.

jakirkham on 3 Jun 2016

Could everyone please take one last look at this letter to GitHub about Rate Limiting ( https://docs.google.com/document/d/19HLtYPwg6IKAwmxPwL7Vd3AX0n47ANP-ZTpZROn-Cwc/edit?usp=sharing ) and make sure it is ok? I'd like to send this today.

jakirkham on 3 Jun 2016

I'll set a time limit of 8:00:00 PM UTC today. If I haven't heard anything by then, it is going to be sent.

jakirkham on 3 Jun 2016

A few weeks ago I saw a github issue where the github devs came in and offered suggestions how to better use the github "API". Maybe it's worth to link to such an issue or to the code which uses the API?

jankatins on 3 Jun 2016

Let's see what they say. If they decide that we should just have a higher limit, that's great and it will let us focus on other problems that we also need to solve. If they decide they want a few devs to take a look and provide suggestions, that's great too and I will happily direct them here, point out relevant code, and bring @conda-forge/conda-forge-webservices into the loop. Though I don't really want to preempt an answer from them.

jakirkham on 4 Jun 2016

Thanks everyone for your feedback. I have sent the email. Will keep everyone here in the loop.

jakirkham on 4 Jun 2016

So, I got a response back from GitHub. Here is a doc with the letter and an empty page where we can write a response if people feel they wish to. ( https://docs.google.com/document/d/1lzWNxvmEtrgjSBVrUWEO-imDryBOLRfObz3PkI9qT5Y/edit?usp=sharing )

The gist of the answer is this. Upping the rate limit will basically never be a satisfactory solution for us because we will need to keep requesting this be upped. They encourage us to leverage community members GitHub API limits in the pursuit of this reduction. At the end, they ask if this solution will work for us. Given how things are set up here, I basically don't think this is realistic, but please feel free to comment. We have the following options at this point. Beyond writing them back and thanking them for there response, we have the following options.

Figure something else out on our own.
Explain the situation more clearly.
1. Reask for the limit bump.
2. Ask for help. (i.e. @janschulz's suggestion of getting some GitHub devs to help)
Suggestions?

Please let me know how you guys want to proceed.

jakirkham on 8 Jun 2016

The OAuth will not work: there might be a few users which might "donate" their token, but if I understand the technical side the problem is that _other_ oauth using tools (-> Travis) need to use the conda-forge token (=user conda-forge-admin) to access github and can't use random users?

What I'm not sure about is what is actually contributing to the requests?

CI services?
the maintainance scripts?
the linter? -> already running in its own account

Maybe the CI Services could also all get their own conda-forge-travis|appveyor|circleci github accounts, so at least the rate limits do not affect builds?

jankatins on 8 Jun 2016

if I understand the technical side the problem is that _other_ oauth using tools (-> Travis) need to use the conda-forge token (=user conda-forge-admin) to access github and can't use random users?

Actually, that is not the case. They don't use any user's token (unless something changed). ( https://github.com/travis-ci/travis-ci/issues/4599#issuecomment-127965680 )

there might be a few users which might "donate" their token...

Was thinking that we could have a couple of maintenance bot accounts whose tokens we cycle through when we get rate limited. We wouldn't be the first to consider going down this road. ( https://github.com/travis-ci/travis-ci/issues/4073#issuecomment-111513464 )

CI services?...Maybe the CI Services could also all get their own conda-forge-travis|appveyor|circleci github accounts, so at least the rate limits do not affect builds?

Not really. It might happen that one request from them puts us over, but it wouldn't be the cause. We do use tokens over there. So it is really on the token that we are using that it goes over.

the linter? -> already running in its own account

The linter is totally independent. See this comment. It could have issues, but it is more likely to get confused by something that happens, which is rare. AFAICT it is doing fine. In any event, it uses a different token from anything where we are seeing this issue.

the maintainance scripts?

🎰 💰

So, there is one script that seems to basically put us out of commission and its the feedstock/team update script. Both parts of this are bad. Though the team update script is worse. It also shares the token with staged-recipes, which we may also want to consider changing (thoughts on this @pelson 😄). I know that @pelson looked at how to optimize the team update script and there is definitely room for it. Though I don't know what optimizations have already been included (if any) at this point. By all means feel free to propose changes, @janschulz, though you may want to coordinate with @pelson so you don't both write the same things. 😄

In any event, this is all orthogonal to the original question, which is what do we want to write back to GitHub? 😄

jakirkham on 8 Jun 2016

Added a proposal for how we might get some savings on the GitHub rate limit. ( https://github.com/conda-forge/conda-forge.github.io/issues/172 )

jakirkham on 8 Jun 2016

Tell me off if this is a stupid question, but would it be OK to get a short breakdown of the top couple of API calls that are pushing us over the limit? For example, when do we call the team update script and at what frequency? Although Heroku is an option for occasionally calling the update script, what if we just took the meta.yaml as gospel and on a PR merge we simple replaced the current set of users with the meta-yaml rather than even bothering to check?

patricksnape on 8 Jun 2016

Response to you in this comment, @patricksnape. Maybe we can take this discussion over there. 😉

That issue is more of a proposals to fix and in-progress strategies collection. The issue here has moved to discussing what we want to say to GitHub.

jakirkham on 9 Jun 2016

Basically we have done quite a bit in the last few months to address this. Going to close it out for now. We can revisit options as they become necessary again.

jakirkham on 23 Aug 2016

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

"Code of Conduct" and "Reporting Guidelines"

prachi237 · 4Comments

Make package listing more prominent on the webpage

jakirkham · 4Comments

Permission for @ocefpaf to add conda-forge recipe for qhub.

dharhas · 3Comments

CI support for long build times

h-vetinari · 4Comments

Adding SECURITY.md

jakirkham · 4Comments