Pygithub: github.GithubException.RateLimitExceededException

Created on 7 May 2019 · 15 comments · Source: PyGithub/PyGithub

I am trying to fetch the number of open issues using the following code in my Flask application.

from github import Github

g = Github()

repo = g.get_repo(repo_name)

open_pulls = repo.get_pulls(state='open')
open_pull_titles = [pull.title for pull in open_pulls]

open_issues = repo.get_issues(state='open')
open_issues = [issue for issue in open_issues if issue.title not in open_pull_titles]

and I get the error github.GithubException.RateLimitExceededException.

repo.get_issues() returns open pull requests as well as open issues, which is why the pull request titles are filtered out above.
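As an aside (not from the original report), PaginatedList exposes a totalCount property, so the overlap between issues and pull requests can be seen directly. A minimal sketch, with a placeholder token and repository:

from github import Github

g = Github("<token>")                    # placeholder token
repo = g.get_repo("PyGithub/PyGithub")   # placeholder repository

# get_issues() counts open pull requests as issues as well,
# so the difference between the two numbers is the plain-issue count.
print(repo.get_issues(state='open').totalCount)  # issues + pull requests
print(repo.get_pulls(state='open').totalCount)   # pull requests only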

stale

Most helpful comment

If I understand correctly, the return type of get_issues and get_pulls is PaginatedList. It yields elements lazily during iteration, so no request is performed until the list comprehension open_issues = [issue for issue in open_issues if issue.title not in open_pull_titles] runs. If your token's rate limit is exhausted at that point, it will throw RateLimitExceededException.

All 15 comments

If I understand correctly, the return type of get_issues and get_pulls is PaginatedList. It yields elements lazily during iteration, so no request is performed until the list comprehension open_issues = [issue for issue in open_issues if issue.title not in open_pull_titles] runs. If your token's rate limit is exhausted at that point, it will throw RateLimitExceededException.
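A small illustration of that lazy behaviour (a sketch, not from the thread; the token and repository name are placeholders):

from github import Github

g = Github("<token>")
repo = g.get_repo("PyGithub/PyGithub")

open_issues = repo.get_issues(state='open')       # builds a PaginatedList, no request yet
print(g.get_rate_limit().core.remaining)          # quota before iterating

titles = [issue.title for issue in open_issues]   # pages are fetched here, one request per page
print(g.get_rate_limit().core.remaining)          # noticeably lower for large repositories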

Is there any workaround?

g = Github()
Did you authenticate at this step? The unauthenticated public API has a much lower rate limit.

g = Github()
Did you authenticate at this step? The unauthenticated public API has a much lower rate limit.

Yes, I did authenticate at that step.

@242jainabhi
The thing I usually do when reaching the rate limit is simply to hold the program off for some time.

Instead of using the list comprehension, I would use a plain loop with try/except. Once a rate limit exception is caught, call the sleep function to wait for a while, then check the rate limit with the GitHub API again. The code only proceeds once the rate limit is back to 5000; a rough sketch of that check follows below.
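A minimal sketch of that idea (untested; g is assumed to be an authenticated Github instance):

import time

def wait_for_core_reset(g):
    # Block until the core rate limit has recovered.
    # The /rate_limit endpoint itself does not count against the quota.
    core = g.get_rate_limit().core
    while core.remaining == 0:
        time.sleep(60)                  # check again every minute
        core = g.get_rate_limit().core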

I am sorry for missing a part of the code. Below is the code in continuation of the code in the first comment.
I am accessing the created_at date for all the open issues. This again hits the API for every issue and hence ends up making more calls than the limit allows.

for issue in open_issues:
    created_at = issue.created_at.timestamp()

I could not find a solution to this problem. Even if I authenticate the requests, the limit will be exhausted if there are too many issues (say, 2000).

Something like this:

repositories = g.search_repositories(
    query='stars:>=10 fork:true language:python')

This also triggers the rate limit. I assume it's doing the pagination automatically and that's what triggers the rate limit? Is there any way for me to do it manually so I can pause?
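For what it's worth, PaginatedList also exposes get_page(), so one way to paginate manually and pause between pages could look like the sketch below (assuming an authenticated client; the sleep interval is arbitrary):

import time
from github import Github

g = Github("<token>")
repositories = g.search_repositories(
    query='stars:>=10 fork:true language:python')

page = 0
while True:
    results = repositories.get_page(page)  # one API request per page
    if not results:
        break
    for repo in results:
        print(repo.full_name)
    page += 1
    time.sleep(2)  # pause between pages to stay under the search rate limit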

I now realize that this is a real issue.
One possible workaround could be a code snippet like the one below. I have not tried it yet; let me know whether it works or not.

from time import sleep

import github

iter_obj = iter(open_issues)  ## PaginatedList is lazily paginated, like a generator
while True:
    try:
        issue = next(iter_obj)
        ## do something
    except StopIteration:
        break  # loop end
    except github.GithubException.RateLimitExceededException:
        sleep(3600)  # sleep 1 hour
        ## check token limits
        continue

@wangpeipei90 Does not work.

For some reason this behavior is highly unpredictable and it's maddening.
My program can cycle and preemptively call the rate-limit API to check that it stays within the limits for one to two hours, and then it randomly gets a 403.

Some native rate-limit handling would be a warm welcome here. Having to implement sleeps based on intuition, because your application decides to fall over after two hours of running smoothly, should not be expected behavior.

 File "word.py", line 126, in get_stargazers_inner
    for i in repo.get_stargazers_with_dates():
  File "/usr/local/lib/python3.6/dist-packages/github/PaginatedList.py", line 62, in __iter__
    newElements = self._grow()
  File "/usr/local/lib/python3.6/dist-packages/github/PaginatedList.py", line 74, in _grow
    newElements = self._fetchNextPage()
  File "/usr/local/lib/python3.6/dist-packages/github/PaginatedList.py", line 199, in _fetchNextPage
    headers=self.__headers
  File "/usr/local/lib/python3.6/dist-packages/github/Requester.py", line 276, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, self.__customConnection(url)))
  File "/usr/local/lib/python3.6/dist-packages/github/Requester.py", line 287, in __check
    raise self.__createException(status, responseHeaders, output)
github.GithubException.RateLimitExceededException: 403 {'message': 'API rate limit exceeded for user ID xxxx.', 'documentation_url': 'https://developer.github.com/v3/#rate-limiting'}

Additionally, one could use the backoff library; however, it cannot account for the current position in the iteration and will therefore start from scratch again. A rough illustration follows below.
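For illustration only (not from the thread), a backoff-based retry might look like this sketch; because the whole function is retried, iteration restarts from the first page after every rate-limit hit, which is exactly the limitation mentioned above.

import backoff
from github import RateLimitExceededException

@backoff.on_exception(backoff.expo, RateLimitExceededException, max_tries=5)
def fetch_stargazer_logins(repo):
    # If the exception is raised mid-iteration, backoff re-runs the whole
    # function, so already-fetched pages are requested again.
    return [s.user.login for s in repo.get_stargazers_with_dates()]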

Well, I visited https://github.com/settings/tokens and did a "Regenerate token". That got me rolling again, but I'm not sure for how long.

I used the "token" method of authentication. Example:

    github = Github("19exxxxxxxxxxxxxxxxxxxxxe3ab065edae6470")

See also #1233 for excessive requests.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@wangpeipei90

It works for me, but the RateLimitExceededException is not under GithubException in the version I used. Here is my code.

import calendar
import time

from github import RateLimitExceededException

issues = g.search_issues(query=keyword, **{'repo': repo, 'type': 'pr'})
iter_obj = iter(issues)
while True:
    try:
        pr = next(iter_obj)
        with open(pr_file, 'a+') as f:
            f.write(pr.html_url + '\n')
        count += 1
        logger.info(count)
    except StopIteration:
        break  # loop end
    except RateLimitExceededException:
        search_rate_limit = g.get_rate_limit().search
        logger.info('search remaining: {}'.format(search_rate_limit.remaining))
        reset_timestamp = calendar.timegm(search_rate_limit.reset.timetuple())
        # add 10 seconds to be sure the rate limit has been reset
        sleep_time = reset_timestamp - calendar.timegm(time.gmtime()) + 10
        time.sleep(sleep_time)
        continue

Here is part of the log:

2020/01/08 23:42:09 PM - INFO - search remaining: 0

Thanks @Xiaoven, I am finally able to solve this with your code.
