The GitHub API limits searches to 1000 results. This limit affects searches performed via PyGithub, such as Github.search_issues.
As far as I can tell, nothing indicates that a search has hit this limit: I am not aware of any exception or error being raised. Perhaps an exception should be raised when this happens (if it can be detected).
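One possible way to detect the cap from the caller's side, as a rough sketch: compare the result list's totalCount with 1000. The names `hit_search_cap` and `SEARCH_CAP` are hypothetical, and this assumes that `totalCount` on the list returned by a search reports at least 1000 whenever the search was truncated, which I have not verified against every PyGithub version.

```python
SEARCH_CAP = 1000  # GitHub search returns at most this many results per query


def hit_search_cap(results, cap=SEARCH_CAP):
    """Return True if a search result list may have been truncated.

    `results` is any object exposing a `totalCount` attribute, for example
    the list returned by Github.search_issues. Assumption: totalCount is
    >= cap whenever the query matched more than `cap` items.
    """
    return results.totalCount >= cap
```

A caller could check `hit_search_cap(git.search_issues(query=q))` before iterating and narrow the query if it returns True.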
It is possible to work around this limit by issuing multiple search queries, but such queries must be tailored to suit the particular goals of the query - for example iterating over search_issues by progressive date ranges - and I cannot think of a way to generalise this.
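To illustrate the "progressive date ranges" idea, here is a minimal sketch that splits a date range into fixed-size windows and builds one query string per window. `date_windows` and `closed_queries` are hypothetical helper names, and the 30-day default is an arbitrary assumption; each window still has to match fewer than 1000 results for this to be complete.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield (window_start, window_end) date pairs covering start..end inclusive."""
    step = timedelta(days=days)
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        # Clamp the final window so it never runs past `end`.
        window_end = min(current + step - one_day, end)
        yield current, window_end
        current = window_end + one_day


def closed_queries(owner, repo, start, end, days=30):
    """Yield one search query string per window, using the closed: qualifier."""
    for lo, hi in date_windows(start, end, days):
        yield 'type:pr is:closed repo:%s/%s closed:%s..%s' % (
            owner, repo, lo.isoformat(), hi.isoformat())
```

Each string could then be passed to search_issues in turn. The windows do not overlap, so no de-duplication is needed, but a busy repository may need a smaller `days` value.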
Any thoughts on how to address this? Is there a general solution?
Note that this issue has nothing to do with rate limiting or pagination of results.
Here is a workaround demonstrating how to retrieve all pull requests in a range of dates, even if there are more than 1000 results:
EDIT: I will rewrite this as a method that yields, rather than a class, which will be simpler.
class PullRequestQuery:
    def __init__(self, git, repo, since, until):
        self.git = git
        self.repo = repo
        self.until = until
        self.issues = self.__query(since, until)

    def __iter__(self):
        skip = False
        while True:
            results = False
            for issue in self.issues:
                if not skip:
                    results = True
                    yield issue.as_pull_request()
                skip = False
            # If no more results then stop iterating.
            if not results:
                break
            # Start new query picking up where we left off. Previous issue will be first one returned, so skip it.
            # Caveat: results are sorted by "updated" but the query resumes from closed_at,
            # so this relies on the two orderings agreeing; otherwise results may be missed.
            self.issues = self.__query(issue.closed_at.strftime('%Y-%m-%dT%H:%M:%SZ'), self.until)
            skip = True

    def __query(self, since, until):
        querystring = 'type:pr is:closed repo:%s/%s closed:"%s..%s"' % (self.repo.organization.login, self.repo.name, since, until)
        return self.git.search_issues(query=querystring, sort="updated", order="asc")
With this class, you can now do this sort of thing:
from github import Github

git = Github(user, passwd)
org = git.get_organization(orgname)
repo = org.get_repo(reponame)
for pull in PullRequestQuery(git, repo, "2017-01-01", "2017-12-31"):
    print("%s: %s" % (pull.number, pull.title))
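For reference, a generator-style variant of the same re-query idea might look like the sketch below. It is not the rewrite promised in the EDIT note above. `iter_created_pulls` is a hypothetical name, and `search` stands for any callable with the keyword arguments of search_issues. Note the swap: it filters and sorts on the creation date rather than the closed date, on the assumption that search results can be sorted by "created" but not by closing time, so that resuming from the last result is consistent with the sort order.

```python
def iter_created_pulls(search, owner, repo, since, until):
    """Yield closed PR issues created in since..until, re-querying past the cap.

    `search` is any callable accepting query=, sort= and order= keywords
    (for example the search_issues method of a Github instance).
    """
    seen = set()  # issue numbers already yielded, to drop overlap between queries
    while True:
        query = 'type:pr is:closed repo:%s/%s created:"%s..%s"' % (
            owner, repo, since, until)
        new_results = 0
        last = None
        for issue in search(query=query, sort="created", order="asc"):
            last = issue
            if issue.number in seen:
                continue
            seen.add(issue.number)
            new_results += 1
            yield issue
        # Stop once a query returns nothing we have not already seen.
        if new_results == 0:
            break
        # Resume from the creation time of the last result returned.
        since = last.created_at.strftime('%Y-%m-%dT%H:%M:%SZ')
```

Usage would be along the lines of `for issue in iter_created_pulls(git.search_issues, orgname, reponame, "2017-01-01", "2017-12-31")`, calling `issue.as_pull_request()` where the full pull request is needed.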
Reading the GitHub API docs about search, I also notice that incomplete_results is missing from the search-results processing in PyGithub. Including that value might already help with detecting whether search results are incomplete.
Now that I have PyGithub forked and running locally from source (I'm looking at #606) perhaps I can investigate this further.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Re-open this issue? A general solution for other searches would be good as well.