Using the .get_contents() method to try to download a large file raises the error:
{'errors': [{'code': 'too_large', 'field': 'data',
'resource': 'Blob'}],
'message': 'This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.',
'documentation_url': 'https://developer.github.com/v3/repos/contents/#get-contents'}
Is there a way of detecting this and passing over to another handler that can download the file?
For example, if something like this fails:
contents = repository.get_dir_contents(urllib.parse.quote(server_path), ref=sha)
for content in contents:
    if content.type != 'dir':
        file_content = repository.get_contents(urllib.parse.quote(content.path), ref=sha)
could it optionally fall back to:
file_content = repository.get_git_blob(content.sha)
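One possible shape for this, assuming PyGithub's GithubException exposes the parsed error body (like the JSON above) on its .data attribute, is to catch the exception, look for the too_large error code, and only then fall back to the blob endpoint. A minimal sketch:
from github import GithubException

try:
    file_content = repository.get_contents(urllib.parse.quote(content.path), ref=sha)
except GithubException as exc:
    # Assumption: exc.data holds the error JSON shown above.
    errors = exc.data.get('errors', []) if isinstance(exc.data, dict) else []
    if any(err.get('code') == 'too_large' for err in errors):
        # the directory listing already gave us the blob SHA
        file_content = repository.get_git_blob(content.sha)
    else:
        raise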
I've run into this problem before too. In my case, since I always had the SHA of the blob, I just used get_git_blob instead.
However, get_git_blob doesn't work for any object type besides blob (hence the name). You need to know the type of the object before attempting to call it.
To do the fallback, you need to know two pieces of information: the SHA of the object, and that the object is actually a blob. If get_contents fails, it doesn't tell you either of these things. There isn't really any good way of doing the fallback as far as I can tell.
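When the file came out of a directory listing, as in the original example, both pieces are already on the ContentFile, so a rough sketch of the check might be:
if content.type == 'file':
    # we know it is a blob and we know its SHA, so the Git Data API can fetch it
    blob = repository.get_git_blob(content.sha)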
Closed as wontfix. If anyone has a good idea on how to solve this, I'm happy to reopen. As far as I can tell, it doesn't look like it's possible to do in a clean way.
I have the same problem and ended up doing something along the lines of:
file_contents = repo.get_contents(dir_name, ref=branch)
Then sha exists for each file_content, and the following could be used to grab the blob of each file:
for file_content in file_contents:
    try:
        if file_content.encoding != 'base64':
            # some error: unexpected encoding, handle it here
            continue
        # ok, small file: the content is available on file_content
    except GithubException:
        # if file_content does NOT have encoding, it is a large file
        blob = repo.get_git_blob(file_content.sha)
        # do something with blob
If path_name refers to a single file that is larger than 1 MB, you need a try/except block like the following:
try:
    res = repo.get_contents(path_name, ref=branch)
    # ok, we have the content
except GithubException:
    return get_blob_content(repo, branch, path_name)
where get_blob_content is something like:
def get_blob_content(repo, branch, path_name):
    # first get the branch reference
    ref = repo.get_git_ref(f'heads/{branch}')
    # then get the tree
    tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
    # look for path in tree
    sha = [x.sha for x in tree if x.path == path_name]
    if not sha:
        # well, not found..
        return None
    # we have sha
    return repo.get_git_blob(sha[0])
Real code with error-checking is longer, but the idea is here.
Once you have the blob, the following code is useful for decoding it:
import base64

blob = repo.get_git_blob(sha[0])
b64 = base64.b64decode(blob.content)
return b64.decode("utf8")
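Putting these pieces together, a rough end-to-end helper, reusing get_blob_content above (the name get_file_contents is made up here), could look like:
import base64

from github import GithubException


def get_file_contents(repo, branch, path_name):
    # return the decoded text of a file, falling back to the Git Data API for large files
    try:
        return repo.get_contents(path_name, ref=branch).decoded_content.decode("utf8")
    except GithubException:
        blob = get_blob_content(repo, branch, path_name)
        if blob is None:
            return None
        return base64.b64decode(blob.content).decode("utf8")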
Also, updating a file can run into this same problem.
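The thread doesn't give a fix for updates, but since the error message points at the Git Data API, one hedged sketch (the helper name and commit message below are made up) is to create the blob, tree, and commit yourself and then move the branch reference:
from github import InputGitTreeElement


def update_large_file(repo, branch, path_name, new_text):
    # hypothetical helper: commit new content for path_name via the Git Data API
    ref = repo.get_git_ref(f'heads/{branch}')
    parent_commit = repo.get_git_commit(ref.object.sha)
    base_tree = repo.get_git_tree(ref.object.sha)
    # create a blob for the new content ('base64' encoding also works for binary data)
    blob = repo.create_git_blob(new_text, 'utf-8')
    element = InputGitTreeElement(path=path_name, mode='100644', type='blob', sha=blob.sha)
    tree = repo.create_git_tree([element], base_tree)
    commit = repo.create_git_commit(f'Update {path_name}', tree, [parent_commit])
    # point the branch at the new commit
    ref.edit(commit.sha)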
I'm getting this error when trying to download a git repository's files for the master branch:
raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/reference/repos#get-repository-content"}