Dvc: gdrive: failed to download a file from remote

Created on 1 Sep 2020 · 7 Comments · Source: iterative/dvc

Bug Report

Please provide information about your setup

Output of dvc version:

$ dvc version

Yet to be answered on Discord

Additional Information (if any):

https://discordapp.com/channels/485586884165107732/563406153334128681/750101349303058502

bug p0-critical

All 7 comments

Okay, d41d8cd98f00b204e9800998ecf8427e is the md5 of an empty file ... trying to reproduce this.
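A quick way to confirm that this hash belongs to an empty file is to hash zero bytes with Python's standard hashlib. This is only a sanity check, not part of DVC; the name EMPTY_MD5 is made up for illustration.

```python
import hashlib

# md5 of zero bytes, i.e. the hash any empty file gets
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

# The hash from the error message, rebuilt from the cache path d4/1d8c...
hash_from_error = "d4" + "1d8cd98f00b204e9800998ecf8427e"

print(EMPTY_MD5 == hash_from_error)
```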

It is already solved in the most recent version (with a workaround, #4286). The proper PR is pending here: https://github.com/iterative/PyDrive2/pull/48 . Closing this for now.

Can reproduce this with:

touch empty
dvc add empty
dvc push
rm -f empty
rm -rf .dvc/cache
dvc pull

Getting:

ERROR: failed to download 'gdrive://0APuU2w04mc7qUk9PVA/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
Everything is up to date.
ERROR: failed to pull data from the cloud - 1 files failed to download

Logs:

2020-09-05 08:29:16,101 DEBUG: Downloading 'gdrive://***/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e'
2020-09-05 08:29:17,106 ERROR: failed to download 'gdrive://***/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 353, in GetContentFile
    download(fd, files.get_media(fileId=file_id))
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 343, in download
    status, done = downloader.next_chunk()
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/googleapiclient/http.py", line 749, in next_chunk
    raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ivan/Projects/dvc/dvc/cache/local.py", line 31, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/Users/ivan/Projects/dvc/dvc/tree/base.py", line 387, in download
    return self._download_file(
  File "/Users/ivan/Projects/dvc/dvc/tree/base.py", line 445, in _download_file
    self._download(  # noqa, pylint: disable=no-member
  File "/Users/ivan/Projects/dvc/dvc/tree/gdrive.py", line 590, in _download
    self._gdrive_download_file(item_id, to_file, name, no_progress_bar)
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/flow.py", line 122, in retry
    return call()
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/ivan/Projects/dvc/dvc/tree/gdrive.py", line 395, in _gdrive_download_file
    gdrive_file.GetContentFile(to_file, callback=pbar.update_to)
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/auth.py", line 84, in _decorated
    return decoratee(self, *args, **kwargs)
  File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 360, in GetContentFile
    raise exc
pydrive2.files.ApiRequestError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
2020-09-05 08:29:17,129 DEBUG: fetched: [(18,)]
Everything is up to date.
2020-09-05 08:29:17,131 ERROR: failed to pull data from the cloud - 1 files failed to download
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ivan/Projects/dvc/dvc/command/data_sync.py", line 26, in run
    stats = self.repo.pull(
  File "/Users/ivan/Projects/dvc/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/ivan/Projects/dvc/dvc/repo/pull.py", line 25, in pull
    processed_files_count = self._fetch(  # pylint: disable=protected-access
  File "/Users/ivan/Projects/dvc/dvc/repo/fetch.py", line 73, in _fetch
    raise DownloadError(failed)
dvc.exceptions.DownloadError: 1 files failed to download
------------------------------------------------------------
2020-09-05 08:29:17,156 DEBUG: Analytics is enabled.
2020-09-05 08:29:17,257 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/rw/vfwscnts4vn0d10gl7v510640000gn/T/tmp527igosh']'
2020-09-05 08:29:17,259 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/rw/vfwscnts4vn0d10gl7v510640000gn/T/tmp527igosh']'

A few notes/thoughts:

  1. Workaround for now:

touch .dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e

  2. The existing fix, which calls gdrive_file.FetchMetadata(fields="fileSize"), makes downloads 2x slower and twice as expensive when there are many files. We should find a better way to handle this (check the hash? catch the 416?) and eventually fix it properly on the PyDrive2 end.

Most likely, the problem with the current fix is that the GDrive API returns the file size as a string: 'fileSize': '0'. That means the size check is always true.
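To illustrate the suspected pitfall: in Python, the string '0' is non-empty and therefore truthy, so a plain truthiness check on the raw value passes even for an empty file. The thread does not show the actual check in DVC, so the dict and variable names below are hypothetical and only mimic the quoted 'fileSize': '0' value.

```python
# Hypothetical metadata, shaped like the value quoted above (not real API output)
metadata = {"fileSize": "0"}

# Truthiness check on the raw string: "0" is a non-empty string, so this is True
naive_has_content = bool(metadata["fileSize"])

# Converting to int first gives the intended answer for an empty file
actual_has_content = int(metadata["fileSize"]) > 0

print(naive_has_content, actual_has_content)
```

If the check was written along these lines, converting the value with int() before comparing would be enough to make it behave, although it would not remove the extra metadata request.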

(Still curious how it worked in that ticket.)

Still, I would recommend making it a priority to fix this properly, or to introduce a hack that compares the hash, since the current fix is _very_ expensive in terms of GDrive API calls.
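A minimal sketch of what such a hash-comparing hack could look like, assuming the cache layout seen in the error message (first two hex characters as the directory, the rest as the file name). The helpers cache_path and materialize_if_empty are hypothetical names, not DVC functions; the idea is simply to create the empty cache entry locally, as the touch workaround does, and skip the remote request when the md5 is that of an empty file.

```python
import hashlib
import os

# md5 of an empty file: d41d8cd98f00b204e9800998ecf8427e
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def cache_path(cache_dir, md5):
    """Map an md5 to its cache location, e.g. d4/1d8cd98f... (layout assumed from the logs)."""
    return os.path.join(cache_dir, md5[:2], md5[2:])


def materialize_if_empty(cache_dir, md5):
    """Create the cache entry locally if md5 is the empty-file hash.

    Returns True when the file was handled here and no download is needed,
    False when the caller should download as usual.
    """
    if md5 != EMPTY_MD5:
        return False
    path = cache_path(cache_dir, md5)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Equivalent of `touch`: create (or truncate to) a zero-byte file
    with open(path, "wb"):
        pass
    return True
```

A caller would try materialize_if_empty(".dvc/cache", md5) first and fall back to the normal download only when it returns False, which avoids both the 416 error and the extra metadata request per file.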
