Output of dvc version:
$ dvc version
Yet to be answered on Discord
Additional Information (if any):
https://discordapp.com/channels/485586884165107732/563406153334128681/750101349303058502
Okay, d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty file... trying to reproduce this.
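A quick way to confirm that this hash really is the MD5 of zero bytes of input, using only the standard library:

```python
import hashlib

# MD5 of zero bytes of input, i.e. the content of an empty file
EMPTY_MD5 = hashlib.md5(b"").hexdigest()
print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e
```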
This is already solved in the most recent version (with a workaround, #4286). The proper fix is pending in PyDrive2: https://github.com/iterative/PyDrive2/pull/48. Closing this for now.
Can reproduce this with:
touch empty
dvc add empty
dvc push
rm -f empty
rm -rf .dvc/cache
dvc pull
Getting:
ERROR: failed to download 'gdrive://0APuU2w04mc7qUk9PVA/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
Everything is up to date.
ERROR: failed to pull data from the cloud - 1 files failed to download
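Why an empty file ends in a 416: a chunked downloader requests a byte range, and under the HTTP range rules (RFC 7233, now RFC 9110) a range is satisfiable only if its first byte position is less than the length of the resource. A zero-length file has no byte 0, so the very first range request fails. The sketch below assumes the downloader sends a "bytes=<start>-<end>" header (as googleapiclient's chunked media download appears to do); both helper names are hypothetical and the 100 MiB chunk size is arbitrary:

```python
def chunk_range(progress: int, chunksize: int) -> str:
    """Range header value for the next chunk (illustrative, hypothetical helper)."""
    return "bytes=%d-%d" % (progress, progress + chunksize - 1)


def is_satisfiable(first_byte: int, resource_length: int) -> bool:
    """A byte range is satisfiable only if it starts inside the resource."""
    return first_byte < resource_length


header = chunk_range(0, 100 * 1024 * 1024)
print(header)                 # bytes=0-104857599
print(is_satisfiable(0, 18))  # True: a non-empty file can be served
print(is_satisfiable(0, 0))   # False: an empty file yields 416
```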
Logs:
2020-09-05 08:29:16,101 DEBUG: Downloading 'gdrive://***/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e'
2020-09-05 08:29:17,106 ERROR: failed to download 'gdrive://***/repositories/species/d4/1d8cd98f00b204e9800998ecf8427e' to '.dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e' - <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 353, in GetContentFile
download(fd, files.get_media(fileId=file_id))
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 343, in download
status, done = downloader.next_chunk()
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/googleapiclient/http.py", line 749, in next_chunk
raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/ivan/Projects/dvc/dvc/cache/local.py", line 31, in wrapper
func(from_info, to_info, *args, **kwargs)
File "/Users/ivan/Projects/dvc/dvc/tree/base.py", line 387, in download
return self._download_file(
File "/Users/ivan/Projects/dvc/dvc/tree/base.py", line 445, in _download_file
self._download( # noqa, pylint: disable=no-member
File "/Users/ivan/Projects/dvc/dvc/tree/gdrive.py", line 590, in _download
self._gdrive_download_file(item_id, to_file, name, no_progress_bar)
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
return deco(call, *dargs, **dkwargs)
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/flow.py", line 122, in retry
return call()
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
return self._func(*self._args, **self._kwargs)
File "/Users/ivan/Projects/dvc/dvc/tree/gdrive.py", line 395, in _gdrive_download_file
gdrive_file.GetContentFile(to_file, callback=pbar.update_to)
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/auth.py", line 84, in _decorated
return decoratee(self, *args, **kwargs)
File "/Users/ivan/Projects/dvc/.env/lib/python3.8/site-packages/pydrive2/files.py", line 360, in GetContentFile
raise exc
pydrive2.files.ApiRequestError: <HttpError 416 when requesting https://www.googleapis.com/drive/v2/files/1uPsyM_oqKLQXrL49al2uOaBiH72Ire-u?alt=media returned "Request range not satisfiable">
------------------------------------------------------------
2020-09-05 08:29:17,129 DEBUG: fetched: [(18,)]
Everything is up to date.
2020-09-05 08:29:17,131 ERROR: failed to pull data from the cloud - 1 files failed to download
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/ivan/Projects/dvc/dvc/command/data_sync.py", line 26, in run
stats = self.repo.pull(
File "/Users/ivan/Projects/dvc/dvc/repo/__init__.py", line 34, in wrapper
ret = f(repo, *args, **kwargs)
File "/Users/ivan/Projects/dvc/dvc/repo/pull.py", line 25, in pull
processed_files_count = self._fetch( # pylint: disable=protected-access
File "/Users/ivan/Projects/dvc/dvc/repo/fetch.py", line 73, in _fetch
raise DownloadError(failed)
dvc.exceptions.DownloadError: 1 files failed to download
------------------------------------------------------------
2020-09-05 08:29:17,156 DEBUG: Analytics is enabled.
2020-09-05 08:29:17,257 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/rw/vfwscnts4vn0d10gl7v510640000gn/T/tmp527igosh']'
2020-09-05 08:29:17,259 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/rw/vfwscnts4vn0d10gl7v510640000gn/T/tmp527igosh']'
A few notes/thoughts:
touch .dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e
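The touch above recreates the cache entry for the empty file by hand. Judging from the paths in the logs, the local cache stores an object under a directory named after the first two hex characters of its hash, with the remaining 30 characters as the file name. A minimal Python equivalent, assuming that layout (cache_path and touch_empty_entry are hypothetical helpers, not DVC APIs):

```python
import hashlib
import os
import tempfile


def cache_path(cache_dir: str, md5: str) -> str:
    """Map an MD5 to its cache location, assuming the 2-char prefix layout."""
    return os.path.join(cache_dir, md5[:2], md5[2:])


def touch_empty_entry(cache_dir: str) -> str:
    """Create the cache entry for an empty file, like `touch` does."""
    md5 = hashlib.md5(b"").hexdigest()
    path = cache_path(cache_dir, md5)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "a").close()  # creates a zero-byte file if it is missing
    return path


with tempfile.TemporaryDirectory() as tmp:
    p = touch_empty_entry(os.path.join(tmp, ".dvc", "cache"))
    # On POSIX this prints .dvc/cache/d4/1d8cd98f00b204e9800998ecf8427e
    print(os.path.relpath(p, tmp))
    print(os.path.getsize(p))  # 0
```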
gdrive_file.FetchMetadata(fields="fileSize") makes downloads 2x slower and twice as expensive when there are many files. We should find a better way to handle this (check the hash? catch the 416?) and eventually fix it properly on the PyDrive2 end. Most likely, the problem with the current fix is that the GDrive API returns the file size as a string ('fileSize': '0'), which means the 'if size' check is always true.
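The string-typed fileSize is enough to defeat a truthiness check: in Python any non-empty string, including '0', is truthy. A sketch of the difference, using a hypothetical metadata dict shaped like the response described above (looks_empty_buggy and looks_empty_fixed are hypothetical helpers, not the actual DVC or PyDrive2 code):

```python
def looks_empty_buggy(metadata: dict) -> bool:
    # Bug: `not "0"` is False, because "0" is a non-empty (truthy) string
    return not metadata.get("fileSize")


def looks_empty_fixed(metadata: dict) -> bool:
    size = metadata.get("fileSize")
    # Convert the string to int before comparing; a missing field is not "empty"
    return size is not None and int(size) == 0


meta = {"fileSize": "0"}
print(looks_empty_buggy(meta))  # False -> the download is attempted and hits 416
print(looks_empty_fixed(meta))  # True  -> the download can be skipped
```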
(still curious how it worked in that ticket)
Still, I would recommend making it a priority to fix this properly, or introducing a hack that compares the hash, since this is a _very_ expensive fix in terms of GDrive API calls.
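One possible shape for the hash-comparing hack, assuming the expected MD5 of each object is already known locally before downloading, so no extra GDrive metadata request is needed. download_or_create and the download callback are hypothetical, not DVC or PyDrive2 APIs:

```python
import hashlib
import os
import tempfile
from typing import Callable

EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def download_or_create(expected_md5: str, to_file: str,
                       download: Callable[[str], None]) -> bool:
    """Skip the remote call for empty objects; return True if a download ran."""
    if expected_md5 == EMPTY_MD5:
        # Nothing to fetch: an empty object is fully determined by its hash
        open(to_file, "wb").close()
        return False
    download(to_file)
    return True


calls = []
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "obj")
    ran = download_or_create(EMPTY_MD5, target, calls.append)
    print(ran, os.path.getsize(target), calls)  # False 0 []
```

The check costs nothing on the API side, unlike a per-file metadata fetch, but it only covers the empty-file case.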