We have a corpus (i.e., a directory of text documents) that is stored in a company-wide S3 data store at s3://my-company-research-data/data/corpus.
When I run dvc import-url s3://my-company-research-data/data/corpus ./local/path, I get an error:
ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://my-company-research-data/data/corpus' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
Per this thread, this appears to be a bug.
Output of dvc version:
$ dvc version
DVC version: 1.1.11
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, s3
Repo: dvc, git
Additional Information (if any):
When I run the same command with --verbose, this is what I get:
2020-07-22 17:20:16,566 DEBUG: fetched: [(3,)]
2020-07-22 17:20:16,935 DEBUG: Removing output 'challenge/common/data/COCA-corpus' of stage: 'COCA-corpus.dvc'.
Importing 's3://duolingo-research-data/det/COCA' -> 'challenge/common/data/COCA-corpus'
2020-07-22 17:20:16,936 DEBUG: Computed stage: 'COCA-corpus.dvc' md5: 'fda33f7e862514b4c924b2692aff808d'
2020-07-22 17:20:16,936 DEBUG: 'md5' of stage: 'COCA-corpus.dvc' changed.
2020-07-22 17:20:17,523 DEBUG: fetched: [(0,)]
2020-07-22 17:20:17,524 ERROR: failed to import s3://duolingo-research-data/det/COCA. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/command/imp_url.py", line 18, in run
    no_exec=self.args.no_exec,
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/repo/imp_url.py", line 54, in imp_url
    stage.run()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/decorators.py", line 35, in rwlocked
    return call()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 424, in run
    sync_import(self, dry, force)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/imports.py", line 29, in sync_import
    stage.save_deps()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/stage/__init__.py", line 387, in save_deps
    dep.save()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 267, in save
    self.info = self.save_info()
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/output/base.py", line 191, in save_info
    return self.tree.save_info(self.path_info)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 314, in save_info
    self.PARAM_CHECKSUM: self.get_hash(path_info, tree=tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 282, in get_hash
    hash_ = self.get_dir_hash(path_info, tree, **kwargs)
  File "/Users/duolingo/Documents/GitHub/det-challenge-development/.pyenv/lib/python3.7/site-packages/dvc/tree/base.py", line 296, in get_dir_hash
    raise RemoteCacheRequiredError(path_info)
dvc.exceptions.RemoteCacheRequiredError: Current operation was unsuccessful because 's3://duolingo-research-data/det/COCA' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
Thanks! I don't think importing should have this condition: requires existing cache on 's3' remote.
Hi @nimrand !
Unfortunately, this is a known bug :slightly_frowning_face: https://github.com/iterative/dvc/issues/4144 We'll try to get to it in the next sprint (starting next week). Thank you for the feedback! Closing this ticket in favor of https://github.com/iterative/dvc/issues/4144 .
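For anyone blocked on this in the meantime, the error message itself suggests downloading the data manually and adding it with `dvc add`. A minimal sketch of that workaround follows, assuming the AWS CLI is installed and has credentials for the bucket. The function name `manual_import` is hypothetical, and the source and destination paths are the ones from the report above. Note that a plain `dvc add` does not record the S3 URL as an import source, so this is a stopgap rather than an equivalent of `dvc import-url`.

```shell
# Sketch of the manual workaround hinted at by the error message.
# SRC and DEST come from the original report; adjust them for your setup.
SRC="s3://my-company-research-data/data/corpus"
DEST="./local/path"

# Hypothetical helper: copy the directory from S3, then track it with DVC.
manual_import() {
    # Recursively download every object under the prefix (assumes AWS CLI).
    aws s3 cp --recursive "$SRC" "$DEST" &&
    # Track the downloaded directory as regular DVC-managed data.
    dvc add "$DEST"
}
```

Calling `manual_import` from the repository root runs both steps and stops if the download fails.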