Dvc: `get-url` and `import-url` don't seem to work with S3 buckets anymore.

Created on 1 Jul 2020  ·  3 Comments  ·  Source: iterative/dvc

Bug Report

  1. Create an empty directory
  2. dvc init --no-scm
  3. dvc import-url s3://some_bucket/some_target -v
2020-07-01 17:14:02,947 DEBUG: fetched: [(3,)]                                  
2020-07-01 17:14:03,123 DEBUG: Removing output 'some_target' of stage: 'some_target.dvc'.
Importing 's3://some_bucket/some_target' -> 'some_target'
2020-07-01 17:14:03,123 DEBUG: Computed stage: 'some_target.dvc' md5: '2f8b87d3b22efd1638f414c3b3f65614'
2020-07-01 17:14:03,123 DEBUG: 'md5' of stage: 'some_target.dvc' changed.
2020-07-01 17:14:04,088 DEBUG: fetched: [(0,)]
2020-07-01 17:14:04,146 ERROR: failed to import s3://some_bucket/some_target. You could also try downloading it manually, and adding it with `dvc add`. - Current operation was unsuccessful because 's3://some_bucket/some_target' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/dvc/command/imp_url.py", line 14, in run
    self.repo.imp_url(
  File "/usr/lib/python3.8/site-packages/dvc/repo/__init__.py", line 36, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/usr/lib/python3.8/site-packages/dvc/repo/imp_url.py", line 54, in imp_url
    stage.run()
  File "/home/anotherbugmaster/.local/lib/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/usr/lib/python3.8/site-packages/dvc/stage/decorators.py", line 35, in rwlocked
    return call()
  File "/home/anotherbugmaster/.local/lib/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/stage/__init__.py", line 424, in run
    sync_import(self, dry, force)
  File "/usr/lib/python3.8/site-packages/dvc/stage/imports.py", line 29, in sync_import
    stage.save_deps()
  File "/usr/lib/python3.8/site-packages/dvc/stage/__init__.py", line 387, in save_deps
    dep.save()
  File "/usr/lib/python3.8/site-packages/dvc/output/base.py", line 268, in save
    self.info = self.save_info()
  File "/usr/lib/python3.8/site-packages/dvc/output/base.py", line 192, in save_info
    return self.remote.save_info(self.path_info)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 762, in save_info
    return self.tree.save_info(path_info, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 329, in save_info
    self.PARAM_CHECKSUM: self.get_hash(path_info, tree=tree, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 297, in get_hash
    hash_ = self.get_dir_hash(path_info, tree, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 311, in get_dir_hash
    raise RemoteCacheRequiredError(path_info)
dvc.exceptions.RemoteCacheRequiredError: Current operation was unsuccessful because 's3://some_bucket/some_target' requires existing cache on 's3' remote. See <https://man.dvc.org/config#cache> for information on how to set up remote cache.
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
  4. dvc get-url s3://some_bucket/some_target -v
2020-07-01 17:15:39,910 ERROR: unexpected error - 'NoneType' object has no attribute 'cache'
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/dvc/main.py", line 53, in main
    ret = cmd.run()
  File "/usr/lib/python3.8/site-packages/dvc/command/get_url.py", line 17, in run
    Repo.get_url(self.args.url, out=self.args.out)
  File "/usr/lib/python3.8/site-packages/dvc/repo/get_url.py", line 19, in get_url
    dep.save()
  File "/usr/lib/python3.8/site-packages/dvc/output/base.py", line 268, in save
    self.info = self.save_info()
  File "/usr/lib/python3.8/site-packages/dvc/output/base.py", line 192, in save_info
    return self.remote.save_info(self.path_info)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 762, in save_info
    return self.tree.save_info(path_info, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 329, in save_info
    self.PARAM_CHECKSUM: self.get_hash(path_info, tree=tree, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 297, in get_hash
    hash_ = self.get_dir_hash(path_info, tree, **kwargs)
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 310, in get_dir_hash
    if not self.cache:
  File "/usr/lib/python3.8/site-packages/dvc/remote/base.py", line 184, in cache
    return getattr(self.repo.cache, self.scheme)
AttributeError: 'NoneType' object has no attribute 'cache'
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

The same commands work with https://some_domain/some_target URLs, and I don't think that an external cache was ever necessary to download files from S3.
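Read together, the two tracebacks above suggest one shared check that fails in two different ways. The sketch below is a simplified model of that check, not DVC's actual code: `FakeRemote`, `FakeRepo`, `FakeCache`, and `outcome` are hypothetical names, and only the two lines marked as mirrored are taken from the tracebacks.

```python
class RemoteCacheRequiredError(Exception):
    """Stand-in for dvc.exceptions.RemoteCacheRequiredError."""


class FakeCache:
    """Hypothetical cache holder: one attribute per scheme, None if unset."""

    def __init__(self, s3=None):
        self.s3 = s3


class FakeRepo:
    """Hypothetical repo object that only carries a cache."""

    def __init__(self, cache):
        self.cache = cache


class FakeRemote:
    """Hypothetical remote that reproduces only the cache check."""

    scheme = "s3"

    def __init__(self, repo):
        # repo is None when the command runs without a repo (the get-url case).
        self.repo = repo

    @property
    def cache(self):
        # Mirrors the line shown in the traceback (remote/base.py, line 184).
        return getattr(self.repo.cache, self.scheme)

    def get_dir_hash(self, path):
        # Mirrors remote/base.py, lines 310-311, from the tracebacks.
        if not self.cache:
            raise RemoteCacheRequiredError(path)
        return "dir-hash-placeholder"


def outcome(remote, path="s3://some_bucket/some_target"):
    """Return the name of the exception raised by the check, or 'ok'."""
    try:
        remote.get_dir_hash(path)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


# import-url inside a repo that has no s3 cache configured
print(outcome(FakeRemote(FakeRepo(FakeCache()))))
# get-url, where no repo object exists at all
print(outcome(FakeRemote(None)))
# import-url inside a repo with an s3 cache configured (assumed URL)
print(outcome(FakeRemote(FakeRepo(FakeCache(s3="s3://cache_bucket/cache")))))
```

Under these assumptions, the first call ends in `RemoteCacheRequiredError`, the second in `AttributeError`, and the third passes the check. That matches the two reported errors, and the third case matches the workaround reported in the comments.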

Please provide information about your setup

Output of dvc version:

$ dvc version
1.1.2

bug p2-medium research

All 3 comments

I found out a couple of things:

  • get-url works in 0.93.0
  • In order to make import-url work, one needs to set up an s3 cache in _any_ bucket, not necessarily in the bucket that contains the file to be imported

That kind of solves the issue, but I don't get the logic behind this. Why would I need a cache in a separate bucket just to download a file from a completely different bucket? It seems weird, because I need to download the file to my local machine anyway in order to compute hashes.
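For anyone who wants to try the workaround from the second bullet above, here is a minimal sketch. It assumes the `cache.s3` option described on the config page linked in the error message. The remote name `s3cache`, the bucket URL, and both helper functions are placeholders for illustration, not part of DVC.

```python
import shlex
import subprocess


def s3_cache_commands(remote_name, cache_url):
    """Build the two dvc commands that register an external s3 cache."""
    return [
        # Register the cache location as a named remote.
        ["dvc", "remote", "add", remote_name, cache_url],
        # Point the s3 cache option at that remote.
        ["dvc", "config", "cache.s3", remote_name],
    ]


def configure_s3_cache(remote_name, cache_url, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in s3_cache_commands(remote_name, cache_url):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run: only prints the commands; placeholder names, nothing is executed.
configure_s3_cache("s3cache", "s3://cache_bucket/cache")
```

Calling it with `dry_run=False` inside the repo created in the reproduction steps would run the commands for real. Whether this is enough for a given DVC version is not confirmed here.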

@anotherbugmaster This is a well-known bug that became more intrusive once we adjusted the way we process inputs in get-url and import-url. It will be improved in the near future.
