Dvc.org: import: examples using example-get-started have a problem (chained imports)

Created on 6 Jan 2021 · 7 Comments · Source: iterative/dvc.org

UPDATE: see bold text in https://github.com/iterative/dvc.org/issues/2079#issuecomment-755295653

Description

I tried to import an existing data file, and DVC crashed.

(related to https://github.com/iterative/dvc/issues/2599#issuecomment-566346857)

Reproduce

$ dvc import git@github.com:iterative/example-get-started data/data.xml
Importing 'data/data.xml (git@github.com:iterative/example-get-started)' -> 'data.xml'
ERROR: unexpected error - [Errno 2] No such file or directory: '/myproject/.dvc/cache/a3/04afb96060aad90176268345e10355'

Verbose output

2021-01-03 16:27:29,709 DEBUG: Check for update is enabled.
2021-01-03 16:27:29,710 DEBUG: fetched: [(3,)]
2021-01-03 16:27:29,929 DEBUG: Removing output 'data.xml' of stage: 'data.xml.dvc'.
Importing 'data/data.xml (git@github.com:iterative/example-get-started)' -> 'data.xml'
2021-01-03 16:27:29,932 DEBUG: Computed stage: 'data.xml.dvc' md5: '7168ca3824fbc00724b01499e1d31654'
2021-01-03 16:27:29,933 DEBUG: 'md5' of stage: 'data.xml.dvc' changed.
2021-01-03 16:27:29,933 DEBUG: Creating external repo git@github.com:iterative/example-get-started@None
2021-01-03 16:27:29,934 DEBUG: erepo: git clone 'git@github.com:iterative/example-get-started' to a temporary dir
2021-01-03 16:27:31,714 DEBUG: Saving '../../../../tmp/tmp2liy0m1zdvc-clone/data/data.xml' to '.dvc/cache/a3/04afb96060aad90176268345e10355'.
2021-01-03 16:27:31,715 DEBUG: cache '/myproject/.dvc/cache/a3/04afb96060aad90176268345e10355' expected 'HashInfo(name='md5', value='a304afb96060aad90176268345e10355', dir_info=None, size=37891850, nfiles=None)' actual 'None'
2021-01-03 16:27:31,716 DEBUG: cache '/myproject/.dvc/cache/a3/04afb96060aad90176268345e10355' expected 'HashInfo(name='md5', value='a304afb96060aad90176268345e10355', dir_info=None, size=37891850, nfiles=None)' actual 'None'
2021-01-03 16:27:31,735 DEBUG: Preparing to download data from 'https://remote.dvc.org/get-started'
2021-01-03 16:27:31,735 DEBUG: Preparing to collect status from https://remote.dvc.org/get-started
2021-01-03 16:27:31,735 DEBUG: Collecting information from local cache...
2021-01-03 16:27:31,739 DEBUG: fetched: [(0,)]
2021-01-03 16:27:31,745 ERROR: unexpected error - [Errno 2] No such file or directory: '/myproject/.dvc/cache/a3/04afb96060aad90176268345e10355'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/main.py", line 90, in main
    ret = cmd.run()
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/command/imp.py", line 21, in run
    desc=self.args.desc,
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/repo/imp.py", line 7, in imp
    path, out=out, fname=fname, erepo=erepo, frozen=True, **kwargs
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/repo/__init__.py", line 54, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/repo/imp_url.py", line 64, in imp_url
    stage.run()
  File "/home/jop/.local/lib/python3.6/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/home/jop/.local/lib/python3.6/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/stage/__init__.py", line 500, in run
    sync_import(self, dry, force)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/stage/imports.py", line 30, in sync_import
    stage.deps[0].download(stage.outs[0])
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/dependency/repo.py", line 78, in download
    _, _, cache_infos = repo.fetch_external([self.def_path])
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/external_repo.py", line 176, in fetch_external
    path, repo, download_update, **kwargs
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/external_repo.py", line 147, in _fetch_to_cache
    download_callback=callback,
  File "/home/jop/.local/lib/python3.6/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/cache/base.py", line 40, in use_state
    return call()
  File "/home/jop/.local/lib/python3.6/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/cache/base.py", line 317, in save
    self._save(path_info, tree, hash_info, save_link, **kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/cache/base.py", line 326, in _save
    self._save_file(path_info, tree, hash_info, save_link, **kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/cache/base.py", line 213, in _save_file
    with tree.open(path_info, mode="rb") as fobj:
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/tree/repo.py", line 152, in open
    return dvc_tree.open(path_info, mode=mode, encoding=encoding, **kwargs)
  File "/home/jop/.local/lib/python3.6/site-packages/dvc/tree/dvc.py", line 107, in open
    return open(cache_path, mode=mode, encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: '/myproject/.dvc/cache/a3/04afb96060aad90176268345e10355'
------------------------------------------------------------
2021-01-03 16:27:32,087 DEBUG: Version info for developers:
DVC version: 1.11.8 (pip)
---------------------------------
Platform: Python 3.6.9 on Linux-4.19.128-microsoft-standard-x86_64-with-Ubuntu-18.04-bionic
Supports: gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2021-01-03 16:27:32,088 DEBUG: Analytics is disabled.
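
For readers following the log: the missing path in the error appears to be derived from the md5 reported in the `HashInfo` lines, with the first two hex characters used as a subdirectory. The sketch below only mirrors what the log shows; the helper name `cache_path` is made up for illustration and is not a DVC API.

```python
import posixpath


def cache_path(cache_dir: str, md5: str) -> str:
    """Build a cache path the way the log above suggests:
    <cache_dir>/<first 2 hex chars>/<remaining 30 hex chars>.
    Hypothetical helper, not part of DVC."""
    return posixpath.join(cache_dir, md5[:2], md5[2:])


# Values copied from the verbose output above.
print(cache_path("/myproject/.dvc/cache", "a304afb96060aad90176268345e10355"))
```

So the hash of data.xml was known from the external repo, but no file with that hash was ever written to the local cache before DVC tried to open it.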

Expected

data.xml and data.xml.dvc are created in the current working directory.

Environment information

Output of dvc version:

$ dvc version
DVC version: 1.11.8 (pip)
---------------------------------
Platform: Python 3.6.9 on Linux-4.19.128-microsoft-standard-x86_64-with-Ubuntu-18.04-bionic
Supports: gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Repo: dvc, git

Additional Information (if any):

bug doc-content

All 7 comments

P.S. I've checked that the source data exists by cloning that repo and running dvc pull.

The issue is caused by the lack of iterative/dvc#3305.

A local test that reproduces the behaviour:

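import pytest

# Relies on the fixtures from DVC's own test suite
# (make_tmp_dir, tmp_dir, scm, dvc, erepo_dir).


@pytest.fixture
def another_erepo_dir(make_tmp_dir):
    # A second external repo with both Git and DVC initialized.
    return make_tmp_dir("another_erepo", scm=True, dvc=True)


def test_chained_import(tmp_dir, scm, dvc, erepo_dir, another_erepo_dir):
    # 1. Track the data with DVC in the innermost repo.
    with another_erepo_dir.chdir():
        another_erepo_dir.dvc_gen("data", "data content", commit="add data")

    # 2. Import it into the intermediate repo and commit the import .dvc file.
    with erepo_dir.chdir():
        stage = erepo_dir.dvc.imp(str(another_erepo_dir), "data", out="data")
        erepo_dir.scm.add([stage.dvcfile.relpath])
        erepo_dir.scm.commit("import data")

    # 3. Import the already-imported data: this is the chained import that fails.
    dvc.imp(str(erepo_dir), "data")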
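
To make "chained import" concrete: as far as I know, an import .dvc file records where the data came from in a `repo` section under `deps`, while a file created by `dvc add` has no such section. The sketch below checks an already-parsed .dvc file for that marker. The function name and the sample dictionaries are made up for illustration, and the field layout is an assumption based on DVC 1.x files.

```python
from typing import Any, Dict


def is_import_stage(dvcfile: Dict[str, Any]) -> bool:
    """Return True if any dependency has a `repo` section, i.e. the
    output was imported from another repository. Hypothetical helper,
    assumes the DVC 1.x .dvc layout."""
    return any("repo" in dep for dep in (dvcfile.get("deps") or []))


# Hypothetical parsed contents of an import .dvc file.
imported = {
    "deps": [{"path": "data", "repo": {"url": "/path/to/another_erepo"}}],
    "outs": [{"path": "data"}],
}

# Hypothetical parsed contents of a plain `dvc add` .dvc file.
added = {"outs": [{"path": "data"}]}

print(is_import_stage(imported), is_import_stage(added))
```

In the test above, the data in erepo_dir would be of the first kind, and importing it again from a third repo is the case that breaks.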

This specific use case is a problem because we suggest experimenting with example-get-started in at least one place (https://dvc.org/doc/command-reference/import), which will lead to errors for new users.

We need to either:

  • fix iterative/dvc#3305 and be done with that
  • dig through the docs and replace imports from example-get-started with imports from data-registry

As we have an upcoming release, I presume we might not have time to properly handle iterative/dvc#3305.
Let's fix the docs then? @jorgeorpinel what do you think?
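
If the docs route is taken, finding the affected examples could start with something like the sketch below. It scans text for `dvc import` commands that mention example-get-started and reports their line numbers. The function name, the regex, and the sample text are made up for illustration; real docs may phrase commands differently, so treat this as a starting point only.

```python
import re
from typing import List, Tuple

# Matches a `dvc import` command that references example-get-started
# anywhere later on the same line (assumed pattern, not exhaustive).
IMPORT_FROM_GET_STARTED = re.compile(r"dvc\s+import\b.*example-get-started")


def find_affected_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for each matching line."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if IMPORT_FROM_GET_STARTED.search(line)
    ]


# Illustrative sample, not copied from the actual docs.
sample = """\
$ dvc import https://github.com/iterative/example-get-started data/data.xml
$ dvc get https://github.com/iterative/example-get-started model.pkl
$ dvc import https://github.com/iterative/dataset-registry get-started/data.xml
"""

for number, line in find_affected_lines(sample):
    print(number, line)
```

Only the first sample line is reported. The sketch deliberately looks at `dvc import` alone, since that is the command this issue is about.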

Issue caused by lack of iterative/dvc#3305.

Thanks! I knew I was overlooking something. I forgot we changed the Get Started example project to demo importing... I checked that it's possible to import from the previous version with:

$ dvc import git@github.com:iterative/example-get-started data/data.xml --rev 3-config-remote
Importing 'data/data.xml (git@github.com:iterative/example-get-started)' -> 'data.xml'

To track the changes with git, run:

        git add data.xml.dvc .gitignore
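
For anyone scripting this workaround, here is a minimal sketch that assembles the same command as an argument list, with the revision optional. The helper name is made up, and the HTTPS form of the repository URL is my assumption. `--rev` and the tag `3-config-remote` are taken from the command above.

```python
import shlex
from typing import List, Optional


def dvc_import_cmd(url: str, path: str, rev: Optional[str] = None) -> List[str]:
    """Build the argument list for `dvc import`, pinning a revision if given.
    Hypothetical helper; it only builds the command and does not run it."""
    cmd = ["dvc", "import", url, path]
    if rev is not None:
        cmd += ["--rev", rev]
    return cmd


cmd = dvc_import_cmd(
    "https://github.com/iterative/example-get-started",  # assumed HTTPS URL
    "data/data.xml",
    rev="3-config-remote",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from inside a DVC project. Pinning the revision works here because, per the comment above, that earlier version of the example project does not yet import data.xml itself.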

This specific use case is a problem because we suggest experimenting with example-get-started in at least one place (https://dvc.org/doc/command-reference/import), which will lead to errors for new users.

Exactly.

As we have an upcoming release, I presume we might not have time to properly handle iterative/dvc#3305.
Let's fix the docs then?

Well, we're equally or more busy on the docs side, mainly over the same release. But it is indeed a bug for docs at the moment. So I'm transferring it...

Never mind, I can't transfer or edit issues on this repo. Can you do so, @pared or @efiop? It would also be nice to add a note or check box in iterative/dvc#3305 to close this issue if that is implemented first. For now I've updated the title and description.

@jorgeorpinel transferred!

Thanks @shcheklein

ERROR: unexpected error - [Errno 2] No such file or directory

p.s. @pared does iterative/dvc/issues/3305 cover improving that error message as a first step? There's no hint at all of what might be the problem.
