DVC: fetch gives unexpected error for all tags and all commits

Created on 22 May 2020 · 3 Comments · Source: iterative/dvc

$ dvc --version
DVC version: 0.94.0
Python version: 3.7.6
Platform: Linux-5.3.0-51-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Supported remotes: http, https

After cloning the example-get-started repo, dvc fetch -aT fails with an unexpected error.
I tried this on Windows too and it shows the same error. It worked previously when I tried version 0.92.x.

$ git clone https://github.com/iterative/example-get-started
$ cd example-get-started
$ dvc fetch -aT
ERROR: unexpected error
$ dvc fetch -aT -v
2020-05-22 23:02:45,092 DEBUG: PRAGMA user_version;                     
2020-05-22 23:02:45,092 DEBUG: fetched: [(3,)]
2020-05-22 23:02:45,092 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-05-22 23:02:45,093 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-05-22 23:02:45,093 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-05-22 23:02:45,093 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-05-22 23:02:45,093 DEBUG: PRAGMA user_version = 3;
2020-05-22 23:02:45,117 DEBUG: Assuming '/home/hardik/src/example-get-started/.dvc/cache/42/c7025fc0edeb174069280d17add2d4.dir' is unchanged since it is read-only
2020-05-22 23:02:45,117 DEBUG: Assuming '/home/hardik/src/example-get-started/.dvc/cache/42/c7025fc0edeb174069280d17add2d4.dir' is unchanged since it is read-only
2020-05-22 23:02:45,117 DEBUG: Assuming '/home/hardik/src/example-get-started/.dvc/cache/68/36f797f3924fb46fcfd6b9f6aa6416.dir' is unchanged since it is read-only
2020-05-22 23:02:45,118 DEBUG: Assuming '/home/hardik/src/example-get-started/.dvc/cache/68/36f797f3924fb46fcfd6b9f6aa6416.dir' is unchanged since it is read-only
2020-05-22 23:02:45,184 DEBUG: SELECT count from state_info WHERE rowid=?
2020-05-22 23:02:45,184 DEBUG: fetched: [(7,)]
2020-05-22 23:02:45,184 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-05-22 23:02:45,197 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/command/data_sync.py", line 75, in run
    recursive=self.args.recursive,
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/repo/__init__.py", line 30, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/repo/__init__.py", line 536, in fetch
    return self._fetch(*args, **kwargs)
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/repo/fetch.py", line 45, in _fetch
    recursive=recursive,
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/repo/__init__.py", line 295, in used_cache
    filter_info=filter_info,
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/stage/__init__.py", line 761, in get_used_cache
    cache.update(out.get_used_cache(*args, **kwargs))
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/output/base.py", line 449, in get_used_cache
    self.checksum, self._collect_used_dir_cache(**kwargs),
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/output/base.py", line 363, in _collect_used_dir_cache
    if self.cache.changed_cache_file(self.checksum):
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/remote/base.py", line 805, in changed_cache_file
    if self.is_protected(cache_info):
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/remote/local.py", line 719, in is_protected
    if not self.exists(path_info):
  File "/home/hardik/miniconda3/lib/python3.7/site-packages/dvc/remote/local.py", line 97, in exists
    assert is_working_tree(self.repo.tree)
AssertionError
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
bug p0-critical

All 3 comments

Seems very similar to #3812 at first glance (noted by the author of this issue in https://github.com/iterative/dvc.org/issues/528#issuecomment-632835927).

After changes for #3811 we still fail with a different error:

โฏ dvc fetch -aT
ERROR: unexpected error - [Errno 2] No such file

...
2020-05-25 15:00:59,632 DEBUG: fetched: [(0,)]
2020-05-25 15:00:59,635 ERROR: unexpected error - [Errno 2] No such file
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/pmrowla/git/dvc/dvc/main.py", line 53, in main
    ret = cmd.run()
  File "/Users/pmrowla/git/dvc/dvc/command/data_sync.py", line 79, in run
    run_cache=self.args.run_cache,
  File "/Users/pmrowla/git/dvc/dvc/repo/__init__.py", line 25, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/Users/pmrowla/git/dvc/dvc/repo/__init__.py", line 527, in fetch
    return self._fetch(*args, **kwargs)
  File "/Users/pmrowla/git/dvc/dvc/repo/fetch.py", line 51, in _fetch
    used_run_cache=used_run_cache,
  File "/Users/pmrowla/git/dvc/dvc/repo/__init__.py", line 302, in used_cache
    filter_info=filter_info,
  File "/Users/pmrowla/git/dvc/dvc/stage/__init__.py", line 516, in get_used_cache
    cache.update(out.get_used_cache(*args, **kwargs))
  File "/Users/pmrowla/git/dvc/dvc/output/base.py", line 470, in get_used_cache
    self.checksum, self.collect_used_dir_cache(**kwargs),
  File "/Users/pmrowla/git/dvc/dvc/output/base.py", line 391, in collect_used_dir_cache
    self.get_dir_cache(jobs=jobs, remote=remote)
  File "/Users/pmrowla/git/dvc/dvc/output/base.py", line 358, in get_dir_cache
    if self.cache.changed_cache_file(self.checksum):
  File "/Users/pmrowla/git/dvc/dvc/remote/base.py", line 881, in changed_cache_file
    actual = self.get_checksum(cache_info)
  File "/Users/pmrowla/git/dvc/dvc/remote/base.py", line 350, in get_checksum
    checksum = self.state.get(path_info)
  File "/Users/pmrowla/git/dvc/dvc/state.py", line 400, in get
    actual_mtime, actual_size = get_mtime_and_size(path, self.repo.tree)
  File "/Users/pmrowla/git/dvc/dvc/utils/fs.py", line 53, in get_mtime_and_size
    base_stat = tree.stat(path)
  File "/Users/pmrowla/git/dvc/dvc/ignore.py", line 151, in stat
    return self.tree.stat(path)
  File "/Users/pmrowla/git/dvc/dvc/scm/git/tree.py", line 161, in stat
    raise OSError(errno.ENOENT, "No such file")
FileNotFoundError: [Errno 2] No such file
------------------------------------------------------------

The issue is that state currently belongs to Repo and always uses repo.tree. If repo.tree is a GitTree (because brancher is being used), state lookups for local cache paths fail because the local cache doesn't exist in the GitTree.

State should really belong to the local cache, or to the WorkingTree used by the local cache.
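
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not DVC's actual code: the classes below (WorkingTree, GitTree, Repo, RepoState, CacheState) are simplified, hypothetical stand-ins that only borrow names from the discussion above, and the tracked-path set is an assumption used to mimic a tree that sees only files committed to Git. It shows why a stat through repo.tree fails for a cache file once the tree has been swapped, and why a state bound to its own working tree would not.

```python
import errno
import os
import tempfile


class WorkingTree:
    """Hypothetical stand-in: a tree backed by the real filesystem."""

    def stat(self, path):
        return os.stat(path)


class GitTree:
    """Hypothetical stand-in: a tree that only sees paths tracked in a revision."""

    def __init__(self, tracked):
        self.tracked = set(tracked)

    def stat(self, path):
        if path not in self.tracked:
            # Mirrors the error in the second traceback above.
            raise OSError(errno.ENOENT, "No such file")
        return os.stat(path)


class Repo:
    """Hypothetical repo whose tree can be swapped while iterating revisions."""

    def __init__(self):
        self.tree = WorkingTree()


class RepoState:
    """State owned by the repo: stats through whatever tree the repo uses now."""

    def __init__(self, repo):
        self.repo = repo

    def get_mtime(self, path):
        return self.repo.tree.stat(path).st_mtime


class CacheState:
    """State bound to its own working tree, independent of repo.tree."""

    def __init__(self):
        self.tree = WorkingTree()

    def get_mtime(self, path):
        return self.tree.stat(path).st_mtime


def demo():
    """Return (repo_state_ok, cache_state_ok) for a cache file on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        cache_file = os.path.join(tmp, "cache_entry.dir")
        with open(cache_file, "w") as fobj:
            fobj.write("[]")

        repo = Repo()
        # Simulate iterating over commits/tags: the repo tree becomes a
        # Git-backed tree, and the cache file is not tracked in Git.
        repo.tree = GitTree(tracked=[])

        try:
            RepoState(repo).get_mtime(cache_file)
            repo_state_ok = True
        except FileNotFoundError:
            repo_state_ok = False

        try:
            CacheState().get_mtime(cache_file)
            cache_state_ok = True
        except FileNotFoundError:
            cache_state_ok = False

        return repo_state_ok, cache_state_ok
```

In this sketch, demo() reports that the repo-owned state fails while the cache-owned state succeeds, which is the distinction the proposal above relies on. How the real fix should be structured in DVC is left open here.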
