DVC: Garbage Collect - Unexpected Error - AssertionError - Troubles with changing DVC remotes?

Created on 17 Jun 2020 · 2 Comments · Source: iterative/dvc

Bug Report

Please provide information about your setup

Output of dvc version:

$ dvc version
DVC version: 0.94.0
Python version: 3.7.6
Platform: Linux-4.15.0-99-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Supported remotes: http, https, s3, ssh
Cache: reflink - not supported, hardlink - supported, symlink - supported
Repo: dvc, git

Additional Information (if any):

I tried running the garbage collection for the first time in my project and am encountering the following error.

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc status                                                                           
Data and pipelines are up to date.                                      
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc gc -aTv
2020-06-17 09:20:15,856 WARNING: This will remove all cache except items used in the working tree and all git branches and tags of the current repo.
Are you sure you want to proceed? [y/n] y
2020-06-17 09:20:21,135 DEBUG: PRAGMA user_version;                     
2020-06-17 09:20:21,136 DEBUG: fetched: [(3,)]
2020-06-17 09:20:21,136 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-06-17 09:20:21,137 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-06-17 09:20:21,137 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-06-17 09:20:21,137 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-06-17 09:20:21,138 DEBUG: PRAGMA user_version = 3;
2020-06-17 09:20:30,448 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/11/127bb35fbb6b37c554139ae5cb3c35.dir' is unchanged since it is read-only
2020-06-17 09:20:30,449 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/11/127bb35fbb6b37c554139ae5cb3c35.dir' is unchanged since it is read-only
2020-06-17 09:20:31,259 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/03/4a5618f598d1602e8181c635409292.dir' is unchanged since it is read-only
2020-06-17 09:20:31,259 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/03/4a5618f598d1602e8181c635409292.dir' is unchanged since it is read-only
2020-06-17 09:20:31,267 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/94/2889f6d61e03865be5ebd66518db3e.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/94/2889f6d61e03865be5ebd66518db3e.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/e9/312f039663dcbc363e03dad8a1bf4c.dir' is unchanged since it is read-only
2020-06-17 09:20:31,268 DEBUG: Assuming '/home/rabefabi/alvuc/.dvc/cache/e9/312f039663dcbc363e03dad8a1bf4c.dir' is unchanged since it is read-only
2020-06-17 09:20:31,588 DEBUG: SELECT count from state_info WHERE rowid=?
2020-06-17 09:20:31,589 DEBUG: fetched: [(180911,)]
2020-06-17 09:20:31,589 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-06-17 09:20:31,599 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/command/gc.py", line 59, in run
    workspace=self.args.workspace,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/__init__.py", line 30, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/gc.py", line 73, in gc
    jobs=jobs,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/repo/__init__.py", line 295, in used_cache
    filter_info=filter_info,
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/stage/__init__.py", line 761, in get_used_cache
    cache.update(out.get_used_cache(*args, **kwargs))
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/output/base.py", line 449, in get_used_cache
    self.checksum, self._collect_used_dir_cache(**kwargs),
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/output/base.py", line 363, in _collect_used_dir_cache
    if self.cache.changed_cache_file(self.checksum):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/base.py", line 805, in changed_cache_file
    if self.is_protected(cache_info):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/local.py", line 719, in is_protected
    if not self.exists(path_info):
  File "/home/rabefabi/miniconda3/envs/alvuc/lib/python3.7/site-packages/dvc/remote/local.py", line 97, in exists
    assert is_working_tree(self.repo.tree)
AssertionError
------------------------------------------------------------

The debug lines tell me that some directories are read-only, and it's quite possible that I botched something during my server setup (Jupyter Lab inside a Docker container accessing host directories via a Docker bind mount). However, I tried manually setting the access rights for the cache dir, which did not resolve the error:

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ sudo chmod -R g+w .dvc/cache/
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ ls -la .dvc/cache/
total 12496
drwxrwxr-x 258 rabefabi users   4096 Jun 17 09:27 .
drwxr-xr-x   4 rabefabi users   4096 Jun 17 09:31 ..
drwxrwxr-x   2 rabefabi users  20480 Jun 17 09:19 00
drwxrwxr-x   2 rabefabi users  24576 Jun 17 09:19 01
drwxrwxr-x   2 rabefabi users  20480 Jun 17 09:19 02
drwxrwxr-x   3 rabefabi users  20480 Jun 17 09:19 03
...

Also, I recently switched DVC remotes (from S3 to SSH), so most of the cached files were created while the S3 remote was configured. Could this be relevant? The S3 remote is no longer accessible.

Thanks in advance!

awaiting response

All 2 comments

Hi @rabefabi, thanks for reporting. This issue has already been fixed, and we are preparing the 1.0 release, which includes the fix, for later this week. In the meantime, you should be able to use a beta release without any issues.

Related: #3857 #3812

Hi @skshetry, thank you for the quick response.

Upgrading did solve the issue. In case anyone else encounters it:

(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ pip install --upgrade --pre dvc
[...]
(alvuc) rabefabi@9695e3ffdf02:~/alvuc$ dvc gc -aT
[garbage collection happens]

Thanks again!
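
For anyone who wants to check whether their install predates the fix, here is a minimal Python sketch. It assumes only what this thread states: the error appeared on 0.94.0, the fix ships with 1.0, and the 1.0 pre-release installed via `pip install --upgrade --pre dvc` already worked. The helper names `parse_dvc_version` and `predates_1_0` are made up for illustration and are not part of DVC.

```python
import re


def parse_dvc_version(version_output: str) -> str:
    """Extract the version string from the text printed by `dvc version`."""
    match = re.search(r"^DVC version:\s*(\S+)", version_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'DVC version:' line found")
    return match.group(1)


def predates_1_0(version: str) -> bool:
    """Return True if the major version is below 1 (e.g. 0.94.0).

    Assumption taken from this thread: 1.0 and its pre-releases contain the fix,
    so only the leading major number is inspected.
    """
    major = int(version.split(".", 1)[0])
    return major < 1


# Sample text copied from the report above.
sample = """DVC version: 0.94.0
Python version: 3.7.6
"""

if predates_1_0(parse_dvc_version(sample)):
    print("This DVC predates 1.0; consider: pip install --upgrade --pre dvc")
```

To check a real install, the `sample` string could be replaced with the captured output of `dvc version`.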
