Hi, I am using dvc 0.50.1
I found that running dvc add after dvc remove and dvc gc leads to the following error:
$ dvc add data.csv
$ dvc remove data.csv.dvc
$ rm data.csv.dvc
$ dvc gc
$ dvc add data.csv
ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."
Is this expected behavior? How do I solve this?
This is a bug. Here is how it is supposed to work:
$ dvc add data.csv # data.csv is added to cache, data.csv.dvc is created
$ dvc remove data.csv.dvc # data.csv is removed
$ rm data.csv.dvc
$ dvc gc # data.csv is removed from cache
$ dvc add data.csv
ERROR: failed to add file - output 'data.csv' does not exist
So the question is: how do you get data.csv back? Or maybe it is not deleted on dvc remove data.csv.dvc?
Hi @Suor
Wow, I didn't realize dvc remove actually deletes the data itself.
The example I gave is not exactly what I did. Sorry for the confusion.
My dvc remove does delete files properly.
But it does not work on a soft-linked directory (or just any directory?).
The following are the exact operations I did:
$ ln -s ../../image_data ./data #Build a soft link to the directory
$ dvc add data #Add this directory
$ dvc remove data.dvc #this action does not delete the directory at all
$ rm data.dvc
$ dvc gc
$ dvc add data #Add the directory again causing error
ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."
Also, I found that the following operations cause the error as well:
$ dvc add data.csv # data.csv is added to cache, data.csv.dvc is created
$ dvc remove data.csv.dvc # data.csv is removed
$ rm data.csv.dvc
$ dvc gc # data.csv is removed from cache
$ cp ../data_backup/data.csv ./data.csv # copy the data.csv from the backup
$ dvc add data.csv # add the data.csv again causing error
ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."
Reproduced the symlink case. Thanks for instructions @GBJim.
Diagnosis: the inode is sometimes reused when a symlink is removed and recreated, and the link's own mtime is ignored in the directory mtime calculation. So we hit the state and load a dir checksum from it, which refers to a cached json listing that was removed by dvc gc earlier.
P.S. A co-culprit is our code being fragile: the dir cache is calculated during .get_dir_checksum() and stored in the cache as a side effect, and later .load_dir_cache() relies on it being there. Two failures happen in a row but are only logged with logger.exception(), until things blow up on the third exception.
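To make the failure mode concrete, here is a toy model of it. This is only a sketch under assumptions, not DVC's actual code: ToyRepo, MissingCacheEntry, the (inode, mtime) signature, and the in-memory dicts are all hypothetical stand-ins for the state and the cache described above.

```python
import hashlib


class MissingCacheEntry(Exception):
    """State points at a checksum whose cache entry no longer exists."""


class ToyRepo:
    """Hypothetical stand-in for the state + cache interplay, not DVC code."""

    def __init__(self):
        self.state = {}  # (inode, mtime) signature -> dir checksum
        self.cache = {}  # dir checksum -> file listing

    def add(self, signature, listing):
        checksum = self.state.get(signature)
        if checksum is None:
            # State miss: compute the checksum and, as a side effect,
            # store the listing in the cache.
            joined = "\n".join(sorted(listing)).encode()
            checksum = hashlib.md5(joined).hexdigest()
            self.state[signature] = checksum
            self.cache[checksum] = sorted(listing)
        # State hit: the listing is assumed to be in the cache already.
        if checksum not in self.cache:
            raise MissingCacheEntry(checksum)
        return checksum

    def gc(self):
        # Nothing references the cache entries in this toy, so all of
        # them are collected. The state is left untouched.
        self.cache.clear()


repo = ToyRepo()
signature = (1234, 1000)          # made-up inode and mtime
repo.add(signature, ["a.png"])    # first add populates state and cache
repo.gc()                         # cache entry is gone, state entry stays
try:
    repo.add(signature, ["a.png"])  # same signature: stale state hit
except MissingCacheEntry as exc:
    print("stale state entry for", exc)
```

In this model the second add fails only because the signature is unchanged. Any change to the signature forces a state miss, which recomputes the checksum and repopulates the cache.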
@GBJim for the time being you can touch any file in the dir to make it work; this will update its mtime and resolve the confusion.
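A small sketch of why touching a file helps, assuming the directory mtime is derived from the newest mtime of the files inside it (that is my reading of the diagnosis, not a confirmed detail of the implementation). The helpers tree_mtime_ns and touch_later are hypothetical names used only for this illustration.

```python
import os
import tempfile


def tree_mtime_ns(root):
    """Newest mtime (in ns) among regular files under root."""
    newest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.stat(os.path.join(dirpath, name))
            newest = max(newest, st.st_mtime_ns)
    return newest


def touch_later(path, seconds=1):
    """Bump atime and mtime forward by a fixed amount.

    A real `touch` sets them to the current time; a fixed offset is used
    here so the change is visible even on coarse timestamp resolutions.
    """
    st = os.stat(path)
    bumped = st.st_mtime_ns + seconds * 1_000_000_000
    os.utime(path, ns=(bumped, bumped))


def demo():
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "a.png")
        with open(target, "wb") as handle:
            handle.write(b"x")
        before = tree_mtime_ns(root)
        touch_later(target)
        after = tree_mtime_ns(root)
        return before, after


before, after = demo()
print(after > before)
```

Once the computed mtime differs, a lookup keyed on it no longer matches the stale entry, so the checksum is recalculated instead of being loaded from the state.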
@Suor Thank you for the quick reply!
As for files, is there a workaround as well?
@GBJim I am unable to reproduce that one. Using touch might still work, renaming a file should definitely work. What operating system are you using?
@Suor I couldn't reproduce this one either. It seems the second case is working fine.
Thank you!