dvc add causing "ERROR: failed to download" after dvc remove and gc

Created on 3 Jul 2019 · 8 Comments · Source: iterative/dvc

Hi, I am using dvc 0.50.1.
I found that running dvc add after dvc remove and dvc gc leads to the following error:

$ dvc add data.csv
$ dvc remove data.csv.dvc
$ rm data.csv.dvc
$ dvc gc
$ dvc add data.csv

ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."

Is this expected behavior? How do I solve it?

bug p0-critical

All 8 comments

This is a bug. Here is how it is supposed to work:

$ dvc add data.csv        # data.csv is added to cache, data.csv.dvc is created
$ dvc remove data.csv.dvc # data.csv is removed
$ rm data.csv.dvc          
$ dvc gc                  # data.csv is removed from cache
$ dvc add data.csv
ERROR: failed to add file - output 'data.csv' does not exist

So the question is: how did you get data.csv back? Or maybe it was not deleted by dvc remove data.csv.dvc?

Hi @Suor
Wow, I didn't realize dvc remove actually deletes the data itself.
The example I gave is not exactly what I did. Sorry for the confusion.

My dvc remove does delete files properly.
But it does not work on a symlinked directory (or perhaps on any directory?).
These are the exact operations I ran:

$ ln -s ../../image_data ./data  # create a symlink to the directory
$ dvc add data                   # add this directory
$ dvc remove data.dvc            # this does not delete the directory at all
$ rm data.dvc
$ dvc gc
$ dvc add data                   # adding the directory again causes the error
ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."

I also found that the following operations cause the same error:

$ dvc add data.csv                         # data.csv is added to cache, data.csv.dvc is created
$ dvc remove data.csv.dvc                  # data.csv is removed
$ rm data.csv.dvc          
$ dvc gc                                   # data.csv is removed from cache
$ cp ../data_backup/data.csv ./data.csv    # copy the data.csv from the backup 
$ dvc add data.csv                         # add the data.csv again causing error
ERROR: failed to download ".dvc/cache/............"
[Errno 2] No such file or directory: ''/home/ccha97u/Documents/dvc_example/.dvc/cache/....."

Reproduced the symlink case. Thanks for instructions @GBJim.

Diagnosis: the inode is sometimes reused when a symlink is removed and recreated, and the link's mtime is ignored in the directory mtime calculation, so we hit the state and load a dir checksum from it. That checksum refers to a cached JSON listing, which was removed by dvc gc earlier.
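
The failure mode described in this diagnosis can be modeled with a small toy. This is only a sketch under stated assumptions, not DVC's actual code: the `state` and `cache` dictionaries, the `(inode, mtime)` key, and the `add`/`gc` functions are hypothetical stand-ins for the state database, the cache directory, and the two commands.

```python
import hashlib

# Hypothetical stand-ins, not DVC internals.
state = {}  # (inode, mtime) -> checksum, remembered between commands
cache = {}  # checksum -> cached directory listing


def add(inode, mtime, listing):
    """Toy 'add': trust the state entry if the (inode, mtime) key matches."""
    key = (inode, mtime)
    if key not in state:
        digest = hashlib.md5("\n".join(sorted(listing)).encode()).hexdigest()
        state[key] = digest
        cache[digest] = list(listing)  # listing is stored only on a state miss
    # On a state hit we assume the listing is still cached.
    return cache[state[key]]  # raises KeyError if gc removed it


def gc():
    """Toy 'gc': drop cached data but leave the state entries in place."""
    cache.clear()


add(inode=42, mtime=1000, listing=["a.png", "b.png"])
gc()

# Same inode (reused) and same mtime (link mtime ignored): the state hit
# returns a checksum whose cached listing no longer exists.
try:
    add(inode=42, mtime=1000, listing=["a.png", "b.png"])
    outcome = "ok"
except KeyError:
    outcome = "missing cache entry"
```

In this toy, the second `add` fails because the state entry outlives the cache entry it points to, which mirrors the "No such file or directory" error on a path under `.dvc/cache`.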

P.S. A co-culprit is that our code is fragile: the dir cache is calculated during .get_dir_checksum() and stored in the cache as a side effect, and .load_dir_cache() later relies on it being there. Two failures happen in a row but are ignored with logger.exception(), until things blow up on the third exception.
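
The fragile pattern described here can be illustrated roughly as follows. The function names echo the ones mentioned above, but the bodies, the `remembered` parameter, and the checksum format are assumptions made for this sketch and do not reproduce DVC's implementation.

```python
import logging

logger = logging.getLogger("toy_dir_cache")
logger.addHandler(logging.NullHandler())  # keep the demo quiet

cache = {}  # checksum -> directory listing (hypothetical stand-in)


def get_dir_checksum(listing, remembered=None):
    """Return a checksum; store the listing only when freshly computed."""
    if remembered is not None:
        return remembered  # state hit: nothing is written to `cache`
    checksum = "dir-" + str(len(listing))  # placeholder, not a real hash
    cache[checksum] = listing  # side effect the loader depends on
    return checksum


def load_dir_cache(checksum):
    """Log and swallow a missing entry instead of failing loudly."""
    try:
        return cache[checksum]
    except KeyError:
        logger.exception("listing for %s is missing", checksum)
        return []  # the caller carries on with an empty listing


# A stale checksum from the state skips the side effect, so the loader
# finds nothing and the problem is only logged.
checksum = get_dir_checksum(["a.png", "b.png"], remembered="dir-stale")
listing = load_dir_cache(checksum)
```

Because the missing entry is only logged, the error surfaces later and further away from its cause, which is the behavior the comment above calls fragile.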

@GBJim for the time being you can touch any file in the dir to make it work. This will update its mtime and resolve the confusion.
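
Why touching a file helps can be sketched like this. The `tree_mtime` helper is a hypothetical aggregate (the newest mtime among regular files under a directory), assumed here for illustration; DVC's real directory mtime calculation may differ.

```python
import os
import tempfile


def tree_mtime(root):
    """Hypothetical aggregate: newest mtime (ns) among files under root."""
    newest = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.stat(os.path.join(dirpath, name))
            newest = max(newest, st.st_mtime_ns)
    return newest


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "a.png")
    with open(path, "w") as f:
        f.write("x")

    before = tree_mtime(root)

    # Same effect as `touch a.png`, but with an explicit later timestamp so
    # the change is visible even on filesystems with coarse mtime resolution.
    later = before + 5_000_000_000
    os.utime(path, ns=(later, later))

    after = tree_mtime(root)
```

Once the aggregate mtime changes, a lookup keyed on it no longer matches the stale entry, so the checksum is recomputed rather than read from the state.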

@Suor Thank you for the quick reply!
As for files, is there a workaround as well?

@GBJim I am unable to reproduce that one. Using touch might still work, renaming a file should definitely work. What operating system are you using?

@Suor I couldn't reproduce this one either. It seems like the second case is working fine.
Thank you!
