Dvc: cache file name automatically changed

Created on 25 Apr 2020 · 9 Comments · Source: iterative/dvc

dvc version 0.93.0

I used dvc run to run Python code and dvc add to track its dependencies.
Then I changed one of the dependencies, ran dvc run again, and ran dvc add on the changed file.

The names of the cache files then changed on their own.
For example, for a CSV file:
cache
- e2
- 83742f43b84506f6417e43f6ba666b

became
cache
- e2
- 83742f43b84506f6417e43f6ba666b.csv

This leaves me unable to run dvc checkout.
Does anyone have an idea what is going on? Thanks
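
For anyone trying to see how many cache entries are affected, here is a minimal Python sketch that lists cache files with an unexpected name. It assumes the layout shown in the listing above (a two-character directory containing a 30-character hex file name, optionally ending in .dir for directory entries) and the default cache location .dvc/cache. The function name is made up for illustration and is not part of dvc.

```python
import re
from pathlib import Path

# Assumed layout: <cache>/<2 hex chars>/<30 hex chars>, optionally followed
# by ".dir" for directory entries. Anything else is reported as unexpected.
EXPECTED_NAME = re.compile(r"^[0-9a-f]{30}(\.dir)?$")


def find_unexpected_cache_names(cache_dir):
    """Return cache files whose names do not look like a plain hash remainder."""
    unexpected = []
    for path in sorted(Path(cache_dir).glob("[0-9a-f][0-9a-f]/*")):
        if path.is_file() and not EXPECTED_NAME.match(path.name):
            unexpected.append(path)
    return unexpected


if __name__ == "__main__":
    # ".dvc/cache" is the assumed default location; adjust if yours differs.
    for path in find_unexpected_cache_names(".dvc/cache"):
        print(path)
```

Running it before and after a dvc command may help narrow down which step, if any, triggers the renaming.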

awaiting response

All 9 comments

Hi @lmxs1237, I notice that you are using a very old version (the most recent release is v0.93.0). Can you please retry your operations with an updated version of dvc? For now, you can just rename the cache file back and retry. If you have an external remote, dvc pull might work with an updated version (assuming the bug is fixed).

Hi @skshetry, I updated dvc to v0.93.0, but it happened again. I tried to rename the cache file back with
mv 80cc95bf0407188578fb59eda1c2a7.csv 80cc95bf0407188578fb59eda1c2a7
Then, somehow, another file's name changed. It looks like something is picking cache files at random and renaming them.

Can you check via grep where the hashes are being used?

$ grep -r <hash_with_csv> <dvc directory>

Can you also check whether the content of that specific cache file is JSON? To me, it looks like someone changed the md5 for a directory stage file from <hash>.dir to <hash>.csv.
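
If grep is not handy, the two checks above can be sketched in Python. This is only an illustration under assumptions: it looks for the hash in *.dvc files under the project directory, and it treats "is it JSON" as "the whole file parses with json.loads". Both function names are hypothetical and not part of dvc.

```python
import json
from pathlib import Path


def find_hash_references(project_dir, md5):
    """Return (file, line number, line) for every *.dvc line mentioning md5."""
    hits = []
    for dvc_file in sorted(Path(project_dir).rglob("*.dvc")):
        # "*.dvc" also matches the .dvc directory itself, so skip non-files.
        if not dvc_file.is_file():
            continue
        text = dvc_file.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if md5 in line:
                hits.append((dvc_file, number, line.strip()))
    return hits


def looks_like_json(cache_file):
    """True if the whole file parses as JSON, False for CSV, code, or binary data."""
    try:
        json.loads(Path(cache_file).read_text(encoding="utf-8"))
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True
```

A cache file that holds plain CSV or source code should make looks_like_json return False, which would argue against it being a renamed .dir entry.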

@skshetry
This time I ran
dvc run -f train.dvc -d data.csv -d code.py -o output python code.py
Then I ran dvc add data.csv code.py, and then git add on all the dvc files.
Then it picked the Python code and changed its cache file name.
So I searched for the hash of this Python file, and it returned:
Binary file <dvc directory>/.dvc/state matches
<dvc directory>/train.dvc: - md5: 02a545fad57b4355030ff35c220daef4
<dvc directory>/src/code.py.dvc: - md5: 02a545fad57b4355030ff35c220daef4
The cache content for the Python file is just the code, and for the CSV file it is plain comma-separated data.

Funny thing: by the time I finished typing the above, three more cache files had been renamed.

Whoa, that is extremely strange :eyes: Having those .csv/.py suffixes shouldn't happen ever, something very serious is going on.

@lmxs1237 It would help us a lot if you could come up with a minimal reproducible script.

@lmxs1237 Are you sure you are not doing something weird with your cache? Or maybe this project is also used by someone else and they might be doing something odd to it? This looks really odd, and I have a hard time believing that it has been a bug since at least 0.66.0 (but maybe it is so obscure that it only happens in specific circumstances, we'll see). As a sanity check, could you run dvc version (it prints more than the version itself) and show us the output, please?

The cached file is exactly the same as the original one. It sounds like some program is automatically adding a suffix to the cache file, based on its file format, whenever it reads the file.
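
For reference, that comparison can be sketched in Python roughly as follows. It assumes the cache layout from the original post (two-character directory plus file name) and simply ignores whatever suffix was appended. The helper names are hypothetical and not part of dvc.

```python
import filecmp
from pathlib import Path


def hash_from_cache_path(cache_file):
    """Rebuild the hash from a cache path such as e2/<30 hex chars>.csv, ignoring any suffix."""
    path = Path(cache_file)
    return path.parent.name + path.name.split(".", 1)[0]


def same_content(cache_file, workspace_file):
    """Byte-for-byte comparison; shallow=False forces both files to be read."""
    return filecmp.cmp(cache_file, workspace_file, shallow=False)
```

If same_content is True for a renamed entry, only the file name was touched, which is consistent with something outside dvc renaming files rather than rewriting them.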

Closing as stale.
