Dvc.org: cmd-ref: add a note that gc might try to pull missing .dir cache files

Created on 19 Mar 2020  ·  6 Comments  ·  Source: iterative/dvc.org

All 6 comments

E.g., for directories: if the corresponding .dir cache file is missing, gc cannot tell which files from that directory are needed, so it tries to download the missing .dir file from the remote.
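A rough sketch of why the .dir file matters, written as a simplified model and not DVC's actual implementation. It assumes a .dir cache file is a JSON list of entries with `md5` and `relpath` keys, and that cache files live under a directory named after the first two characters of the hash. The names `cache_path` and `collect_used` and the hashes in the demo are hypothetical.

```python
import json
import os
import tempfile


def cache_path(cache_dir, md5):
    """Map a hash such as 'ab12...dir' to '<cache_dir>/ab/12...dir'."""
    return os.path.join(cache_dir, md5[:2], md5[2:])


def collect_used(cache_dir, used_hashes):
    """Return (hashes to keep, .dir hashes whose listing is missing locally)."""
    keep, missing_dirs = set(), set()
    for md5 in used_hashes:
        keep.add(md5)
        if not md5.endswith(".dir"):
            continue
        path = cache_path(cache_dir, md5)
        if not os.path.exists(path):
            # Without the listing, the directory's contents are unknown,
            # so a real tool would have to fetch it before deciding.
            missing_dirs.add(md5)
            continue
        with open(path) as fobj:
            for entry in json.load(fobj):
                keep.add(entry["md5"])
    return keep, missing_dirs


with tempfile.TemporaryDirectory() as cache_dir:
    dir_hash = "aa" + "0" * 30 + ".dir"  # hypothetical directory hash
    file_hash = "bb" + "1" * 30          # hypothetical file hash

    # Case 1: the .dir listing is absent from the local cache.
    print(collect_used(cache_dir, {dir_hash}))

    # Case 2: the listing is present, so its entries can be expanded.
    path = cache_path(cache_dir, dir_hash)
    os.makedirs(os.path.dirname(path))
    with open(path, "w") as fobj:
        json.dump([{"md5": file_hash, "relpath": "train.tsv"}], fobj)
    print(collect_used(cache_dir, {dir_hash}))
```

In the first case the directory hash ends up in `missing_dirs`, which corresponds to the situation where gc has to go to the remote.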

@efiop any other cases in which GC may hit the remote? Thanks

@jorgeorpinel I think that's all.

@jorgeorpinel I tried to reproduce this on the example-get-started repo.
For instance, after running dvc pull prepare.dvc, the content of the cache was:

├── 58
│   └── 245acfdc65b519c44e37f7cce12931
├── 68
│   ├── 36f797f3924fb46fcfd6b9f6aa6416.dir
│   └── 36f797f3924fb46fcfd6b9f6aa6416.dir.unpacked
│       ├── test.tsv
│       └── train.tsv
└── 9d
    └── 603888ec04a6e75a560df8678317fb
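To read the listing above: each cache entry appears to be stored under a directory named after the first two characters of its hash, with the rest of the hash as the file name. A small sketch, assuming that layout, that rebuilds the full hashes from the paths shown and picks out the directory listings (`hash_from_cache_path` is a hypothetical helper):

```python
from pathlib import PurePosixPath


def hash_from_cache_path(rel_path):
    """Join the two-character directory name and the file name into one hash."""
    parts = PurePosixPath(rel_path).parts
    return parts[0] + parts[1]


# Paths taken from the cache listing above.
paths = [
    "58/245acfdc65b519c44e37f7cce12931",
    "68/36f797f3924fb46fcfd6b9f6aa6416.dir",
    "9d/603888ec04a6e75a560df8678317fb",
]

hashes = [hash_from_cache_path(p) for p in paths]
# Entries ending in ".dir" describe a directory rather than a single file.
dir_hashes = [h for h in hashes if h.endswith(".dir")]
print(hashes)
print(dir_hashes)
```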

Then I ran rm -Rf .dvc/cache to delete the cache and ran dvc gc -w. It did pull some .dir cache files from the remote, since without them dvc gc cannot tell which files in a directory are needed. This is also mentioned in the Discord discussion. After dvc gc -w the cache contained:

├── 42
│   └── c7025fc0edeb174069280d17add2d4.dir
└── 68
    └── 36f797f3924fb46fcfd6b9f6aa6416.dir

Should I create a PR and add a note for this in gc.md?

Where did the 42 one come from, @imhardikj?

First I deleted the cache directory and then ran dvc gc -w, so I am sure gc fetched both 68 and 42 from the remote.

Oh OK, you cloned example-get-started at its latest commit but only pulled prepare.dvc at first (when you ran tree .dvc/cache), right? That would explain it: 42 must be from a later stage.
