For example, with directories: if the corresponding .dir cache file is missing, gc cannot tell which files from that directory are needed, so it tries to download the missing .dir file from the remote.
@efiop any other cases in which GC may hit the remote? Thanks
@jorgeorpinel I think that's all.
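To make the case above concrete, here is a minimal sketch of the check being described. It is not DVC's actual code: `cache_path` and `missing_dir_entries` are hypothetical names, and the only assumption is the cache layout visible in the listings in this thread (first two characters of the checksum as the subdirectory, the rest as the file name).

```python
from pathlib import Path


def cache_path(cache_dir: Path, checksum: str) -> Path:
    """Map a checksum to its cache location: <cache>/<first 2 chars>/<rest>."""
    return cache_dir / checksum[:2] / checksum[2:]


def missing_dir_entries(cache_dir: Path, used_checksums):
    """Return the '.dir' checksums that have no file in the local cache.

    These are the entries whose contents cannot be listed locally, so
    something like gc would have to get them from the remote first.
    """
    return [
        checksum
        for checksum in used_checksums
        if checksum.endswith(".dir")
        and not cache_path(cache_dir, checksum).is_file()
    ]
```

With an empty `.dvc/cache`, every `.dir` checksum still referenced by the workspace would be reported as missing, which matches the situation where gc hits the remote.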
@jorgeorpinel I tried to reproduce this on the example-get-started repo.
For instance, after running dvc pull prepare.dvc, the contents of the cache were:
├── 58
│   └── 245acfdc65b519c44e37f7cce12931
├── 68
│   ├── 36f797f3924fb46fcfd6b9f6aa6416.dir
│   └── 36f797f3924fb46fcfd6b9f6aa6416.dir.unpacked
│       ├── test.tsv
│       └── train.tsv
└── 9d
    └── 603888ec04a6e75a560df8678317fb
Then I ran rm -Rf .dvc/cache to delete the cache and ran dvc gc -w. It did pull some .dir cache files from the remote, since without them dvc gc isn't able to tell which files in a directory are needed. This is also mentioned in the Discord discussion.
├── 42
│   └── c7025fc0edeb174069280d17add2d4.dir
└── 68
    └── 36f797f3924fb46fcfd6b9f6aa6416.dir
Should I create a PR and add a note for this in gc.md?
Where did the 42 one come from, @imhardikj?
First I deleted the cache directory and then I ran dvc gc -w. So I am sure gc fetched 68 as well as 42 from the remote.
Oh OK, you cloned example-get-started at its latest commit but only pulled prepare.dvc at first (when you ran tree .dvc/cache), right? That would explain it: 42 must be from a later stage.
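For anyone repeating the experiment, a small helper like the one below could replace eyeballing the tree output when checking which .dir entries appeared after dvc gc -w. This is only a sketch: `list_dir_checksums` is a hypothetical name, and it assumes the two-character prefix layout shown in the listings above.

```python
from pathlib import Path


def list_dir_checksums(cache_dir: Path):
    """Return the sorted full checksums of all '.dir' files in the cache."""
    found = []
    for prefix in cache_dir.iterdir():
        # Cache subdirectories are named after the first two checksum chars.
        if not prefix.is_dir() or len(prefix.name) != 2:
            continue
        for entry in prefix.iterdir():
            # Skips '.dir.unpacked' directories and plain file entries.
            if entry.is_file() and entry.name.endswith(".dir"):
                found.append(prefix.name + entry.name)
    return sorted(found)
```

Calling it on `Path(".dvc/cache")` before and after the gc run and comparing the two lists shows which directory entries were fetched in between.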