Dvc: DvcTree/RepoTree should re-compute dir hash for dirty outputs

Created on 3 Sep 2020  ·  2 Comments  ·  Source: iterative/dvc

From https://github.com/iterative/dvc/pull/4518#discussion_r482763844

When the local workspace is dirty, DvcTree (and RepoTree by extension) reuses the existing dir cache and hashes for dir outputs. If the output is dirty (new, modified, or removed files in the directory), get_hash()/get_dir_hash() should re-compute the hash rather than using the existing (clean) hash from cache/state.
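To make the expected behaviour concrete, here is a rough standalone sketch, not DVC's actual code. The helpers `file_md5`, `compute_dir_entries`, `compute_dir_hash`, and the `is_dirty` flag are all hypothetical names, and the entry layout (a sorted list of `md5`/`relpath` pairs hashed as JSON) is an assumption made for illustration only.

```python
import hashlib
import json
import os


def file_md5(path):
    """Return the hex MD5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compute_dir_entries(root):
    """List every file under root as {"md5", "relpath"}, sorted by relpath."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            relpath = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append({"md5": file_md5(full), "relpath": relpath})
    return sorted(entries, key=lambda entry: entry["relpath"])


def compute_dir_hash(root):
    """Hash the serialized file list (hypothetical format, for illustration)."""
    payload = json.dumps(compute_dir_entries(root), sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest() + ".dir"


def get_dir_hash(root, cached_hash, is_dirty):
    """Reuse the cached (clean) hash only when the output is not dirty."""
    if is_dirty:
        return compute_dir_hash(root)
    return cached_hash
```

With something along these lines, removing a file from the directory changes both the file list and the resulting dir hash, whereas returning `cached_hash` unconditionally would hide the change. How the caller decides `is_dirty` is left open here.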

Once DvcTree is updated to handle dirty workspace in this scenario, dvc diff should be updated to use RepoTree.get_hash() everywhere.

bug p1-important

All 2 comments

"should re-compute the hash rather than using the existing (clean) hash from cache/state."

Just to clarify: state is for dirty repos too. We should use it to avoid recomputing the hash multiple times for the same dirty file. That should give us pretty good performance even for dirty repos.
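As a rough illustration of that idea (not DVC's actual state implementation), a cache keyed on path, mtime, and size recomputes a hash only when the file appears to have changed. The class name `HashState` and the `computed` counter are hypothetical, and the in-memory dict is an assumption standing in for whatever storage state really uses.

```python
import hashlib
import os


class HashState:
    """Toy in-memory hash cache keyed on (path, mtime, size)."""

    def __init__(self):
        self._entries = {}
        self.computed = 0  # number of real hash computations, for inspection

    def get_hash(self, path):
        st = os.stat(path)
        key = (os.path.abspath(path), st.st_mtime_ns, st.st_size)
        cached = self._entries.get(key)
        if cached is not None:
            # Same path, mtime and size: assume the content is unchanged.
            return cached
        with open(path, "rb") as fobj:
            value = hashlib.md5(fobj.read()).hexdigest()
        self.computed += 1
        self._entries[key] = value
        return value
```

A dirty file is hashed once after it changes, and repeated lookups for the same dirty file hit the cache until its mtime or size changes again.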

Related bug: DvcTree.walk in a dirty workspace still yields all file names in the original dir cache for the directory out, even if a nested file has been deleted in the local workspace. This makes diff ignore deleted files inside an output dir. Even though diff can see that the dir has been modified, it can't see which individual files have been removed.

example-get-started git:master  py:dvc ❯ dvc diff
example-get-started git:master  py:dvc ❯ rm data/features/test.pkl
example-get-started git:master  py:dvc ❯ dvc diff
Modified:
    data/features/

files summary: 0 added, 0 deleted, 0 modified, 0 not in cache

Resolving the original issue and re-computing the dir cache for the dirty workspace will fix this bug without needing to modify DvcTree.walk, since it should then yield filenames from the updated (dirty) dir cache file list.
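For illustration, once a fresh file list is available for the dirty directory, the per-file changes fall out of a plain comparison against the old list. This is only a sketch under assumptions: `diff_dir_entries` is a hypothetical helper, each entry is assumed to be a dict with `relpath` and `md5` keys, and the file names and hash values below are made-up placeholders.

```python
def diff_dir_entries(old, new):
    """Compare two dir file lists and report per-file changes by relpath."""
    old_map = {entry["relpath"]: entry["md5"] for entry in old}
    new_map = {entry["relpath"]: entry["md5"] for entry in new}
    common = old_map.keys() & new_map.keys()
    return {
        "added": sorted(new_map.keys() - old_map.keys()),
        "deleted": sorted(old_map.keys() - new_map.keys()),
        "modified": sorted(p for p in common if old_map[p] != new_map[p]),
    }


# Placeholder entries: the committed (clean) list vs. the dirty workspace list.
clean_entries = [
    {"relpath": "test.pkl", "md5": "aaaa"},
    {"relpath": "train.pkl", "md5": "bbbb"},
]
dirty_entries = [
    {"relpath": "train.pkl", "md5": "bbbb"},
]

changes = diff_dir_entries(clean_entries, dirty_entries)
print(changes["deleted"])  # ['test.pkl']
```

If the file list came from the stale dir cache on both sides, the two lists would be identical and the deleted file would never be reported, which matches the diff output above.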
