Running dvc diff took about 70 minutes.
Output of dvc version:
$ dvc version
Platform: Python 3.7.7 on Linux-4.15.0-101-generic-x86_64-with-debian-buster-sid
Supports: gs, http, https
Cache types: hardlink, symlink
Repo: dvc, git
Additional Information (if any):
I have 16722 files in my .dvc cache and 5 .dvc files, 3 of which point to directories.
And my .dvc/config file:
[core]
remote = nas
['remote "nas"']
url = /mnt/NAS/Production/project/dvc-cache
I'm using a NAS as the remote, and most of the data is images. It looks like there is about 900 MB of data in total.
Here is the output of dvc diff -v
Hi @kweston !
Could you please run
dvc diff -v --cprofile --cprofile-dump diff.prof &> log.log
and post diff.prof and log.log?
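For anyone who wants to look at the profile locally before posting it, here is a minimal sketch. It assumes that diff.prof is a standard cProfile stats dump that Python's pstats module can read; the helper name summarize_profile and the top_n parameter are made up for illustration and are not part of DVC. A gzipped dump (diff.prof.gz) would need to be decompressed first.

```python
import io
import pstats


def summarize_profile(prof_path, top_n=20):
    """Return a text report of the top_n profile entries by cumulative time.

    Assumption: prof_path points to a cProfile stats dump, such as the
    file written by `--cprofile-dump`.
    """
    buf = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buf)
    # Drop directory prefixes so the function names are easier to read,
    # then sort so the slowest call chains appear first.
    stats.strip_dirs().sort_stats("cumulative").print_stats(top_n)
    return buf.getvalue()
```

Calling print(summarize_profile("diff.prof")) should list the functions where most of the time is spent, which can hint at whether the slowdown is in file hashing, directory walking, or somewhere else.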
log.log
diff.prof.gz
Hi @efiop
Here you go
This is most likely the diff directory file granularity performance issue (#4580) which was addressed in DVC 1.8.2. @kweston can you update DVC and then try running diff again?
You are correct. Updated dvc to v1.8.4 and now it takes just 16 seconds for a diff. I can live with this. Thanks for the help!
Closing for now. There will be more optimizations soon, stay tuned! :wink: