dvc diff takes forever

Created on 19 Oct 2020 · 5 Comments · Source: iterative/dvc

Bug Report

Running dvc diff took about 70 minutes.

Please provide information about your setup

Output of dvc version:

$ dvc version

DVC version: 1.8.1 (pip)

Platform: Python 3.7.7 on Linux-4.15.0-101-generic-x86_64-with-debian-buster-sid
Supports: gs, http, https
Cache types: hardlink, symlink
Repo: dvc, git

Additional Information (if any):

I have 16722 files in my .dvc cache and 5 .dvc files, 3 of which point to directories.
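
For anyone who wants to gather the same numbers for their own repo, here is a rough sketch that walks the cache and counts files. It assumes the default `.dvc/cache` location and that directory listings are stored as cache files ending in `.dir`; `cache_stats` is a made-up helper name, not part of DVC.

```python
import os


def cache_stats(cache_dir=".dvc/cache"):
    """Return (file_count, dir_entry_count, total_bytes) for a cache directory.

    Assumption: tracked directories show up as cache files whose names
    end in ".dir". Adjust the path if the cache lives somewhere else.
    """
    file_count = 0
    dir_entry_count = 0
    total_bytes = 0
    for root, _subdirs, names in os.walk(cache_dir):
        for name in names:
            file_count += 1
            if name.endswith(".dir"):
                dir_entry_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, dir_entry_count, total_bytes
```

Calling `cache_stats()` from the repo root returns the three counts as a tuple; dividing the last value by `1024 ** 2` gives the size in MB.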

And my .dvc/config file:

[core]
    remote = nas
['remote "nas"']
    url = /mnt/NAS/Production/project/dvc-cache

I'm using a NAS as the remote, and most of the data is images. There appears to be about 900 MB of data in total.

Here is the output of dvc diff -v

diffout.gz

Labels: awaiting response, performance, research

All 5 comments

Hi @kweston !

Could you please run

dvc diff -v --cprofile --cprofile-dump diff.prof &> log.log

and post diff.prof and log.log?
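
If you want to look at the profile yourself before posting it, the dump can be read with Python's standard `pstats` module. The sketch below assumes `diff.prof` is an ordinary cProfile dump (decompress it first if it was gzipped); `summarize_profile` is a hypothetical helper name used only for illustration.

```python
import io
import pstats


def summarize_profile(path, limit=15):
    """Return a text report of the `limit` most expensive entries in a
    cProfile dump, sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    # strip_dirs() shortens file paths so the report is easier to read.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()
```

For example, `print(summarize_profile("diff.prof"))` lists the calls with the highest cumulative time first, which is usually enough to see where a slow command spends its time.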

Hi @efiop
Here you go:

log.log
diff.prof.gz

This is most likely the performance issue with file-level granularity in directory diffs (#4580), which was addressed in DVC 1.8.2. @kweston can you update DVC and then try running diff again?
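
A quick way to tell whether a given install already includes that change is to compare the version string from `dvc version` against 1.8.2. This is only a sketch: `parse_version` and `has_diff_fix` are hypothetical helpers, and the parser assumes a plain dotted version such as `1.8.1` with no suffix.

```python
FIXED_IN = (1, 8, 2)  # release named in this thread as containing the fix


def parse_version(text):
    """Turn '1.8.1' into (1, 8, 1). Only handles plain dotted integers."""
    return tuple(int(part) for part in text.strip().split("."))


def has_diff_fix(version_text):
    """True if the given DVC version is 1.8.2 or newer."""
    return parse_version(version_text) >= FIXED_IN
```

With the version reported above, `has_diff_fix("1.8.1")` is `False`, while `has_diff_fix("1.8.4")` is `True`.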

You are correct. Updated dvc to v1.8.4 and now it takes just 16 seconds for a diff. I can live with this. Thanks for the help!

Closing for now. There will be more optimizations soon, stay tuned! :wink:
