Dvc: pull: computing md5 for large directories (message is misleading)

Created on 9 May 2019 · 3 Comments · Source: iterative/dvc

version: 0.40.0+6408b5

Pulling the _data_ directory: dvc pull -r ssh

❯ dvc pull -r ssh
Preparing to download data from 'ssh://localhost/tmp/data-storage'
Preparing to collect status from ssh://localhost/tmp/data-storage
[##############################] 100% Collecting information
[##############################] 100% Analysing status.
Computing md5 for a large directory data. This is only done once.
[##############################] 100% data
[##############################] 100% Checkout finished!
Everything is up to date.

Remove a file from the directory: rm data/1

Pull again: dvc pull -r ssh

Preparing to download data from 'ssh://localhost/tmp/data-storage'
Preparing to collect status from ssh://localhost/tmp/data-storage
[##############################] 100% Collecting information
[##############################] 100% Analysing status.
Computing md5 for a large directory data. This is only done once.
[##############################] 100% data
[##############################] 100% Checkout finished!
Everything is up to date.

The message says the MD5 is computed only once, but it was computed again on the second pull.

Maybe remove the "This is only done once" message, or rephrase it.

p3-nice-to-have ui

All 3 comments

@mroutis well, but you've removed a file from the directory, so you've modified it, and so the checksum needs to be computed again. What is your point?

@efiop, my point is that the "This is only done once" message could confuse more than it helps.
To me it is clear that "This is only done once" really means "as long as you don't modify the content of any file inside the data directory", but I'm not sure if I'm biased because I kind of know how DVC works internally and what the purpose of computing checksums is.

@mroutis but you added data as a whole, and you are modifying it by adding or removing files, so it makes sense that it is considered changed and its checksum is recomputed.
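
To illustrate the behaviour discussed above, here is a minimal sketch of a directory checksum cache that recomputes only when the directory contents change. It is not DVC's actual implementation: the names `DirChecksumCache`, `dir_signature`, `file_md5` and `demo` are hypothetical, and using (relative path, size, mtime) as the change signal is an assumption made for the example.

```python
import hashlib
import os
import tempfile


def file_md5(path):
    """Return the hex MD5 of a single file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dir_signature(root):
    """Cheap fingerprint of a directory: (relative path, size, mtime) per file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            entries.append(
                (os.path.relpath(path, root), stat.st_size, stat.st_mtime_ns)
            )
    return tuple(sorted(entries))


class DirChecksumCache:
    """Hypothetical cache: the checksum is recomputed only if the signature changed."""

    def __init__(self):
        self._cache = {}  # root -> (signature, checksum)
        self.computations = 0  # how many times the expensive step ran

    def checksum(self, root):
        signature = dir_signature(root)
        cached = self._cache.get(root)
        if cached is not None and cached[0] == signature:
            return cached[1]  # directory untouched: reuse the stored checksum
        # Directory is new or modified: hash every file again.
        self.computations += 1
        digest = hashlib.md5()
        for relpath, _size, _mtime in signature:
            digest.update(relpath.encode())
            digest.update(file_md5(os.path.join(root, relpath)).encode())
        result = digest.hexdigest()
        self._cache[root] = (signature, result)
        return result


def demo():
    """Mirror the steps from the report: checksum, checksum again, rm a file, checksum."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("1", "2", "3"):
            with open(os.path.join(root, name), "w") as fobj:
                fobj.write("content " + name)
        cache = DirChecksumCache()
        first = cache.checksum(root)
        second = cache.checksum(root)  # unchanged directory, served from cache
        os.remove(os.path.join(root, "1"))
        third = cache.checksum(root)  # a file was removed, so it is recomputed
        return first, second, third, cache.computations
```

In this sketch "only done once" holds per directory state: the second call reuses the cached value, while removing `1` changes the signature and triggers a second computation, which matches what the reporter observed after `rm data/1`.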
