version: 0.40.0+6408b5
Pulling the _data_ directory: dvc pull -r ssh
❯ dvc pull -r ssh
Preparing to download data from 'ssh://localhost/tmp/data-storage'
Preparing to collect status from ssh://localhost/tmp/data-storage
[##############################] 100% Collecting information
[##############################] 100% Analysing status.
Computing md5 for a large directory data. This is only done once.
[##############################] 100% data
[##############################] 100% Checkout finished!
Everything is up to date.
Remove a file from the directory: rm data/1
dvc pull -r ssh
Preparing to download data from 'ssh://localhost/tmp/data-storage'
Preparing to collect status from ssh://localhost/tmp/data-storage
[##############################] 100% Collecting information
[##############################] 100% Analysing status.
Computing md5 for a large directory data. This is only done once.
[##############################] 100% data
[##############################] 100% Checkout finished!
Everything is up to date.
The output says the MD5s will only be computed once, but it computes them twice.
Maybe remove the "This is only done once" message or rephrase it.
@mroutis well, but you've removed a file from the directory, so you've modified it, and so it needs to compute the checksum once again. What is your point?
@efiop, my point is that the "This is only done once" message could cause more confusion than it adds value.
To me it is clear that "This is only done once" actually means "as long as you don't modify the content of any file inside the data directory", but I'm not sure whether I'm biased, because I kind of know how DVC works internally and what the purpose of computing checksums is.
@mroutis but you have data added as a whole, and you are modifying it by adding/removing files to it, so it makes sense that it is considered as changed and will be re-computed once again.
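To illustrate why removing a single file invalidates the checksum of the whole directory, here is a minimal sketch. It is not DVC's actual implementation: the helpers `file_md5`, `dir_md5` and `demo` are hypothetical, and the scheme (hashing the sorted list of relative paths and per-file MD5s) is only an assumption about how a directory-level checksum could be derived.

```python
import hashlib
import os
import tempfile


def file_md5(path):
    """Return the hex MD5 of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dir_md5(root):
    """Hypothetical directory checksum (an assumption, not DVC's code):
    hash the sorted (relative path, file md5) pairs of every file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), file_md5(full)))
    digest = hashlib.md5()
    for relpath, checksum in sorted(entries):
        digest.update(f"{relpath}:{checksum}\n".encode())
    return digest.hexdigest()


def demo():
    """Build a small data directory, then remove one file, as in the report."""
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data")
        os.mkdir(data)
        for name in ("1", "2", "3"):
            with open(os.path.join(data, name), "w") as fobj:
                fobj.write(f"contents of {name}")
        before = dir_md5(data)      # first computation
        unchanged = dir_md5(data)   # nothing changed: same checksum
        os.remove(os.path.join(data, "1"))
        after = dir_md5(data)       # a file is gone: checksum differs
        return before, unchanged, after
```

Under this assumption, an untouched directory always yields the same checksum, so a cached value can be reused, while `rm data/1` changes the result and forces a recomputation. That is the sense in which "only done once" holds only while the directory contents stay the same.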