When tracking a directory using DVC, the stats reported by dvc diff are not correct. For example, in a small test repository, the directory test is being tracked by DVC. When running dvc diff between two commits, the following is reported:
$ dvc diff f722af7 91cdc34
dvc diff from f722af74ec372073d04847a3c77ff7e7a154a21b to 91cdc345625ded08cb50f5616f41cee73b198bb6
diff for 'test'
-test with md5 cd5a9abdb72acc541fecb818c8280d5c.dir
+test with md5 3d681d1fe5b39802f663c9a82600d219.dir
0 files not changed, 0 files modified, 0 files added, 0 files deleted, size was increased by 3 Bytes
It should say that 1 file was modified, but instead only reports the size change correctly.
$ dvc --version
0.35.5
DVC installed via pip.
$ python3 --version
Python 3.6.7
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
Will this be the correct output
$ dvc diff f722af7 91cdc34
dvc diff from f722af74ec372073d04847a3c77ff7e7a154a21b to 91cdc345625ded08cb50f5616f41cee73b198bb6
diff for 'test'
-test with md5 cd5a9abdb72acc541fecb818c8280d5c.dir
+test with md5 3d681d1fe5b39802f663c9a82600d219.dir
0 files not changed, 1 files modified, 0 files added, 0 files deleted, size was increased by 3 Bytes
?
Namely, 1 files modified is the only difference from what it shows currently.
And changed? Are "modified" and "changed" different in this context?
@GildedHonour "0 files _not_ changed". Yes, I think this is the only difference. We need to fix that. Feel free to fix the "not changed" part so it's less confusing. We will need to update the docs if needed and write a test for this.
@shcheklein Okay
1 files changed, 1 files modified, 0 files added, 0 files deleted, size was increased by 3 Bytes
@GildedHonour so, let's imagine we have 10 files in the directory and we change only one of them:
9 files _not_ changed, 1 files modified, 0 files added, 0 files deleted, size was increased by 3 Bytes
This is the intended output in this case ^^
Since I see that the phrase "9 files _not_ changed" is confusing, maybe we can change it a little bit (preserving the semantics, of course).
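For reference, here is a minimal sketch of the counting semantics in the ten-file example. It is not DVC's actual implementation: `DirDiffStats`, `dir_diff_stats`, `summary`, and the `{relative_path: md5}` manifest shape are hypothetical names picked for illustration, and the md5 strings are placeholders.

```python
from typing import Dict, NamedTuple


class DirDiffStats(NamedTuple):
    # Hypothetical container for the four counters shown in the summary line.
    unchanged: int
    modified: int
    added: int
    deleted: int


def dir_diff_stats(old: Dict[str, str], new: Dict[str, str]) -> DirDiffStats:
    """Compare two {relative_path: md5} manifests of a tracked directory."""
    old_paths, new_paths = set(old), set(new)
    common = old_paths & new_paths
    # A file counts as modified when it exists on both sides with a different md5.
    modified = sum(1 for path in common if old[path] != new[path])
    return DirDiffStats(
        unchanged=len(common) - modified,
        modified=modified,
        added=len(new_paths - old_paths),
        deleted=len(old_paths - new_paths),
    )


def summary(stats: DirDiffStats) -> str:
    """Format the counters in the same order as the summary line above."""
    return (
        f"{stats.unchanged} files not changed, {stats.modified} files modified, "
        f"{stats.added} files added, {stats.deleted} files deleted"
    )


# Ten files in the directory, one of which changes between the two revisions.
old = {f"file{i}": f"md5-{i}" for i in range(10)}
new = dict(old, file3="md5-3-edited")
print(summary(dir_diff_stats(old, new)))
# prints: 9 files not changed, 1 files modified, 0 files added, 0 files deleted
```

Counting "not changed" as the common files minus the modified ones keeps the four numbers consistent with each other, whatever wording ends up being used for that part of the message.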
Alright. I've fixed it I think.
I'm working on the tests.
There are 2 tests in func/test_version.py that fail; they're probably unrelated to my changes. They may have to do with my OS (macOS). Should I fix them?
1)
dvc_repo = Repo: '/private/var/folders/qr/0v5z71xn7x503ykyv1j6lkp00000gn/T/dvc-test.28819.vk4jnw5_.DAdbZ4bwgScCFRGEbwNNfJ'
caplog = <_pytest.logging.LogCaptureFixture object at 0x1258c6210>
def test_info_in_repo(dvc_repo, caplog):
assert main(["version"]) == 0
assert re.search(re.compile(r"DVC version: \d+\.\d+\.\d+"), caplog.text)
assert re.search(re.compile(r"Python version: \d\.\d\.\d"), caplog.text)
assert re.search(re.compile(r"Platform: .*"), caplog.text)
assert re.search(re.compile(r"Binary: (True|False)"), caplog.text)
> assert re.search(
re.compile(r"Filesystem type \(cache directory\): .*"), caplog.text
)
E AssertionError: assert None
E + where None = <function search at 0x10e2b6290>(re.compile('Filesystem type \\(cache directory\\): .*'), 'INFO dvc.command.version:version.py:65 DVC version: 0.53.1+60159b.mod\n ... Platform: Darwin-17.7.0-x86_64-i386-64bit\n Binary: False\n')
E + where <function search at 0x10e2b6290> = re.search
E + and re.compile('Filesystem type \\(cache directory\\): .*') = <function compile at 0x10e2ea3b0>('Filesystem type \\(cache directory\\): .*')
E + where <function compile at 0x10e2ea3b0> = re.compile
E + and 'INFO dvc.command.version:version.py:65 DVC version: 0.53.1+60159b.mod\n ... Platform: Darwin-17.7.0-x86_64-i386-64bit\n Binary: False\n' = <_pytest.logging.LogCaptureFixture object at 0x1258c6210>.text
[....]/dvc/tests/func/test_version.py:13: AssertionError
2)
repo_dir = <tests.basic_env.TestDirFixture object at 0x126578cd0>, caplog = <_pytest.logging.LogCaptureFixture object at 0x1265785d0>
def test_info_outside_of_repo(repo_dir, caplog):
assert main(["version"]) == 0
assert re.search(re.compile(r"DVC version: \d+\.\d+\.\d+"), caplog.text)
assert re.search(re.compile(r"Python version: \d\.\d\.\d"), caplog.text)
assert re.search(re.compile(r"Platform: .*"), caplog.text)
assert re.search(re.compile(r"Binary: (True|False)"), caplog.text)
> assert re.search(
re.compile(r"Filesystem type \(workspace\): .*"), caplog.text
)
E AssertionError: assert None
E + where None = <function search at 0x10e2b6290>(re.compile('Filesystem type \\(workspace\\): .*'), 'INFO dvc.command.version:version.py:65 DVC version: 0.53.1+60159b.mod\n ... Platform: Darwin-17.7.0-x86_64-i386-64bit\n Binary: False\n')
E + where <function search at 0x10e2b6290> = re.search
E + and re.compile('Filesystem type \\(workspace\\): .*') = <function compile at 0x10e2ea3b0>('Filesystem type \\(workspace\\): .*')
E + where <function compile at 0x10e2ea3b0> = re.compile
E + and 'INFO dvc.command.version:version.py:65 DVC version: 0.53.1+60159b.mod\n ... Platform: Darwin-17.7.0-x86_64-i386-64bit\n Binary: False\n' = <_pytest.logging.LogCaptureFixture object at 0x1265785d0>.text
[.......]/dvc/tests/func/test_version.py:31: AssertionError
@GildedHonour Those tests were fixed recently on master, please update your branch.
I fetched those fixes and created a PR.