I run the command:
dvc gc -a -v
It fails with the following traceback:
2020-03-16 09:53:37,900 ERROR: unexpected error - 'be23d78966ad0171e87879e051edf6eb3f446e12'
------------------------------------------------------------
Traceback (most recent call last):
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/main.py", line 50, in main
ret = cmd.run()
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/command/gc.py", line 60, in run
workspace=self.args.workspace,
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/__init__.py", line 27, in wrapper
ret = f(repo, *args, **kwargs)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/gc.py", line 75, in gc
jobs=jobs,
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/__init__.py", line 256, in used_cache
for stage, filter_info in pairs:
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/__init__.py", line 252, in <genexpr>
for target in targets
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/__init__.py", line 202, in collect_granular
return [(stage, None) for stage in self.stages]
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/repo/__init__.py", line 397, in stages
for root, dirs, files in self.tree.walk(self.root_dir):
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/ignore.py", line 135, in walk
dirs[:], files[:] = self.dvcignore(root, dirs, files)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/ignore.py", line 115, in dvcignore
return DvcIgnoreFilter(self.tree)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/ignore.py", line 93, in __init__
for root, dirs, files in self.tree.walk(self.tree.tree_root):
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/scm/git/tree.py", line 148, in walk
yield from self._walk(tree, topdown=topdown)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/scm/git/tree.py", line 131, in _walk
yield from self._walk(tree[i], topdown=topdown)
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/scm/git/tree.py", line 121, in _walk
for i in _iter_tree(tree):
File "/home/my_user/data/env/my_project/lib/python3.6/site-packages/dvc/scm/git/tree.py", line 26, in _iter_tree
node = submodules[node.hexsha]
KeyError: 'be23d78966ad0171e87879e051edf6eb3f446e12'
My setup
I can reproduce this with a git repo that has submodules, e.g. one obtained with git clone https://github.com/githubtraining/example-dependency:
>>> from dvc.scm.git import Git
>>> git = Git(".")
>>> tree = git.get_tree("HEAD~1")
>>> for root, dnames, fnames in tree.walk("."):
...     print(root)
...     print(dnames)
...     print(fnames)
...
/home/efiop/git/feedstocks
['feedstocks']
['.gitmodules', 'LICENSE', 'README.md']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/efiop/git/dvc/dvc/scm/git/tree.py", line 148, in walk
yield from self._walk(tree, topdown=topdown)
File "/home/efiop/git/dvc/dvc/scm/git/tree.py", line 131, in _walk
yield from self._walk(tree[i], topdown=topdown)
File "/home/efiop/git/dvc/dvc/scm/git/tree.py", line 121, in _walk
for i in _iter_tree(tree):
File "/home/efiop/git/dvc/dvc/scm/git/tree.py", line 26, in _iter_tree
node = submodules[node.hexsha]
KeyError: 'ba4d4c3b302af35049508e381fdff85072caa200'
So this affects all git repos with submodules, which is really bad.
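A minimal sketch of how a lookup like submodules[node.hexsha] can end in a KeyError. It assumes, and the traceback alone does not confirm this, that the mapping is built from the submodule commits recorded at the current checkout while the tree being walked comes from another revision (HEAD~1 in the session above). The dict contents and the placeholder key are hypothetical; only the failing hash is taken from the traceback.

```python
# Hypothetical mapping: assumed to be keyed by the submodule commit
# recorded at the current checkout (placeholder key, not a real hash).
submodules_at_head = {"<sha recorded at HEAD>": "feedstocks"}

# Hash reported in the KeyError above; assumed to be the commit that an
# older revision pins the submodule to.
sha_in_old_tree = "ba4d4c3b302af35049508e381fdff85072caa200"

try:
    submodules_at_head[sha_in_old_tree]
    lookup_failed = False
except KeyError:
    # Same failure mode as in the traceback: the key is simply absent.
    lookup_failed = True

print(lookup_failed)
```

If that assumption holds, any revision whose submodule pointer differs from the current one would trigger the error.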
Ok, so we are currently using this hack [0] to work around the fact that item.name is not always a basename. IndexObject defines name that way [1], but Submodule doesn't [2]. So using item.name might cause issues such as [3], because GitPython doesn't pass the name parameter when simply iterating over tree objects [4]. In other places, when working specifically with submodules, it sets the _name attribute explicitly [5].
[0] https://github.com/iterative/dvc/blob/0.90.0/dvc/scm/git/tree.py#L15
[1] https://github.com/gitpython-developers/GitPython/blob/3.1.0/git/objects/base.py#L170
[2] https://github.com/gitpython-developers/GitPython/blob/3.1.0/git/objects/submodule/base.py#L1123
[3] https://github.com/gitpython-developers/GitPython/issues/597
[4] https://github.com/gitpython-developers/GitPython/blob/3.1.0/git/objects/tree.py#L237
[5] https://github.com/gitpython-developers/GitPython/blob/3.1.0/git/objects/submodule/base.py#L357
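One possible way to avoid relying on item.name is to derive the basename from item.path instead. The sketch below only illustrates that idea and is not necessarily how dvc handles it. The helper name entry_basename is hypothetical, and the SimpleNamespace objects are stand-ins for GitPython tree entries (only their path and name attributes are imitated), so the snippet runs without a repository.

```python
import posixpath
from types import SimpleNamespace


def entry_basename(item):
    """Return the last path component of a tree entry.

    Hypothetical helper: it reads item.path rather than item.name,
    because name is not guaranteed to be a basename for submodules.
    Git tree paths use forward slashes, hence posixpath.
    """
    return posixpath.basename(item.path)


# Stand-ins for tree entries, NOT real GitPython objects.
blob = SimpleNamespace(path="docs/README.md", name="README.md")
# For a submodule, name may be the name from .gitmodules, not a basename.
submodule = SimpleNamespace(path="vendor/feedstocks", name="my-submodule")

names = [entry_basename(blob), entry_basename(submodule)]
print(names)
```

With this approach a submodule entry is treated like any other entry of the tree, so no lookup keyed by commit hash is needed to recover its name.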
@hoangcao The fix was released in 0.90.1. Please upgrade, give it a try, and let us know whether it works for you. Thanks for the feedback! :pray:
It works. Thanks, @efiop