$ dvc -V
0.62.1
$ dvc get https://github.com/iterative/example-get-started data/data.xml
Multi-Threaded:
79%|โโโโโโโโ |../../../../private/var/folders/hq/h82mgrhx37nc89_1rcg6gm30081024/37916850 [00:08<00:03, 2.25MB/s]
$ dvc get https://github.com/iterative/example-get-started model.pkl
Multi-Threaded:
62%|โโโโโโโ |../../../../private/var/folders/hq/h82mgrhx37nc89_1rcg6gm283866624/6262877 [00:00<00:02 ...
Is there a reason to show temporary paths and what actions we can make on this? I'd expect to see just a file name like model.pkl?
For the record: we can either chdir() into the temporary repo root or to make a global variable that we can adjust that will make PathInfo.str() use it for relpath (Kudos @Suor).
This is not only about dvc get. Import are also affected, see #2691
An addition about global variable, which might be used as base for PathInfo.__str__(), it should be:
dvc/path_info.py directlywith path_info_base(some_dir):
# ... path_infos are stringified differently here
This will protect us from usual global var issues making it a dynamic scope variable if you know what I mean ;)
From discord:
@kurianbenoy:
I am interested to fix that issue, even if it takes more time to understand the codebase
@efiop:
Glad to hear that! :slight_smile: Got it. Okay, let's think about some clever ways to fix that then.
ok, so we are using named cache for that. We could make it check if we are inside of a repo or not and if not then print a relpath computed relative to the repo.root_dir instead of cwd.
the piece of the code that is responsible for that is https://github.com/iterative/dvc/blob/0.66.4/dvc/output/base.py#L410
specifically str(self) that is passed as a name
so if we could do something like:
if os.getcwd().startswith(self.stage.repo.root_dir):
name = str(self)
else:
name = relpath(self.path_info, self.stage.repo.root_dir)
and then use that name instead of str(self) in that line, then it would automatically start working in an acceptable way.
@Suor: Some outs might be outside repo dir, but still "in repo", referencing them simply by name would be confusing
So the core culprit is stringifying PathInfo objects, which don't know about repo and so can't show sane paths, stringifying output objects is, however, ok. So:
str(self) where @efiop mentionedOutputLocal.__str__() should be fixed thoughI propose this logic for OutputLocal.__str__():
if not self.is_in_repo:
return self.def_path
elif <curdir isin self.repo.root_dir>:
return relpath(self.path_info, curdir)
else:
return relpath(self.path_info, self.repo.root_dir)
Which means we show relative paths for the repo we are in and rel to repo root paths for all other repos. Remotes and absolute paths are always shown as is.
On <curdir isin ...> this not quire .startswith() as that is wrong for paths:
"a/b/c" isin "a/b" == True
"a/bc" isin "a/b" == False
This would fix the issue and we won't luckily need chdirs nor a global var.
P.S. As a second cleanup/refactor stage I would propose completely stop using PathInfo.__str__(), since it fundamentally doesn't know how to stringify itself. Raise TypeError() there and fix all the usages. This is definitely separate, don't think about it until the issue itself is resolved.
Most helpful comment
So the core culprit is stringifying
PathInfoobjects, which don't know about repo and so can't show sane paths, stringifying output objects is, however, ok. So:str(self)where @efiop mentionedOutputLocal.__str__()should be fixed thoughI propose this logic for
OutputLocal.__str__():Which means we show relative paths for the repo we are in and rel to repo root paths for all other repos. Remotes and absolute paths are always shown as is.
On
<curdir isin ...>this not quire.startswith()as that is wrong for paths:This would fix the issue and we won't luckily need chdirs nor a global var.
P.S. As a second cleanup/refactor stage I would propose completely stop using
PathInfo.__str__(), since it fundamentally doesn't know how to stringify itself. RaiseTypeError()there and fix all the usages. This is definitely separate, don't think about it until the issue itself is resolved.