When we run a command that runs git checkout internally, we can get an error like this:
(node)[ivan@ivan ~/Projects/myrepo]$ dvc metrics show -a
Unexpected error: Cmd('git') failed due to: exit code(1)
cmdline: git checkout bigrams
stderr: 'error: Your local changes to the following files would be overwritten by checkout:
auc.metric
auc.metric.dvc
featurization.py
matrix.pkl.dvc
model.pkl.dvc
Please commit your changes or stash them before you switch branches.
Aborting'
We should improve the message.
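One possible shape for a friendlier message is sketched below. It only parses the stderr text shown above; the function name `summarize_checkout_failure` and the wording of the hint are hypothetical, not part of dvc, and the sketch assumes git is emitting its English messages.

```python
def summarize_checkout_failure(stderr):
    """Turn git's 'would be overwritten by checkout' stderr into a short hint.

    Hypothetical helper: returns None when stderr describes some other failure.
    """
    marker = "would be overwritten by checkout:"
    if marker not in stderr:
        return None  # not the dirty-worktree case; leave it to the caller

    files = []
    # Everything after the marker is a list of paths, one per line,
    # followed by git's "Please commit ..." / "Aborting" trailer.
    for line in stderr.split(marker, 1)[1].splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Please ") or line.startswith("Aborting"):
            break
        files.append(line)

    return (
        "Cannot switch branches: uncommitted changes in "
        + ", ".join(files)
        + ". Commit or stash them and retry."
    )
```

A caller could catch the git error, pass its stderr through this helper, and fall back to the raw error when it returns None.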
Ideally, we should find a way to traverse git's object graph w/o actual checkouts.
We should use git ls-tree and git show branch:file instead of checking out other branches. That will eliminate the problem once and for all. We also need to access dvc cache files directly, instead of using dvc checkout. Already working on it. Thank you!
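A minimal sketch of that idea, assuming plain `git` on the PATH and metric files that are committed to git (reading files that live only in the dvc cache is not covered here). The helper names `parse_ls_tree`, `list_files`, `read_file` and `metrics_across_branches`, and the `.metric` suffix filter, are made up for illustration and are not dvc's API.

```python
import subprocess


def parse_ls_tree(output):
    """Parse `git ls-tree -r -z <rev>` output into {path: blob_sha}.

    Each NUL-terminated record looks like "<mode> <type> <sha>\t<path>".
    """
    entries = {}
    for record in output.split("\0"):
        if not record:
            continue
        meta, path = record.split("\t", 1)
        _mode, obj_type, sha = meta.split(" ")
        if obj_type == "blob":  # skip submodule ("commit") entries
            entries[path] = sha
    return entries


def git(*args, cwd="."):
    """Run a git command and return its stdout as text."""
    return subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    ).stdout


def list_files(rev, cwd="."):
    """List files in a revision without touching the working tree."""
    return parse_ls_tree(git("ls-tree", "-r", "-z", rev, cwd=cwd))


def read_file(rev, path, cwd="."):
    """Read one file from a revision via `git show <rev>:<path>`."""
    return git("show", f"{rev}:{path}", cwd=cwd)


def metrics_across_branches(branches, suffix=".metric", cwd="."):
    """Collect {(branch, path): content} for matching files on each branch."""
    result = {}
    for branch in branches:
        for path in list_files(branch, cwd):
            if path.endswith(suffix):
                result[(branch, path)] = read_file(branch, path, cwd)
    return result
```

Because nothing is checked out, uncommitted local changes cannot block the traversal, which is the failure shown at the top of this issue.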
fixing this would also solve https://github.com/iterative/dvc/issues/1552
I'm using this script locally right now, which uses git show (and jq):
#!/usr/bin/env python3
import sys
import argparse
from jq import jq # pylint: disable=no-name-in-module
from git import Repo
from git.exc import GitCommandError
import json
from pygments import highlight
from pygments.lexers.data import JsonLexer
from pygments.formatters.terminal import TerminalFormatter

parser = argparse.ArgumentParser(description="""
python scripts/metrics.py experiments/**/metrics/ml_model_parameter_search.json
python scripts/metrics.py experiments/**/metrics/rule_model_train.json --jq '.total_untagged'
python scripts/metrics.py experiments/**/metrics/rule_model_parameter_search.json --jq '.level_5.score' --all-branches
python scripts/metrics.py experiments/**/metrics/rule_model_parameter_search.json --jq 'to_entries|map(.key, .value.score)' --all-branches
""",
    formatter_class=lambda prog: argparse.RawDescriptionHelpFormatter(prog, max_help_position=32))
parser.add_argument('filepaths', nargs='+')
parser.add_argument('--jq')
parser.add_argument('--all-branches', action='store_true')
# Show the help text when the script is called without arguments.
args = parser.parse_args(args=None if sys.argv[1:] else ['--help'])

repo = Repo('.')
branches = ['master']
if args.all_branches:
    branches = [x.name for x in repo.branches]

for branch in branches:
    for filepath in args.filepaths:
        try:
            # Read the file straight from the branch, without a checkout.
            content = repo.git.show(f'{branch}:{filepath}')
        except GitCommandError as e:
            # Skip files that do not exist on this branch.
            if 'exists on disk, but not in' in e.stderr or 'does not exist in' in e.stderr:
                continue
            raise
        if args.jq:
            content = json.dumps(jq(args.jq).transform(text=content))
        content = highlight(content, JsonLexer(), TerminalFormatter()).strip()
        print(branch, filepath, end='')
        # Short values stay on the same line; long ones start on a new line.
        prefix = '\n' if len(content) > 50 else ' '
        print(f'{prefix}{content}')
here are some example outputs: