Dvc: Handle git failure to run checkout with a better message

Created on 13 Aug 2018 · 3 Comments · Source: iterative/dvc

When we run a command that performs a git checkout internally, it can fail like this:

(node)[ivan@ivan ~/Projects/myrepo]$ dvc metrics show -a
Unexpected error: Cmd('git') failed due to: exit code(1)
  cmdline: git checkout bigrams
  stderr: 'error: Your local changes to the following files would be overwritten by checkout:
        auc.metric
        auc.metric.dvc
        featurization.py
        matrix.pkl.dvc
        model.pkl.dvc
Please commit your changes or stash them before you switch branches.
Aborting'

We should improve the message.
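One possible shape for a better message, as a rough sketch only: pull the list of blocking files out of git's stderr and print a short, actionable hint instead of the raw `Cmd('git') failed` dump. The helper name `explain_checkout_failure` and the wording of the hint are hypothetical, not DVC's actual code, and the parsing assumes the English error text shown above.

```python
import re

# Text git prints when a checkout would clobber uncommitted changes.
# Assumption: English locale, wording as in the report above.
_OVERWRITE_RE = re.compile(
    r"Your local changes to the following files would be overwritten by checkout:"
    r"(?P<files>.*?)Please commit your changes or stash them",
    re.DOTALL,
)


def explain_checkout_failure(branch, stderr):
    """Return a friendlier message for a failed `git checkout`, or None.

    Hypothetical helper: returns None when stderr is not the
    "local changes would be overwritten" case, so the caller can re-raise.
    """
    match = _OVERWRITE_RE.search(stderr)
    if match is None:
        return None
    # Splitting on whitespace is a simplification; it breaks on file names
    # that contain spaces.
    files = match.group("files").split()
    listing = "\n".join(f"  {name}" for name in files)
    return (
        f"Cannot switch to branch '{branch}': uncommitted changes in\n"
        f"{listing}\n"
        "Commit or stash them (git stash) and run the command again."
    )
```

A caller that already catches GitPython's `GitCommandError` could pass `e.stderr` to this function and fall back to re-raising when it returns `None`.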

Ideally, we should find a way to traverse git's object graph without actual checkouts.

enhancement

All 3 comments

We should use git ls-tree and git show branch:file instead of checking out other branches. That will eliminate the problem once and for all. We also need to access dvc cache files directly instead of using dvc checkout. Already working on it. Thank you!
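As a rough illustration of that idea (not the actual DVC change), the sketch below lists a branch's files with `git ls-tree -r` and reads them with `git show <rev>:<path>`, so the working tree is never touched. The helper names `git`, `parse_ls_tree`, `read_file_at` and `metric_files`, and the `.metric` suffix filter, are assumptions made for the example.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def parse_ls_tree(output):
    """Parse `git ls-tree -r <rev>` output into {path: (mode, type, sha)}.

    Each line has the form "<mode> <type> <sha>\t<path>". Paths with unusual
    characters are quoted by git unless -z is used; this sketch ignores that.
    """
    entries = {}
    for line in output.splitlines():
        meta, path = line.split("\t", 1)
        mode, objtype, sha = meta.split()
        entries[path] = (mode, objtype, sha)
    return entries


def read_file_at(rev, path, cwd="."):
    """Read a file from a revision via `git show`, without a checkout."""
    return git("show", f"{rev}:{path}", cwd=cwd)


def metric_files(rev, suffix=".metric", cwd="."):
    """Return {path: content} for files in `rev` whose path ends with suffix."""
    tree = parse_ls_tree(git("ls-tree", "-r", rev, cwd=cwd))
    return {p: read_file_at(rev, p, cwd) for p in tree if p.endswith(suffix)}
```

Because nothing is checked out, uncommitted local changes cannot block the read, which is the failure reported in this issue.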

fixing this would also solve https://github.com/iterative/dvc/issues/1552

I'm using this script locally right now which uses git show (and jq):

#!/usr/bin/env python3
"""Print a metrics file from one or more branches without checking them out."""

import argparse
import json
import sys

from git import Repo
from git.exc import GitCommandError
from jq import jq  # pylint: disable=no-name-in-module
from pygments import highlight
from pygments.formatters.terminal import TerminalFormatter
from pygments.lexers.data import JsonLexer

parser = argparse.ArgumentParser(
    description="""
python scripts/metrics.py experiments/**/metrics/ml_model_parameter_search.json
python scripts/metrics.py experiments/**/metrics/rule_model_train.json --jq '.total_untagged'
python scripts/metrics.py experiments/**/metrics/rule_model_parameter_search.json --jq '.level_5.score' --all-branches
python scripts/metrics.py experiments/**/metrics/rule_model_parameter_search.json --jq 'to_entries|map(.key, .value.score)' --all-branches
""",
    formatter_class=lambda prog: argparse.RawDescriptionHelpFormatter(prog, max_help_position=32),
)
parser.add_argument('filepaths', nargs='+')
parser.add_argument('--jq')
parser.add_argument('--all-branches', action='store_true')
# Show the help text when the script is called without arguments.
args = parser.parse_args(args=None if sys.argv[1:] else ['--help'])

repo = Repo('.')

# Default to master; with --all-branches, use every local branch.
branches = ['master']
if args.all_branches:
    branches = [x.name for x in repo.branches]

for branch in branches:
    for filepath in args.filepaths:
        try:
            # `git show <branch>:<path>` reads the file from git's object
            # store, so the working tree is never touched.
            content = repo.git.show(f'{branch}:{filepath}')
        except GitCommandError as e:
            # Skip branches that do not contain the file; re-raise anything else.
            if 'exists on disk, but not in' in e.stderr or 'does not exist in' in e.stderr:
                continue
            raise
        if args.jq:
            # Apply the jq filter, then colorize the resulting JSON.
            content = json.dumps(jq(args.jq).transform(text=content))
            content = highlight(content, JsonLexer(), TerminalFormatter()).strip()
        print(branch, filepath, end='')
        # Put long content on its own line, short content next to the path.
        prefix = '\n' if len(content) > 50 else ' '
        print(f'{prefix}{content}')

here are some example outputs:
