Dvc: import: cryptic, confusing error output

Created on 13 Oct 2019 · 3 Comments · Source: iterative/dvc

Splitting from other issues. See https://github.com/iterative/dvc/issues/2599#issuecomment-541292359 and https://github.com/iterative/dvc/issues/2600#issuecomment-541367615

DVC version: 0.62.1
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Filesystem type (workspace): ('apfs', '/dev/disk1s1')

A few problems in error output from dvc import:
1.

$ dvc import \
           git@github.com:iterative/dataset-registry.git \
           invalid/path
Importing 'invalid/path (git@github.com:iterative/dataset-registry.git)' -> 'path'
ERROR: failed to import 'invalid/path' from 'git@github.com:iterative/dataset-registry.git'. - unable to find DVC-file with output '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpxczzy97tdvc-repo/invalid/path'

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
  • [x] Here the output doesn't exist, so the error output is actually good, except for the super long, random-looking path ..._c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpxczzy97tdvc.... Is that necessary? I don't think it will mean anything to people.

2.
$ dvc import ../data-reg-test data
Importing 'data (../data-reg-test)' -> 'data'
Missing cache for directory '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpons_rplmdvc-repo/data'. Cache for files inside will be lost. Would you like to continue? Use '-f' to force. [y/n] n
ERROR: failed to import 'data' from '../data-reg-test'. - unable to fully collect used cache without cache for directory '../../../../private/var/folders/_c/3mt_xn_d4xl2ddsx2m98h_r40000gn/T/tmpons_rplmdvc-repo/data'
...
  • [ ] Here we have the same random-looking path issue, but there's also a prompt that seems to refer to that path, and it seems to me it's impossible to know what the implications of answering y vs. n are.
  • [ ] I went for n above (the safe approach), and another hard-to-understand ERROR message comes out: "unable to fully collect used cache without cache for directory"??? Perhaps just say "Import cancelled by user." as a regular INFO log?

3.
$ dvc import ../data-reg-test data
...
Cache for files inside will be lost. Would you like to continue? Use '-f' to force. [y/n] y
ERROR: failed to import 'data' from '../data-reg-test'. - config file error: no remote specified. Setup default remote with
    dvc config core.remote <name>
or use:
    dvc pull -r <name>


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

This is the same command as in the previous example, but I went for y. This error message is better because it tells me what the problem is (the DVC repo in ../data-reg-test doesn't have a remote – refer to #2599 BTW).

  • [ ] The only problem here is that pressing Enter after y causes 2 new lines before "Having any troubles? ...". Any way to avoid this?
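
To make the "Import cancelled by user." suggestion from example 2 concrete, here is a minimal Python sketch. It does not use DVC's actual internals: `ImportCancelled`, `confirm`, `run_import` and the logger name are hypothetical names made up for illustration, and returning exit code 0 on cancellation is an assumption that could reasonably go the other way. The idea is only that declining the prompt takes a separate path from a real failure, so it gets logged at INFO level instead of surfacing as an ERROR.

```python
import logging

logger = logging.getLogger("import_sketch")  # hypothetical logger name


class ImportCancelled(Exception):
    """Hypothetical exception: the user answered 'n' at the prompt."""


def confirm(question, ask=input):
    """Return True only if the user explicitly answers yes."""
    answer = ask(question + " [y/n] ").strip().lower()
    return answer in ("y", "yes")


def run_import(do_import, ask=input):
    """Run `do_import` after confirmation and return a process exit code.

    Declining the prompt is reported as an INFO message, not an ERROR.
    """
    question = (
        "Cache for files inside will be lost. "
        "Would you like to continue? Use '-f' to force."
    )
    try:
        if not confirm(question, ask):
            raise ImportCancelled()
        do_import()
    except ImportCancelled:
        logger.info("Import cancelled by user.")
        return 0  # assumption: cancelling is not treated as a failure
    except Exception as exc:
        # Real failures still go through the error path.
        logger.error("failed to import: %s", exc)
        return 1
    return 0
```

The `ask` parameter only exists so the prompt can be replaced in tests, for example `run_import(do_import, ask=lambda q: "n")`.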
Labels: enhancement, p1-important, ui

All 3 comments

except for the super long, random-looking path... I don't think it will mean anything to people.

This seems like it will be fixed with #2777.
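
For illustration, here is a rough Python sketch of the kind of path shortening being asked for. It is not the change made in #2777: `display_path` and its parameters are hypothetical, and it assumes the location of the temporary clone is known at the point where the message is built. If the path lies inside the clone, it is shown relative to the clone root; otherwise it is left untouched.

```python
import os


def display_path(path, clone_root):
    """Return `path` relative to `clone_root` when it lies inside it.

    Hypothetical helper: `clone_root` stands for the temporary directory
    the external repo was cloned into. Paths outside it are returned as-is.
    """
    # Resolve symlinks (e.g. /var -> /private/var on macOS) on both sides
    # so that the prefix comparison is made on comparable paths.
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(clone_root)
    try:
        inside = os.path.commonpath([real_path, real_root]) == real_root
    except ValueError:
        # Raised e.g. for paths on different drives on Windows.
        inside = False
    if inside:
        return os.path.relpath(real_path, real_root)
    return path


def missing_output_message(path, clone_root):
    """Build the error text with the shortened path."""
    shown = display_path(path, clone_root)
    return "unable to find DVC-file with output '{}'".format(shown)
```

With a helper like this, the message from example 1 would name 'invalid/path' instead of the long temporary directory, which is the part a user can actually act on.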

@jorgeorpinel OK, I am trying to clean up my assigned log and got back to this issue.
I prepared a reproduction script:

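#!/bin/bash

# Start from a clean slate.
rm -rf repo data-reg storage
mkdir repo data-reg

main=$(pwd)

# Source repo: track data/ with DVC, but never push it to a remote.
pushd data-reg
git init --quiet
dvc init -q

set -ex

mkdir data
echo data >> data/1

dvc add data
# dvc remote add -d str "$main/storage"
# dvc push
git add -A
git commit -am "init"

# Remove the cache, so the source repo has neither a remote nor a cache.
rm -rf .dvc/cache

popd

# Target repo: try to import the uncached data.
pushd repo

git init --quiet
dvc init -q

dvc import ../data-reg data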

I believe this resembles your original issue (points 2 and 3): I try to import data from data-reg while there is no remote and the cache has been removed.

Here are the results:

  1. When going for n:
    [asciicast]

Warning: I believe this prompt is unnecessary; also, those paths are uninformative.
Error: better than it used to be, but we need "without cache for directory: 'data'" and not this full tmp path.

  2. When going for y:
    [asciicast]

1st Warning: same as before
2nd Warning: not necessary
3rd Warning: might be useful if it was more like "Cannot find cache for 'data'"
Error: plainly wrong, as it is defined as an output but it's not cached.

I think we should investigate why the prompt appears at all, and proceed from there.

Thanks for the update @pared, sounds like a good plan.
