dvc get unable to find DVC-file error

Created on 4 Nov 2019 · 10 Comments · Source: iterative/dvc

I get an "unable to find DVC-file" error when trying to download a file from a private repository.

This is the command:

dvc get https://github.com/private_repo/my_repo/ model.pkl --rev branchname

This is the error:

ERROR: failed to get 'model.pkl' from 'https://github.com/private_repo/my_repo/' - unable to find DVC-file with output 'model.pkl'

This is the content of the .dvc file with the model.pkl output:

cmd: python train.py
outs:
- path: resources/model.pkl
  metric: false
  cache: true
  persist: false
  md5: a933f14a1466d27382a8a265eacd3034
- path: resources/evaluation.json
  metric: true
  cache: false
  persist: false
  md5: 286ea699ee07c4de3a689d48fd5e677b
deps:
- path: resources/X_train.pkl
  md5: e8f9fecdab4411d6a5ad4b09cc6b7821
- path: resources/y_train.pkl
  md5: 107fb604950f2099f892dc3a5898f50b
- path: resources/X_test.pkl
  md5: 3d00a8e4aab1670398757e2c76850f8d
- path: resources/y_test.pkl
  md5: b249173c50595c1f9663f40a057309e8
md5: 91f945a0209e673487e61637089aaab4

The path of the above .dvc file in the repo is: my_repo/my_service/service_train.dvc

System info (dvc installed with conda):

DVC version: 0.66.3
Python version: 2.7.15
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False

Any ideas regarding the possible reasons behind the error would be highly appreciated.

research

All 10 comments

@ValdarT thanks for the report! Have you tried running dvc get repo resources/model.pkl instead?

Thanks for the prompt reply. I did try that, it gave the same result.

kk, could you please run it with -v? Could you also try the same without specifying --rev (if that file exists in the master branch, of course)?

@ValdarT I've tried to reproduce it with a simple scenario where I have two branches with an output and I use --rev to specify one of those branches. It worked well for me. It would be great to see more info (a debug log with -v, as I mentioned), or maybe a simple scenario we can use to reproduce this.

dvc get https://github.com/my_private_repo/my_repo/ model.pkl --rev branchname -v

DEBUG: Writing '/private/var/folders/g5/0pk5g7hj1dx3f_95b04x1tph0000gn/T/tmpG4QztZdvc-erepo/.dvc/config.local'.
DEBUG: Writing '/private/var/folders/g5/0pk5g7hj1dx3f_95b04x1tph0000gn/T/tmpG4QztZdvc-erepo/.dvc/config'.
DEBUG: Removing '.JSaFog4tzwoREktJXKwF6j'
ERROR: failed to get 'resources/model.pkl' from 'https://github.com/my_private_repo/my_repo/' - unable to find DVC-file with output 'model.pkl'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/username/anaconda3/envs/py2/lib/python2.7/site-packages/dvc/command/get.py", line 22, in run
    rev=self.args.rev,
  File "/Users/username/anaconda3/envs/py2/lib/python2.7/site-packages/dvc/repo/get.py", line 67, in get
    raise OutputNotFoundError(path)
OutputNotFoundError: unable to find DVC-file with output 'model.pkl'

The same happens when I try without specifying the branch.

Could it be related to the repository structure (i.e., to where .dvc files are located in the folders) or is it possible to somehow corrupt the state in a way that could result in something like this?

I'll try to find time tomorrow to come up with a way to reproduce this.

OK, I think I know the reason for this: you need to specify the full path of the file relative to the project root, that is, how it would look after you do pull/checkout. In this case:

my_service/resources/model.pkl

@ValdarT please try that and let me know if it works.
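
To make the path rule concrete, here is a minimal Python sketch of how the root-relative path follows from the DVC-file location given earlier in this thread. The helper name resolve_output is hypothetical and is not part of DVC; it only assumes that an output path is relative to the directory containing its DVC-file.

```python
import posixpath


def resolve_output(dvc_file_path, out_path):
    """Return an output path relative to the project root.

    Assumption: `out_path` is relative to the directory that contains
    the DVC-file, and `dvc_file_path` is relative to the project root.
    """
    stage_dir = posixpath.dirname(dvc_file_path)
    return posixpath.normpath(posixpath.join(stage_dir, out_path))


# The DVC-file and output from this report:
print(resolve_output("my_service/service_train.dvc", "resources/model.pkl"))
# prints: my_service/resources/model.pkl
```

The printed value is what would be passed to dvc get in place of model.pkl.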

I was able to reproduce a different issue with a file that is not cached though:

$ dvc import https://github.com/shcheklein/example-get-started.git auc.metric
Importing 'auc.metric (https://github.com/shcheklein/example-get-started.git)' -> 'auc.metric'
WARNING: Cache '0eaa29dc9b8c89bb9ba1348b3c3cc772' not found. File 'auc.metric' won't be created.
ERROR: failed to import 'auc.metric' from 'https://github.com/shcheklein/example-get-started.git'. - output 'auc.metric' does not exist

cc @efiop do you know if we have already implemented the logic to import/get non-cached files? Does it look like a bug, or am I missing something?

@shcheklein We haven't yet; only the API supports non-cached files right now. https://github.com/iterative/dvc/issues/2515

This solves it, indeed. I find it a bit surprising because in general dvc doesn't care about paths like this so changing this behaviour could be a possible UX improvement.

In any case, everything works. Thank you very much for helping me out here, @shcheklein!

@ValdarT yes, the reason DVC relies on the actual path rather than on DVC-file output values is that the output path in a DVC-file is relative, so two or more DVC-files can have the same output path value (while the files themselves are located in different subdirectories).
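
A small Python sketch of that ambiguity, under stated assumptions: the second DVC-file (other_service/train.dvc) is made up for illustration and does not come from this thread, and group_by_output_value is a hypothetical helper, not a DVC function.

```python
import posixpath
from collections import defaultdict

# (DVC-file path, output path as written inside that DVC-file).
# Only the first entry is from this thread; the second is hypothetical.
STAGES = [
    ("my_service/service_train.dvc", "resources/model.pkl"),
    ("other_service/train.dvc", "resources/model.pkl"),
]


def group_by_output_value(stages):
    """Map each raw output value to every root-relative path it may refer to."""
    groups = defaultdict(list)
    for dvc_file, out in stages:
        stage_dir = posixpath.dirname(dvc_file)
        groups[out].append(posixpath.normpath(posixpath.join(stage_dir, out)))
    return dict(groups)


groups = group_by_output_value(STAGES)
print(groups["resources/model.pkl"])
# prints: ['my_service/resources/model.pkl', 'other_service/resources/model.pkl']
```

The same output value maps to two different files, so only the root-relative path identifies one of them.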

It might be a good addition, though, to show a hint when such a situation is detected, i.e. when there are DVC-file(s) with an output path value that matches the one provided on the dvc get command line. Feel free to open a ticket for that; it might be a good first one!
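
One possible shape for such a hint, as a rough Python sketch only. The function suggest_paths and the hint wording are hypothetical; this is not how DVC implements or plans to implement it. It assumes the list of (DVC-file path, output path) pairs has already been collected from the repository.

```python
import posixpath


def suggest_paths(requested, stages):
    """Return root-relative output paths whose trailing components match `requested`.

    Returns an empty list when `requested` already matches an output exactly,
    since no hint is needed in that case.
    """
    requested = posixpath.normpath(requested)
    matches = []
    for dvc_file, out in stages:
        stage_dir = posixpath.dirname(dvc_file)
        full = posixpath.normpath(posixpath.join(stage_dir, out))
        if full == requested:
            return []
        if full.endswith("/" + requested):
            matches.append(full)
    return matches


# The DVC-file and output from this report:
stages = [("my_service/service_train.dvc", "resources/model.pkl")]
for candidate in suggest_paths("model.pkl", stages):
    print("hint: did you mean '%s'?" % candidate)
# prints: hint: did you mean 'my_service/resources/model.pkl'?
```

Matching on whole trailing path components (note the "/" prefix) avoids suggesting files whose names merely end with the same characters.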
