Dvc: get: handle non-DVC repositories

Created on 8 Jan 2020 · 5 Comments · Source: iterative/dvc

Even though #2977 solved this for dvc import, it looks like it is still not supported for dvc get. It might be a simple bug, or the code path may be different. We need to fix this and add tests for it.

Version:

DVC version: 0.80.0+82cdce
Python version: 3.7.5
Platform: Darwin-18.2.0-x86_64-i386-64bit
Binary: False
Package: None
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1') 

Reproduce:

mkdir git-repo
cd git-repo
git init
touch test
git commit -a -m "add test"
cd ..
dvc get ./git-repo test

outputs:

ERROR: failed to get 'test' from './git-repo' - URL './git-repo' is not a dvc repository.

feature request p1-important

All 5 comments

Hello world! I am on this right now.

I need some help choosing an implementation.

The only way to download something seems to be in the dvc.remote.* namespace. The problem with the remotes is that they always require a Repo instance to be passed in to be initialised.

I figure there are a few solutions here, and I would greatly appreciate some input on which direction to take:

  • My preferred option: Remove the Repo dependency from the remotes (they don't seem to use it much, the base class uses repo.cache which could be a separate argument, and the local remote needs a tree argument). This enables you to create a remote and download from a remote source without having a repository, which is what we need for get here.
  • Refactor-heavier option which might be good for code organisation: Create a new namespace, dvc.download.*, by factoring out the download function of each remote. There would be a function in its __init__ that chooses which downloader to use in each situation. The remotes would access the download function directly, since they already know what kind of remote they are and therefore which download function they need.
  • Hacky option: Create a valid DVC repository in a temporary directory, initiate an import, and then copy the imported file over to the get target path.
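To make the first option concrete, here is a minimal sketch of a remote that receives its dependencies explicitly instead of a Repo instance. This is not DVC's actual API: the class name `LocalRemoteSketch` and the `tree_root` and `cache_dir` parameters are hypothetical, and the "tree" is reduced to a plain root directory for illustration.

```python
import os
import shutil


class LocalRemoteSketch:
    """Hypothetical remote that does not need a Repo to be constructed."""

    def __init__(self, tree_root, cache_dir=None):
        # Instead of taking a Repo and reading repo.cache / repo.tree from it,
        # accept only the pieces that are actually used.
        self.tree_root = tree_root
        self.cache_dir = cache_dir  # optional; unused in this sketch

    def download(self, rel_path, to_path):
        """Copy rel_path (relative to the tree root) to to_path."""
        src = os.path.join(self.tree_root, rel_path)
        if not os.path.isfile(src):
            raise FileNotFoundError(src)
        # Make sure the destination directory exists before copying.
        os.makedirs(os.path.dirname(os.path.abspath(to_path)), exist_ok=True)
        shutil.copyfile(src, to_path)
        return to_path
```

With something shaped like this, get could construct the remote directly, e.g. `LocalRemoteSketch("./git-repo").download("test", "test")`, without a DVC repository being present.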

I just realised that I'm probably trying to fix too much. A closer look at #2977 reveals that it only fixes local repositories, which is probably an easier problem to fix for get here.

Hi @fabiosantoscode !

> This enables you to create a remote and download from a remote source without having a repository, which is what we need for get here.

Could you elaborate on this, please? As far as I understand, in this ticket the actual file comes neither from the Repo nor from a remote. It comes from the cloned git repo.

Regarding the other options, it's the same: I'm a bit confused, probably because we are not on the same page about the terminology and/or the logic behind get/import.

Right. I probably jumped into looking at remotes because the get documentation states that it can download from any DVC-enabled git repository. My train of thought was to make it download from any git repository. Then I went off on a tangent because I treated dvc.remote.* as the same concept as git remotes, which are git repositories. On a closer look they seem to be single files.

Got it, will use the existing git facilities to retrieve the thing. They probably work for local and remote repositories alike.

Thanks for clearing this up!
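For reference, the "use git to retrieve the thing" approach can be sketched roughly as follows: clone the repository into a temporary directory, then copy the requested path out of the clone. This is only an illustration and not how DVC implements it. The names `git_clone` and `get_from_git` are hypothetical, and the sketch assumes the git CLI is on the PATH. The `clone` parameter exists only so the clone step can be swapped out, for example in tests.

```python
import os
import shutil
import subprocess
import tempfile


def git_clone(url, dest):
    # Plain clone via the git CLI; url may be a local path or a remote URL.
    subprocess.run(["git", "clone", "--quiet", url, dest], check=True)


def get_from_git(url, path, out=None, clone=git_clone):
    """Copy `path` from a (possibly non-DVC) git repository to `out`."""
    out = out or os.path.basename(path)
    with tempfile.TemporaryDirectory() as tmp:
        clone_dir = os.path.join(tmp, "repo")
        clone(url, clone_dir)
        src = os.path.join(clone_dir, path)
        if not os.path.exists(src):
            raise FileNotFoundError(f"'{path}' not found in '{url}'")
        # Copy the result out before the temporary clone is removed.
        if os.path.isdir(src):
            shutil.copytree(src, out)
        else:
            shutil.copyfile(src, out)
    return out
```

For the reproduction above, the call would look like `get_from_git("./git-repo", "test")`, provided the file has actually been committed to the repository.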
