dvc.api: doesn't work with repo with SSH URL

Created on 11 Jan 2020 · 7 Comments · Source: iterative/dvc

$ dvc version
DVC version: 0.80.0
Python version: 3.7.6
Platform: Darwin-19.0.0-x86_64-i386-64bit
Binary: False
Package: brew

This script

import csv
import dvc.api

with dvc.api.open(
  "sea_ice.csv",
  repo="git@github.com:iterative/df_sea_ice_no_header.git"
) as fd:
  reader = csv.reader(fd)
  for row in reader:
    print(row[0])

Throws

Traceback (most recent call last):
  File "tests/open-test.py", line 6, in <module>
    repo="git@github.com:iterative/df_sea_ice_no_header.git"
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/api.py", line 78, in _open
    with _make_repo(repo, rev=rev) as _repo:
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/api.py", line 98, in _make_repo
    yield Repo(repo_url)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/repo/__init__.py", line 77, in __init__
    root_dir = self.find_root(root_dir)
  File "/Users/drbidibombom/DVC-repos/api-tests/.env/lib/python3.7/site-packages/dvc/repo/__init__.py", line 145, in find_root
    raise NotDvcRepoError(root)
dvc.exceptions.NotDvcRepoError: you are not inside of a dvc repository (checked up to mount point '/')

It looks like it's trying to use my cwd (.) as the repo, and I'm not working from a DVC repo.

bug c5-half-a-day p1-important

All 7 comments

P.S. If I try the HTTP URL, it throws a different error: it can't find the repo.

This is caused by urlparse not being able to understand the Git SSH URL: it reports an empty scheme, so _make_repo() treats the URL as a local path.

https://github.com/iterative/dvc/blob/8787ac03808fcf38dcac01f7a57f82f216b28e12/dvc/api.py#L64-L71
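
To make the parsing problem concrete, here is a minimal sketch using only the standard library. It passes an SCP-style SSH address like the one in the report to urlparse and compares it with an HTTPS URL; the HTTPS URL is only an illustrative counterpart, not something taken from the report.

```python
from urllib.parse import urlparse

# SCP-style SSH address of the kind used in the report.
ssh_url = "git@github.com:iterative/df_sea_ice_no_header.git"
# Illustrative HTTPS counterpart (an assumption, used only for comparison).
https_url = "https://github.com/iterative/df_sea_ice_no_header.git"

ssh_parts = urlparse(ssh_url)
https_parts = urlparse(https_url)

# "git@github.com" is not a valid scheme because it contains "@",
# so urlparse leaves .scheme empty and puts the whole string in .path.
print(repr(ssh_parts.scheme), repr(ssh_parts.path))

# A regular URL gets a scheme and a network location.
print(repr(https_parts.scheme), repr(https_parts.netloc))
```

Since the scheme comes back empty, the `urlparse(repo_url).scheme == ""` check in _make_repo() takes the local-repo branch for the SSH address.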

An ugly patch would be to do:

diff --git a/dvc/api.py b/dvc/api.py
index ffb044a0..ce9f3b44 100644
--- a/dvc/api.py
+++ b/dvc/api.py
@@ -87,7 +87,7 @@ def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):

 @contextmanager
 def _make_repo(repo_url, rev=None):
-    if not repo_url or urlparse(repo_url).scheme == "":
+    if not repo_url or (urlparse(repo_url).scheme == "" and not urlparse(repo_url).path.startswith('git@')):
         assert rev is None, "Custom revision is not supported for local repo"
         yield Repo(repo_url)
     else:

Need to research the correct way to parse those Git URLs.
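
One possible direction, sketched under assumptions rather than taken from DVC: treat a string as remote if urlparse finds a scheme, or if it matches the SCP-like form `[user@]host:path`, which Git only recognizes when there is no slash before the first colon. The helper name `looks_remote` and the regular expression are hypothetical and only illustrate the idea.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern for SCP-like Git addresses: [user@]host:path.
# No "/" may appear before the first colon, and the path must not start
# with "//" (that form would be a regular URL with a scheme).
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?[\w.-]+:(?!//)\S+$")


def looks_remote(repo_url):
    """Return True if repo_url appears to point at a remote Git repo.

    Illustrative helper only; it is not part of dvc.api.
    """
    if not repo_url:
        return False
    if urlparse(repo_url).scheme:
        return True
    return bool(_SCP_LIKE.match(repo_url))


for candidate in (
    "git@github.com:iterative/dataset-registry",
    "https://github.com/iterative/dvc",
    "/home/user/repo",
    ".",
):
    print(candidate, looks_remote(candidate))
```

This sketch keeps the existing scheme check, so it inherits its limits: a Windows path such as `C:\repo` would still be reported as having a scheme, and a local path containing a colon but no slash before it would be misread as SCP-like.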

https://github.com/coala/git-url-parse

looks good :ok_hand:


EDIT: It doesn't work that well; it has some issues parsing local URLs and HTTPS URLs.

We might want to always use external_repo() instead.

This issue also affects public GitHub repos accessed through an SSH URL, and not just dvc.api.open: all of the public APIs that use _make_repo() (open, read, get_url, summon, etc.) are affected (ref: https://github.com/iterative/dvc/issues/3111#issuecomment-573362145).

So, the following throws NotDvcRepoError as well:

import dvc.api

url = dvc.api.get_url("get-started/data.xml", repo="git@github.com:iterative/dataset-registry")

@jorgeorpinel, I renamed the title, to reflect that it affects all public apis for all SSH urls. :slightly_smiling_face:
