dvc fails to list contents of ssh remote.
$ dvc list ssh://ip_address/data/dvc_test/
ERROR: failed to list 'ssh://ip_address/data/dvc_test/' - Failed to clone repo 'ssh://ip_address/data/dvc_test/' to '/tmp/tmpa0q8bkkbdvc-clone': Cmd('git') failed due to: exit code(128)
cmdline: git clone --no-single-branch --progress -v ssh://ip_address/data/dvc_test/ /tmp/tmpa0q8bkkbdvc-clone
I can see why it might fail: dvc list appears to expect the remote to be a git repository, which I've confirmed it is not. I also confirmed the same behaviour when creating a local filesystem remote.
The remote was set up on the client. ssh is using key-based authentication, and the user is defined in ~/.ssh/config for the ssh host.
$ dvc remote add -d storage ssh://ip_address/data/dvc_test
$ dvc remote list
storage ssh://ip_address/data/dvc_test
The remote works fine for push and pull operations.
Output of dvc version:
$ dvc version
DVC version: 1.8.1 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.8.4-200.fc32.x86_64-x86_64-with-glibc2.2.5
Supports: http, https, ssh
Cache types: hardlink, symlink
Repo: dvc, git
Additional Information (if any):
$ dvc list -v ssh://ip_address/data/dvc_test/
2020-10-08 12:06:46,373 DEBUG: Creating external repo ssh://ip_address/data/dvc_test/@None
2020-10-08 12:06:46,373 DEBUG: erepo: git clone 'ssh://ip_address/data/dvc_test/' to a temporary dir
2020-10-08 12:06:48,178 ERROR: failed to list 'ssh://ip_address/data/dvc_test/' - Failed to clone repo 'ssh://ip_address/data/dvc_test/' to '/tmp/tmpnkq00fn3dvc-clone': Cmd('git') failed due to: exit code(128)
cmdline: git clone --no-single-branch --progress -v ssh://ip_address/data/dvc_test/ /tmp/tmpnkq00fn3dvc-clone
------------------------------------------------------------
Traceback (most recent call last):
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/scm/git.py", line 127, in clone
tmp_repo = clone_from()
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/git/repo/base.py", line 1019, in clone_from
return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/git/repo/base.py", line 956, in _clone
handle_process_output(proc, None, progress.new_message_handler(), finalize_process, decode_streams=False)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/git/cmd.py", line 115, in handle_process_output
return finalizer(process)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/git/util.py", line 328, in finalize_process
proc.wait(**kwargs)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/git/cmd.py", line 408, in wait
raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git clone --no-single-branch --progress -v ssh://ip_address/data/dvc_test/ /tmp/tmpnkq00fn3dvc-clone
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/command/ls/__init__.py", line 30, in run
entries = Repo.ls(
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/repo/ls.py", line 36, in ls
with external_repo(url, rev, fetch=False, stream=True) as repo:
File "/usr/lib64/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/external_repo.py", line 41, in external_repo
path = _cached_clone(url, rev, for_write=for_write)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/external_repo.py", line 354, in _cached_clone
clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/funcy/flow.py", line 244, in wrap_with
return call()
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/external_repo.py", line 414, in _clone_default_branch
git = Git.clone(url, clone_path)
File "/home/user/.local/pipx/venvs/dvc/lib64/python3.8/site-packages/dvc/scm/git.py", line 132, in clone
raise CloneError(url, to_path) from exc
dvc.scm.base.CloneError: Failed to clone repo 'ssh://ip_address/data/dvc_test/' to '/tmp/tmpnkq00fn3dvc-clone'
------------------------------------------------------------
2020-10-08 12:06:48,209 DEBUG: Analytics is disabled.
This is expected behavior: dvc list only works for listing the contents of an actual DVC repository (which is not the same thing as a DVC remote).
Can you share what your intended use case is for this? If you want to see what files need to be pushed or pulled from a remote, you can use dvc status -c.
Hmmm, maybe a terminology thing that's not clear. I would have thought that a "remote" refers to a "repository".
The use case is using a shared DVC remote to store versioned artifacts. So git repo A references only some of those artifacts, git repo B references a subset of those and maybe some others. A user comes along and wants to start a new project, asking what's already in the DVC remote that they might be able to use and add to the new git repo C they've just created.
A "DVC remote" is more like a remote mirror of what is stored in .dvc/cache - so content addressable storage (identified by MD5 hashes). The "repository" itself is the main project repo (which is usually also a git repo) containing .dvc files.
So listing the contents of a remote really only tells you "the file with the binary contents that is hashed to abc1234 is present in this remote".
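To make the point about content-addressable storage concrete, here is a minimal Python sketch. It is not DVC's actual code: the helper names md5_of_file and object_path are hypothetical, and the layout `<root>/<first 2 hex chars>/<remaining hex chars>` is an assumption about how DVC 1.x arranges objects in a cache or remote. It also ignores any special handling DVC applies (for example to directories).

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def object_path(storage_root, md5_hex):
    """Build the storage location for a hash.

    Assumed layout (not taken from DVC's source):
    <root>/<first 2 hex chars>/<remaining hex chars>
    """
    return f"{storage_root.rstrip('/')}/{md5_hex[:2]}/{md5_hex[2:]}"


if __name__ == "__main__":
    remote = "ssh://ip_address/data/dvc_test"
    with tempfile.TemporaryDirectory() as workdir:
        # Two files with different names but identical contents.
        first = Path(workdir) / "train.csv"
        second = Path(workdir) / "copy_of_train.csv"
        first.write_bytes(b"id,label\n1,cat\n")
        second.write_bytes(b"id,label\n1,cat\n")
        for path in (first, second):
            # Both names map to the same object path; the name itself
            # never appears in the storage location.
            print(path.name, "->", object_path(remote, md5_of_file(path)))
```

Under these assumptions, both files resolve to the same location in the remote, and neither file name is recoverable from it. The names live only in the .dvc files tracked by each git repository, which is why listing a remote cannot answer "which artifacts are available".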
In your scenario, I think you may be looking for something like this:
dvc push everything into your default remote location
dvc import or dvc get to reuse whichever files they need from your original project.
For the dvc import / dvc get step, dvc list can be used with the path or URL to your original DVC/git repo (rather than the path/URL to a DVC remote) to see which files are available.
Closing, as dvc list is not really meant to be used on remotes.