Dvc: ssh: support scp-like relpaths in urls

Created on 4 Jul 2020 · 7 comments · Source: iterative/dvc

Bug Report

Machine 1 (macos):

dvc version

DVC version: 1.1.7
Python version: 3.7.6
Platform: Darwin-19.5.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, ssh
Cache: reflink - supported, hardlink - supported, symlink - supported
Repo: dvc, git

Machine 2 (ubuntu):

dvc version

DVC version: 1.1.2
Python version: 3.8.2
Platform: Linux-5.4.0-39-generic-x86_64-with-glibc2.29
Binary: False
Package: pip
Supported remotes: gdrive, http, https, ssh
Filesystem type (workspace): ('ext4', '/dev/nvme0n1p2')

Same problem on both machines:

dvc push -v

failed to upload '.dvc/cache/ff/a857a0b16c937f60da296bcd3a337e' to 'ssh://user@server/dvc/datasets/ff/a857a0b16c937f60da296bcd3a337e' - unable to create remote directory '/dvc': [Errno 13] Permission denied
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 113, in makedirs
    self.sftp.mkdir(path)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 460, in mkdir
    self._request(CMD_MKDIR, path, attr)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 813, in _request
    return self._read_response(num)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 865, in _read_response
    self._convert_status(msg)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 896, in _convert_status
    raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Permission denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/local.py", line 328, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/base.py", line 431, in upload
    no_progress_bar=no_progress_bar,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/__init__.py", line 268, in _upload
    no_progress_bar=no_progress_bar,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 214, in upload
    self.makedirs(posixpath.dirname(dest))
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 109, in makedirs
    self.makedirs(head)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 109, in makedirs
    self.makedirs(head)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 120, in makedirs
    ) from exc
dvc.exceptions.DvcException: unable to create remote directory '/dvc'
------------------------------------------------------------

What I found and how I fixed it:

The problem is that the path parameter of makedirs in ssh/connection.py arrives as /dvc, starting with /. That path is passed to paramiko as-is, so it tries to create the directory at the filesystem root, where my user has no write permission, and a permission-denied error appears. You get the same behavior if you try:

$ mkdir /dvc
mkdir: cannot create directory ‘/dvc’: Permission denied

So when I do this:

remote/ssh/connection.py

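def makedirs(self, path):
    # FIX: strip the leading "/" so the SFTP server resolves the path
    # relative to the login (home) directory instead of the filesystem root
    if path.startswith('/'):
        path = path[1:]

    # Single stat call will say whether this is a dir, a file or a link
    st_mode = self.st_mode(path)
    ......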

That solves the makedirs problem!

Then a new one appears:

2020-07-04 13:57:57,288 ERROR: failed to upload '.dvc/cache/1d/7f483312a6e63bf4ebb06cff427a9c' to 'ssh://user@server/dvc/datasets/1d/7f483312a6e63bf4ebb06cff427a9c' - [Errno 2] No such file
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/local.py", line 328, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/base.py", line 431, in upload
    no_progress_bar=no_progress_bar,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/__init__.py", line 268, in _upload
    no_progress_bar=no_progress_bar,
  File "/usr/local/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 225, in upload
    self.sftp.put(src, tmp_file, callback=pbar.update_to)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 759, in put
    return self.putfo(fl, remotepath, file_size, callback, confirm)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 714, in putfo
    with self.file(remotepath, "wb") as fr:
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 372, in open
    t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 813, in _request
    return self._read_response(num)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 865, in _read_response
    self._convert_status(msg)
  File "/usr/local/lib/python3.7/site-packages/paramiko/sftp_client.py", line 894, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file
------------------------------------------------------------

The problem is the same: the remote path starts with / and paramiko can't open it for writing. So the fix is the same (but in two places):

remote/ssh/connection.py

def upload(self, src, dest, no_progress_bar=False, progress_title=None):
    self.makedirs(posixpath.dirname(dest))
    tmp_file = tmp_fname(dest)

    # FIX
    if tmp_file.startswith('/'):
        tmp_file = tmp_file[1:]
    # ------

    if not progress_title:
        progress_title = posixpath.basename(dest)

    with Tqdm(
        desc=progress_title, disable=no_progress_bar, bytes=True
    ) as pbar:
        self.sftp.put(src, tmp_file, callback=pbar.update_to)

    # FIX
    if dest.startswith('/'):
        dest = dest[1:]
    # ------

    self.sftp.rename(tmp_file, dest)

And now push works well.

P.S.
I think this fix is too straightforward to be high quality, so I can make a pull request either after an OK from you or after comments on how to make it better.

Labels: feature request, help wanted, p3-nice-to-have

All 7 comments

Hi @Kuluum!

DVC treats the path in the URL as absolute, which is why it tries to create /dvc. What you want is relative-path behaviour, where we pass a relative path to SFTP so that the server resolves it against the home directory of the user you log in as.
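
A minimal sketch of that difference, assuming the SFTP server starts each session in the login user's home directory (typical for OpenSSH, but not guaranteed, e.g. on chrooted servers). The helper resolve_on_server is hypothetical: it only mimics server-side resolution with posixpath and does not talk to a server.

```python
import posixpath


def resolve_on_server(path: str, home: str) -> str:
    """Hypothetical model of how an SFTP server resolves a requested path.

    Absolute paths (leading "/") are taken as-is; relative paths are
    joined onto the session's starting directory, assumed here to be the
    login user's home directory.
    """
    # posixpath.join discards `home` entirely when `path` is absolute.
    return posixpath.normpath(posixpath.join(home, path))


print(resolve_on_server("/dvc/datasets", "/home/user"))  # /dvc/datasets
print(resolve_on_server("dvc/datasets", "/home/user"))   # /home/user/dvc/datasets
```

The first call is what happens today with ssh://user@server/dvc/datasets: the server is asked for /dvc, which an unprivileged user cannot create.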

IIRC, utilities like scp accept both path and /path after the : in their user@host:path target. But with that syntax you can't specify the port in the target and have to pass it with -P instead. If you try that syntax in DVC right now, you'll get an error about us not being able to cast a string to an int, because the text after : is parsed as the port.
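
To illustrate where such a cast-to-int error can come from, here is a small example with Python's urllib.parse. That DVC parses remote URLs through urllib.parse is an assumption of this sketch; the point is only that in an ssh:// URL, whatever follows host: is read as a port number.

```python
from urllib.parse import urlparse

# A regular ssh:// URL: everything after the host is an absolute path.
ok = urlparse("ssh://user@server/dvc/datasets")
print(ok.hostname, ok.path)  # server /dvc/datasets

# An scp-like "host:path" squeezed into an ssh:// URL: the text after ":"
# ends up in the netloc, so urllib treats "dvc" as the port.
bad = urlparse("ssh://user@server:dvc/datasets")
print(bad.netloc, bad.path)  # user@server:dvc /datasets

port_error = None
try:
    bad.port  # "dvc" is not a valid port number
except ValueError as exc:
    port_error = exc
print("port error:", port_error)
```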

The current workaround is to use an absolute path like /home/user/dvc instead of /dvc in your URL. Would that work for you for now?
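
A sketch of applying that workaround programmatically. absolutize_ssh_url and remote_home are hypothetical helpers, not part of DVC. remote_home relies on paramiko's SFTPClient.normalize("."), which returns the server's current directory; for a fresh session that is usually, but not necessarily, the login directory.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def absolutize_ssh_url(url: str, home: str) -> str:
    """Re-anchor the path of an ssh:// URL at the user's home directory.

    Hypothetical helper: assumes the path in `url` was written with the
    expectation that it is home-relative (ssh://user@server/dvc meaning
    ~/dvc), so it should not be applied to URLs that are already absolute.
    """
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    new_path = posixpath.join(home, rel) if rel else home
    return urlunsplit(parts._replace(path=new_path))


def remote_home(host: str, username: str) -> str:
    """Ask the server for its current directory over SFTP (needs paramiko)."""
    import paramiko  # lazy import: absolutize_ssh_url works without it

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # unknown hosts are rejected by default
    client.connect(host, username=username)
    try:
        # "." is resolved by the server; for a new session this is usually
        # the login (home) directory, but chroots or configs may differ.
        return client.open_sftp().normalize(".")
    finally:
        client.close()


print(absolutize_ssh_url("ssh://user@server/dvc/datasets", "/home/user"))
# ssh://user@server/home/user/dvc/datasets
```

The resulting URL is the absolute form that DVC currently expects in the remote configuration.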

As for a proper solution, we clearly need some in-URL way to distinguish an absolute path from a relative one and treat each accordingly. We could make our parsing smarter so that it understands :my/path (and :22:my/path).
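
One possible shape for such parsing, as a sketch only: parse_scp_like and ScpTarget are hypothetical names, and the [user@]host:[port:]path grammar is the one proposed above, not an existing DVC format. The leading / is kept in the path so absolute and relative paths stay distinguishable and could be passed to SFTP unchanged.

```python
import re
from typing import NamedTuple, Optional


class ScpTarget(NamedTuple):
    user: Optional[str]
    host: str
    port: Optional[int]
    path: str

    @property
    def is_relative(self) -> bool:
        # No leading "/" means "relative to the login (home) directory".
        return not self.path.startswith("/")


# [user@]host:[port:]path, e.g. "user@server:dvc" or "user@server:22:/abs"
_SCP_RE = re.compile(
    r"^(?:(?P<user>[^@/:]+)@)?"  # optional "user@"
    r"(?P<host>[^@/:]+)"         # host name (IPv6 literals not handled)
    r":(?:(?P<port>\d+):)?"      # required ":" plus optional "22:" port
    r"(?P<path>.*)$"             # the rest is the path, kept verbatim
)


def parse_scp_like(target: str) -> ScpTarget:
    if "://" in target:
        raise ValueError(f"expected scp-like syntax, got a URL: {target!r}")
    m = _SCP_RE.match(target)
    if m is None:
        raise ValueError(f"not an scp-like target: {target!r}")
    port = int(m.group("port")) if m.group("port") else None
    return ScpTarget(m.group("user"), m.group("host"), port, m.group("path"))


for t in ("user@server:dvc/datasets", "user@server:22:/home/user/dvc"):
    print(parse_scp_like(t))
```

This grammar inherits an ambiguity: a relative path whose first component is all digits (say server:2020:data) is read as port 2020, so a real implementation would need to document or resolve that case.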

@efiop Thanks! With the absolute path, everything works. I also see that using an absolute path is described in the documentation, but I hadn't noticed it. For some reason, I used to expect the path '/dvc/datasets' to be relative to the ssh connect folder. My user has no write permission on the root folder, and that is the real reason it can't create these folders.

I used to expect the path '/dvc/datasets' to be relative to the ssh connect folder

Do you remember any examples like that? I'm having trouble recalling anything like it, but maybe you previously used SFTP servers that were rooted in the user's home directory? Usually SFTP servers are rooted at the real filesystem root /, but sometimes they are configured to use a different root directory; I've seen that happen (we even have some issues where people were confused by it). So maybe you could consider configuring your SFTP server to behave like that too? Though in my opinion that would be more confusing than just using absolute paths.

So you are using an absolute path (e.g. /home/user/dvc) as a workaround for now, right? Just double-checking that you are not blocked by this anymore :slightly_smiling_face:

Using the abs path as a workaround works. I'm not blocked anymore. :+1:

As for SFTP and the root dir... The server admins created a user for me on a shared server where some other colleagues have users too, so I only have permissions in my user's home folder :relaxed: (I'm not very good at server things, so I don't know the details of SFTP and the like.)

For the record: it seems we have a very similar problem with HDFS, where we always pass /path to pyarrow, even though it also supports a relative path, which it resolves against what seems to be /user/efiop. We'll need to take a closer look at that as well.

I got confused by this too and took a while to realize what was happening. I think we should at least document this properly /cc @jorgeorpinel

I'm not 100% sure I understood the issue, the workaround, and what's missing from the docs, but here's a small PR for you to check: https://github.com/iterative/dvc.org/pull/1649

Related issues

ghost · 3 comments

anotherbugmaster · 3 comments

siddygups · 3 comments

mdscruggs · 3 comments

shcheklein · 3 comments