Dvc: ssh: error while import-url with special character

Created on 28 Sep 2020 · 7 Comments · Source: iterative/dvc

Bug Report

Problem while downloading files with special characters.

Please provide information about your setup

DVC version: 1.7.9 
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-48-generic-x86_64-with-glibc2.10
Supports: gdrive, gs, http, https, ssh
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Workspace directory: ext4 on /dev/sda5
Repo: dvc, git

Error log:

2020-09-28 15:53:35,511 ERROR: failed to import ssh://[email protected]/home/data/cana/ds30. You could also try downloading it manually, and adding it with `dvc add`. - ssh command 'md5sum /home/data/cana/ds30/cana-mucuna/class35_e2545053-f2c5-4108-9042-67244a94e267_p_['cana']_o_['cana', 'mucuna'].jpg' finished with non-zero return code 1': md5sum: '/home/data/cana/ds30/cana-mucuna/class35_e2545053-f2c5-4108-9042-67244a94e267_p_[cana]_o_[cana,': No such file or directory
md5sum: mucuna].jpg: No such file or directory

------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/command/imp_url.py", line 14, in run
    self.repo.imp_url(
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/repo/__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/repo/imp_url.py", line 54, in imp_url
    stage.run()
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/funcy/decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/stage/decorators.py", line 36, in rwlocked
    return call()
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/funcy/decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/stage/__init__.py", line 429, in run
    sync_import(self, dry, force)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/stage/imports.py", line 29, in sync_import
    stage.save_deps()
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/stage/__init__.py", line 392, in save_deps
    dep.save()
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/output/base.py", line 268, in save
    self.hash_info = self.get_hash()
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/output/base.py", line 178, in get_hash
    return self.tree.get_hash(self.path_info)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/base.py", line 263, in get_hash
    hash_info = self.get_dir_hash(path_info, **kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/base.py", line 330, in get_dir_hash
    dir_info = self._collect_dir(path_info, **kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/base.py", line 310, in _collect_dir
    new_hashes = self._calculate_hashes(not_in_state)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/base.py", line 296, in _calculate_hashes
    return dict(zip(file_infos, hashes))
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/base.py", line 295, in <genexpr>
    hashes = (hi.value for hi in executor.map(worker, file_infos))
  File "/home/gabriel/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/home/gabriel/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/gabriel/anaconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/home/gabriel/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/progress.py", line 126, in wrapped
    res = fn(*args, **kwargs)
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/ssh/__init__.py", line 242, in get_file_hash
    return HashInfo(self.PARAM_CHECKSUM, ssh.md5(path_info.path))
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 295, in md5
    md5 = self.execute("md5sum " + path).split()[0]
  File "/home/gabriel/anaconda3/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 276, in execute
    raise RemoteCmdError("ssh", cmd, ret, err)
dvc.tree.base.RemoteCmdError: ssh command 'md5sum /home/data/cana/ds30/cana-mucuna/class35_e2545053-f2c5-4108-9042-67244a94e267_p_['cana']_o_['cana', 'mucuna'].jpg' finished with non-zero return code 1': md5sum: '/home/data/cana/ds30/cana-mucuna/class35_e2545053-f2c5-4108-9042-67244a94e267_p_[cana]_o_[cana,': No such file or directory
md5sum: mucuna].jpg: No such file or directory

------------------------------------------------------------
2020-09-28 15:53:35,520 DEBUG: Analytics is enabled.
2020-09-28 15:53:35,605 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp4x4p60hi']'
2020-09-28 15:53:35,608 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp4x4p60hi']'
Labels: bug, help wanted, p2-medium, triage

All 7 comments

Probable cause: the path to the file is /home/data/cana/ds30/cana-mucuna/class35_e2545053-f2c5-4108-9042-67244a94e267_p_['cana']_o_['cana', 'mucuna'].jpg, which includes a combination of special characters ([, ', ], a comma, and a space). The file system supports this via the terminal as well as ssh and scp, but paramiko doesn't. See https://github.com/paramiko/paramiko/issues/583
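
For illustration, the two "No such file or directory" messages in the error log are consistent with a shell splitting the unquoted command at the space and removing the single quotes. A minimal sketch, assuming Python's `shlex.split` as a stand-in for the remote shell's tokenization (it follows POSIX quoting rules but does not perform glob expansion):

```python
import shlex

# Path taken from the error log in this report.
path = (
    "/home/data/cana/ds30/cana-mucuna/"
    "class35_e2545053-f2c5-4108-9042-67244a94e267"
    "_p_['cana']_o_['cana', 'mucuna'].jpg"
)

# The command built by plain string concatenation,
# as in the traceback: self.execute("md5sum " + path)
cmd = "md5sum " + path

# Approximate how a POSIX shell would split the command into words.
tokens = shlex.split(cmd)
for token in tokens:
    print(token)
```

The single path comes out as two separate arguments, the first ending in `_p_[cana]_o_[cana,` and the second being `mucuna].jpg`, which matches what md5sum complained about.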

@jorgeorpinel it looks like paramiko/paramiko#583 is about exec_command. Is it still relevant in this case?

(I'm asking mostly to see whether we need to create a ticket on the paramiko side in advance, if we are sure this is paramiko's issue; it takes time to resolve them.)

It's a good question. I'm not sure if that issue is exactly appropriate. I guess we could open a new one and see what the maintainers say. I commented on that one for now: https://github.com/paramiko/paramiko/issues/583#issuecomment-700967071

Yep, thanks @jorgeorpinel. It does look related indeed. I'm still curious, though, what the right solution is: expect Paramiko to escape things, or expect Paramiko to be a thin layer that doesn't alter what you pass into it, making it our responsibility to escape the command, path, etc.

@jorgeorpinel That one is not related to this issue. We simply didn't escape the command ourselves; paramiko shouldn't take care of that for us, as the commands are crafted by us. This will be fixed by https://github.com/iterative/dvc/pull/4767
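
As a rough illustration of what "escaping the command ourselves" could look like (a sketch only, not necessarily what the PR above does): Python's standard `shlex.quote` wraps a string so that a POSIX shell treats it as a single argument. The helper name `build_md5_command` is hypothetical and not part of DVC.

```python
import shlex


def build_md5_command(path: str) -> str:
    """Hypothetical helper: quote the path so the shell sees one argument."""
    return "md5sum " + shlex.quote(path)


# Path taken from the error log in this report.
path = (
    "/home/data/cana/ds30/cana-mucuna/"
    "class35_e2545053-f2c5-4108-9042-67244a94e267"
    "_p_['cana']_o_['cana', 'mucuna'].jpg"
)

cmd = build_md5_command(path)
print(cmd)

# Round trip: POSIX-style splitting now yields the original path intact.
parsed = shlex.split(cmd)
print(parsed == ["md5sum", path])
```

With the path quoted, the brackets, single quotes, comma, and space all stay inside one argument, so md5sum would receive the file name unchanged.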

> paramiko shouldn't take care of that for us, as commands are crafted by us

But sometimes there are limitations in our dependencies that are too difficult to address, e.g. https://github.com/iterative/dvc/issues/4392#issuecomment-674448191, which is fine, I think.

Anyway, glad this is resolved ✌️
