Dvc: Proxy Command not working

Created on 18 May 2020 · 5 Comments · Source: iterative/dvc

Hello!

I have a decentralized setup with a shared master dvc cache on a private server, which sits behind a jump server (proxy).

My guess is that dvc can be configured in a similar way to rsync. As far as I know (please, correct me if I'm wrong), there are several ways to set this up:

  • ~/.ssh/config with an entry for the host (ProxyJump on modern systems, which paramiko does not support, or ProxyCommand otherwise)
  • port forwarding (less than ideal)
  • rsync -e option (obviously, not available in dvc)

I am able to get rsync working with all three options. But I can only get dvc to work when doing port forwarding.

I think the problem comes from paramiko (see paramiko/issues/512), but I report it here just in case you had other possible solutions or want to point out anything.

Just for completeness, here is the ssh config (replacing *.server with the real servers):

Host backup.server
    ProxyCommand ssh -q -W %h:%p [email protected]

And the local dvc config:

['remote "decentralized_dvc"']
    ask_password = true
    url = ssh://[email protected]:/home/gblanco/dvc
[core]
    remote = decentralized_dvc

Also, a minimum (not) working example with paramiko:

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# Tunnel the connection through the jump host, mirroring the ssh config above.
sock = paramiko.ProxyCommand('ssh -q -W %h:%p [email protected]')
ssh.connect('backup.server', port=22, username='gblanco', banner_timeout=100, sock=sock)
stdin, stdout, stderr = ssh.exec_command("ls .")
ssh.close()
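
One thing that may be worth checking in the example above: as far as I know, paramiko.ProxyCommand runs the string it is given as-is, and the %h/%p tokens are only expanded when the command comes from paramiko.SSHConfig.lookup(). If that is the case here, ssh would receive a literal "%h:%p". Below is a minimal sketch of expanding the tokens by hand before building the proxy. The helper name expand_proxy_command and the jump host user@proxy.example are placeholders for illustration, not part of dvc or paramiko.

```python
import re

# Matches a percent sign followed by any single character, e.g. %h, %p, %%.
_TOKEN = re.compile(r"%(.)")


def expand_proxy_command(template, host, port, user=""):
    """Expand OpenSSH-style tokens in a ProxyCommand template.

    Handles %h (target host), %p (target port), %r (remote user) and
    %% (literal percent sign). Unknown tokens are left untouched.
    """
    values = {"h": host, "p": str(port), "r": user, "%": "%"}

    def substitute(match):
        # Fall back to the original text for tokens we do not know.
        return values.get(match.group(1), match.group(0))

    return _TOKEN.sub(substitute, template)


if __name__ == "__main__":
    # user@proxy.example is a placeholder jump host (assumption).
    command = expand_proxy_command(
        "ssh -q -W %h:%p user@proxy.example", "backup.server", 22
    )
    print(command)
    # The expanded string could then be handed to paramiko:
    #   sock = paramiko.ProxyCommand(command)
```

Whether this is the actual cause of the failure is only a guess; the -tt behaviour and the paramiko issue mentioned above may be separate problems.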

In my honest opinion, support for rsync or more customization on the ssh side could help a lot, since this seems to be a common use case. What do you think?

Best,
Guillermo.

Labels: awaiting response, feature request, question

All 5 comments

Interesting question. Let me move this to the core DVC repo (this repo is for our docs 🙂). Thanks

Hi @geblanco !

I would double-check your setup; it seems to work fine for other people who are not on Windows. If it still doesn't work, another simple workaround is to just mount your remote through sshfs and use it as a local remote.

Hi!

My bad, I thought this was the core dev repo :).

I believe it's working for many people, though not for everybody on Linux. I guess that everyone having a different version of OpenSSH doesn't help either. I have double- and triple-checked my setup, but some annoyances remain. For example, rsync seems to behave correctly with -tt in the proxy command, but dvc/paramiko do not seem to accept it.

In any case, @efiop, I hadn't thought about sshfs. It seems like a great trick, actually; I'll give it a try and come back.

Maybe we can close this issue and reopen it when new information is available?

Sounds good. Let's close it for now.

Hello,

I have successfully tried the sshfs approach. I'll leave some instructions here, just in case they're useful for anyone else :)

Set up sshfs:

remote_dir='/data'
local_dir='/home/<user>/dvc_data'
sshfs \
    -o idmap=user \
    -o allow_other,default_permissions \
    -o ssh_command="ssh -p 22 -A <user>@<proxy_server> ssh" \
    -C "${remote_dir}" "${local_dir}"

Configure dvc:

[core]
    remote = server_dvc
['remote "server_dvc"']
    url = "/home/<user>/dvc_data"
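
Since the remote is now just a local path, nothing in this config tells dvc whether the sshfs mount is actually active; if it is not, a push would land in the empty local directory instead of on the server. A minimal sketch of a guard that could be run before dvc push or dvc pull, assuming the mount point from the config above. The function name remote_is_mounted is hypothetical, and the path must be adapted to the real user.

```python
import os
import sys


def remote_is_mounted(mount_point):
    """Return True if mount_point is currently a mount point.

    os.path.ismount reports True when the path sits on a different
    device than its parent directory, which is the case for an active
    sshfs mount.
    """
    return os.path.ismount(mount_point)


if __name__ == "__main__":
    # Placeholder path (assumption): replace <user> with the real user name.
    local_dir = "/home/<user>/dvc_data"
    if not remote_is_mounted(local_dir):
        sys.exit(f"{local_dir} is not mounted; run sshfs before using dvc")
    print(f"{local_dir} is mounted")
```

This only checks that something is mounted at the path; it does not detect a mount that has stalled because the network dropped.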

Hope this helps!

Sadly, this approach makes my computer hang when the server's network fails (we're having some network problems). I am considering a VPN.

Best,
Guillermo
