Dvc: Hosting DVC repositories on Gitea?

Created on 8 Feb 2020 · 7 Comments · Source: iterative/dvc

Hello,

I recently installed Gitea on my personal server. I find it great to manage my git repositories but I was wondering if there was a way to host my DVC repositories as well. My idea is that, since Gitea allows cloning git repositories with SSH, I should be able to clone DVC repositories with SSH as well.

So I tried it with one project. I set a DVC remote URL that is the same as the one for git, except that I changed the extension from .git to .dvc (ideally, I would like to use this convention). I also created the corresponding directory on the server. Whenever I try to dvc push, I get this error:

ERROR: unexpected error - EOF during negotiation
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 130, in __init__
    server_version = self._send_version()
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp.py", line 134, in _send_version
    t, data = self._read_packet()
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp.py", line 201, in _read_packet
    x = self._read_all(4)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp.py", line 188, in _read_all
    raise EOFError()
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/command/data_sync.py", line 49, in run
    recursive=self.args.recursive,
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/repo/__init__.py", line 31, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/repo/push.py", line 25, in push
    return self.cloud.push(used, jobs, remote=remote)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/data_cloud.py", line 81, in push
    show_checksums=show_checksums,
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/local.py", line 385, in push
    download=False,
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/local.py", line 358, in _process
    download=download,
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/local.py", line 279, in status
    md5s, jobs=jobs, name=str(remote.path_info)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/ssh/__init__.py", line 329, in cache_exists
    ret = list(itertools.compress(checksums, in_remote))
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/ssh/__init__.py", line 322, in exists_with_progress
    return self.batch_exists(chunks, callback=pbar.update_desc)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/ssh/__init__.py", line 290, in batch_exists
    channels = ssh.open_max_sftp_channels()
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/dvc/remote/ssh/connection.py", line 301, in open_max_sftp_channels
    self._sftp_channels.append(self._ssh.open_sftp())
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/client.py", line 556, in open_sftp
    return self._transport.open_sftp_client()
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/transport.py", line 1097, in open_sftp_client
    return SFTPClient.from_transport(self)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 170, in from_transport
    return cls(chan)
  File "/home/gcoter/projects/music/unsupervised-source-separation/venv/lib/python3.6/site-packages/paramiko/sftp_client.py", line 132, in __init__
    raise SSHException("EOF during negotiation")
paramiko.ssh_exception.SSHException: EOF during negotiation
------------------------------------------------------------

Of course, I know that Gitea was not developed with DVC in mind, so maybe my idea is impossible. But I would like to be sure. Could you advise me on what I should check on the Gitea server to make sure that DVC can access it through SSH?

Labels: awaiting response, question

All 7 comments

@gcoter, you added the SSH URL of the Gitea repo as a DVC remote, right?
I'm afraid that's not supported, as the SSH URL is only meant for authentication in Gitea (same with GitHub/GitLab, for that matter). There's no way to SSH into it directly (and it's meant to host git repositories).

DVC requires both SSH and SFTP access for SSH remotes to work. You can, however, use git push to push to the remote Gitea server, and add one of the remote storages supported by DVC. Then you can share files tracked by DVC via dvc push.
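
One way to check whether an SSH endpoint also offers SFTP (which is what the "EOF during negotiation" error in the traceback points at) is to try opening an SFTP session with paramiko, the library DVC uses there. This is only a minimal sketch, assuming key-based authentication and a host already present in your known_hosts file; the function names `sftp_available` and `probe`, and every host, port, user, and key path you pass in, are hypothetical:

```python
def sftp_available(client, sftp_errors=(EOFError,)):
    """Return True if a connected, paramiko-style SSH client can open SFTP.

    `client` only needs an `open_sftp()` method; `sftp_errors` lists the
    exception types that are treated as "no SFTP subsystem here".
    """
    try:
        sftp = client.open_sftp()
    except sftp_errors as exc:
        print(f"SFTP not available: {exc!r}")
        return False
    sftp.close()
    return True


def probe(host, port, username, key_filename):
    """Connect over SSH with a private key and report whether SFTP works."""
    import paramiko  # third-party dependency, imported lazily

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # unknown hosts are rejected by default
    client.connect(host, port=port, username=username, key_filename=key_filename)
    try:
        # paramiko raises SSHException("EOF during negotiation") when the
        # server closes the channel instead of starting SFTP.
        return sftp_available(client, (paramiko.SSHException, EOFError))
    finally:
        client.close()
```

Calling `probe(...)` against the Gitea SSH port would be expected to return False if Gitea only serves git over SSH, as described above, while a real SFTP server should return True.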

> I set a URL for DVC which is as the one for git, but changing the extension from .git to .dvc

You mean you changed a URL like <user>@<host>:iterative/dvc.git to <user>@<host>:iterative/dvc.dvc? Anyway, hosting a DVC repository is not supported by Gitea. However, as mentioned above, you can host the git repository there and add a different remote storage for DVC.

> I recently installed Gitea on my personal server. I find it great to manage my git repositories but I was wondering if there was a way to host my DVC repositories as well. My idea is that, since Gitea allows cloning git repositories with SSH, I should be able to clone DVC repositories with SSH as well.

You can set up an SSH server on the same machine as Gitea and use it as an SSH remote storage if you prefer. :slightly_smiling_face:

Hi @skshetry, thank you for your quick response! Indeed, my idea was that, if I have a git URL like <user>@<host>/gcoter/project.git, I would store the corresponding DVC repository at <user>@<host>/gcoter/project.dvc. I thought that the SSH service running on Gitea was agnostic to what is inside the requested directory (acting like SFTP) and that I could use more or less the same URL, as long as I stored my DVC data in the same directory as the one used by Gitea.
So, if I understand correctly, SSH is only used for authentication, not as an SFTP server, right? In that case, I understand that it cannot work.

> You can set up an SSH server on the same machine as Gitea and use it as an SSH remote storage if you prefer

Yes, I think I will do that. I will create an SFTP server which will be responsible for DVC repositories and store them next to the git repositories. That way, I can let Gitea work as it was intended :slightly_smiling_face:

@gcoter, just a friendly ping! :slightly_smiling_face:

Sorry for the late reply, though. Got lost.

> Yes, I think I will do that. I will create an SFTP server which will be responsible for DVC repositories and store them next to the git repositories. That way, I can let Gitea work as it was intended

Did you try? Did it work?
If you need any help, please do comment or ask us on Discord.

> So, if I understand correctly, SSH is only used for authentication, not as an SFTP server, right? In that case, I understand that it cannot work.

Yes, it's used by Gitea to authenticate you so that you can push to and pull from the repository. I don't know the specific details, though. Sorry.

Hi @skshetry, I haven't had the time to finish it, but I did start trying several Docker images (because using Docker is a requirement for me) which would allow me to create an SFTP server sharing the same volume as Gitea. I will tell you when I am done :)

I finally managed to make it work by using this docker image: https://github.com/atmoz/sftp

Here is how I did it:
1) I cloned the project and built the image locally on the Raspberry Pi (because the image on Docker Hub was not compatible with ARM). To avoid confusion, I tagged it "rpi".
2) I added this to the docker-compose file I used for Gitea:

dvc:
  image: atmoz/sftp:rpi
  ports:
    - "<A port different from the one used by Gitea for SSH>:22"
  volumes:
    - /home/gcoter/.ssh/id_ed25519:/etc/ssh/ssh_host_ed25519_key:ro
    - /home/gcoter/.ssh/id_rsa:/etc/ssh/ssh_host_rsa_key:ro
    - dvc_data:/home/dvc
  command: dvc::1001

3) I opened a bash session inside the SFTP container and created the folder /home/dvc/gcoter (to mirror the convention used in Gitea). So now, all of the added dvc repositories will be stored in the volume dvc_data in the folder gcoter. I also added my public keys to /home/dvc/.ssh/authorized_keys with the right permissions because I want to authenticate with keys from my personal computer (no password).
4) Because Gitea and this SFTP server are on the same host, I can use almost the same URL in my projects. For instance, ssh://git@<URL>:<Gitea SSH Port>/gcoter/project.git becomes ssh://dvc@<URL>:<SFTP Port>/gcoter/project.dvc.
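
The URL convention in step 4 can be sketched as a small helper. This is an illustration only, not part of DVC or Gitea: the function name `dvc_remote_url` and its parameters are hypothetical, and it assumes the git remote is written in the ssh:// form shown above with a path ending in .git.

```python
from urllib.parse import urlsplit, urlunsplit


def dvc_remote_url(git_url, sftp_user="dvc", sftp_port=None):
    """Map ssh://git@host:port/owner/project.git to the matching .dvc URL.

    The host and path are kept; the user, port, and extension are swapped
    for the ones used by the SFTP container.
    """
    parts = urlsplit(git_url)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError(f"expected an ssh:// URL, got {git_url!r}")
    if not parts.path.endswith(".git"):
        raise ValueError(f"expected a path ending in .git, got {parts.path!r}")

    netloc = f"{sftp_user}@{parts.hostname}"
    if sftp_port is not None:
        netloc += f":{sftp_port}"

    # Replace the trailing ".git" with ".dvc", leaving the rest of the path alone.
    path = parts.path[: -len(".git")] + ".dvc"
    return urlunsplit(("ssh", netloc, path, "", ""))
```

The resulting URL could then be registered in a project with `dvc remote add`, for example as the default remote.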

> I finally managed to make it work by using this docker image: https://github.com/atmoz/sftp

@gcoter, glad to hear that. And, thanks for sharing the setup information, this will benefit a lot of users for sure (including me :stuck_out_tongue:).
