Git-lfs: Pure SSH-based protocol

Created on 25 Feb 2016 · 58 Comments · Source: git-lfs/git-lfs

I would like to push git-lfs changes via SSH, but this doesn't seem to work "out of the box". Specifically, I have two remotes: one over https (_origin_, i.e. GitHub) and one over ssh (_alternate_, i.e. a local computer not generally accessible from the web, only via ssh). When I set up git-lfs, the https remote is fine, but I can no longer push to the ssh remote:

$ git push alternate master
Git LFS: (0 of 2 files) 0 B / 450.17 KB
exit status 127
error: failed to push some refs to "[email protected]:home/blah.git"
$ git lfs push alternate master
Git LFS: (0 of 2 files) 0 B / 450.17 KB
exit status 127

I suspect this issue is related to issue #295 and its pull request #350, and maybe #798.

Will it be possible to have git-lfs work with existing ssh remotes "out of the box", like https? Is the currently supported solution to run git-lfs-ssh-serve on the remote? I have not been able to experiment with the git-lfs-ssh-serve yet. Forgive me if this is a naive or duplicate question, or unique to my setup. Thanks!

enhancement

Most helpful comment

please support git lfs push via ssh, git push over ssh is widely used.

All 58 comments

Git LFS doesn't support SSH as a transport protocol. So, as a hacky workaround, it executes a git-lfs-authenticate command through SSH, which should return JSON. Unfortunately, this isn't documented very well. Here's the JSON struct that it needs to return, and here's a live example:

$ ssh [email protected] git-lfs-authenticate github/git-lfs download
{
  "href": "https://api.github.com/lfs/github/git-lfs",
  "header": {
    "Authorization": "RemoteAuth monkey"
  },
  "expires_at": "2016-02-25T17:42:24Z"
}

This gives the URL and the necessary header value to access the LFS API. Not only would you have to implement this git-lfs-authenticate script, but you'd also have to set up some kind of LFS server somewhere for the client to communicate with.
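To make the shape of that contract concrete, here is a minimal sketch of what a server-side git-lfs-authenticate command could look like; the hostname, token, and expiry window are placeholders for illustration, not part of any real deployment:

```python
#!/usr/bin/env python3
# Hypothetical sketch of a server-side git-lfs-authenticate command.
# The hostname and token below are placeholders, not a real API.
import json
from datetime import datetime, timedelta, timezone

def build_response(repo, operation, token):
    """Build the JSON document the Git LFS client expects on stdout."""
    # A real server would first check that the SSH user may perform
    # `operation` (upload/download) on `repo`, then mint a short-lived
    # token; here we just accept whatever we're given.
    expires = datetime.now(timezone.utc) + timedelta(minutes=5)
    return {
        "href": "https://lfs.example.com/" + repo,      # LFS API base URL
        "header": {"Authorization": "RemoteAuth " + token},
        "expires_at": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

# What the client would see for a command like:
#   ssh git@lfs.example.com git-lfs-authenticate github/git-lfs download
print(json.dumps(build_response("github/git-lfs", "download", "monkey"), indent=2))
```

The script itself is trivial; as noted above, the real work is the access-control check and the LFS API server that the returned credentials point at.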

@technoweenie we also needed that feature, thus we removed all user handling from git-lfs-testserver and added the necessary infrastructure to support authentication through JWT tokens from gitolite (in our case).

https://github.com/mgit-at/lfs-test-server/tree/master/git-lfs-authenticate

Any update on this yet?

Any update on this yet?

Not yet.

please support git lfs push via ssh, git push over ssh is widely used.

We can't deploy via CI using ssh remote because LFS fails. It would be great if it would work.

The git-lfs-authenticate workaround is not practical in many cases. Often, it makes sense to keep the LFS repository on the same server as the Git repository. After all, both paths correspond to pieces of the same project... For servers that only allow SSH access for security reasons, you are screwed.
To me, SSH support is a necessity in order to be consistent with Git's capabilities.

+1, please support push via ssh

+1, this feature is very basic and necessary!

yes, please add git-lfs push/pull support through ssh.

The biggest problem with git-lfs currently is that its port usage is not symmetric with git itself. That makes it a pain in, e.g., enterprise deployments, because for many users git works (over SSH) but git-lfs does not, due to firewall rules.

We need this feature. It's silly not to support it.

Please, we need this :)

Not having this is bad for Git as a whole, because git-lfs is a non-optional feature in many setups, and this adds inconsistency.

I mean that git-lfs as a tool is non-optional in many cases.

Thank you, everybody. I have noted everyone's request of this feature, and will try and find time to work on this in the future. I will update this thread when there are updates to be shared.

+1 for git-lfs using SSH remotes

Please, we need this feature badly! git-lfs-authenticate is not working on our customized Gerrit server.

+1 highly needed feature. Maybe any updates on this?

+1 for git-lfs using SSH remotes

I just lost several hours because of this, and error messages from VSTS were not helpful. Please fix.

+1

pretty please, otherwise this whole git lfs thing is a local gimmick :(

I gave up on git-lfs because without this feature it is too limited.

@gerroon Agreed. Although the whole idea of git-lfs is appealing, I could not implement it for any practical use because of this issue.

Yeah, I hope that the Gitea devs find time to implement this.

Thanks for the continued interest. Someone from @git-lfs/core will be sure to update this thread when we have made progress (if this is the case).

For those that are interested, there's a proposal for a pure SSH-based protocol in #3290. I'd like to keep that discussion focused on the technical aspects of the proposal.

We're definitely aware that this is a highly desired feature. I'm going to retitle this issue to make it a little easier for people to find, since I had a bit of trouble finding it myself.

+1, please support push via ssh

+1 push via ssh

After two hours of looking around for my missing files and encountering loads of misleading error messages, I only found out now that SSH is not supported for LFS....

I am being diplomatic when I say I am very upset that this feature is missing.

I wish I had found this forum and read @yucolabjames' comment two hours ago...

To be accurate, it _is_ possible to push to an LFS-enabled repository using SSH, but the LFS operations occur over HTTPS currently. We know that a pure SSH protocol is useful in situations where the current one is not, and it's definitely something we want to implement. We're just not there yet.

With 2FA enabled, LFS over ssh is dead at this point... prove me wrong

+1, please support this.

Was any progress made on this? The proposed SSH protocol seems quite involved. I'd like to explore an alternative, and hopefully simpler, solution if no work has commenced on this yet.

@bk2204 @ttaylorr

We haven't implemented this yet, no. What were you thinking for a different solution?

I think an alternative would be to use the current HTTP API as is, but communicate the protocol over stdin/stdout with a remote running SSH command that then proxies to the regular HTTP backend.

Ordinarily, this would be hard to achieve whilst satisfying the requirements in your proposal, but if we look to HTTP/2 instead, which is designed to use a single connection and multiplexes requests, it should work very well.

Go makes it pretty easy to do this. We could provide a git-lfs-api binary for other implementers to use, as there's likely to be very little custom logic as all normal authentication flows would also be supported without modification.

I was planning on providing a demonstration of this, as in theory it's as simple as using a custom http.Transport specifically when you decide to route traffic via an SSH command... but I've found the LFS client to be not very accommodating to this change. I could put together a simplified example though, if there's any interest in this?
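For the curious, the core trick here (speaking HTTP over a byte stream that is not a TCP connection dialed by the HTTP library itself) can be sketched with Python's standard library. A socketpair stands in for the SSH command's stdin/stdout, and HTTP/1.1 replaces HTTP/2 purely for brevity; the host name is a placeholder:

```python
# Sketch: HTTP over an arbitrary byte stream. A socketpair stands in
# for the stdin/stdout of an SSH command proxying to an LFS server;
# HTTP/1.1 is used here for brevity where the proposal uses HTTP/2.
import http.client
import socket
import threading

client_end, server_end = socket.socketpair()

def fake_remote():
    """Play the remote end of the tunnel: read one request, answer it."""
    request = server_end.recv(65536)
    assert b"GET /info/lfs HTTP/1.1" in request
    server_end.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
    )

threading.Thread(target=fake_remote, daemon=True).start()

# http.client only dials TCP when it has no socket yet; handing it a
# pre-connected socket is exactly what an SSH tunnel would provide.
conn = http.client.HTTPConnection("lfs.example.com")
conn.sock = client_end
conn.request("GET", "/info/lfs")
resp = conn.getresponse()
body = resp.read()
print(resp.status, body)  # 200 b'ok'
```

A real implementation would of course wire the two ends to an `ssh` subprocess rather than an in-process thread, and would multiplex concurrent requests, which is where HTTP/2 comes in.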

I considered using plain HTTP, but unfortunately there are some limitations to doing that:

  • We can't assume that server implementations speak HTTP/2, and if there are server implementations that don't, we essentially make them unusable with Git LFS over SSH. Implementing HTTP/2 is very significantly more complicated than HTTP/1.1; the former will almost always require some sort of compiled code, while the latter is usually implementable using the standard library in most scripting languages.
  • The authentication in SSH happens at the beginning of the connection, not at each individual request. The only analogue for doing that in HTTP is NTLM, which authenticates the entire connection, not each request; this surprising behavior is actually the basis for a variety of security bugs.
  • We end up sending a lot of data over the connection that is redundant by using HTTP. We neither need nor want a Host header, for example. We also don't want to receive or accept a Location header, as that could be a source of security problems.
  • Go's HTTP library is not really designed to work over this kind of transport. We can't use HTTP/2 without TLS, for example. Moreover, most languages are not designed to handle HTTP data over non-TCP connections.

The protocol I've proposed is designed to easily map to HTTP and reuses a lot of the same techniques (e.g. status codes and headers) without sending a lot of the redundant data we don't want. It does look complicated because of the grammar, but the mapping is relatively simple. It also uses the same syntax primitives (pkt-line) that are already in use in Git servers, so adding support for it should be relatively easy for server-side implementations.
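For readers unfamiliar with pkt-line, the framing itself is simple: four hex digits giving the total length (including the four length bytes), then the payload, with "0000" as a flush packet. A rough sketch of the framing, independent of this proposal's actual message grammar:

```python
# Sketch of Git's pkt-line framing, the syntax primitive the SSH
# proposal reuses. This is only the framing, not the proposal's
# actual message grammar.
MAX_PAYLOAD = 65516  # 65520 (the max total pkt-line size) minus the 4-byte prefix

def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length, then the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % (len(payload) + 4) + payload

def flush_pkt() -> bytes:
    """The special '0000' packet that terminates a sequence."""
    return b"0000"

def read_pkt(buf: bytes):
    """Parse one pkt-line from buf; return (payload or None, rest)."""
    length = int(buf[:4], 16)
    if length == 0:          # flush packet
        return None, buf[4:]
    return buf[4:length], buf[length:]

print(pkt_line(b"version 1\n"))  # b'000eversion 1\n'
```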

We can't assume that server implementations speak HTTP/2

I agree. Whilst HTTP/2 has seen decent adoption, it might still be a hurdle. However, as I mentioned, I think this is fine as long as the solution provided is usable by everybody. The solution being a git-lfs-api binary that handles HTTP/2, without the upstream server needing to understand it yet.

The authentication in SSH happens at the beginning of the connection

The SSH authentication in this scenario is effectively only allowing the end-user to proxy LFS data to a specific upstream. All other authentication is handled per request, transparently tunneled. I still envision that the client would perform a request for git-lfs-authenticate, to then use those credentials with the proxied connection.

We end up sending a lot of data over the connection that is redundant by using HTTP.

This and the other issues are present in the current API that I cannot imagine will be disappearing any time soon. HTTP/2 also supports header compression, though. I don't think the overhead of headers would be an issue.

Go's HTTP library is not really designed to work over this kind of transport. We can't use HTTP/2 without TLS, for example. Moreover, most languages are not designed to handle HTTP data over non-TCP connections.

It can work without TLS, documented as h2c (HTTP/2 over TCP, rather than over TLS). It may well be true that it's difficult to emulate a TCP connection over stdin/stdout and support h2c in other languages though.

I added support for this to the LFS client earlier today and borrowed functionality from my https://github.com/saracen/lfscache project (with caching disabled) to implement the proxying over stdin/stdout. To proxy the actual batch downloads, the URLs to them need to be rewritten (which lfscache already does, so that the client hits the cache rather than the remote S3 server etc.). There are a few complications with this approach: confusing log output (about what an HTTP GET request is really doing), and the entry point into this functionality isn't super clear. But it was mostly a first stab at hooking up the end-to-end communication and getting some files transferred.

I've pushed the client side code that handles this, referenced PR below. The server command can be found here: https://github.com/saracen/git-lfs-tunnel

+1 for implementing lfs through ssh .. please share with us if any updates. Thanks!

@bk2204 I take it my response to your concerns wasn't enough to sway your thoughts on the direction of HTTP2 over SSH?

awesome :)!
Many THX for starting the implementation of LFS over pure ssh!

No, my opinions on this haven't changed since we discussed it last. I still think HTTP/2 over SSH isn't the right way forward.

@bk2204 Just to make sure, did you read my reply to your previous concerns and get a chance to look at the associated PR? Your first reply seemed to suggest you were confused over what I was proposing, so I was hoping my comments and PR would help there and that there would be another round of discussion.

Yes, I've looked at the PR. This PR works decently for the client side of things, but it makes non-Go implementations very difficult, because HTTP/2 libraries are usually over a TLS connection, not an SSH connection. We have alternate client implementations that do not use Go, and we have server implementations that don't use Go. I can imagine this being difficult to implement server-side in Java, for example. It will also likely be difficult to implement with existing HTTP/2 implementations in Rust as well.

I still think it's too early to assume that everyone has implemented HTTP/2, since multiple major hosting providers don't have public HTTP/2 support over HTTP. In addition, if we're going to design a protocol, I'd like to avoid having to make multiple SSH requests, one each for authentication and data, since doing so can prompt the user for each request (e.g. with SSH keys on a smart card).

In addition, while using HTTP/2 support is an easy way to get support, I don't think it's elegant from a protocol design perspective. For example, how do we handle a redirect? How do we handle it if the server tells us we should downgrade to HTTP/1.1? There's a lot of additional functionality in HTTP/2 (and just HTTP) that we neither need nor want and the error handling necessary to deal with all of these edge cases and make sure we don't introduce a security hole is going to be complex. It introduces a whole new security model for SSH, which I don't really want.

I think there are many libraries that support h2c. It's part of the HTTP/2 specification, and whilst I'm not 100% sure, I doubt libraries would purposely leave out the cleartext version. I suspect the HTTP/2 protocol is identical in either case, and the only difference is the initial negotiation and the transport used.

I don't think the H2C part would be an issue. It might perhaps be harder to pipe it over anything, as we're doing in this case with an SSH connection. But that's a guess and I don't think not knowing right now immediately kills this idea.

I still think it's too early to assume that everyone has implemented HTTP/2

My PR and the server-side implementation don't expect the host to support HTTP/2. The server-side component is a binary that handles the HTTP/2 protocol, but it doesn't expect the upstream server to support it.

The requirement of users here is SSH support for accessing LFS. We're not really designing a new protocol; my PR purposely doesn't touch much at all. Resolving the flaws of the current design, which may use multiple SSH requests, is beyond its scope. It might be that there's an alternative solution there, but the last time I looked at this was in April, so I can't recall entirely how that works.

The SSH authentication and connection is effectively allowing communication to be proxied. This HTTP client is no more vulnerable to attacks than your existing HTTP client, because they're exactly the same implementation. The LFS server is also exactly the same server you'd already be communicating with. The difference is the binary acting as a transparent HTTP proxy, and in the example I've provided, all this does is rewrite the data URLs to ensure they go through the proxy; any other endpoint is dropped. I don't believe it adds any new security concerns, and if it does, I'd love to know, as it means my LFS caching server is also vulnerable.

Given how little this should change client-side, is it not worth exploring further? It doesn't mean there cannot be an alternative in the future. Users that require the SSH support can opt-in to use "h2ssh", rather than it being something that is rolled out to everybody automatically. Github doesn't even need to support it server side. For users running their own Git LFS server, this modification would simply allow them to drop a binary onto a system that proxies connections. If Github do wish to support it, that'd be great and it'd be awesome to pull other developers into this discussion that would be responsible for writing the git-lfs-tunnel binary.

Many self-hosted Git servers only support SSH, not HTTP (like this).
Wrapping HTTP/2 in SSH isn't the answer when SSH already provides authentication and transfer. Just write the files in series.

Is there any progress on this, beyond punting to lfs-folderstore?

There isn't any change in the status. We'll update the issue if there's any change.

Many self-hosted Git servers only support SSH, not HTTP (like this).
Wrapping HTTP/2 in SSH isn't the answer when SSH already provides authentication and transfer. Just write the files in series.

The current proposal for SSH support is here: https://github.com/git-lfs/git-lfs/blob/master/docs/proposals/ssh_adapter.md

That proposal and my own both use SSH's authentication. For transfer, a different protocol is used in each case: one is pkt-line HTTP-over-SSH and the other is HTTP/2-over-SSH. In terms of protocol alone, @bk2204 has some reasons why HTTP/2 isn't suitable ((1) a belief that it will be harder to implement in other languages, and (2) a misunderstanding that the hosted server would need to support HTTP/2), but ignoring those potential issues, I don't think HTTP/2 over SSH is any more of an issue than the alternative.

@bk2204

There isn't any change in the status. We'll update the issue if there's any change.

I'm just now jumping into this conversation. Where can we track the progress. Is #3290 still being used, or was that only for the initial proposal?

P.S. We have a situation where authentication needs to rely on SSH keys for git lfs.

As mentioned, nothing has changed. The proposal has been written, but it hasn't yet been implemented. The amount of work to do so is non-trivial and will touch a lot of code.

It's already possible to use SSH keys for authentication. The current technique performs authentication over SSH and gets a token that can be used to talk to the HTTP API. The only thing that's outstanding is a protocol that doesn't use HTTP at all and operates completely over an SSH connection.
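As an illustration of that two-step flow, here is a sketch with the SSH step stubbed out by a canned response. The host, token, and object ID are placeholders; the /objects/batch endpoint and the vnd.git-lfs+json media types come from the published Batch API:

```python
# Sketch of the current two-step flow: authenticate over SSH, then use
# the returned credentials against the HTTP Batch API. The SSH step is
# stubbed with a canned response; host, token, and oid are placeholders.
import json

def ssh_authenticate(repo, operation):
    """Stub for `ssh <host> git-lfs-authenticate <repo> <operation>`.
    A real client runs that command and parses its stdout as JSON."""
    return {
        "href": "https://lfs.example.com/" + repo,
        "header": {"Authorization": "RemoteAuth monkey"},
        "expires_at": "2016-02-25T17:42:24Z",
    }

def build_batch_request(auth, operation, objects):
    """Assemble URL, headers, and body for a Batch API call using the
    credentials obtained over SSH."""
    url = auth["href"] + "/objects/batch"
    headers = dict(auth["header"])
    headers["Accept"] = "application/vnd.git-lfs+json"
    headers["Content-Type"] = "application/vnd.git-lfs+json"
    body = json.dumps({"operation": operation, "objects": objects})
    return url, headers, body

auth = ssh_authenticate("home/blah.git", "download")
url, headers, body = build_batch_request(
    auth, "download", [{"oid": "0" * 64, "size": 450170}])
print(url)
```

Everything after `build_batch_request` is plain HTTPS, which is exactly why a firewall that only permits SSH breaks the current design.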

Hello, after spending hours getting git working on a Synology DiskStation with SSH access, I got an error on push and ran into this issue.
Perhaps someone here knows the Synology DiskStation. As I understand it, it runs its git server over SSH, and if I understand correctly, that's the problem with LFS.
DSM Git SSH info: https://www.synology.com/de-de/knowledgebase/DSM/help/Git/git

Because of that, git-lfs-authenticate errors out on push. Does anyone have a solution and can explain to me in detail (with an example :-)) what I can change to make git on the DiskStation work over SSH with LFS support? Or what I can do on the client side?

Would be nice to find a solution. I need LFS because of a lot of binary files.
Thanks a lot!

The Synology NAS likely won't support Git LFS, even when we have a full SSH-based protocol. The NAS would need to support the future SSH-based protocol natively and just because it has a Git server doesn't mean it will support the Git LFS protocol.

Thank you for this clear advice; now I can stop trying nearly everything. I have read tons of websites and tutorials, including the official page: https://git-lfs.github.com
Almost no one explains what requirements git servers must meet for using LFS, or how those can easily be checked. I found most new information only after hitting the next error.
So I know I have to stick with SVN or another solution.
Thanks a lot! It would be nice if Synology, or "git server", or whoever, could change something so that git-lfs can also run on Synology. Alongside QNAP, it's the biggest NAS provider, so there should be demand for it.

So the conclusion is that LFS still does not work over SSH, or have I missed some conversation?

Currently it's possible to use Git LFS with SSH remotes because authentication happens over SSH and then the actual data transfer happens over HTTP or HTTPS using credentials acquired over the SSH connection. However, Git LFS doesn't offer a native SSH-based protocol that doesn't require HTTP operations at the moment, which is what this issue is tracking.

The proposal immediately dismisses SFTP, but that seems like a mistake to me. From a layperson's perspective, it looks like SFTP would drastically reduce the amount of work required for this feature.

Can someone explain why SFTP is an insecure option?

each access (upload or download) must have an access control check instead of one at the beginning of the operation

Why isn't this a problem for e.g. the interactive sftp console packaged with SSH?

The proposal immediately dismisses SFTP, but that seems like a mistake to me. From a layperson's perspective, it looks like SFTP would drastically reduce the amount of work required for this feature.

Can someone explain why SFTP is an insecure option?

Most hosting providers use a single account (e.g., git) for all uploads and downloads. If a user invokes a shell command, access control is checked at the beginning of that shell command invocation; if the process is not allowed, then the command channel is rejected. Access control is checked based on the repository name passed to the command.

With SFTP, you basically have a file system that you can access. You spawn a single SFTP server process and manipulate any files you want on the remote system. The root is not restricted by a single repository name, so you're going to have to check access control on each file, which is expensive unless the kernel is doing it for you (which it is not in this case).

It is, of course, possible for your own server to mount the remote via sshfs (which uses SFTP) and push to it like a local server. That has been supported since local pushes came in with 2.10.

each access (upload or download) must have an access control check instead of one at the beginning of the operation

Why isn't this a problem for e.g. the interactive sftp console packaged with SSH?

Because multiple untrusted users don't share access via a single username.

It is, of course, possible for your own server to mount the remote via sshfs (which uses SFTP) and push to it like a local server. That has been supported since local pushes came in with 2.10.

This is the key, then. Thanks!
