Gitea: Gitea LFS Storage different than Git LFS Storage

Created on 16 Sep 2018 · 15 comments · Source: go-gitea/gitea

Gitea version (or commit ref): 1.5.1
Git version: 2.18.0.windows.1
Operating system: Windows 7
Database (use [x]):
- [ ] PostgreSQL
- [ ] MySQL
- [ ] MSSQL
- [x] SQLite
Can you reproduce the bug at https://try.gitea.io:
- [ ] Yes (provide example URL)
- [ ] No
- [x] Not relevant

Description

Git LFS stores its files in e.g.

01/23/0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Gitea LFS stores its files in e.g.

01/23/456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Why the difference? It prevents me from e.g. re-using the LFS storage between Git and Gitea for other purposes.

reviewewontfix

Source

chowey

All 15 comments

What do you mean by git lfs?

lafriks on 16 Sep 2018

@chowey Gitlab's LFS server also has this filesystem layout.

zeripath on 16 Sep 2018

👀1

@lafriks when I do git lfs install, git puts the LFS cache into .git/lfs by default (although it can go anywhere if I set git config lfs.storage). The folder structure is almost the same as Gitea, but not quite.

@zeripath I didn't know that. I take it this means we won't be changing it?

chowey on 16 Sep 2018

The Git-LFS specs state:

These filters ensure that large files aren't written into the repository proper, instead being stored locally at .git/lfs/objects/{OID-PATH} (where {OID-PATH} is a sharded filepath of the form OID[0:2]/OID[2:4]/OID), synchronized with the Git LFS server as necessary.

eNBeWe on 1 Oct 2018

👍2

I happen to track many very large files, so sharing the lfs.storage between Git and Gitea in my local workspace saves me terabytes of disk space.

It makes sense to me to follow the Git-LFS spec, even if Gitlab does it differently.

chowey on 1 Oct 2018

Most probably not, as that is just the server side, and in the future we could rewrite it so that objects are optionally stored somewhere other than the server's local disk, for example in AWS.
There is no spec for how to store objects on the server side. Also, sharing the same directory between server and client is dangerous, as at some point you can end up with a corrupted LFS directory.

lafriks on 1 Oct 2018

~~~This may need to be an option, since some filesystems also limit the number of files in a folder.~~~ (edit: I misread)

sapk on 2 Oct 2018

@sapk we already create two levels of subdirectories, so there should not be problems with the number of files per folder, but we do it a bit differently from how the Git client stores them.

lafriks on 3 Oct 2018

The difference is that we use OID[0:2]/OID[2:4]/OID[4:], but https://github.com/git-lfs/git-lfs/blob/master/docs/spec.md#intercepting-git says OID[0:2]/OID[2:4]/OID. I think we can create a migration to stay compatible with the standard.

lunny on 6 Oct 2018

@lunny the spec is for the client side. On the server side we can store objects however we want. There is no point in changing something that works and possibly breaking something.

lafriks on 6 Oct 2018

👍2 ❤1 😕1

Hmm I think it's fairly cheap to tell whether a filename is a full oid or
not. I think it's even possible to tell whether the filename is a full oid
by the length of the name, afaics all oids have the same length.

You could check whether an lfs repository was a client style repository or
gitlab style fairly easily, meaning it could be autodetected at startup and
chosen from there.

A migration would also be fairly simple as it is just changing filenames.
Even if there is failure of autodetection, a more expensive check involving
rehashing the lfs files could be done.

I'm not suggesting that this be done but it seems like it would be possible
to support not only either, but both types of repository without too much
risk.

Similarly, it would be possible for downstream users to use an inotify
approach to keep a local client repository in sync using symbolic links.

It's just munging filenames.

Andrew Thornton

zeripath on 7 Oct 2018
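
The length-based autodetection suggested above could look roughly like the following Go sketch. It is an assumption-laden illustration, not code from Gitea: `detectLayout` and `toClientPath` are hypothetical names, OIDs are assumed to be 64-character lowercase hex SHA-256 digests, and paths are assumed to be slash-separated and relative to the LFS object root.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const oidLen = 64 // hex length of a SHA-256 OID (assumed)

// isHex reports whether s consists only of lowercase hex digits.
func isHex(s string) bool {
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return false
		}
	}
	return true
}

// detectLayout classifies a relative path of the form "aa/bb/name"
// as "client" (name is the full OID), "server" (name is OID[4:]),
// or "unknown", using only the file name length and the shard prefix.
func detectLayout(rel string) string {
	parts := strings.Split(rel, "/")
	if len(parts) != 3 || len(parts[0]) != 2 || len(parts[1]) != 2 ||
		!isHex(parts[0]+parts[1]+parts[2]) {
		return "unknown"
	}
	switch {
	case len(parts[2]) == oidLen && strings.HasPrefix(parts[2], parts[0]+parts[1]):
		return "client"
	case len(parts[2]) == oidLen-4:
		return "server"
	}
	return "unknown"
}

// toClientPath computes the rename target for a server-style path.
// It only builds the new name; it does not touch the filesystem.
func toClientPath(rel string) (string, bool) {
	if detectLayout(rel) != "server" {
		return "", false
	}
	parts := strings.Split(rel, "/")
	return path.Join(parts[0], parts[1], parts[0]+parts[1]+parts[2]), true
}

func main() {
	rel := "01/23/456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	fmt.Println(detectLayout(rel))
	fmt.Println(toClientPath(rel))
}
```

The more expensive fallback mentioned in the comment, rehashing each file and comparing the digest with the reconstructed OID, is left out of this sketch.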

@zeripath the server-side and client-side LFS structures cannot be shared, as that can potentially lead to data loss.

lafriks on 8 Oct 2018

@lafriks I trust you on this, but I'm just curious... how does it lead to data loss? Are LFS objects not read-only once created?

Obviously a misplaced git lfs prune will be no good. Executed on the client-side, it would wipe out your server-side LFS too.

Is there some other aspect to LFS storage that could lead to data corruption? I'm curious because I also use a shared LFS storage on the client side (by git config --global lfs.storage C:\lfs for example).

chowey on 9 Oct 2018

git lfs prune on the client side is one thing that will definitely lead to data loss on the server.
The other is that removing a repository on the server would corrupt the checked-out copy on the client side, as all orphaned OIDs will be deleted.

lafriks on 9 Oct 2018

Okay, thank you. That makes sense.

With things like git lfs prune able to delete data on the server, you've convinced me that it is a bad idea.

I've looked at using hard links to share files between the server storage and the client storage. It's tricky because you need to make sure files are read-only, or else you still risk one corrupting the other. But that is the best alternative I've come up with.

I'm closing this issue as a bad idea.

chowey on 10 Oct 2018
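
For readers curious about the hard-link idea mentioned above, a minimal Go sketch might look like this. It is not something Gitea or Git LFS provides; `linkReadOnly` and `demo` are hypothetical helpers, both paths are assumed to live on the same volume (hard links cannot cross volumes), and the read-only bit is only a guard against accidental writes, not a guarantee.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkReadOnly hard-links src to dst and marks the shared file read-only.
// Both names refer to the same underlying file, so one chmod covers both.
func linkReadOnly(src, dst string) error {
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return err
	}
	if err := os.Link(src, dst); err != nil {
		return err
	}
	return os.Chmod(dst, 0o444)
}

// demo creates a throwaway "server" object in a temp directory, links it
// into a "client" directory, and reports whether both names are one file.
func demo() (bool, error) {
	dir, err := os.MkdirTemp("", "lfs-link-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "server", "object")
	if err := os.MkdirAll(filepath.Dir(src), 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(src, []byte("payload"), 0o644); err != nil {
		return false, err
	}
	dst := filepath.Join(dir, "client", "object")
	if err := linkReadOnly(src, dst); err != nil {
		return false, err
	}

	a, err := os.Stat(src)
	if err != nil {
		return false, err
	}
	b, err := os.Stat(dst)
	if err != nil {
		return false, err
	}
	return os.SameFile(a, b), nil
}

func main() {
	same, err := demo()
	fmt.Println(same, err)
}
```

With hard links, deleting one name (for example through a prune on the client) leaves the other name and the data intact, but any tool that writes through either name still changes both.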
