Gitea: Gitea LFS Storage different than Git LFS Storage

Created on 16 Sep 2018 · 15 Comments · Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.5.1
  • Git version: 2.18.0.windows.1
  • Operating system: Windows 7
  • Database (use [x]):

    • [ ] PostgreSQL

    • [ ] MySQL

    • [ ] MSSQL

    • [x] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [ ] No

    • [x] Not relevant

Description

Git LFS stores its files in e.g.

01/23/0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Gitea LFS stores its files in e.g.

01/23/456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

Why the difference? It prevents me from e.g. re-using the LFS storage between Git and Gitea for other purposes.

reviewed/wontfix


All 15 comments

What do you mean by git lfs?

@chowey GitLab's LFS server also uses this filesystem layout.

@lafriks when I run git lfs install, Git puts the LFS cache into .git/lfs by default (although it can go anywhere if I set git config lfs.storage). The folder structure is almost the same as Gitea's, but not quite.

@zeripath I didn't know that. I take it this means we won't be changing it?

The Git LFS spec states:

These filters ensure that large files aren't written into the repository proper, instead being stored locally at .git/lfs/objects/{OID-PATH} (where {OID-PATH} is a sharded filepath of the form OID[0:2]/OID[2:4]/OID), synchronized with the Git LFS server as necessary.

I happen to track many very large files, so sharing the lfs.storage between Git and Gitea in my local workspace saves me terabytes of disk space.

It makes sense to me to follow the Git LFS spec, even if GitLab does it differently.

Most probably not, as that is purely server-side, and in the future we could rewrite it so that objects are not stored locally on the server but, for example, optionally in AWS.
There is no spec for how to store them on the server side. Also, sharing the same directory between server and client is dangerous, as at some point you can end up with a corrupted LFS directory.

~This may need to be an option, since some filesystems limit the number of files in a folder.~ (edit: I misread)

@sapk we already create two levels of subdirectories, so there should be no problem with the number of files, but we do it a bit differently from how objects are stored on the Git client side.

The difference is that we use OID[0:2]/OID[2:4]/OID[4:], but https://github.com/git-lfs/git-lfs/blob/master/docs/spec.md#intercepting-git says OID[0:2]/OID[2:4]/OID. I think we can create a migration to stay compatible with the standard.

@lunny the spec is for the client side. On the server side we can store them however we want. There is no point in changing something that is working and possibly breaking something.

Hmm, I think it's fairly cheap to tell whether a filename is a full OID or not. It's even possible to tell just from the length of the name; as far as I can see, all OIDs have the same length.

You could fairly easily check whether an LFS store uses the client-style or the GitLab-style layout, meaning it could be autodetected at startup and the layout chosen from there.

A migration would also be fairly simple, as it is just a matter of renaming files.
Even if autodetection fails, a more expensive check that rehashes the LFS files could be done.

I'm not suggesting that this be done, but it seems like it would be possible to support not just one or the other, but both types of repository without too much risk.

Similarly, it would be possible for downstream users to use an inotify approach to keep a local client repository in sync using symbolic links.

It's just munging filenames.

Andrew Thornton

@zeripath the server-side and client-side LFS structures cannot be shared, as that can potentially lead to data loss.

@lafriks I trust you on this, but I'm just curious... how does it lead to data loss? Are LFS objects not read-only once created?

Obviously a misplaced git lfs prune will be no good. Executed on the client-side, it would wipe out your server-side LFS too.

Is there some other aspect of LFS storage that could lead to data corruption? I'm curious because I also use shared LFS storage on the client side (for example, via git config --global lfs.storage C:\lfs).

Running git lfs prune on the client side is one thing that will definitely lead to data loss on the server.
Another is that removing a repository on the server would corrupt the checked-out copy on the client side, as all orphaned OIDs will be deleted.

Okay, thank you. That makes sense.

With things like git lfs prune able to delete data on the server, you've convinced me that it is a bad idea.

I've looked at using hard links to share files between the server storage and the client storage. It's tricky because you need to make sure the files are read-only, or else you still risk one corrupting the other. But that is the best alternative I've come up with.

I'm closing this issue as a bad idea.
