Gitea: Archive/Download do not include LFS files or Submodules

Created on 23 Aug 2018 · 13 Comments · Source: go-gitea/gitea

  • Gitea version (or commit ref): 1.5.0
  • Git version: 2.16.4
  • Operating system: centos 7
  • Database (use [x]):

    • [ ] PostgreSQL

    • [ ] MySQL

    • [ ] MSSQL

    • [x] SQLite

  • Can you reproduce the bug at https://try.gitea.io:

    • [ ] Yes (provide example URL)

    • [ ] No

    • [x] Not relevant

  • Log gist: Downloads of LFS files

Description

With git-lfs installed and enabled on both the Gitea server host and the client host, LFS-controlled files are not added properly to the .zip or .tar.gz archives when:

  • Using the Download Repository button
  • Downloading a release

Instead of the expected file, the .zip or .tar.gz contains a text file of the same name (the LFS pointer file).

The REST endpoint behaves the same way:

GET /repos/{owner}/{repo}/raw/{filepath}

In other respects, git-lfs works as expected when using git command line to interact with the repo.

Screenshots

Text files look like this:

version https://git-lfs.github.com/spec/v1
oid sha256:a7da80fc96bc0dd73ea0416fda5dfe1321910517634d4b142903a9fbab24f196
size 1465634

Labels: kind/bug, reviewed/confirmed

All 13 comments

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.

This issue has been automatically closed because of inactivity. You can re-open it if needed.

Are there any updates, fixes or thoughts on how to approach this issue?

We would love to use Gitea and its API to download releases directly onto deployment servers and to end users, but Gitea not including any LFS objects in the downloads is a huge problem. Using git to clone the repository is not an option, as we cannot require our customers to install any extra software.

So there is at least now a GET /repos/{owner}/{repo}/media/{filepath} endpoint which means that you can get the actual lfs'd data.
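
For anyone scripting downloads against these endpoints, here is a minimal sketch of building the two URLs. The `/api/v1` prefix, the helper name `content_url`, and the example host are assumptions for illustration. Authentication and the actual HTTP request are left out.

```python
from urllib.parse import quote


def content_url(base_url: str, owner: str, repo: str, filepath: str,
                dereference_lfs: bool = True) -> str:
    """Build a file-content URL for a Gitea instance (hypothetical helper).

    "media" is the endpoint mentioned above that returns the real LFS data;
    "raw" returns the stored blob, which for LFS files is the pointer text.
    """
    endpoint = "media" if dereference_lfs else "raw"
    # Assumption: the REST API is served under /api/v1 on this instance.
    return "{}/api/v1/repos/{}/{}/{}/{}".format(
        base_url.rstrip("/"),
        quote(owner, safe=""),
        quote(repo, safe=""),
        endpoint,
        quote(filepath),  # keeps "/" so nested paths stay intact
    )


if __name__ == "__main__":
    print(content_url("https://gitea.example.com", "acme", "app", "assets/model.bin"))
```

This only covers single files. It does not help with the archive downloads, which still contain pointer files.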

Could you give me some information as to how you create the zips - I don't immediately know where to look to find the code that creates them.

#7209 might be related

@schmittlauch I'm not certain. I would have to dive to see how these zips are created.

My suspicion is that these zips do not even attempt to dereference the LFS pointers whereas on #7209 your problem is different.

OK so yeah #7209 is not relevant to this.

The issue is that we use git archive to create these archives. That doesn't include submodules either - so I think this needs a complete rethink.

And how does GitHub's archive do that?

I suspect they rewrote the command. Back in November 2018 https://github.com/git-lfs/git-lfs/issues/1322#issuecomment-426822783 states that they didn't include lfs files (and likely submodules) in their zips.

I think that's what we're going to have to do unfortunately.

This again leads to the slightly annoying issue whereby we don't know what files are LFS files except by reading them and checking if they're a pointer or not.
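
A rough sketch of that check, assuming the three-line pointer layout shown in the issue description (version, oid sha256, size). The names `parse_lfs_pointer` and `LFSPointer` are made up for illustration, the size cap is only there to avoid scanning large blobs, and pointers with extra extension lines are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class LFSPointer(NamedTuple):
    oid: str   # hex SHA-256 of the real object
    size: int  # size of the real object in bytes


# Matches only the plain three-line form seen in the issue description.
_POINTER_RE = re.compile(
    rb"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    rb"oid sha256:([0-9a-f]{64})\n"
    rb"size (\d+)\n?\Z"
)

# Pointer files are tiny, so anything at or above this cap is treated as
# ordinary content without running the regex (the value is an assumption).
MAX_POINTER_SIZE = 1024


def parse_lfs_pointer(blob: bytes) -> Optional[LFSPointer]:
    """Return the pointer fields if blob looks like an LFS pointer, else None."""
    if len(blob) >= MAX_POINTER_SIZE:
        return None
    match = _POINTER_RE.match(blob)
    if match is None:
        return None
    return LFSPointer(oid=match.group(1).decode("ascii"), size=int(match.group(2)))
```

The cost is the one described above: every small blob in the tree has to be read before the archiver knows whether to substitute the real object.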

Similarly we need to do this zipping in the context of the current user and repository. In the case of submodules - it's conceivable that the zip that one user downloads may not be the same as the zip another user gets - I guess that's ok but it means caching these might be difficult unless we cache them with the associated permission state.
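
To make "cache them with the associated permission state" concrete, here is one possible shape for a cache key. Everything in it is hypothetical: the function name, the parameters, and the idea of representing permission state as the set of submodule paths the requesting user may read. It is a sketch of the idea, not how Gitea caches archives.

```python
import hashlib
from typing import Iterable


def archive_cache_key(repo_id: int, commit_sha: str, archive_format: str,
                      readable_submodules: Iterable[str]) -> str:
    """Derive a cache key that differs whenever the visible content differs.

    Two users who can read the same set of submodules share a cached archive;
    users with different access get different keys.
    """
    parts = [str(repo_id), commit_sha, archive_format]
    # Sort and de-duplicate so the key does not depend on iteration order.
    parts.extend(sorted(set(readable_submodules)))
    # NUL separators keep ("a", "bc") distinct from ("ab", "c").
    return hashlib.sha256("\0".join(parts).encode("utf-8")).hexdigest()
```

With a key like this, the number of cached variants per commit grows with the number of distinct permission sets, which is the difficulty mentioned above.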

Finally we must be very careful indeed about which submodules we're happy to include, if any - perhaps just allow those that are local to the gitea instance?

Just wanted to add my two cents to this conversation.
Without this feature there is not much point in using git lfs at all. All the development work happens using lfs and when a production version is produced, it contains lots of nasty surprises in blank pointer files.
My current approach is to not use lfs at all and handle big binaries separately to git. This creates a lot of extra mess that would be much easier if I could just download the archive.

I know it wouldn't be as fast as git archive, but could Gitea just check out the repository to a temp directory and archive that? That way any smudge filters and submodules would be handled without Gitea having to deal with them directly. It could even cache results for the head of the main branch so it does not have to run again on subsequent downloads.

Gitea checking out the repository is not a good idea.
