Git-lfs: Add command to undo fetch and checkout

Created on 28 Apr 2016 · 26 Comments · Source: git-lfs/git-lfs

Git LFS provides control over which files to fetch/cache (lfs.fetchinclude, lfs.fetchexclude) and which files to put in the working dir (checkout <filespec>). But once files are in the cache or working dir there is no (easy) way to undo those operations. I would like a command to change files into placeholders in the working dir and delete cached files.

The goal of this command is to free disk space used by LFS files. It goes much further than prune. I should be able to remove all LFS files from the working dir and cache. By default, the command should protect against data loss by verifying that files exist on the LFS server before deleting, but there should also be an option to skip that check.

I don't know if this should be a new command, multiple new commands, or options added to existing commands. Some ideas...

  • Single new command: scrub, clear, free
  • Multiple new commands: uncheckout and unfetch
  • New options for commands: checkout --placeholder and fetch --clear-cache

I think this sounds like a great idea. I think git lfs prune could get some kind of --all option that just basically does rm -rf .git/lfs/objects. "Uncheckout" (the act of replacing LFS tracked files in the working directory with the LFS pointer) should probably be a new command though. I don't like the idea of a command like git lfs checkout doing the opposite given a new flag like git lfs checkout --clear.

Thanks for raising this, it's a solid point.

In terms of deleting fetched content in the .git/lfs/objects store, I think it would be a valid extension for git lfs prune to delete anything that would have been omitted at fetch time because of lfs.fetchinclude and lfs.fetchexclude. That could be the default behaviour; the rest of prune is just an inverse of fetch, with a little date padding to avoid thrashing.

As for resetting what's currently in the working copy back to pointers, you can actually do this now using git reset --hard, _if_ the object isn't already fetched into .git/lfs/objects. The smudge filter will be invoked and, _if_ the object data isn't already local and the fetch is suppressed by the include/exclude settings, it will write the pointer to the working copy instead, overwriting any current content. The only reason this doesn't work as a solution right now is that there's no easy way to remove the already-fetched objects that now match those settings, which the prune changes above would provide.

So I think this doesn't need any extra commands, just an enhancement to the default behaviour of git lfs prune.
[edit] Ha, @technoweenie beat me to it while I was typing 😆
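A minimal sketch of the approach described above, as a bash function. `lfs_reset_to_pointers` is a hypothetical name, and it assumes you run it from the repository root, that every LFS object has already been pushed (the local cache is deleted outright), and that no LFS-tracked file has local edits worth keeping. Deleting the working copies before checking them out again is an addition to the recipe: Git skips files whose stat data still matches the index, so without it the smudge filter may never run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn every LFS file in the working tree back into a
# pointer by making the smudge filter decline to download anything.
# WARNING: deletes the whole local LFS cache; only safe once all objects are pushed.
lfs_reset_to_pointers() {
  local git_dir file
  local -a files=()
  # The common dir is shared by all linked worktrees.
  git_dir=$(git rev-parse --git-common-dir) || return 1

  # Exclude every path from LFS fetches in this repository (persistent setting).
  git config lfs.fetchexclude '*' || return 1

  # Collect LFS-tracked paths line by line so names with spaces survive.
  while IFS= read -r file; do
    files+=("$file")
  done < <(git lfs ls-files -n)
  [ "${#files[@]}" -gt 0 ] || return 0

  # Drop the cached objects so the smudge filter cannot find them locally.
  rm -rf -- "$git_dir/lfs/objects"

  # Remove the working copies and check them out again: with the object
  # missing and the path excluded, the smudge filter writes the pointer.
  rm -f -- "${files[@]}"
  git checkout -- "${files[@]}"
}
```

Because lfs.fetchexclude stays set afterwards, a later git lfs pull downloads nothing until you remove it again with git config --unset lfs.fetchexclude.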

FWIW, an "uncheckout" command was requested in https://github.com/github/git-lfs/issues/944#issuecomment-175418615 too. I think I'd prefer having our own documented/tested command, instead of encouraging git reset --hard with caveats that the user has to worry about.

Thanks for the tips on git reset --hard. I can use that right now.

I like the idea of extending git lfs prune and adding "uncheckout". Perhaps git lfs checkin?

I'd favour something like git lfs checkout --clean since it's not really doing the opposite of what checkout normally does, it's just doing it again from scratch with the latest settings.

Hi all, I'm currently evaluating LFS support for big binary test data, and I really like the idea of reverting locally checked-out LFS files to pointer files. Is this available in the latest LFS version? We have a lot of large test data, each file around 2 GB, so we want to revert all LFS files to pointer files even if they are at the latest version. Thanks!

I think people are starting to ask more of Git LFS, such as serving large files on demand the way Google Drive File Stream or Dropbox Smart Sync do for machines with limited local storage.
Until then, a clumsy way to manually convert a large file back into a pointer is to use git lfs clean as follows:

$> mv largefile.bin largefile.bin.bak
$> cat largefile.bin.bak | git lfs clean > largefile.bin
$> rm largefile.bin.bak

Those lines can be wrapped as a script command.
Unfortunately, the clean filter has to process the entire file, so it runs slowly.
Any suggestions to improve the performance would be much appreciated.
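The three steps above can be wrapped as a shell function, for example (`lfs_to_pointer` is a hypothetical name). This sketch writes the pointer to a temporary file first, so a failed git lfs clean leaves the original untouched. It does not address the speed problem: git lfs clean still has to hash the whole file, and, as the clean filter normally does, it stores the content in the local LFS cache if it is not there already.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace each given LFS-tracked file with its pointer
# by piping the content through `git lfs clean` (run from the repository root).
lfs_to_pointer() {
  local file tmp
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "skipping missing file: $file" >&2
      continue
    fi
    tmp=$(mktemp) || return 1
    if git lfs clean < "$file" > "$tmp"; then
      # Overwrite in place so the file keeps its permissions.
      cat -- "$tmp" > "$file"
      rm -f -- "$tmp"
    else
      rm -f -- "$tmp"
      echo "git lfs clean failed for: $file" >&2
      return 1
    fi
  done
}
```

Usage: lfs_to_pointer largefile.bin other.bin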

@farleylai Without more serious workarounds, I think that this is the "best" solution for now. That said, I really like the idea of a git-lfs-stash command. "stash" is perhaps confusing as it has other connotations, but I mean something that would un-checkout the object and potentially prune it from your local cache.

This is a reasonably sized project, but I would be more than happy to guide you or anyone else through it as an OSS contribution. That said, if nobody takes this on, I'd be happy to add it myself within the next few releases (cc @technoweenie).

@ttaylorr and @swordfly I just figured out a way to check out the pointer file as is without re-computation by untracking the file type temporarily:

git lfs untrack '*.bin'
git checkout largefile.bin
git lfs track '*.bin'
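The same trick as a reusable function, as a sketch. `lfs_checkout_pointer` is a hypothetical name; it assumes you run it from the repository root with a committed, unmodified .gitattributes, which it restores with git checkout rather than re-running git lfs track (re-tracking appends the pattern again, so the attribute lines can end up in a different order). It also deletes the given files before checking them out, because git checkout leaves alone files whose stat data still matches the index; that discards local edits to those files, just as git checkout -- PATH would. Unlike the manual version, it accepts files only, not directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace large files with their committed pointers by
# checking them out while PATTERN is temporarily not LFS-tracked.
# Usage: lfs_checkout_pointer '*.bin' largefile.bin [more files...]
lfs_checkout_pointer() {
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: lfs_checkout_pointer PATTERN FILE..." >&2
    return 2
  fi
  local pattern=$1 status
  shift

  # Without the filter=lfs attribute, checkout writes the blob (the pointer) as is.
  git lfs untrack "$pattern" || return 1

  # Remove the working copies so checkout cannot skip "unchanged" files.
  rm -f -- "$@"
  git checkout -- "$@"
  status=$?

  # Restore .gitattributes exactly as committed, even if checkout failed.
  git checkout -- .gitattributes
  return "$status"
}
```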

However, I can still imagine a handy flag that works as follows:
When the flag is set to true, the pointer files are always checked out as is by git checkout/fetch/pull, and the user must explicitly download the large files with git lfs pull. Getting the pointers back is then simply a matter of checking out again.
Otherwise, git-lfs always materializes the pointers implicitly on git checkout/fetch/pull.
In effect, this means applying the filter (tracking) on add/push and skipping it (untracking) on checkout/pull.
So far, the closest option is running git lfs install --skip-smudge, but that only works for a first-time clone.
A bit more flexibility would be appreciated.
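Until something like that exists, the closest per-command switch is the GIT_LFS_SKIP_SMUDGE environment variable: with it set to 1, the smudge filter leaves pointer files in place instead of downloading. Below is a sketch of the workflow described above; `clone_with_pointers` is a hypothetical name. It only affects the clone itself, so later checkouts download content again unless the variable is set for them too, and it does not turn files that were already downloaded back into pointers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clone a repository with pointer files only, then
# download just the paths matching an optional include pattern.
# Usage: clone_with_pointers URL DIR [INCLUDE_PATTERN]
clone_with_pointers() {
  local url=${1:-} dir=${2:-} include=${3:-}
  if [ -z "$url" ] || [ -z "$dir" ]; then
    echo "usage: clone_with_pointers URL DIR [PATTERN]" >&2
    return 2
  fi

  # With GIT_LFS_SKIP_SMUDGE=1 the smudge filter leaves pointers in place.
  GIT_LFS_SKIP_SMUDGE=1 git clone -- "$url" "$dir" || return 1

  # Explicitly materialize only what is needed.
  if [ -n "$include" ]; then
    git -C "$dir" lfs pull --include="$include"
  fi
}
```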

> However, I still imagine a handy flag works as follows:

If I'm understanding your proposal correctly, I think that this can largely be accomplished with the --include and --exclude flags that Git LFS provides. If what you're looking for are ways to _by default_ not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

> If what you're looking for are ways to by default not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

You can do this by running:

# in your working directory
$ git config --file=.lfsconfig lfs.fetchexclude "*"

Add that .lfsconfig file to your repository, and the default exclude value will be used if no alternative is given (user git config, arguments to git lfs clone or git lfs pull, etc.).

@ttaylorr Not exactly; I'm aligned with the OP. The requirement is for git to transparently check out (recover) the pointer files as stored in the repo when the large files are unchanged or absent. Specifying --exclude or lfs.fetchexclude can serve as the hint, but git-lfs does not seem to actually restore the pointer files.

@farleylai Sorry, can you explain what you mean by the phrase "recover the pointer file(s)"? Thanks.

I think he means the reverse of smudge. It sounds like he has a repository with the pointer files already replaced by the actual large files via the smudge filter, and wants to reclaim a little local disk space by removing them.

You can script it, or do it manually right now:

# all LFS files are real
$ git lfs ls-files
9252a75c94 * bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

$ git config lfs.fetchexclude '*'
$ git show HEAD:bin/again.bin > bin/again.bin
$ git lfs pull # no-op because of lfs.fetchexclude

# bin/again.bin is just a pointer
$ git lfs ls-files
9252a75c94 - bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

That's pretty cumbersome, and doesn't remove the file from .git/lfs/objects.

...
$ git show HEAD:bin/again.bin > bin/again.bin
...

is exactly the key command for getting the pointer file back, but it seems to require the path exactly as listed by git lfs ls-files. Alternatively, temporarily turning off LFS tracking and running git checkout, as shown earlier, works in general for files and directories relative to the current directory. Ultimately, it would also be welcome to remove the corresponding objects from .git/lfs/objects. So the follow-up question is how to find the path in .git/lfs/objects of the LFS object with a given sha256 OID. Is it sufficient to just delete it?

Yes, LFS will happily re-download the files if they're not in .git/lfs/objects.
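A sketch of locating and deleting a single cached object, assuming the default layout in which the object with OID `<oid>` lives at `<git-dir>/lfs/objects/<first two hex chars>/<next two hex chars>/<oid>` (a custom lfs.storage setting would change the base directory). `lfs_object_path` and `lfs_drop_object` are hypothetical names; the full OID comes from git lfs ls-files --long, whose lines have the form `<oid> <*|-> <path>`. Deleting an object is only safe once it has been pushed, and several paths can share one object (as gif/dupe.gif and gif/droidtocat.gif do in the listing above), so dropping it affects all of them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the default object layout:
#   <git-dir>/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>

# Print the cache path for a 64-character lowercase sha256 OID.
# Usage: lfs_object_path OID [GIT_DIR]
lfs_object_path() {
  local oid=${1:-} git_dir=${2:-}
  case "$oid" in
    *[!0-9a-f]* | '') echo "not a lowercase hex OID: $oid" >&2; return 1 ;;
  esac
  [ "${#oid}" -eq 64 ] || { echo "expected 64 hex characters: $oid" >&2; return 1; }
  if [ -z "$git_dir" ]; then
    # The common dir is shared by all linked worktrees.
    git_dir=$(git rev-parse --git-common-dir) || return 1
  fi
  printf '%s/lfs/objects/%s/%s/%s\n' "$git_dir" "${oid:0:2}" "${oid:2:2}" "$oid"
}

# Delete the cached object behind one LFS-tracked path (as printed by
# git lfs ls-files). Git LFS downloads it again when it is next needed.
lfs_drop_object() {
  local target=$1 oid obj
  # Strip "<oid> <*|-> " from each line and compare the rest with the path.
  oid=$(git lfs ls-files --long | awk -v p="$target" '{
    name = $0; sub(/^[^ ]+ [*-] /, "", name)
    if (name == p) { print $1; exit }
  }')
  [ -n "$oid" ] || { echo "not an LFS file: $target" >&2; return 1; }
  obj=$(lfs_object_path "$oid") || return 1
  rm -f -- "$obj"
}
```

For example, lfs_drop_object bin/again.bin after bin/again.bin has been turned back into a pointer.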

@technoweenie Any effort started on this? Not that I need this urgently; I have a script that users run to empty the entire .git/lfs/objects folder and restore every pointer into the working directory. I even have a script for git lfs pull origin <some-large-file>.

This feature isn't urgent because non-tech users don't have huge files (Word documents less than 5MB). Tech users who need to flit between huge files (ISOs, binaries, etc) can already revert huge files to text pointers on their own.

@hannwong no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

I'm looking for this feature as well, and Google led me to this thread. Is it supported yet?

> Is it supported yet?

Not yet, but we will make sure to update this issue if/when we do.

I use this one from my git root:

git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done

@fstefanov this will write back the pointer but not delete the cached objects in .git/lfs/objects which still use up disk space. It also makes git think that the file has changed, at least with my version of git (2.21.0).

> no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

We're past 2.7 already and we're still using scripts we hacked together to accomplish this.
Any way you guys could devise your own git-lfs certified and approved command for the next release?

My team loves git-lfs's ease of use, and it'd be great to have these pruning features out of the box.

Great feature; I look forward to it being implemented. Here's the cleanup script I use in the meantime, based on @fstefanov's:

#!/bin/bash

git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done
rm -rf .git/lfs/objects

If you only need to undo the checkout, I find the following faster and easier:

git lfs uninstall
git lfs ls-files -n | tr '\n' '\0' | xargs -0 rm --
git restore .
git lfs install

Here is a small program that I've written and been using for a few weeks: https://github.com/hobofan/lfs-unload

Still not a builtin solution, but it's been pretty robust for me so far.
