Restic: Question: Why is restic prune slow?

Created on 5 Oct 2017 · 2 comments · Source: restic/restic

If this is documented somewhere, please redirect me. If it isn't documented, I would be happy to contribute. I started reading the code, but clearly haven't digested it all yet.

Output of restic version

restic version
restic 0.7.3
compiled with go1.9 on linux/amd64

How did you run restic exactly?

restic backup --files-from .restic-include --exclude-file .restic-exclude # about 25gb
restic forget --keep-last 1 --prune # this is fast
restic prune # slow, ran for over an hour
counting files in repo
building new index for repo
[0:26] 100.00%  430 / 430 packs
repository contains 430 packs (6430 blobs) with 1.937 GiB bytes
processed 6430 blobs: 0 duplicate blobs, 0B duplicate
load all snapshots
find data that is still in use for 2 snapshots
  Interrupt received, cleaning up

RESTIC_REPOSITORY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and RESTIC_PASSWORD are set.

What backend/server/service did you use?

AWS S3

Expected behavior

Back up and keep only 1 copy. I want to verify the expected performance of restic prune and when it actually needs to be run - which I thought was when you want to store only what is still referenced in your repo (forgive me if I get the jargon wrong; I'm a new user).

I would have assumed that running restic forget [options] --prune and restic forget [options] && restic prune would take about the same amount of time.

Actual behavior

Expected behavior happened, but the lone restic prune took a long time. I left it for an hour and it never registered a percentage complete after the index rebuild finished successfully.

Steps to reproduce the behavior

Backup about 25gb of data on S3 backend and run restic prune.

Do you have any idea what may have caused this?

I eliminated noisy log, cache, tmp, and db dirs so this was a realistic test.

Do you have an idea how to solve the issue?

No - see questions below.

Questions

1) Is restic prune expected to take a LONG time with S3 backend & 25gb of data?
2) When do you realistically need to run prune? Only if you forget to add --prune on the forget command?
3) Am I expecting the moon here? Or should I be thinking about this differently?

Love the project!


All 2 comments

Hi, welcome to the project! In order to understand what's going on, let me give you a bit of background: when archiving files, restic splits them into smaller "blobs", then bundles these blobs together into "pack files" and uploads those files to the repo. Metadata such as filenames, directory structure etc. is converted to a JSON document and also saved there (bundled together in a pack file). That's what's contained in the repo below the data/ directory. At the end of the backup run, it uploads an "index" file (stored below index/) and a "snapshot" file (in snapshots/). The index file contains a list of the pack file names and their contents (which blobs are stored in each file and where). At start-up, restic loads all index files and then knows which blobs are already saved.

When you run forget, it just removes the really small file in snapshots/, so that operation is fast. In your case it didn't even remove anything, because there's just one snapshot and you specified --keep-last 1. For this reason the prune step wasn't run at all, even though you specified forget --prune: restic figured out there was nothing to do because no snapshot was removed.

When you run prune manually, on the other hand, it gathers the list of all pack files, reads the headers to discover what's in each file, then traverses all snapshots to build a list of all still-referenced blobs, repacks those blobs into new pack files, uploads a new index (removing the old ones), and finally removes the pack files that are now unneeded. When prune is run manually, it always goes through this whole process. This also cleans up files left over from aborted backups.

There are several steps in the prune process that are slow, most notably building the list of referenced blobs, because that incrementally loads single blobs from the repo, and for a remote repo that takes a lot of time. The prune operation is also the most critical one in the whole project: one error there means data loss, and we're trying hard to prevent that. So there are several safeguards, and the process is not yet well optimized. We'll get to that eventually.

In order to address this and make prune much faster (among other operations), we've recently added a local cache which keeps all metadata - the index files and the snapshot files - locally (encrypted of course; they're just copies of files that are in the repo anyway). Maybe you can re-try (ideally with a new repo) using the code in the master branch. That'd speed up prune a lot.

I'm going to close this issue for now (since your question is answered), please feel free to add further comments. There's also the forum at https://forum.restic.net, which may be better suited for asking such questions :)

Thanks for the explanation! I'll check out the forum - I don't know how I missed it. I'll also start running off of master.

Thanks for taking the time to reply.
