Borg: What's better, many small repos or one monolithic repo?

Created on 27 Oct 2016 · 7 comments · Source: borgbackup/borg

I want to back up all users' home directories with borg and then sync the repo to S3.
I would prefer one large repo for this, with archives for each user in the same repo. The reason is that I only want to manage one encryption key; I don't want a new key every time a new user is created.
My concern with one large repo is that if it becomes corrupted, all users are affected. If I create a repo for each user instead, the damage from corruption is limited to one user.
Is this crazy talk? Hoping someone can tell me, "that's not how borg works, doesn't matter if you use one or 1000 repos."

pixelrebel

Most helpful comment

If you use the same AES key for different repos, borg will manage the IVs (nonces) for each repo independently, so the same key+IV combination will be used more than once (not within one repo, but across repos, likely even N times). This makes the crypto insecure. So, don't copy/reuse keys.

ThomasWaldmann on 27 Oct 2016
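To illustrate why key+IV reuse is dangerous: in a counter-mode cipher such as AES-CTR (which borg 1.x uses), the same key and nonce always produce the same keystream, so XOR-ing two ciphertexts cancels the keystream and leaks the XOR of the two plaintexts. The sketch below is a toy: it uses a SHA-256-based counter keystream as a stand-in for AES (the Python standard library has no AES), and the key, nonces and messages are made up. Do not use it for real encryption.

```python
import hashlib


def toy_keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Toy counter-mode keystream: block i = SHA-256(key || nonce || i).
    Stand-in for AES-CTR, for illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block_input = key + nonce.to_bytes(16, "big") + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: int, plaintext: bytes) -> bytes:
    # Stream-cipher encryption: ciphertext = plaintext XOR keystream.
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


shared_key = b"one key copied into two repos"
# Each repo tracks its own nonce counter, so both start from the same value.
nonce_repo_a = 0
nonce_repo_b = 0

p1 = b"alice: secret payroll data"
p2 = b"bob:   holiday photo lists"
c1 = encrypt(shared_key, nonce_repo_a, p1)
c2 = encrypt(shared_key, nonce_repo_b, p2)

# The keystream cancels out: whoever holds both ciphertexts learns p1 XOR p2,
# and knowing (or guessing) one plaintext reveals the other.
leak = xor(c1, c2)
print(leak == xor(p1, p2))  # True
print(xor(leak, p1) == p2)  # True
```

With distinct nonces (or distinct keys per repo) the keystreams differ and this cancellation does not happen.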

All 7 comments

It is possible to just use the same key/key file for all user repos.

When a borg repository does become corrupt, it's usually because a segment file got corrupted, and the affected archives are the ones using data from that segment.

So it mainly depends on whether your users share data, and how much of it they share.

Note that a borg repo that has been made aware of corrupt segments can correct them the next time it backs up the same data.

That said, having many repos is a standard mitigation technique.

RonnyPfannschmidt on 27 Oct 2016
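The segment argument can be made concrete with a simplified model: each chunk is stored in one segment file, each archive references a set of chunks, and an archive is affected if any of its chunks lives in the corrupted segment. The data below (segment numbers, chunk IDs, archive names) is hypothetical, and real borg archives reference chunks through item metadata rather than a flat set.

```python
from typing import Dict, Set


def affected_archives(
    archives: Dict[str, Set[str]],
    chunk_to_segment: Dict[str, int],
    bad_segment: int,
) -> Set[str]:
    """Return the archives that reference at least one chunk stored in the
    corrupted segment (simplified model, not borg's actual code)."""
    bad_chunks = {c for c, seg in chunk_to_segment.items() if seg == bad_segment}
    return {name for name, chunks in archives.items() if chunks & bad_chunks}


# Hypothetical single repo: the deduplicated chunk "shared-1" is used by both users.
chunk_to_segment = {"alice-1": 1, "alice-2": 2, "bob-1": 3, "shared-1": 2}
archives = {
    "alice-2016-10-27": {"alice-1", "alice-2", "shared-1"},
    "bob-2016-10-27": {"bob-1", "shared-1"},
}

print(sorted(affected_archives(archives, chunk_to_segment, bad_segment=2)))
# ['alice-2016-10-27', 'bob-2016-10-27']
print(sorted(affected_archives(archives, chunk_to_segment, bad_segment=3)))
# ['bob-2016-10-27']
```

Because deduplication stores shared data only once, a bad segment that holds a shared chunk hits every user referencing it; with one repo per user nothing is shared across repos, so the damage stays inside one user's repo.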

When using only one repo you can also save space, since deduplication then also works across the different users' archives. Just remember to set the files cache TTL (default 20) to at least the number of archives you cycle through, or performance will suffer!

For that, you may want to put something like this in your script before calling borg:

# one archive per user per cycle, so TTL = number of home directories x 4 (headroom)
export BORG_FILES_CACHE_TTL=$(( $(ls /home | wc -l) * 4 ))

FabioPedretti on 27 Oct 2016
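Why the TTL has to be at least the number of archives per cycle can be shown with a simplified model (an assumption, not borg's exact implementation): every `borg create` run ages the files-cache entries it loads and drops those whose age has reached the TTL, and the files seen in the current run are reset to age 0. With one repo and one archive per user, each user's entries go unseen during every other user's run. The user names and counts below are hypothetical.

```python
from typing import Dict, List


def simulate_files_cache(users: List[str], ttl: int, cycles: int) -> Dict[str, int]:
    """Count, per user, how many runs still found that user's files cached.

    Simplified model (assumption): each run keeps only entries with age < ttl,
    ages the survivors by one, then marks the running user's files as fresh.
    """
    age: Dict[str, int] = {}  # user -> age of that user's cache entries
    hits = {u: 0 for u in users}
    for _ in range(cycles):
        for user in users:  # one archive per user, all in the same repo
            # Age entries on load; evict the ones that reached the TTL.
            age = {u: a + 1 for u, a in age.items() if a < ttl}
            if user in age:
                hits[user] += 1  # unchanged files could be skipped quickly
            age[user] = 0  # files seen in this run are fresh again
    return hits


users = [f"user{i}" for i in range(30)]
print(set(simulate_files_cache(users, ttl=20, cycles=3).values()))  # {0}
print(set(simulate_files_cache(users, ttl=30, cycles=3).values()))  # {2}
```

With 30 users and the default TTL of 20, no run ever finds its files cached; with a TTL of at least 30, every run after the first cycle does.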

Another thing to consider is parallel execution: with N repos, you can run N backups in parallel or with overlapping schedules.

ThomasWaldmann on 27 Oct 2016
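A minimal sketch of the one-repo-per-user layout run in parallel. It assumes a hypothetical `/backup/<user>` repo path and borg 1.x's `borg create REPO::ARCHIVE PATH` syntax with the `{now}` archive-name placeholder. Since each user has its own repo, the runs do not compete for one repository lock. The runner is injectable so the commands can be checked without borg installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def borg_create_cmd(user: str, repo_root: str = "/backup", home_root: str = "/home") -> List[str]:
    # Hypothetical layout: one repo per user at <repo_root>/<user>.
    # "{now}" is expanded by borg itself, not by a shell.
    return ["borg", "create", f"{repo_root}/{user}::{user}-{{now}}", f"{home_root}/{user}"]


def backup_all(
    users: Sequence[str],
    max_parallel: int = 4,
    run: Callable[[List[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> Dict[str, int]:
    """Run one `borg create` per user repo, at most `max_parallel` at a time.
    Returns {user: exit code}."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = pool.map(lambda u: run(borg_create_cmd(u)), users)
        return dict(zip(users, codes))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing borg.
    results = backup_all(["alice", "bob"], run=lambda cmd: print(" ".join(cmd)) or 0)
    print(results)
```

`max_parallel` caps concurrency so the backup target's disk and network are not overwhelmed; whether parallel writes fragment the target is a property of the storage, not of borg.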

Thanks for the tips! I think I will run borg unencrypted and let rclone handle the encryption.

@ThomasWaldmann Do you suggest I feed my home directories into parallel borg create...? Would that make for fragmented writes on the disk? Does that even matter for backup data?

pixelrebel on 28 Oct 2016

Well, if you have a lot of source directories on different hosts, feeding them all into one repo might cause timing, serialization, or total-runtime problems. Also, the multiple clients will need to resync their chunks caches often. With multiple repos, you can run N backups in parallel, likely in less total time.

If all homes are on the same host, I guess it is better to just use one repo (unless they are exceptionally huge or contain exceptionally many non-duplicate files).

ThomasWaldmann on 28 Oct 2016
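A rough back-of-the-envelope model of the runtime point, assuming each of N hosts needs time t_i for its backup and that one shared repo admits only one writer at a time (runs queue on the repo lock), while separate repos can run fully in parallel:

```latex
T_{\text{one repo}} \approx \sum_{i=1}^{N} t_i + T_{\text{resync}},
\qquad
T_{N\text{ repos}} \approx \max_{1 \le i \le N} t_i
```

For example, 10 hosts at 30 minutes each give roughly 300 minutes (5 hours) serialized versus roughly 30 minutes in parallel, before the chunks-cache resync overhead T_resync is added to the single-repo case. In practice shared bandwidth and disk I/O on the backup target would stretch the parallel figure.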

In my case, I would like to have different systemd timers that run borg at different frequencies for different folders (and also apply different pruning strategies). However, this means that borg instances started by different timers could request repo access at the same time, so some instances get access and others fail with "could not acquire lock".

I believe this is a use case for multiple "thematic" repos.
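A sketch of the "thematic" layout, where each theme has its own repo, folders and prune policy, so each timer only touches its own repo's lock. Theme names, repo paths and keep counts are hypothetical; the commands use borg 1.x's `borg create` and `borg prune --keep-hourly/--keep-daily/--keep-monthly` options. (Within a single shared repo, borg's `--lock-wait SECONDS` option would instead let a run wait for the lock rather than fail right away.)

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Theme:
    repo: str  # hypothetical repo path for this theme
    paths: List[str]  # folders backed up by this theme's timer
    keep: Dict[str, int]  # prune policy, e.g. {"daily": 7} -> --keep-daily=7


# Hypothetical themes: documents fire hourly, media daily; separate repos, separate locks.
THEMES: Dict[str, Theme] = {
    "documents": Theme("/backup/documents", ["/home/me/Documents"], {"hourly": 24, "daily": 7}),
    "media": Theme("/backup/media", ["/home/me/Pictures", "/home/me/Music"], {"daily": 7, "monthly": 6}),
}


def commands_for(name: str) -> List[List[str]]:
    """Build the borg create + prune commands one theme's timer would run."""
    t = THEMES[name]
    create = ["borg", "create", f"{t.repo}::{name}-{{now}}", *t.paths]
    prune = ["borg", "prune", *[f"--keep-{k}={v}" for k, v in t.keep.items()], t.repo]
    return [create, prune]


for cmd in commands_for("media"):
    print(" ".join(cmd))
```

Each systemd timer would invoke only its own theme's commands, so a fast "documents" timer never blocks on a long "media" backup.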