Borg: Scalability questions

Created on 31 Aug 2019 · 6 Comments · Source: borgbackup/borg

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Question

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.1.10

Operating system (distribution) and version.

Debian 9

Hardware / network configuration, and filesystems used.

different NFS for source (data) and destination (repo)

How much data is handled by borg?

60 TB

Full borg commandline that led to the problem (leave away excludes and passwords)

Describe the problem you're observing.

My question is threefold:

First and foremost, is borg recommended for 60 TB (and growing) of data?

Is it possible to contain all backups within a single archive of a single repo?

Can a dangerous situation occur if a nightly script triggers a borg create backup without checking whether an earlier one is still running?


All 6 comments

  1. I do not administer any system that has that much data, so other users might be more qualified to answer this. In general, it might also depend on the file count and on system/network performance and resources.

  2. When using borg, one borg create run (aka "a backup") creates one new, additional borg archive in a borg repository.

    You can have multiple repositories (and that is advisable, esp. if you run multiple borg clients and if your preference is on performance rather than on space saving).

    You can have a few or many archives within a single repository (but maybe avoid having too many archives as that might impact performance of some operations).

  3. No, because borg locks the repository against concurrent access. There is a --lock-wait option that sets how many seconds borg waits for the lock to go away; the default is short. If the lock goes away in time, borg acquires it and does its operation. If not, it terminates with a non-success exit code.
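A minimal sketch of how a nightly script could rely on that lock instead of checking for running backups itself. The repository path, source path, archive prefix, and the 600-second wait are placeholder values (assumptions), not settings from this thread.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust REPO and SOURCE to your setup.
REPO="${REPO:-/path/to/repo}"
SOURCE="${SOURCE:-/path/to/data}"

nightly_backup() {
    # --lock-wait: wait up to this many seconds for the repository lock
    # instead of giving up almost immediately. 600 is an arbitrary choice.
    borg create --lock-wait 600 "${REPO}::nightly-$(date '+%Y-%m-%d')" "$SOURCE"
    rc=$?
    # borg exit codes: 0 = success, 1 = warning, 2 or higher = error
    if [ "$rc" -ge 2 ]; then
        echo "borg create failed with exit code $rc (the repository may still be locked)" >&2
    fi
    return "$rc"
}
```

Calling nightly_backup from cron then either waits for the earlier run to release the lock or fails with a non-zero exit code, so two runs never write to the repository at the same time.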

One thing you should maybe consider is running the borg client on the system that has local access to the data (not via NFS).

Also, consider running the borg "server" (borg serve invoked via ssh from the client) on the system that has local access to the repository (not via NFS).

I'd guess performance will be much better, and maybe reliability too.

As long as a filesystem works correctly and reliably, there is no problem for borg to use it. But network filesystems introduce quite some overhead and have more issues than (good) local filesystems.
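A rough sketch of that client/server layout, run on the machine that holds the data locally. The user name, host name, paths, and archive prefix are made up for illustration (assumptions).

```shell
#!/bin/sh
# Placeholder values (assumptions): user, host, and paths are hypothetical.
REPO_URL="${REPO_URL:-ssh://backup@nas.example.com/srv/borg/repo}"
SOURCE="${SOURCE:-/srv/data}"

remote_backup() {
    # borg connects over ssh and starts "borg serve" on the repository host,
    # so the data is read locally here and the repository is written locally
    # there; neither side goes through NFS.
    borg create "${REPO_URL}::data-$(date '+%Y-%m-%d')" "$SOURCE"
}
```

This assumes borg is installed on both machines and that the client can log in to the repository host via ssh.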

Thanks for the elaborate responses @ThomasWaldmann

One thing I wanted to clarify about my second question is whether borg can append to an existing archive. I want to know if succeeding backups can be added incrementally to a single, large archive, because my less tech-inclined client would then be able to mount that archive themselves and find the missing files.

It's not a big issue but would help discover files better across several backup runs. If this isn't possible I plan on using the borg create together with ::Backup-$(date '+%Y-%m-%d'). Any comments on this? :)

Greatly appreciate the tip about running borg over ssh directly on the storage device (my NAS). I will experiment with both setups to check out the performance :)

You cannot "append" to an existing backup archive.

You always create a new full archive, but it will feel like an incremental backup speed-wise because it only needs to transfer/store the new chunks.

borg mount makes it relatively easy to find what one is searching for, but be careful with scalability especially with that.

One usually puts the date/time into the archive name, yes. Maybe also machine name and dataset name (if there are multiple different ones per machine).
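As an illustration of that naming scheme, here is a small sketch that builds an archive name from machine name, dataset label, and date. The function names, the dataset label, and the repository path are hypothetical.

```shell
#!/bin/sh
# Placeholder value (assumption): adjust REPO to your setup.
REPO="${REPO:-/path/to/repo}"

archive_name() {
    # $1 = dataset label, e.g. "projects" (hypothetical)
    # Result looks like: <machine>-<dataset>-<YYYY-MM-DD>
    printf '%s-%s-%s\n' "$(uname -n)" "$1" "$(date '+%Y-%m-%d')"
}

backup_dataset() {
    # $1 = dataset label, $2 = path to back up
    borg create "${REPO}::$(archive_name "$1")" "$2"
}
```

For example, backup_dataset projects /srv/projects would create one new archive per day for that dataset. borg also has its own placeholders such as {hostname} and {now} for archive names; see borg help placeholders.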

but be careful with scalability especially with that.

Care to elaborate? :)

If you put a huge number of files (like 60TB in small files) and a large number of archives into one repository and then mount the repository, that could use quite a lot of time and memory (esp. if you then trigger reading the metadata of all or many archives by visiting the corresponding top-level directories).

It is possible to only mount a single archive though or even only part of a single archive, but in general, reading a lot of metadata takes a lot of time.

Similar to tar, borg archives are a sequential stream (of metadata only, no data in case of borg).
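A short sketch of mounting only one archive, or only part of one, rather than the whole repository. The repository path, mountpoint, and function names are placeholders (assumptions).

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust REPO and MOUNTPOINT to your setup.
REPO="${REPO:-/path/to/repo}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/borg}"

mount_archive() {
    # $1 = archive name; optional $2 = path inside the archive.
    # Passing a path restricts the mount to that part of the archive.
    if [ -n "${2:-}" ]; then
        borg mount "${REPO}::$1" "$MOUNTPOINT" "$2"
    else
        borg mount "${REPO}::$1" "$MOUNTPOINT"
    fi
}

unmount_archive() {
    borg umount "$MOUNTPOINT"
}
```

Mounting a single named archive avoids loading the metadata of every archive in the repository, which is the expensive case described above.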
