Borg: Possible borg/ssh/network issue

Created on 20 Feb 2017  ·  7 Comments  ·  Source: borgbackup/borg

Hello,

This is my first post in borgbackup. So, first and foremost, I would like to thank Thomas Waldmann and the Borg Collective for such a great backup tool. Borg Backup is hands down the best backup tool I have used in years, is relatively easy to set up and has proven reliable. Can't ask for more than that :)

I have an issue, however, and am not sure whether it is Borg Backup or ssh/network related. So, I would appreciate other pairs of eyes on it.

I run Borg on the backup server only and access client folders using sshfs. The server has 2GB of memory and 2 CPUs. For full spec, it is a Digital Ocean $20 per month droplet with enough Block Storage attached for my needs.

One of my backups is a remote directory of 70GB in size. It is in a different data centre from my backup server and on another continent. I ran my first backup of it recently, which took some time. I tested restores successfully.

However, here are the stats from the weekend run:

Time (start): Sun, 2017-02-19 21:46:52
Time (end):   Mon, 2017-02-20 03:15:48
Command line: /usr/bin/borg create -v /borg/xxxx::2017-February-19 /mnt/xxxx /mnt/yyy (names edited out)
Number of files: 307

                       Original size      Compressed size    Deduplicated size
This archive:               40.53 GB             40.53 GB            102.83 MB
All archives:              283.72 GB            283.73 GB             41.10 GB

                       Unique chunks         Total chunks
Chunk index:                   17877               112040

As you can see, this looks like it took 5 and a half hours to run, but the deduplicated size is only 102 MB. Am I correct ? Also, is there anything I can do to speed this up ?

Thanks for any help.

question

All 7 comments

Only 307 files. I think we can rule out a very low ping!?

My guess would be that the Borg cache is not being used for some reason.
Is the remote-path always the same? No changing id or something like that?

Thanks. Is there any way to verify whether the cache is being used ?
If you are talking about ~/root/.cache/borg on the backup server, that does exist and has data in it. Is there any way to explicitly tell Borg to use this ?

Or do you mean something else ?

Thanks ;)

Can you explain a bit how you run borg? From the command line it looks like the backup data is read from /mnt/xxxx. Do you run the "borg create" on the storage server and mount the data of the remote server with the 70 GB via sshfs/some network filesystem?

In this case:

  • Running "borg create" where the data is, is always the most efficient way (I/O-bandwidth-wise) to run things (this would be where the 70 GB are)

    • But it also means that both servers will need to synchronize their caches, which may take extra time (linear to number of archives).

  • When using sshfs and other networked FSes certain metadata is never stable, like inode IDs. Try the --ignore-inode option, which should speed things up (but may make the change detection less accurate, so preferably don't use it unless really needed)
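One way to check whether inode numbers really change between sshfs mounts is to record them before unmounting and compare after remounting. This is only a rough sketch: the helper names `snapshot_inodes` and `inodes_stable` are made up for illustration, and it assumes GNU find (for `-printf`).

```shell
# Print "inode relative-path" for every regular file under a directory,
# sorted by path. Assumes GNU find (-printf is not POSIX).
snapshot_inodes() {
    find "$1" -type f -printf '%i %P\n' | sort -k2
}

# Succeeds (exit status 0) when two snapshot files are identical,
# i.e. no inode number changed between the two runs.
inodes_stable() {
    diff "$1" "$2" > /dev/null
}
```

For example, run `snapshot_inodes /mnt/xxxx > /tmp/inodes.before`, unmount and remount the sshfs share, run it again into `/tmp/inodes.after`, then call `inodes_stable /tmp/inodes.before /tmp/inodes.after`. If that fails, the inode numbers are not stable across mounts, which would fit the slow runs described above.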

I am running borg via a shell script on the backup server itself and in no other way. Every backup is done via ssh to the clients. For environmental reasons, I don't want to install Borg on any clients. All the work is done by and on the backup server.

Here is an extract from my shell script (with confidential information edited out):

sshfs -o nonempty 1.2.3.4:/somedir /mnt/somedir
sshfs -o nonempty 1.2.3.4:/someotherdir /mnt/someotherdir
# $DATE and $LOG are set elsewhere in the script
borg create -v "/borg/somerepo::$DATE" /mnt/somedir /mnt/someotherdir >> "$LOG"
# Append the stats of the most recent archive to the log
for i in $(borg list /borg/somerepo | awk '{print $1}' | tail -1)
do
    borg info "/borg/somerepo::$i" >> "$LOG"
done
fusermount -u /mnt/somedir
fusermount -u /mnt/someotherdir
borg prune -v /borg/somerepo --keep-within=7d --keep-weekly=4 --keep-monthly=3

This appends all the information I want to a log, which is emailed to me once the script completes. This way, I only get one backup email every night, as opposed to the dozens (and hundreds) of emails I used to get with other backup tools.

Every time I add new folders from a client to the backup, I simply add another snippet to the shell script, which runs under cron.

Is this enough information ? Please let me know if you need more. Thanks.

I will hold off using the --ignore-inode option until I hear from you, as I don't want to make this any less reliable.

I have such a high opinion of Borg and such good experiences with it (apart from this one issue) that I am now moving to use it as my sole Production backup tool and get rid of everything else.

To expand on enkore's point, imagine a large file has one byte added to the end. With your set-up using sshfs, the backup client running on the server will have to read the entire file over the network and chunk it in order to see that only the last chunk needs to be added to the repository. If, on the other hand, the client runs on the same machine as the files, the chunking happens locally, and only the one new chunk goes over the network. So you should really consider running the client locally. It is pretty easy to do with the pre-compiled standalone binaries.
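The stats from the weekend run seem consistent with this. As a back-of-the-envelope check, assuming the full 40.53 GB "Original size" was read over sshfs and treating GB as 10^9 bytes (both are assumptions, not something the stats state directly), the average read rate can be worked out from the start and end times. The helper names are illustrative and GNU date is assumed for `-d`.

```shell
# Seconds between two timestamps. Assumes GNU date (-d); -u avoids DST effects.
elapsed_seconds() {
    echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

# Average rate in MB/s (decimal megabytes) for a byte count over a duration.
rate_mb_per_s() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", bytes / secs / 1e6 }'
}

secs=$(elapsed_seconds "2017-02-19 21:46:52" "2017-02-20 03:15:48")
echo "elapsed: ${secs} s"            # prints "elapsed: 19736 s"
rate_mb_per_s 40530000000 "$secs"    # prints "2.05"
```

Roughly 2 MB/s sustained for about five and a half hours is plausible for an intercontinental sshfs link, whereas the 102.83 MB that was actually new would have taken under a minute at that rate.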

Thanks. That does make sense. I will look into doing this on each client with a remote repository on the backup server itself. I will, however, have to make some other changes in the environment in order to make this bird fly.
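For reference, a client-side run against a repository on the backup server might look roughly like the sketch below. The user and hostname `backup@backup.example.com` and the function name `run_client_backup` are placeholders, not values from this thread. `BORG` can point at a standalone binary, and the archive name follows the same year-month-day pattern as the archives above.

```shell
# Hypothetical values; adjust to your environment.
REPO="${REPO:-backup@backup.example.com:/borg/somerepo}"
BORG="${BORG:-borg}"   # e.g. path to a pre-compiled standalone binary

run_client_backup() {
    # Files are read and chunked locally on the client;
    # only chunks missing from the repository are sent over ssh.
    "$BORG" create -v "${REPO}::$(date +%Y-%B-%d)" "$@"
}
```

It would be called on the client as `run_client_backup /somedir /someotherdir`, with ssh access from the client to the backup server already set up.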

Ok, done for the same client I referenced above and this is much more like it:

Time (start): Tue, 2017-02-21 13:28:01
Time (end):   Tue, 2017-02-21 13:28:02
Duration: 0.14 seconds
Number of files: 304

                       Original size      Compressed size    Deduplicated size
This archive:               40.53 GB             40.53 GB            698.56 kB
All archives:               81.06 GB             81.07 GB             40.47 GB

                       Unique chunks         Total chunks
Chunk index:                   15964                32095

What's not to love ?

I also tested restoring on both the server with the Borg repo and the client itself and both restores took minutes rather than the hour I was getting previously.

Thank you all very much.
