Borg: Borg to compress partclone images

Created on 5 Dec 2016 · 19 Comments · Source: borgbackup/borg

I'm trying to save space on server backups using borg but the savings are practically nonexistent...

I use lvm to take a snapshot of the mounted root file system, partclone to create an image of the snapshot (to get a backup file whose size is about that of the files in the root partition) and finally borg to compensate for the lack of differential/incremental features in partclone... alas, borg doesn't seem to work with it.

I read that VM snapshots were supported and I thought that this could be similar. I also tried to increase the number of chunks, to little benefit...

Do you have any advice?

All 19 comments

Please provide a little more detail on your workflow, as well as the output of borg info before/after the increments you do.

The packing done by partclone likely eliminates dedupable runs, or the packing might not be deterministic ... try it without partclone.

Thanks!

@enkore
Would the chunks be completely different between runs if the images had just a tiny difference at the beginning of the file?

@RonnyPfannschmidt

backup commands sample

lvcreate --size 1G -s -n root_2016-12-05_03-00 /dev/vg/root

partclone.ext4 -c -s /dev/vg/root_2016-12-05_03-00 -o /mnt/backup/partclone/root_2016-12-05_03-00.img

borg create --chunker-params=10,23,16,4095 /mnt/backup/borg::root_2016-12-05_03-00 /mnt/backup/partclone/root_2016-12-05_03-00.img

borg list

root_2016-12-02_10-34                Fri, 2016-12-02 10:43:14
root_2016-12-03_03-00                Sat, 2016-12-03 03:09:44
root_2016-12-04_03-00                Sun, 2016-12-04 03:09:40
root_2016-12-05_03-00                Mon, 2016-12-05 03:09:41

borg info

Name: root_2016-12-05_03-00
Username: root
Time (start): Mon, 2016-12-05 03:09:41
Time (end):   Mon, 2016-12-05 03:32:02
Command line: /usr/bin/borg create --chunker-params=10,23,16,4095 /mnt/backup/borg::root_2016-12-05_03-00 /mnt/backup/partclone/root_2016-12-05_03-00.img
Number of files: 1

                       Original size      Compressed size    Deduplicated size
This archive:               20.08 GB             20.08 GB             19.86 GB
All archives:               78.14 GB             78.15 GB             77.49 GB

                       Unique chunks         Total chunks
Chunk index:                  200475               223991
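As a quick sanity check on the figures above, the savings can be worked out directly from the borg info output. This is only arithmetic on the numbers quoted in the thread; the helper name savings_percent is made up for this sketch.

```python
def savings_percent(original_gb: float, deduplicated_gb: float) -> float:
    """Percentage of space saved relative to the original size."""
    return (1.0 - deduplicated_gb / original_gb) * 100.0


# Figures copied from the borg info output above.
this_archive = savings_percent(20.08, 19.86)
all_archives = savings_percent(78.14, 77.49)

# Share of chunks in the index that are unique (not shared by any archive).
unique_share = 200475 / 223991 * 100.0

print(f"this archive saved: {this_archive:.1f}%")
print(f"all archives saved: {all_archives:.1f}%")
print(f"unique chunks:      {unique_share:.1f}%")
```

Both savings figures come out at roughly one percent, which matches the "practically nonexistent" savings described in the original post.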

Ah! I need partclone so that the backups don't take up as much space as the whole partition I'm backing up (e.g., dd on a 1 TB partition snapshot generates a 1 TB file).

> Would the chunks be completely different between runs if the images had just a tiny difference at the beginning of the file?

No, differences would have to be all over. E.g., one flipped bit per MB would be sufficient.

Normally deduplication will already handle unallocated areas quite well, so there's likely no real need for partclone.

@enkore
So it doesn't work as I think it does...

File A
1 2 3 4 5 6 7 8

File B
2 3 4 5 6 7 8 9

No chunk, whatever the size, is the same.

@wc-matteo borg uses variable-sized chunks and cuts chunks at points indicated by a content hash, so even shifted content like you pictured would work OK.
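To illustrate why shifted content still deduplicates, here is a minimal sketch of content-defined chunking. It is not Borg's chunker (Borg uses a Buzhash-based rolling hash); a simpler Gear-style hash stands in for it, and MIN_SIZE, MAX_SIZE, CUT_BITS and the function names are assumptions chosen for the demo.

```python
import hashlib
import random
from typing import List, Set

# Hypothetical parameters for this sketch; they are not Borg's defaults.
MIN_SIZE = 64     # never cut a chunk shorter than this
MAX_SIZE = 8192   # always cut once a chunk reaches this length
CUT_BITS = 8      # cut when the top CUT_BITS bits of the hash are zero

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte


def chunk(data: bytes) -> List[bytes]:
    """Split data at positions chosen by a rolling hash of recent bytes."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit value, so the hash
        # depends only on the last 32 bytes, not on the file offset.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i + 1 - start
        at_cut_point = length >= MIN_SIZE and (h >> (32 - CUT_BITS)) == 0
        if at_cut_point or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_ids(data: bytes) -> Set[str]:
    """Identify each chunk by the SHA-256 of its content."""
    return {hashlib.sha256(c).hexdigest() for c in chunk(data)}


data_rng = random.Random(1)
file_a = bytes(data_rng.getrandbits(8) for _ in range(100_000))
file_b = file_a[1:] + b"9"  # drop the first byte, append a new one (like File B)

ids_a, ids_b = chunk_ids(file_a), chunk_ids(file_b)
shared = len(ids_a & ids_b) / len(ids_a)
print(f"{len(ids_a)} chunks in A, {shared:.0%} of them also present in B")
```

Because cut points depend on the bytes themselves rather than on fixed offsets, the two chunkings fall back into step shortly after the shifted start, and only the chunks near the beginning and end of the file differ. With fixed-size blocks, every block of File B would differ from File A.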

About needing partclone to not generate a huge output file: you can pipe content into borg or let borg directly read special files, read about it there:

Thanks @ThomasWaldmann
I need to read about variable sized chunks/content hash splitting then.

I saw --read-special but I thought I wanted something easy to restore from bare-metal (partclone images should be compatible with clonezilla)... but having already introduced borg into the mix, I guess it doesn't make a big difference now to do what you suggest.

borg extract --stdout /path/to/repo::arch dev/vg0/home-snapshot > /dev/vg0/home

I don't completely understand this command.

Do you have to create both dev/vg0/home-snapshot and /dev/vg0/home before? Why can't you output directly to the block device you want restored (/dev/vg0/home)? This copies one file at a time? The step where you restore the LV sizes as they were is not _really_ necessary, it'd work for any size able to contain the files, right?

"dev/vg0/home-snapshot" refers to a file in the borg archive called "arch".
Due to --stdout, it won't be extracted as a file into the filesystem; instead, all its contents will be output via stdout, which is redirected into the device file /dev/vg0/home, which needs to exist beforehand and have the right size.

The step with restoring the LV sizes is (more or less) necessary to create a precise fit. Of course, you could have a larger LV than the archived content, but then you'd waste space.
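The redirect described above can also be sketched in Python: run a command and write its stdout over a target that already exists. The function names are hypothetical, and the demo uses a stand-in command and a temporary file rather than borg and a real LV.

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def borg_extract_cmd(repo: str, archive: str, member: str) -> List[str]:
    """Build the extract command quoted in the thread."""
    return ["borg", "extract", "--stdout", f"{repo}::{archive}", member]


def stream_into_existing(cmd: List[str], target: str) -> None:
    """Run cmd and write its stdout over target, which must already exist."""
    # "r+b" fails if the target is missing and does not truncate it,
    # mirroring the requirement that the device exists beforehand.
    with open(target, "r+b") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Demo with a stand-in command and a pre-sized temporary file (assumption:
# this only imitates restoring into a device of fixed size).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 16)
demo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write('restored')"]
stream_into_existing(demo_cmd, tmp.name)
with open(tmp.name, "rb") as f:
    result = f.read()
os.unlink(tmp.name)
print(result)
```

For a real restore, borg_extract_cmd("/path/to/repo", "arch", "dev/vg0/home-snapshot") would be passed with /dev/vg0/home as the target, which is equivalent to the shell redirect shown in the thread.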

Ah! So with _read special_ you read the block device contents for the purposes of chunking but you still _refer_ to it as a single file?

The LV size should also include free space. If you have a 1 TB LV with just 10 GB of it actually used, could you restore to a new 10 GB LV? I mean, what borg does is output every file (in this case the target LV would need to have a filesystem) or...?

> Ah! So with read special you read the block device contents for the purposes of chunking but you still refer to it as a single file?

Correct. Like the help says, normally Borg won't read block or character devices (and will entirely ignore leaked objects¹), but rather store the mknod information of them. With --read-special any kind of readable FS object is read -- its contents are read --, including block devices like disks, partitions, VGs etc., even pipes or sockets.²

Which is where we come to your next question

> The LV size should also include free space. If you have a 1 TB LV with just 10 GB of it actually used, could you restore to a new 10 GB LV? I mean, what borg does is output every file (in this case the target LV would need to have a filesystem) or...?

Since Borg treats the LV as a file in this mode it's read like a file. You will need a 1 TB LV to restore 1 TB LV, unless you want to truncate it. The LV could be sparse, though.

If you routinely have cases with very large used-to-available/actual-size ratios it might make more sense to mount the LV snapshot and create a backup from that, which would also allow Borg's change detection to kick in, which is much faster than chunking (because it's metadata based).

Borg is used by quite a few people to back up VMs; usually their disk images are backed up directly (at something like ~400 MB/s deduplication rate that's fine for a couple -- or a couple dozen -- GB, but at 1+ TB it kinda takes a while, especially if there are only a couple of gigs of actual payload in there).


¹ UDS (unix domain sockets), doors, fifos and so on.
² You probably could even do tar | netcat on one host and netcat >fd, borg /proc/.../fd on another.

Thank you very much @enkore. So informative (and informed)! 👍

Thinking about a bare-metal restore scenario, would it make much difference to have a single LV file as with --read-special vs. just the contents as with the mounted snapshot? I guess it would be faster with the single LV file, as with the other method you first have to create a file system and then copy all the files... anything else?

Unless the FS is very full or has a great many small files I'd assume that backing the contents up (mounting the snapshot) and restoring onto a fresh partition would be faster, since only the file contents have to be read (backup) / written (restore), not any "unused" space (not that Borg would know) -- even though Borg will dutifully deduplicate unallocated areas, it will still have to write them out to disk when restoring.

Also when mounting the snapshot the metadata-based change detection works and will speed backups up.

metadata change = file size + file last mod?

and if you used a sparse file for the LV file restore? Could it be done?

> metadata change = file size + file last mod?

+ inode
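A metadata check of this kind can be sketched as follows. This is a simplified illustration of comparing size, modification time and inode between runs, not Borg's actual files cache; the names fingerprint and needs_rechunking are made up for the example.

```python
import os
import tempfile
from typing import Dict, Tuple

Fingerprint = Tuple[int, int, int]  # (size, mtime in ns, inode)


def fingerprint(path: str) -> Fingerprint:
    """Collect the metadata used to decide whether a file changed."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def needs_rechunking(path: str, cache: Dict[str, Fingerprint]) -> bool:
    """True if the file is new or its metadata changed; updates the cache."""
    current = fingerprint(path)
    changed = cache.get(path) != current
    cache[path] = current
    return changed


cache: Dict[str, Fingerprint] = {}
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    with open(path, "w") as f:
        f.write("hello")
    first = needs_rechunking(path, cache)    # new file, never seen before
    second = needs_rechunking(path, cache)   # untouched since the last check
    with open(path, "a") as f:
        f.write(" world")                    # the size changes
    third = needs_rechunking(path, cache)

print(first, second, third)
```

Only a stat call is needed per unchanged file, which is why this path is so much cheaper than reading and chunking the whole content. A single image file or block device defeats it, since any change anywhere forces the entire file to be read again.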

> and if you used a sparse file for the LV file restore? Could it be done?

Yes, that should work. It can still have some overhead due to areas that were used but are now deallocated, which wouldn't be zeroed (but still deduplicated), so they are not sparse in a technical sense. Not 100% sure if the Linux IO/LVM stack has some additional smarts here (perhaps internal TRIM?).

Reading through sparse areas -- which ofc. can only happen if the LV itself is sparse, I assume -- is quite a bit faster, at least with regular sparse files, since no IO is involved, hence no iowait.
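Writing a stream out as a sparse file can be sketched like this: blocks that contain only zeros are skipped with a seek instead of being written. This is a generic illustration, not how Borg restores data; the function name write_sparse and the block size are assumptions, and whether holes are actually created depends on the target filesystem.

```python
import io
import os
import tempfile

BLOCK = 4096  # hypothetical block size for this sketch


def write_sparse(src, dst_path: str, block: int = BLOCK) -> int:
    """Copy src to dst_path, seeking over all-zero blocks.

    Returns the number of blocks that were skipped instead of written.
    """
    skipped = 0
    zero = bytes(block)
    with open(dst_path, "wb") as dst:
        while True:
            buf = src.read(block)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                dst.seek(len(buf), os.SEEK_CUR)  # leave a hole
                skipped += 1
            else:
                dst.write(buf)
        dst.truncate(dst.tell())  # fix the length if the data ends in a hole
    return skipped


# Demo: one data block, three zero blocks, then a short tail of data.
original = b"A" * BLOCK + bytes(3 * BLOCK) + b"B" * 100
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "restored.img")
    skipped_blocks = write_sparse(io.BytesIO(original), target)
    with open(target, "rb") as f:
        restored = f.read()

print(skipped_blocks, restored == original)
```

As noted above, this only helps for areas that really are zero. Deallocated blocks that still hold old data are written out like any other content.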

Thanks again @enkore!
I'll try one (or both) of the solutions and maybe post the results.

@wc-matteo, did any of the solutions work for you?

I didn't test them alas... I ended up buying... a commercial product. Much more reliable and manageable than something you can come up with combining a bunch of scripts.

The only advice I can give is do NOT use partclone. Not only does it not play well with borg, but I had a lot of trouble mounting one of the images to restore a file.

closing this. if partclone does not dedup well, it is due to the structure of partclone output - outside of the scope of borg.
