Archive size: 400-500G (20~30 archive dates, expected restore size of ~90% of the total archive size)
After restoring around 60-70G, RAM usage was going up exponentially (very fast: few minutes?), going from 100-200mb to a few gigs (10-16G) till killed by OOM while the system became unstable as also other things where killed by OOM (when no RAM or SWAP was free anymore).
When RAM went up, it seemed that nothing, or hardly anything was saved to the disk. I had the feeling that nothing was written to disk but the data was kept in memory. (but this last can be a wrong assumption, and might point to a memory leak or a recursion,..)
example vars:
[email protected]
path=/mount/backups/borgarchivename
archive_backup_date=2020-05-01
restore_path=/mount/restored
command used with using the internal ssh of borg:
borg export-tar "${backfrom}:${path}::${archive_backup_date}" - | tar -C $restore_path -x
But using borg over ssh resolved that issue, and only ~200mb was used on the system that had the backup (where we connect to):
ssh ${backfrom} "borg export-tar --tar-filter='gzip -2' \"${path}::${archive_backup_date}\" -" | tar -C $restore_path -xz
The filesystem on source and target is ZFS.
Network transfer was at least 100Mbit, could have been 1Gbit. (transfer speed was for site ~10mb/s, maybe it was 2-3 times faster, but can't remember)
Checked and tested with 1.1.13.
if you can reproduce the issue, can you also try with 1.1.14 and 1.1.15-beta/rc?
@ThomasWaldmann If I recall correctly, I tested with the latest version that was available 2 months ago when I was testing this. If I'm correct it was 1.1.14.
1.1.15-beta/rc isn't yet available on my system. (I didn't mentioned it since I wasn't sure about it, only that I tried a more recent version that was available at that time.)
OK, this looks interesting. Especially the difference between:
That somehow points to an issue in the RemoteRepository code or to how it is used - maybe preloading gone wrong (again)?
Somehow this behaves like the issue fixed in borg 1.1.12.
So @Sephtex are you pretty sure you had this with borg >= 1.1.12 ?
Also, the command lines you gave are complete?
It is important to know whether you tar-exported the whole archive or only parts of the archive.
If you can still reproduce, could you please:
So @Sephtex are you pretty sure you had this with borg >= 1.1.12 ?
Yes, I'll test this again today with the latest version of borg available on gentoo: 1.1.14
Also, the command lines you gave are complete?
Yes, I only changed the variable names as some where filled in with jinja2 through ansible
It is important to know whether you tar-exported the whole archive or only parts of the archive.
The biggest part of the archive is extracted, it has incremental backups of multiple weeks. But most of the data didn't changed, so I suspect that 90% or more should be copied, which it does when I use borg over ssh.
So when testing with 1.1.14:
borg export-tar /mount/backups/borgarchivename archive.tar
No issue, the archive went over 80G
Will also test the same with a tarpipe.
borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv
Around 59G extracted, RAM usage went over 4G, but the transfer is still ongoing:
Extra info about size of the original data:
NAME AVAIL USED USEDSNAP USEDDS USEDREFRESERV USEDCHILD
zfs-store/lxd/container-persistent/nextcloud/data 76.3G 374G 2.21G 372G 0B 0B
Archive name: 2020-12-14
Archive fingerprint: 353458696428cb766bbd1ec4619e3136c8b65e0da985a8f52da1de36736c31ce
Time (start): Mon, 2020-12-14 03:28:10
Time (end): Mon, 2020-12-14 08:58:53
Duration: 5 hours 30 minutes 42.61 seconds
Number of files: 176105
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
Original size Compressed size Deduplicated size
This archive: 413.93 GB 402.10 GB 256.65 MB
All archives: 5.04 TB 4.90 TB 351.35 GB
Unique chunks Total chunks
Chunk index: 261822 3929117
md5-7988b4d178de93a32e1fe39e75537903
Backup archive size: 302GB
Contents of archive /path/to/archive at [email protected]:
Archive Orig Comp Dedup
2020-09-28 381G 370G 6.1M
2020-10-05 382G 371G 2.0M
2020-10-12 382G 371G 11M
2020-10-19 382G 371G 3.2M
2020-10-26 382G 371G 2.7M
2020-11-02 383G 372G 2.2M
2020-11-09 383G 372G 24M
2020-11-16 385G 374G 26M
2020-11-23 385G 374G 1.2M
2020-11-30 386G 375G 1.9M
2020-12-07 386G 375G 15M
2020-12-14 386G 375G 245M
My question about whether you extracted the archive completely related to inclusion / exclusion / strip usage (when used, these usually do not extract completely and the code internally has to work differently).
borg export-tar /mount/backups/borgarchivename archive.tar
borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv
You modified 2 parameters here, local vs ssh repo and file vs. stdout.
@ThomasWaldmann ,
My question about whether you extracted the archive completely related to inclusion / exclusion / strip usage (when used, these usually do not extract completely and the code internally has to work differently).
I extract an archive completely, but I killed it after a point where the extraction starts to use an extensive amount of RAM.
You modified 2 parameters here, local vs ssh repo and file vs. stdout.
Yes I am/was testing the multiple cases to check and prove that 1 doesn't have the same issue as the other.
Yesterday I was still testing with the ssh build inside of borg, and with 1.1.14 I noticed that I seem to get a stacktrace at the point where 1.1.13 started to consume an extensive amount of RAM. So I might not have tested 2 months ago with 1.1.14 since it will probably not have been available yet.
So yesterday with the command:
borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv
RAM usage of borg was 379MiB, and then suddenly I got:
Local Exception
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 176, in wrapper
return method(self, args, repository=repository, **kwargs)
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 189, in wrapper
return method(self, args, repository=repository, manifest=manifest, key=key, archive=archive, **kwargs)
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 889, in do_export_tar
self._export_tar(args, archive, tar)
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 1031, in _export_tar
tar.addfile(tarinfo, stream)
File "/usr/lib/python3.7/tarfile.py", line 1972, in addfile
copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
File "/usr/lib/python3.7/tarfile.py", line 247, in copyfileobj
buf = src.read(bufsize)
File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2081, in read
read_data = self._read(nbytes)
File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2073, in _read
remaining = self._refill()
File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2061, in _refill
chunk = next(self.chunk_iterator)
File "/usr/lib/python3.7/site-packages/borg/archive.py", line 251, in fetch_many
for id_, data in zip(ids, self.repository.get_many(ids, is_preloaded=is_preloaded)):
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 937, in get_many
for resp in self.call_many('get', [{'id': id} for id in ids], is_preloaded=is_preloaded):
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 812, in call_many
data = os.read(fd, BUFSIZE)
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 4591, in main
exit_code = archiver.run(args)
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 4523, in run
return set_ec(func(args))
File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 176, in wrapper
return method(self, args, repository=repository, **kwargs)
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 634, in __exit__
self.rollback()
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 477, in do_rpc
return self.call(f.__name__, named, **extra)
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 712, in call
for resp in self.call_many(cmd, [args], **kw):
File "/usr/lib/python3.7/site-packages/borg/remote.py", line 812, in call_many
data = os.read(fd, BUFSIZE)
MemoryError
Platform: Linux GF63-Thin-95C 5.4.48-gentoo #1 SMP Sun Sep 6 11:24:04 CEST 2020 x86_64
Linux: Gentoo Base System 2.7
Borg: 1.1.14 Python: CPython 3.7.9 msgpack: 0.5.6
PID: 1929 CWD: /path/to/extract/dir/test
sys.argv: ['/usr/lib/python-exec/python3.7/borg', 'export-tar', '[email protected]:/path/to/lxd-container-persistent/nextcloud/data/borg/2020-09::2020-11-02', '-']
SSH_ORIGINAL_COMMAND: None
path/to/file/that/failed.tar.bz2
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
So the point where it failed is that path/to/file/that/failed.tar.bz2, it's just an archive that was created from an installed game. There isn't anything fancy, nor any special characters. I tried twice, and got twice the same result.
MemoryError just means it ran out of available memory.
@Sephtex I am quite confident I have found the fix to the memory leak you discovered.
Can you try the code from PR #5569 - esp. the change in the 3rd commit?
The first commit doesn't apply to your usecase, but likely fixes a memory leak if one does only a partial extraction.
Side note: the code now corresponds to what we already had for a long time in borg extract, so this bug only affects borg export-tar. I also did a global review of the code to find other problematic places, but extract and export-tar are the only users and both are correct now.
@LocutusOfBorg @FelixSchwarz ^^^ maybe interesting for patching stable packages.
Fixes will be also in borg 1.1.15.
Cool, thanks @ThomasWaldmann ! I'll try the patch, and report back when I got it tested!
@ThomasWaldmann , I can confirm that the 3rd commit from #5569 fixed the issue!
Only give or take 50% of RAM is in use by the process now, and the file where an exponential RAM usage was seen in the past without the file even coming through is resolved. The file comes through now!