Borg: memory leak in borg export-tar with remote (ssh:) repo

Created on 15 Dec 2020  路  16Comments  路  Source: borgbackup/borg

Archive size: 400-500G (20~30 archive dates, expected restore size of ~90% of the total archive size)

After restoring around 60-70G, RAM usage was going up exponentially (very fast: few minutes?), going from 100-200mb to a few gigs (10-16G) till killed by OOM while the system became unstable as also other things where killed by OOM (when no RAM or SWAP was free anymore).

When RAM went up, it seemed that nothing, or hardly anything was saved to the disk. I had the feeling that nothing was written to disk but the data was kept in memory. (but this last can be a wrong assumption, and might point to a memory leak or a recursion,..)

example vars:

[email protected]
path=/mount/backups/borgarchivename
archive_backup_date=2020-05-01
restore_path=/mount/restored

command used with using the internal ssh of borg:

borg export-tar "${backfrom}:${path}::${archive_backup_date}" - | tar -C $restore_path -x

But using borg over ssh resolved that issue, and only ~200mb was used on the system that had the backup (where we connect to):

ssh ${backfrom} "borg export-tar --tar-filter='gzip -2' \"${path}::${archive_backup_date}\" -" | tar -C $restore_path -xz

The filesystem on source and target is ZFS.
Network transfer was at least 100Mbit, could have been 1Gbit. (transfer speed was for site ~10mb/s, maybe it was 2-3 times faster, but can't remember)

Checked and tested with 1.1.13.

bug

All 16 comments

if you can reproduce the issue, can you also try with 1.1.14 and 1.1.15-beta/rc?

@ThomasWaldmann If I recall correctly, I tested with the latest version that was available 2 months ago when I was testing this. If I'm correct it was 1.1.14.

1.1.15-beta/rc isn't yet available on my system. (I didn't mentioned it since I wasn't sure about it, only that I tried a more recent version that was available at that time.)

OK, this looks interesting. Especially the difference between:

  • borg repo access via ssh (and the tar stream over local pipe)
  • borg repo access locally (and the tar stream over ssh pipe)

That somehow points to an issue in the RemoteRepository code or to how it is used - maybe preloading gone wrong (again)?

Somehow this behaves like the issue fixed in borg 1.1.12.

So @Sephtex are you pretty sure you had this with borg >= 1.1.12 ?

Also, the command lines you gave are complete?

It is important to know whether you tar-exported the whole archive or only parts of the archive.

If you can still reproduce, could you please:

  • first reproduce the issue (watch extremely growing memory needs)
  • apply patch from #5569, then try to reproduce again, using the same borg command (does memory consumption still grow to extremes?)

So @Sephtex are you pretty sure you had this with borg >= 1.1.12 ?

Yes, I'll test this again today with the latest version of borg available on gentoo: 1.1.14

Also, the command lines you gave are complete?

Yes, I only changed the variable names as some where filled in with jinja2 through ansible

It is important to know whether you tar-exported the whole archive or only parts of the archive.

The biggest part of the archive is extracted, it has incremental backups of multiple weeks. But most of the data didn't changed, so I suspect that 90% or more should be copied, which it does when I use borg over ssh.

So when testing with 1.1.14:

  • with command (borg local):
borg export-tar /mount/backups/borgarchivename archive.tar

No issue, the archive went over 80G
Will also test the same with a tarpipe.

  • with command (borg ssh):
borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv

Around 59G extracted, RAM usage went over 4G, but the transfer is still ongoing:

  • I see the data-remote-extract getting bigger
  • RAM usage grows almost with the same speed at some moments

Extra info about size of the original data:

NAME                                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
zfs-store/lxd/container-persistent/nextcloud/data  76.3G   374G     2.21G    372G             0B         0B
Archive name: 2020-12-14
Archive fingerprint: 353458696428cb766bbd1ec4619e3136c8b65e0da985a8f52da1de36736c31ce
Time (start): Mon, 2020-12-14 03:28:10
Time (end):   Mon, 2020-12-14 08:58:53
Duration: 5 hours 30 minutes 42.61 seconds
Number of files: 176105
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:              413.93 GB            402.10 GB            256.65 MB
All archives:                5.04 TB              4.90 TB            351.35 GB

                       Unique chunks         Total chunks
Chunk index:                  261822              3929117



md5-7988b4d178de93a32e1fe39e75537903



Backup archive size: 302GB
Contents of archive /path/to/archive at [email protected]:
Archive                                 Orig    Comp    Dedup
2020-09-28                              381G    370G    6.1M
2020-10-05                              382G    371G    2.0M
2020-10-12                              382G    371G    11M
2020-10-19                              382G    371G    3.2M
2020-10-26                              382G    371G    2.7M
2020-11-02                              383G    372G    2.2M
2020-11-09                              383G    372G    24M
2020-11-16                              385G    374G    26M
2020-11-23                              385G    374G    1.2M
2020-11-30                              386G    375G    1.9M
2020-12-07                              386G    375G    15M
2020-12-14                              386G    375G    245M

My question about whether you extracted the archive completely related to inclusion / exclusion / strip usage (when used, these usually do not extract completely and the code internally has to work differently).

borg export-tar /mount/backups/borgarchivename archive.tar
borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv

You modified 2 parameters here, local vs ssh repo and file vs. stdout.

@ThomasWaldmann ,

My question about whether you extracted the archive completely related to inclusion / exclusion / strip usage (when used, these usually do not extract completely and the code internally has to work differently).

I extract an archive completely, but I killed it after a point where the extraction starts to use an extensive amount of RAM.

You modified 2 parameters here, local vs ssh repo and file vs. stdout.

Yes I am/was testing the multiple cases to check and prove that 1 doesn't have the same issue as the other.

Yesterday I was still testing with the ssh build inside of borg, and with 1.1.14 I noticed that I seem to get a stacktrace at the point where 1.1.13 started to consume an extensive amount of RAM. So I might not have tested 2 months ago with 1.1.14 since it will probably not have been available yet.

So yesterday with the command:

borg export-tar [email protected]:/path/to/archive::2020-11-02 - | tar -C data-remote-extract -xv

RAM usage of borg was 379MiB, and then suddenly I got:

Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 189, in wrapper
    return method(self, args, repository=repository, manifest=manifest, key=key, archive=archive, **kwargs)
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 889, in do_export_tar
    self._export_tar(args, archive, tar)
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 1031, in _export_tar
    tar.addfile(tarinfo, stream)
  File "/usr/lib/python3.7/tarfile.py", line 1972, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
  File "/usr/lib/python3.7/tarfile.py", line 247, in copyfileobj
    buf = src.read(bufsize)
  File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2081, in read
    read_data = self._read(nbytes)
  File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2073, in _read
    remaining = self._refill()
  File "/usr/lib/python3.7/site-packages/borg/helpers.py", line 2061, in _refill
    chunk = next(self.chunk_iterator)
  File "/usr/lib/python3.7/site-packages/borg/archive.py", line 251, in fetch_many
    for id_, data in zip(ids, self.repository.get_many(ids, is_preloaded=is_preloaded)):
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 937, in get_many
    for resp in self.call_many('get', [{'id': id} for id in ids], is_preloaded=is_preloaded):
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 812, in call_many
    data = os.read(fd, BUFSIZE)
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 4591, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 4523, in run
    return set_ec(func(args))
  File "/usr/lib/python3.7/site-packages/borg/archiver.py", line 176, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 634, in __exit__
    self.rollback()
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 477, in do_rpc
    return self.call(f.__name__, named, **extra)
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 712, in call
    for resp in self.call_many(cmd, [args], **kw):
  File "/usr/lib/python3.7/site-packages/borg/remote.py", line 812, in call_many
    data = os.read(fd, BUFSIZE)
MemoryError

Platform: Linux GF63-Thin-95C 5.4.48-gentoo #1 SMP Sun Sep 6 11:24:04 CEST 2020 x86_64
Linux: Gentoo Base System 2.7 
Borg: 1.1.14  Python: CPython 3.7.9 msgpack: 0.5.6
PID: 1929  CWD: /path/to/extract/dir/test
sys.argv: ['/usr/lib/python-exec/python3.7/borg', 'export-tar', '[email protected]:/path/to/lxd-container-persistent/nextcloud/data/borg/2020-09::2020-11-02', '-']
SSH_ORIGINAL_COMMAND: None

path/to/file/that/failed.tar.bz2
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

So the point where it failed is that path/to/file/that/failed.tar.bz2, it's just an archive that was created from an installed game. There isn't anything fancy, nor any special characters. I tried twice, and got twice the same result.

MemoryError just means it ran out of available memory.

@Sephtex I am quite confident I have found the fix to the memory leak you discovered.

Can you try the code from PR #5569 - esp. the change in the 3rd commit?

The first commit doesn't apply to your usecase, but likely fixes a memory leak if one does only a partial extraction.

Side note: the code now corresponds to what we already had for a long time in borg extract, so this bug only affects borg export-tar. I also did a global review of the code to find other problematic places, but extract and export-tar are the only users and both are correct now.

@LocutusOfBorg @FelixSchwarz ^^^ maybe interesting for patching stable packages.

Fixes will be also in borg 1.1.15.

Cool, thanks @ThomasWaldmann ! I'll try the patch, and report back when I got it tested!

@ThomasWaldmann , I can confirm that the 3rd commit from #5569 fixed the issue!

Only give or take 50% of RAM is in use by the process now, and the file where an exponential RAM usage was seen in the past without the file even coming through is resolved. The file comes through now!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

unlandm picture unlandm  路  4Comments

tconstans picture tconstans  路  5Comments

enkore picture enkore  路  5Comments

zatricky picture zatricky  路  3Comments

rugk picture rugk  路  3Comments