Borg: `borg init` failing on Raspbian (1.1.9) with 64-bit kernel

Created on 22 Dec 2019  路  51Comments  路  Source: borgbackup/borg

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG / ISSUE

System information. For client/server mode post info for both machines.

Using an external drive USB drive on a Raspberry Pi 4, the partition is btrfs formatted.

Your borg version (borg -V).

borg 1.1.9 (Raspbian distribution package)

Operating system (distribution) and version.

Raspbian
4.19.88-v8+ #1284 SMP PREEMPT Wed Dec 11 13:57:50 GMT 2019 aarch64 GNU/Linux

Hardware / network configuration, and filesystems used.

Raspberry Pi 4, 4 GB, plenty of free space on SD card and external USB drive.

How much data is handled by borg?

None so far, repo creation fails.

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg init --encryption=repokey </local/path>

Describe the problem you're observing.

Enter new passphrase: 
Enter same passphrase again: 
Do you want your passphrase to be displayed for verification? [yN]: 
Fatal Python error: Bus error

Current thread 0xf7e898e0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 75 in write
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 29 in write
  File "/usr/lib/python3/dist-packages/msgpack/__init__.py", line 38 in pack
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 580 in write_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 443 in commit
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 275 in do_init
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, it is fully reproducible. The same happens if I use a a Ubuntu client running Borg 1.1.5 accessing the Raspberry Pi via ssh.

Encryption argument does not change the observed behaviour.

Include any warning/errors/backtraces from the system logs

Nothing relevant in syslog etc. drive seems healthy.

Repo content:

$ ls -lR
.:
total 16
-rw------- 1 user user  73 Dec 22 02:41 README
-rw------- 1 user user 700 Dec 22 02:41 config
drwx------ 1 user user   2 Dec 22 02:41 data
-rw------- 1 user user   0 Dec 22 02:41 hints.1.tmp
drwx------ 1 user user  52 Dec 22 02:41 lock.exclusive
-rw------- 1 user user  49 Dec 22 02:41 lock.roster
-rw------- 1 user user  16 Dec 22 02:41 nonce

./data:
total 0
drwx------ 1 user user 4 Dec 22 02:41 0

./data/0:
total 8
-rw------- 1 user user 530 Dec 22 02:41 0
-rw------- 1 user user  17 Dec 22 02:41 1

./lock.exclusive:
total 0
-rw------- 1 user user 0 Dec 22 02:41 [email protected]
bug help wanted

All 51 comments

Can you track this down more closely with gdb?

Also, if somebody else has a raspi (4) available: can you reproduce the above issue?

@noma in case you have a 2nd rpi4 (or another rpi), it would be useful to reproduce the issue on other hardware (and maybe also rather on ext4 and not on the same usb hdd).

Just want to make sure it is a borg problem and not something in the hardware or other software.

@noma please find out how your ARM CPU is configured.

does it raise a bus error for an unaligned memory access?

https://www.kernel.org/doc/html/v5.4/arm/mem_alignment.html

    self.hash seems to be a StreamingXXH64 instance.

    def write(self, data):
        """
        Write *data* to backing file and update internal state.
        """
        n = super().write(data)
        self.hash.update(data)               # <---------- line 75 of file_integrity.py
        return n

Notable:

This does not check for failure of PyObject_GetBuffer and is called by StreamingXXH64.update code:

cdef Py_buffer ro_buffer(object data) except *:
    cdef Py_buffer view
    PyObject_GetBuffer(data, &view, PyBUF_SIMPLE)
    return view

So, I need the faulty access address which causes the bus error.

Is it 0?
Is it not 32bit or 64bit aligned?
Contents of /proc/cpu/alignment on the system where/when this happens.

Thanks @ThomasWaldmann for the fast reply. Here are some follow-ups:

Can you track this down more closely with gdb?

I tried:

$ gdb --args /usr/bin/python3 `which borg` init -e none test

and got

Reading symbols from /usr/bin/python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /usr/bin/python3 /usr/bin/borg init -e none test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[Detaching after fork from child process 1646]

Program received signal SIGBUS, Bus error.
0xf6e20044 in XXH64_update ()
   from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
(gdb) bt
#0  0xf6e20044 in XXH64_update ()
   from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
#1  0xf6e20498 in ?? ()
   from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Let me know if you need something specific here.

Also, if somebody else has a raspi (4) available: can you reproduce the above issue?

@mariusknaust can you state your setup here?

@noma please find out how your ARM CPU is configured.

does it raise a bus error for an unaligned memory access?

https://www.kernel.org/doc/html/v5.4/arm/mem_alignment.html

/proc/cpu/alignment does not exist on my system, see:

https://github.com/raspberrypi/linux/issues/2283

You should have mentioned that yours was a 64-bit build - alignment trap is not available on any arm64 kernel.

@noma in case you have a 2nd rpi4 (or another rpi), it would be useful to reproduce the issue on other hardware (and maybe also rather on ext4 and not on the same usb hdd).

Just want to make sure it is a borg problem and not something in the hardware or other software.

I tried it on the SD card with ext4, same result. Don't have another Raspberry Pi at hand.

I tried the 32-bit kernel (default) of Raspbian, to get /proc/cpu/alignment:

4.19.88-v7l+ #1284 SMP Wed Dec 11 13:51:57 GMT 2019 armv7l

vs. the 64-bit kernel (via setting arm_64bit=1 in /boot/config.txt:

4.19.88-v8+ #1284 SMP PREEMPT Wed Dec 11 13:57:50 GMT 2019 aarch64 GNU/Linux

The problem does not occur with the 32-bit kernel. So it seems to be related to the 64-bit kernel.

https://www.raspberrypi.org/forums/viewtopic.php?t=250730

The 64-bit kernel seems to be somewhat experimental, so at this point I am not sure if the problem here is caused by the Borg code-base or the 64-bit kernel. Let me know what you think, maybe we can help the Raspbian/Kernel devs to fix something here.

For completeness, with 32-bit kernel, but working:

$ cat /proc/cpu/alignment
User:           3993
System:         0 (  (null))
Skipped:        0
Half:           0
Word:           0
DWord:          0
Multi:          3993
User faults:    2 (fixup)

@noma thanks for trying the 32bit kernel in fixup mode.

Can you try it again after:

echo 1 > /proc/cpu/alignment

So it will print the faulty access to the kernel log (see dmesg). Can you post that info?

Or even use 3, then use all sorts of borg commands and give me all of the faulty access infos it printed.

Guess you need to look after each borg command, whether it printed more faults and give me command plus the faults that specific command has caused.

BTW, I played with borg on my android smartphone (using AnLinux, Termux, Debian, Ubuntu). It is arm64 (aarch64) and i got borg 1.1.5 from ubuntu and 1.1.9 from Debian and both just worked, no Bus Error.

Here is the part of /var/loh/syslog after:

borg init -e none test

Note: there is no Bus Error with the 32-bit kernel, without checking the log, I'd assume everything is fine.

Let me know which other command is of interest to you.

borg create, check, extract, mount would be interesting.

The log lists a lot of alignment traps and the access address is obviously unaligned (and it is not a NULL pointer issue).

The means the 64bit kernel is not configured to deal with unaligned data, but the 32bit kernel is configured to fixup automatically and stay silent.

Maybe you should file a bug for raspbian 64bit kernel, "fixup" is the better mode for production.

BTW, I did not find anything to get an aligned memory address from PyObject_GetBuffer.

I set the alginment trap handling to signal+warn, but disabled fixup. This way I can give you the call stack from the signal and the unaligned address. If needed, I re-ran the command with fixup+warn, in order to get to the state needed for some command.

$ echo 4 > /proc/cpu/alignment
$ cat /proc/cpu/alignment
User:       3995
System:     0 (  (null))
Skipped:    0
Half:       0
Word:       0
DWord:      0
Multi:      3993
User faults:    5 (signal+warn)

init:

$ borg init -e none test
Fatal Python error: Bus error

Current thread 0xb6f23ad0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 75 in write
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 29 in write
  File "/usr/lib/python3/dist-packages/msgpack/__init__.py", line 38 in pack
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 580 in write_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 443 in commit
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 275 in do_init
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

$ dmesg
Alignment trap: borg (2424) PC=0xb5d60044 Instr=0xe8905001 Address=0xb58a6c0f FSR 0x221

create:

$ borg create test::archive /usr/share/man
Fatal Python error: Bus error

Current thread 0xb6f71ad0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
  File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

$ dmesg
Alignment trap: borg (2475) PC=0xb5dae044 Instr=0xe8905001 Address=0x01495b8f FSR 0x221

check:

$ borg check --verify-data test
Fatal Python error: Bus error

Current thread 0xb6fcaad0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 893 in check
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 315 in do_check
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

$ dmesg
Alignment trap: borg (2506) PC=0xb5e07044 Instr=0xe8905001 Address=0x0118445f FSR 0x221

extract:

$ borg extract test::archive extracted
Killed stale lock [email protected].
Removed stale exclusive roster lock for host host@242606366411511 pid 2506 thread 0.
Fatal Python error: Bus error

Current thread 0xb6f9dad0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
  File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

$ dmesg
Alignment trap: borg (2509) PC=0xb5dda044 Instr=0xe8905001 Address=0x00958697 FSR 0x221

mount:

$ borg mount test::archive mounted
Removed stale shared roster lock for host cube@242606366411511 pid 2509 thread 0.
Fatal Python error: Bus error

Current thread 0xb6f60ad0 (most recent call first):
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
  File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
  File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 1355 in do_mount
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
  File "/usr/bin/borg", line 11 in <module>
Bus error

$ dmesg
Alignment trap: borg (2519) PC=0xb5d9d044 Instr=0xe8905001 Address=0x0100298f FSR 0x221

Looks all rather similar, with init writing the repo and all the other operations failing on opening it.

The other line where it crashes is basically the same call, just at another place:

    self.hash seems to be a StreamingXXH64 instance.

    def read(self, n=None):
        """
        Read *data* from backing file (*n* has the usual meaning) and update internal state.
        """
        data = super().read(n)
        self.hash.update(data)    # <----------- line 83 of file_integrity.py
        return data

OK, so we know it is a unaligned access and it can be avoided by using the "fixup" handler in /proc/cpu/alignment (like it should be usually configured).

We do not know yet where the misaligned access operation is located, whether it is in our or xxh64 code, whether we can avoid it / how to avoid it.

This might be related, as well:

https://github.com/raspberrypi/linux/issues/2283

You should have mentioned that yours was a 64-bit build - alignment trap is not available on any arm64 kernel

Looks like all the software causing unaligned accesses will be problematic on the 64-bit kernel - if this statement is still true.

I also reported the problem here (post is pending for moderator approval):
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=250730&start=275

To do anything about this, I need more information about the exact location of the unaligned access, see above. I don't have a platform where this happens, so someone who does needs to find out.

Besides that, I think that platforms that have issues with unaligned memory accesses should always run in "fixup" mode when in production, not in "blow up" mode.

I had the same issue, also on a RPI 4 Model B with a 64-bit Raspbian kernel.

# uname -a
Linux raspberrypi 4.19.97-v8+ #1294 SMP PREEMPT Thu Jan 30 13:27:08 GMT 2020 aarch64 GNU/Linux

I fixed it by removing the Raspbian package and instead using the ARMv7 version from https://borg.bauerj.eu.

@bauerj did you do anything special in your package regarding to alignment?

https://github.com/borgbackup/borg/issues/4891#issuecomment-568312912

It does actually check if that call succeeded through except * (basically Cython inserts boilerplate around the function that checks if an exception was set).

This pre-processor block probably needs updating because it doesn't know about AArch64 and hence generates buggy code down the line: https://github.com/borgbackup/borg/blob/8a6463d7bcf6940cd55ce7ff8f59d677d4975cf1/src/borg/algorithms/xxh64/xxhash.c#L39

Edit: Note upstream changes

https://github.com/Cyan4973/xxHash/blob/dev/xxhash.h#L664

OK, so TODO:

Upgrade xxhash.* from 0.6.2 to 0.7.3.

If I understand the conditions correctly, that should be enough to fix the unaligned access issues on ARM!?

borg 1.1.x: bug as we only use bundled code and the bundled 0.6.2 is buggy on ARM v6 when not operating in fixup mode.

borg master / upcoming 1.2: bug when using the bundled code.

also, the version requirement for the "use external libxxhash" seems wrong:

https://github.com/borgbackup/borg/blob/master/setup_checksums.py#L33

https://github.com/Cyan4973/xxHash/commit/662e199cebb42a0c94ae03d09459a889509063e4

https://github.com/Cyan4973/xxHash/releases/tag/v0.7.2

Looks like the minimum requirement for correct operations on ARMv6 in non-fixup mode is 0.7.2.

Doesn't matter, I checked a handful of distro packages and none of them contain a pkg-config file anyway.

Pushed some code to #4962 - can somebody try it on arm v6 in non-fixup mode?

Reproduce crash with previous code, then try that code, crash should go away.

Merged into 1.1-maint branch now.

Actually, for me the bus-error is still there with 32bit userland Raspbian on a 64 bit kernel. Compiling with CFLAGS="-DXXH_FORCE_MEMORY_ACCESS=1" seems to fix the issue.

The point here is that the default gcc-8 compiler pretends that the system can handle unaligned accesses (which the 64bit kernel cannot), hence the auto-selection yields the "wrong" result:

https://github.com/borgbackup/borg/blob/ba154d1104d960edb06f8752eeca675fce220f3b/src/borg/algorithms/xxh64/xxhash.h#L727

Just ran into this (Pi 4 with current minimal raspbian and borg 1.1.9 (installed via apt)) after i set arm_64bit=1 in the /boot/config.txt.

I've hit this on Buster 5.4 ARM 8 64 bits: Linux debian-buster 5.4.51-v8+ #1330 SMP PREEMPT Thu Jul 30 10:41:00 BST 2020 aarch64 GNU/Linux and Borg version 1.1.9 (via apt). What debugging/diagnostic info would you like provided, just to confirm?

Borg 1.1.9 is outdated and has known severe issues, you need at least 1.1.11, better 1.1.14.

See link there: https://github.com/borgbackup/borg/issues/4891#issuecomment-587367578

@bauerj can you please build some 1.1.14 packages?

@noplanman is already working on it 馃槉

As far as I can see there are some minor issues with dependencies, so it requires some more manual labour. Please be patient.

@ThomasWaldmann There are issues building on Jessie so I'm moving to Stretch.
Will try to get them out today 馃憤

1.1.9 is what was in the Buster repositories, I can try the latest static build from the link posted above using ARM V8 if that's any help.
@noplanman if you want any volunteers to test them at all before submitting upstream feel free to get in touch. I've got a Pi 3(B I think).

BTW, i moved the x86/x64 build to stretch in the last days, directly after 1.1.14, so maybe some of that PR is interesting.

Builds are done 馃帀
https://bintray.com/borg-binary-builder/borg-binaries/borg-binaries/1.1.14#files

(Just takes quite some time to build, especially if at the end of the process something was missing and the whole shebang needs to be redone.)

Should be on https://borg.bauerj.eu/ soon 馃挭

@stellarpower Just saw your message now, go ahead and try the new binaries linked above, thank you!

user@debian-buster /t/borgt> uname -a
Linux debian-buster 5.4.51-v8+ #1330 SMP PREEMPT Thu Jul 30 10:41:00 BST 2020 aarch64 GNU/Linux
user@debian-buster /t/borgt> file arm7
arm7: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=ac5c292b521a7e7a77c4214024d7cfb232c6ca3c, stripped
user@debian-buster /t/borgt> file arm8
arm8: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=db30356c75c705d80493fbd700016049de903395, stripped
user@debian-buster /t/borgt> ./arm8
Failed to execute process './arm8'. Reason:
The file './arm8' does not exist or could not be executed.
user@debian-buster /t/borgt> 

Right! For aarch64 you need the arm64 binary.

1.1.9 in Buster should have the patches for the pre-1.1.11 issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959661

I think that's what I picked. It was borg-1.1.14-arm64

Hmm, I get the same on one of my aarch64 servers and it works perfectly:

$ file borg
/usr/local/bin/borg: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=db30356c75c705d80493fbd700016049de903395, stripped

What version of GLIBC do you have?
$ ldd --version

$ ldd --version
ldd (Debian GLIBC 2.28-10+rpi1) 2.28

I'm guessing the fact that the loader version output by file doesn't seem to be in that location would be the issue. To be honest I had a fight trying to get the Pi up and running so I can't remember entirely off the top of my head what I am running right now. I wanted zfs so needed 64-bit support, which Raspbian has some experimental support for, and I went for Ubuntu but had to abandon that and file a launchpad bug, so at some point I recall bootstrapping Debian with the idea that I should be using 64 bits both in the kernel and the user space. However I'm guessing that the missing loader means I'm using a 32-bit userland, which means something isn't right as I have some Raspberry Pi things on here but I'm pretty sure this is the stock buster I bootstrapped.

Apologies if I've wasted your time with that, apsrt from that though, is there anything else of use that can help diagnose this or situations like this? Strace I'm guessing would be no help but is there something like it I can use to show what the OS exec or the linker-loader is doing, other than ldd?

Ok, I have the same version, so it shouldn't be that.

Unfortunately I'm not too versed in the internals of such things, so I can't help more.

Hopefully others can chime in and figure out what's happening 馃

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rugk picture rugk  路  5Comments

htho picture htho  路  5Comments

chebee7i picture chebee7i  路  5Comments

TinajaLabs picture TinajaLabs  路  6Comments

enkore picture enkore  路  5Comments