Yes
BUG / ISSUE
Using an external drive USB drive on a Raspberry Pi 4, the partition is btrfs formatted.
borg 1.1.9 (Raspbian distribution package)
Raspbian
4.19.88-v8+ #1284 SMP PREEMPT Wed Dec 11 13:57:50 GMT 2019 aarch64 GNU/Linux
Raspberry Pi 4, 4 GB, plenty of free space on SD card and external USB drive.
None so far, repo creation fails.
borg init --encryption=repokey </local/path>
Enter new passphrase:
Enter same passphrase again:
Do you want your passphrase to be displayed for verification? [yN]:
Fatal Python error: Bus error
Current thread 0xf7e898e0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 75 in write
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 29 in write
File "/usr/lib/python3/dist-packages/msgpack/__init__.py", line 38 in pack
File "/usr/lib/python3/dist-packages/borg/repository.py", line 580 in write_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 443 in commit
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 275 in do_init
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
Yes, it is fully reproducible. The same happens if I use a a Ubuntu client running Borg 1.1.5 accessing the Raspberry Pi via ssh.
Encryption argument does not change the observed behaviour.
Nothing relevant in syslog etc. drive seems healthy.
Repo content:
$ ls -lR
.:
total 16
-rw------- 1 user user 73 Dec 22 02:41 README
-rw------- 1 user user 700 Dec 22 02:41 config
drwx------ 1 user user 2 Dec 22 02:41 data
-rw------- 1 user user 0 Dec 22 02:41 hints.1.tmp
drwx------ 1 user user 52 Dec 22 02:41 lock.exclusive
-rw------- 1 user user 49 Dec 22 02:41 lock.roster
-rw------- 1 user user 16 Dec 22 02:41 nonce
./data:
total 0
drwx------ 1 user user 4 Dec 22 02:41 0
./data/0:
total 8
-rw------- 1 user user 530 Dec 22 02:41 0
-rw------- 1 user user 17 Dec 22 02:41 1
./lock.exclusive:
total 0
-rw------- 1 user user 0 Dec 22 02:41 [email protected]
Can you track this down more closely with gdb?
Also, if somebody else has a raspi (4) available: can you reproduce the above issue?
@noma in case you have a 2nd rpi4 (or another rpi), it would be useful to reproduce the issue on other hardware (and maybe also rather on ext4 and not on the same usb hdd).
Just want to make sure it is a borg problem and not something in the hardware or other software.
@noma please find out how your ARM CPU is configured.
does it raise a bus error for an unaligned memory access?
self.hash seems to be a StreamingXXH64 instance.
def write(self, data):
"""
Write *data* to backing file and update internal state.
"""
n = super().write(data)
self.hash.update(data) # <---------- line 75 of file_integrity.py
return n
Notable:
This does not check for failure of PyObject_GetBuffer and is called by StreamingXXH64.update code:
cdef Py_buffer ro_buffer(object data) except *:
cdef Py_buffer view
PyObject_GetBuffer(data, &view, PyBUF_SIMPLE)
return view
So, I need the faulty access address which causes the bus error.
Is it 0?
Is it not 32bit or 64bit aligned?
Contents of /proc/cpu/alignment on the system where/when this happens.
Thanks @ThomasWaldmann for the fast reply. Here are some follow-ups:
Can you track this down more closely with gdb?
I tried:
$ gdb --args /usr/bin/python3 `which borg` init -e none test
and got
Reading symbols from /usr/bin/python3...(no debugging symbols found)...done.
(gdb) r
Starting program: /usr/bin/python3 /usr/bin/borg init -e none test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[Detaching after fork from child process 1646]
Program received signal SIGBUS, Bus error.
0xf6e20044 in XXH64_update ()
from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
(gdb) bt
#0 0xf6e20044 in XXH64_update ()
from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
#1 0xf6e20498 in ?? ()
from /usr/lib/python3/dist-packages/borg/algorithms/checksums.cpython-37m-arm-linux-gnueabihf.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Let me know if you need something specific here.
Also, if somebody else has a raspi (4) available: can you reproduce the above issue?
@mariusknaust can you state your setup here?
@noma please find out how your ARM CPU is configured.
does it raise a bus error for an unaligned memory access?
/proc/cpu/alignment does not exist on my system, see:
https://github.com/raspberrypi/linux/issues/2283
You should have mentioned that yours was a 64-bit build - alignment trap is not available on any arm64 kernel.
@noma in case you have a 2nd rpi4 (or another rpi), it would be useful to reproduce the issue on other hardware (and maybe also rather on ext4 and not on the same usb hdd).
Just want to make sure it is a borg problem and not something in the hardware or other software.
I tried it on the SD card with ext4, same result. Don't have another Raspberry Pi at hand.
I tried the 32-bit kernel (default) of Raspbian, to get /proc/cpu/alignment:
4.19.88-v7l+ #1284 SMP Wed Dec 11 13:51:57 GMT 2019 armv7l
vs. the 64-bit kernel (via setting arm_64bit=1 in /boot/config.txt:
4.19.88-v8+ #1284 SMP PREEMPT Wed Dec 11 13:57:50 GMT 2019 aarch64 GNU/Linux
The problem does not occur with the 32-bit kernel. So it seems to be related to the 64-bit kernel.
https://www.raspberrypi.org/forums/viewtopic.php?t=250730
The 64-bit kernel seems to be somewhat experimental, so at this point I am not sure if the problem here is caused by the Borg code-base or the 64-bit kernel. Let me know what you think, maybe we can help the Raspbian/Kernel devs to fix something here.
For completeness, with 32-bit kernel, but working:
$ cat /proc/cpu/alignment
User: 3993
System: 0 ( (null))
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 3993
User faults: 2 (fixup)
@noma thanks for trying the 32bit kernel in fixup mode.
Can you try it again after:
echo 1 > /proc/cpu/alignment
So it will print the faulty access to the kernel log (see dmesg). Can you post that info?
Or even use 3, then use all sorts of borg commands and give me all of the faulty access infos it printed.
Guess you need to look after each borg command, whether it printed more faults and give me command plus the faults that specific command has caused.
BTW, I played with borg on my android smartphone (using AnLinux, Termux, Debian, Ubuntu). It is arm64 (aarch64) and i got borg 1.1.5 from ubuntu and 1.1.9 from Debian and both just worked, no Bus Error.
Here is the part of /var/loh/syslog after:
borg init -e none test
Note: there is no Bus Error with the 32-bit kernel, without checking the log, I'd assume everything is fine.
Let me know which other command is of interest to you.
borg create, check, extract, mount would be interesting.
The log lists a lot of alignment traps and the access address is obviously unaligned (and it is not a NULL pointer issue).
The means the 64bit kernel is not configured to deal with unaligned data, but the 32bit kernel is configured to fixup automatically and stay silent.
Maybe you should file a bug for raspbian 64bit kernel, "fixup" is the better mode for production.
BTW, I did not find anything to get an aligned memory address from PyObject_GetBuffer.
I set the alginment trap handling to signal+warn, but disabled fixup. This way I can give you the call stack from the signal and the unaligned address. If needed, I re-ran the command with fixup+warn, in order to get to the state needed for some command.
$ echo 4 > /proc/cpu/alignment
$ cat /proc/cpu/alignment
User: 3995
System: 0 ( (null))
Skipped: 0
Half: 0
Word: 0
DWord: 0
Multi: 3993
User faults: 5 (signal+warn)
init:
$ borg init -e none test
Fatal Python error: Bus error
Current thread 0xb6f23ad0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 75 in write
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 29 in write
File "/usr/lib/python3/dist-packages/msgpack/__init__.py", line 38 in pack
File "/usr/lib/python3/dist-packages/borg/repository.py", line 580 in write_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 443 in commit
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 275 in do_init
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
$ dmesg
Alignment trap: borg (2424) PC=0xb5d60044 Instr=0xe8905001 Address=0xb58a6c0f FSR 0x221
create:
$ borg create test::archive /usr/share/man
Fatal Python error: Bus error
Current thread 0xb6f71ad0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
$ dmesg
Alignment trap: borg (2475) PC=0xb5dae044 Instr=0xe8905001 Address=0x01495b8f FSR 0x221
check:
$ borg check --verify-data test
Fatal Python error: Bus error
Current thread 0xb6fcaad0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 893 in check
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 315 in do_check
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 154 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
$ dmesg
Alignment trap: borg (2506) PC=0xb5e07044 Instr=0xe8905001 Address=0x0118445f FSR 0x221
extract:
$ borg extract test::archive extracted
Killed stale lock [email protected].
Removed stale exclusive roster lock for host host@242606366411511 pid 2506 thread 0.
Fatal Python error: Bus error
Current thread 0xb6f9dad0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
$ dmesg
Alignment trap: borg (2509) PC=0xb5dda044 Instr=0xe8905001 Address=0x00958697 FSR 0x221
mount:
$ borg mount test::archive mounted
Removed stale shared roster lock for host cube@242606366411511 pid 2509 thread 0.
Fatal Python error: Bus error
Current thread 0xb6f60ad0 (most recent call first):
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 83 in read
File "/usr/lib/python3/dist-packages/borg/crypto/file_integrity.py", line 32 in read
File "/usr/lib/python3/dist-packages/borg/repository.py", line 466 in open_index
File "/usr/lib/python3/dist-packages/borg/repository.py", line 1070 in get
File "/usr/lib/python3/dist-packages/borg/helpers.py", line 330 in load
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 141 in wrapper
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 1355 in do_mount
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4387 in run
File "/usr/lib/python3/dist-packages/borg/archiver.py", line 4455 in main
File "/usr/bin/borg", line 11 in <module>
Bus error
$ dmesg
Alignment trap: borg (2519) PC=0xb5d9d044 Instr=0xe8905001 Address=0x0100298f FSR 0x221
Looks all rather similar, with init writing the repo and all the other operations failing on opening it.
The other line where it crashes is basically the same call, just at another place:
self.hash seems to be a StreamingXXH64 instance.
def read(self, n=None):
"""
Read *data* from backing file (*n* has the usual meaning) and update internal state.
"""
data = super().read(n)
self.hash.update(data) # <----------- line 83 of file_integrity.py
return data
OK, so we know it is a unaligned access and it can be avoided by using the "fixup" handler in /proc/cpu/alignment (like it should be usually configured).
We do not know yet where the misaligned access operation is located, whether it is in our or xxh64 code, whether we can avoid it / how to avoid it.
This might be related, as well:
https://github.com/raspberrypi/linux/issues/2283
You should have mentioned that yours was a 64-bit build - alignment trap is not available on any arm64 kernel
Looks like all the software causing unaligned accesses will be problematic on the 64-bit kernel - if this statement is still true.
I also reported the problem here (post is pending for moderator approval):
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=250730&start=275
To do anything about this, I need more information about the exact location of the unaligned access, see above. I don't have a platform where this happens, so someone who does needs to find out.
Besides that, I think that platforms that have issues with unaligned memory accesses should always run in "fixup" mode when in production, not in "blow up" mode.
I had the same issue, also on a RPI 4 Model B with a 64-bit Raspbian kernel.
# uname -a
Linux raspberrypi 4.19.97-v8+ #1294 SMP PREEMPT Thu Jan 30 13:27:08 GMT 2020 aarch64 GNU/Linux
I fixed it by removing the Raspbian package and instead using the ARMv7 version from https://borg.bauerj.eu.
@bauerj did you do anything special in your package regarding to alignment?
No, the build process is pretty straight-forward: https://git.feneas.org/borg-binary-builder/borg-binaries/blob/master/Dockerfile.hot
https://github.com/borgbackup/borg/issues/4891#issuecomment-568312912
It does actually check if that call succeeded through except * (basically Cython inserts boilerplate around the function that checks if an exception was set).
This pre-processor block probably needs updating because it doesn't know about AArch64 and hence generates buggy code down the line: https://github.com/borgbackup/borg/blob/8a6463d7bcf6940cd55ce7ff8f59d677d4975cf1/src/borg/algorithms/xxh64/xxhash.c#L39
Edit: Note upstream changes
OK, so TODO:
Upgrade xxhash.* from 0.6.2 to 0.7.3.
If I understand the conditions correctly, that should be enough to fix the unaligned access issues on ARM!?
borg 1.1.x: bug as we only use bundled code and the bundled 0.6.2 is buggy on ARM v6 when not operating in fixup mode.
borg master / upcoming 1.2: bug when using the bundled code.
also, the version requirement for the "use external libxxhash" seems wrong:
https://github.com/borgbackup/borg/blob/master/setup_checksums.py#L33
https://github.com/Cyan4973/xxHash/commit/662e199cebb42a0c94ae03d09459a889509063e4
https://github.com/Cyan4973/xxHash/releases/tag/v0.7.2
Looks like the minimum requirement for correct operations on ARMv6 in non-fixup mode is 0.7.2.
Doesn't matter, I checked a handful of distro packages and none of them contain a pkg-config file anyway.
Pushed some code to #4962 - can somebody try it on arm v6 in non-fixup mode?
Reproduce crash with previous code, then try that code, crash should go away.
Merged into 1.1-maint branch now.
Actually, for me the bus-error is still there with 32bit userland Raspbian on a 64 bit kernel. Compiling with CFLAGS="-DXXH_FORCE_MEMORY_ACCESS=1" seems to fix the issue.
The point here is that the default gcc-8 compiler pretends that the system can handle unaligned accesses (which the 64bit kernel cannot), hence the auto-selection yields the "wrong" result:
Just ran into this (Pi 4 with current minimal raspbian and borg 1.1.9 (installed via apt)) after i set arm_64bit=1 in the /boot/config.txt.
I've hit this on Buster 5.4 ARM 8 64 bits: Linux debian-buster 5.4.51-v8+ #1330 SMP PREEMPT Thu Jul 30 10:41:00 BST 2020 aarch64 GNU/Linux and Borg version 1.1.9 (via apt). What debugging/diagnostic info would you like provided, just to confirm?
Borg 1.1.9 is outdated and has known severe issues, you need at least 1.1.11, better 1.1.14.
See link there: https://github.com/borgbackup/borg/issues/4891#issuecomment-587367578
@bauerj can you please build some 1.1.14 packages?
@noplanman is already working on it 馃槉
As far as I can see there are some minor issues with dependencies, so it requires some more manual labour. Please be patient.
@ThomasWaldmann There are issues building on Jessie so I'm moving to Stretch.
Will try to get them out today 馃憤
1.1.9 is what was in the Buster repositories, I can try the latest static build from the link posted above using ARM V8 if that's any help.
@noplanman if you want any volunteers to test them at all before submitting upstream feel free to get in touch. I've got a Pi 3(B I think).
BTW, i moved the x86/x64 build to stretch in the last days, directly after 1.1.14, so maybe some of that PR is interesting.
Builds are done 馃帀
https://bintray.com/borg-binary-builder/borg-binaries/borg-binaries/1.1.14#files
(Just takes quite some time to build, especially if at the end of the process something was missing and the whole shebang needs to be redone.)
Should be on https://borg.bauerj.eu/ soon 馃挭
@stellarpower Just saw your message now, go ahead and try the new binaries linked above, thank you!
user@debian-buster /t/borgt> uname -a
Linux debian-buster 5.4.51-v8+ #1330 SMP PREEMPT Thu Jul 30 10:41:00 BST 2020 aarch64 GNU/Linux
user@debian-buster /t/borgt> file arm7
arm7: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=ac5c292b521a7e7a77c4214024d7cfb232c6ca3c, stripped
user@debian-buster /t/borgt> file arm8
arm8: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=db30356c75c705d80493fbd700016049de903395, stripped
user@debian-buster /t/borgt> ./arm8
Failed to execute process './arm8'. Reason:
The file './arm8' does not exist or could not be executed.
user@debian-buster /t/borgt>
Right! For aarch64 you need the arm64 binary.
1.1.9 in Buster should have the patches for the pre-1.1.11 issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=959661
I think that's what I picked. It was borg-1.1.14-arm64
Hmm, I get the same on one of my aarch64 servers and it works perfectly:
$ file borg
/usr/local/bin/borg: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=db30356c75c705d80493fbd700016049de903395, stripped
What version of GLIBC do you have?
$ ldd --version
$ ldd --version
ldd (Debian GLIBC 2.28-10+rpi1) 2.28
I'm guessing the fact that the loader version output by file doesn't seem to be in that location would be the issue. To be honest I had a fight trying to get the Pi up and running so I can't remember entirely off the top of my head what I am running right now. I wanted zfs so needed 64-bit support, which Raspbian has some experimental support for, and I went for Ubuntu but had to abandon that and file a launchpad bug, so at some point I recall bootstrapping Debian with the idea that I should be using 64 bits both in the kernel and the user space. However I'm guessing that the missing loader means I'm using a 32-bit userland, which means something isn't right as I have some Raspberry Pi things on here but I'm pretty sure this is the stock buster I bootstrapped.
Apologies if I've wasted your time with that, apsrt from that though, is there anything else of use that can help diagnose this or situations like this? Strace I'm guessing would be no help but is there something like it I can use to show what the OS exec or the linker-loader is doing, other than ldd?
Ok, I have the same version, so it shouldn't be that.
Unfortunately I'm not too versed in the internals of such things, so I can't help more.
Hopefully others can chime in and figure out what's happening 馃