Borg: extract fails with KeyError in archiver.py

Created on 7 Jun 2019  路  23Comments  路  Source: borgbackup/borg

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

> borg -V
borg 1.1.10

Operating system (distribution) and version.

> lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.9 (stretch)
Release:        9.9
Codename:       stretch

(I've the same problem on Ubuntu 18.04)

Hardware / network configuration, and filesystems used.

ext4 on encrypted luks volume

How much data is handled by borg?

> borg info
Repository ID: 7c59174561a2f9cbbad17fc9b3d6738a03adb61c2a2d663b9f108b8d5c1447ac
Location: /tmp/borg-backups-blake2
Encrypted: Yes (repokey BLAKE2b)
Cache: /home/arathunku/.cache/borg/7c59174561a2f9cbbad17fc9b3d6738a03adb61c2a2d663b9f108b8d5c1447ac
Security dir: /home/arathunku/.config/borg/security/7c59174561a2f9cbbad17fc9b3d6738a03adb61c2a2d663b9f108b8d5c1447ac
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               16.74 GB             13.60 GB             13.32 GB

                       Unique chunks         Total chunks
Chunk index:                  314126               347949

Full borg commandline that lead to the problem (leave away excludes and passwords)

borg init --encryption=repokey-blake2 $BORG_REPO

borg create                           \
      --verbose                       \
      --filter AME                    \
      --stats                         \
      --show-rc                       \
      --progress                      \
      --compression lz4               \
      --exclude-caches                \
      --exclude '/home/*/.cache/*'    \
      --exclude '/var/cache/*'        \
      --exclude '/var/tmp/*'          \
      --exclude '/home/**/*.cache/*'  \
      --exclude '**/target/*/debug'   \
      --exclude '**/target/debug'     \
      --exclude '**/code/**/_build/*' \
      --exclude '**/node_modules/*'   \
      --exclude '**/android/app/build'\
      --exclude '**/etc/ssh/ssh_*'    \
      --exclude '**/tmp/*'            \
                                      \
      ::'{hostname}-{now}'            \
      "$HOME/code"                    

Fri Jun  7 12:34:43 CEST 2019 Starting backup to: /tmp/borg-backups-blake2


Fri Jun  7 12:34:43 CEST 2019 You can observe it progress via tail -f run.log

Creating archive at "/tmp/borg-backups-blake2::arath-tp-2019-06-07T12:34:44"
------------------------------------------------------------------------------
Archive name: arath-tp-2019-06-07T12:34:44
Archive fingerprint: 2a98f567ccf4fd68258f4cec835a71e3aef62104ecd42ea08e6295bc7d227868
Time (start): Fri, 2019-06-07 12:34:45
Time (end):   Fri, 2019-06-07 12:56:57
Duration: 22 minutes 11.91 seconds
Number of files: 342848
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               16.74 GB             13.60 GB             13.32 GB
All archives:               16.74 GB             13.60 GB             13.32 GB

                       Unique chunks         Total chunks
Chunk index:                  314126               347949
------------------------------------------------------------------------------
terminating with success status, rc 0

Fri Jun  7 12:57:05 CEST 2019 Pruning repository


> borg list
arath-tp-2019-06-07T12:34:44         Fri, 2019-06-07 12:34:45 [2a98f567ccf4fd68258f4cec835a71e3aef62104ecd42ea08e6295bc7d227868]


> borg extract $BORG_REPO::arath-tp-2019-06-07T12:34:44 home/arathunku/code/alacritty
Local Exception
Traceback (most recent call last):
  File "borg/archiver.py", line 4501, in main
  File "borg/archiver.py", line 4433, in run
  File "borg/archiver.py", line 166, in wrapper
  File "borg/archiver.py", line 179, in wrapper
  File "borg/archiver.py", line 760, in do_extract
  File "borg/archive.py", line 467, in iter_items
  File "borg/archive.py", line 236, in unpack_many
KeyError: 'home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77'

Platform: Linux arath-tp 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64
Linux: debian 9.9
Borg: 1.1.10  Python: CPython 3.5.7 msgpack: 0.5.6
PID: 8272  CWD: /home/arathunku/dotfiles/backups
sys.argv: ['borg', 'extract', '/tmp/borg-backups-blake2::arath-tp-2019-06-07T12:34:44', 'home/arathunku/code/alacritty']
SSH_ORIGINAL_COMMAND: None

Describe the problem you're observing.

Extract fails

> borg extract --debug $BORG_REPO::arath-tp-2019-06-07T12:34:44 home/arathunku/code/alacritty
using builtin fallback logging configuration
35 self tests completed in 0.12 seconds
Verified integrity of /tmp/borg-backups-blake2/index.29
TAM-verified manifest
security: read previous location '/tmp/borg-backups-blake2'
security: read manifest timestamp '2019-06-07T10:56:57.680016'
security: determined newest manifest timestamp as 2019-06-07T10:56:57.680016
security: repository checks ok, allowing access
Local Exception
Traceback (most recent call last):
  File "borg/archiver.py", line 4501, in main
  File "borg/archiver.py", line 4433, in run
  File "borg/archiver.py", line 166, in wrapper
  File "borg/archiver.py", line 179, in wrapper
  File "borg/archiver.py", line 760, in do_extract
  File "borg/archive.py", line 467, in iter_items
  File "borg/archive.py", line 236, in unpack_many
KeyError: 'home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77'

Platform: Linux arath-tp 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64
Linux: debian 9.9
Borg: 1.1.10  Python: CPython 3.5.7 msgpack: 0.5.6
PID: 12589  CWD: /home/arathunku/dotfiles/backups
sys.argv: ['borg', 'extract', '--debug', '/tmp/borg-backups-blake2::arath-tp-2019-06-07T12:34:44', 'home/arathunku/code/alacritty']
SSH_ORIGINAL_COMMAND: None

file permissions:

> ls -al /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77
.rwxr-xr-x 6.9M arathunku 17 Sep  2018 /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I can reproduce the problem when I backup all of my projects in "code".
I cannot reproduce the issue when I backup ONLY this one project - alacritty which is very strange.
I was able to reproduce the issue on newly created repo (logs are from that one)

> borg check --info
Starting repository check
Starting repository index check
Completed repository check, no problems found.
Starting archive consistency check...
Analyzing archive arath-tp-2019-06-07T12:34:44 (1/1)
Archive consistency check complete, no problems found.

Include any warning/errors/backtraces from the system logs

No backtraces/warnings in sys log.

bug

All 23 comments

Please:

ls -li /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77

Also, if you can easily check: does it work with borg 1.1.9 (or < 1.1.10 in general) and not work with 1.1.10?

@ThomasWaldmann

Extract works on v1.1.9

> ls -li /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77
10529305 .rwxr-xr-x 6.9M arathunku 17 Sep  2018 /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77
> borg --version
borg 1.1.9

> borg init --encryption=repokey-blake2 $BORG_REPO


By default repositories initialized with this version will produce security
errors if written to with an older version (up to and including Borg 1.0.8).

If you want to use these older versions, you can disable the check by running:
borg upgrade --disable-tam /tmp/borg-backups-blake2

See https://borgbackup.readthedocs.io/en/stable/changes.html#pre-1-0-9-manifest-spoofing-vulnerability for details about the security implications.

IMPORTANT: you will need both KEY AND PASSPHRASE to access this repo!
Use "borg key export" to export the key, optionally in printable format.
W



> borg create                         \
      --verbose                       \
      --filter AME                    \
      --stats                         \
      --show-rc                       \
      --progress                      \
      --compression lz4               \
      --exclude-caches                \
      --exclude '/home/*/.cache/*'    \
      --exclude '/var/cache/*'        \
      --exclude '/var/tmp/*'          \
      --exclude '/home/**/*.cache/*'  \
      --exclude '**/target/*/debug'   \
      --exclude '**/target/debug'     \
      --exclude '**/code/**/_build/*' \
      --exclude '**/node_modules/*'   \
      --exclude '**/android/app/build'\
      --exclude '**/etc/ssh/ssh_*'    \
      --exclude '**/tmp/*'            \
                                      \
      ::'{hostname}-{now}'            \
      "$HOME/code"
Creating archive at "/tmp/borg-backups-blake2::arath-tp-2019-06-07T15:28:32"
------------------------------------------------------------------------------
Archive name: arath-tp-2019-06-07T15:28:32
Archive fingerprint: 805b671d820326bfcac10f996cbad61eda209d19cf4c7a2d3f1e6b3eb413e9bd
Time (start): Fri, 2019-06-07 15:28:33
Time (end):   Fri, 2019-06-07 15:45:27
Duration: 16 minutes 53.06 seconds
Number of files: 342848
Utilization of max. archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               16.74 GB             13.60 GB             13.34 GB
All archives:               16.74 GB             13.60 GB             13.34 GB

                       Unique chunks         Total chunks
Chunk index:                  314273               348110
------------------------------------------------------------------------------
terminating with success status, rc 0


> borg extract --debug $BORG_REPO::arath-tp-2019-06-07T15:28:32 home/arathunku/code/alacritty

using builtin fallback logging configuration
35 self tests completed in 0.14 seconds
Verified integrity of /tmp/borg-backups-blake2/index.29
TAM-verified manifest
security: read previous location '/tmp/borg-backups-blake2'
security: read manifest timestamp '2019-06-07T13:45:27.163948'
security: determined newest manifest timestamp as 2019-06-07T13:45:27.163948
security: repository checks ok, allowing access

# extracted checksum
> tar -cf - home/arathunku/code/alacritty | md5sum
1c2c4605d2eca7dcb9a0e6be7ba09374  -
> tar -cf - /home/arathunku/code/alacritty | md5sum
1c2c4605d2eca7dcb9a0e6be7ba09374  -

EDIT: I just wanted to confirm, it's 100% problem with extract, not with init/create. I just tested that on another, bigger, repo. (I had to restore backups :see_no_evil:, borg saves the day :heart:)

Looks like my fix to "borg hanging when extracting hardlink slaves (only)" has introduced this (the above crash is in the new code that was added to borg 1.1.10). #4350 PRs: #4542 #4539

extract: hang on partial extraction with ssh: repo, when hardlink master
is not matched/extracted and borg hangs on related slave hardlink, #4350

Also it seems we have no unit test triggering this fault, so the most simple reproduction of this should be added as a unit test.

Hmm, @arathunku can you check if the problematic files are hardlinks (maybe some optimization storing sw packages only once if they are used by multiple projects)?

I thought the ls command prints the inode reference count, but seems like it does not.

Also, you could reproduce the KeyError and then do something like:

borg list $BORG_REPO::archive | grep problematic_filename_seen_in_KeyError

Hmm, I can't reproduce nor do I see a bug in the code yet.

Can you simplify the commands needed to reproduce the issue and post a bare-minimum reproduction script?

@ThomasWaldmann,

Can you simplify the commands needed to reproduce the issue and post a bare-minimum reproduction script?

I've tried to do that by backing up only alacritty directory and in that case, it worked but also the repo was a lot of smaller :disappointed: I'll try again after the weekend.

Some additional details for now:

@:/tmp() $ ./borg list $BORG_REPO::arath-tp-2019-06-07T15:28:32 | rg 'alacritty/target/release/deps/alacritty-aa39d744337c2a77'
38371:-rw-r--r-- arathunku arathunku      199 Mon, 2018-09-17 19:20:45 home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77.d
38373:-rwxr-xr-x arathunku arathunku  6923936 Mon, 2018-09-17 19:27:34 home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77
39443:hrwxr-xr-x arathunku arathunku        0 Mon, 2018-09-17 19:27:34 home/arathunku/code/alacritty/target/release/alacritty link to home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77

Sorry about the mess with ls, I had it aliased :|

@:/tmp() $ \ls -al /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77
-rwxr-xr-x 2 arathunku arathunku 6923936 Sep 17  2018 /home/arathunku/code/alacritty/target/release/deps/alacritty-aa39d744337c2a77

Thank you for taking a time to look into the issue.

This is likely a problem due to partial extraction of hardlinks (see issues I linked above).

I could not reproduce the problem yet, though.

If I can't debug it on my system, could there be the possibility to debug it on your system?

I had prepared a script:

#!/usr/bin/env bash
PREFIX="/tmp/extract-borg-test"
BORG_REPO="${PREFIX}/borg-repo-test"
BORG_PASSWORD="BORG_PASSWORDBORG_PASSWORDBORG_PASSWORDBORG_PASSWORDBORG_PASSWORDBORG_PASSWORD"
BORG_PASSCOMMAND="echo ${PREFIX}/borg-repo-test-password"
LOG="${PREFIX}/run.log"
echo $BORG_PASSSWORD > "${PREFIX}/borg-repo-test-password"

cd "${PREFIX}"
rm -f "${LOG}"

for ((i=1;i<=10;i++));
do
      set -e
      rm -rf "${PREFIX}/borg-repo-test"

      /tmp/borg init --encryption=repokey-blake2 $BORG_REPO >> "${LOG}" 2>&1
      /tmp/borg create                      \
            ::'test'                        \
            "$HOME/code/alacritty"          \
            >> "${LOG}" 2>&1

      set +e
      /tmp/borg extract --debug $BORG_REPO::test >> "${LOG}" 2>&1
      if [ $? -eq 0 ]; then
            echo "${i} Extract - OK"
      else
            echo "${i} Extract - FAIL"
      fi
done

output:

@:/tmp/extract-borg-test() $ ./test-borg.sh
1 Extract - FAIL
2 Extract - FAIL
3 Extract - FAIL
4 Extract - FAIL
5 Extract - FAIL
6 Extract - FAIL
7 Extract - FAIL
8 Extract - OK # !!!!!!!!!!!
9 Extract - FAIL
10 Extract - FAIL

I had packaged scripts and alacritty dir and retried that on my other computer(Ubuntu 18.04, ext4 luks too) and it passes there.

It also started passing when I tried to cp -r the alacritty directory into another place on the same computer and now I cannot repro it any longer? I'll try to regenerate the alacritty directory with its hardlinks.

I should still have my older snapshots on which I could maybe test modified version of borg that would provide more debugging info?

BORG_REPO="${PREFIX}/borg-repo-test"

You should always use export BORG_...=... so the stuff is put into an environment variable that can be read from within the borg process.

BORG_PASSWORD="BORG_PASSWORDBORG_PASSWORD"

borg does not use that, it has BORG_PASSPHRASE though and that would be easier than this:

BORG_PASSCOMMAND="echo ${PREFIX}/borg-repo-test-password"
echo $BORG_PASSSWORD > "${PREFIX}/borg-repo-test-password"

(also, you wanted cat as BORG_PASSCOMMAND, not echo)

Don't do rm -rf "${PREFIX}/borg-repo-test", but rather use borg delete ... (it does not only remove the repo dir, but also the caches related to the repo).

I like to use rsync -aH as a better "copy all" than cp -r.

borg rc != 0 does not necessarily mean FAIL btw. rc 1 is just warning, only rc 2 is error.

And as a general comment:

It is interesting that rc differs (but see log for details, option --show-rc would be useful to log the rc), but this is not a bare-minimum reproduction script.

Keep in mind that we try to find a bug in borg here, not in the shell script. :-)

So:

  • --encryption=none
  • use BORG_REPO (borg reads that automatically, no need to give it on the commandline)
  • no less-than-trivial shell scripting (stay away from all other variables, loops, quoting, conditionals, multi-line statements, ...)

Thanks for the tips! cp -r -> rsync -aH :heart:
The script was hacked super quickly just to skip some manual steps for testing.

I had verified the script works correctly by manually inspecting logs, I also run like 100~ times. Extract fails nondeterministically. It also failed only in this specific case, when I had regenerated the project (with hardlinks too), extract started to work 100% times.

I'll update the script following the instructions above and get more data.

Untested:

#!/usr/bin/env bash
LOG="/tmp/borg.log"
EXTRACT="/tmp/extract"
export BORG_REPO="/tmp/borg-repo-test"
export BORG_DELETE_I_KNOW_WHAT_I_AM_DOING="YES"

borg init --encryption=none >> $LOG 2>&1
borg create --list ::test ~/code/alacritty >> $LOG 2>&1
mkdir -p $EXTRACT
cd $EXTRACT
borg extract --list --show-rc ::test xxxxxxxxxx >> $LOG 2>&1  # fix to do partial extract
cd
rm -rf $EXTRACT
borg delete >> $LOG 2>&1

About the conditions that might affect it:

  • usage of hardlinks in archived data
  • specific recursion and inode order while creating the archive
  • file type of hardlinked "file"
  • guess it only can fail at extract time when doing a partial extract

As you had rc != 0 with full extract, please check the logs.

Run log output

Local Exception
Traceback (most recent call last):
  File "borg/archiver.py", line 4501, in main
  File "borg/archiver.py", line 4433, in run
  File "borg/archiver.py", line 166, in wrapper
  File "borg/archiver.py", line 179, in wrapper
  File "borg/archiver.py", line 760, in do_extract
  File "borg/archive.py", line 467, in iter_items
  File "borg/archive.py", line 236, in unpack_many
KeyError: 'home/arathunku/code/alacritty/alacritty/target/release/deps/alacritty-aa39d744337c2a77'

Platform: Linux arath-tp 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3 (2019-02-02) x86_64
Linux: debian 9.9
Borg: 1.1.10  Python: CPython 3.5.7 msgpack: 0.5.6
PID: 15453  CWD: /tmp/extract
sys.argv: ['/tmp/borg', 'extract', '--list', '--show-rc', '::test', 'home/arathunku/code']
SSH_ORIGINAL_COMMAND: None

terminating with error status, rc 2
You requested to completely DELETE the repository *including* all archives it contains:
test                                 Sun, 2019-06-09 13:45:56 [f53882ea3e1b078e4035b7877e4f1ef4c294367fd3231045958d95373925e2ba]
Type 'YES' if you understand this and want to continue: YES (from BORG_DELETE_I_KNOW_WHAT_I_AM_DOING)

Full log:
run.log

script:

#!/usr/bin/env bash
export PREFIX="/tmp/extract-borg-test"
export BORG_REPO="${PREFIX}/repo"
export BORG_DELETE_I_KNOW_WHAT_I_AM_DOING="YES"
LOG="${PREFIX}/run.log"
EXTRACT="/tmp/extract"

mkdir "${EXTRACT}"
cd "${PREFIX}"

rm -f "${LOG}"

/tmp/borg init --encryption=none >> "${LOG}" 2>&1
/tmp/borg create --list ::test "${HOME}/code/alacritty" >> "${LOG}" 2>&1

cd $EXTRACT
/tmp/borg extract --list --show-rc ::test home/arathunku/code >> $LOG 2>&1  # fix to do partial extract
cd "${PREFIX}"
rm -rf $EXTRACT

/tmp/borg delete >> "${LOG}" 2>&1
echo "done"

I had repeated that multiple times and made sure there are no leftovers in cache.

OK, so the only thing missing in the script is to create a "input files" directory and create an arrangement of input files / hardlinks / directories that reproduce the problem.

Full repro on 2 different machines:

input generted via:

#!/usr/bin/env bash
export INPUT="/tmp/codecode"

mkdir $INPUT
cd $INPUT

cargo --version
rustup --version
rustc --version

git clone [email protected]:jwilm/alacritty.git alacritty0 || true
cd alacritty0 && cargo build --release 

script:

#!/usr/bin/env bash
export PREFIX="/tmp/extract-borg-test"
export BORG_REPO="${PREFIX}/repo"
export BORG_DELETE_I_KNOW_WHAT_I_AM_DOING="YES"
# Relative because later I use this for partial extract
export INPUT="tmp/codecode"
LOG="${PREFIX}/run.log"
EXTRACT="/tmp/extract"

mkdir "${EXTRACT}"
cd "${PREFIX}"

rm -f "${LOG}"

/tmp/borg init --encryption=none >> "${LOG}" 2>&1
/tmp/borg create --list ::test "/${INPUT}" >> "${LOG}" 2>&1

cd $EXTRACT
/tmp/borg extract --list --show-rc ::test "${INPUT}" >> $LOG 2>&1  # fix to do partial extract
cd "${PREFIX}"
rm -rf $EXTRACT

/tmp/borg delete >> "${LOG}" 2>&1
echo "KeyError lines in run.log: $(grep KeyError $LOG | wc -l)"
> ./generate-input.sh && ./test-borg.sh
cargo 1.37.0-nightly (545f35425 2019-05-23)
rustup 1.18.3 (435397f48 2019-05-22)
rustc 1.37.0-nightly (7cdaffd79 2019-06-05)
Cloning into 'alacritty0'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 9264 (delta 4), reused 8 (delta 1), pack-reused 9241
Receiving objects: 100% (9264/9264), 6.41 MiB | 8.16 MiB/s, done.
Resolving deltas: 100% (6249/6249), done.
   ...........
   ...........
   ...........

Finished release [optimized + debuginfo] target(s) in 5m 14s
KeyError lines in run.log: 1
> grep KeyError run.log
KeyError: 'tmp/codecode/alacritty0/target/release/deps/libwinpty-996b66e13f96580c.rlib'
> /bin/ls -ali /tmp/codecode/alacritty0/target/release/deps/libwinpty-996b66e13
f96580c.rlib
2757476 -rw-rw-r-- 2 arathunku arathunku 3958 cze  9 15:20 /tmp/codecode/alacritty0/target/release/deps/libwinpty-996b66e13f96580c.rlib

OK, using rustup.rs curl | sh script now, it is a throwaway vm anyway. :)

Can reproduce the KeyError, yay. Now searching the bug...

@arathunku fix was just merged into 1.1-maint branch.

The fix is needed for this case (because of a bug introduced in 1.1.10 while fixing another bug):

  • one is doing a partial extraction, like borg extract repo::archive base_path
  • hardlink master is selected for extraction
  • hardlink slave is selected for extraction
  • hl master and slave are stored in separate metadata chunks of the archive metadata stream

Workaround:

  • do a full extraction, like borg extract repo::archive, or
  • use borg != 1.1.10 (e.g. 1.1.9)

@ThomasWaldmann, thank you

@arathunku thank you for making a reproduction script!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

pierreozoux picture pierreozoux  路  4Comments

TinajaLabs picture TinajaLabs  路  6Comments

enkore picture enkore  路  5Comments

unlandm picture unlandm  路  4Comments

ThomasWaldmann picture ThomasWaldmann  路  6Comments