I am currently having 18 borg-repos. Backups from 6 different servers, 3 repos per server. My idea is to rotate them for different weekdays (Mon+Thu, TueSat, Wed+Fri+Sun) in case a repo breaks. My way of protecting against the downside of deduplication. Even with three mirrors of the data I am saving lots of space. The backup scripts does a create, prune, check.
I have been running this for 1-2 weeks now, with the last 4 days or so using the 1.1.0-b1 version. After I started to use the beta version I started to get the Index object count mismatch from the check-jobs. I currently have the issue with 4 of my 18 repos. The interesting fact is that this far, all of them has a mismatch of one.
At first I only noticed the issue with one host. I deleted a repo to make sure I started over from scratch, and removed the prune job from the backup script, hoping the prune was the issue. This did not help.
Earlier I was using version 1.0.7 and did not get these issues, so I assume it's something with the new code, though I can't be certain.
I assume this isn't a known issue since I couldn't find anything about it. Any ideas what could be the cause? Anything I could do to help finding the issue? Any other ideas?
One of my simplest repos are backed up like this:
${BORG} create --compression lzma,1 borg@${DESTHOST}:${DESTPATH}${BORGREPO}.borg::{now} /etc /usr/local/bin /root /home
Eg, no weird options at all, and it happened to this one early this morning.
:moneybag: there is a bounty for this
Can you try this branch? Maybe that gives some insight.
I'll try to compile it and run a check later today. Have some work I need to finish first though. Thank you again for your prompt assistance.
I followed the guide at https://borgbackup.readthedocs.io/en/stable/installation.html#using-git to build your branch. I did not get any additional information from it though.
BORG_PASSPHRASE='foo' borg-env/bin/borg check /data/borg/thor/MonThu.borg/
Index object count mismatch. 51251 != 51250
Completed repository check, errors found.
UPDATE: Wait a minute. Seems like I am a bit lame here. I just cloned https://github.com/enkore/borg.git and didn't think of the fact that I needed a special branch, which I did not get. I will try to figure it out.
After realizing I built your master branch and not issue/1598 I got some new output:
BORG_PASSPHRASE='foo' ../borg-env/bin/borg check /data/borg/thor/MonThu.borg/
Index object count mismatch.
committed index: 51251 objects
rebuilt index: 51250 objects
ID rebuilt index committed index
0000000000000000000000000000000000000000000000000000000000000000 <not found> (903, 3904605)
Completed repository check, errors found.
I assume it tells you at least a bit more than it tells me.
Thanks.
It seems somehow the manifest was lost.
Can you check whether there is a file called 903 in <repository>/data/<n>/? Are there files beyond (numerically) 903?
Starting from 903 and going up numerically there are these:
-rw------- 1 borg root 3905361 Sep 15 05:00 903
-rw------- 1 borg root 17 Sep 15 05:00 904
-rw------- 1 borg root 851961 Sep 15 05:00 905
-rw------- 1 borg root 17 Sep 15 05:00 906
-rw------- 1 borg root 805 Sep 15 12:45 907
Hm. Could you rebuild from my branch (git pull && <install>, should be commit 7fae010e) and re-run the check with --debug?
Something strange is up. The segment referenced exists (903), and the offset (3904605) is plausible with the segment size (3905361) and that it's the manifest, but check doesn't seem to find the object...
Does running borg debug-get-obj <repository> 0000000000000000000000000000000000000000000000000000000000000000 some-file give you an error?
Of course! I did a "git pull && pip install -e .", I assume that should be sufficient for getting a new "binary".
BORG_PASSPHRASE='foo' ../borg-env/bin/borg check --debug /data/borg/thor/MonThu.borg/
using builtin fallback logging configuration
29 self tests completed in 0.11 seconds
Starting repository check
Read committed index of transaction 906
Segment transaction is 906
Determined transaction is 906
Found 884 segments
Starting repository index check
Index object count mismatch.
committed index: 51251 objects
rebuilt index: 51250 objects
ID rebuilt index committed index
0000000000000000000000000000000000000000000000000000000000000000 <not found> (903, 3904605)
Completed repository check, errors found.
Of course! I did a "git pull && pip install -e .", I assume that should be sufficient for getting a new "binary".
Thanks, yes that works.
Okay, it seems the problem lies elsewhere, since it designates 906 as the transaction (>= 903).
Does running borg debug-get-obj <repository> 0000000000000000000000000000000000000000000000000000000000000000 some-file give you an error?
If it doesn't give you an error, can you strace it to see whether it accesses the 903 file and seeks to offset 3904605? (these should be very near the end of the strace)
It could look something like this: strace -e open,read,lseek borg ...
open("<repo>/data/0/380", O_RDONLY|O_CLOEXEC) = 4
lseek(4, 0, SEEK_CUR) = 0
lseek(4, 0, SEEK_CUR) = 0
lseek(4, 712321, SEEK_SET) = 712321
read(4, "\n~\302X\313\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 715
open("some-file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 5
lseek(5, 0, SEEK_CUR) = 0
object 0000000000000000000000000000000000000000000000000000000000000000 fetched.
How large is the output some-file?
Not sure I understand what we are doing now, to be more precise, the parameters for debug-get-obj, please correct me if some-file should have been something specific, I ran this:
BORG_PASSPHRASE='foo' borg debug-get-obj /data/borg/thor/MonThu.borg/ 0000000000000000000000000000000000000000000000000000000000000000 some-file
Output:
object 0000000000000000000000000000000000000000000000000000000000000000 fetched.
The borg binary used seems to be the one from your repo.
borg --version
borg 1.0.1.dev959+ng7fae010
I assume that we can say that it gave us no error.
Not sure I understand what we are doing now, to be more precise, the parameters for debug-get-obj, please correct me if some-file should have been something specific, I ran this
Thanks, looks good. debug-get-obj takes an object ID, fetches it from the repository and writes it to a file - the second parameter some-file.
Since borg-check reported that this object ID (00....0000 = Manifest) is missing from the repository we now checked whether the object is _actually_ missing. Which doesn't seem to be the case.
So, I'm not sure how this is happening:
So what remains as possible causes?
Hmm. Can you run find <repository>/data -type f | wc -l to see whether the number of segments in the directories matches the number of segments seen by check (Found 884 segments)?
Ah, I get the syntax now. I tried to look at some-file that I created, can't tell if it's garbage or not, it doesn't look like anything I recognize at least. But maybe it's still encrypted? Or maybe it's just a part of a file?
The number of files seems to match the number of segments.
find /data/borg/thor/MonThu.borg/data/ -type f | wc -l
884
Yes, it will still be encrypted. If the repo is not encrypted then one might spot an archive name here or there in the binary blob.
The number of files seems to match the number of segments.
find /data/borg/thor/MonThu.borg/data/ -type f | wc -l 884
I'm officially out of ideas for now as to the cause of this. Maybe I'll have some new ones later.
Thanks for the report and your help debugging this :)
It's me who owes you a thank you. I will see if I can manage to make a reproduction case on a smaller amount of data, and data that I don't mind distributing. It doesn't really make it easier when I sit on the data and you the knowledge.
Reproducing it is (too) easy. I can't say what's triggering it though, because it seems like some random factor.
Create a test folder
mkdir test && cd test
Init a repo, using borg version 1.0.7
borg init -e none foo.borg ; mkdir data
Create and backup some random data, using borg version 1.1.0-b1
for i in $(seq 1 3) ; do dd if=/dev/urandom of=data/$i bs=1k count=10k ; LD_LIBRARY_PATH=/foo /usr/local/bin/borgbackup-1.1.0-b1 create foo.borg::{now} data ; LD_LIBRARY_PATH=/foo /usr/local/bin/borgbackup-1.1.0-b1 check --info foo.borg ; echo ; done
After running the last command 2-5 times I get:
Index object count mismatch. 51 != 50 or similar
I have reproduced it several times now actually, on two systems. One using Debian Testing, the other using Ubuntu 16.10. I did not manage to reproduce the case when creating the borg-repo with version 1.1.0-b1.
I tried running LD_LIBRARY_PATH=/foo BORG_PASSPHRASE='foo' /usr/local/bin/borgbackup-1.1.0-b1 upgrade /data/borg/thor/MonThu.borg on one of my broken repos. That seemed to work, but a check failed even after that. Maybe it's too late for the upgrade. But if the versions aren't compatible I assume they should warn the user and reject the new backups.
I guess a solution for me would be to delete all repos I have and create new ones, but if I had several months of backup I wouldn't really be happy about that. In this actual case I don't really care since it's only 1-2 weeks of data. I can probably recreate them in 15-20 minutes as well.
They are (or should be) compatible, no upgrade is needed.
I'll try your reproduction instructions later.
I do get the issue even though the repo is created with 1.1.0-b1, for one of my servers. I can not tell why though.
I can't repo that on my Ubuntu though, I created over 500 backups without success. But as soon as I create the repo with 1.0.7 it takes me less than 10 backups every time.
I managed to reproduce it, the reason why it's happening with 1.0 repos but rarely with 1.1 repos is that an incorrect condition caused an inconsistency in the repository. That condition depends on moving objects around, which is more likely with the smaller segments used in 1.0 repos by default, while the larger segments in repos created by 1.1 are less likely to trigger the issue.
Thanks for all the details you provided and a reliable way of reproducing it, that helped a lot :)
If it only affects the manifest, then this is self-healing with the next create/delete/prune operation with a fixed version or 1.0.x. If it affects data chunks (which can only happen when stuff is deleted + bad luck, with just create it won't happen) then a manual repair / specifically tailored for this issue (not borg-check --repair) may be required.
Just to make clear: that corruption issue only affects master / 1.1.0b1 and not 1.0.x, right?
Yes.
It may be a good idea to avoid deleting stuff with 1.1.0b1, although I'm not sure if it is possible to recreate the constellation required to trigger the bug; I wasn't able to anyway.
To clarify: if an object is deleted+putted in the same session (or replaced), then there is a chance that an additional delete tag was written for that object after the last put. The index still points to the put, which is correct, while the log would say the object is deleted. Imho there is no operation other than the manifest updating that does this in Borg, and that is self-healing as described above, unless you lose the index (in which case --repair is needed to rebuild the manifest).
@enkore don't forget to claim the bounty.