Borg: recreate leaves empty segments behind

Created on 21 Jul 2017 · 27 Comments · Source: borgbackup/borg

After running the recreate command with borg-linux64_1.1.0b6, a lot of empty segment files are left in the data dir:

ls -lh borg/data/0
[...]
-rw-rw-r-x 1 myuser localsamba 17 Jul 21 12:23 4338
-rw-rw-r-x 1 myuser localsamba 17 Jul 21 12:24 4340
-rw-rw-r-x 1 myuser localsamba 17 Jul 21 12:24 4342
-rw-rw-r-x 1 myuser localsamba 17 Jul 21 12:24 4344
-rw-rw-r-x 1 myuser localsamba 17 Jul 21 12:24 4346
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 13:31 4348
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 13:32 4356
-rw-rw-r-x 1 myuser localsamba 31M Jul 21 13:32 4357
[...]

Full paste:
https://pastebin.com/6y0Gxy0X

They stay there even after a new delete or create in the repository.
The only way for me to fix this was to run borg check --repair.
At the end of the repository check, the "empty" 17-byte segments were deleted.

ls -lh

-rw-rw-r-x 1 myuser localsamba 201M Jul 21 10:55 4032
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 10:55 4033
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 10:55 4034
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 13:31 4348
-rw-rw-r-x 1 myuser localsamba 201M Jul 21 13:32 4356
-rw-rw-r-x 1 myuser localsamba 31M Jul 21 13:44 4360

I changed max_segment_size to 209715200 in the config;
I am not sure whether this might be causing the problem.

All 27 comments

iirc, they are not really empty, but contain a commit tag.
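For anyone wondering where the 17 bytes come from, here is a minimal sketch of how such a file could be recognized. It rests on assumptions that are not stated in this thread: that a borg 1.x segment starts with the 8-byte magic `BORG_SEG`, that each entry has a 9-byte header (CRC32, size, tag), and that tag value 2 marks a commit. The function name is made up for illustration, and the CRC is deliberately not verified.

```python
import struct

MAGIC = b"BORG_SEG"                   # assumed 8-byte segment file magic
ENTRY_HEADER = struct.Struct("<IIB")  # assumed entry header: crc32, size, tag
TAG_COMMIT = 2                        # assumed tag value for a commit entry


def looks_like_commit_only(blob: bytes) -> bool:
    """Return True if blob has the shape of a segment holding only a commit entry."""
    # 8 bytes of magic plus one 9-byte entry header gives 17 bytes.
    if len(blob) != len(MAGIC) + ENTRY_HEADER.size:
        return False
    if not blob.startswith(MAGIC):
        return False
    _crc, size, tag = ENTRY_HEADER.unpack(blob[len(MAGIC):])
    # A commit entry carries no payload, so its size equals the header size.
    return size == ENTRY_HEADER.size and tag == TAG_COMMIT
```

To try it on a real file, pass the file's bytes, for example `looks_like_commit_only(open("data/0/4338", "rb").read())`, preferably on a copy of the repo.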

That's a limitation in the current compaction algorithm which doesn't track whether it has any data to commit. If a segment is completely superseded (as recreate may do), then a commit-only segment shows up.

Edit: This is completely false.

A followup on this one: I tested borg 1.1 RC1 and even normal backup leave empty segment files behind:

ls -lah /mnt/samba/Backup/mypc/borg/data/1
total 21G
drwxrwxr-x 2 user localsamba    0 Sep  8 22:42 .
drwxrwxr-x 2 user localsamba    0 Aug 30 22:39 ..
-rw-rw-r-x 1 user localsamba   17 Aug 30 22:39 10001
-rw-rw-r-x 1 user localsamba   17 Aug 30 22:39 10003
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:31 10005
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:31 10007
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:36 10009
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:36 10011
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:36 10013
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:36 10015
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:36 10017
-rw-rw-r-x 1 user localsamba 502M Aug 31 22:37 10018
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:37 10019
-rw-rw-r-x 1 user localsamba 505M Aug 31 22:37 10020
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:37 10021
-rw-rw-r-x 1 user localsamba 501M Aug 31 22:37 10022
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:37 10023
-rw-rw-r-x 1 user localsamba 502M Aug 31 22:37 10024
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:37 10025
-rw-rw-r-x 1 user localsamba 330M Aug 31 22:37 10026
-rw-rw-r-x 1 user localsamba   17 Aug 31 22:37 10027
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:31 10029
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:31 10031
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10033
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10035
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10037
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10039
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10041
-rw-rw-r-x 1 user localsamba   17 Sep  1 22:36 10043
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:31 10045
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:31 10047
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:36 10049
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:36 10051
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:36 10053
-rw-rw-r-x 1 user localsamba   17 Sep  2 22:36 10055
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:31 10057
-rw-rw-r-x 1 user localsamba 130M Sep  3 22:31 10058
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:31 10059
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10061
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10063
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10065
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10067
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10069
-rw-rw-r-x 1 user localsamba   17 Sep  3 22:36 10071
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:31 10073
-rw-rw-r-x 1 user localsamba 139M Sep  4 22:32 10074
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:32 10075
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10077
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10079
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10081
-rw-rw-r-x 1 user localsamba 185M Sep  4 22:37 10082
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10083
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10085
-rw-rw-r-x 1 user localsamba   17 Sep  4 22:37 10087
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:31 10089
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:31 10091
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:36 10093
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:36 10095
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:37 10097
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:37 10099
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:37 10101
-rw-rw-r-x 1 user localsamba 501M Sep  5 22:39 10102
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:39 10103
-rw-rw-r-x 1 user localsamba 504M Sep  5 22:39 10104
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:39 10105
-rw-rw-r-x 1 user localsamba 502M Sep  5 22:39 10106
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:39 10107
-rw-rw-r-x 1 user localsamba 503M Sep  5 22:40 10108
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10109
-rw-rw-r-x 1 user localsamba 501M Sep  5 22:40 10110
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10111
-rw-rw-r-x 1 user localsamba 504M Sep  5 22:40 10112
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10113
-rw-rw-r-x 1 user localsamba 501M Sep  5 22:40 10114
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10115
-rw-rw-r-x 1 user localsamba 501M Sep  5 22:40 10116
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10117
-rw-rw-r-x 1 user localsamba   17 Sep  5 22:40 10119
-rw-rw-r-x 1 user localsamba 217M Sep  6 22:32 10120
-rw-rw-r-x 1 user localsamba   17 Sep  6 22:32 10121
-rw-rw-r-x 1 user localsamba   17 Sep  6 22:32 10123
-rw-rw-r-x 1 user localsamba 200M Sep  7 22:31 10124
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:31 10125
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:31 10127
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:31 10128
-rw-rw-r-x 1 user localsamba 502M Sep  7 22:32 10129
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:32 10130
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:32 10131
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:32 10132
-rw-rw-r-x 1 user localsamba 503M Sep  7 22:33 10133
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:33 10134
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:33 10135
-rw-rw-r-x 1 user localsamba 503M Sep  7 22:34 10136
-rw-rw-r-x 1 user localsamba 503M Sep  7 22:34 10137
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:34 10138
-rw-rw-r-x 1 user localsamba 502M Sep  7 22:35 10139
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:35 10140
-rw-rw-r-x 1 user localsamba 502M Sep  7 22:35 10141
-rw-rw-r-x 1 user localsamba 502M Sep  7 22:36 10142
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:36 10143
-rw-rw-r-x 1 user localsamba 501M Sep  7 22:38 10144
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10146
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10148
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10150
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10152
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10154
-rw-rw-r-x 1 user localsamba   17 Sep  7 22:38 10156
-rw-rw-r-x 1 user localsamba 362M Sep  8 22:35 10157
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:35 10158
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:35 10160
-rw-rw-r-x 1 user localsamba 504M Sep  8 22:35 10161
-rw-rw-r-x 1 user localsamba 505M Sep  8 22:35 10162
-rw-rw-r-x 1 user localsamba 502M Sep  8 22:36 10163
-rw-rw-r-x 1 user localsamba 505M Sep  8 22:36 10164
-rw-rw-r-x 1 user localsamba 502M Sep  8 22:36 10165
-rw-rw-r-x 1 user localsamba 503M Sep  8 22:37 10166
-rw-rw-r-x 1 user localsamba 504M Sep  8 22:37 10167
-rw-rw-r-x 1 user localsamba 505M Sep  8 22:38 10168
-rw-rw-r-x 1 user localsamba 501M Sep  8 22:39 10169
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:40 10171
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:40 10173
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:41 10175
-rw-rw-r-x 1 user localsamba 194M Sep  8 22:41 10176
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:41 10177
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:41 10179
-rw-rw-r-x 1 user localsamba 2,3K Sep  8 22:41 10180
-rw-rw-r-x 1 user localsamba   17 Sep  8 22:41 10181

Looking at the whole repository, it is 113 GB with 500 MB segments, so you would expect it to have around 68 segment files; however, there are 700 in total after one month of backups.

I could imagine this growing into millions of files with a lower segment size and higher throughput. Filesystems might handle that, but other use cases, e.g. syncing to cloud storage, might not.

I think this should be a top priority.

@enkore did you already find the root cause? if so, what is it?

#1088; commit entries do not add to compact or segments. The fix is easy but of course renumbers segments, so basically all repository tests need to be checked and adjusted. I didn't feel like doing that.

Just move it to 1.1.x milestone?

After reading https://github.com/borgbackup/borg/issues/1088 I get the feeling that this is a won't fix?

I have ~980 17-byte segments out of a total of 6887 segments (14.2%). That seems like a lot of tiny files with little purpose.

Edit: This is just from creating regular backups with borg create.

#1088 is a comment from over a year ago, before there was any experience of how it would work out over time. A fix has been proposed, but it needs adjustments to the tests before it can be merged.

would a bounty help with this?

I'm willing to help with 20$. Currently I have 28k 17 byte files out of 34k files (80%).

@klausenbusk

Bounties here: https://www.bountysource.com/issues/47467654-recreate-leaves-emtpy-segments-behind

Great, I have added a 20$ bounty. @ThomasWaldmann would you mind adding the Bountysource label?

On a new repo just created with 1.1.4 and 18 archives 20% are 17 byte files:

# find -type f | wc -l
3227
# find -type f -size 17c | wc -l
631

On an old repo created with 1.0 and 3073 archives, 11% are 17-byte files:

# find -type f | wc -l
277395
# find -type f -size 17c | wc -l
30091
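For anyone who wants the same numbers as a percentage in one go, here is a rough Python equivalent of the two find commands above. It is only a sketch: the function name is made up, and unlike the find invocations it looks only under the repo's data/ directory.

```python
import os


def count_tiny_segments(repo_path, tiny_size=17):
    """Return (total_files, tiny_files) for all regular files under repo_path/data."""
    total = tiny = 0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_path, "data")):
        for name in filenames:
            total += 1
            # Same test as `find -size 17c`: exactly tiny_size bytes.
            if os.path.getsize(os.path.join(dirpath, name)) == tiny_size:
                tiny += 1
    return total, tiny
```

Usage would be something like `total, tiny = count_tiny_segments("/path/to/repo")` followed by `print(f"{tiny} of {total} ({100 * tiny / total:.1f}%)")`, guarding against `total == 0` on an empty repo.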

A quick update. I now have 120k small 17 byte files (95% of the files/doing backup every two hours)

My repo created with 1.1 is now at over 50%:

# find -type f | wc -l
6464
# find -type f -size 17c | wc -l
3251

fixed by PR #3970.

@enkore if you like, claim for finding/fixing the bug, I'll claim for finishing the work.

https://www.bountysource.com/issues/47467654-recreate-leaves-emtpy-segments-behind

Great work, thanks! :) I'm looking forward to 1.2.

With borg 1.2 and borg compact --cleanup-commits, you will be able to clean up all those little 17byte files (which are commit-only segments).

Careful testing on a non-production repo (or a copy of a prod repo) is welcome.

I am on Borg 1.1.7 and have accumulated hundreds of 17-byte commit files. After reading this discussion, I used check --repair to eliminate all but the commit files created by the last backup. Subsequent backups added new commit files.
I then copied the repo to a temp directory, deleted all but the latest commit file, and was still able to access archives (old and newest) and match them against the backup source using rsync with no errors.
Is this a simple, effective way to eliminate unneeded files, or am I risking disaster?
Edit: After further experiments, it appears that the only 17-byte file needed to access the repo and its archives is the last one, numbered the same as the hints/index/integrity.# files. Am I missing something?

I'd rather not suggest manually deleting files from a repo. If somebody is not as careful as you and does not double-check everything, there could easily be some damage.

borg 1.2 will have a cleanup command, just wait for it.

Still waiting for 1.2 to clean up those 17 byte commit files, so I devised this solution that deletes all but the last commit file (highest number), the only one needed for repo integrity:
find /path/to/repo/ -type f -size 17c | sort -r | xargs grep -v | xargs rm -f
Before trying this, please back up your repo with rsync, then test with borg check -v --verify-data /path/to/backuprepo
I have added this to my backup script.

I don't think I can recommend deleting files from a repo using a shell script.

So please wait for 1.2 and use the cleanup command it has for that.

I have a repo where, after some pruning, 112071 out of 112133 (99.94%) of segments are 17 bytes in size.
With 1.2 still not released, I'd rather stick with 1.1, but I need to clean this up, at least manually, for now.
Is it safe to assume that all but the highest-numbered 17-byte file can be removed?

Note that the shell one-liner in https://github.com/borgbackup/borg/issues/2850#issuecomment-484299671 would also delete the last one, if you have more than 1000 segments, because it sorts them wrongly!

This might be more appropriate:
find -type f -size 17c | cut -d/ -f4 | sort -n | head -n-1 | while read id; do echo rm data/$(($id / 1000))/$id; done
(echo added there to generate a script that you should do some manual sanity checks on before running it)
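The same dry-run idea can be sketched in Python, where the numeric ordering is explicit. This is only an illustration and not an endorsed cleanup tool: the function name is made up, it only returns a list and deletes nothing, and it rests on the claim made in this thread (not confirmed by the maintainers, who advise against manual deletion) that only the highest-numbered 17-byte file is needed.

```python
import os


def removable_commit_candidates(repo_path, tiny_size=17):
    """List 17-byte segment files under repo_path/data, excluding the highest-numbered one.

    Nothing is deleted; the caller gets paths to inspect by hand.
    """
    tiny = []
    for dirpath, _dirnames, filenames in os.walk(os.path.join(repo_path, "data")):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Segment files are named by their number; skip anything else.
            if name.isdigit() and os.path.getsize(path) == tiny_size:
                tiny.append((int(name), path))
    tiny.sort()  # numeric order by segment number, not string order
    return [path for _num, path in tiny[:-1]]  # leave out the newest one
```

Run it against a copy of the repo first and compare the result with what `borg check` reports afterwards.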

I'd suggest downloading the latest 1.2 test binary and running it like this:
borg compact -v --show-rc --show-version --cleanup-commits repo
I am doing it once a week, while still using 1.1 for all other regular operations.

I have been using 1.1 and the script mentioned above since April without incurring any errors. My repos are tested regularly with borg check -v --verify-data and by mounting the repo and rsyncing against the backed-up data.
The reason I want to eliminate many thousands of accumulated (useless) files is that I sync my repos to other devices using Syncthing, which scans and indexes each file. Useless 17-byte files simply create unnecessary overhead, bloating the database.
The only 17-byte file necessary to access the repo and its archives is the last one, numbered the same as the hints/index/integrity.# files.
BTW, my backup script stops ST before running borg backup and the cleanup script, then restarts ST and exits. Job done.

@TheSeven you are correct. My script (#2850) failed at the 9999/10000 boundary and deleted the last 17-byte commit file, as you said it would. I recovered my repo using borg check --repair, checked it with borg check --verify-data, and all is OK again. My subsequent backup succeeded using my script, and the commit numbers rolled on past the 10000 mark. (The highest-numbered 17-byte commit file is now 10058.) I guess the next problem will occur at 100000. Could you explain why my sort gives the wrong answer at 999/1000 etc.?
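On the sorting question, a small illustration may help. A plain text sort compares lines character by character, so a path containing "9..." sorts after one containing "10..." even though 10001 is the larger number. The paths below are hypothetical, and Python string comparison is used here as a stand-in for `sort -r` in a byte-wise (C) locale.

```python
import os

paths = [
    "/path/to/repo/data/9/9997",
    "/path/to/repo/data/9/9999",
    "/path/to/repo/data/10/10001",
]

# What a plain reverse text sort does: compare character by character.
lexicographic_first = sorted(paths, reverse=True)[0]

# What was intended: compare the segment numbers as integers.
numeric_highest = max(paths, key=lambda p: int(os.path.basename(p)))

print(lexicographic_first)
print(numeric_highest)
```

The text sort puts 9999 first, so a script that spares only the first line keeps 9999 and removes 10001, the actual latest commit file. Sorting on the numeric segment id, as the `cut ... | sort -n` pipeline above does, avoids this.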
