Borg: --one-file-system check is applied too late

Created on 25 Feb 2018 · 19 comments · Source: borgbackup/borg

I have an autofs mountpoint at /mnt/remote-pc, which, when activated, mounts a CIFS share from remote-pc.
I run borg create --one-file-system [...] /.

As long as the remote machine is up, everything works fine. /mnt/remote-pc is not backed up and borg exits with status 0.

But if remote-pc is down, the underlying CIFS mount fails and returns e.g. "No such device" (ENODEV). The result is that borg exits with status 1 and my backup fails. (I consider any backup that exits with status != 0 as failed.)

I'm hoping borg can learn to do the --one-file-system check earlier, so that it does not activate autofs mountpoints.

Workaround: Add each autofs mountpoint (or the mountpoint of any other fs that may fail) to the --exclude list.

(Borg version 1.1.4 on NixOS.)

bjornfor


Please read the docs about what rc 1 means.

Is your backup archive incomplete (like not having files that are accessible and on same filesystem)?

ThomasWaldmann on 26 Feb 2018

Return code 1 means warning (operation reached its normal end, but there were warnings – you should check the log, logged as WARNING).

In this case, my backup archive is complete. But what if there were e.g. permission errors causing the backup to be incomplete? Those are also rc 1. That's why I don't dare make rc 1 be "success" in the general case.

I'd like to automate away the need to look at the backup log every time the remote system is down. Currently that means adding --exclude. I made this issue because it can be argued that --one-file-system should be able to handle it.

bjornfor on 26 Feb 2018

Not sure what the problem is. If you run borg as a normal user (not root), you will usually just back up that user's files and there won't be permission problems. If you intend to back up other users' files or just everything, you will run borg as root and there won't be permission problems either.

If you fail to do it like that, borg will warn you about it and tell you to read the logs (via rc 1) and you will find the problem there and be able to fix it.

In your case, running two separate backups might help:
one for the primary system (excluding the autofs directory) and one for remote-pc.
Then you will see the warning/failure only in the log of the backup that really had an issue.

Alternatively, catch rc 1 and do post-processing on the log, deciding which warnings you want to ignore and which not.

borg does the one-file-system check rather early (IIRC it is based on the very first stat() result: when the device number switches to a different one, the item is considered not to be on the same fs and is skipped). But I guess that stat() is enough to trigger your automounter, and if remote-pc is down, the stat() will fail.
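The device-number comparison described above can be approximated from the shell. A minimal sketch, assuming GNU coreutils stat (where -c %d prints the device number in decimal; BSD/macOS stat uses -f %d instead); same_fs is a hypothetical helper, not part of borg:

```bash
# same_fs PATH REF: hypothetical helper mimicking a --one-file-system style check.
# Returns 0 if PATH is on the same device as REF, 1 if it is on a different
# device, and 2 if stat() on either path fails (e.g. ENODEV from a dead
# autofs/CIFS mount, or EACCES from a FUSE mount without allow_other).
# Assumes GNU coreutils stat: -c %d prints st_dev in decimal.
same_fs() {
    local dev_path dev_ref
    dev_path=$(stat -c %d -- "$1" 2>/dev/null) || return 2
    dev_ref=$(stat -c %d -- "$2" 2>/dev/null) || return 2
    if [ "$dev_path" = "$dev_ref" ]; then
        return 0
    fi
    return 1
}
```

The sketch shows why the check cannot happen "earlier": the device number is only known once stat() succeeds, so the stat() that performs the check is also what triggers the automounter, and a failing stat() yields neither "same fs" nor "different fs".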

Any failure in a single source fs operation is dealt with by logging a warning, setting rc to 1, skipping that fs item and continuing with the next one (to back up as much as possible). So this is not a real failure (rc 2, like a fatal exception that kills borg so that it does not reach the normal end of the backup), but "just" a warning and rc 1.

Warnings can be harmless (e.g. if you do not care that your remote pc sometimes does not get backed up because it is off) or not (maybe you really need your remote pc to be backed up). borg cannot decide that, and that's why there is a separate rc for it.
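Catching rc 1 and post-processing the log, as suggested above, could look roughly like the sketch below. Assumptions: borg's log output (which goes to stderr) was captured to a file, e.g. with borg create [...] 2> "$log", and the caller supplies an extended regex matching the warnings considered harmless. classify_borg_rc is a hypothetical helper, and no particular wording of borg's messages is assumed.

```bash
# classify_borg_rc RC LOGFILE IGNORE_REGEX  (hypothetical helper)
# Prints "ok", "warning" or "error" following borg's return-code convention:
#   0 = success, 1 = warning (normal end reached), anything else = error.
# An rc of 1 is downgraded to "ok" only if every non-blank line of LOGFILE
# matches IGNORE_REGEX (a grep -E pattern chosen by the caller).
classify_borg_rc() {
    local rc=$1 log=$2 ignore=$3
    case $rc in
        0) echo ok ;;
        1)
            if [ ! -r "$log" ]; then
                # Without a readable log, the warnings cannot be vetted.
                echo warning
            elif grep -qvE -e "$ignore" -e '^[[:space:]]*$' -- "$log"; then
                # At least one line is neither blank nor on the ignore list.
                echo warning
            else
                echo ok
            fi
            ;;
        *) echo error ;;
    esac
}
```

For example, classify_borg_rc "$rc" "$log" '/mnt/remote-pc' would accept an rc of 1 only when every logged line mentions the autofs mountpoint; deciding which warnings belong on that list remains the judgment call described above.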

So, I don't see anything right now that we could improve here.

ThomasWaldmann on 26 Feb 2018

Yes, there are various ways to work around this. I find the easiest / safest for now is adding "--exclude mountpoint" for each remote filesystem I have, to prevent unwanted warnings and exit != 0. However, I thought the whole point of --one-file-system was to save me from such manual work.

> So, I don't see anything right now that we could improve here.

Perhaps it's the system APIs that make this difficult.

So stat() fails with ENODEV on an autofs mountpoint when the underlying mount fails. Ideally, there would be a way to detect that this is a different filesystem, and thus no need to emit warnings (since --one-file-system implies ignoring other filesystems).

If this turns out to be "impossible", then fine, the issue can be closed. But I'm a bit confused that you don't understand why I made this issue in the first place.

bjornfor on 27 Feb 2018

I'm not an expert on autofs, but I think this is impossible.

Until a program tries to access /mountpoint and autofs kicks in, there is no other filesystem. When you list the parent directory, the mountpoint within is just an ordinary directory.

Or, to put it another way, how is borg supposed to tell if /mountpoint is on another filesystem without statting it? And if statting it gives an error, borg correctly reports it.

jdchristensen on 27 Feb 2018

How about pre-populating the internal --exclude list with the output of (pseudocode) mount --types autofs?

bjornfor on 27 Feb 2018

Won't that only list filesystems that are already mounted? Your issue concerns filesystems that aren't yet mounted, but which autofs tries to mount when you access them. Aside from parsing the autofs config file, I'm not sure if there is a way for borg to know about them.

jdchristensen on 27 Feb 2018

On my system, where systemd is in charge of automounts, autofs is always mounted. When a specific mountpoint is triggered, the real fs is mounted on top. When the real fs disappears, the original autofs is still there. I think it can work.

bjornfor on 27 Feb 2018

Ok, I guess I was misremembering how autofs works, as it's been a while since I used it. I'll leave this to the borg developers to think about. It's probably not easy to handle inside borg, so probably the best thing is to just manually exclude the mountpoints.

jdchristensen on 27 Feb 2018

I'm adding this to my borg backup script:

# Prevent borg from looking at autofs mountpoints. (Because if the
# underlying filesystem is not mounted, stat() returns ENODEV, and borg logs
# it as a warning and exits with status 1, even with --one-file-system.)
# Use "if" rather than "&&" so the loop's exit status stays 0 under set -e.
autofs_excludes=$(while read -r src mountpoint fstype rest; do
    if test "$fstype" = autofs; then printf "%s %q\n" --exclude "$mountpoint"; fi
done < /proc/mounts)

borg create --one-file-system $autofs_excludes [...]

EDIT: Fix quoting with printf %q.
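One caveat with the script above: $autofs_excludes is expanded unquoted, so it is split on whitespace, and that splitting does not undo the escaping that %q adds. A mountpoint whose name needs escaping (for example one containing a space, which /proc/mounts itself writes as \040) therefore does not reach borg intact. A bash array avoids re-parsing altogether. A sketch, assuming bash and the Linux /proc/mounts format; autofs_exclude_args and the AUTOFS_EXCLUDES array are hypothetical names:

```bash
# autofs_exclude_args [MOUNTS_FILE]  (hypothetical helper)
# Fills the global array AUTOFS_EXCLUDES with "--exclude MOUNTPOINT" pairs
# for every autofs entry in MOUNTS_FILE (default: /proc/mounts).
autofs_exclude_args() {
    local mounts_file=${1:-/proc/mounts} src mountpoint fstype rest
    AUTOFS_EXCLUDES=()
    while read -r src mountpoint fstype rest; do
        if [ "$fstype" = autofs ]; then
            # Decode octal escapes such as \040 (space) used in /proc/mounts.
            AUTOFS_EXCLUDES+=(--exclude "$(printf '%b' "$mountpoint")")
        fi
    done < "$mounts_file"
}
```

Usage: call autofs_exclude_args, then run borg create --one-file-system "${AUTOFS_EXCLUDES[@]}" [...]. Each array element stays exactly one argument, so no %q quoting is needed.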

bjornfor on 27 Feb 2018

@bjornfor I guess your scripted solution is as good as it gets if we do not want to add platform-specific or other software-specific code to borg.

We do not only support Linux, but also BSD, OS X, OpenIndiana, Cygwin, Win 10 Linux subsystem.

ThomasWaldmann on 28 Feb 2018

Shouldn't this be added to the FAQ?

milkey-mouse on 11 Mar 2018


Well, until now, only one person has asked that, so it's not a FAQ yet.

If people having the same issue find this ticket, just vote it up, so we see the "F". ;)

ThomasWaldmann on 11 Mar 2018

It's not only autofs that triggers this. I mounted a FUSE filesystem in $HOME/mnt and now my system wide borg backup exits with status 1 because it has no permission to read the mountpoint. Yes, permission denied even when borg is run as root.

It can be worked around, like before, by adding "--exclude mountpoint" options to borg, or mounting FUSE with "-o allow_other" or "-o allow_root". I find neither satisfactory.
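A dead autofs/CIFS mount and a FUSE mount that root may not enter share one symptom: stat() on the mountpoint fails. Instead of excluding by filesystem type, a backup script could probe every mountpoint just before running borg and exclude only those that currently fail. A sketch, assuming bash and the Linux /proc/mounts format; unstatable_mountpoints is a hypothetical helper. Note that the probe itself triggers automounts, just as borg's own stat() would.

```bash
# unstatable_mountpoints [MOUNTS_FILE]  (hypothetical helper)
# Prints, one per line, each mountpoint listed in MOUNTS_FILE
# (default: /proc/mounts) on which stat() currently fails, e.g. ENODEV from
# a dead automount or EACCES from a FUSE mount without allow_other/allow_root.
unstatable_mountpoints() {
    local mounts_file=${1:-/proc/mounts} src mountpoint fstype rest
    while read -r src mountpoint fstype rest; do
        # Decode octal escapes such as \040 (space) used in /proc/mounts.
        mountpoint=$(printf '%b' "$mountpoint")
        if ! stat -- "$mountpoint" >/dev/null 2>&1; then
            printf '%s\n' "$mountpoint"
        fi
    done < "$mounts_file"
}
```

The output can be turned into arguments with, e.g., while IFS= read -r mp; do excludes+=(--exclude "$mp"); done < <(unstatable_mountpoints), although a mountpoint containing a newline would be split. Skipping silently hides exactly the information rc 1 is meant to surface, so a script using this would probably want to log what it excluded.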

bjornfor on 13 Jul 2018


> It's not only autofs that triggers this. I mounted a FUSE filesystem in $HOME/mnt and now my system wide borg backup exits with status 1 because it has no permission to read the mountpoint. Yes, permission denied even when borg is run as root.
>
> It can be worked around, like before, by adding "--exclude mountpoint" options to borg, or mounting FUSE with "-o allow_other" or "-o allow_root". I find neither satisfactory.

Same here. Mounting a single directory in my $HOME with sshfs or afuse breaks my borg scripts completely. It's particularly disturbing because, in my case, it stops backups from being purged: I rely on borg's status code to decide whether to continue with purging after the backup. It seems logical, after all, that backups are not purged if the backup fails (I think?).
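Since rc 1 means borg reached its normal end with warnings, while rc 2 means it did not, a script can make the "continue with purging" decision explicit rather than tying it to rc 0 alone. A minimal sketch; backup_then_prune and the command functions it calls are hypothetical placeholders for a script's own borg create and borg prune invocations:

```bash
# backup_then_prune MAX_OK_RC CREATE_FN PRUNE_FN  (hypothetical helper)
# Runs CREATE_FN; runs PRUNE_FN only if CREATE_FN's return code is at most
# MAX_OK_RC (0 = prune only after a clean run, 1 = also after warnings).
# Returns CREATE_FN's return code, or PRUNE_FN's if pruning itself failed.
backup_then_prune() {
    local max_ok_rc=$1 create_fn=$2 prune_fn=$3 rc=0
    "$create_fn" || rc=$?
    if [ "$rc" -le "$max_ok_rc" ]; then
        "$prune_fn" || return $?
    fi
    return "$rc"
}
```

For example, with do_create() { borg create --one-file-system [...]; } and a similar do_prune wrapping borg prune [...], backup_then_prune 0 do_create do_prune keeps the current behaviour, while a threshold of 1 also prunes after warnings. Pruning after warnings can remove older archives while the newest one is missing some files, so the threshold is a deliberate choice rather than a default.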

This also implies that a regular user can mark an entire backup run as failing, even if it runs as root. That seems unreasonable. Also note that --exclude didn't quite work until #3209 was fixed, so it's not a complete workaround for everyone.

More generally, I've found it difficult to rely on borg status codes. Files quite often disappear from the filesystem during the backup, which makes the backup "fail". The lack of snapshotting support in Linux doesn't help here either...