Sometimes the ability to not only exclude patterns but also explicitly specify includes can be useful.
I propose a new optional pattern prefix '+' that may be used with --exclude or in --exclude-from files to define include-patterns. (Just like include/exclude pattern rules in rsync.)
These patterns may be used to explicitly include files in a backup that would otherwise be excluded by a following pattern.
Some use cases:
--exclude +/data/docs/pdf --exclude /data/docs/
(maybe --include /data/docs/pdf should be introduced to improve readability.)
global.inclexcl:
/tmp
*.tmp
/scratch
local.inclexcl:
+/scratch
/home/dir_to_exclude
$ borg create --exclude-from local.inclexcl --exclude-from global.inclexcl backup /
equivalent:
global.inclexcl:
-sh:/tmp
-fm:*.tmp
-fm:/scratch
local.inclexcl:
+fm:/scratch
-/home/dir_to_exclude
(Actually the same approach is already used by the --keep-tag-files option.)
Filenames can start with + though.
Use -+file or pm:+file to escape and exclude a file named +file.
It's just like you have to escape files named xx:file:
Explicit style selection is necessary when a non-default style is desired or when the desired pattern starts with two alphanumeric characters followed by a colon (i.e. aa:something/*).
@leo-b It still would be a breaking change, although probably nobody's excluding a file starting with +.
@PlasmaPower I agree it should be noted in the release notes for academic purposes... ;-)
@leo-b don't you find giving include patterns with --exclude weird somehow?
That weirdness is shared with some other programs as well (.*ignore files are notorious for this).
I'm not sure if this is really needed. It seems to me like a fairly rare use case and since both exclude patterns and path patterns support regexpes the functionality is already there (although more complex to use).
@ThomasWaldmann as suggested: we could introduce an --include option to make it more intuitive.
Of course you can use regex negative lookahead assertions to except files from exclusion but those regexes will become really complicated and unreadable.
Besides, my second use case (deploying a global default exclude-list with the ability to override it with a per-client filter-file) wouldn't be possible with existing functionality.
Other tools with include/exclude syntax:
... and probably many more. :-)
Just curious if I can include certain file types right now, aka run borg like this:
/usr/local/sbin/borg create -s -C zlib,6 -v $REPOSITORY::`hostname`-`date +%Y-%m-%d` /var/lib/vz/dump/vzdump-lxc-102-*
I'm interested in whether this can work:
vzdump-lxc-102-*
_Unquoted_ star-patterns and other glob patterns like in your example are expanded by the shell before borg ever sees them.
@enkore thanks. is there some more explanation somewhere? couldn't find anything in the docs: https://borgbackup.readthedocs.io but maybe I am searching for the wrong terms?
These are documented in the shell manual e.g.
http://wiki.bash-hackers.org/syntax/pattern
https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html
http://zsh.sourceforge.net/Intro/intro_2.html
i also vote for this feature. another possible syntax is not to make -e more complicated but add another one -i / --include (or similar) with same patterns semantic as -e and let them be written in order:
-e ~/docs # exclude docs \
-i ~/docs/pdf # but retain pdfs \
-i ~/docs/pictures # retain also pictures \
-e ~/docs/pdf/useless-pdf # but exclude useless pdfs
then processing should be easy. just go from bottom to top and do first action that matches
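A minimal sketch of that "scan from the bottom, first hit decides" idea, in Python. This is not borg code: the function name decide, the (include, pattern) tuple layout and the /home/u paths (standing in for ~) are assumptions made for illustration, and fnmatch is used only as a convenient matcher.

```python
from fnmatch import fnmatchcase


def decide(path, rules, default=True):
    """Return True if `path` should end up in the backup.

    `rules` is the ordered list of (include, pattern) pairs as written on the
    command line. Later rules have higher priority, so we scan from the end
    and stop at the first rule that matches.
    """
    for include, pattern in reversed(rules):
        # A rule covers the named path itself and everything below it.
        below = pattern.rstrip("/") + "/*"
        if fnmatchcase(path, pattern) or fnmatchcase(path, below):
            return include
    return default


# Hypothetical paths standing in for ~/docs and friends.
rules = [
    (False, "/home/u/docs"),                  # -e ~/docs
    (True, "/home/u/docs/pdf"),               # -i ~/docs/pdf
    (True, "/home/u/docs/pictures"),          # -i ~/docs/pictures
    (False, "/home/u/docs/pdf/useless-pdf"),  # -e ~/docs/pdf/useless-pdf
]
```

With these rules, /home/u/docs/notes.txt is dropped, /home/u/docs/pdf/a.pdf is kept, and /home/u/docs/pdf/useless-pdf is dropped again, because the last matching rule decides.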
another possibility would be to allow delegation of file selection to external program. like taking them from pipe or doing something similar to find -exec
I think having include / exclude rules would be nice to have (we somehow have them in a limited way via borg create repo:arch path1 path2... (pathX is an include) and --exclude(-from), but everything could be prettier and more regular somehow).
What I'd find awkward is to have include rules via --exclude or --exclude-from.
Or patterns in such a file being exclude by default and include if prefixed by "+".
So, to unblock this, we need to make up something that doesn't have these weirdnesses:
--xxx and --xxx-from (find some reasonable word for xxx) that refers to patterns that can be include or exclude (or maybe even other things by using more prefixes).
--xxx -foo would mean exclude foo, --xxx +foo would mean include foo
--xxx-from FILE would contain lines like:
# comment
+ includepattern
- excludepattern
X anything else than + and - can be used for future extension
then my spec proposition is:
foo can be rule, pattern, path, filter-rule, selector, selection (filter is already taken)
--foo and --foo-from. every entry has to be prefixed with + or -, then pattern (in same format as for exclusions), that is:
--foo +~/Documents \
--foo -'re:/home/user/*.log'
when used together with --exclude, --foo is treated as added 'later', that means, has higher priority, therefore can be used to fine-tune all currently existing commands with --exclude without a need to rewrite them completely
when read from file, space is allowed between +/- and the pattern (is it possible also with command line?):
+ ~/Documents
- 're:/home/user/*.log'
(i never used exclusions from file so i don't know if quoting rules are the same)
order of patterns is important. later patterns are more important (have higher priority) and override the previous ones (lower priority). when combining different ways of providing pattern, the order is as follows:
--exclude-from and --exclude (lowest priority)
--foo-from
--foo (highest priority; will override previous foos and exclusions)
-e and --exclude become deprecated
Another thing that comes to mind: maybe borg create should be able to work without pathN commandline args if just everything is given in that --xxx-from file?
may be a bit misleading. what should be backed up if there is only - /tmp? all except tmp? so we implicitly add + /. but what if there is only + ~/Documents? should we also implicitly add + /? of course not. so we require to have + ... as a first param? but what if we get + 're:(!(/tmp|/proc))'? should we also implicitly add + /?
i think some clear starting points are needed. i'm not saying it won't work without explicit path, i'm just saying it seems to be much more complicated to design it in a way that is clear for users.
It's not misleading, it is just a logical conclusion if one goes away from that file being an _exclude_ file (not sure if _filter_ is much better), but also having includes now.
In general, borg does not hold you back from doing stuff that does not make sense, so let's not consider cases that just do not make sense.
So, assume that we start from nothing. Nothing included, nothing excluded, no paths on borg commandline:
borg create repo::arch --xxx-from filename
filename contents:
+ /
- /dev
- /proc
- /sys
- /home # we hate all users ;)
+ /home/love1 # except that one
+ /home/love2 # except that one
- /home/*/.cache # can be rebuilt
...
I find this quite straightforward. Also, one has all in one place.
As each line is a pattern, maybe --patterns-from FILE and --pattern PAT?
you're right, starting from nothing is essentially adding - / at the beginning and is pretty clear.
i don't know how it's implemented but what will borg do if it gets only + 're:...'? will it scan entire folder structure starting from /? including /proc, /run etc? is it efficient enough? if it's not a problem, then yeah, seems like the spec is complete
Well, we do not have include patterns yet, so that never was a problem yet.
It does not only happen for the initial include pattern, it's logically the same if you need to consider such includes "below" excluded paths.
so i considered path arguments as starting point. i assume currently only path directories are scanned and matched against exclude patterns. i was thinking about keeping same rule with inclusions. that is, if we have:
borg create ... \
/a \
/b \
--pattern +/c
then folder /c would NOT be included because it's not inside any of the starting points (/a and /b). this actually makes complexity linear to what user wants to back up. otherwise you're right: full scan would be always needed
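The "only look below the starting points" rule can be sketched as a simple containment check. The helper name under_any_root is hypothetical and this is not how borg implements its traversal; it only illustrates why +/c has no effect when the roots are /a and /b.

```python
from pathlib import PurePosixPath


def under_any_root(path, roots):
    """True if `path` is one of the roots or lies somewhere below one."""
    p = PurePosixPath(path)
    for root in map(PurePosixPath, roots):
        # p.parents lists every ancestor directory of p, up to "/".
        if p == root or root in p.parents:
            return True
    return False


roots = ["/a", "/b"]
```

An include pattern would only ever be tested against paths for which under_any_root is true, so the cost stays proportional to the size of /a and /b rather than the whole filesystem.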
Maybe we could compute the starting points ("roots") from the given patterns (not from regexes ofc).
For these starting points it would do a full recursive descent.
i think every time we encounter + 're:...' we have to take / as root. simple case:
+ ~/Documents
- ~/Documents/pdf
+ 're:/home/user/Documents/pdf/(a|b)'
borg will have to analyse regex and expand shell variables like ~ to know that root is ~/Documents. looks like huuuge task. to point roots we can add third marker r in addition to + and - but that's essentially same as path, just in different place
How about (when searching for roots):
won't work because:
+ 're:/logs-2016-..-..'
but you can tell users that only patterns like + simple-path will be considered roots. for me it would be bad UX because it makes some functionality implicit. i would advise to require users to provide roots explicitly. in any form you prefer (param path or pattern r) but explicit
or you can add option to provide roots explicitly and infer roots if they are not provided using above algorithm (flawed in general but probably working in most practical cases)
I like the naming --pattern and --patterns-from .
Most tools (rsync, rdiff-backup, duplicity) use this algorithm:
I think we should adopt this logic, since users are accustomed to it.
So your (@ThomasWaldmann ) example:
+ /
[...]
- /home # we hate all users ;)
+ /home/love1 # except that one
+ /home/love2 # except that one
- /home/*/.cache # can be rebuilt
...
... would look like that:
+ /
[...]
- /home/*/.cache # can be rebuilt ...
+ /home/love1 # include that directory
+ /home/love1/* # ... and its contents
+ /home/love2 # include that directory
+ /home/love2/* # ... and its contents
- /home/* # but exclude all other directories under /home (not /home itself!)
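A small Python sketch of the "first matching rule wins" evaluation for the list above. The names RULES and first_match are made up for this illustration, the default "+" stands in for the leading + / line, and Python's fnmatch (where * also matches "/") is only an approximation of what the real tools do.

```python
from fnmatch import fnmatchcase

# The rules from the example, in file order (the "+ /" line is modelled
# by the default action below).
RULES = [
    ("-", "/home/*/.cache"),
    ("+", "/home/love1"),
    ("+", "/home/love1/*"),
    ("+", "/home/love2"),
    ("+", "/home/love2/*"),
    ("-", "/home/*"),
]


def first_match(path, rules=RULES, default="+"):
    """Return the action of the first rule whose pattern matches `path`."""
    for action, pattern in rules:
        if fnmatchcase(path, pattern):
            return action
    return default
```

Here /home/love1/.cache is excluded by the first rule before the + /home/love1/* rule is ever reached, which is why the specific excludes have to come first in this scheme.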
I think trying to detect if a pattern is a prefix-pattern of another could be rather complicated and error-prone. So in my opinion an explicit marker that defines roots would be preferable.
As an alternative a simple rule that would be easy to implement and to understand by users could be: Use every include pattern up to the first exclude pattern as roots.
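That rule is easy to write down. A sketch in Python, assuming well-formed "+ pattern" / "- pattern" lines without trailing comments; the function name leading_includes_as_roots is hypothetical.

```python
from itertools import takewhile


def leading_includes_as_roots(lines):
    """Treat every include line before the first exclude line as a root."""
    rules = [
        line.split(None, 1)          # -> [sign, pattern]
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    # Stop collecting as soon as the first non-include rule shows up.
    return [pattern for sign, pattern in takewhile(lambda r: r[0] == "+", rules)]


example = ["+ /home", "+ /etc", "- /home/*/.cache", "+ /home/susan"]
```

For the example list this yields /home and /etc as roots; + /home/susan comes after the first exclude and is therefore an ordinary include.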
@leo-b if first matching pattern wins, then you will back up your entire system because you have + / as a first pattern
and after you fix it (move root to the end), we're talking about exactly same format - just reversed
@piotrturski You are right. The implicit appending of a * is to blame for that:
Because + / ends with a slash, it is internally translated to + fm:/*.
Unfortunately this makes it impossible to add only the root directory (without its subdirs) using the _fm:_ syntax. (+ /. would probably work but this is rather unintuitive.)
Maybe the implicit addition of the star should be eliminated?
Another reason why selecting the root should not be done implicitly via include patterns. ;-)
# bla -> comment
r[oot] path -> root def (path = really only a path, no patterns)
+ pattern, i[nclude] pattern
- pattern, e[xclude] pattern
Empty lines ain't do nothing.
Example
# Backup user homes, but no caches and no Downloads
root /home
exclude /home/*/.cache
exclude /home/*/Downloads
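A rough parser for this proposed file format, as a Python sketch. It is not the parser that ended up in borg; parse_rules and the ALIASES table are invented here, and r[oot] / i[nclude] / e[xclude] are read as "either the single letter or the full word".

```python
ALIASES = {
    "r": "root", "root": "root",
    "+": "include", "i": "include", "include": "include",
    "-": "exclude", "e": "exclude", "exclude": "exclude",
}


def parse_rules(text):
    """Split a rules file into (roots, rules); rules keep their file order."""
    roots, rules = [], []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments do nothing
        keyword, _, arg = line.partition(" ")
        kind = ALIASES.get(keyword)
        arg = arg.strip()
        if kind is None or not arg:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        if kind == "root":
            roots.append(arg)  # a plain path, never a pattern
        else:
            rules.append((kind, arg))
    return roots, rules


example = """\
# Backup user homes, but no caches and no Downloads
root /home
exclude /home/*/.cache
exclude /home/*/Downloads
"""
```

Keeping roots in a separate list from the ordered include/exclude rules mirrors the idea that a root is "really only a path" while everything else is a pattern.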
Thomas' example
root /
[...]
# can be rebuilt
- /home/*/.cache
# they're downloads for a reason
- /home/*/Downloads
# susan is a nice person
+ /home/susan
# he's a cat, so backup his files
+ /home/richard
# all others can go to hell
- /home/*
So what are the semantics of root?
root is really just [PATH [PATHs...]] from borg-create.
looks good. two things should be clarified:
suggestion @enkore : change
By default all files are considered
to:
"By default all files under roots are considered"
And what will happen if there is multiple root lines? will it be error? will we sum them? will patterns under one root be separated from patterns under another?
@enkore Your root proposal looks good to me. :-)
However, I think your example needs two small refinements..
root /
[...]
# can be rebuilt
- /home/*/.cache
# they're downloads for a reason
- /home/*/Downloads
# susan is a nice person
+ /home/susan
# he's a cat, so backup his files
+ /home/richard
# all others can go to hell
- /home
The last line - /home would match and exclude the home directory itself, so borg wouldn't even traverse into the directory. Thus the upper lines won't ever be used. You could use - /home/ (or equivalently - /home/*) instead.
Besides, + /home/susan would include the directory but not its contents. So an additional + /home/susan/ would be necessary, otherwise - /home/* would match inside susan's home.
Thus I'd suggest:
root /
[...]
# can be rebuilt
- /home/*/.cache
# they're downloads for a reason
- /home/*/Downloads
# susan is a nice person
+ /home/susan
+ /home/susan/*
# he's a cat, so backup his files
+ /home/richard
+ /home/richard/*
# all others can go to hell
- /home/*
By default all files are considered
to:
"By default all files under roots are considered"
Kind of implied, but yes.
And what will happen if there is multiple root lines? will it be error? will we sum them? will patterns under one root be separated from patterns under another?
I'd say that only patterns declared after a root occurred are considered for a root.
Whether in this:
root a
- pat1
- pat2
root b
- pat3
- pat4
pat3, pat4 also apply to root a can be debated either way.
The last line - /home would match and exclude the home directory itself, so borg wouldn't even traverse into the directory. Thus the upper lines won't ever be used.
Yep, has to be - /home/* or similar.
Besides, + /home/susan would include the directory but not its contents. So an additional + /home/susan/ would be necessary, otherwise - /home/* would match inside susans home.
In fnmatch there shouldn't be a difference between these two variants. As soon as a directory is included its contents are recursively included as well, as long as no negative qualifier (--one-file-system, excludes) applies.
Yep. But - /home/* would match its contents, so it would be excluded.
Why would /home/* match anything like /home/xyz/1234?
Because when using _fnmatch_ (which is the default selector), * matches a path separator.
If you'd use - sh:/home/*, it wouldn't match.
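The difference can be reproduced with Python's own fnmatch module, where * is translated to "match anything, slashes included". The sh_match helper below is a hypothetical, simplified stand-in for a shell-style matcher in which * and ? stop at "/"; it is not borg's sh: implementation.

```python
import re
from fnmatch import fnmatchcase


def sh_match(path, pattern):
    """Shell-like match: `*` and `?` never cross a path separator."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += "[^/]*"   # any run of characters except '/'
        elif ch == "?":
            regex += "[^/]"    # exactly one character except '/'
        else:
            regex += re.escape(ch)
    return re.fullmatch(regex, path) is not None


# fnmatch: the star happily swallows "xyz/1234".
fnmatch_hit = fnmatchcase("/home/xyz/1234", "/home/*")
# shell style: the star stops at the next '/'.
sh_hit = sh_match("/home/xyz/1234", "/home/*")
```

So with fnmatch semantics - /home/* also excludes everything inside every home directory, while with shell semantics it only excludes the direct children of /home.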
Oh well. That's kinda stupid. Dangerous, even. (--exclude /something/* and --exclude '/something/*' mean wildly different things)
So we'd have to do
...
- sh:/home/*
So I'd say that the default for these pattern-file-thingies should be sh, not fnmatch, because fnmatch is a stupid default for anything new.
Agreed.
(Though it's confusing to have different defaults for --exclude(-from) and --pattern(s-from).)
Yes. I guess fnmatch was modelled to behave like gitignore, which is somewhat similar, allowing things like *.crap matching globally, while allowing patterns with a prefix (although weird, shell-unlike).
another option would be to just drop backward compatibility and design it in a way that doesn't confuse anyone. then we would be able to just take existing filtering scheme from other backup tools. i just don't know if it's an acceptable solution
Hm.
Not sure if it's a good idea, but how 'bout making the default behave like:
I don't really see anyone writing /home/*/junk and expecting that /home/user/subdir/junk matches that.
Or what @piotrturski says and use an existing convention, sanity assumed.
i would advise to do as few implicit actions as possible. if someone is doing real backup, he creates a script for it. it's really no problem for a user to add sh: or fm: It's much less error prone than changing wildcard meaning in the pattern based on other part of that pattern
That behaviour is relatively similar to rsync's pattern matching (not sure if that's a good thing; the rules are fairly complex, see _INCLUDE/EXCLUDE PATTERN RULES_ in the rsync man page), which generally seems to "do what you expect".
The whole mechanism in rsync is pretty similar to what we are discussing here. I.e. it has the "roots" specified from the command line, by default everything would be considered, and the include/exclude patterns refine on that.
So I'm wondering if we want to make that compatible. There are these types of rules in rsync:
exclude, - specifies an exclude pattern.
include, + specifies an include pattern.
merge, . specifies a merge-file to read for more rules.
dir-merge, : specifies a per-directory merge-file.
hide, H specifies a pattern for hiding files from the transfer.
show, S files that match the pattern are not hidden.
protect, P specifies a pattern for protecting files from deletion.
risk, R files that match the pattern are not protected.
clear, ! clears the current include/exclude list (takes no arg)
merge, dir-merge would raise an error (not supported). Hide, show, protect, risk are no-op. We would add root directive.
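How such a compatibility layer might sort incoming rule lines, as a Python sketch. Nothing here is rsync's or borg's code: classify and the three keyword tables are hypothetical, and the handling (error for merge rules, no-op for hide/show/protect/risk, an extra root keyword) simply follows the proposal above.

```python
UNSUPPORTED = {"merge", ".", "dir-merge", ":"}
NO_OP = {"hide", "H", "show", "S", "protect", "P", "risk", "R"}
SUPPORTED = {
    "exclude": "exclude", "-": "exclude",
    "include": "include", "+": "include",
    "clear": "clear", "!": "clear",
    "root": "root",  # the proposed addition, not an rsync rule
}


def classify(line):
    """Map one rule line to (kind, argument), or None if it is a no-op."""
    keyword, _, arg = line.strip().partition(" ")
    if keyword in UNSUPPORTED:
        raise ValueError(f"rule type {keyword!r} is not supported")
    if keyword in NO_OP:
        return None  # accepted for compatibility, but ignored
    if keyword not in SUPPORTED:
        raise ValueError(f"unknown rule type {keyword!r}")
    return SUPPORTED[keyword], arg.strip()
```

Rejecting unknown keywords outright, rather than ignoring them, keeps room for later extensions without silently changing what an existing rules file means.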
Risk: maybe rsync adds some stuff in the future (likelihood?).
Maybe there are some rant-blog-posts about this? Would help to evaluate sucks-y-ness...
So, borgsync-style would look like this (assuming --one-file-system)
root /
- *.o
- *.S
- *.someKindOfFileIdontlike
# can be rebuilt
- /home/*/.cache
- /home/**/.LocalProjectCADcache
# they're downloads for a reason
- /home/*/Downloads
# susan is a nice person
+ /home/susan
# he's a cat, so backup his files
+ /home/richard
# all others can go to hell
- /home/*
However, this is completely different from anything in Borg, and doesn't really work with the existing xyz: stuff. OTOH could call this kind of pattern rsync: and make it the new default.
Risk: maybe rsync adds some stuff in the future (likelihood?).
it's not a risk. borg won't use rsync's lib to do parsing/matching, right? so even when they change their syntax, it won't affect borg in any way. if you write in a doc that borg uses same syntax as rsync, just add rsync version and don't name the prefix rsync
this is completely different from anything in Borg, and doesn't really work with the existing xyz: stuff. OTOH could call this kind of pattern rsync:
exactly. make it another prefix. but remember this is completely separate task from exclude. i suggest starting from the exclude because it gives users much more powerful tool. and improving selectors' expressiveness may be a long iterative process (with users' feedback)
I'd vote for keeping it rather simple and implement only rsyncs include/exclude pattern rules without modifiers and without any merge-filter rules.
IMHO the default (at least for the new --pattern and --patterns-from options) should be sh: syntax. If possible, I'd change the default for the other patterns too, for consistency.
I've added a new pull request #1971 based on our discussion:
Two new options --pattern and --patterns-from instead of modifying the behavior of --exclude and --exclude-from
The new options default to shell-style (sh:) patterns. (Much cleaner inclusion of subdirs inside excluded dirs. Besides this leads to a behavior similar to rsync include/exclude patterns.)
allow configurations of additional roots with --pattern and --patterns-from using a syntax like:
R /path
guess this should be closed?
Closed, since follow up PR #1971 was already merged.