With the new convention to use the capitalized version of a short flag to indicate the opposite it's too bad that -E is already used to mean --encoding, as I would like to suggest an "inverse pattern" mode where only lines/words (depending on other parameters as normal) matching pattern e but not matching pattern E are included in the result set.
Andrew, I know you are loath to add more ! support, but given the pre-existing -E, perhaps a -e !PATTERN?
The name of the flag is really not the interesting part of this feature request. The interesting part is the request to support more sophisticated boolean tests.
I think if we were to decide to do this, then it needs to be part of a larger story that encompasses more sophisticated expressions. We also need to address the fact that, today, we can actually express quite a bit, but it requires piping. Namely, piping permits expressing "and". Piping plus the -v flag permits any arbitrary boolean expression you might want. For example, rg foo | rg -v bar says "show lines that match foo but do not contain bar," which is exactly your feature request.
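To make the pipeline semantics concrete, here is a minimal Python sketch that models each rg stage as a line filter. It only models the boolean logic, not ripgrep itself, and the names stage and run_pipeline are made up for illustration.

```python
import re
from typing import Callable, Iterable, Iterator, List

LineFilter = Callable[[Iterable[str]], Iterator[str]]


def stage(pattern: str, invert: bool = False) -> LineFilter:
    """Model one `rg PATTERN` stage; invert=True models `rg -v PATTERN`."""
    regex = re.compile(pattern)

    def run(lines: Iterable[str]) -> Iterator[str]:
        for line in lines:
            # Keep the line when "matched" disagrees with "invert".
            if bool(regex.search(line)) != invert:
                yield line

    return run


def run_pipeline(lines: Iterable[str], stages: List[LineFilter]) -> List[str]:
    """Feed the output of each stage into the next, like a shell pipe."""
    current: Iterable[str] = lines
    for s in stages:
        current = s(current)
    return list(current)


lines = ["foo only", "foo and bar", "bar only", "neither"]

# Models: rg foo | rg -v bar  ("foo and not bar")
print(run_pipeline(lines, [stage("foo"), stage("bar", invert=True)]))

# Models: rg foo | rg bar  ("foo and bar")
print(run_pipeline(lines, [stage("foo"), stage("bar")]))
```

Each extra stage only sees what the previous stage let through, which is why a chain of pipes behaves like "and".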
git grep has support for this via --not, --and and --or. I don't know if I'm willing to add this to ripgrep. There must be a point at which we say, "piping is good enough."
An alternative way to implement this feature is in the regex engine itself (since intersection and complement are available as operations on regular languages), but this is extremely non-trivial to do.
I try not to speak in absolutes, but, "I don't want to add anything else that uses ! in a shell" is as close to an absolute as I can get. Let's drop that idea.
I understand completely. I currently pipe (to grep, I didn't realize I could pipe to rg itself!) but was wondering from a performance perspective basically about using the regex engine itself to optimize the search with the additional boolean constraints.
Thanks.
> but was wondering from a performance perspective basically about using the regex engine itself to optimize the search with the additional boolean constraints.
Well, the "best" way is to, as I hinted at, build complement and intersection into the regex engine. But as I said, this is extremely non-trivial to do efficiently. If we were to implement this, then we'd need an algorithm that selects the (attempted) optimal matching path given all of the boolean conditions. e.g., if you said "x and not y and not z," then ripgrep would search for x and only apply the y and z blacklist on matches to filter them out. If you had x or y or z, then ripgrep would, as it does today, combine them into one regex joined by |. If you had not x and not y and not z, then ripgrep would behave as it would today if you ran rg -v x and then use the y and z blacklists to filter out matches, which by De Morgan is the same as running rg -v 'x|y|z'. If you had not x or not y or not z, i.e. not (x and y and z), then ripgrep would have to drop only the lines that match all three patterns. And so on...
It is plausible that this would result in a performance improvement. But you can't just throw that out there as a benefit and expect it to stick. :-) Performance does not exist in a vacuum. Pipelines tend to be constructed in a way that iteratively reduces the search space, which in turn makes performance less and less of an issue. The interesting bits are probably pipelines that start with an inverted match on a rarely occurring pattern, which would not reduce the search space much. Regardless, I personally find this to be a somewhat flimsy motivation for a feature like this unless someone can convince me otherwise. IMO, if we add a feature like this, it should be primarily for the UX.
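A rough Python sketch of the strategies described above, assuming plain line-oriented matching. The function names are hypothetical and this is not how ripgrep is implemented; it only illustrates which pattern drives the search and which patterns act as filters.

```python
import re
from typing import Iterable, List


def _alternation(patterns: List[str]) -> "re.Pattern[str]":
    # Wrap each pattern in a non-capturing group so "|" binds correctly.
    return re.compile("|".join(f"(?:{p})" for p in patterns))


def search_any(lines: Iterable[str], patterns: List[str]) -> List[str]:
    """x or y or z: one regex joined by |."""
    combined = _alternation(patterns)
    return [line for line in lines if combined.search(line)]


def search_required_minus_excluded(
    lines: Iterable[str], required: str, excluded: List[str]
) -> List[str]:
    """x and not y and not z: search for x, then blacklist only the hits."""
    candidates = [line for line in lines if re.search(required, line)]
    if not excluded:
        return candidates
    blacklist = _alternation(excluded)
    return [line for line in candidates if not blacklist.search(line)]


def search_none(lines: Iterable[str], patterns: List[str]) -> List[str]:
    """not x and not y and not z: equivalent to rg -v 'x|y|z'."""
    combined = _alternation(patterns)
    return [line for line in lines if not combined.search(line)]


lines = ["x", "x y", "y z", "w"]
print(search_any(lines, ["x", "y", "z"]))
print(search_required_minus_excluded(lines, "x", ["y", "z"]))
print(search_none(lines, ["x", "y", "z"]))
```

In the second strategy the blacklist only ever runs on lines that already matched x, which is where a built-in implementation could plausibly save work over a pipeline.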
Example of using git grep with AND patterns:
git grep -e pattern1 --and -e pattern2 --and -e pattern3
Example of AND operation using Rust's regex engine:
rg -N '(?P<p1>.*pattern1.*)(?P<p2>.*pattern2.*)(?P<p3>.*pattern3.*)' file.txt
@kenorb That's presumably not the same as what git grep does. git grep -e pattern1 --and -e pattern2 will match pattern2pattern1 but (.*pattern1.*)(.*pattern2.*) will not. The standard way to perform "and" queries in ripgrep is with piping, as I mentioned above in my comment.
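The ordering difference can be checked with a small Python sketch. Python's re module stands in for the regex engine here, which is an assumption, but the concatenation semantics are the same for these patterns; matches_all is a hypothetical helper.

```python
import re
from typing import List

# The concatenated regex requires pattern1 to appear before pattern2.
ordered = re.compile(r"(.*pattern1.*)(.*pattern2.*)")


def matches_all(text: str, patterns: List[str]) -> bool:
    """Order-independent "and": every pattern must match somewhere."""
    return all(re.search(p, text) for p in patterns)


for line in ["pattern1pattern2", "pattern2pattern1"]:
    print(line, bool(ordered.search(line)), matches_all(line, ["pattern1", "pattern2"]))
```

Only the second check treats both lines the same way, which is the behavior of git grep --and and of a pipeline.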
I quite like the simplicity and "natural feel" of using rg foo | rg bar to do the equivalent of git grep -e foo --and -e bar. The only significant difference is the color.
git grep -e foo --and -e bar

rg string | rg query

See, no highlight of the word string in the rg pipe.
@peterbe You should be able to fix that by adding --color always to your first invocation of ripgrep. Not ideal of course.
I don't even know if it's possible with pipes, but if you could know that the next pipe is another rg, then --color always could be on by default. One can dream.
Piping loses the file headers.
rg abc
a.txt
4: ...abc...xyz...
7: ...abc...
b.txt
3: ...abc...xyz...
rg abc | rg xyz
4: ...abc...xyz...
3: ...abc...xyz...
That example doesn't look right. It should retain file names not as headers but in each line in standard grep format.
Sorry my bad. It looks like this:
rg abc | rg xyz
a.txt: ...abc...xyz...
a.txt: ...abc...xyz...
b.txt: ...abc...xyz...
b.txt: ...abc...xyz...
Still hard to parse when there are many files.
I think it's an example where the built-in op can provide better UX than piping.
Another example is piping with -A or -B.
// want to print a line including "abc" and "xyz" with +- 3 lines
rg abc -A 3 -B 3 | rg xyz -A 3 -B 3 // not what we want
That's certainly part of an argument in favor of this, but I will not allow that argument to be used as a hammer. Taken to its logical conclusion, ripgrep should bundle every conceivable transform on its data. At some point, people need to become OK with piping ripgrep's output and dealing with the different format. Different people will have different opinions on where that line is drawn.
I have definitely wished for an easy way to preserve headers when piping rg to rg. Maybe a flag for "header passthrough" would be useful on its own.
> I have definitely wished for an easy way to preserve headers when piping rg to rg. Maybe a flag for "header passthrough" would be useful on its own.
That would be nice but won't work in all cases. E.g., consider
rg -C5 foo | rg -v bar
Now the context lines around the matched lines in the first rg call are being matched by the second rg call and your output may end up being a bit of a mess and not what you might expect.
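Here is a small Python simulation of that problem. It is a simplified model with hypothetical helper names: real ripgrep output also contains "--" separators between context groups, which are ignored here.

```python
import re
from typing import List


def grep_with_context(lines: List[str], pattern: str, context: int) -> List[str]:
    """Return matching lines plus `context` lines on each side (like -C)."""
    keep = set()
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    return [lines[i] for i in sorted(keep)]


def grep_invert(lines: List[str], pattern: str) -> List[str]:
    """Return the lines that do not match (like -v)."""
    return [line for line in lines if not re.search(pattern, line)]


lines = ["alpha", "foo bar", "beta", "gamma", "foo", "delta"]

# Models: rg -C1 foo | rg -v bar
first = grep_with_context(lines, "foo", 1)
second = grep_invert(first, "bar")
print(second)
```

The second stage removes "foo bar" but keeps "alpha" and "beta", which were only ever printed as context for that removed match. The second stage cannot tell context lines from matches.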
> IMO, if we add a feature like this, it should be primarily for the UX.
Looking at a few now-closed duplicate issues, what most people want is just "a and not b" with all of the headers/context preserved, which might make sense to special-case if that's much simpler than the general case.
Files look like this:
a.txt
4: ...abc...
30: ...xyz...
b.txt
4: ...abc...
.....
(no 'xyz' in content)
How to find files like a.txt with 'abc' and 'xyz' in different lines?
Use multiline search.
Thanks for the reply.
I tried rg -U --multiline-dotall -e 'abc.*xyz', and the right files were found. But there was too much output, like:
4: ...abc...
5: xxxxx
6: xxxxx
...
29: xxxxx
30: ...xyz...
rg -U --multiline-dotall -e 'abc.*xyz' | rg abc
No filename and line-numbers.
rg -U --multiline-dotall -l -e 'abc.*xyz' | rg 'abc' -
No result. How to read path from pipe?
rg -U --multiline-dotall -l -e 'abc.*xyz' | while read line; do rg 'xyz' "$line"; done
Almost done! But filenames are missing.
rg -U --multiline-dotall -l -e 'abc.*xyz' | while read line; do echo "$line"; rg 'xyz' "$line"; echo; done
Done!
Please skim the options in the man page. Use the -n and --with-filename flags.
rg -U --multiline-dotall -l -e 'abc.*xyz' | while read line; do rg --with-filename 'xyz' "$line"; echo; done
Got it!
Friendly note: the utility of this feature is not in question. More comments explaining how useful this is or the kinds of problems it solves that aren't solved well by the status quo aren't necessary. The key thing blocking this feature is the potentially immense complexity that it adds not only to the implementation, but to the UX. It requires serious design work first, and it's still not clear to me that this is a feature I want to add.
It is well known that git grep supports this stuff. If it does what you want, then just use that.
Please consider a utility mode, something like rg --compile-expr a -and b -and c, that generates the relevant DFA.
Usage would be something like rg --dfa $(rg --compile-expr a -and -not b). This would confine the complexity to the --compile-expr option; the rest of the UX would remain identical.
Also, piping is problematic for huge files, as the data is copied again for every pipe.
Piping is also an issue when using e.g. --heading
@zachriggle That's already been mentioned.
re --and, I'm not sure if this is blasphemy or even correct at all and I'm probably missing edge cases but we could demorgan it...
$ echo -e 'Hello, foo\nBye, baz\nHello, james\nHello, baz\nbaz likes yellow' | \
rg --pcre2 '^(?!((?!.*baz.*$)|(?!.*ello.*$)))'
Hello, baz
baz likes yellow
for matching any line containing baz and ello. perhaps a useful stop-gap for anyone desperate for a work-around?
@hraban If you just want a simple and query, then I'd probably recommend just doing
$ echo -e 'Hello, foo\nBye, baz\nHello, james\nHello, baz\nbaz likes yellow' | rg baz | rg ello
With the downsides of course being that you lose the nice formatting and highlighting of baz.
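For anyone comparing the two approaches, here is a minimal Python check of the lookaround pattern above against the same five lines. Python's re module stands in for PCRE2, which is an assumption, though both support these lookaheads. The second regex, lookahead_and, is a shorter form I'd expect to be equivalent: two positive lookaheads instead of the nested negative ones.

```python
import re
from typing import List

lines = [
    "Hello, foo",
    "Bye, baz",
    "Hello, james",
    "Hello, baz",
    "baz likes yellow",
]

# The De Morgan form from the comment above: not (not baz or not ello).
de_morgan = re.compile(r"^(?!((?!.*baz.*$)|(?!.*ello.*$)))")

# A shorter form of the same "and": both lookaheads must succeed.
lookahead_and = re.compile(r"^(?=.*baz)(?=.*ello)")


def select(regex: "re.Pattern[str]", candidates: List[str]) -> List[str]:
    return [line for line in candidates if regex.search(line)]


# Models: rg baz | rg ello
piped = [line for line in lines if "baz" in line and "ello" in line]

print(select(de_morgan, lines))
print(select(lookahead_and, lines))
print(piped)
```

All three select the same two lines here. With ripgrep, either lookaround form would need --pcre2, since the default regex engine does not support lookaround.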
I'm going to suggest that maybe this issue and https://github.com/BurntSushi/ripgrep/issues/473 should be two separate issues.
Personally I'm not that interested in using complex boolean or regex patterns with ripgrep. I just want to be able to specify multiple patterns. Perhaps this could just be specified with a new flag like
rg --patterns "level=error" --patterns "requestID"
Maybe that's too simplistic, but I've been using rg nearly since it was started and I've never had any desire for anything besides a simple 'and' match on multiple patterns.
@sparrc Conceptually, you might be right. But in terms of implementation, I don't think there is much of a difference, so I'm treating them the same. Also, ripgrep _does_ have the ability to search multiple patterns (using the same exact flags as grep). It's just that it's an "or" match.
On top of that, the reason why just wanting "and" match is a little weird is because you can do it with pipelines: rg level=error | rg requestID. It's just that the UX isn't quite as good...
@BurntSushi it's not just the UX (which is a major, unfixable problem IMHO. UX issues are much more important than "real" bugs, say, 100% slowdown of some cases).
One of the main reasons for me to use ripgrep, and one of its advantages is speed, so I'm picking it when I'm searching large files. Using multiple pipes slows things down in some cases, as it copies the data, adds syscalls, etc.
This is not 100% the same search, and of course I picked a 3GB file with search terms appearing in most lines, but
$ time ./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg DMOD.\*LOW dr_agg >/dev/null
real 0m5.772s
user 0m5.017s
sys 0m0.754s
$ time ./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg DMOD.\*LOW dr_agg >/dev/null
real 0m5.749s
user 0m4.987s
sys 0m0.760s
$ time ./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg DMOD dr_agg |./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg LOW >/dev/null
real 0m6.330s
user 0m7.147s
sys 0m2.781s
$ time ./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg DMOD dr_agg |./ripgrep-12.1.1-x86_64-unknown-linux-musl/rg LOW >/dev/null
real 0m6.168s
user 0m7.245s
sys 0m2.777s
One second hardly matters, but it is not uncommon for me to search 300GB of files.
+1 from me for multiple "AND" searches
@gd4c Please don't post +1 comments. They are noise that makes it into my inbox. If you feel obligated to +1 something, then use GitHub's emoji reactions. See also: https://github.com/BurntSushi/ripgrep/issues/875#issuecomment-466774329
Sorry. I initially upvoted the initial post, but it wasn't what I was after (which kind of evolved into the thread). I just wanted to make it clear what I was thumbing up.
Continuing from #1149
The implementation complexity of more sophisticated boolean matching is precisely my main argument against it. And it requires a very thorough specification of behavior.
And the upsides are limited. Yes, you can't get the "nice" output when using pipelines, but you can still use pipelines and the output is still serviceable.
Is it much trouble to ask you to give us an idea of what you have in mind?
I would imagine it like this:
That would be enough for me, and I suspect many of the people above.
@gd4c rg foo | rg bar will only print lines that contain both foo and bar.
True, but without the easily readable formatting.
Regular grep can do that, for that matter. It is just that ripgrep is nicer to use, which is what drives this request: to improve the already nice UX rather than to add an otherwise impossible feature!
In any case, if you can find an easy way to do it, that'd be great. If you consider it too much trouble, there are (less nice) workarounds we can use!
What's confusing?
Also, forgot to say that piping to rg searches lines in previous stdout, not the matching _files_!
The upsides and downsides here are well known. I've stated repeatedly what the problems are with rg foo | rg bar. They don't need to keep being repeated. So I'm confused at why you're rehashing things.
Adding this feature reflects _significant_ work. The first step is to come up with a comprehensive UX specification of behavior. _That_ would be useful. Further argumentation about _why_ ripgrep should have this feature is _not_ useful. It's just noise and it's just filling up my inbox.
I said about as much almost a year and a half ago, so now I'm just repeating myself. And I'm confused at why I need to do it.
Sorry for the misunderstanding. I recognize that you understand its usefulness and that the issue is the complexity. I was just responding to your solution above.
You made a great tool and I am grateful!
Cheers!
My current work-around for this is effectively to use rg -l to find all OR'ed matches, and then pass off to git grep.
A statement that looks roughly like this:
$ my-grep -e foo --and -e bar --and --not '(' -e fizz -e buzz ')'
Gets translated roughly to:
rg -e foo -e bar -l -0 | xargs -0 git grep --threads 12 --no-index -e foo --and -e bar --and --not '(' -e fizz -e buzz ')'
There's a LOT of extra plumbing in my shell script to achieve better performance (e.g. don't have ripgrep search for expressions in an --and --not ( -e fizz -e buzz ) block), but ultimately rg -l -0 | xargs -0 git grep --no-index works pretty effectively, and is much faster than git grep by itself if you make use of e.g. rg type filters (e.g. rg -t c -t py).
This also allows you to specify some git grep specific formatting, like --show-function, in addition to those that rg also supports like --break --heading --line-number.