A recurring workflow of mine is to search within an existing list of files.
Currently I'm living by
generate-list-of-files | while read -r f; do rg pattern "$f"; done
which is both inconvenient and inefficient.
Ack does provide a --files-from option. Implementing it would allow me to type
rg pattern --list-from <(gen list)
to fulfill my needs.
It seems like there are a few ways to do this without building it into ripgrep. Here are a couple:
[andrew@Cheetah 273] echo test > foo
[andrew@Cheetah 273] echo test > bar
[andrew@Cheetah 273] echo test > baz
[andrew@Cheetah 273] cat > file-list <<EOF
> foo
> bar
> baz
> EOF
[andrew@Cheetah 273] xargs rg test < file-list
foo
1:test
bar
1:test
baz
1:test
[andrew@Cheetah 273] rg test $(cat file-list)
bar
1:test
foo
1:test
baz
1:test
I don't think either of these approaches is less efficient than what ripgrep would do if it were built-in. The only caveat here is that if your file list is big enough, you'll need to use xargs, which will split up the argument list correctly.
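That splitting behavior is easy to observe directly. In the sketch below, -n 2 forces xargs into small batches so the effect is visible, and echo stands in for rg:

```shell
# xargs splits a long argument list across several command invocations.
# -n 2 caps each batch at two arguments so the splitting is visible;
# normally xargs only splits when the OS argument-size limit is reached.
printf 'a\nb\nc\n' | xargs -n 2 echo run:
# run: a b
# run: c
```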
Could you explain in more detail why these approaches don't work for you?
Sure. None of your proposed alternatives work with filenames containing spaces.
Try e.g.
echo test > "foo a"
echo "foo a" > file-list
xargs rg test < file-list
foo: No such file or directory (os error 2)
a: No such file or directory (os error 2)
I guess the standard solution to that is to delimit your file paths with a NUL terminator (e.g., find ./ -print0) and then tell xargs to read them using xargs -0.
If you aren't generating files with find (or some other tool that can be made to emit NUL terminators), then it seems like you should be able to use xargs -d'\n' rg test < file-list?
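The NUL-delimited route can be spot-checked without ripgrep at all. In this sketch, printf stands in for rg test, showing exactly which arguments xargs hands over:

```shell
# Scratch directory with a file whose name contains a space.
tmp=$(mktemp -d) && cd "$tmp"
echo test > 'foo a'

# find -print0 terminates each name with a NUL byte and xargs -0 splits on
# NUL, so the embedded space survives as part of a single argument.
# printf stands in for `rg test` to show what xargs actually passes.
find . -name 'foo*' -print0 | xargs -0 printf '<%s>\n'
# <./foo a>
```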
Fair enough. So, for the record, instead of the syntax I proposed
rg pattern --list-from <(gen list)
I can achieve the same results using
xargs -d'\n' -a <(gen list) rg pattern
Not as convenient, but I can very well live with that.
Thanks!
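The same spot check works for the -d '\n' -a form: with newline as the delimiter, each line of the list becomes exactly one argument, again with printf standing in for rg pattern:

```shell
tmp=$(mktemp -d) && cd "$tmp"
echo test > 'foo a'
printf 'foo a\n' > file-list

# -d '\n' makes each input line a single argument, so spaces survive;
# -a reads the list from a file (or a process substitution like <(gen list)).
# printf stands in for `rg pattern` here.
xargs -d '\n' -a file-list printf '<%s>\n'
# <foo a>
```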
Yes, I think I'd prefer that at this point.
Popping up a level, do also note that ripgrep provides the -g/--glob flag, which lets you apply ad hoc filters on which files/directories are searched. This only works where your rules are simple, but it does cover a lot of the simpler uses of find ./ ... | xargs grep ....
@BurntSushi My use-case is similar to his where I compile a list of files I'm interested in and search only those files instead of letting ripgrep loose on my entire project which would take a lot longer. Like you suggested, I've been using xargs -d '\n' rg PATTERN < FILELIST.
Next, I wanted to search only some specific file types (say, C++ source files) within FILELIST, so I tried adding -tcsrc (csrc is a type I created, defined in my ~/.ripgreprc config file), but that doesn't work: ripgrep seems to ignore any glob/type arguments when given an explicit list of files to search. So I ended up pre-processing FILELIST before running: xargs -d '\n' rg PATTERN < <(rg '\.(cc|cpp)$' FILELIST).
This is kinda bad, as I've defined the csrc type elsewhere but can't use it in this context. Is there a better way to go about this? It'd be nice if ripgrep filtered the provided list of files using the type/glob argument when one is given, e.g. xargs < FILELIST rg -tcsrc PATTERN.
@kshenoy ripgrep has always explicitly ignored, and probably always will ignore, any filtering for file paths that are explicitly given on the command line. I realize that for your particular niche case this isn't what you want, but doing otherwise would grossly complicate the already complex filtering logic that ripgrep performs. e.g., running rg foo blah.py and getting nothing back, even if there was a match, because blah.py is in your .gitignore would be quite an egregious UX fail. You might instead argue that file type filtering is different because it's explicitly provided on the command line, but it still violates what is now a pretty ironclad rule: "If you give a file path to ripgrep, it will search it."
My use-case is similar to his where I compile a list of files I'm interested in and search only those files instead of letting ripgrep loose on my entire project which would take a lot longer.
You might instead consider using a .ignore or .rgignore file to dictate which files should be skipped when searching your project.
"If you give a file path to ripgrep, it will search it."
That's a reasonable rule to follow. I agree that doing anything else would involve prioritizing between different ways to include/exclude files. Thanks for the clarification.
You might instead consider using a .ignore or .rgignore file to dictate which files should be skipped when searching your project.
I did consider doing that. However, we use Perforce at work, and it's easier to compile a list to search through using p4 have ... than to compile a list to ignore. I opted to create a wrapper around rg which adds the --files-from option, similar to ack. It may be a little over-engineered :) but it seems to work. Any suggestions for improvement are welcome.
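For reference, the core of such a wrapper can be sketched in a few lines of bash. This is a hypothetical sketch with an invented rgf name, not the actual linked script: it peels off a --files-from=FILE option, reads one path per line, and passes everything else through to rg.

```shell
# Hypothetical sketch of a --files-from wrapper; the real script may differ.
rgf() {
  local list='' args=() a f
  for a in "$@"; do
    case $a in
      --files-from=*) list=${a#--files-from=} ;;
      *) args+=("$a") ;;
    esac
  done
  if [[ -n $list ]]; then
    # One path per line; `read -r` preserves embedded spaces.
    local files=()
    while IFS= read -r f; do files+=("$f"); done < "$list"
    rg "${args[@]}" -- "${files[@]}"
  else
    rg "${args[@]}"
  fi
}
```

Unlike the xargs form, this builds the whole argument list for a single rg invocation, so a very large list could still hit the OS argument-size limit; xargs remains the safer choice for huge lists.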
It would be nice to be able to pipe a list of files to ripgrep. Right now, I searched for a second pattern in files matching a first pattern with
rg "pattern2" --files-without-match $(rg "pattern1" --files-with-matches)
when it would be nice to do the following, because I usually think of the first pattern first:
rg "pattern1" --files-with-matches | rg "pattern2" --files-without-match
Although, this use case is unusual since I'm using --files-without-match, which doesn't work with xargs: xargs may split the list across several ripgrep processes, and each process prints the non-matching files from its own batch, so ripgrep ends up printing more files than I intended.
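One workaround that keeps a single pass per pattern is set subtraction over the two file lists with comm. The sketch below uses grep -l as a stand-in for rg --files-with-matches so it is self-contained, but the idea carries over directly:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf 'one\n' > a
printf 'one\ntwo\n' > b
printf 'two\n' > c

# Files matching "one" minus files matching "two": comm -23 keeps lines
# unique to the first sorted list.
# grep -l stands in for `rg --files-with-matches`.
comm -23 <(grep -l one a b c | sort) <(grep -l two a b c | sort)
# a
```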
Could you explain in more detail why these approaches don't work for you?
What about windows?
Main issue that there is no xargs.
And if you try to add all files to command line:
rg "pattern" C:/longpath1/file1 C:/longpath2/file2 ... C:/longpath200/file200
then it exceeds maximum command length and doesn't work.
I want to search all vimhelp files provided in vim runtime path and there are a lot of files (including various plugin documentation).
I'm re-opening this because it seems impossible or difficult to work around this when xargs is not present.
What should the flag name for this be? --files-from has been proposed. Is that the best name?
Also, should files specified via this method be subject to smart filtering or globs? Files specified on the command line are not, so I would think these shouldn't be either. That is, files to be searched via this method should act as if they were given on the command line.
I'm re-opening this because it seems impossible or difficult to work around this when xargs is not present.
That's great news!
What should the flag name for this be?
--files-from has been proposed. Is that the best name?
I proposed --files-from after tar:
man tar|rg -s -A8 -- '-T, --files-from'
-T, --files-from=FILE
Get names to extract or create from FILE.
Unless specified otherwise, the FILE must contain a list of names separated by ASCII LF (i.e. one name per line). The names read are handled the same way as command line arguments. They undergo quote removal and
word splitting, and any string that starts with a - is handled as tar command line option.
If this behavior is undesirable, it can be turned off using the --verbatim-files-from option.
The --null option instructs tar that the names in FILE are separated by ASCII NUL character, instead of LF. It is useful if the list is generated by find(1) -print0 predicate.
Files specified on the command line are not, so I would think these shouldn't be either.
Seconded.
file, rsync, and (as mentioned in the OP) ack also use --files-from.
file basically treats the paths as if they were given on the command line. rsync treats them kind of like includes relative to the source directory, but it also applies include/exclude patterns to them. ack treats them like rg would treat paths given on the command line: glob patterns and type filters don't apply to them. I guess that's a strong precedent.
@okdana Aye. I also think that if the list of files is given explicitly like this, then users can use other mechanisms of filtering very easily before passing the file to ripgrep. For example, you might use git ls-files to get a list of files tracked by git instead of needing to rely on ripgrep's smart filtering.
And also, come to think of it, if we did allow gitignore or other filters to apply to the list of files given, that would probably prevent this feature from being implemented in any reasonable time frame. gitignore matching, for example, is pretty heavily coupled to directory traversal. Applying -g/--glob rules would probably be easy though.
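The git ls-files route can also be made whitespace-safe with NUL delimiters. In this sketch, printf stands in for rg pattern to show exactly which paths would be searched:

```shell
# Scratch repository with one tracked and one untracked file.
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
echo hit > tracked.txt && git add tracked.txt
echo hit > untracked.txt

# -z emits NUL-delimited paths; only tracked files reach the command.
# In practice the last stage would be `xargs -0 rg pattern`.
git ls-files -z | xargs -0 printf '<%s>\n'
# <tracked.txt>
```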
find / | rg pattern --files-from
rg --files-from=FILE
find / | rg pattern --files-from should start processing right away in a streaming fashion, and not wait for stdin to be closed (likewise with rg --files-from=FILE).
@timotheecour I would expect you to have to write find / | rg pattern --files-from -, where the - is an idiom for opening stdin.
Executing the search before stdin is closed is interesting. That will require some refactoring inside ripgrep, since right now it stores the complete set of paths to search in memory. (Because it was always in memory via CLI arguments.) I agree that streaming is probably the right option, although that may be an enhancement that comes after the initial feature lands, depending on how difficult that refactoring is.
enhancement that comes after the initial feature lands
totally fine, then let's keep this issue open till then :-)
No, if that happens, then I'll close this issue and open a new one.