Does fd support the equivalent of find PATH -name NAME -print -quit, which finds the first match, prints the result, and terminates?
I looked into some closed issues and found fd --max-buffer-time=0 NAME PATH | head -n 1, which takes 0.9s real time compared to 0.5s with find PATH -name NAME -print -quit. Am I missing something?
Thank you for your feedback.
I managed to find a similar example on my filesystem where I could reproduce your results. I think the problem is that piping into head -n 1 doesn't necessarily immediately shut down the process.
As a demonstration, let's look at find first. I am using hyperfine for running the benchmarks:
hyperfine --warmup 3 \
'find -iname "*flow.yaml"' \
'find -iname "*flow.yaml" | head -n1' \
'find -iname "*flow.yaml" -print -quit'
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| find -iname "*flow.yaml" | 2.558 ± 0.023 | 2.523 | 2.597 | 21.7 |
| find -iname "*flow.yaml" \| head -n1 | 2.576 ± 0.043 | 2.542 | 2.684 | 21.9 |
| find -iname "*flow.yaml" -print -quit | 0.118 ± 0.002 | 0.114 | 0.122 | 1.0 |
Notice how the variant with | head -n 1 takes just as long as the plain command. Apparently, find simply keeps running after the pipe is broken (head closes its STDIN once it has read the required number of lines).
With fd, the results look slightly different (note that these are milliseconds, not seconds like above):
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| fd --max-buffer-time=0 flow.yaml | 256.8 ± 2.8 | 253.9 | 263.0 | 1.3 |
| fd --max-buffer-time=0 flow.yaml \| head -n 1 | 191.2 ± 3.5 | 184.4 | 196.6 | 1.0 |
The variant with head -n 1 is slightly faster. However, when I run fd interactively, I can clearly see that it outputs the first result very quickly and only quits when it is about to print the second result(!). The reason is that this second write is the first time fd notices that its STDOUT pipe (= head's STDIN) has been closed.
We can demonstrate a similar behavior by running:
(echo first; sleep 1; echo second; sleep 100; echo third) | head -n 1
This command runs for one second instead of quitting immediately: the subshell only notices the closed pipe when it tries to write "second".
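The same effect can be reproduced from Rust. The following is only a minimal sketch, assuming a Unix-like system with `head` on the PATH; the function name `write_after_head_exits` is made up for illustration and is not part of fd. It writes one line to `head -n 1`, waits for `head` to exit, and then attempts a second write, which is the first point at which the writer can observe the broken pipe.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Feeds two lines to `head -n 1` and reports what `head` printed together
/// with the error kind (if any) returned by the second write.
/// Hypothetical helper for illustration only.
fn write_after_head_exits() -> io::Result<(String, Option<io::ErrorKind>)> {
    let mut child = Command::new("head")
        .arg("-n")
        .arg("1")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Take ownership of the write end so that waiting on the child
    // does not close it on our behalf.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");

    // The first write succeeds: the kernel buffers it in the pipe.
    stdin.write_all(b"first\n")?;

    // `head` exits after one line, which closes the read end of the pipe.
    let output = child.wait_with_output()?;

    // Only this next write can tell the writer that the reader is gone.
    let second = stdin.write_all(b"second\n").err().map(|e| e.kind());

    Ok((String::from_utf8_lossy(&output.stdout).into_owned(), second))
}

fn main() -> io::Result<()> {
    let (printed, second) = write_after_head_exits()?;
    println!("head printed {:?}, second write returned {:?}", printed, second);
    Ok(())
}
```

Nothing between the two writes informs the writer that `head` has gone away, which matches the one-second delay in the shell example.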
To make sure that this is the actual problem with fd as well, I quickly changed the print_entry_uncolorized function to print an additional newline:
--- a/src/output.rs
+++ b/src/output.rs
@@ -90,5 +90,6 @@ fn print_entry_uncolorized(
     let separator = if config.null_separator { "\0" } else { "\n" };
     let path_str = path.to_string_lossy();
-    write!(stdout, "{}{}", path_str, separator)
+    write!(stdout, "{}{}", path_str, separator)?;
+    writeln!(stdout)
 }
With this small modification, fd is suddenly blazing fast (a factor of 10 faster than find -print -quit instead of a factor of 1.6 slower):
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| fd --max-buffer-time=0 flow.yaml \| head -n1 | 11.3 ± 1.0 | 8.7 | 15.0 | 1.0 |
Now, this is obviously not something we want to implement in this way. If anybody has any good suggestions on how to "fix" this, please let us know. One potential way could be to test (however that works) if STDOUT has been closed after printing each result. However, we would need to check whether this has any performance impact when not piping to head.
If there is no great solution, we should actually think about implementing a --max-results <count> option (see also #476).
> One potential way could be to test (however that works) if STDOUT has been closed after printing each result.
I believe you can attempt to write 0 bytes to stdout, and you'll get EPIPE back if the pipe is closed (and you're ignoring SIGPIPE like Rust does by default). It's probably not a good idea to do two write syscalls every time you print something though. And I think it's still racy, since if head hasn't finished reading the first line by the time you do the second write, it won't fail.
So maybe have a timer such that if the main thread hasn't received any files to print in a while, it writes 0 bytes to stdout and exits if that fails. Alternatively don't bother, since no other tool seems to.
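For ordinary, non-empty writes, the EPIPE case surfaces in Rust as `io::ErrorKind::BrokenPipe`. As a rough sketch of how a printing loop could stop quietly at that point (this is not fd's actual code; `print_until_closed` and `ClosingPipe` are hypothetical names, and `ClosingPipe` merely simulates a reader that goes away after a number of writes):

```rust
use std::io::{self, Write};

/// Writes one result per line and stops quietly once the reader is gone.
/// Returns how many results were written successfully.
fn print_until_closed<W: Write>(out: &mut W, results: &[&str]) -> io::Result<usize> {
    let mut written = 0;
    for path in results {
        // Build the whole line first so it goes out in a single write call.
        let line = format!("{}\n", path);
        match out.write_all(line.as_bytes()) {
            Ok(()) => written += 1,
            // The reader closed its end: treat this as a normal stop condition.
            Err(e) if e.kind() == io::ErrorKind::BrokenPipe => break,
            // Any other I/O error is still reported to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}

/// Hypothetical stand-in for a pipe whose reader exits after `remaining` writes.
struct ClosingPipe {
    remaining: usize,
}

impl Write for ClosingPipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            return Err(io::Error::from(io::ErrorKind::BrokenPipe));
        }
        self.remaining -= 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut pipe = ClosingPipe { remaining: 1 };
    let written = print_until_closed(&mut pipe, &["a.yaml", "b.yaml", "c.yaml"])?;
    println!("wrote {} result(s) before the reader went away", written);
    Ok(())
}
```

This only reacts to a failed write; it does not detect the closed pipe any earlier than the next result, which is the limitation discussed here.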
Correction: despite what StackOverflow said, writing 0 bytes to a closed pipe does not trigger EPIPE. I'm not sure there's a non-destructive way to find out if the other end of a pipe is closed.
There is a way, at least on Linux: https://stackoverflow.com/a/57959507/502399
On Windows, apparently the write-zero-bytes thing works.
@tavianator Thank you very much for your analysis. I opted to implement --max-results=<count> because that seemed like a much cleaner way of solving this use case.
Please see #555 for benchmark results.
This has now been released in fd v8.0. We also have -1 as an alias for --max-results=1.
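To illustrate why a result limit avoids the pipe problem altogether, here is a small sketch of the general idea. It is not fd's actual implementation: `spawn_walker` and `collect_results` are hypothetical names, and a plain list of strings stands in for a directory walk. The receiving side stops after `max_results` entries and drops the channel, so the producer sees a failed send and can stop without waiting for any write to fail.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical producer: sends each path over a channel and stops as soon
/// as the receiving side has been dropped.
fn spawn_walker(paths: Vec<String>) -> (Receiver<String>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        for path in paths {
            if tx.send(path).is_err() {
                // Nobody is listening any more: stop searching.
                break;
            }
        }
    });
    (rx, handle)
}

/// Hypothetical consumer: takes at most `max_results` entries (all of them
/// if `None`) and then drops the receiver, which signals the producer.
fn collect_results(rx: Receiver<String>, max_results: Option<usize>) -> Vec<String> {
    let limit = max_results.unwrap_or(usize::MAX);
    rx.into_iter().take(limit).collect()
}

fn main() {
    let paths = vec![
        "a/flow.yaml".to_string(),
        "b/flow.yaml".to_string(),
        "c/flow.yaml".to_string(),
    ];
    let (rx, walker) = spawn_walker(paths);
    for path in collect_results(rx, Some(1)) {
        println!("{}", path);
    }
    walker.join().expect("walker thread panicked");
}
```

On the command line, the equivalent of the original question would then be fd --max-results=1 NAME PATH, or fd -1 NAME PATH with the alias.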