fd and find look like they do a similar job, and both are open source.
Why can't find keep up with fd in terms of performance?
Can't it be updated with the ideas from fd?
What does fd do to be faster that find doesn't?
I heard that regular expressions are really fast in Rust, but can't the same be done in C?
Is it Rust? Support for legacy platforms? Or are there features that find supports and fd doesn't that make find so slow?
I think fd is a great project and I love the way we write search queries.
But I can't understand why the "standard binaries" are so slow. (They can just copy fd, right?)
> fd and find look like they do a similar job, and both are open source.
Yes. If configured correctly, they should do exactly the same job. Note, however, that by default fd ignores entries matched by .gitignore files as well as hidden files. Typically (but not always, because we need to actually parse the .gitignore files, which is additional work) this makes searches faster because the search tree is smaller. For a fair comparison, always use fd with -H and -I (as done in the benchmarks in the README). This way, fd also searches hidden files and does not respect ignore files.
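To make the "smaller search tree" point concrete, here is a minimal Python sketch. It is not fd's code (fd is written in Rust), it only models the hidden-file rule and does no .gitignore parsing, and the names `count_entries` and `make_demo_tree` as well as the demo layout are made up for illustration.

```python
import os
import tempfile


def count_entries(root, include_hidden):
    """Count files and directories below root, optionally skipping dot-entries."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Editing dirnames in place stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        total += len(dirnames) + len(filenames)
    return total


def make_demo_tree(root):
    """Hypothetical layout: one source file plus a hidden directory with 50 files."""
    os.makedirs(os.path.join(root, "src"))
    os.makedirs(os.path.join(root, ".cache"))
    open(os.path.join(root, "src", "main.rs"), "w").close()
    for i in range(50):
        open(os.path.join(root, ".cache", f"blob{i}"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(tmp)
        print("hidden skipped: ", count_entries(tmp, include_hidden=False))
        print("hidden included:", count_entries(tmp, include_hidden=True))
```

With hidden entries skipped, the walk never enters `.cache`, so far fewer entries are visited. Comparing a pruned walk against a tool that visits everything mostly measures the pruning, which is why -H and -I matter for benchmarks.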
> Why can't find keep up with fd in terms of performance?
Mostly due to parallelism. It might be surprising (because disk I/O might be considered inherently serial), but a directory traversal can benefit from using multiple threads. It also lets us use multiple threads for matching the regular expressions or glob patterns.
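As a rough illustration of the idea, the following Python sketch lists directories from a shared work queue with several threads and does the regex matching in the same workers. It is an assumption-laden toy, not fd's implementation: the function name `parallel_find`, the default of four threads, and the queue-based design are all chosen for the example, and in CPython threads only help to the extent that blocking filesystem calls release the interpreter lock.

```python
import os
import queue
import re
import threading


def parallel_find(root, pattern, num_threads=4):
    """Return paths under root whose file name matches pattern, using worker threads."""
    regex = re.compile(pattern)
    pending = queue.Queue()   # directories that still need to be listed
    matches = []
    lock = threading.Lock()   # guards the shared matches list

    def worker():
        while True:
            directory = pending.get()
            if directory is None:           # sentinel: no more work
                pending.task_done()
                return
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        if regex.search(entry.name):
                            with lock:
                                matches.append(entry.path)
                        if entry.is_dir(follow_symlinks=False):
                            # Hand the subdirectory to whichever worker is free.
                            pending.put(entry.path)
            except OSError:
                pass                         # unreadable directory: skip it
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    pending.put(root)
    pending.join()            # returns once every queued directory is processed
    for _ in threads:
        pending.put(None)     # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(matches)
```

A call such as `parallel_find(".", r"[0-9]\.jpg$")` returns the matching paths in sorted order. While one worker waits on a slow directory listing, the others keep listing and matching, which is where the speedup over a single-threaded walk would come from.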
> Can't it be updated with the ideas from fd?
I don't know, but my guess is that it would be very hard to migrate an old C codebase (that was never intended to be used in a multicore setting) to support parallelism.
> I heard that regular expressions are really fast in Rust, but can't the same be done in C?
The regex library in Rust is extremely fast, but regex matching is typically not what dominates the runtime. It's mostly just filesystem I/O.
> I think fd is a great project and I love the way we write search queries.
Thank you for the feedback!
> But I can't understand why the "standard binaries" are so slow. (They can just copy fd, right?)
If by "standard binaries" you mean find, see above. It's probably not that easy.
Thank you for your great and eye-opening answer! ☺️
(In reply to David Peter's comment of Sun, Nov 29, 2020, 21:56.)