Fd: Why is fd so much faster than find?

Created on 29 Nov 2020 · 2 Comments · Source: sharkdp/fd

fd and find look to do a similar job and are both open source.
Why can't find keep up with fd in terms of performance?
Can't it be updated with the ideas of fd?
What does fd do to be faster that find doesn't?
I heard that regular expressions are really fast in Rust, but can't it be done in the same way in C?
Is it Rust? The support of legacy platforms? Or some features that find supports and fd doesn't that make find so slow?

I think fd is a great project and I love the way we query search.
But I can't understand why the "standard binaries" are so slow. (They can just copy fd, right?)

question

stephane-archer

Most helpful comment

fd and find look to do a similar job and are both open source.

Yes. If configured correctly, they should do exactly the same job. Note, however, that by default fd skips hidden files as well as entries listed in .gitignore files. Typically (but not always, because we need to actually parse .gitignore files, which is additional work) this makes searches faster because the search tree is smaller. For a fair comparison, always use fd with -H and -I (as done in the benchmarks in the README). This way, fd also searches hidden files and does not respect ignore files.
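To make the "smaller search tree" point concrete, here is a minimal Python sketch (not fd's actual code) that counts the entries a traversal visits with and without hidden files. The function name `count_entries` is made up for illustration, and the sketch only handles dotfiles; it does not parse .gitignore files.

```python
import os


def count_entries(root, include_hidden):
    """Count files and directories under root, optionally skipping dotfiles."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        if not include_hidden:
            # Pruning dirnames in place stops os.walk from descending
            # into hidden directories such as .git.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        count += len(dirnames) + len(filenames)
    return count
```

Calling `count_entries(path, include_hidden=True)` roughly corresponds to the set of entries find has to look at, while `include_hidden=False` approximates part of what fd skips by default. In a repository with a large .git directory the two counts can differ a lot, which is why the comparison is only fair with -H and -I.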

Why can't find keep up with fd in terms of performance?

Mostly due to parallelism. It might be kind of surprising (because disk I/O might be considered something inherently serial), but a directory traversal can benefit from using multiple threads. Also, this way, we can use multiple threads for matching the regular expressions or glob patterns.
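As a rough illustration of the idea (a sketch under assumptions, not how fd is implemented), the following Python code lists the directories of each tree level on a thread pool and matches entry names against a regex inside the workers. The names `scan_dir` and `parallel_find` and the default of 8 workers are arbitrary choices for this example.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def scan_dir(path, regex):
    """List one directory; return (matching paths, subdirectories)."""
    matches, subdirs = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if regex.search(entry.name):
                    matches.append(entry.path)
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
    except OSError:
        pass  # unreadable directories are silently skipped in this sketch
    return matches, subdirs


def parallel_find(root, pattern, workers=8):
    """Return sorted paths under root whose entry name matches `pattern`."""
    regex = re.compile(pattern)
    matches = []
    frontier = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # All directories of the current level are scanned concurrently.
            for found, subdirs in pool.map(lambda d: scan_dir(d, regex), frontier):
                matches.extend(found)
                next_frontier.extend(subdirs)
            frontier = next_frontier
    return sorted(matches)
```

For example, `parallel_find(os.path.expanduser("~"), r"[0-9]\.jpg$")` would search the home directory. Processing the tree level by level is a simplification that keeps termination trivial; a real tool would more likely use a shared work queue. Also note that in CPython only the directory listing can overlap across threads, since the regex matching itself still runs under the global interpreter lock.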

Can't it be updated with the ideas of fd?

I don't know, but my guess is that it would be very hard to migrate an old C codebase (that was never intended to be used in a multicore setting) to support parallelism.

I heard that regular expressions are really fast in Rust, but can't it be done in the same way in C?

The regex library in Rust is extremely fast, but it's typically not the regex matching that is really performance-critical. It's mostly just filesystem I/O.

I think fd is a great project and I love the way we query search.

Thank you for the feedback!

But I can't understand why the "standard binaries" are so slow. (They can just copy fd, right?)

If by "standard binaries" you mean find - see above. It's probably not that easy.

sharkdp on 29 Nov 2020

Thank you for your great and eye-opening answer! ☺️

stephane-archer on 30 Nov 2020
