Fd: Design directions for fd; explicit goals/non-goals w.r.t. being a find alternative

Created on 10 Jan 2019 · 9 Comments · Source: sharkdp/fd

Piggybacking on the design/refactoring discussion (#382), I have a question related to the philosophical design of fd. One of the things that I believe has really helped with ripgrep's adoption (besides being wicked fast) is its almost completely drop-in compatibility with grep; there's no learning curve associated with it, essentially.

fd, on the other hand, seems to aim to be "find-inspired" rather than a true replacement for find. Is achieving a closer compatibility with find's cli args/syntax/etc a goal at all? Or is it an intentional non-goal? A few things I've noticed just in my playing around the last few days as an example:

Not supporting -argument and requiring --argument. Which is fine, I always thought it was dumb that find did that
Missing mindepth (so there's no way to specify exactly a depth of N)
A complete rewrite of the command is required to translate find . -mindepth 4 -maxdepth 4 \( -path '*foo/bar' -o -name '*ignore*' -o -path '*blah/etc*' \) -prune -o \( -perm 644 -o -perm 664 \) -name '*unicorn-sprinkles*' -exec cool-command-goes-here {} \;

While it might seem contrived, I've legitimately wanted to use every single one of those features just in this past week and having to construct a chain of 2-4 pipes to get that sort of command done just destroys the speed.

I do definitely appreciate the conciseness and ease of use of fd when it comes to quick throwaway commands, but if I need a more complex setup, I can almost always drill down into a find command and crank out 10x more performance than fd due to the extensibility available. (This is really the sticking point for me; there's not a lot of point in using something that's really fast when... I can always get faster for real-world use cases that frequently appear for me)


jared-w

Most helpful comment

Piggybacking on the design/refactoring discussion (#382), I have a question related to the philosophical design of fd. One of the things that I believe has really helped with ripgrep's adoption (besides being wicked fast) is its almost completely drop-in compatibility with grep; there's no learning curve associated with it, essentially.

fd, on the other hand, seems to aim to be "find-inspired" rather than a true replacement for find. Is achieving a closer compatibility with find's cli args/syntax/etc a goal at all? Or is it an intentional non-goal?

This is explained in the very first paragraph of the README: fd [...] does not seek to mirror all of find's powerful functionality, it provides sensible (opinionated) defaults for 80% of the use cases.

Not supporting -argument and requiring --argument. Which is fine, I always thought it was dumb that find did that

Good, me too :smile:. You can actually use -exec instead of --exec, but that's the only option that supports this. Following the normal convention of short -s flags and long --long flags allows users to combine several short options (like fd -HIL ...).

Missing mindepth (so there's no way to specify exactly a depth of N)

(see #384) Yes. There are many options which are not supported. We add new arguments to fd from time to time, but I want to see a good argument for why we need a specific option before considering adding it. Every new command line option increases the amount of code and documentation. It increases the probability of bugs. It makes maintenance harder. It makes testing harder, as there are typically a lot of new combinations that need to be tested.

And coming back to the sentence in the README. We are not aiming to support every possible use case that can be covered with find.

A complete rewrite of the command is required to translate find . -mindepth 4 -maxdepth 4 \( -path '*foo/bar' -o -name '*ignore*' -o -path '*blah/etc*' \) -prune -o \( -perm 644 -o -perm 664 \) -name '*unicorn-sprinkles*' -exec cool-command-goes-here {} \;

Please show me actual real world use cases.

This is really the sticking point for me; there's not a lot of point in using something that's really fast when... I can always get faster for real-world use cases that frequently appear for me

Again, please let us know what these use cases are. We can then discuss if there are better ways to do this with fd. If these are use cases that belong to the "80%", we can certainly also discuss if we need to change something in fd in order to support them.

-exec cool-command-goes-here {} \;

Not to mention -exec cool-command-goes-here {} +.

fd supports both. -x cool-command-goes-here is equivalent to find's -exec cool-command-goes-here {} \; and -X cool-command-goes-here is equivalent to find's -exec cool-command-goes-here {} +. The latter is only available on master, but will be released soon.
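
To illustrate the difference between the two -exec terminators mentioned above, here is a minimal sketch that uses find itself on a throwaway directory. The file names and the `echo run` command are placeholders invented for this example; it only counts how often the command is started in each mode.

```shell
#!/bin/sh
# Sketch: count how often find starts a command with `{} \;` versus `{} +`.
# The directory contents below are made up for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.txt"

# `{} \;` starts the command once per match, so one line is printed per file.
per_file_runs=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# `{} +` appends all matches (as many as fit) to a single invocation.
batched_runs=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file: $per_file_runs, batched: $batched_runs"
rm -rf "$demo_dir"
```

With three files, the per-file form starts the command three times and the batched form once, which is why the batched form tends to be preferable for commands that accept many paths.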

sharkdp on 10 Jan 2019

👍3

All 9 comments

-exec cool-command-goes-here {} \;

Not to mention -exec cool-command-goes-here {} +.

dgutov on 10 Jan 2019


This is explained in the very first paragraph of the README: fd [...] does not seek to mirror all of find's powerful functionality, it provides sensible (opinionated) defaults for 80% of the use cases.

I was mostly seeking clarification on whether or not achieving a higher compatibility with the syntax of find was a non-goal; I recognize that the feature-creep will be much less aggressive. For example, fd -x thing without supporting fd --exec thing makes it harder to adopt fd after learning find.

In fact, it feels like almost none of the options of fd or find are similar in any way shape or form, which makes being comfortable with fd a lot harder than I'd like. I was able to show my coworker ripgrep and they picked it up quite quickly after being skeptical; now they're a convert. I definitely can't do the same with fd, especially since we frequently ssh into old linux servers and need to know all of the old find syntax anyway. Hell, I still give up sometimes and write the find command if I can't figure out how to write it in fd the first two times; "intuitive", after all, is merely what you're familiar with :)

Please show me actual real world use cases.

That find command actually is the real world use case I wanted. Names changed, but that's about it :) writing it in find was over 10 times faster than anything I could write with fd; mostly because find allowed me to search for things with specific file permissions without having to exec a ls -ld on every file and then filter the resulting information.
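
As a rough illustration of the permission filter described above, the following sketch selects files by mode with find's -perm test instead of running ls -ld on every result. The directory and file names are hypothetical and exist only for this demo.

```shell
#!/bin/sh
# Sketch: filter by permission bits directly in find.
# File names are invented for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/public.txt" "$demo_dir/private.txt"
chmod 644 "$demo_dir/public.txt"
chmod 600 "$demo_dir/private.txt"

# -perm 644 matches only files whose permission bits are exactly rw-r--r--.
matches=$(find "$demo_dir" -type f -perm 644 -exec basename {} \;)
echo "$matches"
rm -rf "$demo_dir"
```

Only public.txt is reported; the file with mode 600 is filtered out inside find, with no extra process per file for the permission check.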

Again, please let us know what these use cases are.

Aggressively pruning the search space is usually required to hit a lot of speed with find; it'd be nice to be able to prune that search space within fd as well, so I wouldn't have to do fd (args) | rg constantly to get anything done. This is where I seem to hit most of my limitations with fd vs find.

That all being said, using fd as a general purpose non-power tool for quick one-off commands is a fantastic experience, and I greatly appreciate the project as it stands today, don't get me wrong. It would just make my life easier if I could use a more familiar syntax with fd.

jared-w on 11 Jan 2019

I was mostly seeking clarification on whether or not achieving a higher compatibility with the syntax of find was a non-goal;

It's not a non-goal, but it's also not our primary target to be 100% compatible with find.

I recognize that the feature-creep will be much less aggressive. For example, fd -x thing without supporting fd --exec thing makes it harder to adopt fd after learning find.

fd -x and fd --exec are the same thing?

In fact, it feels like almost none of the options of fd or find are similar in any way shape or form, which makes being comfortable with fd a lot harder than I'd like.

That's just not true:

-type => --type
-size => --size
-follow => --follow
-print0 => --print0
-maxdepth => --max-depth or --maxdepth
-exec => --exec

find . -mindepth 4 -maxdepth 4 \( -path '*foo/bar' -o -name '*ignore*' -o -path '*blah/etc*' \) -prune -o \( -perm 644 -o -perm 664 \) -name '*unicorn-sprinkles*' -exec cool-command-goes-here {} \;
That find command actually is the real world use case I wanted. Names changed, but that's about it :)

Ok. This is definitely a case which I would put in the "other 20%" or rather the "other 1%" of use cases. It will probably never be supported by fd. I don't see any big problem with this. If you are already very familiar with find, that's great. I don't see a big problem in using both tools alongside each other. fd is an "alternative to find", not a "replacement for find".

so I wouldn't have to do fd (args) | rg constantly to get anything done

We have discussed having an option for multiple search patterns in another ticket. Would that help? What are you trying to do exactly?

sharkdp on 11 Jan 2019

fd -x and fd --exec is the same thing?

My apologies, that was a poor example; I meant to give an example of an option that has a different name in find than in fd, e.g. -d being maxdepth in fd vs exact depth in find, -E vs -prune (both with very different syntax), and no equivalent to find's -name (because you don't need it in fd, but it makes translating find /my/dir -name '*file*' slightly less intuitive than fd /my/dir --name '*file*' would be).

That's just not true:

You're right; I retract that statement after reviewing the man pages more closely. My sentiment would've been more accurately reflected by saying that find has many options that fd has no direct translation for. It's not a design flaw, it just makes using fd as a find replacement/substitution harder.

We have discussed having an option for multiple search patterns in another ticket. Would that help? What are you trying to do exactly?

In this particular use case I would've benefited from having more ways to prune down the search space without resorting to filtering with an external tool. As far as missing options that I reach for, querying file information (user/owner/group/perm) and an exact depth specifier are basically all I'm missing, though I use exact depth far more often than the user/owner/group and very rarely reach for perm. As such, I generally use find over fd whenever I need a list of filenames to pipe to something but I need to prune results in a non trivial way.

That being said, the number of searches that I run which are completely impossible in fd is quite small; even smaller if you account for the fact that find requires more aggressive pruning and complex queries to get the same results at the same speed as fd (one reason why it offers so many options for pruning...)

(Although I have used find over fd explicitly to avoid parallelism in --exec once when I was rsyncing a bunch of directories to the same target directory, but that's also a fairly weird thing to do...)

Overall, I was mainly opening the ticket to see if adding alternative flags for options to bring it closer to the 'standard syntax' for find so it could be used closer to a drop-in way was something of interest. Seems like it's not, but that's fine and certainly not a mark against a tool for being unwilling to adopt highly questionable design choices for the sake of familiarity :)

jared-w on 12 Jan 2019

and an exact depth specifier are basically all I'm missing, though I use exact depth far more often than [..]

I'd be very interested in examples where you need an exact depth. I have never felt the need for that...

user/owner/group

There is an ongoing discussion here: #307 and here: #328.

Although I have used find over fd explicitly to avoid parallelism in --exec once when I was rsyncing a bunch of directories to the same target directory, but that's also a fairly weird thing to do...

You should be able to use --exec-once for this. Alternatively, you can set the number of threads to one (--threads 1 or -j1).

sharkdp on 13 Jan 2019

I'd be very interested in examples where you need an exact depth. I have never felt the need for that...

I use this whenever I need to detect all of the "directories which contain x" without having false positives. A good example from work is having a giant git folder with all of our repos, but wanting to go through and get all of the (insert $backend) projects to loop over them in the shell. They all have a fairly specific structure. Usually $repo_name/www/(framework-specific-folder)/... but anything other than "exact-depth" will trigger false positives. This gets even more crucial when we have multi-technology repos (eg wordpress + react + php/composer). How do you look for, say, index.js without getting a million false positives?
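
A small sketch of the exact-depth search described above, using find's -mindepth and -maxdepth together. The repository layout (site-a, site-b, www/app) is hypothetical and only mirrors the "$repo_name/www/(framework-specific-folder)" shape from the comment.

```shell
#!/bin/sh
# Sketch: exact-depth matching to avoid false positives.
# The repo names and folder layout are invented for illustration.
repos=$(mktemp -d)
mkdir -p "$repos/site-a/www/app" "$repos/site-b/www/app/node_modules/dep"
touch "$repos/site-a/www/app/index.js"
touch "$repos/site-b/www/app/index.js"
touch "$repos/site-b/www/app/node_modules/dep/index.js"

# Relative to $repos, <repo>/www/app/index.js sits at depth 4.
# Setting -mindepth and -maxdepth to the same value pins the search to that depth.
exact=$(cd "$repos" && find . -mindepth 4 -maxdepth 4 -name index.js | sort)

# Without depth limits, the nested node_modules copy is reported as well.
all_count=$(cd "$repos" && find . -name index.js | wc -l | tr -d ' ')

echo "$exact"
echo "unrestricted matches: $all_count"
rm -rf "$repos"
```

The exact-depth query returns the two project entry points, while the unrestricted query also picks up the copy buried in node_modules.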

You should be able to use --exec-once for this. Alternatively, you can set the number of threads to one (--threads 1 or -j1).

Nice! I must've missed this the first few looks through the man page.

My questions have essentially been answered; are you fine with me closing the ticket out? I'd hate to leave it open needlessly. Additionally, I could open a pull request to expand the readme or some docs with "how to write the equivalent find command in fd" if that would be helpful?

jared-w on 13 Jan 2019

A good example from work is having a giant git folder with all of our repos, but wanting to go through and get all of the (insert $backend) projects to loop over them in the shell. They all have a fairly specific structure. Usually $repo_name/www/(framework-specific-folder)/... but anything other than "exact-depth" will trigger false positives.

I see - that makes sense, thank you.

One way fd could help here is the -p/--full-path option, which lets you search the full path instead of just the filename. This would allow you to do something like:

fd -p 'www/framework'

fd -p 'framework/index\.js'

or something like

fd -p '^/full/path/to/repos/[^/]+/www/framework'
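
To show what the anchored pattern above accepts and rejects, here is a sketch that applies the same regular expression to a list of made-up absolute paths. grep -E stands in for fd's matching here, on the assumption that both treat this particular pattern the same way; none of the paths exist on disk.

```shell
#!/bin/sh
# Sketch: which full paths does the anchored pattern match?
# All paths are hypothetical; grep -E is a stand-in for fd -p.
paths='/full/path/to/repos/site-a/www/framework/index.js
/full/path/to/repos/site-b/www/framework/vendor/index.js
/full/path/to/repos/site-c/docs/www/framework/index.js
/full/path/to/other/site-d/www/framework/index.js'

# [^/]+ matches exactly one path component, so the repo name must be
# followed directly by /www/framework. site-c has an extra "docs" level
# and site-d is not under "repos", so both are rejected.
matched=$(printf '%s\n' "$paths" | grep -E '^/full/path/to/repos/[^/]+/www/framework')
echo "$matched"
```

Because the pattern is anchored only at the start, anything below a matching framework directory (such as the vendor/index.js entry) still matches; a stricter search would need to constrain the tail of the path too.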

If we already had #284 implemented, you could potentially also use globs:

fd -g -p 'repos/*/www/framework'

You should be able to use --exec-once for this. Alternatively, you can set the number of threads to one (--threads 1 or -j1).

Nice! I must've missed this the first few looks through the man page.

You couldn't have found it: --exec-once is a new feature which has not been released yet.

My questions have essentially been answered; are you fine with me closing the ticket out?

Yes, thank you very much for the feedback!

sharkdp on 13 Jan 2019

👍1

Just found this by accident.

-X/--exec-batch is now available (what was discussed as --exec-once in this thread)
-g/--glob is now available, so fd -g -p 'repos/*/www/framework' would work.