Since fd was first published, the feature to hide Git-ignored files by default has always been controversial. It's the number one pitfall for new users, as witnessed by the numerous issues that have been opened over time (even though this is the first point in the Troubleshooting section). Even experienced users will likely run into this from time to time.
We have had past discussions about this (see #179, #220, #18), but I'm not so sure anymore if this default is the best possible option for the "average user".
I thought it might make sense to discuss this again and see what others think. Whatever we choose as the default, it will always be easy for users to select a different default via an alias.
Pro current behavior (do not show .gitignored entries by default):
* [...] .gitignore files into account. .gitignored directories tend to contain huge amounts of automatically generated build artifacts or downloaded dependency files. Pruning these directories from the search tree typically results in a faster search overall. There are counterexamples to this where the parsing of long .gitignore files takes longer than actually traversing these directories.
* [...] .gitignored results are not "interesting" to the user (however, see counterpart below).
* [...] fd without any arguments, I typically don't want to see .gitignored files.

Cons:
* [...] -I or -u. There are a lot of valid use cases where users are - in fact - interested in results from ignored directories or files. Personally, I would estimate that I use -uu or -HI in roughly 20% of my searches, which is quite high.

I want to add that the nature of the files in .gitignore depends a lot on the nature of our projects. In my case for example most of the time the ignored files are files with sensitive information (not crap) that I want to be able to search with fd.
But I understand that for other people often the files in .gitignore have to be ignored by fd as well.
For this reason the desired default behavior is not the same for everyone.
In my opinion, the default behavior should be "search all", because it's easier to figure out why there are too many results than it is to figure out why there are missing results.
But such a change in behavior will not be backward compatible, which is never good. To overcome this, people must be allowed to easily return to the old default behavior. Hence the need to be able to configure the default behavior of fd (#362).
> because it's easier to figure out why there are too many results than it is to figure out why there are missing results.

I think this is an excellent point.

> But such a change in behavior will not be backward compatible, which is never good.

Agreed. This is probably the main reason why I never changed the behavior. Still, if it should turn out that 90% of users would like a different default, I'm more than willing to make a breaking change.

> To overcome this, people must be allowed to easily return to the old default behavior. Hence the need to be able to configure the default behavior of fd (#362).

Okay, I have reopened the ticket once again. Let's discuss this aspect in #362. There are other ways to configure the default behavior as well (aliases, wrappers, environment variables).
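To make the "wrapper plus environment variable" route a bit more concrete, here is a minimal sketch of how a wrapper could prepend user-chosen default options before calling fd. The variable name `FD_DEFAULT_OPTS` and the helper `build_args` are hypothetical and exist only for this illustration; nothing in this thread says fd reads such a variable. Splitting on whitespace is a simplification (no quoting support).

```rust
use std::env;

/// Prepend whitespace-separated default options to the user's arguments.
/// Defaults go first so that flags typed on the command line come later.
fn build_args(defaults: Option<&str>, user_args: &[String]) -> Vec<String> {
    let mut args: Vec<String> = defaults
        .unwrap_or("")
        .split_whitespace()
        .map(String::from)
        .collect();
    args.extend(user_args.iter().cloned());
    args
}

fn main() {
    // FD_DEFAULT_OPTS is a hypothetical variable name used for this sketch.
    let defaults = env::var("FD_DEFAULT_OPTS").ok();
    let user_args: Vec<String> = env::args().skip(1).collect();
    let args = build_args(defaults.as_deref(), &user_args);

    // A real wrapper would hand `args` to std::process::Command::new("fd");
    // this sketch only prints the command line it would run.
    println!("fd {}", args.join(" "));
}
```

With `FD_DEFAULT_OPTS="--no-ignore-vcs"` set, every invocation through such a wrapper would search .gitignored files unless a later flag on the command line says otherwise, assuming later flags take precedence.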
Personally, I think the pros outweigh the cons. But setting up an alias is easy, so I wouldn't be too upset if this changed.
I am a new user and think this is a cool feature, but I also think it should be the default as it is not obvious from any short description and is not intuitive.
"Power users" could easily set up an alias, as was commented earlier.
I would add my vote to search all files except hidden files by default.
Hello.
I'd like to make the point that fd is a general-purpose file-searching utility that is not git specific, so having it take into account .gitignore files lying around in the filesystem does not feel right.
In fact, I stumbled upon this issue the very first time I tried fd: I tried to find something, starting from a non-git directory, in subdirectories which happened to be git repos, and found nothing, although I knew it was there. After that, the very next thing I did was patch fd locally so that it wouldn't read .gitignore files by default.
I think that for this issue and for #362 the question is:
Should fd behave as a "general" or a "git-style" tool?
What I mean by this is summarized in the following table:
| | general tool | git style tool |
|:--------------|:-------------:|:-----------------------------------:|
| examples | find,grep | git,rg |
| ignore files | no | yes |
| configuration | flags only | flags, files, environment variables |
Actually fd is in between the two worlds. And respecting the ignore files without the configuration feature is bad, IMO. So fd should choose the red pill or the blue pill ;)
Or maybe we don't have to choose and we can have both.
I think that:
* fd should be a general tool (by default)
* [...] --git that makes it act in a "git style"
* fdg should be another compiled version from the same code but with different default behavior, equivalent to fd --git and having a --no-git flag to switch it to "general tool".

My arguments for creating the additional fdg are the following:
* [...] --no-git option as default, the other with --git as default.
* [...] powershell and cmd included)
* [...] fd we should set up the aliases, so the advantage of "no install, just download and use it" is not valid any more
* [...] fd [...] fdg or even to rename fdg to fd and continue using it as before (respecting the ignore files by default)
* [...] fd (this would be done using the configuration file in fdg)

I like that idea, with one caveat. Instead of having a separately compiled version of fd, fdg should just be a symlink to fd, and fd should check the name that it is called with: if it is "fd", use the general behavior, and if it is "fdg", use the "git" behavior. Alternatively, distribute OS-specific wrapper scripts for fdg (for example, one that does something like exec fd --ignore, or whatever the Windows equivalent is).
@tmccombs The problem with all the workarounds (aliases, scripts, symlinks, ...) is the same:
Do you have arguments against having fdg in addition? Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.
The only drawback I see is that having two versions can confuse the novice user which one to take. But this can be easily solved by promoting fd and talking about fdg only in a section at the end of the README.
Sorry, I am suggesting that the symlink or shell would be distributed with fd. Having separate, but nearly identical binaries means the package is larger (longer to download and more space on disk). fd is small enough it's probably not that big of a deal for most people, but it still feels wrong to me.
> Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.

It means builds take longer, which affects anyone who builds it from source.
> Sorry, I am suggesting that the symlink or shell would be distributed with fd.

This means that all scripts should be tested and maintained, and they will raise more issues here.
I have had very bad experiences with bundled scripts on Windows, for example with conda (the Python package manager).
Providing additional scripts is the end of "single executable tool".
> Having separate, but nearly identical binaries means the package is larger (longer to download and more space on disk). fd is small enough it's probably not that big of a deal for most people, but it still feels wrong to me.

I don't know what you call "the package". The builds are larger, but the sources are not. And this argument is valid for all OS-specific builds. If we want smaller builds it is easy: provide only one (or zero) builds and tell people to build their own, but this is not very user friendly.

> > Once the CI/CD tool is configured to compile both tools (for all systems), this does not create more work for the maintainers.
>
> It means builds take longer, which affects anyone who builds it from source.

No. You build only the version that you need. This is already the case for all OS-specific builds.
Only the CI/CD builds all the versions.
And if we have to choose only one type of build (general vs git style), which one should it be?
Giving a good solution to only one category of users and telling the other category to "get by with scripts" is not very user friendly, IMO.

> This means that all scripts should be tested and maintained, and they will raise more issues here.

The script I'm proposing is extremely simple: it would just call fd with an option and the arguments passed to the script. If we had a separate executable, that would have to be tested and maintained too :).
> I don't know what you call "the package".

I mean the zip, tarball, or deb you download from the releases page, or a package you install with a package manager like apt, dnf, pacman, homebrew, winget (maybe?), etc.

> Providing additional scripts is the end of "single executable tool".

From your next paragraph, I think what you mean by this is that you need to install more than just a single executable. But afaik, this is not a design goal of fd. And fd is not currently distributed as just an executable. All packages include command completion scripts, and on Linux they also include man pages.

> The builds are larger, but the sources are not. And this argument is valid for all OS-specific builds. If we want smaller builds it is easy: provide only one (or zero) builds and tell people to build their own, but this is not very user friendly.

So you're saying we would have twice as many packages on the release page and in package managers/app stores? I really don't like that idea. I think it would make it even more confusing for users to know which package to install/download, and it roughly doubles the amount of work for packagers to maintain the package for package managers like apt, dnf, pacman, etc.
> No. You build only the version that you need.

You're assuming no one would want both. This also complicates building from source, since the user now needs to be able to configure which version they want to build somehow (the OS they want is implicit in the OS they are running the build from, unless they are cross-building).

> And if we have to choose only one type of build (general vs git style), which one should it be?
> Giving a good solution to only one category of users and telling the other category to "get by with scripts" is not very user friendly, IMO.

That is not what I'm suggesting.
If using a script really is such a problem for Windows, why not the symlink/command name approach?
fd would have something like:
    // Name the program was invoked as (empty if the platform gives none).
    let cmd_name = std::env::args().next().unwrap_or_default();
    if cmd_name.ends_with("fdg") {
        // use git-style default
    } else {
        // use "general" default
    }
Then on unix-like systems, the package would just have a symlink from fdg to fd (or vice versa). If Windows doesn't have an equivalent, it could just have a copy of fd called fdg, or have a separate package if that makes more sense, and/or have instructions in the readme to rename the file depending on what functionality you want.
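As a slightly fuller, self-contained sketch of the command-name idea: the version below looks only at the file stem of the invoked name, so a leading directory or a trailing `.exe` does not affect the decision. The names `DefaultMode` and `mode_for` are made up for this illustration and are not part of fd; it also assumes the first argument reflects the name the program was called by, which is typical but not guaranteed.

```rust
use std::env;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum DefaultMode {
    General,  // do not read .gitignore unless asked to
    GitStyle, // respect .gitignore unless asked not to
}

/// Pick the default from the name the program was invoked as.
fn mode_for(argv0: &str) -> DefaultMode {
    // file_stem drops leading directories and a trailing extension such as ".exe".
    let stem = Path::new(argv0)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    if stem == "fdg" {
        DefaultMode::GitStyle
    } else {
        DefaultMode::General
    }
}

fn main() {
    let argv0 = env::args().next().unwrap_or_default();
    println!("invoked as {:?}, default mode: {:?}", argv0, mode_for(&argv0));
}
```

Comparing the whole stem rather than using a suffix match avoids treating an unrelated program name that merely ends in "fdg" as the git-style variant.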
We already have _~/.fdignore_. Maybe this file could somehow 'include' gitignore (via something like _@~/.gitignore_ or some other character/directive)? With this approach, showing git-ignored files could be enabled by default, while still allowing users to pull in the git-ignored entries that are already in _~/.gitignore_ (or _./.git/ignore_ when inside a repository) in an easy way.
Personally, I wasn't even aware that git-ignored files are omitted: https://imgur.com/a/UlLD8ED
For now my _.fdignore_ contains nearly 100% of _.gitignore_ plus other patterns. It would be great to be able to 'include' the file as a whole instead of copying its content manually.

> Maybe this file could somehow 'include' gitignore (via something like @~/.gitignore or some other character/directive)

Respecting .gitignore is more than just respecting a global ~/.gitignore file though. It also uses .gitignore files in ancestor directories and descendant directories (scoped to those directories). You _could_ have some syntax in .fdignore to enable --ignore-vcs, but that isn't really an include, so much as a flag. And we would still probably want a way to disable that with --no-ignore-vcs on the command line to overrule that, but keep the other ignores from .fdignore, which seems kind of weird to me.
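To illustrate what "scoped to those directories" means, here is a toy model in which each rule remembers the directory whose .gitignore declared it and only applies below that directory. This is a sketch under heavy simplifications: rules are plain names (no globs, negation, or anchoring), and `ScopedRule` and `is_ignored` are hypothetical names, not how the ignore crate is implemented.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// One ignore entry together with the directory whose .gitignore declared it.
struct ScopedRule {
    scope: PathBuf, // directory containing the .gitignore
    name: String,   // plain file or directory name to ignore (no globs)
}

/// A path is ignored if it lies below a rule's scope and one of its
/// components below that scope equals the rule's name.
fn is_ignored(path: &Path, rules: &[ScopedRule]) -> bool {
    rules.iter().any(|rule| match path.strip_prefix(&rule.scope) {
        Ok(rest) => rest
            .components()
            .any(|c| c.as_os_str() == OsStr::new(&rule.name)),
        Err(_) => false, // path is outside this .gitignore's directory
    })
}

fn main() {
    let rules = vec![
        ScopedRule { scope: PathBuf::from("repo"), name: "build".to_string() },
        ScopedRule { scope: PathBuf::from("repo/docs"), name: "out".to_string() },
    ];
    let paths = ["repo/build/a.o", "repo/docs/out/index.html", "repo/src/out/x"];
    for p in paths.iter() {
        println!("{}: ignored = {}", p, is_ignored(Path::new(p), &rules));
    }
}
```

The rule declared in `repo/docs` has no effect on `repo/src/out/x`, which is the reason a single global include file cannot reproduce per-directory .gitignore behavior.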
FWIW, ripgrep (which I imagine a lot of fd users also use) also respects gitignore by default.
Thought dump:
git is primarily concerned with the contents of files, their state, but not their presence.
This means that .gitignore files are also about the state of the files, but not their presence.
ripgrep, just like git, is also primarily concerned with the contents of files,
and this shared concern makes its choice to consider .gitignore files understandable,
although it could also be opt-in.
fd, on the other hand, is not concerned with the state of the files, but with their presence,
which differs from the concerns of git and ripgrep and makes its consideration of .gitignore files slightly less fitting.
> fd, on the other hand, is not concerned about the state of the files, but is concerned about their presence

That's a good way of looking at it, my stance is pretty neutral now. As a long-time user I'd have no problem making an inverse alias to fdh: aliased to fd --no-ignore-vcs -H -E '.git/', but I still feel there's value in performant (I assume most fd users have git repos in their filesystems) defaults. Plus, it's already established and would take some effort to flip.
Just to throw my opinion into the ring. I'm in favor of changing the default.
When I use fd with no options/arguments other than a pattern, 99% of the time I'm just using it to quickly narrow down the list of files I need to look at to find what I want. In that case I'm okay if I get some things that I don't care about in the search, but I'm much more annoyed if I don't find something that's actually there because I forgot to specify that I wanted to search .gitignored files as well. @sharkdp said that he adds -H or -u around 20% of the time, meaning those flags mattered 20% of the time. But I'm willing to bet that if those flags were enabled by default then they would have to be disabled much less than 20% of the time.
Also, from a scripting/reducing noise perspective, normally when I'm doing something more precise than just quickly narrowing down a list of files, I'm more willing to add flags and check the documentation in order to narrow down the search results to be only what I care about.
And concerning adding an fdg binary (or symlink), I don't see how that's better than just adding an optional flag. It feels like it would complicate CI and packaging a lot for something that essentially just flips a flag on by default.
I have an additional suggestion, which could complement this proposal or make it obsolete: add a corresponding entry to tldr.
The tldr page for rg/ripgrep lists rg -uu pattern as its second example, so it can be found in about a second.
Typing the flag in 20% of searches would then still mean less time overall.
A bonus is that -uu could be established as "use hidden github stuff" or "do more work".
One client for tldr is tealdeer.
Some thoughts on a couple of points brought up in this thread, though nothing terribly new.

1. Conflating _git_ ignore with general ignore

I'm also in favour of changing the default, because IMO paths being present in .gitignore mean literally what it says on the tin - "not interesting _for the purposes of version control_" and I wish that tools like fd as well as ripgrep did not overload this definition to mean roughly "not interesting to search/scan in by default". I, like many others here, have had false negatives due to this. It's (subjectively!) a bit sad that in order to be confident in a negative search result one has to either provide extra flags, or rerun the search using a "legacy" tool.

2. Special treatment of `.gitignore` with regards to other similar files

fd, or to be more precise the ignore crate that it uses, appears to only support git's ignore files, which means fd's behaviour for say Mercurial users will be different. Firefox, arguably the poster child for Rust, uses Mercurial for example.

3. Prior art re: aliases/separate binaries

This applies to grep, and is likely Linux (or perhaps even Debian) specific:

    root@9fb4e89aea1b:/# man grep | head -4
    GREP(1) User Commands GREP(1)
    NAME
    grep, egrep, fgrep, rgrep - print lines that match patterns

So, using aliases for commonly-used flags is at the very least nothing new.
> Some thoughts on a couple of points brought up in this thread, though nothing terribly new.
>
> 1. Conflating _git_ ignore with general ignore
>
> I'm also in favour of changing the default, because IMO paths being present in .gitignore mean literally what it says on the tin - "not interesting _for the purposes of version control_" and I wish that tools like fd as well as ripgrep did not overload this definition to mean roughly "not interesting to search/scan in by default". I, like many others here, have had false negatives due to this. It's (subjectively!) a bit sad that in order to be confident in a negative search result one has to either provide extra flags, or rerun the search using a "legacy" tool.
>
> 2. Special treatment of `.gitignore` with regards to other similar files

Did you ever run grep on repos with huge binary files (>5 GB) or a large number of files ignored by .gitignore?
Binary data in particular (without newlines) takes linear time, and that is why ripgrep has a different default than grep.
The same argument can be made for using fd on many, many files covered by .gitignore, e.g. when compiling the Linux kernel.

> fd, or to be more precise the ignore crate that it uses, appears to only support git's ignore files, which means fd's behaviour for say Mercurial users will be different. Firefox, arguably the poster child for Rust, uses Mercurial for example.

An argument from authority is not a technical argument about usage. And you can't make everyone happy with the tool. Here is a short catalogue for decision making:

> 3. Prior art re: aliases/separate binaries
>
> This applies to grep, and is likely Linux (or perhaps even Debian) specific:
>
>     root@9fb4e89aea1b:/# man grep | head -4
>     GREP(1) User Commands GREP(1)
>     NAME
>     grep, egrep, fgrep, rgrep - print lines that match patterns
>
> So, using aliases for commonly-used flags is at the very least nothing new.

How should this be maintained and name clashes prevented? cfdisk, df, efi, rfkill are already used. Do you have specific names in mind?
> This applies to grep, and is likely Linux (or perhaps even Debian) specific:
>
>     root@9fb4e89aea1b:/# man grep | head -4
>     GREP(1) User Commands GREP(1)
>     NAME
>     grep, egrep, fgrep, rgrep - print lines that match patterns
>
> So, using aliases for commonly-used flags is at the very least nothing new.

On Ubuntu at least, egrep, fgrep and rgrep are simply shell scripts that run grep with the given options. I would be ok with that, or with a symlink approach. But I'd rather not have separate but nearly identical compiled binaries.
(repo maintainer - this reply certainly cuts close to being off-topic, please feel free to remove it)
> Did you ever run grep on repos with huge binary files (>5 GB) or big amount of files ignored by .gitignore ?
> Especially binary data (without newline) use linear time and that is why ripgrep has another default than grep.
> For usage for fd of many, many files inside .gitignore ie compiling Linux Kernel the same argument can be made.

I'd like to refer back to the original issue, which talks about picking the best default for the "average user", not about removing support for ignoring based on .gitignore altogether. Overall, the ability to piggyback on .gitignore is tremendously helpful. However, I have been bitten by it myself and have seen others in the same situation, hence my _personal_ stance of "don't look at .gitignore by default". It's an anecdotal account and you can absolutely find more valid critiques against it, but I don't think that's gonna advance the overall discussion much more.
> Argument of authority is no technical argument on usage. And you cant make everyone happy for using the tool. Here a short catalogue for decision making:

My argument (2) is definitely a weak one, and you're right that it's impossible to make everyone happy (e.g. look at us having this very discussion!). I mentioned it because fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git. From fd --help:
* "<...>that would otherwise be ignored by '.*ignore' files"
* the very name of `--no-ignore-vcs` flag

Again, this is only a quibble.

> How should this be maintained and name-clashes prevented ? cfdisk, df, efi, rfkill are already used. Do you have specific names in mind?

I don't have any specific suggestions here. My aim is only to highlight that using ~aliases~ shell scripts to invoke certain flags is something that already ships with popular Linux distros.
> fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git

I believe this is for forwards compatibility. So if at some future point the ignore crate adds support for additional VCS systems (such as mercurial), then the --no-ignore-vcs flag will just work, without having to add a new --no-ignore-hg flag and similar for each VCS system that is added.
> * grep ignores binaries by default as well
> * most repos don't contain 5GB binaries, nor nearly as many build artifacts as the Linux kernel

Not, when you did not write a binary-data header or alike.

> My argument (2) is definitely a weak one, and you're right that it's impossible to make everyone happy (e.g. look at us having this very discussion!). I mentioned it because fd (ripgrep is arguably more guilty, but that's offtopic) talks about version control ignore files in a general sense, but in fact supports only git. From fd --help:
>
> * "<...>that would otherwise be ignored by '.*ignore' files"
> * the very name of `--no-ignore-vcs` flag
>
> Again, this is only a quibble.

True.
@sharkdp
Why is checking .*ignore* files for files/folders to ignore not possible? The file paths need to be checked against the .gitignore anyway, or is there a limit of 1 or 2?

> I don't have any specific suggestions here. My aim is only to highlight that using ~aliases~ shell scripts to invoke certain flags is something that's already ships with popular Linux distros.

I don't like them at all and prefer shell aliases.
If you do make this change, _please_ consider using separate flags for .gitignore, .fdignore, etc. I have run into valid use cases for (observe .fdignore, ignore .gitignore) and vice versa.
Examples:
-I/--ignore -- Ignore file patterns in .gitignore and .fdignore
-Ig/--ignore-gitignore -- Ignore file patterns in .gitignore
-If/--ignore-fdignore -- Ignore file patterns in .fdignore
-N/--no-ignore -- do Not ignore file patterns in .gitignore and .fdignore
-Ng/--no-ignore-gitignore -- do Not ignore file patterns in .gitignore
-Nf/--no-ignore-fdignore -- do Not ignore file patterns in .fdignore
Unfortunately these all use double-negatives, and there is a potential confusion about the double-meaning of "ignore .gitignore" (ignoring the .gitignore file and ignoring the files within it have opposite meaning). Other terms that may be less confusing:
There is precedent for fine-grained ignore params in ripgrep:
(--no-ignore-dot, --no-ignore-global, --no-ignore-vcs, etc.)
[ If the above comment about supporting non-git repositories is implemented, then Ig might become Iv (for vcs) ]
I agree with having more granular control over ignore files. However, I'd rather use flags consistent with what ripgrep uses, both because of consistency and because I think they are a little less confusing.
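As a rough sketch of how ripgrep-style granular flags could be resolved, the snippet below tracks two independent ignore sources and lets later flags override earlier ones. The type `IgnoreSources` and the function `resolve` are hypothetical, and `--no-ignore-dot` is borrowed from ripgrep's naming purely as an assumption about what an .fdignore-only switch might be called; it is not presented as an existing fd flag.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct IgnoreSources {
    vcs: bool, // honor .gitignore (and other VCS ignore files)
    dot: bool, // honor tool-specific files such as .fdignore
}

/// Apply ignore-related flags in order; a later flag overrides an earlier one.
fn resolve(defaults: IgnoreSources, flags: &[&str]) -> IgnoreSources {
    let mut sources = defaults;
    for flag in flags {
        match *flag {
            "--no-ignore" => {
                sources.vcs = false;
                sources.dot = false;
            }
            "--no-ignore-vcs" => sources.vcs = false,
            "--ignore-vcs" => sources.vcs = true,
            // Hypothetical for fd; the name follows ripgrep's convention.
            "--no-ignore-dot" => sources.dot = false,
            _ => {} // not an ignore-related flag
        }
    }
    sources
}

fn main() {
    let defaults = IgnoreSources { vcs: true, dot: true };
    println!("{:?}", resolve(defaults, &["--no-ignore-vcs"]));
    println!("{:?}", resolve(defaults, &["--no-ignore", "--ignore-vcs"]));
}
```

Keeping the two sources as separate booleans covers both cases raised above: observing .fdignore while skipping .gitignore, and the reverse.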
I think the point of having a tool like this is that it's opinionated. If I have to start adding flags to reach the default behavior/length of find, I might as well use find
Like rg and the rest of the modern tools, what makes them great is their defaults. If the only advantage is a very minor speed bump, people would just use the preinstalled tools they already know.
The fact that it ignores hidden and gitignored files is in the main bullet point feature list. If one doesn't bother to read that...
Wow, thanks for pinning this issue. I manage my dotfiles by creating a git repo in ~ and then adding * to ~/.gitignore. I was _so_ confused at why I wasn't getting any results in my home directory.
I think we should follow the convention from grep and git grep:
* `fd` does not exclude files from `.gitignore`
* `git fd` excludes files from `.gitignore`

Workarounds: I need to make sure that any time fd is invoked (by any script!) it's called with --no-ignore-vcs. I'd love to be able to set this in a configuration file or environment variable.
> Wow, thanks for pinning this issue. I manage my dotfiles by creating a git repo in ~ and then adding * to ~/.gitignore. I was _so_ confused at why I wasn't getting any results in my home directory.
>
> I think we should follow the convention from grep and git grep:
>
> * `fd` does not exclude files from `.gitignore`
> * `git fd` excludes files from `.gitignore`
>
> Workarounds: I need to make sure that any time fd is invoked (by any script!) it's called with --no-ignore-vcs. I'd love to be able to set this in a configuration file or environment variable.

This is not a convention.
git grep is a subcommand of git while grep is a completely different program.
> This is not a convention.

This is a convention -- the git-grep binary is provided by the Git project, but from the perspective of a user it's clear that git-grep takes into account .gitignore whereas grep does not. I'm suggesting that Git-specific usage should be provided under git-fd rather than fd. In summary:
* foo -- should aim to be as general-purpose as practical
* git-foo -- should aim to be as Git-specific as practical

> git grep is a subcommand of git while grep is a completely different program.

Yes, I agree.

> * `fd` does not exclude files from `.gitignore`
> * `git fd` excludes files from `.gitignore`

I have a few issues with this:
* git grep works very differently than fd. While git grep runs grep on each file that is checked in (basically equivalent to git ls-files | xargs grep), fd does file traversal itself, in parallel, which is where you get a lot of the speed.
* [...] fd (and rg) in a directory that contains multiple git repos and still respect .gitignore files.

In short, I don't think it is any better of an option than making --no-ignore-vcs the default rather than --ignore-vcs.