Vscode: Searching large projects is too slow

Created on 18 Nov 2015 · 54 Comments · Source: microsoft/vscode

Ubuntu 12.04, vscode 0.10.1

I have found Go to file's indexing against a full Chromium workspace to be very slow. It took ~40 seconds to find "Tab.java" whereas a simple find command took less than a second:

$ time find . -name "Tab.java"
./chrome/android/java/src/org/chromium/chrome/browser/tab/Tab.java

real    0m0.559s
user    0m0.268s
sys     0m0.284s

Note that the workspace is on an SSD.

bug search

Tyriar

👍32 ❤3

Most helpful comment

If fuzzy search can be brought to parity with other editors it would make VS Code the hands-down choice for me. I still prefer it for smaller projects, but for larger ones I really have no choice but to use Sublime or Atom.

Not sure what the timeline looks like on this.

jack-guy on 3 Jun 2016

👍13

All 54 comments

I can confirm that this issue also exists on Windows in the newest February 2016 release. This is a usability break for large projects.

steinemann on 15 Mar 2016

👍12

@bpasero could a find for (term)* be run in parallel and those result(s) shown while the fuzzy search is happening to speed up an ideal search?

Tyriar on 16 Mar 2016

👍1

@Tyriar in my experiments, using find or any other external process did not yield a significant speed improvement over what we do now. The reason is that we do quite a lot of heavy pattern matching and other processing on the results before they even reach the user. The better fix is to keep the list of paths in memory after the first search and reuse that information.
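
The caching idea can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not VS Code's implementation: the `PathCache` class, its `walks` counter, and the plain substring match are all hypothetical stand-ins.

```python
import os


class PathCache:
    """Hypothetical helper: walk the workspace once, then answer from memory."""

    def __init__(self, root):
        self.root = root
        self._paths = None  # filled lazily on the first search
        self.walks = 0      # how often the disk was actually traversed

    def _load(self):
        if self._paths is None:
            self.walks += 1
            self._paths = [
                os.path.relpath(os.path.join(dirpath, name), self.root)
                for dirpath, _dirs, files in os.walk(self.root)
                for name in files
            ]
        return self._paths

    def search(self, term):
        """Return cached paths whose file name contains `term` (case-insensitive)."""
        term = term.lower()
        return [p for p in self._load() if term in os.path.basename(p).lower()]

    def invalidate(self):
        """Drop the cache, for example after a file watcher reports changes."""
        self._paths = None
```

Only the first `search` call touches the disk; later calls filter the in-memory list until `invalidate` is called.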

bpasero on 17 Mar 2016

Check this post on reddit by author of sublime text on how they totally nail it https://www.reddit.com/r/programming/comments/4cfz8r/reverse_engineering_sublime_texts_fuzzy_match/

haisum on 30 Mar 2016

@haisum thanks so much for sharing, what a coincidence that I am currently looking into improving our scoring algorithm.

Note however that this is not really talking about how to make find in files fast, rather how to score the results for the quick open box.

bpasero on 30 Mar 2016

@bpasero yup I checked, it's not in much depth but I thought it may help in getting some perspective.

haisum on 30 Mar 2016

Let me know if I am link spamming here, but I think this also seems like a relevant example: https://github.com/wincent/command-t/blob/master/doc/command-t.txt. It's an open source and super fast Vim plugin for file searching.

haisum on 30 Mar 2016

No worries, keep 'em coming :+1:

bpasero on 30 Mar 2016

Have you guys thought about using something like Lucene to do that sort of pattern matching? Just index all the source in the background. Pretty quick and portable (but we don't use it for source indexing).

justin-romano on 30 Mar 2016

The fact that we need to search source code might make Lucene a less ideal candidate. We also need to support regular expression searches.

bpasero on 31 Mar 2016

true.

justin-romano on 31 Mar 2016

Please also look into how fzf is implemented. It has a simpler scoring system (just plain subsequence search I believe) but it's extremely fast. It doesn't even build an index AFAIK.
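
A plain subsequence test of the kind described here is short to write. The Python sketch below is only an illustration of that idea and is not taken from fzf; the function names and the shortest-path-first ordering are assumptions made for the example.

```python
def is_subsequence(query, candidate):
    """True if every character of `query` appears in `candidate`, in order."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator, which enforces the ordering.
    return all(ch in remaining for ch in query.lower())


def filter_paths(query, paths):
    """Keep matching paths, shortest first as a crude stand-in for scoring."""
    return sorted((p for p in paths if is_subsequence(query, p)), key=len)
```

No index is built: each query is a single linear pass over every candidate path.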

FWIW this would be the killer feature that would most certainly get me to switch to VS Code. But in any case thanks for your hard work.

sunnyps on 15 Apr 2016

@bpasero would it be interesting to have something like: https://github.com/ggreer/the_silver_searcher as a dependency and use it? Seems powerful for file searches.

justMy2Cents

mAiNiNfEcTiOn on 21 Apr 2016

👍1

Not sure what the timeline looks like on this.

jack-guy on 3 Jun 2016

👍13

The fact that we need to search source code might make Lucene a less ideal candidate. We also need to support regular expression searches.

Would it be reasonable to layer in different kinds of results at different times (with some sort of indicator that a search is still progressing)? Even fuzzy matching file names has a fair bit of delay for medium sized projects (with, say, node_modules/ _excluded_)
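
One way to picture layered results is a search that hands back partial batches together with a "still running" flag. This is a hypothetical Python sketch; `search_in_batches` and its `batch_size` parameter are invented for illustration and do not correspond to any VS Code API.

```python
import os


def search_in_batches(root, term, batch_size=100):
    """Yield (batch, finished) pairs so a UI can show early hits while walking."""
    batch = []
    term = term.lower()
    for _dirpath, _dirs, files in os.walk(root):
        for name in files:
            if term in name.lower():
                batch.append(name)
                if len(batch) >= batch_size:
                    yield batch, False  # more results may follow
                    batch = []
    yield batch, True  # final (possibly empty) batch; the search is complete
```

A caller would render each batch as it arrives and keep a progress indicator visible until `finished` is true.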

nevir on 4 Jul 2016

I have added a unit test for measuring file search performance with a large workspace, instructions and results from the optimization with #9380 are in #9545.

chrmarti on 21 Jul 2016

👍1

I'm using "insider build" and using vscode to browse Linux kernel over sshfs.
Even when entering full path like 'mm/slab.c', file search (using Ctrl + P) takes a long time. Also, it seems there is no caching of file paths, so repeated searches in same sub-paths remain slow.

In comparison, sublime text over sshfs is able to fuzzy find files almost instantaneously. It must be caching FS tree. Hitting sshfs (or any network mount) for every find request is not feasible.

nitingupta910 on 21 Jul 2016

👍7

＋1

dpull on 8 Aug 2016

👎1

Please don't comment "+1". Use the reaction button on the comment to show your approval/appreciation.

mAiNiNfEcTiOn on 8 Aug 2016

👍6

Moving to Christoph who is making great progress on tuning this (already for last milestone, continuing in this milestone) 👍

bpasero on 10 Aug 2016

🎉8 👍1

For the fuzzy finding, over in Nuclide, we use fuzzy-native, which are Node bindings for wincent/command-t with multithreading support. It's crazy fast.

zertosh on 23 Aug 2016

👍2 ❤1

Thanks for sharing @zertosh!

Tyriar on 23 Aug 2016

Our fuzzy sorting is reasonably fast at this point. It might still be an overall improvement if we can make it faster (time permitting).

Most of the time is currently spent in the file traversal. I've just switched over to use native commands (find/dir) to do that, yet it remains the main part where time is spent.

My measurements using Python's os.walk() indicate that it would be faster to use a native module that uses POSIX readdir (and a similar API on Windows, as Python does). Implementing this ourselves is out of scope at this time, though it would likely benefit the wider Node community.

chrmarti on 24 Aug 2016

I find this post very interesting: https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

bpasero on 25 Aug 2016

There is an npm package implementing a simplified version of Boyer-Moore mentioned in the above article: https://www.npmjs.com/package/streamsearch
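
For reference, one common simplification of Boyer-Moore is Horspool's variant, which keeps only a bad-character shift table. The Python sketch below illustrates that technique in general; it is not the code of the streamsearch package, and the name `horspool_find` is made up for this example.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of `needle`, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For each byte of the needle except the last: how far the window may
    # shift when that byte sits under the window's last position.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Bytes that never occur in the needle allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

The speedup comes from the skips: on a mismatch the window can jump up to `len(needle)` bytes instead of advancing one byte at a time.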

chrmarti on 26 Aug 2016

👍1

Our full text search matching actually uses RegExp matching on a full line (see https://github.com/Microsoft/vscode/blob/master/src/vs/workbench/services/search/node/textSearch.ts#L121) and part of the reason is that we allow for searches with regular expressions.

I wonder how much faster BM would be in this case given that RegEx matching is highly optimized. We should just start an experiment and measure this 👍

One possible optimization is to not split the file into lines but only do it if a match is found.
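
That optimization could look roughly like the following Python sketch, which runs the regular expression over the whole file content and works out line boundaries only when something matched. It is an illustration under assumptions, not the code in textSearch.ts; `search_text` is a hypothetical name, and patterns that match across newlines are not handled.

```python
import bisect
import re


def search_text(content, pattern):
    """Return one (line_number, line_text) pair per regex match in `content`."""
    matches = list(re.finditer(pattern, content))
    if not matches:
        return []  # common case: line boundaries are never computed
    # Offsets at which each line starts, built only because something matched.
    line_starts = [0] + [m.end() for m in re.finditer(r"\n", content)]
    results = []
    for m in matches:
        idx = bisect.bisect_right(line_starts, m.start()) - 1
        start = line_starts[idx]
        end = content.find("\n", start)
        if end == -1:
            end = len(content)  # last line without a trailing newline
        results.append((idx + 1, content[start:end]))
    return results
```

Files without any match, which is most files in a typical search, skip the line bookkeeping entirely.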

bpasero on 27 Aug 2016

Late in the game but how about https://github.com/monochromegane/the_platinum_searcher ?

rubymaniac on 31 Aug 2016

Is a project's directory tree currently being cached in memory? I suspect that a lot of the slowness I'm experiencing with fuzzy search when I use sshfs is that VSCode is remaking requests to build the directory structure unnecessarily. I could be wrong about that though.

jack-guy on 31 Aug 2016

It is cached for quick open (= the file picker with fuzzy matching) but currently not when doing full text searches (planned next).

bpasero on 31 Aug 2016

👍1

I'm seeing the same problem, (Ubuntu 16.04, VSCode 1.5.2, a project with ~1500 non-ignored files). As others have reported, this drastically slows down development time, and so I'm regrettably switching back to ST3 until this is fixed. This is a real shame because I'm really liking the other features of vscode - particularly the node debugger.

Is there any way to see where the fuzzy matcher is taking the most time? In ST there is a console you can open which outputs useful debug information.

fiznool on 13 Sep 2016

@fiznool That's unexpected at this point, could you open a bug report with the output from time (find -L . -type f | wc -l); echo $? when run in your project folder?

chrmarti on 13 Sep 2016

@chrmarti this initially takes a very long time, but this is because I have (amongst other things) a node_modules folder with tens of thousands of files. I expected VSCode not to recurse into it, since I have excluded this folder from my project. Am I right in thinking that Quick Find should not search in folders that are excluded in the settings.json configuration?

fiznool on 13 Sep 2016

@fiznool A change I included was to use 'find' to retrieve all files and then apply the exclusion filter on top. 'find' seemed to be fast enough to make up for the additional load, that's not always the case as we now discover. I'm tracking this as #11874.
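
The difference between filtering after the fact and pruning during traversal can be shown with a small Python sketch. It is illustrative only: `list_files` and its `excluded_dirs` parameter are hypothetical, and matching on bare folder names is a simplification of the glob patterns used in VS Code settings.

```python
import os


def list_files(root, excluded_dirs=("node_modules", ".git")):
    """Yield file paths under `root`, never descending into excluded folders."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing `dirnames` in place tells os.walk (top-down mode) to skip
        # those subtrees, so their contents are never read from disk.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

With pruning, the cost of a node_modules folder holding tens of thousands of files is never paid, whereas listing everything first and filtering afterwards still has to enumerate it.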

chrmarti on 13 Sep 2016

@chrmarti as per #11874 has this now been fixed?

I updated vscode this morning to v1.5.3 and the problem still appears to persist.

fiznool on 27 Sep 2016

@fiznool That fix will be in 1.6.

chrmarti on 27 Sep 2016

👍1

I have a relatively large project, that is on a remote drive (mounted in windows) and the file search (ctrl+e) does not find many of the files. Is there maybe a way to increase some timeouts on indexing or something like that, that might be causing this? Or is there a limit on number of indexed files?

VSCode Version: Code 1.6.1 (9e4e44c19e393803e2b05fe2323cf4ed7e36880e, 2016-10-13T16:21:53.542Z)
OS Version: Windows_NT ia32 10.0.14393

PunchyRascal on 3 Nov 2016

@PunchyRascal Opened #14913 to track your issue.

chrmarti on 3 Nov 2016

Though everything has gotten significantly more snappy in recent releases (great work guys!), my experience with VS Code is that it still doesn't:
1) Crawl the directory tree of a project in the background to cache for fuzzy search
2) Cache files as they're revealed in the Explorer pane tree

I think adding one or both of these would greatly improve the overall fuzzy search experience. :)

jack-guy on 26 May 2017

👍8

I mount a filesystem via FUSE, and retrieving all file names over the network is slow. I love Sublime Text because it caches file names and builds a search index in the background, which lets me jump to any file blazing fast. Can you please do the same in VS Code? It would help me switch to VS Code, because I like its autocompletion and other features. But the super slow Jump To File (2+ minutes of waiting) is what stops me from using VS Code right now in our big project.

Do you have plans to deliver this improvement with caching the file tree?

P.S. I use the Python extension. Maybe it somehow slows down the Jump To File functionality?

P.P.S. I found that after opening a workspace and waiting a few minutes, Jump To File works pretty well. Why not save the cache and reuse it the next time I open this workspace? I see that there is no native Projects support (which would potentially allow creating a separate folder for the project and saving the cache and settings there). Why not use a Project Manager extension as a native solution for all VS Code users?
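
Saving the file list between sessions might look like the Python sketch below. It is a hypothetical illustration of the suggestion, not something VS Code is known to do; `load_or_build_index` and the JSON cache file are assumptions, and a real tool would also need to detect a stale cache.

```python
import json
import os


def load_or_build_index(root, cache_file):
    """Return the workspace's file list, reading it from `cache_file` if present."""
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as fh:
            return json.load(fh)  # may be stale; no revalidation in this sketch
    paths = sorted(
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _dirs, files in os.walk(root)
        for name in files
    )
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(paths, fh)
    return paths
```

On a slow network mount, the second and later sessions would read one local file instead of walking the remote tree again.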

1st on 12 Jun 2017

👍2

Can't believe something as important as this is still an issue.

I'm using Windows 64-bit insider builds, working on a project mounted on a network share.

Sublime text finds files instantly, project consists of only a few thousand files.

VSCode takes minutes :(

crewone on 27 Jul 2017

👍8

node_modules typically inflates the number of files in a project. It's a sane optimization to give an option to opt out of search, or to opt in only if the file path entered has node_modules in it.

sagiavinash on 3 Aug 2017

👍1

I observed a significant improvement in the speed of search and quick open when we switched from a samba share to an NFS share. Now it takes about 5-7 seconds as opposed to ~30 with samba.

PunchyRascal on 3 Aug 2017

Could be my imagination, but did this receive some TLC in the November VS Code update? Doesn't seem to be anything about fuzzy search in the release notes, but my network-mounted drive seems to be indexed in the background and actually fuzzy-searchable now.

jack-guy on 20 Dec 2017

👍1

I thought it was my new machine, but I too have experienced lightning fast searching in the last weeks, even on samba share.

PunchyRascal on 21 Dec 2017

I can reproduce the performance improvement too. This is on the chromium "src" repository with third_party folder excluded. Search across files seems faster too.

sunnyps on 22 Dec 2017

I have opposite anecdotal results. The search index for me is now terrible, and only seems to include files I've already opened. (Version 1.19.1, problem since 1.19)

jamie-pate on 2 Jan 2018

👍1

@jamie-pate Please open an issue (Help > Report Issues to include your setup info) for us to investigate.

chrmarti on 3 Jan 2018

Actually I think it's the workbench.action.quickOpen which was broken/slow, not sure how it's related to search. I left it open overnight and it seems to be working now :flushed:

jamie-pate on 3 Jan 2018

For me it was also very slow, but when going through my settings I found I had set "search.quickOpen.includeSymbols" to true. After setting it to false it became as fast as I was used to. Performance became slow when opening a project of ~18k files with that setting set to true.

joostmeulenbeld on 2 Feb 2018

"Go to file" is ridiculously slow for me too.
I have a Windows 7 machine with SSD and my workspace has 38,426 Files, 11,228 Folders. Searching in VSCode regularly takes 10+ seconds to find a file where Eclipse search is instant! The only time VSCode shows the result quickly is if it's still in the recently opened list.

I was able to alleviate the problem just a little bit by excluding some of my workspace folders by adding them to the search.exclude list in Settings.

Please fix this, it's one of my most frequently used functions!!!

stanislavgeorgiev on 2 Feb 2018

Closing this because quite a lot has happened since 2015 (search has been rewritten entirely twice since then).

Anything else that's slower than it should be, please file new issues.

roblourens on 30 Apr 2018

@roblourens The "Go to file" function still has terrible performance for large projects with 30,000 files. I don't understand how that is possible, because even "Find in Files" is a lot faster, and that one has to check the entire file contents, not only the file name.

stanislavgeorgiev on 30 Apr 2018

I feel like the computer hardware performance is important here, too. Since I started using a new notebook, I think the search has sped up significantly. The CPU load is indeed very high during search operations. So maybe bear that in mind.

PunchyRascal on 1 May 2018

I just published a wiki page for troubleshooting search issues which might also be helpful: https://github.com/Microsoft/vscode/wiki/Search-Issues

For anything else, please open a new issue.

roblourens on 1 May 2018
