NuGetGallery: Support partial search terms in search

Created on 25 Nov 2015 · 16 comments · Source: NuGet/NuGetGallery

Moved from NuGet/Home (https://github.com/NuGet/Home/issues/283), originally filed by @csharpfritz.

This is in reference to this Twitter thread: https://twitter.com/sbohlen/status/571267522571468800

Essentially, this comes down to the following issue, which (depending on your POV) either merely hampers discoverability or indicates mostly broken search functionality:
• Partial matches of search terms against package names are not considered.

Unless you know EXACTLY the package name for which you’re searching, it’s nearly impossible to use search to progressively refine/discover the package(s) you’re seeking.

Example: a search for "nhibernate" finds the NHibernate package:
http://www.nuget.org/packages?q=nhibernate

However, a search for "nhib" does not:
http://www.nuget.org/packages?q=nhib
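To make the distinction concrete, here is a minimal C# sketch contrasting whole-token matching with prefix matching. It is a hypothetical illustration, not the NuGet Gallery's search code: the `PackageIndex` class, its methods, and the rule of splitting IDs on `.`, `-`, `_` and spaces are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory matcher for illustration only; NOT the NuGet Gallery search code.
public static class PackageIndex
{
    // Assumption: IDs are lowercased and split on '.', '-', '_' and spaces.
    private static IEnumerable<string> Tokens(string id) =>
        id.ToLowerInvariant().Split(new[] { '.', '-', '_', ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Exact-term matching: the query term must equal a whole token.
    public static List<string> ExactSearch(IEnumerable<string> ids, string term)
    {
        string t = term.ToLowerInvariant();
        return ids.Where(id => Tokens(id).Contains(t)).ToList();
    }

    // Prefix matching: the query term only has to match the start of a token.
    public static List<string> PrefixSearch(IEnumerable<string> ids, string term)
    {
        string t = term.ToLowerInvariant();
        return ids.Where(id => Tokens(id).Any(tok => tok.StartsWith(t, StringComparison.Ordinal))).ToList();
    }
}
```

With this sketch, `ExactSearch(new[] { "NHibernate" }, "nhib")` returns an empty list, while `PrefixSearch` with the same arguments returns `NHibernate`. Prefix matching is only one form of partial matching; an infix fragment such as "bernate" would still be missed.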

All 16 comments

+1
It is not really a search when you need to know the exact term; that would be a lookup. ;-)

Examples, when looking for Rx-Testing:
• "Rx-Test" returns 0 results.
• "Rx.Test" returns 0 results.
• "Rx Test" returns lots of results, but Rx-Testing is not in the first 5 pages of results.

/cc @joelverhagen

@johnataylor Am I correct in saying that we'd need to store n-grams for package IDs here? Or would running a fuzzy query work?

When no results are found using the standard query, could we run a second query instead? (It would need some shingling and tuning, of course.)

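            // Fuzzy fallback over the raw user query q, using Lucene's FuzzyLikeThisQuery
            // (at most 200 expanded terms, analyzed with the PackageAnalyzer referenced here).
            // AddTerms(queryString, fieldName, minSimilarity, prefixLength): a higher minSimilarity
            // demands closer matches; prefixLength 0 means no leading characters must match exactly.
            var fuzzyQuery = new FuzzyLikeThisQuery(200, new PackageAnalyzer());
            fuzzyQuery.AddTerms(q, "Id", 0.7f, 0);
            fuzzyQuery.AddTerms(q, "Title", 0.7f, 0);
            fuzzyQuery.AddTerms(q, "Description", 0.3f, 0);
            fuzzyQuery.AddTerms(q, "Summary", 0.3f, 0);
            return fuzzyQuery;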
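The fallback idea (standard query first, a second query only when nothing matched) can be sketched without Lucene. This hypothetical C# `FallbackSearch` class swaps in plain Levenshtein edit distance over whole package IDs; it is not `FuzzyLikeThisQuery`, and treating an exact ID match as "the standard query" is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of "standard query first, fuzzy query only if nothing matched".
// Uses plain Levenshtein distance on whole IDs; this is NOT Lucene's FuzzyLikeThisQuery.
public static class FallbackSearch
{
    // Classic dynamic-programming edit distance (insert, delete, substitute each cost 1).
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
            }
        return d[a.Length, b.Length];
    }

    public static List<string> Search(IEnumerable<string> ids, string query, int maxEdits = 2)
    {
        string q = query.ToLowerInvariant();

        // 1) "Standard" query (simplified): case-insensitive exact ID match.
        var exact = ids.Where(id => id.ToLowerInvariant() == q).ToList();
        if (exact.Count > 0) return exact;

        // 2) Fallback: IDs within maxEdits edits of the query, closest first.
        return ids.Select(id => (id, dist: Levenshtein(id.ToLowerInvariant(), q)))
                  .Where(x => x.dist <= maxEdits)
                  .OrderBy(x => x.dist)
                  .Select(x => x.id)
                  .ToList();
    }
}
```

One caveat the sketch makes visible: edit-distance fuzziness catches typos such as "nhibernat" (one edit away from "nhibernate"), but a short prefix like "nhib" is six insertions away, so fuzzy matching alone does not give partial-term search.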

The n-grams are specifically for the ID continuation feature that plugs into the JSON editor. It might be interesting to plug that continuation feature into more places, but currently it's fairly specific.
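For reference, an edge n-gram index of the kind used for ID continuation can be sketched as follows. This is a hypothetical C# illustration, not NuGet's implementation: `IdCompletionIndex` and the minimum gram length of 2 are assumptions. Every prefix of each lowercased ID is stored at indexing time, so a lookup is a single dictionary hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical edge n-gram index for package-ID completion (not NuGet's actual code).
// Every prefix of a lowercased ID, from minGram characters up to the full ID, maps to that ID.
public class IdCompletionIndex
{
    private readonly Dictionary<string, SortedSet<string>> _grams =
        new Dictionary<string, SortedSet<string>>();
    private readonly int _minGram;

    public IdCompletionIndex(IEnumerable<string> ids, int minGram = 2)
    {
        _minGram = minGram;
        foreach (var id in ids)
        {
            string lower = id.ToLowerInvariant();
            for (int len = minGram; len <= lower.Length; len++)
            {
                string prefix = lower.Substring(0, len);
                if (!_grams.TryGetValue(prefix, out var set))
                    _grams[prefix] = set = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
                set.Add(id);
            }
        }
    }

    // Lookup is one dictionary access; the work was moved to indexing time.
    public IReadOnlyList<string> Complete(string typed)
    {
        string key = typed.ToLowerInvariant();
        if (key.Length < _minGram || !_grams.TryGetValue(key, out var set))
            return Array.Empty<string>();
        return set.ToList();
    }
}
```

For example, an index over `NHibernate` and `Newtonsoft.Json` returns `NHibernate` for "nhib" and `Newtonsoft.Json` for "ne". The trade-off is index size: an ID of length n contributes n - 1 grams with the default minimum of 2.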

So this fuzzy approach might be interesting. It could simplify the query we create, where we basically OR a bunch of terms together, though it would take some experimenting. The NuGetQuery class is (hopefully) fairly hackable: there are two halves to it, the parsing of the user's input and then the Lucene query generation, and this would plug in on the generation side. We suspect the generation isn't exactly right anyway: we currently lean rather heavily on PhraseQueries because that mimics the regular QueryParser in Lucene, but I'm not sure the semantics are exactly correct.
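The two-halves structure can be sketched as a parse step feeding a generation step, so that a fuzzy variant only touches the second half. Everything here is hypothetical (this is not the NuGetQuery class): the `TwoStageQuery` name, the `id:` prefix handling, and the default field list (borrowed from the FuzzyLikeThisQuery snippet above) are assumptions. The output is a query string in classic (3.x-era) Lucene syntax, where `term~0.7` denotes a fuzzy term with minimum similarity 0.7.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical two-stage query builder for illustration; NOT the real NuGetQuery class.
// Stage 1 parses user input; stage 2 generates a classic Lucene query string.
public static class TwoStageQuery
{
    // Assumption: unqualified terms are searched in these fields.
    private static readonly string[] DefaultFields = { "Id", "Title", "Description", "Summary" };

    // Stage 1: "id:NHibernate mapping" -> [("Id", "nhibernate"), (null, "mapping")]
    public static List<(string Field, string Term)> Parse(string input)
    {
        var clauses = new List<(string Field, string Term)>();
        foreach (var word in input.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = word.IndexOf(':');
            bool isIdField = colon > 0 && colon < word.Length - 1 &&
                             word.Substring(0, colon).Equals("id", StringComparison.OrdinalIgnoreCase);
            if (isIdField)
                clauses.Add(("Id", word.Substring(colon + 1).ToLowerInvariant()));
            else
                clauses.Add((null, word.ToLowerInvariant()));
        }
        return clauses;
    }

    // Stage 2: OR every (field, term) pair together. The fuzzy flavor appends the
    // "~minSimilarity" suffix; only this stage changes between the two flavors.
    public static string Generate(IEnumerable<(string Field, string Term)> clauses, bool fuzzy = false)
    {
        string suffix = fuzzy ? "~0.7" : "";
        var parts = clauses.SelectMany(c =>
            (c.Field != null ? new[] { c.Field } : DefaultFields)
                .Select(f => $"{f}:{c.Term}{suffix}"));
        return string.Join(" OR ", parts);
    }
}
```

`Generate(Parse("id:NHibernate mapping"))` yields `Id:nhibernate OR Id:mapping OR Title:mapping OR Description:mapping OR Summary:mapping`; passing `fuzzy: true` changes only the generated clauses, not the parsing.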

@joelverhagen might be interested in this. We were working together in this area when we cleaned up the query parsing.

@LeeCampbell I know this doesn't solve partial terms, but with this WIP, searching for "Rx Test" puts Rx-Testing nicely at the top of the list.

https://github.com/NuGet/NuGet.Services.Metadata/pull/69

For "Rx Test", the result is now on the first page. There is still some optimization to do, but this is definitely better :-)

(https://www.nuget.org/packages?q=Rx+Test - refresh a couple of times if the package does not show up)

We will be tweaking this further over time.

I just ran into this while looking for MiniProfiler.Mvc4. My search for "miniprofiler.mvc" yielded zero results, while "miniprofiler mvc" returned a bunch of unrelated packages, and the one I wanted was in the middle of page 2. That's not very user-friendly.

When you search for "miniprofiler mvc", the search engine performs an "or" search, meaning it returns all packages containing the word "miniprofiler" or "mvc". The results are then ranked with a boost based on download count. Since packages with "mvc" are more popular, those appear first.
Partial search is still not supported, so unfortunately "miniprofiler.mvc" yields no results.
However, if you search for "miniprofiler mvc4", your package will be first.
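The ranking behavior described here (OR semantics plus a download-count boost) can be sketched in C#. The scoring formula, number of matched terms plus `boostWeight * log10(downloads + 1)`, is an assumption chosen for illustration; the real Lucene scoring and boost differ, and the `OrSearch` and `Package` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Package(string Id, long Downloads);

// Hypothetical scoring sketch; NOT the actual NuGet/Lucene ranking.
public static class OrSearch
{
    // Assumption: IDs are lowercased and split on '.', '-' and spaces.
    private static HashSet<string> Tokens(string id) =>
        new HashSet<string>(id.ToLowerInvariant().Split(new[] { '.', '-', ' ' }, StringSplitOptions.RemoveEmptyEntries));

    // OR semantics: a package is a hit if it contains ANY query term.
    // Score = matched terms + boostWeight * log10(downloads + 1), highest first.
    public static List<string> Rank(IEnumerable<Package> packages, string query, double boostWeight = 1.0)
    {
        var terms = query.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return packages
            .Select(p => (p.Id, p.Downloads, Matched: terms.Count(t => Tokens(p.Id).Contains(t))))
            .Where(x => x.Matched > 0)
            .OrderByDescending(x => x.Matched + boostWeight * Math.Log10(x.Downloads + 1))
            .Select(x => x.Id)
            .ToList();
    }
}
```

With hypothetical data where two unrelated packages whose IDs end in ".Mvc" have millions of downloads and MiniProfiler.Mvc4 has tens of thousands, `Rank(packages, "miniprofiler mvc")` lists MiniProfiler.Mvc4 last, while "miniprofiler mvc4" leaves it as the only match.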
@yishaigalatzer @qianjun22, any plans to support partial search?

@xt0rted There are long-term plans to support partial search, as this issue indicates.

But I think your issue is different: it is about how we index and search words that include numbers. We should probably break MiniProfiler.Mvc4 into Mini, Profiler, Mvc, and Mvc4, so that search would succeed there even without partial search support. We already have this feature, but we probably don't split aggressively enough for this particular case.

CC @joelverhagen

When our camel-case logic is executed on the input token, only transitions _from a lowercase letter to an uppercase letter_ result in a "split" (the algorithm processes the term from left to right). So MiniProfiler becomes Mini and Profiler, but Mvc4 does not become Mvc and 4. We should consider this improvement.
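The splitting rule described above can be sketched as a left-to-right scan. This is a hypothetical C# re-creation of the stated rule, not NuGet's analyzer code; `CamelCaseSplitter` is an illustrative name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-creation of the stated rule (NOT the actual NuGet analyzer):
// split only where a lowercase letter is immediately followed by an uppercase letter.
public static class CamelCaseSplitter
{
    public static List<string> Split(string token)
    {
        var parts = new List<string>();
        int start = 0;
        for (int i = 1; i < token.Length; i++)
        {
            // Lowercase -> uppercase transition: close the current part here.
            if (char.IsLower(token[i - 1]) && char.IsUpper(token[i]))
            {
                parts.Add(token.Substring(start, i - start));
                start = i;
            }
        }
        if (token.Length > 0) parts.Add(token.Substring(start));
        return parts;
    }
}
```

`Split("MiniProfiler")` gives `Mini` and `Profiler`, but `Split("Mvc4")` stays `Mvc4`, since a digit is never uppercase. A run of capitals such as `NHibernate` also stays whole under this rule.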

We actually have an open PR here: https://github.com/NuGet/NuGet.Services.Metadata/pull/71
It splits "Mvc4" into "Mvc" and "4"; it probably needs updating to also handle the example above.
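A possible extension of the rule that covers both suggestions in this thread (also splitting at letter/digit boundaries, as the PR does, and keeping each whole segment such as `Mvc4`) is sketched below. It is hypothetical C#, not the code from the PR; `IdTokenizer` is an illustrative name, and splitting the ID on `.` first is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical improved splitter (NOT the PR's code): also breaks at letter/digit
// boundaries, and keeps each original dot-separated segment as a token too.
public static class IdTokenizer
{
    private static bool IsBoundary(char prev, char cur) =>
        (char.IsLower(prev) && char.IsUpper(cur)) ||   // "iP" in MiniProfiler
        (char.IsLetter(prev) && char.IsDigit(cur)) ||  // "c4" in Mvc4
        (char.IsDigit(prev) && char.IsLetter(cur));    // e.g. "4W" in Mvc4Web

    public static List<string> Tokenize(string id)
    {
        var tokens = new List<string>();
        foreach (var segment in id.Split('.', StringSplitOptions.RemoveEmptyEntries))
        {
            tokens.Add(segment); // whole segment, e.g. "Mvc4"
            var parts = new List<string>();
            int start = 0;
            for (int i = 1; i < segment.Length; i++)
            {
                if (IsBoundary(segment[i - 1], segment[i]))
                {
                    parts.Add(segment.Substring(start, i - start));
                    start = i;
                }
            }
            parts.Add(segment.Substring(start));
            if (parts.Count > 1) tokens.AddRange(parts); // sub-parts, e.g. "Mvc", "4"
        }
        return tokens;
    }
}
```

`Tokenize("MiniProfiler.Mvc4")` returns `MiniProfiler, Mini, Profiler, Mvc4, Mvc, 4`, so a query for "miniprofiler mvc" could match on whole tokens, provided tokens are lowercased at both index and query time (not done in this sketch).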

What's happening with this issue?
Users are still complaining about how they are struggling to "search" for packages on NuGet.
When will this be addressed?

@skofman1 I think we are already handling this request as part of https://github.com/NuGet/NuGetGallery/issues/4124. If so, can this be closed?

Yup. Since we are aggregating all search issues in #4124 closing this one.
