NuGetGallery: Poor/unrelated search results when not using the exact package name

Created on 7 Sep 2020 · 3 Comments · Source: NuGet/NuGetGallery

Describe the bug

Searching for the Microsoft.Toolkit.HighPerformance package displays unrelated search results. This happens both on nuget.org and when searching through Visual Studio. It's almost impossible to find that package unless you actually know the exact name.

Not exactly the same, but possibly related to https://github.com/NuGet/NuGetGallery/issues/8130?

To Reproduce

Here are some of the searches I tried:

⛔ "microsoft high performance" (screen), the package is nowhere to be found
⛔ "microsoft highperformance" (screen), same as above
⛔ "Microsoft HighPerformance" (screen), same
⛔ "microsoft toolkit high performance" (screen), same
⛔ "microsoft toolkit highperformance" (screen), same

✅ "microsoft.toolkit.highperformance" (screen), works
✅ "Microsoft.Toolkit.HighPerformance" (screen), works too

Expected behavior

The Microsoft.Toolkit.HighPerformance package should be the first result for all of these search queries.

Screenshots

There is a screenshot linked next to each tested query above.

Additional context

In case it helps, right now we have the following tags in the package:

UWP Toolkit Windows core standard unsafe span memory string array stream buffer extensions helpers parallel performance

Still, I'd expect the search results to show that package first based on the package name alone, since all of those sample queries are an exact match for the package name once it is tokenized on the . separator and converted to lowercase.
I wouldn't expect users to search for the package by its exact name (which is also more verbose to type), especially if they aren't yet aware that the package exists, so the way search results are returned right now doesn't help with discoverability.
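The expectation above can be illustrated with a minimal Python sketch. It assumes a hypothetical tokenizer that splits on `.` and whitespace and lowercases each piece, as described in the issue, and additionally splits CamelCase segments (so `HighPerformance` also yields `high` and `performance`). The CamelCase step and the helper names `tokenize` and `matches_all_terms` are assumptions for illustration, not how nuget.org search is actually implemented.

```python
import re
from typing import Set


# Hypothetical tokenizer for illustration only; NOT nuget.org's real analyzer.
def tokenize(text: str) -> Set[str]:
    """Split on '.' and whitespace, lowercase, and (assumption) also split CamelCase."""
    tokens: Set[str] = set()
    for segment in re.split(r"[.\s]+", text):
        if not segment:
            continue
        # Whole segment, lowercased, e.g. "HighPerformance" -> "highperformance"
        tokens.add(segment.lower())
        # Assumed CamelCase split, e.g. "HighPerformance" -> "high", "performance"
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", segment):
            tokens.add(part.lower())
    return tokens


def matches_all_terms(query: str, package_id: str) -> bool:
    """True if every query token also appears among the package ID's tokens."""
    return tokenize(query) <= tokenize(package_id)


# The queries tried in this issue.
QUERIES = [
    "microsoft high performance",
    "microsoft highperformance",
    "Microsoft HighPerformance",
    "microsoft toolkit high performance",
    "microsoft toolkit highperformance",
    "microsoft.toolkit.highperformance",
    "Microsoft.Toolkit.HighPerformance",
]

if __name__ == "__main__":
    for q in QUERIES:
        print(f"{q!r}: {matches_all_terms(q, 'Microsoft.Toolkit.HighPerformance')}")
```

Under these assumptions every query above is fully covered by the package ID's tokens. With plain `.` splitting alone (no CamelCase step), "microsoft high performance" would not be, because `high` and `performance` never appear as separate tokens in the ID.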

Thanks! 😊

Search Bug

All 3 comments

Thanks for reporting this! Today, our search rankings heavily prefer packages with many total downloads. This works well for most searches, since folks often want heavily downloaded packages like Newtonsoft.Json. However, favoring packages with high total downloads puts new packages like Microsoft.Toolkit.HighPerformance at a massive disadvantage, because the term microsoft also matches other Microsoft packages with far more downloads.

Furthermore, the term microsoft matches "noisy" results like System.Numerics.Vectors. These "noisy" results aren't commonly installed directly by customers; instead, they are transitive dependencies of other popular packages.

We're considering a few improvements to the ranking algorithm that should help this case:

  1. https://github.com/NuGet/NuGetGallery/issues/7406 - Use recent downloads instead of total downloads for rankings
  2. https://github.com/NuGet/NuGetGallery/issues/7186 - Prefer direct dependencies over transitive dependencies
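To see why a boost based on total downloads penalizes a new package, here is a minimal Python sketch of a hypothetical ranking function. The formula (text relevance times `log10(downloads + 10)`), the package names, and all download counts are invented for illustration; this is not nuget.org's actual ranking algorithm. It only shows how switching the popularity signal from total to recent downloads, as proposed in #7406, can reorder results.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    package_id: str
    text_score: float       # hypothetical query relevance in [0, 1]
    total_downloads: int    # all-time downloads
    recent_downloads: int   # downloads in some recent window (window length is an assumption)


def score(pkg: Package, use_recent: bool) -> float:
    """Hypothetical score: text relevance multiplied by a logarithmic popularity boost."""
    downloads = pkg.recent_downloads if use_recent else pkg.total_downloads
    # +10 keeps the boost >= 1 even for a package with zero downloads.
    return pkg.text_score * math.log10(downloads + 10)


def rank(packages: List[Package], use_recent: bool) -> List[str]:
    """Return package IDs ordered from best to worst score."""
    ordered = sorted(packages, key=lambda p: score(p, use_recent), reverse=True)
    return [p.package_id for p in ordered]


# Made-up numbers: an old, heavily downloaded package that only loosely matches
# the query versus a new package that matches it closely.
PACKAGES = [
    Package("Example.Old.Popular", text_score=0.6,
            total_downloads=500_000_000, recent_downloads=1_000_000),
    Package("Example.New.Package", text_score=0.9,
            total_downloads=200_000, recent_downloads=150_000),
]

if __name__ == "__main__":
    print("total downloads: ", rank(PACKAGES, use_recent=False))
    print("recent downloads:", rank(PACKAGES, use_recent=True))
```

With these made-up numbers the old package ranks first when total downloads are used (about 5.22 vs 4.77), while the closer-matching new package ranks first when recent downloads are used (about 3.60 vs 4.66).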

@Sergio0694, we've rolled out some improvements in this area. Could you try out your queries? From what I can tell, the package now ranks as follows:

microsoft high performance - rank 9
microsoft highperformance - rank 1
Microsoft HighPerformance - rank 7
microsoft toolkit high performance - rank 1
microsoft toolkit highperformance - rank 1
microsoft.toolkit.highperformance - rank 1
Microsoft.Toolkit.HighPerformance - rank 1

Seems better to me, overall, but I was wondering what you think!

Hi @joelverhagen - I just tried those queries and I got the same results you mentioned, this looks absolutely great!
Thank you so much for looking into this, this is definitely so much better than before! 😊
