NuGetGallery: SupportedFrameworks sent back by the V2 search endpoint is incomplete

Created on 25 Aug 2016 · 2 Comments · Source: NuGet/NuGetGallery

When using the V2 search service, I have noticed that some packages do not have the SupportedFrameworks field populated.

Example: packageid:Newtonsoft.Json

{
  ...
  "SupportedFrameworks":[],
  ...
}

Others seem fine (validity of results should be examined): packageid:WindowsAzure.Storage

{
  ...
  "SupportedFrameworks": [
    "net40",
    "net40-client",
    "netstandard1.3",
    "aspnet50",
    "win8",
    "wp8",
    "wpa"
  ],
  ...
}

As a side note, the targetFramework field sent in by the gallery and by the NuGet client seems to be ignored by the search service.

/cc @yishaigalatzer

Search Priority - 3 Bug

Most helpful comment

Long answer coming :smile:

Why is data not in search output?

To answer why the data is not in the search output, the short answer could be: because the data is not in the catalog as such.

  • https://api.nuget.org/v3/catalog0/data/2016.06.27.12.35.49/newtonsoft.json.9.0.1.json
  • https://api.nuget.org/v3/catalog0/data/2016.08.11.01.57.01/windowsazure.storage.7.2.0.json

Why not? Because it's not added in the SPARQL query we run.

But that's okay! Really, it is! Because we switched Catalog2Lucene to make use of a newly implemented CatalogPackageArchiveReader and fetch the supported frameworks from the dependencies and file list, which is more reliable anyway. So no worries, supportedFrameworks does not need a place in the catalog.
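To illustrate the idea of reading supported frameworks from a file list, here is a minimal sketch. It is not the actual CatalogPackageArchiveReader logic, which this thread does not show: it only assumes the conventional `lib/<framework>/<file>` package layout, ignores dependencies entirely, and uses a hypothetical class name, method name, and made-up file paths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FileListFrameworksSketch
{
    // Hypothetical helper: collects the framework folder names from paths
    // of the form lib/<framework>/<file>. Other paths are ignored.
    public static IReadOnlyList<string> FromFileList(IEnumerable<string> paths)
    {
        return paths
            .Select(p => p.Split('/'))
            .Where(parts => parts.Length >= 3
                            && string.Equals(parts[0], "lib", StringComparison.OrdinalIgnoreCase))
            .Select(parts => parts[1].ToLowerInvariant())
            .Distinct()
            .OrderBy(f => f, StringComparer.Ordinal) // deterministic output order
            .ToList();
    }

    public static void Main()
    {
        // Made-up file list for illustration.
        var files = new[]
        {
            "lib/net20/Example.dll",
            "lib/net40/Example.dll",
            "lib/net40/Example.xml",
            "tools/install.ps1",
        };

        Console.WriteLine(string.Join(",", FromFileList(files))); // net20,net40
    }
}
```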

If you run this repro code (in the NuGet.Services.Metadata project), you will see that supportedFrameworks are all parsed okay and added to the search index the way we want and expect.

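using System;
using System.Net.Http;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
// CatalogPackageMetadataExtraction comes from the NuGet.Services.Metadata project.

// Fetch the catalog entry for Newtonsoft.Json 9.0.1.
var httpClient = new HttpClient();
var catalogJson = await httpClient.GetStringAsync(new Uri("https://api.nuget.org/v3/catalog0/data/2016.06.27.12.35.49/newtonsoft.json.9.0.1.json"));
var catalogJObject = JsonConvert.DeserializeObject<JObject>(catalogJson, new JsonSerializerSettings
{
    DateParseHandling = DateParseHandling.DateTimeOffset
});

// Extract the package metadata, including supportedFrameworks.
var md = CatalogPackageMetadataExtraction.MakePackageMetadata(catalogJObject);

Console.WriteLine(md["supportedFrameworks"]);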

Or, in easy bullet points:

  • Catalog2Lucene is working correctly
  • Newly uploaded packages are processed correctly and their supported frameworks are stored in the index correctly

So _why_ is data not in search output? And why is supportedFrameworks in search for one but not the other?

Sql2Lucene does add supportedFrameworks when we regenerate the index from the database, but only for packages where this is actually stored. The last re-index was done from the DB, and the two example packages were added from the DB.

  • For WindowsAzure.Storage, this was done correctly.
  • For Newtonsoft.Json, this was not done correctly, because of this line of code, which seems to ignore all frameworks if one of them is null instead of just skipping the null one and adding all the others. That means this data is not in the database. I checked for Newtonsoft.Json and found that the first one is indeed null, so no supported frameworks are stored.

    • Note that this was by design, but it makes no sense.

    • This needs fixing, here's a new issue: https://github.com/NuGet/NuGetGallery/issues/3215.

[Image: newtonsoftnono] (screenshot from a repro, but the logic is the same)
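As an illustration of the two behaviors described above, the following sketch contrasts dropping the whole list when any entry is null with skipping only the null entry. It is not the gallery's actual code: `AllOrNothing` and `SkipNulls` are hypothetical names, and the framework strings are made-up sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SupportedFrameworksSketch
{
    // Hypothetical stand-in for the behavior described above:
    // a single null entry causes the whole list to be dropped.
    public static IReadOnlyList<string> AllOrNothing(IEnumerable<string> frameworks)
    {
        var list = frameworks.ToList();
        return list.Any(f => f == null) ? new List<string>() : list;
    }

    // The behavior one would expect instead: skip null entries, keep the rest.
    public static IReadOnlyList<string> SkipNulls(IEnumerable<string> frameworks)
    {
        return frameworks.Where(f => f != null).ToList();
    }

    public static void Main()
    {
        // Made-up sample data with a null first entry.
        var frameworks = new string[] { null, "net20", "net35", "net40" };

        Console.WriteLine(AllOrNothing(frameworks).Count);          // 0
        Console.WriteLine(string.Join(",", SkipNulls(frameworks))); // net20,net35,net40
    }
}
```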

So let's expand our easy bullet points:

  • Catalog2Lucene is working correctly
  • Packages that were uploaded prior to a database-based reindex using Sql2Lucene may not always have their supported frameworks stored correctly, because the data sometimes is not in the database due to this bug
  • Newly uploaded packages are processed correctly and their supported frameworks are stored in the index correctly

Why is data not used for search filtering?

Filtering seems to have been disabled for V2 search at some point and was not ported into consolidated search.

This can of course at some point be added again. This is quite a tricky one to do right though.

  • The easy route is to just add a filter that matches whatever framework value comes in with the request. Awesome, as search does not have to understand frameworks; it only has to know that a filter must be applied. Not so awesome, as search would not know how frameworks tie together, which ones are compatible, …
  • The complex route is adding NuGet.Frameworks and having search take a full-blown framework string, then expand the search filter into all compatible frameworks. Awesome: smart search! Not so awesome: it makes the filter quite complex. We can transform this into a bitset though, making the filter a tad faster. Also not so awesome: we take a dependency on NuGet.Frameworks, so if a new framework is added, search needs to be updated and deployed.
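To make the trade-off between the two routes concrete, here is a minimal sketch of both. It deliberately does not use NuGet.Frameworks: the compatibility table is a hand-written, hypothetical stand-in whose entries are illustrative rather than authoritative, and every type and method name is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FrameworkFilterSketch
{
    // Each known framework gets one bit. Hypothetical, fixed list for illustration.
    private static readonly string[] Known = { "net20", "net35", "net40", "net45" };

    // Hypothetical compatibility table: a project targeting the key can consume
    // packages that support any of the listed frameworks.
    private static readonly Dictionary<string, string[]> Compatible = new Dictionary<string, string[]>
    {
        ["net20"] = new[] { "net20" },
        ["net35"] = new[] { "net35", "net20" },
        ["net40"] = new[] { "net40", "net35", "net20" },
        ["net45"] = new[] { "net45", "net40", "net35", "net20" },
    };

    // Encodes a set of framework names as a bitset; unknown names are ignored.
    public static ulong ToBits(IEnumerable<string> frameworks)
    {
        ulong bits = 0;
        foreach (var f in frameworks)
        {
            int i = Array.IndexOf(Known, f);
            if (i >= 0) bits |= 1UL << i;
        }
        return bits;
    }

    // Easy route: exact match on the requested framework string.
    public static bool EasyMatch(IEnumerable<string> packageFrameworks, string requested)
        => packageFrameworks.Contains(requested);

    // Complex route: expand the request into all compatible frameworks,
    // then test for overlap with a single bitwise AND.
    public static bool SmartMatch(ulong packageBits, string requested)
        => Compatible.TryGetValue(requested, out var expanded)
           && (packageBits & ToBits(expanded)) != 0;

    public static void Main()
    {
        var package = new[] { "net20", "net35" };

        Console.WriteLine(EasyMatch(package, "net45"));          // False
        Console.WriteLine(SmartMatch(ToBits(package), "net45")); // True
    }
}
```

The sketch also shows the maintenance cost mentioned above: a framework missing from the table is simply not matched until the table, and therefore search, is updated and redeployed.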
