Nugetgallery: Persist development dependency information in the DB

Created on 22 Mar 2018  路  14Comments  路  Source: NuGet/NuGetGallery

This is useful for the <PackageReference> tab on the package details since the code snippet is has some additional properties when the package is a development dependency.

See: https://github.com/NuGet/NuGetGallery/pull/5656

To roll this out we would need to turn it on for existing packages then perform some kind of data fix for the existing development dependencies.

Note that this data is in the catalog leaves.

Gallery UI Priority - 3 Feature Verified-Dev Verified-Int Verified-Prod

Most helpful comment

The backfill on PROD is complete.

All 14 comments

I've picked this up again after #5656 was merged. I think I should be able to get the metadata into the DB for new packages, but I might need some assistance or at least some pointers on how to "fill in" the metadata for existing packages. I don't think both issues necessarily have to be solved at the same time, though.

The work is at https://github.com/NuGet/NuGetGallery/compare/dev...khellang:developement-dependency-metadata

I don't think both issues necessarily have to be solved at the same time, though.

Agreed. We are already in the inconsistent state so we are free to separate the "new package" case and "existing package" case.

The "reflow" behavior should be able to fill this in if you put the metadata population in the right place (haven't looked at your branch).

We should consider doing just a database fix-up though since reflow also produces a catalog entry which is unnecessary in this case.

Consider an approach like this, which was used for repository URL info.
https://github.com/NuGet/NuGetGallery/blob/master/src/GalleryTools/Commands/BackfillRepositoryMetadataCommand.cs

Yes, your change will make reflow fill in the property. See in the ReflowPackageService.
https://github.com/NuGet/NuGetGallery/blob/0ca0368afbfc1bbf7133e2d72389fc168d35f6ce/src/NuGetGallery/Services/ReflowPackageService.cs#L71-L76

You will need a migration and ensure the assoicated SQL generates a default value on the column of false.

Oh, and I think there is a bit more metadata needed for the .csproj. This is produced by VS:

    <PackageReference Include="Nerdbank.GitVersioning" Version="3.0.6-beta">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
    </PackageReference>

Thanks for the pointers @joelverhagen!

I think this would solve https://github.com/NuGet/NuGetGallery/issues/2836 as well.

a default value on the column of false

Ah, had it as nullable bool and just coalesced it to false, that way you could see which entries actually have developmentDependency set and which entries are missing. Do you think it's OK to just default to false?

The "reflow" behavior should be able to fill this in if you put the metadata population in the right place

This reflow-flow, is that triggered manually? Per package? 馃

Do you think it's OK to just default to false?

Yes, after we have completed the data fix, it makes no sense to have a null value for the bool. In the generic mental model for a NuGet package, it is either a dev dependency or it is not. No null state exists. Representing this state doesn't give us anything extra except for perhaps a marker that the package needs to be reflowed... but this can be determined in a different method such as comparing the bool determined by the .nuspec with the bool in the DB.

For an example of what the migration should roughly look like:
https://github.com/NuGet/NuGetGallery/blob/0ca0368afbfc1bbf7133e2d72389fc168d35f6ce/src/NuGetGallery/Migrations/201709111714021_AddPackageStatusKey.cs#L10-L12

Of course we want a bool instead of an int but you get the idea.

This reflow-flow, is that triggered manually? Per package? 馃

I think the tool should be written to have a "discover" phase that finds all dev dependency packages (perhaps via the catalog) and a "update" phase which sets the true bool on the entity. The "update" phase is very easy. Just EF update. You should consider whatever batching approach used in the tool I linked above.

The "discover" phase is a bit more interesting. The example I gave you above for the repository URL backfill checks all of the .nuspec to determine the Truth. You could go this route or you could consider a forward scan of the catalog. You can look for the developmentDependency bool, e.g.
https://api.nuget.org/v3/catalog0/data/2019.02.04.00.03.43/nerdbank.gitversioning.3.0.6-beta.json

Personally I would just copy and adapt the BackfillRepositoryMetadataCommand which has these two phases and gets the developmentDependency value by downloading and reading the .nuspec for all packages, via the .nuspec endpoint on the package content protocol.

Yes, after we have completed the data fix, it makes no sense to have a null value for the bool.

Agreed. I was just thinking if it makes sense to do it in a three-step process: add nullable bool column, backfill dev dependency data, then make the column non-nullable. But if we don't need the nullable-ness for not-yet-backfilled entries, it makes sense to just have false as a default value 馃憤

Should DevelopmentDependency be added to/used in the following types?

  • PackageHistory - I'm thinking yes, but it should probably be nullable, as I don't know how to back-fill those values.
  • AuditedPackage - This is just a matter of copying over the value from Package. Probably wouldn't be used anywhere (yet), but it's nice for consistency.
  • V1FeedPackage/V2FeedPackage - Is there any point?
  • PackageIndexEntity (and stored/retrieved in LuceneIndexingService/LuceneSearchService/ExternalSearchService) - It's possible to round-trip the value in Lucene, but I'm not sure how to handle missing values in the index. I'm not really sure how the ExternalSearchService works and where the data comes from.

PackageHistory - I'm thinking yes, but it should probably be nullable, as I don't know how to back-fill those values.

I would leave that one alone. It was used for package edit and has not been cleaned up yet.

AuditedPackage - This is just a matter of copying over the value from Package. Probably wouldn't be used anywhere (yet), but it's nice for consistency.

Yes, I would add it here. Some auditing implementations just JSON serialize the package so it will automatically pick up more properties.

V1FeedPackage/V2FeedPackage - Is there any point?

No, we are trying to leave the V1 and V2 protocols as-is from now on.

PackageIndexEntity (and stored/retrieved in LuceneIndexingService/LuceneSearchService/ExternalSearchService) - It's possible to round-trip the value in Lucene, but I'm not sure how to handle missing values in the index. I'm not really sure how the ExternalSearchService works and where the data comes from.

I wouldn't worry about this one. The search results do not use this value.

This is deployed to PROD but I haven't run the backfill tool yet. This means new packages have show the proper thing in the PackageReference tab but old package don't yet.
Congratulations to DotNet.Build.EazfuscatorNet 2.1.0-preview.3.0.19274.7 for being the first to get this feature 馃.

I'll close this issue when I've completed running the backfill tool.

Awesome! Let's hope the backfill command works as it should 馃槄

I guess the logical next question is; should we show this metadata anywhere else? It it interesting information?

The tool worked great on our DEV and INT environments 馃憤.

Perhaps we could plumb this to the VS UI. This would mean enhancing the package metadata endpoint and surfacing it in the client.
https://docs.microsoft.com/en-us/nuget/api/registration-base-url-resource

Perhaps we should come up with some concrete ideas and make new issues.

The backfill on PROD is complete.

W00t! 馃帀 Thanks for walking me through this 馃憣

Was this page helpful?
0 / 5 - 0 ratings