A user just reported a bug to Paket: https://github.com/fsprojects/Paket/issues/3035
We got a clear analysis and Fiddler captures indicating that this is possibly a bug in the V2 server API.
You can easily reproduce it by opening Postman, sending the request https://www.nuget.org/api/v2/FindPackagesById()?id='NLog'&$skip=100 a few times, and saving each response.
After comparing the responses you can see that some of them contain the entry for the NLog version 4.4.12 and some do not.
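The comparison can also be scripted instead of done by eye. The following is a minimal sketch, assuming the saved responses are OData v2 Atom feeds in which every `entry` carries a `d:Version` property. The function names `versions_in_feed`, `diff_versions` and `compare_files` are hypothetical helpers, not part of Paket or any NuGet client.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OData v2 Atom feeds (assumed to match the saved responses).
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}


def versions_in_feed(xml_text):
    """Return the set of package versions listed in one feed page."""
    root = ET.fromstring(xml_text)
    versions = set()
    for entry in root.findall("atom:entry", NS):
        # Search the whole entry so the exact nesting of d:Version does not matter.
        version = entry.find(".//d:Version", NS)
        if version is not None and version.text:
            versions.add(version.text.strip())
    return versions


def diff_versions(xml_a, xml_b):
    """Return (only_in_a, only_in_b) as sorted lists of version strings."""
    a = versions_in_feed(xml_a)
    b = versions_in_feed(xml_b)
    return sorted(a - b), sorted(b - a)


def compare_files(path_a, path_b):
    """Convenience wrapper for two responses saved to disk."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return diff_versions(fa.read(), fb.read())
```

Calling `compare_files` on two saved responses for the same URL should return two empty lists if the endpoint is deterministic. Any version that shows up in only one of the lists is a version that one response contained and the other did not.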
Do we need to provide some context to NuGet API for pagination?
This seems to be quite a serious issue no matter the answer, because
a) Even without context I'd expect a deterministic answer / an error message
b) If no context is needed I hope you can see how this makes it basically impossible to write a working client.
Note: The above request URL is basically the next link from the following request: https://www.nuget.org//api/v2/FindPackagesById()?semVerLevel=2.0.0&id='NLog'. This request is not "stable" either.
Maybe there is some kind of ongoing server upgrade and the response differs depending on which server answers the request? If so, is there any way to make a client work in this kind of scenario?
You can find the requests and the responses in the paket issue. I'll attach example responses and headers here as well.
myrequest_next_no_4_4_12.txt
myrequest_next_no_4_4_12.xml.txt
myrequest_next_with_4_4_12.txt
myrequest_next_with_4_4_12.xml.txt
I've dug into this a bit and it seems related to a Lucene index rebuild that was executed last week. We have deployed the new index to one of our search services... but not the other. I looked at the gallery code and it looks like FindPackagesById() internally asks that versions be ordered by "relevance". This doesn't seem appropriate for FindPackagesById() considering how this endpoint is used (restore operations instead of discovery operations).
I have to dig in some more, but my guess is that the new Lucene index (which has some tweaked tokens, ToLowerInvariant instead of ToLower) has a different "relevance" ordering when all of the packages have the same ID. This difference should be resolved tomorrow as part of completing the rollout of the new index.
Thanks for reporting this. I'll keep investigating and let you know when I've discovered more.
@matthid, from my end, it looks like the issue is mitigated. I rolled back the affected search service.
Please let me know if you run into any other issues.
@joelverhagen thanks for the quick fix. My question is: how can we make sure this does not happen again? It's a really serious issue for us.
how can we make sure this does not happen again
Better testing, I think. Specifically, we need some tests that verify package versions with the same ID have an order that is well defined. Today, the order is based off of download count then Lucene document ID (which changes periodically due to unlist/relist or ops actions). This is not great for an endpoint that requires paging, as you can see!
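To illustrate why the tie-breaker matters for paging, here is a minimal sketch. It assumes two hypothetical replicas that hold the same versions with equal download counts but different document IDs, and a client whose consecutive page requests land on different replicas. The record fields (`version`, `downloads`, `doc_id`) and all function names are made up for the example and do not reflect the gallery code.

```python
def page(packages, sort_key, skip, top):
    """Return one page of version strings from a replica, like $skip/$top."""
    ordered = sorted(packages, key=sort_key)
    return [p["version"] for p in ordered[skip:skip + top]]


def fetch_all(replicas, sort_key, page_size):
    """Fetch consecutive pages, sending each request to the next replica in turn."""
    seen = []
    total = len(replicas[0])
    skip = 0
    request_number = 0
    while skip < total:
        replica = replicas[request_number % len(replicas)]
        seen.extend(page(replica, sort_key, skip, page_size))
        skip += page_size
        request_number += 1
    return seen


def by_doc_id(p):
    # Order described in the thread: download count, then index document ID.
    return (-p["downloads"], p["doc_id"])


def by_version(p):
    # Hypothetical alternative: break ties on the version itself,
    # which is the same on every replica.
    return (-p["downloads"], tuple(int(x) for x in p["version"].split(".")))


VERSIONS = ["4.4.10", "4.4.11", "4.4.12", "4.4.13"]
# Same data on both replicas, but the document IDs differ after a rebuild.
replica_a = [{"version": v, "downloads": 5, "doc_id": i}
             for i, v in enumerate(VERSIONS, start=1)]
replica_b = [{"version": v, "downloads": 5, "doc_id": len(VERSIONS) - i}
             for i, v in enumerate(VERSIONS)]

unstable = fetch_all([replica_a, replica_b], by_doc_id, page_size=2)
stable = fetch_all([replica_a, replica_b], by_version, page_size=2)
```

With the document ID as tie-breaker, `unstable` repeats some versions and never returns 4.4.12, which mirrors the symptom reported above. With a tie-breaker that is identical on every replica, `stable` contains each version exactly once.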
Also, I think this has been a problem whenever we rebuild the Lucene index (which we have done countless times over the years), but it has gone unobserved since we typically deploy the new index to our search services around the same time (... with no weekend break in between...).
I've filed https://github.com/NuGet/NuGetGallery/issues/5432 to fix the root cause. Please feel free to re-open this issue if you encounter this issue in production again.
There is another related issue: we asked for "semverlevel 2" but the next
link doesn't include that parameter anymore. In other situations this may
lead to issues as well.
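A client can detect this kind of dropped parameter with a small check. The sketch below uses only the Python standard library. The helper name `dropped_params` and the list of ignored paging parameters are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit


def dropped_params(request_url, next_url, ignore=("$skip", "$skiptoken", "$top")):
    """Return query parameters of request_url that are missing from next_url.

    Paging parameters in `ignore` are expected to change and are skipped.
    """
    requested = parse_qs(urlsplit(request_url).query)
    following = parse_qs(urlsplit(next_url).query)
    return sorted(
        name for name in requested
        if name not in following and name not in ignore
    )


# URLs taken from the report above.
first = "https://www.nuget.org//api/v2/FindPackagesById()?semVerLevel=2.0.0&id='NLog'"
next_link = "https://www.nuget.org/api/v2/FindPackagesById()?id='NLog'&$skip=100"
missing = dropped_params(first, next_link)
```

Here `missing` lists `semVerLevel`, so a client could re-append that parameter before following the next link, or at least log a warning.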
On 11.02.2018 at 21:22, "Joel Verhagen" notifications@github.com wrote:
Closed #5431 https://github.com/NuGet/NuGetGallery/issues/5431.
@forki, makes sense. Could you file a separate issue with example requests and responses? It helps our planning if there is a granular item per problem (as this will probably be a fix in a different part of our service than the original issue above). Thanks and nice catch! 👍
Thanks for taking care of this so quickly
Ok I submitted https://github.com/NuGet/NuGetGallery/issues/5433 because this might lead to some subtle issues like we already experienced here.