NuGetGallery: NuGet.org statistics have been delayed by more than 24 hours, many times, as of late

Created on 29 Nov 2017 · 10 Comments · Source: NuGet/NuGetGallery

Tracking issue to understand what we should be doing here: Monitoring as well as reducing the lag.

Statistics

Most helpful comment

You asked on Twitter what I do with stats, and how often I check them:

  • I check multiple times per day until I've seen them updated for that day, then stop until the following morning

    • I shout at you guys on Twitter when they don't update, as you've noticed. :)

  • I record daily stats in an Excel spreadsheet, because I find the nuget page rather sparse

What I would like to see is more akin to how you would track a sports team or distributed computing project:

  • 7-day moving average
  • 30-day moving average
  • Predicted download totals for the next week and month, based on current growth rates. Maybe a longer-range metric like 90 days from now
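To make those metrics concrete, here is a minimal sketch of a trailing moving average and a naive linear projection. The function names `trailing_average` and `project_total` are hypothetical, and the projection simply assumes the recent daily rate stays constant, which is only a rough stand-in for "current growth rates". The sample numbers are the last seven daily changes from the spreadsheet later in this comment.

```python
from typing import List, Sequence


def trailing_average(changes: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early entries average over the days available so far."""
    averages = []
    for i in range(len(changes)):
        start = max(0, i - window + 1)
        chunk = changes[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def project_total(current_total: int, daily_rate: float, days_ahead: int) -> int:
    """Naive linear projection: assumes the recent daily rate holds unchanged."""
    return round(current_total + daily_rate * days_ahead)


# Daily changes for 24-Nov-2017 through 30-Nov-2017, copied from the spreadsheet.
changes = [676, 611, 366, 361, 712, 0, 1615]
rate_7d = trailing_average(changes, window=7)[-1]

print(round(rate_7d))                      # 620, matching the spreadsheet's last row
print(project_total(112_471, rate_7d, 7))  # 116812
```

The same `project_total` call with `days_ahead=30` or `days_ahead=90` gives the longer-range estimates, with the caveat that a straight-line projection gets less trustworthy the further out it goes.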

As the developer of a niche but important package that's steadily growing in popularity, I also care about understanding two categories of use:

  • User-initiated downloads, e.g. by clicking install in Code or Studio
  • Downloads initiated by a CI build or similar process

I know teasing these apart is imperfect, but having a rough picture informs how I think about the development of new features, documentation, and legacy support.

I also wouldn't mind a high-level breakdown of "user probably manually downloaded the package" actions vs "probably downloaded as part of a CI build". I realize this isn't perfect, but

Now that it's possible to get the download counts for packages, I considered rolling my own statistics aggregator the way people used to do with SETI@Home and Folding@Home, but it's been on the back burner for a while.

Here's my spreadsheet showing the poor reliability of late. Numbers are from ical.net, my nuget package. Zeroes are when stats didn't update.

| Date | Download count | Change | 7d moving average |
|------------------|--------|-----|-----|
| Sun, 29-Oct-2017 | 94,423 | 275 | 437 |
| Mon, 30-Oct-2017 | 94,510 | 87 | 439 |
| Tue, 31-Oct-2017 | 94,658 | 148 | 370 |
| Wed, 1-Nov-2017 | 94,658 | 0 | 297 |
| Thu, 2-Nov-2017 | 95,955 | 1,297 | 403 |
| Fri, 3-Nov-2017 | 97,014 | 1,059 | 466 |
| Sat, 4-Nov-2017 | 97,512 | 498 | 481 |
| Sun, 5-Nov-2017 | 97,636 | 124 | 459 |
| Mon, 6-Nov-2017 | 97,713 | 77 | 458 |
| Tue, 7-Nov-2017 | 98,499 | 786 | 549 |
| Wed, 8-Nov-2017 | 99,091 | 592 | 633 |
| Thu, 9-Nov-2017 | 99,573 | 482 | 517 |
| Fri, 10-Nov-2017 | 99,573 | 0 | 366 |
| Sat, 11-Nov-2017 | 100,253 | 680 | 392 |
| Sun, 12-Nov-2017 | 100,761 | 508 | 446 |
| Mon, 13-Nov-2017 | 101,006 | 245 | 470 |
| Tue, 14-Nov-2017 | 101,693 | 687 | 456 |
| Wed, 15-Nov-2017 | 101,693 | 0 | 372 |
| Thu, 16-Nov-2017 | 103,107 | 1,414 | 505 |
| Fri, 17-Nov-2017 | 103,107 | 0 | 505 |
| Sat, 18-Nov-2017 | 104,521 | 1,414 | 610 |
| Sun, 19-Nov-2017 | 105,108 | 587 | 621 |
| Mon, 20-Nov-2017 | 105,313 | 205 | 615 |
| Tue, 21-Nov-2017 | 106,122 | 809 | 633 |
| Wed, 22-Nov-2017 | 106,122 | 0 | 633 |
| Thu, 23-Nov-2017 | 108,130 | 2,008 | 718 |
| Fri, 24-Nov-2017 | 108,806 | 676 | 814 |
| Sat, 25-Nov-2017 | 109,417 | 611 | 699 |
| Sun, 26-Nov-2017 | 109,783 | 366 | 668 |
| Mon, 27-Nov-2017 | 110,144 | 361 | 690 |
| Tue, 28-Nov-2017 | 110,856 | 712 | 676 |
| Wed, 29-Nov-2017 | 110,856 | 0 | 676 |
| Thu, 30-Nov-2017 | 112,471 | 1,615 | 620 |
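The zero rows can also be picked out programmatically. The sketch below flags any day whose cumulative total did not move since the previous row; `stalled_days` is a hypothetical helper, and it assumes an unchanged total means the stats did not update (a day with genuinely zero downloads would be flagged too).

```python
from typing import List, Tuple


def stalled_days(totals: List[Tuple[str, int]]) -> List[str]:
    """Return the dates whose cumulative total equals the previous row's total."""
    stalled = []
    for (_, prev_total), (date, total) in zip(totals, totals[1:]):
        if total == prev_total:
            stalled.append(date)
    return stalled


# Rows copied from the table above (14-Nov-2017 through 18-Nov-2017).
totals = [
    ("14-Nov-2017", 101_693),
    ("15-Nov-2017", 101_693),
    ("16-Nov-2017", 103_107),
    ("17-Nov-2017", 103_107),
    ("18-Nov-2017", 104_521),
]

print(stalled_days(totals))  # ['15-Nov-2017', '17-Nov-2017']
```

The jump right after a stalled day (for example 1,414 on 16-Nov-2017) presumably covers two days of downloads, so per-day figures around a stall are best read as a pair.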

All 10 comments

@rianjs thanks for all the details. This helps a lot

The https://nuget.org/stats page was behind by more than 24 hours due to some SQL timeouts. The team has discussed some small changes we can make soon to help stabilize the jobs that update the reports for this page, including monitoring.

For per-package stats per @rianjs, this implies a delay earlier in the pipeline. If the delay is in data collection, I'm not sure if we can improve this. If the delay is in our initial aggregation (2/day), then we could potentially increase this frequency... which we are discussing.

Why can't we queue another aggregation job as soon as the old job finishes?
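As a rough illustration of the back-to-back scheduling this question suggests, a runner could start each aggregation as soon as the previous one returns instead of waiting for a fixed time of day. This is a generic sketch, not the actual NuGet statistics pipeline; `run_back_to_back`, `job`, and `min_gap_seconds` are hypothetical names.

```python
import time
from typing import Callable, List


def run_back_to_back(
    job: Callable[[], None],
    runs: int,
    min_gap_seconds: float = 0.0,
) -> List[float]:
    """Run `job` `runs` times in sequence and return each run's duration in seconds.

    Each run starts as soon as the previous one finishes, after an optional
    pause that keeps a fast job from hammering the database.
    """
    durations = []
    for _ in range(runs):
        started = time.monotonic()
        job()  # blocks until this aggregation pass is done
        durations.append(time.monotonic() - started)
        if min_gap_seconds > 0:
            time.sleep(min_gap_seconds)
    return durations


# Stand-in job that just records that it ran.
completed = []
run_back_to_back(lambda: completed.append("aggregation"), runs=3)
print(len(completed))  # 3
```

With this shape the update frequency is bounded by how long one aggregation takes rather than by the schedule, which only helps if the delay is in aggregation and not in data collection.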

It's been about 36 hours since the last update again.

And we're at the 36 hour mark again.

Up around 60 hours since the last update. Could be as much as 72.

We are looking into this right now.

Hey @rianjs, a couple of days ago we deployed a fix that should improve the overall stability of the statistics pipeline as well as increase the number of times a day we update stats. We should now be updating stats two or three times a day instead of just once. I hope it helps you out. Let us know if you have any other issues.

I've noticed the twice a day updates for the last day or so. Thanks!
