Arcade: [Signing] Collect data for reliability telemetry

Created on 15 Jul 2020 · 7 Comments · Source: dotnet/arcade

  • [ ] This issue is blocking
  • [ ] This issue is causing unreasonable pain

We want to collect data during and after signing so we can track how reliable signing is. This includes errors/exceptions, diagnosability logs/telemetry, and probably an audit trail (which files were actually signed, which were not, and why).

We will most likely be able to use the Timeline API to capture this data, and then build dashboards in Grafana to visualize it.


Things we want telemetry for (for both post-build and in-build signing, where applicable)

  • Number of files signed
  • Time for unpacking
  • Time for signing
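
As a rough illustration of the three measurements listed above, here is a minimal Python sketch. It is not Arcade's signing code: `SigningMetrics`, `run_signing`, and the `unpack`/`sign` callables are hypothetical names made up for this example, and where the numbers end up (Timeline API, App Insights, Kusto) is left open.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SigningMetrics:
    """Hypothetical container for the measurements in the list above."""
    files_signed: int = 0
    durations: dict = field(default_factory=dict)  # phase name -> seconds

    @contextmanager
    def timed(self, phase):
        """Record the wall-clock duration of a phase, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[phase] = self.durations.get(phase, 0.0) + elapsed


def run_signing(files, unpack, sign):
    """Run the (hypothetical) unpack and sign steps and return the metrics."""
    metrics = SigningMetrics()
    with metrics.timed("unpack"):
        unpacked = [unpack(f) for f in files]
    with metrics.timed("sign"):
        for item in unpacked:
            sign(item)
            metrics.files_signed += 1
    return metrics
```

Because the timer stops in a `finally` block, a failed run still reports how long each phase took before the failure, which is the case reliability telemetry tends to care about most.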

All 7 comments

Tracking which files were signed and which weren't will be challenging; I don't know how to do it yet. Maybe by logging warnings for the ones that weren't?

If we can track this info with the console output, I think that will probably work, since it's a console program that will be calling the action that does the signing.

We might need to make some changes to Arcade's sign task so that it logs in the format we need, but I think this should be possible, assuming we can call the API through the logs.
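
One way the "warnings through console output" idea could look is sketched below in Python. It relies on the Azure Pipelines `##vso[task.logissue type=warning]` logging command, which turns a console line into a warning on the running task. Everything else is an assumption for illustration: the function names are hypothetical, the message wording is invented, and whether those warnings then surface the way we need through the Timeline API would still have to be verified.

```python
def format_unsigned_warning(path, reason):
    """Build an Azure Pipelines warning line for a file that was not signed.

    Hypothetical helper: the message wording is made up for this sketch.
    """
    message = f"File was not signed: {path} ({reason})"
    # A logging command ends at the newline, so collapse any line breaks.
    message = message.replace("\r", " ").replace("\n", " ")
    return f"##vso[task.logissue type=warning]{message}"


def report_unsigned(unsigned, write=print):
    """Write one warning per (path, reason) pair and return how many were written."""
    for path, reason in unsigned:
        write(format_unsigned_warning(path, reason))
    return len(unsigned)
```

Passing `write` as a parameter keeps the sketch testable outside a pipeline; in a real console tool it would simply be standard output.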

Looks like Chad found an AppInsights library we can use in our command line tools for collecting telemetry. He's already included this in darc with this PR: https://github.com/dotnet/arcade-services/pull/1300

This will probably be a better solution than using the Timeline API.

App Insights isn't super great if you _need_ 100% of the data. It can drop data fairly freely, because the assumption is that events happen often enough that, statistically, you'll still get what you need.

If you think this would work fine with, say, 50% of the data, then App Insights is fine. :-) If it's critical to get all of it, other options are better. There are a lot of options, but we need to know what questions we want to answer with the data to make sure we choose the right tech.

I can't tell what the data is for. Is it because we need to act on it (in which case, shouldn't we just fail the build)? Is it just to validate some vague feelings we have? App Insights is good for vague feelings. Is it something we need to run big, complicated queries on over long periods of time? Kusto is good for that.

Triage: I'll schedule a meeting to discuss this wrt Post-Build signing
