Arcade: [Signing] Collect data for reliability telemetry

Created on 15 Jul 2020 · 7 Comments · Source: dotnet/arcade

[ ] This issue is blocking
[ ] This issue is causing unreasonable pain

We want to be able to collect data during/after signing to track the reliability of the functionality. This includes the following data: errors/exceptions, diagnosability logs/telemetry, and probably audit data (which files were actually signed, which were not, and why).

We will most likely be able to use the Timeline API to capture this data, and then build dashboards in Grafana to visualize it.

Things we want telemetry for (for both post-build and in-build signing, where applicable)

Number of files signed
Time for unpacking
Time for signing
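A minimal sketch of how the three measurements listed above could be gathered in one place. The `SigningMetrics` class, the `run_signing` helper, and the `unpack`/`sign` callables are all hypothetical names used for illustration; none of them come from Arcade's sign task.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SigningMetrics:
    """Hypothetical container for the measurements listed in the issue."""
    files_signed: int = 0
    durations_s: dict = field(default_factory=dict)

    @contextmanager
    def timed(self, phase):
        """Accumulate wall-clock seconds spent in a named phase."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations_s[phase] = self.durations_s.get(phase, 0.0) + elapsed


def run_signing(files, unpack, sign, metrics):
    """Run caller-supplied unpack and sign steps while recording metrics."""
    with metrics.timed("unpack"):
        unpacked = unpack(files)
    with metrics.timed("sign"):
        for f in unpacked:
            sign(f)
            metrics.files_signed += 1
    return metrics
```

Because the timer records in a `finally` block, a phase that throws still reports its duration, which matters if the same data is meant to explain failures.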

Source

missymessa

Most helpful comment

App Insights isn't super great if you _need_ 100% of the data. It can drop data pretty lazily, because the assumption is that stuff happens often enough that statistically, you'll get what you need.

If you think this would work fine with, say, 50% of the data, then App Insights is fine. :-) If it's critical to get all of it, other options are better. There are a lot of options, but we need to know what questions we might want to answer with the data to make sure we choose the right tech.

I can't tell if this is because we need to do something about it (in which case, shouldn't we just fail the build?). Is it just to validate some vague feelings we have? AI (App Insights) is good for vague feelings. If it's something we need to run big, complicated queries on over long periods of time, Kusto is good.

ChadNedzlek on 20 Jul 2020

👍2
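To make the trade-off concrete, here is a toy simulation of what happens when only a fraction of events arrive. It assumes uniform random dropping with a known keep rate, which is a simplification and not a description of how App Insights actually samples; `simulate_sampling` is a hypothetical helper.

```python
import random


def simulate_sampling(file_names, keep_rate, seed=0):
    """Drop events at random and report what can still be recovered.

    Returns (estimated_total, missing), where estimated_total scales the
    received count back up by the keep rate, and missing is the set of
    files whose events never arrived.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    received = [name for name in file_names if rng.random() < keep_rate]
    estimated_total = len(received) / keep_rate
    missing = set(file_names) - set(received)
    return estimated_total, missing
```

In this model the aggregate count can be estimated reasonably well from a 50% sample, but a per-file audit of which files were signed cannot be reconstructed, because the dropped events are simply absent.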

All 7 comments

Custom Telemetry categories: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/Documentation/Projects/DevOps/CI/Telemetry-Guidance.md#arcade-msbuild-logger-support
AzDO error messages: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/Documentation/Projects/DevOps/CI/Telemetry-Guidance.md#writing-to-the-timeline-api

If you're working in PowerShell, we have some functions to write in the correct format already: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/eng/common/pipeline-logging-functions.ps1#L15

adiaaida on 15 Jul 2020

Tracking which files were signed and which weren't will be challenging; I don't know how to do it. Maybe use warnings for the ones that weren't?

adiaaida on 15 Jul 2020

👍1
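A minimal sketch of the warnings idea, using the Azure Pipelines `##vso[task.logissue type=warning]` logging command written to console output. The `format_unsigned_warning` and `report_unsigned` helpers and the shape of the `results` mapping are assumptions made for illustration, and only carriage returns and line feeds are escaped here; fuller escaping is left out.

```python
def format_unsigned_warning(path, reason):
    """Build one logging-command line for a file that was not signed."""
    message = f"File was not signed: {path} ({reason})"
    # Keep the command on a single line so the agent parses it as one issue.
    message = message.replace("\r", "%0D").replace("\n", "%0A")
    return f"##vso[task.logissue type=warning]{message}"


def report_unsigned(results, write=print):
    """Emit a warning per unsigned file and return how many there were.

    results maps each file path to None when it was signed, or to a short
    reason string when it was skipped (hypothetical data shape).
    """
    unsigned = 0
    for path, reason in results.items():
        if reason is not None:
            write(format_unsigned_warning(path, reason))
            unsigned += 1
    return unsigned
```

For example, `report_unsigned({"a.dll": None, "b.txt": "unsupported extension"})` would print a single warning line for `b.txt` and return 1.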

> Tracking which files were signed and which weren't will be challenging; I don't know how to do it. Maybe use warnings for the ones that weren't?

If we can track this info with the console output, I think that will probably work, since it's a console program that will be calling the action that does the signing.

missymessa on 15 Jul 2020

We might need to make some changes to Arcade's sign task so it logs the way we need, but I think this should be possible, assuming we can call the API through logs.

jcagme on 15 Jul 2020

👍1

Looks like Chad found an App Insights library we can use in our command-line tools for collecting telemetry. He's already included this in darc with this PR: https://github.com/dotnet/arcade-services/pull/1300

This will probably be a better solution than using the Timeline API.

missymessa on 20 Jul 2020

App Insights isn't super great if you _need_ 100% of the data. It can drop data pretty lazily, because the assumption is that stuff happens often enough that statistically, you'll get what you need.

ChadNedzlek on 20 Jul 2020

👍2

Triage: I'll schedule a meeting to discuss this with respect to Post-Build signing.