We want to collect data during and after signing so we can track how reliable signing is. This includes errors/exceptions, diagnosability logs/telemetry, and probably audit data (which files were actually signed, which were not, and why).
We will most likely be able to use the Timeline API to capture this data, and then build dashboards in Grafana to visualize it.
Things we want telemetry for (for both post-build and in-build signing, where applicable):
Custom Telemetry categories: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/Documentation/Projects/DevOps/CI/Telemetry-Guidance.md#arcade-msbuild-logger-support
AzDO error messages: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/Documentation/Projects/DevOps/CI/Telemetry-Guidance.md#writing-to-the-timeline-api
If you're working in PowerShell, we already have some functions that write in the correct format: https://github.com/dotnet/arcade/blob/30e2cca9e6f26a843bad0910879d89035645bd20/eng/common/pipeline-logging-functions.ps1#L15
Tracking which files were signed and which weren't will be challenging; I don't know how to do it yet. Maybe emit warnings for the ones that weren't?
If we can track this info with the console output, I think that will probably work, since it's a console program that will be calling the action that does the signing.
We might need to make some changes to Arcade's sign task so that it logs the way we need, but I think this should be possible, assuming we can call the API through logs.
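To make the "warnings for the ones that weren't signed" idea concrete, here is a minimal sketch of a console tool emitting Azure DevOps `##vso[task.logissue ...]` logging commands on stdout, which the agent picks up from the console output. The `SignResult` record, the function names, and the escaping table are assumptions made for illustration; they are not part of Arcade's sign task, and the escaping rules should be checked against the agent documentation before relying on them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Escapes applied to property values (assumed mapping, modelled on common
# logging-command helpers; verify against the Azure Pipelines agent docs).
_PROPERTY_ESCAPES = ((";", "%3B"), ("\r", "%0D"), ("\n", "%0A"), ("]", "%5D"))
# The message only needs to stay on a single line.
_MESSAGE_ESCAPES = (("\r", "%0D"), ("\n", "%0A"))


def _escape(value: str, table) -> str:
    for token, replacement in table:
        value = value.replace(token, replacement)
    return value


@dataclass
class SignResult:
    """Hypothetical per-file audit record produced by the signing step."""
    path: str
    signed: bool
    reason: Optional[str] = None  # why the file was skipped, if known


def format_log_issue(issue_type: str, message: str,
                     source_path: Optional[str] = None) -> str:
    """Build one task.logissue logging command line."""
    if issue_type not in ("warning", "error"):
        raise ValueError("issue_type must be 'warning' or 'error'")
    props = f"type={issue_type};"
    if source_path:
        props += f"sourcepath={_escape(source_path, _PROPERTY_ESCAPES)};"
    return f"##vso[task.logissue {props}]{_escape(message, _MESSAGE_ESCAPES)}"


def audit_lines(results: Iterable[SignResult]) -> List[str]:
    """Plain log line for signed files, a warning command for unsigned ones."""
    lines = []
    for result in results:
        if result.signed:
            lines.append(f"Signed: {result.path}")
        else:
            reason = result.reason or "no reason recorded"
            lines.append(format_log_issue(
                "warning", f"Not signed: {result.path} ({reason})", result.path))
    return lines


if __name__ == "__main__":
    demo = [
        SignResult("bin/Foo.dll", True),
        SignResult("bin/Bar.dll", False, "no certificate mapping"),
    ]
    for line in audit_lines(demo):
        print(line)
```

Because the warnings are ordinary stdout lines, this approach needs no direct calls to the Timeline API from the signing tool; whether warnings alone give a good enough audit trail is still an open question.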
Looks like Chad found an AppInsights library we can use in our command line tools for collecting telemetry. He's already included this in darc with this PR: https://github.com/dotnet/arcade-services/pull/1300
This will probably be a better solution than using the Timeline API.
App Insights isn't super great if you _need_ 100% of the data. It can drop data pretty lazily, because the assumption is that stuff happens often enough that statistically, you'll get what you need.
If you think this would work fine with, say, 50% of the data, then App Insights is fine. :-) If it's critical to get all of it, other options are better. There are a lot of options, but we need to know what questions we might want to answer with the data to make sure we choose the right tech.
I can't tell whether we want this data because we need to act on it (in which case, shouldn't we just fail the build?). Is it just to validate some vague feelings we have? App Insights (AI) is good for vague feelings. If it's something we need to run big, complicated queries over for long periods of time, Kusto is good.
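To put rough numbers on the "50% of the data" point, here is a small back-of-the-envelope sketch. It assumes each telemetry record is kept independently with a fixed probability; that is a simplification chosen for illustration, not a description of how App Insights actually drops data. The function names and example figures are hypothetical.

```python
import math


def _check_fraction(keep_fraction: float) -> None:
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")


def expected_missing_records(total_files: int, keep_fraction: float) -> float:
    """Expected number of per-file audit records that never arrive."""
    _check_fraction(keep_fraction)
    return total_files * (1.0 - keep_fraction)


def chance_all_records_survive(total_files: int, keep_fraction: float) -> float:
    """Probability that a complete per-file audit trail arrives intact."""
    _check_fraction(keep_fraction)
    return keep_fraction ** total_files


def failure_rate_std_error(true_rate: float, total_events: int,
                           keep_fraction: float) -> float:
    """Standard error of an estimated failure rate from the kept events."""
    _check_fraction(keep_fraction)
    kept = total_events * keep_fraction
    return math.sqrt(true_rate * (1.0 - true_rate) / kept)


if __name__ == "__main__":
    # Audit use case: every record matters, so losses are a real gap.
    print(expected_missing_records(2000, 0.5))
    print(chance_all_records_survive(10, 0.5))
    # Trend use case: the aggregate rate is still estimated fairly tightly.
    print(failure_rate_std_error(0.02, 10_000, 0.5))
```

Under these assumptions an aggregate question such as "what fraction of signing operations fail?" tolerates dropped data well, while a per-file audit does not, which is why the choice of tech depends on the questions we want to answer.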
Triage: I'll schedule a meeting to discuss this with respect to post-build signing.