The latest version of Arcade enables by default a custom MSBuild logger on Windows builds only. In the aspnet/Extensions repo, this impacts build perf impact on Windows builds significantly.
For example, see this PR: https://github.com/aspnet/Extensions/pull/1709. Here are the observed timings:
Build job | Avg Build Duration (n=4)
--|--
Windows (Arcade logger enabled) | 60+ minutes (build timed out)
Windows (Arcade logger disabled) | 11 min
OSX (Arcade logger disabled) * | 8 min
Linux (Arcade logger disabled) * | 11 min
* This logger is only enabled in Windows so far, so haven't observed build perf changes in *nix yet
cc @jaredpar @ChadNedzlek
@natemcmaster do you have an ETL trace that I could take a look at?
I don’t have a trace, but this repros consistently on CI on the aspnet/Extensions repo. I worked around it by breaking the “don’t-touch-this” rule of eng/common/ and setting $pipelinesLog = $false in tools.ps1
cc/ @rainersigwald
Yeah, I suspect switching to INodeLogger is in order. I'll do that, just a sec.
The event forwarding isn't the problem here. Lots of discussion in https://github.com/dotnet/arcade/pull/2811.
How often would ProjectFinished get called for these builds? The telemetry part is making one http call to app insights for every ProjectFinished event. If that is called many times per project for the various targets that project references call that might be the problem.
That was a great insight, @alexperovich! I added a couple of commits to https://github.com/aspnet/Extensions/pull/1719 to disable the telemetry--on Windows I unset one of the is-this-a-build-machine properties, and on *nix I used configured the forwarding logger to NOT send ProjectFinished events. Both builds completed quickly.
You will get a ProjectFinished for the multiplication of:
I hacked a local copy of the logger to write a line every time ReportToAnalytics was called in the Extensions build: CallsToReportToAnalytics.txt. There are about 13,000.
I think we should fix this in two phases:
Does that sound reasonable @ChadNedzlek?
Why are we sending telemetry on every build event? That is definitely something we should not be doing for this exact reason.
Networking during build, or PR, should always be treated with immense skepticism. It will fail the question is just a matter of how often. Rather than sending telemetry during our build we should be logging it somewhere, to disk likely, and then have a follow up stage in our pipelines that uploads it to wherever.
Regardless of the telemetry implementation, it is slightly tangential to my understanding of the initial intent of the Pipeline logger. As far as I understand things, now is a good time to examine how we're handling telemetry. I'd prefer that we step back, disable the telemetry element from the pipeline logger and agree on a reasonable cohesive approach to telemetry in our builds. The initial implementation of the pipeline logger would still provide value as it surfaces contextual error information to developers / AzDo summary page / GitHub checks.
@markwilkie @ChadNedzlek @Chrisboh thoughts?
I agree with @jaredpar and @chcosta: back it out, redesign telemetry.
One easy possibility would be to split the telemetry (as currently written) into its own logger, then replay the .binlog produced during the real build through it in a separate step. Or, even better, as a different phase in a release that's independently retryable and doesn't fail the build itself. That would also allow getting telemetry on the first-phase restore.
My hope is that we can take advantage of the fact that @rainersigwald is local and hash a strawman plan for telemetry out in 30 minutes sometime today. The need for the data is urgent....
Most helpful comment
Why are we sending telemetry on every build event? That is definitely something we should not be doing for this exact reason.
Networking during build, or PR, should always be treated with immense skepticism. It will fail the question is just a matter of how often. Rather than sending telemetry during our build we should be logging it somewhere, to disk likely, and then have a follow up stage in our pipelines that uploads it to wherever.