Runtime: DiagnosticSource for all: Consistency & Accessibility

Created on 8 Apr 2017  路  16Comments  路  Source: dotnet/runtime

Since this involves many repos and projects, I'm filing it here as directed:

I've been working to integrate MiniProfiler with ASP.NET Core, Entity Framework Core, MVC Core, etc. and I've been pointed many times at DiagnosticSource. So while implementing EntityFramework I found several inconsistencies in how DiagnosticListener is used in one repo. Issues for that (so far) are:

But when wrapping up Entity Framework and moving to port MVC from view wrappers (needed in MVC <= 5 days) to DiagnosticListener, I see even more inconsistencies. Here's a summary of the problems I, as a consumer, see with the API as it's applied currently:

  1. The arguments sent to the listener (often via .SubscribeWithAdapter()) is usually an anoymous object. But to consume this object, your parameter types and names on unrelated methods need to match exactly.

    • How are users to even find these arguments, without digging through source?
    • What ensures these aren't breaking? This approach seems very brittle overall.
    • In Entity Framework strong types exist, but they're all internal. Note that the unit tests use them.
    • Note that Entity Framework is working around easy consumption limitations in .NET Core reflection (https://github.com/dotnet/corefx/issues/4672)
    • Also of concern here: you can't use adapters where you can't use IL-emit for them, since that's how they're attaching at runtime.
  2. Identifiers are sporadically available for correlation. For example https://github.com/aspnet/EntityFramework/issues/8007 addresses this issue with data readers in Entity Framework. Without an ID of some sort (EF uses a Guid on connections and commands), the linear nature of DiagnosticListener logging is without a way to correlate without adding far more overhead to the entire chain. For example we're talking about AsyncLocal<T> as a workaround (which adds context switching overhead all the way down the pipe).
    What are we talking about here for 2 events divided that need correlation? I posit that allocating a Guid is almost always the cheapest possible option. Perhaps a correlation method should be built into the diagnostics framework itself.

  3. Why aren't the names exposed? As examples:

    • In ASP.NET MVC, the names are inline (not accessible properties).
    • In Entity Framework Core the properties are cleanly declared, but internal and therefore useless. Note that they're exposed for sanity in the tests, though.

These things together amount to a loose API that's not really friendly for consumption or testing. In many of the above cases, adjustments were made for unit tests to be sane. This consumption is the same set of things any consumer would like to have. I propose we make these changes across the board to MS libraries and enhance usability in 2.0:

  1. Message values get a strong type
  2. These strong types are public (so we can cast and compile against them)
  3. The names of messages issued are public (as constants, so they may be used in [DiagnosticName])

The loose coupling (not even taking a reference on Entity Framework, for example) could still work where it did before with the adapters that exist today. But for those who want compile time checking (often you're taking a listener dependency chain to listen for that specific thing anyway), you'd have everything needed.

This also allows several possibilities with the System.Diagnostics adapters overloads for cleaner code. For example, today I have this:
```c#
[DiagnosticName("Microsoft.EntityFrameworkCore.BeforeExecuteCommand")]
public void OnBeforeExecuteCommand(DbCommand command, string executeMethod, Guid instanceId, bool async)
{
// Available: Guid connectionId, DbCommand command, string executeMethod, Guid instanceId, long startTimestamp, bool async
var timing = command.GetTiming(executeMethod + (async ? " (Async)" : null), MiniProfiler.Current);
if (timing != null)
{
_commands[instanceId] = timing;
}
}

Instead, we could have this (while existing adapters continue to work):
```c#
[DiagnosticName(RelationalDiagnostics.BeforeExecuteCommand)]
public void OnBeforeExecuteCommand(BeforeExecuteCommandMessage message)
{
    var timing = command.GetTiming(message.ExecuteMethod + (message.Async ? " (Async)" : null), MiniProfiler.Current);
    if (timing != null)
    {
        _commands[instanceId] = timing;
    }
}

In that example I upper-cased the names for consistently in C# and framework patterns, but obviously that's an optional v2 break or not to be had. Note that the adapter could convert them camelcase matching for the proxy method if it struck out on an exact match for compatability. Also note that break would happen at compile time, and that it doesn't rely on conventions, the DLR, or runtime reflection to work.

Relevant doc link: Design Notes on Correlation and Timings in DiagnosticSource events

cc @karelz @pakrym @avanderhoorn @davidfowl @terrajobst

Design Discussion area-System.Diagnostics.Tracing question

Most helpful comment

I really want to see this improved, I do. But it doesn't really seem like anyone cares about a coherent story here from the Microsoft side, so I'm closing this out. I don't have the time or energy to run around making PRs that won't be accepted because they'll all need to be breaking.

In .NET Core 2.0: There still isn't any documentation as far as I can tell. I had to go digging through the source to figure out how to consume Entity Framework Core 2.0 events. The identifiers still aren't constants, so you can't implement a simple IObserver<KeyValuePair<string, object>> with a switch. It's a lot of nasty if statements. It's a shame because C# 7 patterns would have made this excellent.

Of course they'd only do that if the payload was strongly typed. While Entity Framework made great progress on this, ASP.NET MVC didn't change a thing. It's still using anonymous types for the payload (example: BeforeView).

Okay fine, so we'll use the DiagnosticAdapter and do a .SubscribeWithAdapater(), so at least we can get past the drastic inconsistencies between Microsoft libraries (I say Microsoft libraries because if we can't even get a consistent story from the team building it, then the ecosystem is hopeless). But no, we can't do that either, because between 2.0.0-preview2-final and the 2.0.0 release, netstandard was removed from the Microsoft.Extensions.DiagnosticAdapter package.

So let's sum it up. To implement a diagnostic listener in .NET Standard 2.0 a developer has to:

  1. Browse the source to see how every person uses it
  2. Put together some ugly code to do it (no constants anywhere - so you either if everywhere or manually lookup and copy even ID stings. AFAIK, there is zero documentation. This is the biggest indicator that no one cares about this API surface.
  3. Do...what with the adapter? @davidfowl suggested I just copy the source for the adapter, and that's likely the easiest route. Is that aun unblocker? Yeah sure. But it's also insane you'd have to do that.

Let me enumerate why I don't honestly think Microsoft as a whole seems to care about this API:

  1. There is no documentation on events from any of the packages (if there is, I can't find it)
  2. Payloads are wildly inconsistent: naming, anonymous vs concrete types.
  3. Event names aren't available as constants
  4. There isn't a single example of real-world usage I can find. Only very trivial examples, and the test cases are just making sure things work, they're polar opposite of usable code (they're iterating over all cases). ASP.NET MVC tests are the closest thing here, but they also rely on DiagnosticAdapter.
  5. Things like ApplicationInsights aren't using it consistently or at all, depending on the area in question.

This API is extremely frustrating, horribly inconsistent, and arguably more broken than before I started. I hope someone actually makes this a first class citizen at some point in the future, but it'll take at least a major version break to do that. And every consumer will need to rewrite their code to use it. Again.

Every time I try to approach this thing it only ends in hopeless frustration. I'm done. Closing this out.

All 16 comments

+ @vancem @brianrob

I'll try to answer some of these:

How are users to even find these arguments, without digging through source?

We should do better do document them, they haven't been so far.

What ensures these aren't breaking? This approach seems very brittle overall.

That's part of the design. We can't break these events because all of the tools that consume them would break. Unit tests ensure that they don't break. It's up to the teams writing these events to understand that it is a contract, we can make that more clear if it isn't already.

How do we ensure a correlation method is consistently available?

Activities solve this but it's not RTM yet, it's in a newer version of DiagnosticsSource and not everything is using it yet (which is an async local). Instead, you can use your own async local to track begin/end events today.

Also of concern here: you can't use adapters where you can't use IL-emit for them, since that's how they're attaching at runtime.

You don't need to use ref emit to consume but because of the loose coupling you would need to fallback to a more manual programming model of KeyValuePair<string, object>.

There is supposed to be a loose coupling between the producer and consumer. ApplicationInsights , Glimpse, MiniProfiler and any other tool that wants to consume diagnostic information should not have to compile against MVC, EF or whatever framework is publishing events. There is a contract but you don't need a compile time dependency on a dll to consume it. That was one of the pillars of diagnostic source when @avanderhoorn and @rynowak designed it in the first place (it was driven by Glimpse). That might answer some of the questions about why types an strings aren't exposed, because even if they were, you were never suppose to use them.

What we could do is publish a bunch of types as source that consumers can use to reason about the events and their arguments. So far we've been pretty inconsistent about that.

/cc @rynowak @avanderhoorn @pakrym @lmolkova

There is supposed to be a loose coupling between the producer and consumer. ApplicationInsights , Glimpse, MiniProfiler and any other tool that wants to consume diagnostic information should not have to compile against MVC, EF or whatever framework is publishing events. There is a contract but you don't need a compile time dependency on a dll to consume i

This is really a feature of https://github.com/aspnet/EventNotification/tree/dev/src/Microsoft.Extensions.DiagnosticAdapter

There are also some limitations around APIs that just aren't that great for use of the adapter. In general collections of polymorphic data don't work well. In general when you're in one of these lame scenarios you need to fall back to dynamic or to manually create proxies.

You could also skip the adapter entirely and use IObservable + dynamic

For MVC we have types and tests that use the adapter. It was ideally part of our roadmap to release these as a package if there was enough usage of the adapter + diagnostic source - but we didn't see any uptake for 1.0.0 and haven't revisited that decision.

As far as breaking changes and contracts go, this is a loosely coupled contract, and it's best-effort. It's on to us to document things and not break them, but that's never 100% when you're talking about major versions. Part of the design of this approach is that you should be able to degrade gracefully if a property you're looking for is missing (for example).

@vancem ping?

I really want to see this improved, I do. But it doesn't really seem like anyone cares about a coherent story here from the Microsoft side, so I'm closing this out. I don't have the time or energy to run around making PRs that won't be accepted because they'll all need to be breaking.

In .NET Core 2.0: There still isn't any documentation as far as I can tell. I had to go digging through the source to figure out how to consume Entity Framework Core 2.0 events. The identifiers still aren't constants, so you can't implement a simple IObserver<KeyValuePair<string, object>> with a switch. It's a lot of nasty if statements. It's a shame because C# 7 patterns would have made this excellent.

Of course they'd only do that if the payload was strongly typed. While Entity Framework made great progress on this, ASP.NET MVC didn't change a thing. It's still using anonymous types for the payload (example: BeforeView).

Okay fine, so we'll use the DiagnosticAdapter and do a .SubscribeWithAdapater(), so at least we can get past the drastic inconsistencies between Microsoft libraries (I say Microsoft libraries because if we can't even get a consistent story from the team building it, then the ecosystem is hopeless). But no, we can't do that either, because between 2.0.0-preview2-final and the 2.0.0 release, netstandard was removed from the Microsoft.Extensions.DiagnosticAdapter package.

So let's sum it up. To implement a diagnostic listener in .NET Standard 2.0 a developer has to:

  1. Browse the source to see how every person uses it
  2. Put together some ugly code to do it (no constants anywhere - so you either if everywhere or manually lookup and copy even ID stings. AFAIK, there is zero documentation. This is the biggest indicator that no one cares about this API surface.
  3. Do...what with the adapter? @davidfowl suggested I just copy the source for the adapter, and that's likely the easiest route. Is that aun unblocker? Yeah sure. But it's also insane you'd have to do that.

Let me enumerate why I don't honestly think Microsoft as a whole seems to care about this API:

  1. There is no documentation on events from any of the packages (if there is, I can't find it)
  2. Payloads are wildly inconsistent: naming, anonymous vs concrete types.
  3. Event names aren't available as constants
  4. There isn't a single example of real-world usage I can find. Only very trivial examples, and the test cases are just making sure things work, they're polar opposite of usable code (they're iterating over all cases). ASP.NET MVC tests are the closest thing here, but they also rely on DiagnosticAdapter.
  5. Things like ApplicationInsights aren't using it consistently or at all, depending on the area in question.

This API is extremely frustrating, horribly inconsistent, and arguably more broken than before I started. I hope someone actually makes this a first class citizen at some point in the future, but it'll take at least a major version break to do that. And every consumer will need to rewrite their code to use it. Again.

Every time I try to approach this thing it only ends in hopeless frustration. I'm done. Closing this out.

Amen @NickCraver !! I wanted to build a .NET version for OpenTracing.io but I pretty much gave up because all of this stuff in System.Diagnostics is extremely frustrating. And thanks to the no breaking changes policy, there's basically no hope - on the contrary, every future addition will just make it worse. Building a new API like OpenTracing probably doesn't help either as it a) is yet another Diagnostics API and b) doesn't have a chance against a built-in API anyway.

IMO the current diagnostic APIs are bad for everyone:

  • The instrumentation code in e.g. ASP.NET Core with its ugly mix of EventSource, DiagnosticSource, Activity and ILogger is just ridiculous - and there isn't even proper support for metrics yet (perf counters, EventCounter)!! Is this really how the community should instrument its libraries?

  • You've already explained very good how hard it is to write profilers. I fully agree with your concerns.

  • And application developers also have to spend a tremendous amount of time to set up a proper logging/diagnostics system where they merge the output of all of these different APIs back together. There's stuff like the new EventFlow-library but TBH that's also just yet another layer and doesn't reduce the complexity as you still have to know everything about the underlying sources.

I'd love to see this discussed properly someday - e.g. in a stand-up but I'm also afraid that this might be hopeless and wasted energy. 馃槥

The instrumentation code in e.g. ASP.NET Core with its ugly mix of EventSource, DiagnosticSource, Activity and ILogger is just ridiculous - and there isn't even proper support for metrics yet (perf counters, EventCounter)!! Is this really how the community should instrument its libraries?

It's fair to complain about that from an extremely high level point of view and it's easy to say "just use a single abstraction" but there are good reasons that those are different. Now if you're saying that we need better guidance then I agree. We've mostly come up with some ad-hoc rules on when to use what but it's not clearly laid out anywhere.

Metrics and Counters are being tackled for 2.1. We've started discussions on this and are currently experimenting with dashboards.

And application developers also have to spend a tremendous amount of time to set up a proper logging/diagnostics system where they merge the output of all of these different APIs back together.

I'd love to see what data you were trying to gather out of applications. It's very easy to say, "just funnel everything into the same sink" but depending on what level of structure you have, that might not make any sense (you might also end up with duplicate entries).

There's stuff like the new EventFlow-library but TBH that's also just yet another layer and doesn't reduce the complexity as you still have to know everything about the underlying sources.

I'm actually a fan of things like event flow (I call it "in-proc logstash for .NET"). There will always be multiple sources of data that need to be crunched into some format and shipped off somewhere. This just makes it easier.

You've already explained very good how hard it is to write profilers. I fully agree with your concerns.

We absolutely suck here. Diagnostic source is the thing you want to use for this and we need to get our act together and just bring some consistency. This seems achievable.

From my end, metrics in other apps using other frameworks, we tend to push them out to statsd, there we do aggregation/bucketing out of proc, and then forward it normally something like telegraf is used, with influxdb. Not sure what the .net story is but looking at this echo system might give some ideas for a xplat story.

It's fair to complain about that from an extremely high level point of view and it's easy to say "just use a single abstraction" but there are good reasons that those are different.

I don't know... I think the problem is that all of the current systems focus too much on output scenarios instead of instrumentation scenarios. EventSource was written to do out-of-process logging, ILogger was also written to do out-of-process logging (who doesn't love some internal competition? :smile:), DiagnosticSource was written to do in-process logging, Activity (being an extension of DiagnosticSource) is something in the middle and PerformanceCounter/EventCounter are for out-of-process.

To cover all needs, one has to call each of them separately. To add context to these calls, one has to know that ILogger has simple-type args and scopes; EventSource has only simple-type args; DiagnosticSource can receive complex types but one has to decide about whether to use anonymous objects or custom types; Activity has all of DiagnosticSource plus baggage/tags [and I don't know much about EventCounter/PerfCounters]. In addition, one has to define some constant event IDs and string identifiers (with different naming schemes of course).

All of this covers only official Microsoft APIs and only the "latest" ones (there's no Console.WriteLine, Trace, TraceSource, etc).

The result is, that "just tracking an incoming request" requires a file with 283 lines. (It contains some legacy events but it doesn't contain metrics calls so it's still fair IMO.)

And everyone who writes a framework that processes incoming stuff (e.g. a messaging framework) has to do exactly the same thing as ASP.NET Core. If she doesn't (and e.g. only uses ILogger), there's automatically a huge part of the community that will have a bad experience as they are forced to use adapters/bridges (because they e.g use mostly EventSource).

I don't want to believe that there's no way to simplify this.

I'd love to see what data you were trying to gather out of applications. It's very easy to say, "just funnel everything into the same sink" but depending on what level of structure you have, that might not make any sense (you might also end up with duplicate entries).

It's not about "funneling everything into the same sink" but about getting everything you want/need from the different sources in the first place. Unfortunately, many libraries don't have documentation about their logging capabilities. With libraries that use ILogger (or any other 3rd party API) there's at least the visible dependency in NuGet but with libraries that use EventSource (like many Azure SDKs do), people basically have to look at the source code to figure out what kind of logging the library offers and what kind of log events there are.

As soon as you have logs from ILogger and EventSource, you need some kind of bridge/adapter. All of this takes time, increases complexity [my main issue] and probably decreases performance [even if just by a little].

Anyway, this is off-topic here. As I've said, I'd like to see some "meta"-discussion about this somewhere someday.

@cwe1ss What's your dream API? Paint me an end to end picture that would be happy with for all actors:

  • Framework authors
  • People consuming logs/metrics/events in applications
  • Consuming data from the runtime itself (including native code)
  • Tools that need to get diagnostics data out of proc
  • Tools that need diagnostics in process (glimpse)

@davidfowl Coming up with "the right" API is obviously a very tough and long-running process so this must be done in a working group. Here's just some of my thoughts:

  • Microsoft should consolidate their server libraries (ASP.NET Core, Azure) to use ONE out-of-process logging standard. The current mix of EventSource and ILogger is unnecessary and counterproductive.
  • "Potentially long-running operations" (like an incoming ASP.NET request, an outgoing HTTP call, something that invokes user code like a controller action, ...") are one of the most important things to monitor in every server-application, so everyone needs them: In-process diagnostics like glimpse, out-of-process loggers, metrics, etc. For this reason, they should only have to be instrumented once with all necessary data and everyone else should just be a subscriber of this one API.
  • I think that this could be an improved version of Activity as it already supports complex in-process args (for stuff like Glimpse) AND simple-type out-of-process args via tags (for stuff like logging, metrics).
  • With metrics being a subscriber of this, they have a lot more possibilities to do e.g. per-URL metrics etc.
  • Subscribers should be able to indicate whether they need the out-of-process tags. This combined with another "IsEnabled"-check for activity tags should help mitigate performance concerns when there's no out-of-process subscribers.
  • Most stuff that now is two separate DiagnosticSource calls (e.g. in MVC) should also be such an Activity. With this being done consistently, tools like Glimpse/MiniProfiler probably only have to subscribe to activities and no longer have to manually track Start/Stop events.

Microsoft should consolidate their server libraries (ASP.NET Core, Azure) to use ONE out-of-process logging standard. The current mix of EventSource and ILogger is unnecessary and counterproductive.

I personally thing that ship has sailed. Both will exist for the forseeable future. Even re-reading https://github.com/aspnet/Logging/issues/332, the problems are real but I'm not convinced of the solution. Lets remove our abstractions and just use ETW directly 馃槃.

What if we just removed all loggers and told everyone to use diagnostic source?

"Potentially long-running operations" (like an incoming ASP.NET request, an outgoing HTTP call, something that invokes user code like a controller action, ...") are one of the most important things to monitor in every server-application, so everyone needs them: In-process diagnostics like glimpse, out-of-process loggers, metrics, etc. For this reason, they should only have to be instrumented once with all necessary data and everyone else should just be a subscriber of this one API.

Another requirement is that it must be free if nobody is listening.

With metrics being a subscriber of this, they have a lot more possibilities to do e.g. per-URL metrics etc.

Yea, I'm not convinced metrics should be subscription only. Various other systems have specific APIs for writing metrics.

I personally thing that ship has sailed. Both will exist for the forseeable future.

I'm not saying that any one of them has to go completely. EventSource probably shines for low-level code and framework internals and ILogger seems to be a nicer fit for libraries/frameworks/applications. Maybe it's possible to move all Azure libraries to ILogger?!

Another requirement is that it must be free if nobody is listening.

You already (almost) get that with the existing DiagnosticSource.IsEnabled() check.

Yea, I'm not convinced metrics should be subscription only. Various other systems have specific APIs for writing metrics.

I'm not saying there shouldn't be a metrics API. I'm just saying that instrumenting operations doesn't require calling the metrics API explicitly. This is something that should happen behind the scenes if the user needs metrics. (E.g. Application Insights would use its own internal metrics API, a built-in metrics subscriber would use EventCounter etc.)

I personally thing that ship has sailed.

To cite this again... Unfortunately I think so too and this is my main point of frustration with all of this. There's almost no chance of any of this ever getting improved significantly -> WastedHoursThinkingAboutThis += 4;

I'm not saying there shouldn't be a metrics API. I'm just saying that instrumenting operations doesn't require calling the metrics API explicitly. This is something that should happen behind the scenes if the user needs metrics. (E.g. Application Insights would use its own internal metrics API, a built-in metrics subscriber would use EventCounter etc.)

That's still assuming that metrics are an implementation detail of events instead of libraries producing them directly.

To cite this again... Unfortunately I think so too and this is my main point of frustration with all of this. There's almost no chance of any of this ever getting improved significantly -> WastedHoursThinkingAboutThis += 4;

I think it can be improved but low level cross cutting concerns like this require lots of buy in. It's the same reason why it's hard to get people to agree on DependencyInjection libraries.

Maybe diagnostic source is the thing we should focus on.

I hate to throw out a comment on closed issues, resurrecting old discussions, but I think it's sorely needed at this point.

There is a now a push from the Application Insights teams to start implementing DiagnosticSource events into popular 3rd party libraries but @NickCraver has correctly pointed out in that issue that it's not likely to happen with all the inconsistencies, lack of documentation, and seemingly lack of interest, even within CoreFX itself. So what guidance and incentives should 3rd party devs have if Microsoft teams are not leading the way, being the authors of the API in question?

I'm really not trying to drop a stinkbomb here but I (and I guess many others) see great potential in the DiagnosticSource API, now that the Activity API is out there as well. It just needs that little bit of extra "push".

I also know that @SergeyKanzhelev, @lmolkova and @cwe1ss have had numerous back-and-forth discussions on some of this as well so I'd really like to see if there can be some movement on pushing all of this forward in a consistent manner and give it the focus it deserves, since anyone performance minded would love a consistent API to both consume and produce data that can easily plug into a profiler, making life easier and consistent for everyone.

I'm saying all this as a very interested outside party using all of these different frameworks and 3rd party libraries, and getting distributed tracing and profiling working properly today is no easy task, since everyone implements it their own way, so switching providers is a dreadful experience, and seeing the potential right there in front of you, seemingly "just" needing a proper cross-team focus is frustrating to say the least.

As an example, now that there is a working group in action to standardize distributed tracing (not the same thing I know, but very closely related) and requests are coming in to 3rd party libraries to implement what's needed, I think it's time to put proper focus on this also within CoreFX and ASP.NET if this shall have any chance of being successful.

I know I'm simplifying things here, but I'm just hoping this can get things moving forward properly.

Was this page helpful?
0 / 5 - 0 ratings