Azure-docs: Needs a full rewrite or severe overhaul and review with green customers

Created on 9 Sep 2019 · 4 Comments · Source: MicrosoftDocs/azure-docs

I think this documentation needs a complete rewrite. I'm sorry that this feedback is vague and not actionable, but I haven't understood the product well enough to know why it's bad. I only know that each time I think I understand what it's saying, my experiments prove me wrong.

My success rate with Application Insights has been very low since I began using it years ago. I only persevere because it's so integrated into the rest of Azure.

As background, I tried to introduce AI to my team in 2015/16. v1 was lacking and v2 was in beta, and its collector used to hang all our dev PCs, so we abandoned it. Fail 1. Earlier this year I tried a PoC again with that same product team and again failed to get metrics out to help track business goals. Fail 2.

Now I'm with a new client and I still cannot succeed with this product. My client's customers have also screwed up their faces when we've mentioned it in meetings and pointed us to some other tool. Fail 3.

My situation

I want to add a line of code that tracks an elapsed time. I want this to end up in the Azure Portal as an average of that time, in milliseconds. I want to call this method over and over and have it be efficient and do the right thing, so that I succeed with minimal thinking.

To be clear: I want to be able to call the telemetry client a trillion times a second if need be. The SDK should do all the windowing, sampling, aggregation and sending to AI/AM using low-GC code, so I don't have to.

However, if you're expecting me (us) to do all this, then call that out very obviously and give examples of how to do it. If you don't, you're leaving far too much scope for failure with the customer, and that will reflect poorly on our opinion of Application Insights.
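Here is what "windowing and aggregation" could mean in practice. This is a minimal, hypothetical sketch of client-side pre-aggregation, written in Python. It is not the Application Insights SDK and makes no claim about how the SDK is implemented. Each call to `track_value` updates a running count, sum, min and max for the current fixed window, and only one summary per window would be sent. The class names `Aggregate` and `WindowedMetric`, the 60-second window and the explicit `timestamp` argument (passed in to keep the example deterministic) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Aggregate:
    """Running summary of all values tracked in one time window (hypothetical)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Constant work per call: no list of raw values is kept.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


class WindowedMetric:
    """Hypothetical pre-aggregator: buckets values into fixed windows so that
    one summary per window, not one telemetry item per call, would be sent."""

    def __init__(self, name: str, window_seconds: float = 60.0):
        self.name = name
        self.window_seconds = window_seconds
        self.buckets: dict[int, Aggregate] = {}

    def track_value(self, value: float, timestamp: float) -> None:
        # All timestamps in [k * w, (k + 1) * w) share window index k.
        window = int(timestamp // self.window_seconds)
        self.buckets.setdefault(window, Aggregate()).add(value)

    def flush(self) -> list[tuple[int, Aggregate]]:
        # Return the summaries (a real exporter would send these) and reset.
        summaries = sorted(self.buckets.items())
        self.buckets.clear()
        return summaries
```

In this model, each call costs one dictionary lookup and a few arithmetic updates. The average elapsed time for a window is `total / count`. A window with no traffic produces no summary at all, which is one possible answer to the "what if it misses a load of periods?" question below.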

Aggregation

Even as I type this, after hours of playing and reading, I still cannot work out if I am supposed to do the aggregation or if the SDK or AI itself does it.

If I am supposed to do the aggregation, then:

  • Over what time period?
  • Why doesn't the SDK do this bit and let me get on with coding my app?
  • Why make thousands of developers write the _same_ aggregation structures?

This paragraph is especially confusing.

> Single value. Every time you perform a measurement in your application, you send the corresponding value to Application Insights. For example, assume that you have a metric describing the number of items in a container. During a particular time period, you first put three items into the container and then you remove two items. Accordingly, you would call TrackMetric twice: first passing the value 3 and then the value -2. Application Insights stores both values on your behalf.

  • It doesn't state what the measurement is meant to show. Does the imaginary developer want to see the average number of items in the container over a period, or an instantaneous count?
  • It's too abstract. How about saying the intention is to see the items in a shopping cart or the backlog of a queue?
  • It doesn't explain why this will work or what goes on under the hood. Does the math happen locally in the SDK or on the AI service?
  • It seems to be aggregating or doing some math on my behalf. Yet the surrounding documentation leads me to believe that I am supposed to do this aggregation myself, or that I should aggregate and then only call the SDK API periodically, whatever that period is. But what if there's no traffic and it misses a load of periods?

> In the above example, the aggregate metric sum for that time period is 1 and the count of the metric values is 2.

  • So the average is 0.5 over that time period?
  • What about the starting value?
  • Wouldn't an aggregate for this container count over, let's say, 1 second, start by tracking 0, then 3, then 1? Then, as the time window rolls, wouldn't that give a 1.333 average for the 3 measurements that landed in that window/bucket?
  • What is this telling us and does it achieve the developer's intentions?
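You can check the arithmetic in the quoted example, and in the alternative described in the questions above, with a tiny hypothetical helper. `summarize` is only an illustration, not an SDK function. The two readings track different things. Sending the changes 3 and -2 gives a sum of 1 (the net change in items) and a count of 2, so their mean of 0.5 is not a container size. Sending snapshots of the count (0, 3, 1) gives a mean of about 1.333 items.

```python
def summarize(values):
    """Sum, count and mean of the values tracked in one window (hypothetical helper)."""
    total = sum(values)
    count = len(values)
    return {"sum": total, "count": count, "mean": total / count if count else 0.0}


# Reading from the quoted docs: track each change to the container.
deltas = summarize([3, -2])        # {'sum': 1, 'count': 2, 'mean': 0.5}

# Reading from the question above: track a snapshot of the item count.
snapshots = summarize([0, 3, 1])   # mean is 4 / 3, roughly 1.333
```

With deltas, the useful number is the sum (the net change), not the mean. With snapshots, the mean is the average occupancy over the window.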

The values I'm seeing in the portal seem to be sum totals within 1-minute periods.

Misleading

You can see how confused I am and how I'm trying to piece it all together. And then there is this paragraph, which also confuses me.

> If your application requires sending a separate telemetry item at every occasion without aggregation across time, you likely have a use case for event telemetry; see TelemetryClient.TrackEvent (Microsoft.ApplicationInsights.DataContracts.EventTelemetry).

  • I scrolled to look at TrackEvent, but the documentation doesn't mention metrics or numeric values at all, so I didn't look further into it.
  • Then, over 90 minutes later, I chanced upon an example under Timing events and found this code, along with an example that sounds like my situation:
telemetry.TrackEvent("SignalProcessed", properties, metrics);
  • What is this doing down here so far from the TrackEvent section?
  • When I used the above TrackEvent code to track my time metrics, it still ended up summed over 1-minute periods, as if I needed to aggregate it myself.
  • So I was led up the garden path?
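One possible explanation for the "summed over 1-minute periods" observation is offered here as an assumption, not as a description of how the portal works. Suppose a chart groups individual values into 1-minute time grains and reduces each grain with Sum. Then three elapsed times of 120, 80 and 100 ms in the same minute appear as 300, whereas Avg would show 100. The hypothetical `chart_points` function below models only that grouping step. Its name, parameters and the sample events are illustrative.

```python
from collections import defaultdict


def chart_points(events, grain_seconds=60, aggregation="sum"):
    """Group (timestamp_seconds, value) pairs into time grains and reduce each
    grain with the chosen aggregation (hypothetical model of a chart)."""
    grains = defaultdict(list)
    for timestamp, value in events:
        grains[int(timestamp // grain_seconds)].append(value)
    reducers = {
        "sum": sum,
        "avg": lambda vs: sum(vs) / len(vs),
        "count": len,
    }
    reducer = reducers[aggregation]
    return {grain: reducer(values) for grain, values in sorted(grains.items())}


# Three elapsed-time measurements (ms) sent as individual events within one minute.
events = [(1, 120.0), (5, 80.0), (59, 100.0)]
```

If that is the cause, the raw data is fine and only the aggregation applied when charting differs. The same three events reduce to 300 under "sum", 100 under "avg" and 3 under "count".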

Perhaps the API design isn't communicating the inner workings and helping us construct an accurate mental model.


More here:

https://stackoverflow.com/questions/57855145/app-insights-getmetricname-trackvalue-doesnt-aggregate-when-viewing-in-app



Azure-Monitor/svc Pri1 application-insights/subsvc application-insights/svc assigned-to-author doc-enhancement product-question triaged


All 4 comments

Adding @vgorbenko and @macrogreg

@lukepuplett Thank you so much for your clear & candid feedback. I have created an internal workitem to overhaul the docs in regard to GetMetric as this is clearly causing a lot of confusion and frustration. I am adding the workitem to our October sprint.

I also added the PM who owns custom metrics for App Insights and one of the engineers who is the most familiar with this aspect of the API which will hopefully help answer some of your questions in the interim between now and the updates to the docs.

The doc covering this topic in greater depth is now live here: https://docs.microsoft.com/azure/azure-monitor/app/get-metric

Please let us know if you run into any issues, or if there are topics you want us to dig into deeper. Thanks again for the feedback, and helping us to improve the docs!

please-close
