Opentelemetry-specification: Span name: Both low-cardinality (grouping key) and human-readable (display name)

Created on 9 Apr 2020 · 39Comments · Source: open-telemetry/opentelemetry-specification

This came up when I suggested the handler name instead of the route name for the HTTP semantic conventions (#548). I think that the handler name is a better grouping key than the route, but the route is a better display name, as @yurishkuro pointed out. So I think we should relieve span name from this double duty. I'll post ideas as separate comments to allow reactions.

EDIT:

Cf. current spec of span name https://github.com/open-telemetry/opentelemetry-specification/blob/e7fe34ad/specification/trace/api.md#span:

The span name is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name should be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances. That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality.

I used the term "grouping key" here because I think using the span name as a key to group / sample similar spans is the reason why a Span name should have a low cardinality.

api sampling p1 required-for-ga trace

Source

Oberon00

👍1

Most helpful comment

We discussed this the java sig meeting today... My take away from that discussion is that the span name should be the low cardinality part, but we add an optional well-defined attribute like description or detail that can be the high cardinality representation like sql query or http url. This would allow the span name to be used in metrics, while still allowing more interesting visualization when inspecting a specific trace.

At this stage of the development lifecycle, it seems less invasive adding a semantic attribute than an additional required parameter, especially for many cases there won't be a high cardinality name (eg class/method name).

tylerbenson on 22 May 2020

👍3 🚀1 ❤1

All 39 comments

1) Add a new field "GroupingKey" to the span that defines a grouping key. The span name should be used if this is not set.

Oberon00 on 9 Apr 2020

👍1

2) Add a semantic convention for a span attribute grouping_key.

Oberon00 on 9 Apr 2020

👍1

3) Define a resource semantic convention semconv.isgroupingkey:<ATTRIBUTE_NAME>=true that defines a given span attribute as being viable as a grouping key (Problem: what if multiple exist? Have the value be an integer priority?).

Oberon00 on 9 Apr 2020

I forgot to mention this in the SIG meeting today, but this topic was talked about there, so it seems this is an actual issue.

Oberon00 on 28 Apr 2020

👍1

Can you define what a grouping key is? I'm not familiar with this term.

jkwatson on 6 May 2020

@jkwatson Thank you for that question, I think I accidentally coined that term here. I edited the description to hopefully be more clear. The "grouping key" requirement can be found in the spec as a requirement for low cardinality of the span name. I think the intention here was to allow grouping / sampling per span name.

Oberon00 on 6 May 2020

@jkwatson You can think of the proposed grouping key as what an SQL query will look at in a "GROUP BY" clause. All spans with the same grouping key will be grouped together on the backend when being analyzed or presented in the UI.
Eventually, however, it is up to the backend to decide in which cases and for which purposes it applies grouping at all. It could also decide to group using another logic or even custom defined rulesets instead of relying on the grouping key (or span name).

arminru on 6 May 2020

Thanks for the answers, that helps a lot!

jkwatson on 6 May 2020

👍1

I forgot to mention this in the SIG meeting today, but this topic was talked about there, so it seems this is an actual issue.

@Oberon00 can you elaborate? I am not seeing an issue yet, especially since "grouping key" is a concept for machine processing, and why would machines care if they group by extra field or by readable span name (providing there's uniqueness).

yurishkuro on 6 May 2020

@yurishkuro

"grouping key" is a concept for machine processing, and why would machines care if they group by extra field or by readable span name

Exactly! That's the idea behind my suggestions in this issue.

I am not seeing an issue yet

You can see the issue in the PR comment linked in the description (https://github.com/open-telemetry/opentelemetry-specification/pull/548#discussion_r404397569; actually it was your comment 😄). The issue is that if we just want the span name to be a display name, we can drop the requirement for low cardinality and would potentially have a better human-readability (e.g. we could use GET domain.example.com as display name instead of just HTTP GET), or we could drop the requirement for (easy) human readability and would have a better grouping key. The span name needs to satisfy both requirements which obviously means there must be compromises made in some cases.

Oberon00 on 6 May 2020

Sorry, but I don't see "low cardinality" and "human readable" as conflicting requirements. In Jaeger UI, for example, you can select the Service from a dropdown, and the following dropdown is populated by span names - they have to be _both_ human readable and low cardinality (high cardinality will explode the dropdown and make it unusable).

yurishkuro on 6 May 2020

👍1

"HTTP GET" is a very poor display name (it's also a poor grouping key). "GET domain.example.com" is a good display name, but it's an even worse grouping key.
I think this clearly demonstrates that there are cases where this conflicts.

Oberon00 on 6 May 2020

If you want to use your groups for display (as opposed to sampling), you'll need a display name for your group. In that case you can use any one display name from within that group and display something like "'GET domain.example.com' and 32323 similar spans"

Oberon00 on 6 May 2020

To summarize, you dislike the use of span name being both (1) the name for the individual span "instance" and (2) the (low-cardinally) name of the span "class".

I agree. If I'm looking at a single trace, I'd prefer to see

HTTP GET /accounts/123
    HTTP GET /users/123
    HTTP GET /users/456
    HTTP GET /users/789

instead of

HTTP GET /accounts/{accountId}
    HTTP GET /users/{userId}
    HTTP GET /users/{userId}
    HTTP GET /users/{userId}

The former is a more informative summary of the span.

Of course the latter form still needed for grouping (e.g. Jaeger dropdown UI).

Counterargument:

Span attributes are the place to store instance-specific details.

Counter-counterargument:

There are more attributes than can be displayed for all spans in a trace, and there are more span attributes than every backend can be required to understand significant vs. insignificant. If you want something looks like the former trace, you need the instrumentation to name it.

For compatibility I propose the opposite: the span "name" should continue to be the low-carnality span "class." And there should be a standardized attribute name or instance that summarizes the high-carnality span "instance."

pauldraper on 8 May 2020

And there should be a standardized attribute name or instance that summarizes the high-carnality span "instance."

I think having an optional display.name attribute is a reasonable idea (I've seen it being proposed in some Jaeger tickets), but it's certainly not a required field. Tracing UIs have wide flexibility in how they want to display the spans, e.g. many APM vendor tools prominently show the HTTP status code without having to drill down into span details, but surely we're not suggesting that status code should be a recommended part of the display name. Because each span already captures a lot of other attributes, UIs can be intelligent enough to decide which display format is the most useful, and it would not require the application code to duplicate the information it already provides in the span.

yurishkuro on 8 May 2020

👍2

Tracing UIs have wide flexibility in how they want to display the spans, e.g. many APM vendor tools prominently show the HTTP status code without having to drill down into span details

My same feeling.

I like the idea of having an additional, optional attribute (such as display.name) that is defined in the semantic conventions, and that UIs may want to use for display instead of Span name.

carlosalberto on 8 May 2020

👍1

optional

Agreed. No reason to make it mandatory: there is an abundently obvious fallback in span name.

To add a further example for this idea:

Since HTTP clients don't have to pattern-match URLs, they frequently are left without a great low-cardinality names, and the instrumentation is forced to choose something poor like HTTP GET or HTTP GET github.com. The marginal utility of an instance-specific name is large in this case.

Bike shed: The thing I don't like about display.name is "display;" the instrumentation shouldn't be defining display.color or display.lines or such.

pauldraper on 8 May 2020

So back to the initial description: Will this display.nane allow me to merge #548 then?

Oberon00 on 8 May 2020

"HTTP GET" is a very poor display name (it's also a poor grouping key). "GET domain.example.com" is a good display name, but it's an even worse grouping key.
I think this clearly demonstrates that there are cases where this conflicts.

Why "GET domain.example.com" is a bad grouping key?

iNikem on 11 May 2020

My $0.02 was given verbally in the last SIG meeting. I'd like to see a span name with message-formatting syntax, like "GET {http.url}". This is both descriptive and low cardinality, and when it comes time to display, the system can format the message.

jmacd on 11 May 2020

If you care for one more opinion: span name is certainly a "grouping key" for me. A value that define a "class" of spans, as specification states. It is certainly has less information that possible display name of a single span. Every UI vendor can may his own decisions (probably even different ones for different users of that UI) what specific attribute of the span to use in their UI. Span name defines a unit of work, not a particular invocation of that work.

Taking that into account I think the original proposal of #548 is a good one and we should go with it.

iNikem on 11 May 2020

Span name is definitely a grouping key.

The question is whether backends or instrumentations are in a better
position to display a good name/summary for an individual span.

I believe instrumentations should do choose attributes for a display name;
you believe that backends should understand attributes and generate a
display name.

On Mon, May 11, 2020, 13:03 Nikita Salnikov-Tarnovski <
[email protected]> wrote:

If you care for one more opinion: span name is certainly a "grouping key"
for me. A value that define a "class" of spans, as specification states. It
is certainly has less information that possible display name of a single
span. Every UI vendor can may his own decisions (probably even different
ones for different users of that UI) what specific attribute of the span to
use in their UI. Span name defines a unit of work, not a particular
invocation of that work.

Taking that into account I think the original proposal of #548
https://github.com/open-telemetry/opentelemetry-specification/pull/548
is a good one and we should go with it.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/open-telemetry/opentelemetry-specification/issues/557#issuecomment-626896030,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAKWTB4XOD7AA3XEKTDDTYLRRBDYNANCNFSM4MER6NXQ
.

pauldraper on 11 May 2020

Correct. Backends have to understand attributes anyway if they hope to present useful information to their users. "Useful" as in "relevant to users' use-cases". You don't make decisions about UI in sql queries.

iNikem on 11 May 2020

@iNikem

Why "GET domain.example.com" is a bad grouping key?

Because it can have a very high cardinality. See https://github.com/open-telemetry/opentelemetry-specification/issues/270#issuecomment-539363629 EDIT: And https://github.com/open-telemetry/opentelemetry-specification/pull/416/files#r369591004

Oberon00 on 11 May 2020

Ah, I thought you were talking about "GET ". Using full URL is certainly bad for grouping. Even using path portion is very bad, agreed.

iNikem on 11 May 2020

Btw, @Oberon00 , adding to my comment above. I think that "GET domain.example.com" is OK for span name and consequently for grouping key, if it really contains only domain name, without any path info. I assume any given application queries only a fixed number of domains :)

And I again want to express my support for #548. What else has to be done before that PR can be revived and merged?

iNikem on 20 May 2020

tylerbenson on 22 May 2020

👍3 🚀1 ❤1

And I again want to express my support for #548. What else has to be done before that PR can be revived and merged?

For the record, I have since changed my opinion and think that http route information, if available, makes for a better span name than handler name. My current opinion is based on the following example.

Imagine we have a servlet container running Spring MVC application and having auto instrumentation agent attached. When we create SERVER span from servlet request we have full url, which is too high cardinality for span name, and servlet handler name, like MainServlet.doGet. Later Spring MVC kicks in and detects that requests should be handled to a PersonController.showDetails based on the request route /api/v2/persons/{personId}.

In the situation above I think it is sensible to update SERVER span name to route-based name /api/v2/persons/{personId}. But not PersonController.showDetails because that would be strange to servlet instrumentation created span.

iNikem on 13 Jul 2020

👍1

Hi all, I've made a display hint proposal to help resolve this, please have a look: https://github.com/open-telemetry/opentelemetry-specification/pull/730

tedsuo on 23 Jul 2020

"HTTP GET" is a very poor display name (it's also a poor grouping key).

I want to address this argument. For server-side span, there is no excuse in using "HTTP GET" as a span name. A server always has some kind of mapping from URLs to handlers, that mapping most commonly called "route". The route is good both as display string and grouping key, satisfying the span name requirements.

The situation where "HTTP GET" names become necessary are for the client-side spans emitted by generic instrumentation of HTTP frameworks, which are removed from business logic and only have the (potentially high-cardinality) URL to work with. It is often not that interesting to build aggregations of those spans, so while certainly a poor grouping key those names do not affect the analysis that much. Without aggregation requirement, these spans are only useful to be shown in the Gantt-chart style views, where we're dealing with a single span and the UI can replace/augment the span name with the full URL.

yurishkuro on 24 Jul 2020

I think that depends on how you choose to do it.

IMO there isn't a fundamental difference between client tracing and server tracing:

client.programming.function
   HTTP GET client
       HTTP GET server
           server.programming.function

What is instrumented by whom depends.

If you are using a SOAP client, there an obvious automatic name for the operation. If you are using ApacheHttpClient you will have to choose and add the name yourself.

If you are using a Play server, there is an obvious automatic name for the operation. If you are using com.sun.net.httpserver.HttpServer you will have to choose and add the name yourself.

pauldraper on 25 Jul 2020

@open-telemetry/dotnet-approvers are you aware of this issue? It seems you already made a choice here that runs counter the current direction of this discussion when naming the Name DisplayName (see https://github.com/open-telemetry/opentelemetry-specification/pull/730#issuecomment-673882556)

Oberon00 on 14 Aug 2020

I suggest to move this issue for Future milestone. This may be added after GA, not a breaking change.

SergeyKanzhelev on 14 Aug 2020

The issue here is that the purpose of the existing span name is not clearly defined. Sure, you can add a display.name later on (should it be required). But you cannot as easily undo the compromises made in semantic conventions to make the existing span name a more useful display name and a worse grouping key for that.

Oberon00 on 14 Aug 2020

👍1

See spec

The span name is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name should be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances.

the issue is that optimizing a span name for human-readability will sometimes yield different results from optimizing for "most general string", as demonstrated in #548. In that PR, I agree that when "naming" an HTTP span, the route name is better for human consumption, but I maintain that the handler name is a more general string and still identifies a statistically interesting class of spans.

Oberon00 on 14 Aug 2020

Outside of the spec, the span name is generally referred to as the 'grouping key' for operations and instrumenters are advised to avoid high cardinality span names. I agree with your (@Oberon00) overall point, but I would suggest that #548 could just as simply be resolved by turning the handler class/method name into a semantic attribute on a server span.

+1 to Sergey's point about tabling this until post-GA.

austinlparker on 14 Aug 2020

@Oberon00 this problem exists on many platforms already and every platform takes its route to resolve it. OpenTelemetry semantic convention made emphasize on (more) guaranteed grouping. Every app owner/vendor may adjust the span name with custom modules or custom code.

Also our requirement is to have a support for features in some backend. This is not a simple attribute and additional semantic is forced on it. So it would be great to have Jaeger or other backend to demonstrate how to use it.

SergeyKanzhelev on 14 Aug 2020

I'm perfectly fine with tabling #548 and #730, but this issue is not resolved by either IMHO.

So it would be great to have Jaeger or other backend to demonstrate how to use it.

How to use what?

I think I'll propose a one sentence PR for this issue before more confusion arises 😉

Oberon00 on 14 Aug 2020

I proposed #810 to clarify what I think we should resolve before GA.

Oberon00 on 14 Aug 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

How to discard over limited data on span

XSAM · 3Comments

Towards 1.0: increasing velocity

SergeyKanzhelev · 4Comments

Clarification required for Resource attribute values.

cijothomas · 3Comments

Interface stability for implementers and new methods in the spec

Oberon00 · 4Comments

Define the metrics that we expect to capture from RPC integrations

mtwo · 5Comments