Opentelemetry-specification: conventions for names such as attribute names

Created on 22 Jul 2020 · 7 Comments · Source: open-telemetry/opentelemetry-specification

Do you have some recommendation or guideline for creating names such as span attributes with consistent conventions? I'm looking at the latest specification as well as some "tag names" that were in OpenTracing, and I can't quite follow the pattern.

Is there some sort of "namespace" separation? Is there a recommended intra-name delimiter, apart from the namespace?

For example, I look at http.server_name, and I think, "Ah, the http part must be a sort of namespace, separated from the rest of the name by a dot; and inside the local name (simple name) server_name we use an underscore for word separation."

But then I see net.host.name. What am I to think of that? Is net.host a sub-namespace of the net namespace, and is name the only local name (simple name) within that nested namespace? Or should this really be net.host_name to match http.server_name? Or is http.server_name in fact the anomaly—there is no "namespace" concept, and it should have really been http.server.name as just a single, multi-word name with a dot delimiter?
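To make the two readings concrete, here is a minimal Python sketch of the "every dot is a namespace delimiter" interpretation. The helper name split_attribute_name is hypothetical, and the rule it encodes is only the guess described above, not anything the specification states.

```python
from typing import List, Tuple


def split_attribute_name(name: str) -> Tuple[List[str], str]:
    """Split a dotted attribute name into (namespace parts, local name).

    Assumption: every dot is a namespace delimiter and the last component
    is the local name. The specification does not confirm this reading.
    """
    *namespaces, local_name = name.split(".")
    return namespaces, local_name


for example in ("http.server_name", "net.host.name"):
    namespaces, local_name = split_attribute_name(example)
    # Prints the namespace parts and the local name under this reading.
    print(example, "->", namespaces, local_name)
```

Under this reading, http.server_name has one namespace (http) and a two-word local name, while net.host.name has a nested namespace (net, host) and the one-word local name "name", which is exactly the ambiguity in question.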

Or did everyone just throw in these names in an ad-hoc manner with no syntax conventions, and anyone may use any delimiters arbitrarily at any time?

semantic-conventions p2 required-for-ga trace

garretwilson

Most helpful comment

OpenTelemetry needs to provide guidance in this area. As a user, what conventions should I use to create my span attributes? Let's use some "maximum value" attribute as an example.

Should I use the camelCase form maxValue? But that doesn't match any of the official OpenTelemetry attributes in the specification.
Should I instead use the dotted form max.value? That certainly matches things already in the specification such as enduser.id. But what happens when in a future version OpenTelemetry decides that the dot is a namespace delimiter (in line with what @Oberon00 said above)? Then suddenly this name might be grouped in a max namespace, which was not the intention at all.
Should I instead use max_value? That would seem to match things like messaging.message_id. But then why do we have message.id in the examples and net.peer.name in the normative part of the spec?
Should I prefix my attribute name with a namespace, such as myapp.max_value?

And then beyond the semantics there are operability concerns:

How do names like foo.bar and foo_bar interact with automatic span generation logic, for example adding spans automatically for RPC arguments, when Java variable names would use camelCase? Should we instead use variable names that are more compatible with programming language identifiers? Or should we normalize camelCase to dotted or snake_case when creating spans in an automated fashion?
How do names like foo.bar interact with log collection and reporting systems such as fluentd and Splunk? Is it easy for these systems to parse dotted names out of log lines? Should we stick to a more limited set of "word characters" for variable names, without common delimiters inside of them? Or should we transform span names when logging them?
How do the OpenTelemetry naming conventions interact with existing metadata frameworks such as XML or RDF or (more recently) CURIE syntax in HTML5?
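As an illustration of the normalization option in the first question above, a camelCase identifier could be rewritten mechanically into either delimiter style. This is only a sketch: the function names are hypothetical, the regular expression is one possible word-boundary rule, and nothing here is prescribed by OpenTelemetry.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
# Assumption: this simple rule is enough; runs of capitals such as "HTTPServer"
# are not split and would need extra handling.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(identifier: str) -> str:
    """Rewrite a camelCase identifier in snake_case, e.g. maxValue to max_value."""
    return _CAMEL_BOUNDARY.sub("_", identifier).lower()


def camel_to_dotted(identifier: str) -> str:
    """Rewrite a camelCase identifier in dotted form, e.g. maxValue to max.value."""
    return _CAMEL_BOUNDARY.sub(".", identifier).lower()


print(camel_to_snake("maxValue"))
print(camel_to_dotted("maxValue"))
```

The mechanics are trivial. The open question is which target form is correct, because the dotted result may later be read as a "max" namespace.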

(In fact the whole area of semantic framework interoperability seems to be largely ignored in the current spec, but I think it would be useful to be able to integrate different vocabularies of semantic identifiers on a system level, and not just create Yet More Arbitrary Name-Value Pairs.)

These issues are affecting a project I'm working on now. We have to make some decisions on our conventions, and I had been hoping to build on OpenTelemetry's recommendations in this area.

Does OpenTelemetry find these issues important? Is there a group or someone leading the resolution of these issues? If not, is there sufficient interest and are you looking for someone to lead work in this area?

garretwilson on 22 Jul 2020

All 7 comments

There are several net.host.* attributes, as well as net.peer.*, so I always saw host as a subnamespace of net. But there is nothing formalized.

Oberon00 on 22 Jul 2020

That would seem to match things like messaging.message_id. But then why do we have message.id in the examples

Where did you find message.id? It only exists in the gRPC conventions where it is unrelated to messaging.

Oberon00 on 23 Jul 2020

with automatic span generation logic, for example adding spans automatically for RPC arguments

Automatic generation of attribute names is a completely new topic. For (g)RPC arguments, for example, you might want to define a semantic convention rpc.args.<paramname>, where paramname is the name of the parameter in the IDL. Or, if parameters are actually positional rather than named, rpc.args.<N>, where N is the index 0..n_args.
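A rough sketch of what such generated names could look like, in Python. The rpc.args.* prefix is only the hypothetical convention floated in this comment, not an existing semantic convention, and rpc_argument_attributes is an illustrative helper name.

```python
from typing import Dict, Optional, Sequence


def rpc_argument_attributes(
    values: Sequence[object],
    param_names: Optional[Sequence[str]] = None,
) -> Dict[str, object]:
    """Build attribute names for RPC arguments.

    With param_names (taken from the IDL), keys are rpc.args.<paramname>.
    Without them, arguments are treated as positional: rpc.args.<N>.
    """
    if param_names is None:
        return {f"rpc.args.{index}": value for index, value in enumerate(values)}
    if len(param_names) != len(values):
        raise ValueError("param_names and values must have the same length")
    return {f"rpc.args.{name}": value for name, value in zip(param_names, values)}


print(rpc_argument_attributes(["alice", 10]))
print(rpc_argument_attributes(["alice", 10], ["user_id", "limit"]))
```

Either way the generated part of the name comes from the IDL or the argument position, so it inherits whatever casing and delimiters the IDL uses.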

Is it easy for these systems to parse dotted names out of log lines?

I don't think attribute names were designed to be written to log lines, although most languages have logging exporters. Some of these are intended only for debugging, but some are AFAIK indeed meant to produce something that could be parsed back.

While it would be helpful to spell out the rules used for attribute names and especially how and whether application-defined attributes should be prefixed with something, it looks like you have some concrete problems you are trying to solve?

Oberon00 on 23 Jul 2020

Where did you find message.id?

On this page: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/rpc.md

It only exists in the gRPC conventions where it is unrelated to messaging.

I suppose you are saying it is unrelated to your "message systems" page. But I don't see how it is relevant whether message.id is related to this other page or not. The reason I opened this ticket is to inquire about OpenTelemetry conventions for attribute names. If one "thing" uses the form message.id, and another "thing" uses message_id, then there seem to be some inconsistencies in the conventions (or a lack of conventions), regardless of whether these two "things" are related.

Surely both of these are about some "message identifier", whether it's the same type of messaging or not. Maybe you can explain to me how the OpenTelemetry conventions can distinguish these cases, that is, what criteria were used to choose the message.id or message_id form. But I think I'm beginning to understand that really there are no such conventions. I would encourage you to think about creating some, and not just on an ad-hoc basis, but after a survey and analysis of other metadata frameworks.

garretwilson on 16 Aug 2020

@garretwilson Thanks for bringing this up. We're working on documenting some conventions for naming in #807 - did you happen to take a look at it? In particular it does discuss the use of dots for namespace vs underscore for multi word. If you have any suggestions for those guidelines or a pointer to frameworks that can be a good reference point, that would be very helpful!
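For illustration only, a "dots for namespaces, underscores for multi-word components" rule could be checked mechanically as in the Python sketch below. The exact pattern (lowercase ASCII letters and digits, each component starting with a letter) is an assumption made for this example and is not quoted from #807.

```python
import re

# One component: lowercase words joined by single underscores.
_COMPONENT = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"
# A full name: one or more components joined by dots (the namespace delimiter).
_NAME = re.compile(rf"{_COMPONENT}(?:\.{_COMPONENT})*")


def follows_dot_underscore_style(name: str) -> bool:
    """Return True if name uses dots between components and underscores within them."""
    return _NAME.fullmatch(name) is not None


for candidate in ("http.server_name", "net.host.name", "myapp.max_value", "maxValue"):
    print(candidate, follows_dot_underscore_style(candidate))
```

A check like this accepts both net.host.name and net.host_name, so it can enforce the syntax but cannot decide whether "host" should be a namespace. That part still needs a written guideline.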

As for message in particular, those developed organically and maybe without enough thought given to preventing confusion with each other. It did cause me significant confusion in a PR before. I filed #812 to look into that.

Thanks again for the feedback, and let us know if you have any more suggestions!

anuraaga on 16 Aug 2020

@garretwilson Thanks for bringing this up. We're working on documenting some conventions for naming in #807 - did you happen to take a look at it? In particular it does discuss the use of dots for namespace vs underscore for multi word. If you have any suggestions for those guidelines or a pointer to frameworks that can be a good reference point, that would be very helpful!

I took a look at it and added a comment to the pull request. Let me know if there is a better place to leave feedback. Good luck!

garretwilson on 19 Aug 2020
