(Sorry if there is an earlier discussion on this.)
Is there a significant difference between labels and trace attributes? If not, is there a reason why we call them attributes in traces? Having two different concepts for similar or identical things adds mental overhead. It would be much easier from the user's perspective if we didn't introduce a new concept unless necessary.
@jmacd suggested in https://github.com/open-telemetry/opentelemetry-specification/issues/446#issuecomment-660207029 that they should be the same, but currently labels only allow string values, while attributes allow a bunch of different types https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/common/common.md#attributes
@rakyll You assigned this the "spec:protocol" label. Is this related to the OTLP protocol in any way?
@rakyll we use attributes for things where we allow multiple value types (string, int64, float64, bool, etc.) and labels where the value type is only string. So there is a significant difference between "labels" and "attributes".
The question of using the same multi-typed value data structure for metrics is a different one, in my opinion.
@Oberon00 it was automatically labelled, I have no rights to add/edit the labels in this organization. Feel free to edit.
@bogdandrutu In the Go library, labels can be of any type: https://pkg.go.dev/go.opentelemetry.io/[email protected]/label#Value
And SetAttribute takes a label: https://pkg.go.dev/go.opentelemetry.io/[email protected]/api/trace#NoopSpan.SetAttribute
According to your comment, the Go package is doing something not supported by the spec. Is this correct?
@rakyll This should be fine. The current metrics spec says the following:
For OTLP, the exporter will translate the value into its string representation when populating the labels defined as string/string pairs in the protocol:
message IntDataPoint {
  // The set of labels that uniquely identify this timeseries.
  repeated opentelemetry.proto.common.v1.StringKeyValue labels = 1;
  // (remaining fields omitted)
}
message StringKeyValue {
  string key = 1;
  string value = 2;
}
Which is implemented here:
https://github.com/open-telemetry/opentelemetry-go/blob/a12224a454135a5d7ec17831b6b39cf9723c0cdb/label/value.go#L249-L273
https://github.com/open-telemetry/opentelemetry-go/blob/a12224a454135a5d7ec17831b6b39cf9723c0cdb/exporters/otlp/internal/transform/metric.go#L349-L364
> @rakyll we use attributes for things where we allow multiple value types (string, int64, float64, bool, etc.) and labels where the value type is only string. So there is a significant difference between "labels" and "attributes".
> The question of using the same multi-typed value data structure for metrics is a different one, in my opinion.
Is this a Java implementation detail or something defined in the specification?
> Is this a Java implementation detail or something defined in the specification?
Attributes are defined as such in the API
There's also #376 which is proposing actually extending the types of things that can be added as a span value (maps / arrays).
I think in general this speaks to their usage as well: span attributes are generally metadata to provide more context on the span, while metric labels are used to narrow down or scope the value of a metric.
It's certainly true that there's a subset that applies to both labels and attributes, but we'd probably restrict spans utility unnecessarily by coercing span attributes to strings.
Around 18 months ago (shortly before I joined), OpenCensus and OpenTracing leadership teams were in the early stages of forming OpenTelemetry, and there were some terminological disagreements. The term "Tag" was vexing because it was used in both projects: OpenTracing Tags are currently known as OTel Span "Attributes", OpenCensus Tags are currently known (approximately) as OTel "Baggage". Neither OpenTracing nor OpenCensus used "Label" in relation to Spans or Traces.
In Metrics, OpenCensus uses "Label". Prometheus uses "Label". DataDog's dogstatsd uses "Tags" to mean strings which may or may not have key=value structure. The term "Metric Attribute" is not found in relation to observability, but you could find it defined in relation to, say, "business metrics".
In my view, the term "Attribute" was selected as a compromise, a term that won simply for being available and the least polluted. The term "Tag" was too confusing, and "Label" had connotations indicative of a metrics system.
Now we are left with two terms, "Label" and "Attribute".
One thing that OpenTracing did, which it brings to OpenTelemetry, is the separation of API and SDK. We specify our tracing and metrics APIs in _semantic_ terms because we want instrumentation to have meaning, but so far we have used the terms "Tag", "Label", and "Attribute" as near synonyms.
We have identified performance reasons why a metric system should only deal in string-valued labels/attributes, and we are aware that all legacy metrics systems deal in string-valued labels/attributes, but I do not believe we have found any semantic reasons why metric labels are different from span attributes.
There is a clear desire to translate span events into metric events. We are forced to address the question of what happens when creating a metric event for the duration of any span, with all of its attributes as metric labels. There is not a meaningful distinction here, I think, only a performance distinction. I think we could choose any of the terms "Label" or "Attribute" or "Tag", but we should probably choose a single term, to benefit the user.
We can define rules for translation from structured values into string values. We should probably recognize that some values, including list- and map-valued attributes, are difficult to represent as strings. These could be defined as "unrepresentable" for the purposes of metrics export, or something similar. For all the scalar values, however, it should not be a problem to translate structured values into string values. There is not a meaningful distinction to be had between string-valued "3" and number value "3", IMO.
Besides, we face this problem already for metrics exporters that export OpenTelemetry Resource attributes as metric labels, which are structured values in OTLP. There is only a problem translating structured values into string-valued labels when they are lists or maps.
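The translation rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the OTel-Go implementation: `labelValue` and `toLabels` are hypothetical names, attributes are modeled as a plain `map[string]interface{}`, and anything that is not a string, bool, int64, or float64 is treated as "unrepresentable" and dropped.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// labelValue converts a scalar attribute value into its string form.
// The second result is false for values this sketch treats as
// "unrepresentable" (lists, maps, and any other non-scalar type).
// Hypothetical helper, not part of any OpenTelemetry API.
func labelValue(v interface{}) (string, bool) {
	switch x := v.(type) {
	case string:
		return x, true
	case bool:
		return strconv.FormatBool(x), true
	case int64:
		return strconv.FormatInt(x, 10), true
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64), true
	default:
		return "", false
	}
}

// toLabels keeps the attributes that have a string form and reports the
// keys of those that were dropped, sorted so the result is deterministic.
func toLabels(attrs map[string]interface{}) (map[string]string, []string) {
	labels := make(map[string]string, len(attrs))
	var dropped []string
	for k, v := range attrs {
		if s, ok := labelValue(v); ok {
			labels[k] = s
		} else {
			dropped = append(dropped, k)
		}
	}
	sort.Strings(dropped)
	return labels, dropped
}

func main() {
	attrs := map[string]interface{}{
		"http.method":      "GET",
		"http.status_code": int64(200),
		"retry":            false,
		"peers":            []string{"a", "b"}, // list-valued: no string form here
	}
	labels, dropped := toLabels(attrs)
	fmt.Println(labels["http.status_code"], labels["retry"], dropped)
}
```

Whether an unrepresentable value should be dropped, rejected with an error, or serialized in some agreed form is exactly the behavior that would need to be specified; dropping is just the simplest choice for the sketch.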
Stepping back, we can't use one term for every observability signal (Spans: Attributes, Metrics: Labels) or we'll run out of words. We could look at existing logging systems to see whether they prefer "Tag" vs. "Label" vs. "Attribute". As these are truly near synonyms, I'll just state my preference: I think "Label" is nicer than "Attribute" because it is shorter, and I think "Label" is nicer than "Tag" because it is less polluted.
We discussed this in the Metrics SIG last week. No one was opposed to changing "Label" to "Attribute", but it's for the sake of velocity more so than for a terminological win.
For reference, in the Java ecosystem Micrometer uses the term "Tag" for additional information on a metric
There was a long discussion about this at the spec SIG today. There are two questions here:
(1) Should we use the same word in both places? General agreement here seemed to be "yes" - this is one part of the spec that we expect many developers to need to interact with as they're adding instrumentation to their code, and the proliferation of terms creates confusion. @bogdandrutu is concerned about overhead in languages that require allocations for objects.
(2) Which word should we converge on?
Pros for "Label":
Pros for "Attribute":
Tentative plan is for @jmacd to drive this, and to let everyone mull over it and then re-discuss (and possibly vote) at the meeting next week.
I think using the same concept across logs/metrics/traces would be great. I understand that a certain pillar might have restriction on the type/value, but that can be covered by the implementation. The key benefit I can see is that it makes things less confusing to the users.
So I vote for having a single, consistent name across logs/metrics/traces. I would prefer either Attribute (my 1st choice, as it is already well established in the tracing API) or Tag (my 2nd choice). Label doesn't seem to be widely adopted outside metrics world.
For C#/.NET, "Attribute" has a special meaning (as does "Property"), and for historical reasons the current tracing API (Activity/ActivityContext API) has to use a different name, SetTag.
During this discussion, @reyang pointed out that there is a name conflict with "Attribute" in .NET and that that SIG has chosen the term "Tag". The term "Property" was also mentioned in this context.
I believe from a Metrics perspective, either "Tag" or "Label" is the preferred term. We left this discussion with action items for the group: (1) think about this topic, it's a Big Deal from the user's perspective, (2) will someone stand up to defend the term "Attribute"?, (3) discuss the performance implications (@bogdandrutu), (4) find a path forward (assigned to me).
For (3), speaking from the OTel-Go and OTLP perspectives, this change should be a pure rename.
In OTel-Go there is an existing type label.KeyValue that underlies span attributes and metric labels. The existing Metrics API already takes label.KeyValue, which we will continue to use. Current attempts to use a list-valued or map-valued KeyValue as a metric label will fail, probably (currently) in an unspecified way, because it would take relatively-expensive special support to make this happen.
In OTLP there is no need to change the protocol: metrics can continue to use string-valued labels while spans use structured label values. In other words, we can continue to make a performance distinction between metric labels and other kinds of label without having a terminology distinction.
For (4), I'd like to propose some kind of popular vote between (1) Attribute, (2) Label, (3) Tag. We shouldn't rule out Tag simply because it was confusing over a year ago, because--clearly--removing the term Tag did not help with user confusion.
From a user education perspective, I'm going to throw my hat in the ring on standardizing around Attribute.
Conceptually, as pointed out upthread, there is little difference between trace and metric attributes. In practice, the details are often unremarkable for the end-user of a telemetry system, as the principle of most obvious action applies. In a trace analyzer, I would reasonably expect to perform queries over attribute values of the appropriate type (for example, querying for values of an attribute key where the value was greater than some integer value I define) but no such utility exists for metric attributes, so it wouldn't necessarily be confusing that 'attributes' don't let me do the same things in both places.
In addition, 'attribute' has a better claim to the actual concept we're describing here by dictionary definition. While they all mean roughly the same thing, 'attribute' explicitly describes a causal relationship (when used as a verb) and an inherent quality (when used as a noun) in a way that neither 'tag' nor 'label' does. Attributes, for both metrics and traces, are strongly associated with their telemetry and benefit from the causal relationship (the measurement of the work being performed by a span is attributable to its host, etc., in a similar way that a metric instrument's counter might be).
Finally, while 'attribute' has more characters, there exist sufficient abbreviations (attr, for example) to allow maintainers or external contributors to create shorthand or convenience functions. Also, it's 2020, most people have tab complete.
@rakyll as the original poster, would you offer your opinion?
Personally I will go with +1 for the Attribute alternative (given what @austinlparker laid out, and also because it's a well-established term, as @reyang pointed out).
Do we realize the volume of breaking changes that needs to happen if we decide to rename attributes to labels or vice versa? The term is in our APIs, in the OTLP protocol specification (and its field names, which are at least part of the build contract), and in the Collector configuration. These breaking changes will affect people who already use our software in production. I am very worried that we want to make a change like this so late in the game.
BTW, Logs already use "attributes" like traces.
> For (3), speaking from the OTel-Go and OTLP perspectives, this change should be a pure rename.

@jmacd this is only true if "attributes" for metrics and "attributes" for traces are different things, which I don't think we will want. If we want metrics to have the same "attributes" as traces and logs, then it is a breaking change both on the wire and in the field names (the Protobuf message fields).
It would be costly, but it is better to pay that cost before claiming that we've reached a stable state than to live with the discrepancy long afterward, likely forever. By way of version numbers, we're still at liberty to break every user.
Which wire format includes words that would need to change?
> By way of version numbers, we're still at liberty to break every user.

That is not correct, at least for parts of OpenTelemetry. We declared the OTLP Trace protocol stable, and the attributes' wire format is part of the guarantees. We cannot make changes to this for 12 months. Field names are a gray area, but the wire format is part of the promise. If we just rename attributes to labels, maybe that would work, although it is still not clear whether people consider names in Protobuf declarations part of the stability guarantees. We can't make any changes to the structure of the attributes, though; only a simple rename may be possible.
I stand corrected. I was thinking only in terms of the derived artifacts I know well, such as the Go libraries.
For Protocol Buffers, I suppose you're considering not just the wire data (with field tags remaining as they are) but the impact on generated code, which makes use of the field names for language-specific fields, functions, methods, and the like.
> For Protocol Buffers, I suppose you're considering not just the wire data (with field tags remaining as they are) but the impact on generated code, which makes use of the field names for language-specific fields, functions, methods, and the like.
Correct.
IMO we absolutely can't make a breaking change to the trace wire format.
Renaming a field in Protobuf (e.g. renaming it from attributes to label) does not change the wire format, but it is still a breaking change for everyone who consumes the Protobuf file definition. This is less severe than wire format changes since it primarily affects us (OpenTelemetry contributors) and not end users, but it can also affect other developers who have custom builds of the Collector or otherwise implement the protocol based on our published Protobuf files.
Again the field names are a gray area, it is not explicitly specified whether the stability guarantees apply to them. We likely should explicitly cover field names in the guarantees, but technically until we do we have a reasonable excuse to say they were not covered. I would only be comfortable doing that if I knew this does not have a significant impact on others (outside OpenTelemetry).
The other aspect is the Collector's config file. It contains settings that use the words "attribute" and "label" in the setting names. Changing these would be a breaking change for end users. This is a bit less severe than wire compatibility since the Collector is only announced as Beta, and such a change is allowed under the "beta" definition. It is still going to be painful for the users who already run the Collector in production.
To add to the above. We can make a change to metrics wire format because they are not declared stable. So, we can make metric labels look on the wire like trace attributes, but we can't do the opposite.
@tigrannajaryan there was no requirement to change trace format, only possibly the name to call that field tags/labels. The name can be changed because only the OTLP/gRPC is declared stable and for that it does not matter.
> Renaming a field in Protobuf (e.g. renaming it from attributes to label) does not change the wire format, but it is still a breaking change for everyone who consumes the Protobuf file definition.
Guaranteeing that Protocol Buffer field names won't change weakens the benefit of the system; that's the raison d'être for the field tags. That's what allows you to name a field something like "thing_obsolete" or "thing_deprecated" when you introduce a new field (with a distinct tag, but perhaps reusing the name "thing") to replace it. The system intends the tags to remain stable (hence the longstanding convention of writing comments before messages like "Next tag: 6") but allows the names to drift. It's unfortunate if we're making promises that force giving up on that freedom.
> Guaranteeing that Protocol Buffer field names won't change weakens the benefit of the system; that's the raison d'être for the field tags. That's what allows you to name a field something like "thing_obsolete" or "thing_deprecated" when you introduce a new field (with a distinct tag, but perhaps reusing the name "thing") to replace it. The system intends the tags to remain stable (hence the longstanding convention of writing comments before messages like "Next tag: 6") but allows the names to drift. It's unfortunate if we're making promises that force giving up on that freedom.
I agree. We still have the option to exclude field names from our guarantees, at least until we declare OTLP/JSON protocol stable (because field names are on the wire for OTLP/JSON). I am going to go ahead and make this explicit in the proto repo for now, so that we are free to make field name changes.
FYI, regarding Protobuf field name guarantees: https://github.com/open-telemetry/opentelemetry-proto/pull/225
This was discussed in today's maintainer's meeting, and I've been meaning to synthesize a response to the comments above. Thank you all for your comments.
We are in a position where it is simply too late to change the name "attribute" in the tracing API, since even changing the name of protocol fields results in a breaking change for JSON parsers that are already marked stable.
In the metrics world, the term "label" is well established, but we have come to realize that the term "tag" is also well established for metrics. Like tracing was a year ago, we have a conflict involving the word "tag", and I think we should choose the same outcome for metrics today. Let's choose the term "attribute" for the sake of consistency with OpenTelemetry and the need to find common terminology.
For performance and compatibility reasons, we also do not wish to change the OTLP metrics wire protocol. This means that with the exception of JSON-format payloads written for OTLP v0.5, a terminology change from "label" to "attribute" will not be a breaking change for OTLP metrics. Therefore, we are resolving to make this terminology change in OTLP metrics v0.6: label becomes attribute.
The specification will be rewritten to state that although the attribute concept is universal, the exact representation of attributes varies by telemetry signal. Metric attributes will continue to support only string values (as a performance requirement), and we will specify how metrics SDKs behave when presented with list-value and map-value attributes. This will not, in general, be a problem because exporters are already faced with translating OTel resources (which are structured) into metric labels.
Lastly, I want to acknowledge @austinlparker's point above:
> 'attribute' has a better claim to the actual concept we're describing here by dictionary definition
I was trying to avoid using the dictionary definition for "attribute" here, _a quality or feature regarded as a characteristic or inherent part of someone or something._ I think we can go too far with this sort of reasoning--for example I'm not sure how I feel about SetAttribute() redefining something that is characteristic or inherent. The word "label" better supports the idea of changing the value of an attribute, and also implies "last-value-wins" semantics. In that sense, ironically, "attribute" is more in-line with metrics (which has no SetAttribute concept) and "label" is more in-line with tracing (because it has a SetAttribute concept).
I'll close this and open a new issue stating that we will rename "Label" to "Attribute" and "LabelSet" to "AttributeSet" in the metrics API and SDK.
Please refer to #1113. Thanks @rakyll for filing this issue in the first place.
@jmacd I agree with everything you wrote except I have some concerns with this bit:
> The specification will be rewritten to state that although the attribute concept is universal, the exact representation of attributes varies by telemetry signal.
I think this can be a source of confusion. It probably depends on how exactly you define the 2 different "attributes" in the spec, in the API and the protocol, so it would be good to see some drafts of what your proposal looks like before we fully commit to it.