I was looking at the JS implementation of the tracing api as we move towards GA, and the tracestate is underspecified. There currently exists no direction on what API methods to interact with tracestate should exist, if any. The JS implementation currently as get(key): string | undefined, set(key, value): void, unset(key): void, and serialize(): string methods. There are a few problems with this:
As a solution, I recommend we specify that there is no public API to modify tracestate, and that the default sdk behavior is to propagate it unmodified for now. This leaves us open to define behavior in the future if we desire, even after GA release. If we do not specify this now, then it will be impossible to specify later without a breaking change in some SIGs.
I still think SpanContext is a part of the tracing API that is not very well designed. I also don't think it fits very well with the post-OTEP66 Context. Copying myself from https://github.com/open-telemetry/opentelemetry-specification/issues/525#issuecomment-605672217:
I feel like with the OTEP 66 Context API there is an opportunity to elegantly rectify this by removing the whole SpanContext from the API and hiding it away in the context. See
- open-telemetry/oteps#58 (aforementioned write-up for issues with SpanContext design; pre-OTEP66) by me
- https://github.com/open-telemetry/opentelemetry-specification/issues/510#issuecomment-600753511 ("Consider using Context as the unique way to specify parenthood" would remove the only remaining actual usage of SpanContext)
- Two older, pre-OTEP66 discussions:
- https://github.com/open-telemetry/oteps/issues/73 "Proposal: Span Context == Span" by @jmacd
- https://github.com/open-telemetry/oteps/issues/68 "Proposal: remove Span" by @yurishkuro
(Note that #510 is still open required-for GA, P2)
As for the TraceState part of the SpanContext: With SpanContext being required to be a final/sealed class (open-telemetry/oteps#58) and in the API spec, TraceState is the only place to store custom information with the SpanContext, which for example, the Dynatrace propagator and also the Dynatrace exporter requires (the propagator reads and writes routing information for the Span which the exporter uses). Note that this would not even be solvable otherwise with a fully custom SDK implementation.
@andrewhsu are you sure it's right to classify it as release:after-ga since it's already there in some form but underspecified?
@dyladan mentioned in the issue description that it would make sense to explicitly specify it as not part of the API, which should hopefully be simple and uncontroversial:
As a solution, I recommend we specify that there is no public API to modify tracestate, and that the default sdk behavior is to propagate it unmodified for now. This leaves us open to define behavior in the future if we desire, even after GA release. If we do not specify this now, then it will be impossible to specify later without a breaking change in some SIGs.
simple and uncontroversial
I just mentioned in my above comment that at least in Dynatrace's usage of the SDK, we do need some way to attach custom information to the SpanContext that is accessed in the propagators and the exporter. Whether this custom information is stored in the TraceState (the custom information is indeed stored and intentionally propagated as a W3C tracestate header entry) or only for manual use by the propagator is a secondary concern, but some way to add custom information to the SpanContext is required.
I would agree that the TraceState should not be accessible to instrumentations, but the current design of OpenTelemetry does not really allow this, with SpanContext being final/sealed and created in the SDK.
If the SpanContext was non-final and had an overridable propagate(newSpanId) method, then extractors (or the SDK in case of root spans) could create it and injectors/extractors could cast to their custom derived classes (cf. also open-telemetry/oteps#58).
If #510 (Consider using Context as the unique way to specify parenthood) was implemented, then we could remove the (TraceState from) SpanContext and store custom information in the Context, but only if also the exporter could get access to the full Context in which a Span was created, which may also be difficult to implement.
I also think this is definitely required-for-ga though.
From the spec meeting: moving it to GA to specify this
per spec SIG meeting: Before GA we need to do one of the following:
@open-telemetry/technical-committee Can this get a priority? I'd vote for P1 since this affects an API in SpanContext, a class that is very central in the Tracing API.
Trying to keep scope as minimal as possible, do I understand the problem correctly?
SpanContext which should be accessible by propagators and exporters.TraceContext is underspecified.If above statements are true, then following are my answers to that.
We currently specify that TraceState follows W3C specification. That W3C specification says:
The main purpose of the tracestate HTTP header is to provide additional vendor-specific trace identification information across different distributed tracing systems
So TraceState is the correct place for "arbitrary vendor-specific information" and thus should be used to satisfy the need from point 1. above.
W3C specifies mutating operations on TraceState. It is also natural to assume that there should be an API to read key-values pairs from TraceContext.
I think that by delegating to W3C specification OpenTelemetry does sufficient work of specifying TraceContext API and thus point 2. from above is invalid.
Regarding the concern raised in the original comment that
tracing system should not need access to arbitrary keys. only the otel key should be readable/writeable
W3C specifies that mutating operations are allowed to operate on _arbitrary_ keys. Indeed, by their nature (they are _vendor specific_), Otel cannot limit access to any keys TraceContext.
Taking all that into account, I think there is no need for any further actions regarding TraceContext API. Related concerns about whether this API should be available in Trace API or only to Trace SDK, and whether SpanContext should be part of the public API or not, can be addressed separately.
Taking all that into account, I think there is no need for any further actions regarding TraceContext API.
I do think we should at least list the operations. We do not need to specify them in detail, but at least we need to make clear what is needed. AFAIK, some languages allow only accessing TraceState as a raw string. They would then at least need to provide utility functions.
Also, TraceState is immuatable in OpenTelemetry, by SpanContext being immutable. This is not a contradiction (you can manipulate a SpanContextBuilder before creating a (child) SpanContext), but might warrant some additional API design hints.
I do think we should at least list the operations
You don't think that linking to that list in W3C specification is enough?
No, I don't think that's enough, as is evident by the diverging APIs in different languages. We don't have to describe each operations in detail (link to W3C should be enough) but we should list them at least.
Also I wonder if we should apply all limitations (like dropping old keys if it gets too long) only at process boundaries or already in-process. I would think only at process-boundaries, as TraceState may also be used by non-W3C propagators which may different limitations / encodings of the TraceState or special keys therein.
Ok, fair enough.
Then the solution to this issue should be an amendment to the spec, which copies a list of operations on TraceContext from W3C spec and notices that all validation/truncation requirements of W3C spec MAY happen as late as right before putting TraceContext on the wire.
@Oberon00 @dyladan @open-telemetry/specs-trace-approvers agreed?
Seems fine to me. The only thing that still needs to be specified is if our API to tracestate should allow you to specify your key. set(value), get() (key configured on the TracerProvider) or set(key, value), get(key) (key decided at call time).
As vendors can add arbitrary keys, Otel cannot decide on them.
I'm more and more leaning towards the larger but more future-proof change of saying that:
getTraceState(Context) -> string or parsed form that is available in the same package as the W3C tracestate propagator. Whatever package that is (it might be in the API IIRC, cf. #428), it MUST NOT depend on the SDK (it will typically depend on API, propagator base and maybe some propagator utility package).I must admit I don't see a need for such significantly more elaborate proposal. But, @Oberon00, if you have a clear vision of such new proposal, are you willing to reassign this issue to yourself and submit a PR? I was assigned this to ensure it is moving forward. Seems that you have the needed motivation to see it through.
@iNikem
I must admit I don't see a need for such significantly more elaborate proposal
I think for just solving this P1 issue, the simpler proposal would also work fine. But the more elaborate proposal would additionally solve the flexibility bottleneck that is SpanContext (as discussed in https://github.com/open-telemetry/opentelemetry-specification/issues/759#issuecomment-669739493). It will be significantly more work however 馃槥
I think I will try to get at least #510 resolved to not completely close the door for this in the future and in parallel we should go forward with your suggestion https://github.com/open-telemetry/opentelemetry-specification/issues/759#issuecomment-677614564 of just specifying an API for SpanContext.
If I/someone else still proposes a PR in time, we can easily adjust that API to operate on Context instead of SpanContext as a small (yet breaking!) spec change together with the other (larger) changes I proposed.
During the Monday maintainers meeting, @iNikem committed to providing a PR for the simple proposal this week. @Oberon00 does that work for you?
We need to solve this by the end of the week to hit our Tracing spec RC milestones.
Most helpful comment
Ok, fair enough.
Then the solution to this issue should be an amendment to the spec, which copies a list of operations on
TraceContextfrom W3C spec and notices that all validation/truncation requirements of W3C spec MAY happen as late as right before puttingTraceContexton the wire.@Oberon00 @dyladan @open-telemetry/specs-trace-approvers agreed?