Opentelemetry-specification: Specify the behavior of the Tracer APIs in the absence of an SDK

Created on 4 Jul 2020 · 18 Comments · Source: open-telemetry/opentelemetry-specification

Because we do have some no-op behavior in the API, to enable no-SDK context propagation, we should precisely specify how the bare API should respond to requests for new spans.

Cases to specify:

1) There is an existing valid span-context (presumably populated by a propagator from an incoming request)

  • does the API generate a new span id, or continue to use the existing span-context, propagating transparently?

2) There is no existing valid span-context (presumably the instrumentation is creating a new trace, for some "background" sort of process)

  • does the API generate a new trace id/span id, or create a no-op "invalid" span context?

api p1 required-for-ga trace

All 18 comments

I'm wondering how propagation works without the SDK. Would users manually call the propagators and insert the results into headers? My current mental model is that propagation is enabled by instrumentation (e.g. instrumentation of the Apache HTTP client) and that instrumentation requires the SDK, so I want to confirm what no-SDK propagation looks like in practice. Never mind, I have been living too much in the auto-instrumentation world; actual instrumentation only depends on the API.

For 1) I think we definitely have to propagate the incoming span context unchanged; otherwise downstream nodes with an SDK would link to a phantom span ID, and dependency graphs, at the least, would look weird. That doesn't mean the API can't generate a span ID that is used only in process and never propagated; the use case we would expect for that is injecting it into logs. But I think this currently wouldn't work, since we always propagate currentSpan.

I think these are very important questions we should solve.

This is closely related to #208 "Default Tracer propagating SpanContext by default or not"; please also read the discussion there (that issue was closed without a proper conclusion when most of context propagation moved out of the tracer, but as this issue shows, there are still unsolved problems).

Also related are #520 "Default API behavior without SDK" (I think that issue is a bit too generic / was created in a pre-separate context era though) and #428 "Default global propagators".

Also note that when you say "valid span-context", AFAIK the spec is still missing a definition of what constitutes an (in)valid spancontext. That's not a trivial question, given the possibility of custom non-W3C propagators. E.g. do both trace and span-id have to be non-zero? Even if both are zero and there are trace-state entries, maybe that is enough for some propagators to link a trace?

Interestingly, one use case of a tracestate-entry would be to store a per-system/per-tenant/per-backend "duplicate" of the parent span ID so that we can link traces despite phantom span IDs in-between. I explained this in https://github.com/open-telemetry/opentelemetry-specification/issues/366#issuecomment-580235961.

My answers here:

  1. Never return a null Span (even if only the API is installed), because it forces everyone to write code that checks for null whenever the current Span is used. This is why the default implementation should not be one that returns a null Span. If the language does not impose a null check when interacting with the Span, then a null/nil Span can be returned.
  2. "There is an existing valid span-context (presumably populated by a propagator from an incoming request)". We should follow the W3C trace-context spec and do the "minimum" part: https://www.w3.org/TR/trace-context/#design-overview. We propagate without participating in the trace (no mutation of any of the headers).

    • Question: Should we do validation? I think every propagator should do its own validation. If running the propagator produces a non-null SpanContext, we propagate it; if the result is a null context, we do the same thing as in the next case.

  3. "There is no existing valid span-context (presumably the instrumentation is creating a new trace, for some "background" sort of process)". I think in process we should return a no-op Span with an invalid SpanContext (traceid = 0 / spanid = 0 / traceFlags = 0 / traceState = ""). When propagating, we should not propagate any header in the case of the W3C propagator; for other propagators such as B3 we should do what their specs suggest (most likely no headers propagated).

So I think the default API implementation must be very minimal, with very limited "knowledge" of headers, of how to generate IDs, and so on. One argument against generating a traceID, for example, is that a traceID can contain non-random parts (as in AWS X-Ray), so the default implementation should not need to know about all these things.
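To make points 2 and 3 concrete for the W3C case, here is a minimal sketch in Python of "propagate without participating". It is not taken from any OpenTelemetry implementation: the `extract` and `inject` functions and the `carried` dictionary are hypothetical names, header keys are assumed to be lower-cased already, and only the fixed version-00 `traceparent` layout is accepted, which is a simplification of the W3C parsing rules.

```python
import re
from typing import Dict, Optional

# traceparent layout: version "-" trace-id "-" parent-id "-" trace-flags
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def extract(headers: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the raw trace headers if traceparent is valid, else None."""
    traceparent = headers.get("traceparent", "")
    match = _TRACEPARENT_RE.fullmatch(traceparent)
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C trace-context spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    carried = {"traceparent": traceparent}
    if "tracestate" in headers:
        carried["tracestate"] = headers["tracestate"]
    return carried


def inject(carried: Optional[Dict[str, str]], outgoing: Dict[str, str]) -> None:
    """Forward the headers verbatim; write nothing when there is no valid context."""
    if carried is None:
        return
    outgoing.update(carried)
```

With a valid incoming request the outgoing headers are byte-for-byte the incoming ones, so no new span ID appears in the trace; with no valid context, nothing is injected at all.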

@Oberon00:

Also note that when you say "valid span-context", AFAIK the spec is still missing a definition of what constitutes an (in)valid spancontext.

A SpanContext is invalid if any of its components is invalid. So if the traceID or spanID is invalid, then the SpanContext is invalid.
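As a rough illustration of this rule, the following Python sketch treats a zero trace ID or span ID as invalid. The `SpanContext` class, its field names, and `INVALID_SPAN_CONTEXT` are hypothetical and only mirror the fields named in this thread; this is not the API of any particular language implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: int = 0
    span_id: int = 0
    trace_flags: int = 0
    trace_state: str = ""

    def is_valid(self) -> bool:
        # Assumption for this sketch: an ID component is invalid when it is zero,
        # and one invalid component makes the whole context invalid.
        return self.trace_id != 0 and self.span_id != 0


# The all-zero context described above (traceid = 0, spanid = 0, flags = 0, state = "").
INVALID_SPAN_CONTEXT = SpanContext()
```

Note that under this rule a context carrying only trace-state entries is still reported as invalid, which is exactly the limitation for custom propagators raised elsewhere in the thread.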

Proposal

I would rephrase all the previous text in a simple way:

  • The default tracer span builder should:

    • Return a Span with the same SpanContext as the parent if a parent exists (regardless of whether the parent is remote or local).

    • Return a Span with an invalid SpanContext (traceid = 0 / spanid = 0 / traceFlags = 0 / traceState = "") if a parent does not exist. If the language does not impose a null check when interacting with the Span, then a null/nil Span can be returned.

Propagators should do what they do independently of the implementation used for the API.
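The proposal could be sketched roughly as follows in Python. All names here (`DefaultTracer`, `NoOpSpan`, `start_span`, `get_context`) are hypothetical and chosen for illustration; the sketch only shows the two branches of the proposal, not any actual OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: int = 0
    span_id: int = 0
    trace_flags: int = 0
    trace_state: str = ""


class NoOpSpan:
    """Span that records nothing and only carries a SpanContext."""

    def __init__(self, context: SpanContext) -> None:
        self._context = context

    def get_context(self) -> SpanContext:
        return self._context

    def set_attribute(self, key: str, value: object) -> None:
        pass  # dropped: there is no SDK to record it

    def end(self) -> None:
        pass


class DefaultTracer:
    """API-only tracer: never generates IDs and never returns None."""

    def start_span(self, name: str, parent: Optional[SpanContext] = None) -> NoOpSpan:
        if parent is not None:
            # Parent exists (remote or local): reuse its SpanContext unchanged.
            return NoOpSpan(parent)
        # No parent: hand back a span with the all-zero, invalid SpanContext.
        return NoOpSpan(SpanContext())
```

Because the tracer never inspects the parent's validity, an invalid parent context is simply copied, and the decision whether to inject it stays with the propagator.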

A SpanContext is invalid if any of its components is invalid. So if the traceID or spanID is invalid, then the SpanContext is invalid.

That's the implementation in Java. However, I think that's very limiting for custom propagators, which might be able to make do with just a tracestate (which could come from anywhere, not necessarily the W3C tracestate header).

Never return a null Span

This seems sensible for most languages, but some (like Objective C) may actually be better served with null (or empty optional, etc).

If the language does not impose a null check, indeed we can return null. The only thing that is important is to not make any Span interaction an if condition that checks for null :)

That's the implementation in Java. However, I think that's very limiting for custom propagators, which might be able to make do with just a tracestate (which could come from anywhere, not necessarily the W3C tracestate header).

As mentioned in that long comment, the propagator should implement the specs for that specific propagation format. In case of w3c that is what is defined as invalid.

@tedsuo In the discussion in #208, many moons ago, you vehemently disagreed with what @bogdandrutu is proposing here. Do you still feel the same?

@bogdandrutu

As mentioned in that long comment, the propagator should implement the specs for that specific propagation format. In case of w3c that is what is defined as invalid.

I agree that the W3C propagator should consider this invalid. But the decision whether a span context is invalid is currently not up to the propagator; the span context itself decides it, so the criterion is usually hardcoded in the API artifact.

Having the propagators decide would make sense, though (IsValid would then become an additional, constructor-settable field on the SpanContext, or we would deal with Optional, whatever fits the language).
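One possible shape of that idea, sketched in Python under stated assumptions: `is_valid` is a plain constructor argument, and the two extraction helpers (`extract_w3c_like`, `extract_tracestate_only`) are hypothetical stand-ins for a W3C-style propagator and a custom propagator that can link traces from a tracestate entry alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: int
    span_id: int
    trace_state: str = ""
    # Set by whichever propagator built the context, not derived from the IDs.
    is_valid: bool = False


def extract_w3c_like(trace_id: int, span_id: int, trace_state: str = "") -> SpanContext:
    # A W3C-style propagator requires both IDs to be non-zero.
    valid = trace_id != 0 and span_id != 0
    return SpanContext(trace_id, span_id, trace_state, is_valid=valid)


def extract_tracestate_only(trace_state: str) -> SpanContext:
    # A custom propagator that treats any non-empty tracestate as enough to link.
    return SpanContext(0, 0, trace_state, is_valid=bool(trace_state))
```

The same all-zero IDs are then valid for one propagator and invalid for another, which is the flexibility a hardcoded check in the API artifact cannot offer.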

I would create a new issue for this discussion, but it seems closely related to this issue.

EDIT: Actually, your proposal is only talking about existence, not validity, which seems fine to me! So an invalid span context would just be copied, and the Inject of the propagator can decide whether to actually inject or to consider the context empty. 👍

@jkwatson @tedsuo @bogdandrutu I think this is not necessarily a disagreeing proposal. The important question would be: What are default propagators? If they are still no-op, then I think everything is fine.

EDIT:

we propagate without participating in the trace

OK, this part of the comment disagrees, but the resulting proposal does not.

@Oberon00 I thought that we had all agreed that the w3c propagator would be the default for the API across all languages, but I can't find this in the spec anywhere. Am I missing it, or have we really not specified the default propagator for the API yet?

@jkwatson I think that was a mutual agreement but never documented :)

@jkwatson I think you already found #428 "Default global propagators", just linking it here.

I am holding off on writing up a decision until #428 has been resolved, as I think this will depend on that decision. @carlosalberto ... ! :)

@Oberon00 I thought that we had all agreed that the w3c propagator would be the default for the API across all languages, but I can't find this in the spec anywhere. Am I missing it, or have we really not specified the default propagator for the API yet?

@jkwatson I think that was a mutual agreement but never documented :)

Was the agreement that this would be the _API_ default behavior, or the _SDK_ default behavior? In my mind, if an SDK is not configured we should do nothing in all cases.

In JS, because we do not have access to thread local storage or similar, we cannot propagate in-process without SDK set up anyway. Our context-propagation mechanism is not "free" and is platform specific. This means a no-op propagator is the only sensible option in the absence of the SDK.

Edit: I understand now. I have to say that, as the JS maintainer, I am strongly against propagating context when no SDK is set up. Because of the lack of thread-local storage, the context propagation mechanisms in Node are relatively expensive and not enabled by default. There is also none at all on the web, which means we had to implement context propagation in the SDK.

@dyladan I don't see where that PR says anything about context. It specifies the default implementation for the "StartSpan" operation, so I don't see where the Context part comes into play for JS.

Hmm... I think I misunderstood the PR. It wasn't clear to me that the SpanContext passed to StartSpan was provided by the user. I thought this was meant to happen automatically whenever a request came in with a traceparent header.
