Opentelemetry-specification: Proposal: Add option to export processing spans

Created on 5 Dec 2019 · 10 Comments · Source: open-telemetry/opentelemetry-specification

Currently, exporters in OTel only get access to spans that are already finished in order to send them to the backend.
This proposal adds an option to also allow exporting spans that are currently processing, i.e., already started but not yet ended. They might also be referred to as open, unfinished, running or "in-flight" spans. (I'd like to avoid the term _active_ here since it has different semantics in OTel.)

This allows the backend to display long-running spans and provide details about them in realtime. It further allows for debugging the instrumentation code since it will reveal spans that are started but not being ended.
The backend can also use this information for performance improvements because it receives information about spans ahead of time. For example, it can keep dependent parent or child spans in memory while currently processing spans are still expected to arrive, and otherwise persist them to disk.

OpenCensus had functionality like this, used in its zPages feature.


This would have the following implications on the implementation:

A SpanProcessor implementing this feature would take Spans provided via the span start hook and provide those to the exporter. The current default processors only use the span end hook, on which they hand the span over to the exporter right away (simple span processor) or batch it first (batching span processor).

The information on whether a span is processing or finished could, for example, be indicated by:

  • A) adding a field ProcessingStatus (or LifecycleStatus?) to the Span object which can have the values processing and finished, or
  • B) without any adaptation, by simply checking if the (mandatory) span end time is set - if it is not, the span is not yet finished, or
  • C) extending the SpanExporter interface by a method in addition to the existing export method, something like exportProcessing.

I would prefer option C since this provides the clearest distinction and spares the need for filtering all provided spans in exporters that don't want to handle processing spans - they could just make exportProcessing a no-op.
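Option C could look roughly like the following. This is only a minimal sketch under stated assumptions: SketchSpan, SketchSpanExporter, ProcessingSpanProcessorSketch and RecordingExporter are hypothetical, simplified stand-ins and not the actual OTel SDK interfaces. Only the exportProcessing method name comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the SDK's span data (not the real OTel type).
final class SketchSpan {
  final String name;

  SketchSpan(String name) {
    this.name = name;
  }
}

// Option C: a second method next to export(). The default no-op lets exporters
// that do not care about processing spans ignore them without any filtering.
interface SketchSpanExporter {
  void export(Collection<SketchSpan> finishedSpans);

  default void exportProcessing(Collection<SketchSpan> processingSpans) {
    // Intentionally empty: exporters opt in by overriding this method.
  }
}

// Forwards spans to the exporter on start as well as on end (unbatched).
final class ProcessingSpanProcessorSketch {
  private final SketchSpanExporter exporter;

  ProcessingSpanProcessorSketch(SketchSpanExporter exporter) {
    this.exporter = exporter;
  }

  void onStart(SketchSpan span) {
    exporter.exportProcessing(Collections.singletonList(span));
  }

  void onEnd(SketchSpan span) {
    exporter.export(Collections.singletonList(span));
  }
}

// Example exporter that handles both kinds of spans and records what it saw.
final class RecordingExporter implements SketchSpanExporter {
  final List<String> log = new ArrayList<>();

  @Override
  public void export(Collection<SketchSpan> spans) {
    for (SketchSpan span : spans) {
      log.add("finished:" + span.name);
    }
  }

  @Override
  public void exportProcessing(Collection<SketchSpan> spans) {
    for (SketchSpan span : spans) {
      log.add("processing:" + span.name);
    }
  }
}
```

An exporter that only implements export() would silently drop processing spans in this sketch, which is the behavior the no-op is meant to provide.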

What's your opinion on a feature like that? I'd like to gather some feedback before I create an OTEP for it.

api after-ga trace

All 10 comments

I implemented something like this for LightStep that we called 'meta events' - they were treated like spans to our recorder, but they would give the backend information about when a span started and ended, and when the span context was injected or extracted. I generally like the idea of some kind of diagnostic information about a span being available to exporters.

I worry about lock ordering and the potential for deadlock when I read this proposal, though I agree that using the stop-time field as an indicator for completed spans is solid. AFAICT, OpenCensus did not implement z-pages using a span processor, it relied on internal logic.

I have some familiarity with an _internal_ Google implementation similar to OC z-pages, which is where this fear comes from. If there's a lock protecting the list of all live spans, you have to be careful when viewing z-pages or else stop the world.

@jmacd That's certainly a valid concern that we'd have to deal with in the implementations, yes.

@jmacd I do agree with you about the locking problem, but I think the current SpanProcessor supports that implementation. Add the ReadableSpan to a list of active Spans in onStart and remove it in onEnd; when needed, you can iterate over the list to read in-flight spans.
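The onStart/onEnd bookkeeping described here can be sketched as follows. The types InFlightSpan and InFlightSpanTracker are hypothetical stand-ins (InFlightSpan plays the role of ReadableSpan), and using a concurrent set instead of a list behind a single lock is an assumption of this sketch, chosen as one possible way to soften the locking concern raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the SDK's ReadableSpan.
final class InFlightSpan {
  final String name;

  InFlightSpan(String name) {
    this.name = name;
  }
}

// Tracks spans that have started but not yet ended.
final class InFlightSpanTracker {
  // Concurrent set backed by ConcurrentHashMap: iterating it does not block
  // threads that add or remove spans at the same time.
  private final Set<InFlightSpan> inFlight = ConcurrentHashMap.newKeySet();

  void onStart(InFlightSpan span) {
    inFlight.add(span);
  }

  void onEnd(InFlightSpan span) {
    inFlight.remove(span);
  }

  // Copies the current contents. The copy is weakly consistent: spans that
  // start or end while it is being taken may or may not be included.
  List<InFlightSpan> snapshot() {
    return new ArrayList<>(inFlight);
  }
}
```

The trade-off is that a reader never gets an exact point-in-time view, which is probably acceptable for a diagnostic page but may not be for every consumer.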

I think the standard SpanProcessor for the exporter pipeline should not support this, but if necessary a vendor can implement their own SpanProcessor that does report in-flight spans to the exporter framework.

Also this can be added later, because the framework that we built supports this.

@arminru I think you can use the presence of the end timestamp in the SpanData to determine if the Span is still in-process or finished. Does that solve your problem?

@bogdandrutu
The presence of the end timestamp would be one way of distinguishing processing from finished spans, yes.

I also wouldn't add it to the standard exporter pipeline. What I'm proposing is to add it to the spec so that the OTel implementations add a SpanProcessor which forwards these processing spans. If a user/operator is interested in them, they just have to plug that processor into their pipeline - it would be an opt-in feature, not the default.

Users might, however, plug a default exporter into the ProcessingSpanProcessor (we'll find a better name for that) so we'll have to make sure this works properly.
Option C mentioned above would require the exporter interface to be extended. A separate method for processing spans would be added to ensure that exporters neatly and efficiently handle both kinds of spans - the current/default exporters would just make exportProcessingSpans() a no-op and be fine. This would require a spec change for the exporter interface.
With Option B (which you just proposed as well) exporters would have to check each end timestamp (or some convenience method like isFinished()) first in order to know which ones to ignore, before they can rely on assumptions for finished spans.
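For comparison, the per-span check that Option B pushes into every exporter might look like the sketch below. TimedSpan and FinishedOnlyExporter are hypothetical types, and treating an end timestamp of 0 as "not ended" is an assumption made only for this illustration; only the isFinished() name is taken from the comment above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical span data. In this sketch only, endEpochNanos == 0 means
// that the span has not ended yet.
final class TimedSpan {
  final String name;
  final long endEpochNanos;

  TimedSpan(String name, long endEpochNanos) {
    this.name = name;
    this.endEpochNanos = endEpochNanos;
  }

  boolean isFinished() {
    return endEpochNanos != 0;
  }
}

// An exporter that only understands finished spans has to filter every batch
// before it can rely on assumptions that hold for finished spans.
final class FinishedOnlyExporter {
  final List<String> exported = new ArrayList<>();

  void export(Collection<TimedSpan> spans) {
    for (TimedSpan span : spans) {
      if (!span.isFinished()) {
        continue; // processing span: ignore it
      }
      exported.add(span.name);
    }
  }
}
```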

@arminru I see your point now. I think this is an edge case, and I'm not sure we need to design the API with this mistake in mind. I'm not saying this mistake can't happen, but in my mind every vendor will implement a "SpanProcessor build()" function that returns the SpanProcessor needed for that vendor; if the vendor does not support in-flight spans, it should not return a processor instance that passes in-flight spans.

In an ideal case most of the users should do:

  TracerFactorySdk tracerFactorySdk = OpenTelemetrySdk.getTraceFactory();
  VendorExporter.init(tracerFactorySdk, options...);

Only advanced users should build their own exporter pipeline and SpanProcessor.

@bogdandrutu Alright, I'm fine with relying on the end timestamp for checking if a span is processing or finished - this way we don't have to add a potentially confusing, second status field to Span and can keep the exporter interface as it is.
I still would like to propose this kind of processor for reporting processing spans into the SDK spec. This way implementations would include it by default and users or vendors can, if desired, activate it by adding it to the pipeline.

@bogdandrutu

> I think you can use the presence of the end timestamp in the SpanData to determine if the Span is still in-process or finished

I don't think that is really a good option. E.g. consider the Java definition of an end timestamp:

  private long getEndNanoTimeInternal() {
    return hasBeenEnded ? endEpochNanos : clock.now();
  }

This shows two problems: First, we can't check the "presence" of the end timestamp since we get clock.now() if it isn't present. Second, if endEpochNanos != 0 were a valid replacement for hasBeenEnded, why wouldn't the SDK use it instead of adding an additional boolean? I think it should be possible for a processing spans processor to use the same definition as the SDK.
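The first problem can be reproduced with a small stand-alone sketch. ClockBackedSpan is a hypothetical class that only mirrors the getter quoted above, with a LongSupplier standing in for the SDK clock; it is not the actual SDK span implementation, and hasEnded() is an assumed accessor added for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical span whose end-time getter behaves like the quoted snippet:
// it falls back to the current clock value while the span is still running.
final class ClockBackedSpan {
  private final LongSupplier clock; // stand-in for the SDK clock
  private boolean hasBeenEnded;
  private long endEpochNanos;

  ClockBackedSpan(LongSupplier clock) {
    this.clock = clock;
  }

  void end() {
    endEpochNanos = clock.getAsLong();
    hasBeenEnded = true;
  }

  // Never returns an "absent" value, so callers cannot detect a running
  // span by inspecting the timestamp alone.
  long getEndEpochNanos() {
    return hasBeenEnded ? endEpochNanos : clock.getAsLong();
  }

  // Exposing the flag itself gives a processor the same definition of
  // "ended" that the span uses internally.
  boolean hasEnded() {
    return hasBeenEnded;
  }
}
```

With a getter like this, a check such as getEndEpochNanos() != 0 is true for running spans as well, so a processing span processor would need access to something like hasEnded().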

Maybe the situation is better in other languages (I'm pretty sure it is e.g. in Python), but if there isn't an implementation of the span processor in the SDK already, I think the processing span processor will be very difficult/impossible to implement due to lots of tiny problems and edge cases like this that no one has thought about because no one had used the SDK in a similar way.

I added an example Java implementation in open-telemetry/opentelemetry-java#697.
