Opentelemetry-specification: Add BeforeEnd to have a callback where the span is still writeable

Created on 13 Oct 2020 · 40Comments · Source: open-telemetry/opentelemetry-specification

What are you trying to achieve?
Set an attribute to all spans in a trace

What did you expect to see?
Writeable spans on SpanProcessor's OnEnd(Span)

Additional context.
What I tried to implement is a feature called trace field in Honeycomb (reference: AddFieldToTrace)

For that I first tried to use SpanProcessor, but soon found out that OnEnd(Span) is not writeable.

The solution I arrived was to write a custom exporter which was able to read from a thread-local storage (at the time of the implementation, Baggage had yet to be spec'ed out) to list all attributes related to a Trace and insert into the span before exporting.

From what I understood from the SpanProcessor specification, that's the place where I should have placed this logic, as it's a transformation that should happen before the span is sent to any exporter

sdk enhancement after-ga trace

Source

gugahoa

👍5 👎1

Most helpful comment

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

An HTTP user id parsed out of headers is the best example I can think of, but actual honeycomb users (@garthk @gugahoa ?) probably have more. The initial request span is started right away, before user code parses out headers they care about, a child span is started when the user's logic is called at which point they parse out the user-id and want to put it on all spans in that trace but the span created when the HTTP handler got the request has already run OnStart so the only time it could be added is OnEnd -- without manually going through the stack and adding the attribute.

tsloughter on 14 Oct 2020

👍2

All 40 comments

/cc @arminru

tsloughter on 13 Oct 2020

Required for dynamic resource-like attributes desired in every span but computed at run-time because they're not constant, _eg._ instance uptime and recent CPU utilisation. Injecting them in the exporter gives results that become less accurate during the spikes we're most interested in.

garthk on 13 Oct 2020

IIRC it was discussed to make spans writeable in OnEnd but this was left out because users will not always have full control over the order in which span processors are registered and in turn invoked, which could result in unreliable behavior.
From your use case it sounds like you would not need any attributes that are added/overwritten in OnEnd visible to any other span processor, right? If you only need to have those changes reflected once passed to exporters, this shouldn't be an issue then.

@carlosalberto you seemed to remember some of that discussion as well in the meeting yesterday. Do you know the rationale for that decision?

arminru on 14 Oct 2020

Why these attributes cannot be set in onStart, which already has writable span?

iNikem on 14 Oct 2020

👍2 👎1

Reading the spec again, I noticed that it explicitly says that spans are already marked as ended at the time they are passed to SpanProcessor.OnEnd:

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#onendspan

OnEnd is called after a span is ended (i.e., the end timestamp is already set).

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#end

End: Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).

We would also have to change either of those requirements if we wanted to make spans writeable in OnEnd and we would need to understand the consequences well.

arminru on 14 Oct 2020

Exactly. The difficult to get right (but specified!) order (exporting may happen before the other OnEnd processor) is another problem, but the main problem is that spans are already ended, as @arminru said. This was already discussed in https://github.com/open-telemetry/opentelemetry-specification/pull/669#issuecomment-654841632.

There are two ways we could go forward here:

Use existing mechanisms in your implementation to modify the "SpanData", i.e. the readable representation of the span that the exporter gets, instead of the Span itself. EDIT: E.g. Java has https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java
We may introduce a new SpanProcessor hook BeforeEnd.

We definitely can't pass a writable span to OnEnd.

Oberon00 on 14 Oct 2020

👍2

Why not define it as your Span Processor must assume it gets the Span in the form it is ended with and not expect to have any changes from previous processors visible?

tsloughter on 14 Oct 2020

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I'd be fine with a BeforeEnd span processor, but don't really like adding more hooks when as far as I can tell the only issue is to define that OnEnd processors can't depend on each other and that if you have multiple processors writing the same attribute you can't know which will actually be added last -- so a warning to use unique attribute keys.

tsloughter on 14 Oct 2020

👍1

No, the main issue is that in OnEnd the span is already ended, and for an ended span, SetAttribute should be a no-op. Changing that seems liable to opening a can of worms.

Oberon00 on 14 Oct 2020

Ah, ok. Yea, I wouldn't want to change the fact that ended means read-only but instead move the "ending" to after the processors. But this isn't possible since one of the processors is what takes the "ended" span and passes it to the exporter... I didn't like exporting being part of processors because of this.

And probably too late to argue that sending to the exporter should be separate and unrelated to processors, leaving OnEnd processors to act like "BeforeEnd" :(

tsloughter on 14 Oct 2020

Maybe we would even need a special state where the span is semi-ended, i.e. it has IsEnded==true and has an end timestamp, but is still recording.

Oberon00 on 14 Oct 2020

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I don't understand. If you have a SpanProcessor registered, it will receive all spans created in the given process. And so it will be able to add attributes to all spans. You don't "do a call" manually. All spans are handed to you.

iNikem on 14 Oct 2020

@iNikem it only receives a write-able span for OnStart, meaning if spans are already started in the trace you will not be handed those spans at a point that you have the attributes and can write to the span.

tsloughter on 14 Oct 2020

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

And even more: how does onEnd differ from onStart in this regard? I fail to see how you hoped to achieve your goal with onEnd that you cannot do in onStart?

iNikem on 14 Oct 2020

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

tsloughter on 14 Oct 2020

👍2

That's a good characterization, @tsloughter. This has been a "normal" way of using Honeycomb's Beeline library for a long time. It buffers the trace span tree in memory, and can distribute attributes (called "fields" there) up and down the span tree before sending the spans to the server. It stores a set of fields on the trace itself, in a field on the trace.Trace type called "traceLevelFields." You can see the transcription from that field down to each span just before sending each here.

seh on 14 Oct 2020

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

bogdandrutu on 14 Oct 2020

👍1

Agree with @bogdandrutu. I'm not sure which implemenation language you are using but in Java there is https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java

Unfortunately, #158 is not resolved so this is all a bit vague in the spec.

Oberon00 on 14 Oct 2020

"that we will export"

Where is there a pluggable place to modify SpanData before export?

If you are saying OnEnd Processor can modify SpanData then that would be fine.

tsloughter on 14 Oct 2020

If you are saying OnEnd Processor can modify SpanData then that would be fine.

Processor can certainly modify SpanData

iNikem on 14 Oct 2020

@tsloughter

Where is there a pluggable place to modify SpanData before export?

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

Oberon00 on 14 Oct 2020

@iNikem

Processor can certainly modify SpanData

True, but in order for the "modification" (SpanData is immutable at least in Java, so it actually creates an updated copy) to be picked up by an exporter, that exporter must be invoked in the same OnEnd.

Oberon00 on 14 Oct 2020

👍1

@bogdandrutu

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

That would mean writing my own processor which will export to the configured exporters, but I wouldn't want to write my own processor for that and instead it would be preferable to rely on tried-and-true processors such as the BatchProcessor

@Oberon00

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

gugahoa on 14 Oct 2020

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

That is what we hoped not to have to do and I believe is already the way people have been doing it.

But I guess if its a wrapper that can take any exporter it and be a new exporter that acts on the attributes and then calls the wrapped exporter with the modified batch it wouldn't be too bad.

It would certainly be more efficient to update "on end", but it sounds like a wrapper exporter is the way to go -- assuming there aren't issues with the lifetime of the shared attributes on a trace, will have to talk to those using it.

tsloughter on 14 Oct 2020

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

It does. We have an exporter behaviour, otel_exporter. You can create a new exporter that simply goes through all the spans and instead of exporting them it updates their attributes in the table and then calls the export/2 function of the wrapped exporter.

And has to call the init/1 of the wrapped exporter when its init/1 is called at startup.

tsloughter on 14 Oct 2020

👍1

It would certainly be more efficient to update "on end"

You seems to have an implicit assumption that for each span all its parent spans from the same trace will _end_ after it. That assumption is wrong. Parent span can end even before child span starts.

iNikem on 14 Oct 2020

Parent span can end even before child span starts.

What does that represent? Perhaps an asynchronous task spawned by the parent that took a while to start?

seh on 14 Oct 2020

👍1

It would certainly be more efficient to update "on end"

You seems to have an implicit assumption that for each span all its parent spans from the same trace will _end_ after it. That assumption is wrong. Parent span can end even before child span starts.

I know, this feature can't be perfect, there will be missed spans in some specific cases, but for many cases it appears to be useful and work just fine.

tsloughter on 14 Oct 2020

Parent span can end even before child span starts.

What does that represent? Perhaps an asynchronous task spawned by the parent that took a while to start?

Async servlet/Spring webflux controller. Controller method will return and its span will finish before async result starts actual computing.

iNikem on 14 Oct 2020

If we're not up for changing the behaviour of OnEnd to BeforeEnd, but some users need something with BeforeEnd semantics, why not add a BeforeEnd?

The alternative seems to be to either lose our time correlation or build amusing workarounds like using OnEnd to create a child span with the data we couldn't add to this one, and wrapping the exporter we want in another one which first buffers spans for long enough to reliably find the disposable spans, copy attributes from them to their parent, and then discard them, and then calls the original exporter. It's possible, but it's a lot of work to force on our users.

garthk on 14 Oct 2020

@garthk You make it sound very complicated. What is that about buffering something long enough? I think I don't understand your use case after all. I thought you simply wanted to do something like

  @Override void OnEnd(ReadWriteSpan span) { span.setAttribute("foo", "bar"); }

which translates to

  @Override CompleteableResult export(Collection<SpanData> spans) {
    Collection<SpanData> modified = new ArrayList<SpanData>(spans.size());
    for (SpanData span: spans) {
      Attributes newAttributes = Attributes.newBuilder(span.getAttributes()).setAttribute("foo", "bar").build();
      modified.add(SpanDataBuilder.newBuilder(span).setAttributes(newAttributes).build());
    }
    return wrappedExporter.export(modified);
  }

(@anuraaga is this actually the best we can do in Java currently? It's really boilerplate-y)

Sure, this is more allocation-heavy and a bit awkward, but it is functionally equivalent at least.

(Writing it out really makes me sympathetic to supporting BeforeEnd, but I still think I might not understand what you actually want here)

Oberon00 on 15 Oct 2020

The reporter works on the span later than the end of the span, and runs in a different process. It can't collect data at the relevant time or from the relevant process without help. OnEnd can, but it's got nowhere to put the data without extra machinery.

You're correct: the extra machinery would be complicated. I'd prefer not to build it, nor insist our users build it to meet that need. I'd prefer the spec granted that implementations MAY provide a means for the configured tracer to call code to modify the span prior to it reaching a state we consider final.

garthk on 15 Oct 2020

It can't collect data at the relevant time or from the relevant process

Right, but if you want to collect process data, you can use OnStart. If you need to do something based on span attributes, use the exporter wrapper. If you need to do both at the same time, you can transform the relevant process data into attributes in OnStart and remove them in the exporter wrapper.

Oberon00 on 15 Oct 2020

I'm sorry, I should have given concrete examples. Here's one with time _and_ process locality: a span attribute for the number of reductions performed by the process during the span.

Reductions are an excellent proxy for CPU effort, and cheap to measure. I'd much appreciate having a reduction count on all my spans, including those generated by my dependencies and their integration dependencies.

Third party code rules out using a function or macro to wrap the code and gather the data.

I could use OnEnd to take the last measurement, but due to the read-only span it couldn't put that measurement or its delta into the span's attributes, forcing the use of some other mechanism to get it to the exporter.

If the process weren't dead by the time the exporter got the span, it'd likely have done some more work, making the effort useless or impossible without using OnEnd and whatever mechanism smuggled the data to the exporter.

All this might be peculiar to our ecosystem, so I'd suggest an RFC 2119 MAY instead of a MUST for providing a BeforeEnd or other suitable way to configure our tracer to meet this requirement.

garthk on 15 Oct 2020

I'd be fine adding this additional BeforeEnd, although not sure about the timeline (in theory we could still add it, specially if it has a MAY for now, given it's on the SDK level). @bogdandrutu opinion on this?

carlosalberto on 16 Oct 2020

👋 We'd appreciate an optional BeforeEnd hook as well - internally at GitHub we have some complex filtering requirements for any data that may leave our datacenters (such as trace data, which currently goes to a SaaS). We're currently not able to run the opentelemetry collectors for filtering, and that leaves us attempting to scrub spans of sensitive data before they're emitted. [1]

Wrapping SpanProcessors or writing custom SpanExporters is certainly feasible, and we're implementing that solution now. However, a BeforeEnd hook would allow us to implement our (arguably esoteric) filtering needs in a much more ergonomic and sustainable way.

If nothing else, consider this an alternate use-case for a BeforeEnd hook other than annotation of spans with additional data - we need to possibly _remove_ unintentional attributes and events.

[1] Of course, I wish we didn't need to do last-minute, best-effort scrubbing. I wish sensitive data didn't leak out of exceptions and events in the first place. Not emitting sensitive data is the best practice! But we can't always ensure that, and the consequences of mistakes can be severe, so we always attempt a last-minute scrub for known types of sensitive data, just in case.

ahayworth on 5 Mar 2021

We'd appreciate a BeforeEnd hook as well. Our use case: we want to change the span status for 404 HTTP errors. We can't use the collector.

Writing custom SpanExporters is suboptimal: it involves additional allocations (a span needs to be copied), and it involves additional implementation effort (we'd need to wrap every exporter we use, instead of just having one processor that can be used with any exporter).

pyohannes on 9 Mar 2021

it involves additional allocations (a span needs to be copied), and it involves additional implementation effort (we'd need to wrap every exporter we use, instead of just having one processor that can be used with any exporter).

I generally agree with the usefulness of BeforeEnd, just want to confirm these points. At least in Java, we have these mechanisms

DelegatingSpanData so it's mostly only attributes that are copied, not span. And if the attribute copy isn't ideal, we could also add DelegatingAttributes (hooray for Attributes being an interface!)
SpanExporter.composite. For the implementation effort, is it not something like AttributeModifyingExporter.create(SpanExporter.composite(exporter1, exporter2))?

Just want to check on how useful these current mechanisms are for your use case. Thanks.

anuraaga on 9 Mar 2021

Feels wrong to have to use an exporter for something that isn't exporting spans but instead is processing them.

tsloughter on 9 Mar 2021

Just want to check on how useful these current mechanisms are for your use case. Thanks.

Those mechanisms could be useful, if they were spec'd out and available for all languages. However, it seems to me BeforeEnd is a simpler and cleaner solution.

pyohannes on 10 Mar 2021

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Properly communicate the stability for OTLP

carlosalberto · 4Comments

Why are we using such a long name as StatusCanonicalCode?

bogdandrutu · 4Comments

Interface stability for implementers and new methods in the spec

Oberon00 · 4Comments

Investigate and submit an OTEP on how other semantic conventions can be adopted

tigrannajaryan · 5Comments

Consider adding a statsd exporter standard in the spec

naseemkullah · 5Comments