Opentelemetry-specification: Clarification required for Span.setStatus

Created on 29 Mar 2021  路  25Comments  路  Source: open-telemetry/opentelemetry-specification

One specific situation is currently absent from https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status. Namely, imagine we have a web application running with Otel auto-instrumentation. Instrumentation creates a SERVER span for an incoming request. During this request, some exception is thrown, which sets response code to 400. But framework provides an exception callback, during which application developer retrieves the SERVER span from the currently active context and sets its status to OK, based on some application logic.

Now, when auto-instrumentation is going to end this SERVER span, then based on http semantic convention it MUST set spans status to ERROR. And spec referenced above allows it to do so. But I think the intention was to allow application developer or operator to override any automatic decision by manually setting span status to OK. Should this mean that auto-instrumentation MUST NOT override span status OK? Please note also, that currently there is no way to read span status from Otel API.

api error-reporting trace

Most helpful comment

If Span's status were readable attribute then we could have a very simple solution: "if span has OK status set, then auto-instrumentations MUST NOT override it". That's it, isn't it?

All 25 comments

But I think the intention was to allow application developer or operator to override any automatic decision by manually setting span status to OK.

That's correct, based on my understanding. @tedsuo ?

It's quite common practice to use 400 (bad request) on REST API to indicate that client provided wrong input parameters.

We use it for any kind of validation of input parameters including also some business logic like to find appropriate records in DB. That's why it's quite important to have possibility to indicate that 400 is not an error.

I am confused by this too. I have an use case where HTTP auto-instrumentation returns a 404, but the application can handle that gracefully (by hitting a fallback endpoint), or in some cases even expects a 404.

I was wondering about this sentence in the spec:

Analysis tools SHOULD respond to an Ok status by suppressing any errors they would otherwise generate. For example, to suppress noisy errors such as 404s.

To me it's not clear whether that sentence refers to the HTTP span itself, to its parent span, or to any span in the trace.

So, given a trace with two spans:

  • Span 1: Unset, Span 2: Error. The backend/analysis tool should treat this as an error.
  • Span 1: OK, Span 2: Error. The backend/analysis tool should not treat this as an error, but suppress the error status on Span 2?

I also posted a related question on slack.

Discussed during the Friday issue triage meeting, assigned this to @carlosalberto, it might take some time since this seems to be a challenging topic.

Please also consider the original OTEP https://github.com/open-telemetry/oteps/blob/main/text/trace/0136-error_flagging.md which has suggestions on how to handle this with the Status Source attribute of the status.

If Span's status were readable attribute then we could have a very simple solution: "if span has OK status set, then auto-instrumentations MUST NOT override it". That's it, isn't it?

How would one put the OK status in the same span that the auto-instrumentation would set the status of?

How would one put the OK status in the same span that the auto-instrumentation would set the status of?

My original example describes exactly this situation. Auto-instrumentation sets the status of the span, which corresponds to the incoming http request, after/during the response is being sent out. Application developer can grab the current span in their code while processing that request and set span's status. Way before auto-instrumentation attempts to do so.

Sorry for not reading properly 馃檲

I think this example is perfectly solved in the original, accepted but only partially spec'd out OTEP: The user would call SetStatus(Status.OK, StatusSource.USER) and the instrumenation would then later call SetStatus(Status.ERROR, StatusSource.INSTRUMENTATION). Then the backend should normally prefer the USER status "as it is a communication from the application developer or operator and contains valuable information".

CC @tedsuo

Status source will work as well, certainly. I still think that my proposal is easier :)

Reading the status would violate the design principle that spans are write-only (except for SpanContext which is immutable, fixed from creation, and required for propagation). Also, every instrumentation would need to check the status and if it forgets, the user-status is overwritten.

A solution somewhere in the middle would be to provide a SetStatusIfUnset API.

All three solutions work for me :)

Status source will work as well, certainly. I still think that my proposal is easier :)

I also like your solution more, but I'm afraid that providing a getter for Status may be hard (impossible), e.g. a streaming implementation (besides what @Oberon00 already mentioned regarding the write-only design).

I'd be up for adding the Instrumentation part to _if_ languages can offer this without breaking their APIs, which I'm not totally sure it's feasible (might need to ask maintainers).

SetStatusIfUnset looks not as clean but it might be a good trade off definitely. I will bring this up in our SIG call today ;)

From the spec meeting today:

We discussed the options. There is general agreement that adding state to the API (via a status getter) is not a feasible solution.

We discussed adding a StatusSource field with two states, USER and INSTRUMENTATION. The USER status always wins. In practice, we add this feature by changing the current SetStatus function to set the source as USER, to avoid breaking user code. An additional SetStatus method or option would be added to set INSTRUMENTATION as the source.

Adding StatusSource has been approved as a solution to this problem, but we are reluctant to add this complexity and extra state until it is proven that we need it.

As a simpler solution, we are proposing that we add a SetStatus option to End. This status option would only be applied when the status is currently UNSET. All instrumentation would then be required to use this option. This allows the user to override instrumentation when setting the status. This incremental approach would be compatible with adding StatusSource at a later date, if we determine that we need it.

@iNikem volunteered to investigate whether adding a SetStatus option on End is a feasible solution.

  • Investigate whether the current Java instrumentation could meet the requirement of only setting the status when the span ends.
  • Investigate whether exception handling works with this solution.

It would be helpful to check other languages as well, and confirm that this solution is feasible before we try it. Please chime in if you, dear reader, can spend some time this month investigating this issue.

@tedsuo I have use cases where StatusSource would help, but the simpler solution (overriding status on End) wouldn't.

One use case is overriding error status of HTTP client instrumentations. With several HTTP client instrumentations I worked with there is no way to access the span that is created by the instrumentation. Furthermore there might be scenarios where application developers use libraries which in turn use instrumented HTTP clients - it would be very hard for them to directly access those spans across libraries.

StatusSource provides a way to override status without having direct access to spans created by instrumentations, and this is a need users have already expressed to me.

StatusSource provides a way to override status without having direct access to spans created by instrumentations

@pyohannes Can you elaborate on this, please? StatutSource still needs to be passed to a span, how will it help you if you don't access to spans?

Can you elaborate on this, please? StatutSource still needs to be passed to a span, how will it help you if you don't access to spans?

@inikem On backend or exporter side (maybe triggered by configuration) one could say: ignore error status values when StatusSource is INSTRUMENTATION, but honor the error status when the StatusSource is USER.

That is how I interpret this sentence in the OTEP:

Analysis tools MAY disregard status codes, in favor of their own approach to error analysis. However, it is strongly suggested that analysis tools SHOULD pay attention to the status codes when set by USER, as it is a communication from the application developer or operator and contains valuable information.

Then any instrumentation might set error status codes, but the backend configured in that way will only mark traces as failed when there's an error status code where StatusSource=USER.

Not the most elegant. But I think it can get very tedious for application developers to re-set status codes of instrumentations.

If backend "ignore error status values when StatusSource is INSTRUMENTATION", then it means all application developer has to set status code whenever they expect the span to be an error. This is even more work than re-setting status when there is a conflict between developer expectation and auto-instrumentation decision.

If backend "ignore error status values when StatusSource is INSTRUMENTATION", then it means all application developer has to set status code whenever they expect the span to be an error.

The backend could "ignore error status values when StatusSource is INSTRUMENTATION _only if_ there is a span in the trace with StatusSource=USER and Status=OK".

However, another possibility without the need for StatusSource would be to "ignore error status values _if_ there is a span in the trace with Status=OK". I wonder if this section of the current specification could be interpreted in this way; I tried to get clarification on this, with no success yet:

Analysis tools SHOULD respond to an Ok status by suppressing any errors they would otherwise generate. For example, to suppress noisy errors such as 404s.

@iNikem Do you think a solution in those lines could solve your problem? Otherwise I'll open another issue to not mix up different problems and requirements.

I think that ignoring status of one span based on the status of another span is a VERY special case and certainly is not something that Otel specification can recommend.

As a simpler solution, we are proposing that we add a SetStatus option to End. This status option would only be applied when the status is currently UNSET. All instrumentation would then be required to use this option. This allows the user to override instrumentation when setting the status. This incremental approach would be compatible with adding StatusSource at a later date, if we determine that we need it.

@iNikem volunteered to investigate whether adding a SetStatus option on End is a feasible solution.

  • Investigate whether the current Java instrumentation could meet the requirement of only setting the status when the span ends.
  • Investigate whether exception handling works with this solution.

I have a POC changes as described above in Otel Java API/SDK and Java instrumentation repo. They demonstrate that the approach above can work. Only one or two instrumentations out of many in Java instrumentation repo may require somewhat elaborate changes, in the vast majority of cases changes are trivial.

During implementation and following discussion with @jkwatson we found that:

  • Having Span.end(Status) in API may confuse application developers into thinking that they should use this method to end a span, instead of Span.end() which should be used in the majority of cases.
  • A simpler approach may be to change Span.setStatus method behaviour to "freeze" status if it was called with OK and ignore any later calls to setStatus.

In both cases the behaviour of API methods may be surprising to the end-users, because in some cases provided Status will be ignored. In both cases we have to rely on documentation to clear confusion.

From the SIG meeting: a take on option B (freezing Span.setStatus) will be done in the DotNet SIG, and after that the change will be proposed against the Specification.

For the record, my POC of option A above:
API/SDK
Instrumentations

.NET PoC of Option B is here.
We've discussed during the OpenTelemetry .NET SIG meeting and folks are good with this approach.

I'm wondering, if we go with option B and the SDK is now going to start enforcing these ideas, should the SDK also ignore any explicit call to set the status to UNSET?

Was this page helpful?
0 / 5 - 0 ratings