Users should be able to configure custom error handlers for errors all throughout their OpenTelemetry projects.
There is already demand for this feature - see an identical issue for the Go SDK here along with the corresponding implementation.
@carlosalberto This is not about instrumentations reporting application errors but about OpenTelemetry itself reporting usage and internal errors to instrumentations/users, something that's easy to mix up in this domain. We had a separate label error-handling for that on #141 and #153 but it was deleted apparently. Should we bring it back?
At the spec meeting this morning, there was consensus that this would be an excellent feature, and we would like to require it for GA. The request was to simply define an error reporting delegate pattern, which defaults to logging to standard out, or the equivalent default in every language.
There will be other details we may want to standardize on, but error handling is different enough between languages that we would like to start with this simple specification, and add details later as it becomes clear what the value would be.
Discussed this issue again at the spec SIG this morning. We agreed that this would be valuable, but want to be sure that
(1) We aren't reinventing a logging library where one already exists (use whatever is idiomatic in the language)
(2) We only leverage this error handler on errors that it is not possible for us to recover from
I will draft a change to the spec to reflect these points and will update my open PR in Java to leverage the existing logging library.
One thing to make sure is clear is that in the Go implementation there is a global implementation of the error handler so the instrumenter can use the error handling prior to an SDK being registered. Similar to a global metric or trace handler or propagator.
Also, to capture the point from the spec meeting this morning: the error handling design should be customizable by the user. If the instrumentor, or even the operator wants to write a more advanced error handler they can do so and register it. This means the user can build out as complex or as simple of an error handler as they want and by default the default global handler just handles error in the simplest way while remaining idiomatic to the language. Additionally, the spec should include some language about how only non-recoverable errors should be sent to the handler. If there is defined fallback actions (like an empty tracer name being replaced with a default) the action should be preformed and no error sent.
@MrAlias It seems like you're suggesting that the error handler shouldn't be global at the SDK level, but at the API level, is that correct? I'm asking because in my original PR for Java I had added the error handler to the API and there was some pushback.
cc @jkwatson @carlosalberto
The title says SDK and that would seem logical to me too. All other configuration is also in the SDK.
I'd be up for putting it in the SDK too (I don't imagine unrecoverable scenarios with the API being the only component). Any opinion on this specific part @MrAlias ?
I don't think we should add to the API surface by adding this to the API specification. SDK: maybe
... the error handler shouldn't be global at the SDK level, but at the API level, is that correct?
Allowing instrumentation authors access to the global error handler seems needed. They as well might need to send unrecoverable errors to the handler. This means that the error handler will need to be something that can be register in the global scope, similar to a generic tracer, meter, or propagator interface. Additionally, a service operator needs the ability to customize the handling of errors. Meaning they will need to register their implementation of the error handler in the global scope. Because of these uses cases it seems to follow that the error handler needs to be decoupled from the SDK (the instrumentation author will not have an instance of the SDK) and be visible at the global scope.
I agree having the error handler be a part of the API seems incorrect and we shouldn't do that. It will pollute the API with implementation details not pertaining to telemetry. Also, it seems to follow that it should _not_ be specified as part of the SDK. Instead it seems like the error handler should be included in the language in a language specific way. This means it would be up to the SIG to defined it at an appropriate hierarchical level, not muddy the API or SDK, and serve all the functional use cases for an instrumentation author or service operator.
Also, it seems to follow that it should not be specified as part of the SDK. Instead it seems like the error handler should be included in the language in a language specific way. This means it would be up to the SIG to defined it at an appropriate hierarchical level, not muddy the API or SDK, and serve all the functional use cases for an instrumentation author or service operator.
Sounds great to me.
While that does sound reasonable, I'm not sure what the conclusion is from that? Specify nothing? Specify only very vaguely that language implementations should allow users to be notified of internal errors? EDIT: Are you suggesting a separate artifact for the error handler that the API and SDK depend on? Maybe also the propagation layer? If it's that generic, you can as well use a logger framework, those also allow installing custom handlers.
While that does sound reasonable, I'm not sure what the conclusion is from that? Specify nothing? Specify only very vaguely that language implementations should allow users to be notified of internal errors?
Good point on not being clear. My understanding is that we should add a section on a recommended global handler, what is well integrated with the language. If there's one already, great, make the SDK and other components report to it; else, define one (like the Go one).
Is this solved by https://github.com/open-telemetry/opentelemetry-specification/pull/757 or we need something more?
I believe that this should be resolved by that PR, thanks for pointing that out! Will close this issue now.
Most helpful comment
Good point on not being clear. My understanding is that we should add a section on a recommended global handler, what is well integrated with the language. If there's one already, great, make the SDK and other components report to it; else, define one (like the Go one).