In opentelemetry-go, one metric == one observer.
This proposal stresses the need for one observer to observe multiple metrics.
The best use case is for reading various system metrics in one call such as:
import "runtime"
var ms runtime.MemStats
runtime.ReadMemStats()
This operation (and others) can be expensive. This one in particular can populates ~30 variables where each one is fit to be in its own metric/measure. Therefore, the current way it would work would have to be the following:
mm := metric.Must(meter)
mm.RegisterInt64Observer("process/heap_alloc", func(result metric.Int64ObserverResult) {
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
result.Observe(int64(ms.HeapAlloc))
})
mm.RegisterInt64Observer("process/pause_ns", func(result metric.Int64ObserverResult) {
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
result.Observe(int64(ms.PauseNs[(ms.NumGC+255)%256]))
})
mm.RegisterInt64Observer("process/sys_heap", func(result metric.Int64ObserverResult) {
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
result.Observe(int64(ms.Sys))
})
// etc etc...
I hope you can see that this becomes inefficient fairly quickly.
As a workaround, it was suggested on Gitter that we can just use Labels to substitute for multiple metrics. This has two issues:
I propose that users should be able to register one call back that will populate multiple metrics as such:
mm.RegisterBatchInt64Observer(func(result metric.BatchInt64ObserverResult) {
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
result.Observe("process/heap_alloc", int64(ms.HeapAlloc)))
result.Observe("process/sys_heap", int64(ms.PauseNs[(ms.NumGC+255)%256]))
result.Observe("process/sys_heap", int64(ms.Sys)))
})
If metric names must be passed ahead of time, then you'd probably need to pass the metric names in the RegisterBatch<T>Observer function.
Otherwise, I'm happy to hear any other suggestions.
Thanks
From Gitter:
@codeboten
Apr 06 11:41
i thought about implementing this with a single callback and using labels instead of separate observers, but i wasn't sure if that would make things better or worst
Tristan Sloughter
@tsloughter
Apr 06 11:52
I was thinking about how to do similar for the Erlang VM metrics and asked in Java channel since they would need similar. John pointed me to the Java code and they use labels instead of many observers
Tristan Sloughter
@tsloughter
Apr 06 11:53
like https://github.com/open-telemetry/opentelemetry-java/blob/e248ea7ad74b3ac9774f871636c8f6b839be998f/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java
alrex
@codeboten
Apr 06 12:56
right, it's similar to what's done in some examples in the python repo https://github.com/open-telemetry/opentelemetry-python/blob/master/docs/examples/basic_meter/observer.py#L39
the proposal -> this is how I implemented it in JS
And I think the spec says the same:
Multiple observations are permitted in a single callback invocation.
https://github.com/open-telemetry/oteps/blob/master/text/0072-metric-observer.md#observer-calling-conventions
OTEP 72 does say that multiple observations are permitted per callback, but the intention at that point was to support multiple (value, label set) observations for a single metric. The language is unclear, but the pseudocode in the OTEP is somewhat clear.
In getting to OTEP 72 we discussed several related issues. We removed "bound" observer instruments, justifying this because LabelSets were still part of the API and support caching the label encoding used in an exporter.
Now that we have removed LabelSets from the API, the notion of a bound observer starts to look like a real optimization. What would an API look like to support multiple metrics to be observed from one callback?
We would register callbacks with the SDK directly, since they are not associated with metrics directly. We could have three analogous calling patterns to the synchronous instruments: direct calls, bound calls, and batch calls. Callbacks would be executed sequentially in the order they were registered. For example:
var instrument1 = global.Meter(...).RegisterInt64Observer(...)
var instrument2 = global.Meter(...).RegisterFloat64Observer(...)
var boundInst2 = instrument2.Bind(labels...)
var instrument3 = global.Meter(...).RegisterInt64Observer(...)
func Callback(result metric.ObserverResult) {
// direct call
result.Observe(instrument1, value1, labels1...)
// bound call
boundInst2.Observe(result, value2)
// batch call
result.ObserveBatch([]core.KeyValue{labels3}, instrument1.Measurement(value1), instrument3.Measurement(value3), ...)
}
This sort of change requires us to mention that callbacks are executed in order, whereas we didn't have to before. Last-value wins was previously applied to the result from one callback, but now it has to be applied to the result from all callbacks.
Thoughts? @bogdandrutu this is a significant problem, please have a look.
I like the idea of adding a callback interface to the Meter.
Java pseudocode:
meter.addObserver(Callback<Meter> meterCallback)
where Callback<T> looks something like this:
interface Callback<R> {
void update(R input);
}
What if instead of function as a callback it would be an observable object. This way the problem with sync / async would be solved.
for example
const observable1 = new ObservableObject();
observerResult.observe(observable1, labels);
// anywhere in a code doesn't matter when
observable1.next(1);
//or
setTimeout(()=>{
observable1.next(1);
}, 1000)
// etc.
so every time you call observable1.next(1); it will simply update the metric.
The main problem when the callback is a function is that you have no easy way of getting data which requires async calls. And also you will have problem with blocking the sync call for longer than it is needed.
WDYT ?
@obecny What you've written looks like a synchronous instrument through and through. There are reasons for Observers to be asynchronous, they're supposed to be a solution. The reasons:
We had made the Observer callback support a single metric merely out of a desire to keep things as simple as possible. I believe this example demonstrates that we have oversimplified. The proposal I made above for bound observers surfaced in early in the 0.3 conversation IIRC, and we dropped it.
I think it should be possible to have the "batch-callback" similar to batch record. Probably not one per meter, but allow batching-callback is indeed something that we can support.
@jmacd I would rely on a combination of the BatchRecorder and ObserverResult to achieve the performance you mentioned:
var instrument1 = global.Meter(...).RegisterInt64Observer(...)
var instrument2 = global.Meter(...).RegisterFloat64Observer(...)
var instrument3 = global.Meter(...).RegisterInt64Observer(...)
global.Meter(...).RegisterBatchCallback(Callback)
func Callback(result metric.ObserverBatchResult) {
// batch call
result.ObserveBatch([]core.KeyValue{labels3}, instrument1.Measurement(value1), instrument3.Measurement(value3), ...)
}
// OR
func Callback(result metric.ObserverBatchResult) {
// batch call
result.BatchRecorder([]core.KeyValue{labels3})
.add(instrument1.Measurement(value1))
.add(instrument3.Measurement(value3))
.record()
}
I like this. There are a few details to work out, like:
As we have removed LabelSets, the BatchRecorder pattern that you bring up stands in as a replacement for LabelSet. I wonder if you would allow this pattern:
var (
meter = global.Meter("library")
metricRecorder = meter.AsyncBatchRecorder(labels...) // Or "ObserverBatchRecorder"
_ = meter.RegisterBatchCallback(
func (result metric.ObserverBatchResult) {
metricRecorder.record(result,
instrument1.Measurement(value1),
instrument3.Measurement(value3),
)
})
)
I have the same question for synchronous usage, e.g.:
func F(ctx context.Context, ch <-chan T) {
meter := global.Meter("library")
metricRecorder := meter.SyncBatchRecorder(labels...) // Or "MeasureBatchRecorder"
for elem := range ch {
metricRecorder.record(ctx,
instrument1.Measurement(elem.X),
instrument3.Measurement(elem.Y),
)
})
)
Note the only difference between Sync- and Async- BatchRecorders is that Sync- recorders take Context, while Async- recorders take ObserverBatchResult.
Instruments may pass a null callback, they can be registered w/o defining a callback due to batch observations
Callbacks will be called in the ordered they were registered, whether on individual instruments or batch observer callbacks, to preserve last-value-wins semantics
I would like to think of a solution where we can ensure only one callback is register at a time for every instrument, what do you think?
I can't immediately think of how I'd do that. Do you have an API in mind?
Possible OTel-Go APIs are being discussed here: https://github.com/open-telemetry/opentelemetry-go/pull/634
This API review is underway: open-telemetry/opentelemetry-go#634
Most helpful comment
From Gitter: