Opentelemetry-specification: Allow one observer to observe multiple metrics

Created on 7 Apr 2020  路  13Comments  路  Source: open-telemetry/opentelemetry-specification

Summary

In opentelemetry-go, one metric == one observer.

This proposal stresses the need for one observer to observe multiple metrics.

The best use case is for reading various system metrics in one call such as:

import "runtime"

var ms runtime.MemStats
runtime.ReadMemStats()

This operation (and others) can be expensive. This one in particular can populates ~30 variables where each one is fit to be in its own metric/measure. Therefore, the current way it would work would have to be the following:

mm := metric.Must(meter)

mm.RegisterInt64Observer("process/heap_alloc", func(result metric.Int64ObserverResult) {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    result.Observe(int64(ms.HeapAlloc))
})
mm.RegisterInt64Observer("process/pause_ns", func(result metric.Int64ObserverResult) {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    result.Observe(int64(ms.PauseNs[(ms.NumGC+255)%256]))
})
mm.RegisterInt64Observer("process/sys_heap", func(result metric.Int64ObserverResult) {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    result.Observe(int64(ms.Sys))
})
// etc etc...

I hope you can see that this becomes inefficient fairly quickly.

Workaround

As a workaround, it was suggested on Gitter that we can just use Labels to substitute for multiple metrics. This has two issues:

  1. You have to stick with either int64 or float64, you cannot use both in the same observer.
  2. Using labels feels like a stretch. For an example, see https://github.com/marwan-at-work/otelruntime/blob/master/otelruntime.go#L52

Proposal

I propose that users should be able to register one call back that will populate multiple metrics as such:

mm.RegisterBatchInt64Observer(func(result metric.BatchInt64ObserverResult) {
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    result.Observe("process/heap_alloc", int64(ms.HeapAlloc)))
    result.Observe("process/sys_heap", int64(ms.PauseNs[(ms.NumGC+255)%256]))
    result.Observe("process/sys_heap", int64(ms.Sys)))
})

If metric names must be passed ahead of time, then you'd probably need to pass the metric names in the RegisterBatch<T>Observer function.

Otherwise, I'm happy to hear any other suggestions.

Thanks

metrics

Most helpful comment

From Gitter:

@codeboten
Apr 06 11:41
i thought about implementing this with a single callback and using labels instead of separate observers, but i wasn't sure if that would make things better or worst

Tristan Sloughter
@tsloughter
Apr 06 11:52
I was thinking about how to do similar for the Erlang VM metrics and asked in Java channel since they would need similar. John pointed me to the Java code and they use labels instead of many observers

Tristan Sloughter
@tsloughter
Apr 06 11:53
like https://github.com/open-telemetry/opentelemetry-java/blob/e248ea7ad74b3ac9774f871636c8f6b839be998f/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java

alrex
@codeboten
Apr 06 12:56
right, it's similar to what's done in some examples in the python repo https://github.com/open-telemetry/opentelemetry-python/blob/master/docs/examples/basic_meter/observer.py#L39

All 13 comments

From Gitter:

@codeboten
Apr 06 11:41
i thought about implementing this with a single callback and using labels instead of separate observers, but i wasn't sure if that would make things better or worst

Tristan Sloughter
@tsloughter
Apr 06 11:52
I was thinking about how to do similar for the Erlang VM metrics and asked in Java channel since they would need similar. John pointed me to the Java code and they use labels instead of many observers

Tristan Sloughter
@tsloughter
Apr 06 11:53
like https://github.com/open-telemetry/opentelemetry-java/blob/e248ea7ad74b3ac9774f871636c8f6b839be998f/contrib/runtime_metrics/src/main/java/io/opentelemetry/contrib/metrics/runtime/MemoryPools.java

alrex
@codeboten
Apr 06 12:56
right, it's similar to what's done in some examples in the python repo https://github.com/open-telemetry/opentelemetry-python/blob/master/docs/examples/basic_meter/observer.py#L39

the proposal -> this is how I implemented it in JS
And I think the spec says the same:
Multiple observations are permitted in a single callback invocation.
https://github.com/open-telemetry/oteps/blob/master/text/0072-metric-observer.md#observer-calling-conventions

OTEP 72 does say that multiple observations are permitted per callback, but the intention at that point was to support multiple (value, label set) observations for a single metric. The language is unclear, but the pseudocode in the OTEP is somewhat clear.

In getting to OTEP 72 we discussed several related issues. We removed "bound" observer instruments, justifying this because LabelSets were still part of the API and support caching the label encoding used in an exporter.

Now that we have removed LabelSets from the API, the notion of a bound observer starts to look like a real optimization. What would an API look like to support multiple metrics to be observed from one callback?

We would register callbacks with the SDK directly, since they are not associated with metrics directly. We could have three analogous calling patterns to the synchronous instruments: direct calls, bound calls, and batch calls. Callbacks would be executed sequentially in the order they were registered. For example:

var instrument1 = global.Meter(...).RegisterInt64Observer(...)
var instrument2 = global.Meter(...).RegisterFloat64Observer(...)
var boundInst2 = instrument2.Bind(labels...)
var instrument3 = global.Meter(...).RegisterInt64Observer(...)

func Callback(result metric.ObserverResult) {

   // direct call
   result.Observe(instrument1, value1, labels1...)

   // bound call
   boundInst2.Observe(result, value2)

   // batch call
   result.ObserveBatch([]core.KeyValue{labels3}, instrument1.Measurement(value1), instrument3.Measurement(value3), ...)
}

This sort of change requires us to mention that callbacks are executed in order, whereas we didn't have to before. Last-value wins was previously applied to the result from one callback, but now it has to be applied to the result from all callbacks.

Thoughts? @bogdandrutu this is a significant problem, please have a look.

I like the idea of adding a callback interface to the Meter.

Java pseudocode:

meter.addObserver(Callback<Meter> meterCallback)

where Callback<T> looks something like this:

interface Callback<R> {
  void update(R input);
}

What if instead of function as a callback it would be an observable object. This way the problem with sync / async would be solved.
for example

const observable1 = new ObservableObject();
observerResult.observe(observable1, labels);


// anywhere in a code doesn't matter when 
observable1.next(1);
//or
setTimeout(()=>{
  observable1.next(1);
}, 1000)
// etc.

so every time you call observable1.next(1); it will simply update the metric.

The main problem when the callback is a function is that you have no easy way of getting data which requires async calls. And also you will have problem with blocking the sync call for longer than it is needed.
WDYT ?

@obecny What you've written looks like a synchronous instrument through and through. There are reasons for Observers to be asynchronous, they're supposed to be a solution. The reasons:

  1. Optimization for expensive-to-read measurements
  2. Semantics of LastValue relationship
  3. Context-free reporting API.

We had made the Observer callback support a single metric merely out of a desire to keep things as simple as possible. I believe this example demonstrates that we have oversimplified. The proposal I made above for bound observers surfaced in early in the 0.3 conversation IIRC, and we dropped it.

I think it should be possible to have the "batch-callback" similar to batch record. Probably not one per meter, but allow batching-callback is indeed something that we can support.

@jmacd I would rely on a combination of the BatchRecorder and ObserverResult to achieve the performance you mentioned:

var instrument1 = global.Meter(...).RegisterInt64Observer(...)
var instrument2 = global.Meter(...).RegisterFloat64Observer(...)
var instrument3 = global.Meter(...).RegisterInt64Observer(...)
global.Meter(...).RegisterBatchCallback(Callback)

func Callback(result metric.ObserverBatchResult) {
   // batch call
   result.ObserveBatch([]core.KeyValue{labels3}, instrument1.Measurement(value1), instrument3.Measurement(value3), ...)
}

// OR

func Callback(result metric.ObserverBatchResult) {
   // batch call
   result.BatchRecorder([]core.KeyValue{labels3})
             .add(instrument1.Measurement(value1))
             .add(instrument3.Measurement(value3))
             .record()
}

I like this. There are a few details to work out, like:

  1. Instruments may pass a null callback, they can be registered w/o defining a callback due to batch observations
  2. Callbacks will be called in the ordered they were registered, whether on individual instruments or batch observer callbacks, to preserve last-value-wins semantics
  3. I take it you mean that the above two calling patterns are both valid options, and may vary by language? I am on-board with this. I think Golang will take the first of these options, but recognize that others will adopt the second choice.

As we have removed LabelSets, the BatchRecorder pattern that you bring up stands in as a replacement for LabelSet. I wonder if you would allow this pattern:

var (
   meter = global.Meter("library")

   metricRecorder = meter.AsyncBatchRecorder(labels...) // Or "ObserverBatchRecorder"

    _ = meter.RegisterBatchCallback(
       func (result metric.ObserverBatchResult) {
              metricRecorder.record(result, 
                   instrument1.Measurement(value1),
                   instrument3.Measurement(value3),
               )
        })
)

I have the same question for synchronous usage, e.g.:

func F(ctx context.Context, ch <-chan T) {
   meter := global.Meter("library")
   metricRecorder := meter.SyncBatchRecorder(labels...) // Or "MeasureBatchRecorder"

   for elem := range ch {
              metricRecorder.record(ctx, 
                   instrument1.Measurement(elem.X),
                   instrument3.Measurement(elem.Y),
               )
        })
)

Note the only difference between Sync- and Async- BatchRecorders is that Sync- recorders take Context, while Async- recorders take ObserverBatchResult.

Instruments may pass a null callback, they can be registered w/o defining a callback due to batch observations
Callbacks will be called in the ordered they were registered, whether on individual instruments or batch observer callbacks, to preserve last-value-wins semantics

I would like to think of a solution where we can ensure only one callback is register at a time for every instrument, what do you think?

I can't immediately think of how I'd do that. Do you have an API in mind?

Possible OTel-Go APIs are being discussed here: https://github.com/open-telemetry/opentelemetry-go/pull/634

This API review is underway: open-telemetry/opentelemetry-go#634

Was this page helpful?
0 / 5 - 0 ratings