Opentelemetry-specification: Open Telemetry SDK should not be required when instrumenting a Library

Created on 11 Feb 2021 · 16Comments · Source: open-telemetry/opentelemetry-specification

Overview

The current specification provides contradicting information. The overriding understanding is to maintain a clear separation of API and SDK. However, the ecommended method to create an metric instrument requires a MeterProvider that can only be obtain from an SDK.

API and SDK must be clearly decoupled

From the Spec:

Monitoring and alerting systems commonly use the data provided through metric
events, after applying various aggregations and converting into
various exposition formats. However, we find that there are many other uses for
metric events, such as to record aggregated or raw measurements in tracing and
logging systems. For this reason, OpenTelemetry requires a separation of the
API from the SDK, so that different SDKs
can be configured at run time.

The link above library-guidelines.md has following requriement:

Requirements

The OpenTelemetry API must be well-defined and clearly decoupled from the implementation. This allows end users to consume API only without also consuming the implementation (see points 2 and 3 for why it is important).

MeterProvider is obtained from an SDK

From the Spec:

A concrete MeterProvider implementation can be obtained by initializing and
configuring an OpenTelemetry Metrics SDK. This document does not
specify how to construct an SDK, only that they must implement the
MeterProvider.

New Meter instances can be created via a MeterProvider and its
GetMeter(name, version) method.

The API defines a Meter interface. This interface consists of a set
of instrument constructors, and a facility for capturing batches of
measurements in a semantically atomic way.

Example

A vendor wants to provides a library for general use. This library need to emit metrics, thus will instrument library code using Open Telemetry Metric API. However, this is not possible unless the library also takes a dependency on a specific Open Telemetry SDK.

An application uses the above library. The application will need to include an Open Telemetry SDK so it can be configured appropriately.

Issues:

There is confusion on which SDK is active.
What happens if Library uses a different version of SDK from Application?
There is question on what happens if SDK is late to initalize.

Proposed Solution

MeterProvider interface should be part of API
SDK should attach to MeterProvider when it is initialized

Using a EventSource / EventListener pattern

To make above work, MeterProvider will need to provide a way for SDK to attach to it. This attachment will thus allows the correct routing of metric measurments (i.e. [MeterProvider].GetMeter(...).CreateCounter(...)) to the attached SDK. This aligns with a classic EventSource / EventListener pattern.

The API will host the MeterProvider interface (global object or static class) and implement a EventSource / EventListener pattern.

using System.Collections.Generic;
using System.Collections.Concurrent;

namespace OpenTelmetry.Api
{
    public class MetricSource
    {
        // singleton Default Source
        private static MetricSource default_source = new MetricSource("DefaultSource");

        // Global list of all sources
        private static ConcurrentDictionary<string,MetricSource> source_registry = new();

        // All listeners on this source
        private ConcurrentDictionary<MetricListener, bool> listeners = new();

        private string name;

        private MetricSource(string name)
        {
            this.name = name;
        }

        public string GetName()
        {
            return name;
        }

        public static MetricSource GetSource(string name)
        {
            return source_registry.GetOrAdd(name, (k) => new MetricSource(k));
        }

        public static MetricSource DefaultSource
        {
            get => default_source;
        }

        public bool AttachListener(MetricListener listener)
        {
            return listeners.TryAdd(listener, true);
        }

        public bool DettachListener(MetricListener listener)
        {
            return listeners.TryRemove(KeyValuePair.Create(listener, true));
        }

        public ICollection<MetricListener> GetListeners()
        {
            return listeners.Keys;
        }

    }
}

SDK will implement a MetricListener interface and call AttachListener() to MetricSource instance.

using System;
using System.Collections.Generic;

namespace OpenTelmetry.Api
{
    public abstract class MetricListener
    {
        /// <summary>
        /// Let SDK know when new counters are being created
        /// </summary>
        public abstract bool OnCreate(MeterBase counter, LabelSet labels);

        // TODO: Represent int/double as a generic class so we don't need two OnRecord() function
        // TODO: Need discussion of carrying native number or BOX up the number into generic class

        /// <summary>
        /// Let SDK know when new measures are recorded
        /// </summary>
        public abstract bool OnRecord<T>(MeterBase meter, T value, LabelSet labels);

        /// <summary>
        /// Allow multiple measurements to be recorded atomicly
        /// </summary>
        public abstract bool OnRecord<T>(IList<Tuple<MeterBase, T>> records, LabelSet labels);
    }
}

When instruments are ready to record a measurment...

protected void RecordMetricData<T>(T val, LabelSet labels)
{
    if (Enabled)
    {
        // Route to all attached listeners
        foreach (var listener in source.GetListeners())
        {
            listener?.OnRecord(this, val, labels);
        }
    }
}

api metrics

Source

victlu

Most helpful comment

I have a counter example to this where late binding is done explicitly by a library consumer.

Late binding (at least in java) is risky because class loading is lazy and you need some level of knowledge of which classes must be force loaded so they are even statically initialized.

I think from a spec standpoint we need to be careful not to prescribe language-specific solutions that don't translate well. The spirit of the request is to allow apis to grab meter without requiring them to construct an sdk. Let's just say that for now. The mechanism is likely to have lots of nuance based on runtime and I wouldn't want e.g. python to share java limitations

jsuereth on 17 Feb 2021

👍2

All 16 comments

For reference, I have an alternative/experimental implementation of OTel Metric API here...

https://github.com/noahfalk/opentelemetry-dotnet/tree/design_experiment/example3

victlu on 11 Feb 2021

The MeterProvider is a part of the API. If dotNet has implemented it differently, that's definitely an issue for their API.

jkwatson on 11 Feb 2021

Not according to the Spec as I quoted in my overview section.

A concrete MeterProvider implementation can be obtained by initializing and
configuring an OpenTelemetry Metrics SDK. This document does not
specify how to construct an SDK, only that they must implement the
MeterProvider.

victlu on 11 Feb 2021

A concrete implementation of the interface is acquired from an SDK. Not the interface itself.

jkwatson on 11 Feb 2021

👍1

It might be helpful to think that the API has a 'mini-sdk' embedded inside itself, which does no-op for everything. Library takes dependency on API, and it gets the TracerProvider or MeterProvider implementation from API, which would be the no-op one.

When SDK is installed/enabled, a different implementation of Tracer/MeterProvider will be returned, which does actual things.

cijothomas on 11 Feb 2021

👍2

To relate this back to the code of c# Tracing API: https://github.com/open-telemetry/opentelemetry-dotnet/blob/cde206062bc80dc060f9982b4fcf1809bfe36f7e/src/OpenTelemetry.Api/Trace/TracerProvider.cs Notice the default implementation even though it's providing a class it is done with a no-op as @cijothomas mentioned.
VS the SDK: https://github.com/open-telemetry/opentelemetry-dotnet/blob/cde206062bc80dc060f9982b4fcf1809bfe36f7e/src/OpenTelemetry/Trace/TracerProviderSdk.cs

This is the same as what is desired in the Api vs SDK for Metrics (and logs). It's similar to using https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.ilogger-1?view=dotnet-plat-ext-5.0

hdost on 12 Feb 2021

My concern is not about a "no-op" / "mini-sdk" embedded inside itself. It's more a question of allowing each Library to have it's own named scoped provider (that does not require SDK) that can be "late-bind" / attached to from SDK instance.

FYI: My comments are based on C# implementation. I'm trying to implement examples per SIG meeting. example

In my shared Library A, I cannot take a dependency on SDK. Thus, I can only access the "Default" MeterProvider.

Meter meter = MeterProvider.Default.GetMeter("MyLibrary", "1.0.0"); var counter = meter.CreateInt64Counter("counter");

Image I had another Library B, exactly like the one above.

Now, in my Application, I take a dependency on SDK. But, I need to address meters from Library A differently than Library B (SetPushInterval(), SetExporters, etc...). I cannot, AFAIK, because they are both referencing "Default" provider.

As best I know, the recommendation is to create a new instance of a MeterProvider (instead of the Default) so that I can configure that instance differently. But I need a SDK to do that.

In my experimental SDK code (in the original description), I attach via a named GetMeter(), instead of the MeterProvider.Default level. Thus, my Library A will GetMeter("LibraryA"), and Library B will GetMeter("LibraryB"). And then, when I have my SDK instances, I can attach to "LibraryA" and "LibraryB" respectively.

victlu on 17 Feb 2021

FYI. At least in C# implementation, the TraceProviderBuilder has a AddSource() function to attach the SDK to the Library TraceProvider by name.

victlu on 17 Feb 2021

I have a counter example to this where late binding is done explicitly by a library consumer.

Late binding (at least in java) is risky because class loading is lazy and you need some level of knowledge of which classes must be force loaded so they are even statically initialized.

jsuereth on 17 Feb 2021

👍2

@jsuereth, I totally agree with your statement!

I think my use of "late-binding" was not intended to be specific to the mechanism. But rather as a mental model. I think I should just leave it as simply "a way to allow the SDK to attach to the MeterProvider".

victlu on 17 Feb 2021

I like the idea of using event-sourcing for "push" metrics in the API. However, the reality is that (e.g. in Java) metrics are actually pull based. To report metrics you need to construct an "interval reader" that looks up available metrics and either grabs their "local storage" of pending data points or goes and collects them.

I'm still getting up to speed but was curious if anyone (summoning @jmacd or @bogdandrutu) can speak to why Metrics is pull-based at the SDK level now vs. push-based like Tracing (at least in Java), and which direction the community is currently leaning towards.

jsuereth on 17 Feb 2021

speak to why Metrics is pull-based at the SDK level now vs. push-based like Tracing

One of the central tensions in OTel is that tracing has traditionally been push based, while metrics have a mixed history. Pull-based collection only works when aggregation is being applied, which generally means only for metrics.

We never got far enough in the early metrics SDK spec work to talk about what the SDKs do generically. In the OTel-Go prototype, there is a Controller component that is responsible for deciding when to collect from the accumulator. There is no per-instrument interval, it's just a single Collect() that outputs "accumulations" to the Processor. The latest OTel-Go SDK supports both push and pull in this way. Push mode just means there's a background thread pulling processed metrics and exporting them periodically. Pull mode just means there is not necessarily a background thread running w/ a push-based exporter.

This combined push/pull controller can push OTLP while exposing a Prometheus endpoint, for example. I've also used it to self-examine process metrics while pushing OTLP here https://github.com/lightstep/opentelemetry-prometheus-sidecar/blob/719e94051e0d629490d9aee6d9ea9d4d524dc7c2/health/health.go#L136, in a liveness handler.

jmacd on 24 Feb 2021

I think from a spec standpoint we need to be careful not to prescribe language-specific solutions that don't translate well.

(Yes! In the early days of OTel I made proposals about late-binding in OTel-Go. The picture was so different in the other languages that we stopped discussing how to load SDKs, but it leaves behind the kind of confusion that led @victlu to file this. In OTel-Go late-binding works without an SDK dependency, at least.)

jmacd on 24 Feb 2021

Common way for late-binding to work includes using Dependency Injection to avoid instrumented library from taking a dependency on SDK. This obviously works, but I think it would be better if the API spec does not require DI to allow the highlighted use cases. One proposal is to apply EventSource/EventListener pattern or some kind of Pub/Sub model.

victlu on 24 Feb 2021

One other issue. If Library X and Library Y gets their Meter interface from the global MeterProvider.Default (cause they do not want to take dependency on SDK). Then, we cannot build separate pipeline for Library X vs Library Y due to both using the same MeterProvider. If we use DI, it will be tricky to provision a different global MeterProvider.Default for Library X vs Library Y.

victlu on 2 Mar 2021

I gathered the general answer is to let each language define best method to allow Library code to gain access to a MeterProvider. Currently there are two options, #1 to use the global Meter Provider, and #2 to employ dependency injection.

I will close this overall Spec issue in favor of allowing each language SIG to determine their path forward.

victlu on 15 Mar 2021

Was this page helpful?

0 / 5 - 0 ratings