The current specification provides contradicting information. The overriding understanding is to maintain a clear separation of API and SDK. However, the ecommended method to create an metric instrument requires a MeterProvider that can only be obtain from an SDK.
From the Spec:
Monitoring and alerting systems commonly use the data provided through metric
events, after applying various aggregations and converting into
various exposition formats. However, we find that there are many other uses for
metric events, such as to record aggregated or raw measurements in tracing and
logging systems. For this reason, OpenTelemetry requires a separation of the
API from the SDK, so that different SDKs
can be configured at run time.
The link above library-guidelines.md has following requriement:
Requirements
- The OpenTelemetry API must be well-defined and clearly decoupled from the implementation. This allows end users to consume API only without also consuming the implementation (see points 2 and 3 for why it is important).
From the Spec:
A concrete
MeterProviderimplementation can be obtained by initializing and
configuring an OpenTelemetry Metrics SDK. This document does not
specify how to construct an SDK, only that they must implement the
MeterProvider.
New
Meterinstances can be created via aMeterProviderand its
GetMeter(name, version)method.
The API defines a
Meterinterface. This interface consists of a set
of instrument constructors, and a facility for capturing batches of
measurements in a semantically atomic way.
A vendor wants to provides a library for general use. This library need to emit metrics, thus will instrument library code using Open Telemetry Metric API. However, this is not possible unless the library also takes a dependency on a specific Open Telemetry SDK.
An application uses the above library. The application will need to include an Open Telemetry SDK so it can be configured appropriately.
Issues:
MeterProvider interface should be part of API
SDK should attach to MeterProvider when it is initialized
To make above work, MeterProvider will need to provide a way for SDK to attach to it. This attachment will thus allows the correct routing of metric measurments (i.e. [MeterProvider].GetMeter(...).CreateCounter(...)) to the attached SDK. This aligns with a classic EventSource / EventListener pattern.
The API will host the MeterProvider interface (global object or static class) and implement a EventSource / EventListener pattern.
using System.Collections.Generic;
using System.Collections.Concurrent;
namespace OpenTelmetry.Api
{
public class MetricSource
{
// singleton Default Source
private static MetricSource default_source = new MetricSource("DefaultSource");
// Global list of all sources
private static ConcurrentDictionary<string,MetricSource> source_registry = new();
// All listeners on this source
private ConcurrentDictionary<MetricListener, bool> listeners = new();
private string name;
private MetricSource(string name)
{
this.name = name;
}
public string GetName()
{
return name;
}
public static MetricSource GetSource(string name)
{
return source_registry.GetOrAdd(name, (k) => new MetricSource(k));
}
public static MetricSource DefaultSource
{
get => default_source;
}
public bool AttachListener(MetricListener listener)
{
return listeners.TryAdd(listener, true);
}
public bool DettachListener(MetricListener listener)
{
return listeners.TryRemove(KeyValuePair.Create(listener, true));
}
public ICollection<MetricListener> GetListeners()
{
return listeners.Keys;
}
}
}
SDK will implement a MetricListener interface and call AttachListener() to MetricSource instance.
using System;
using System.Collections.Generic;
namespace OpenTelmetry.Api
{
public abstract class MetricListener
{
/// <summary>
/// Let SDK know when new counters are being created
/// </summary>
public abstract bool OnCreate(MeterBase counter, LabelSet labels);
// TODO: Represent int/double as a generic class so we don't need two OnRecord() function
// TODO: Need discussion of carrying native number or BOX up the number into generic class
/// <summary>
/// Let SDK know when new measures are recorded
/// </summary>
public abstract bool OnRecord<T>(MeterBase meter, T value, LabelSet labels);
/// <summary>
/// Allow multiple measurements to be recorded atomicly
/// </summary>
public abstract bool OnRecord<T>(IList<Tuple<MeterBase, T>> records, LabelSet labels);
}
}
When instruments are ready to record a measurment...
protected void RecordMetricData<T>(T val, LabelSet labels)
{
if (Enabled)
{
// Route to all attached listeners
foreach (var listener in source.GetListeners())
{
listener?.OnRecord(this, val, labels);
}
}
}
For reference, I have an alternative/experimental implementation of OTel Metric API here...
https://github.com/noahfalk/opentelemetry-dotnet/tree/design_experiment/example3
The MeterProvider is a part of the API. If dotNet has implemented it differently, that's definitely an issue for their API.
Not according to the Spec as I quoted in my overview section.
A concrete MeterProvider implementation can be obtained by initializing and
configuring an OpenTelemetry Metrics SDK. This document does not
specify how to construct an SDK, only that they must implement the
MeterProvider.
A concrete implementation of the interface is acquired from an SDK. Not the interface itself.
It might be helpful to think that the API has a 'mini-sdk' embedded inside itself, which does no-op for everything. Library takes dependency on API, and it gets the TracerProvider or MeterProvider implementation from API, which would be the no-op one.
When SDK is installed/enabled, a different implementation of Tracer/MeterProvider will be returned, which does actual things.
To relate this back to the code of c# Tracing API: https://github.com/open-telemetry/opentelemetry-dotnet/blob/cde206062bc80dc060f9982b4fcf1809bfe36f7e/src/OpenTelemetry.Api/Trace/TracerProvider.cs Notice the default implementation even though it's providing a class it is done with a no-op as @cijothomas mentioned.
VS the SDK: https://github.com/open-telemetry/opentelemetry-dotnet/blob/cde206062bc80dc060f9982b4fcf1809bfe36f7e/src/OpenTelemetry/Trace/TracerProviderSdk.cs
This is the same as what is desired in the Api vs SDK for Metrics (and logs). It's similar to using https://docs.microsoft.com/en-us/dotnet/api/microsoft.extensions.logging.ilogger-1?view=dotnet-plat-ext-5.0
My concern is not about a "no-op" / "mini-sdk" embedded inside itself. It's more a question of allowing each Library to have it's own named scoped provider (that does not require SDK) that can be "late-bind" / attached to from SDK instance.
FYI: My comments are based on C# implementation. I'm trying to implement examples per SIG meeting. example
In my shared Library A, I cannot take a dependency on SDK. Thus, I can only access the "Default" MeterProvider.
Meter meter = MeterProvider.Default.GetMeter("MyLibrary", "1.0.0");
var counter = meter.CreateInt64Counter("counter");
Image I had another Library B, exactly like the one above.
Now, in my Application, I take a dependency on SDK. But, I need to address meters from Library A differently than Library B (SetPushInterval(), SetExporters, etc...). I cannot, AFAIK, because they are both referencing "Default" provider.
As best I know, the recommendation is to create a new instance of a MeterProvider (instead of the Default) so that I can configure that instance differently. But I need a SDK to do that.
In my experimental SDK code (in the original description), I attach via a named GetMeter(), instead of the MeterProvider.Default level. Thus, my Library A will GetMeter("LibraryA"), and Library B will GetMeter("LibraryB"). And then, when I have my SDK instances, I can attach to "LibraryA" and "LibraryB" respectively.
FYI. At least in C# implementation, the TraceProviderBuilder has a AddSource() function to attach the SDK to the Library TraceProvider by name.
I have a counter example to this where late binding is done explicitly by a library consumer.
Late binding (at least in java) is risky because class loading is lazy and you need some level of knowledge of which classes must be force loaded so they are even statically initialized.
I think from a spec standpoint we need to be careful not to prescribe language-specific solutions that don't translate well. The spirit of the request is to allow apis to grab meter without requiring them to construct an sdk. Let's just say that for now. The mechanism is likely to have lots of nuance based on runtime and I wouldn't want e.g. python to share java limitations
@jsuereth, I totally agree with your statement!
I think my use of "late-binding" was not intended to be specific to the mechanism. But rather as a mental model. I think I should just leave it as simply "a way to allow the SDK to attach to the MeterProvider".
I like the idea of using event-sourcing for "push" metrics in the API. However, the reality is that (e.g. in Java) metrics are actually pull based. To report metrics you need to construct an "interval reader" that looks up available metrics and either grabs their "local storage" of pending data points or goes and collects them.
I'm still getting up to speed but was curious if anyone (summoning @jmacd or @bogdandrutu) can speak to why Metrics is pull-based at the SDK level now vs. push-based like Tracing (at least in Java), and which direction the community is currently leaning towards.
speak to why Metrics is pull-based at the SDK level now vs. push-based like Tracing
One of the central tensions in OTel is that tracing has traditionally been push based, while metrics have a mixed history. Pull-based collection only works when aggregation is being applied, which generally means only for metrics.
We never got far enough in the early metrics SDK spec work to talk about what the SDKs do generically. In the OTel-Go prototype, there is a Controller component that is responsible for deciding when to collect from the accumulator. There is no per-instrument interval, it's just a single Collect() that outputs "accumulations" to the Processor. The latest OTel-Go SDK supports both push and pull in this way. Push mode just means there's a background thread pulling processed metrics and exporting them periodically. Pull mode just means there is not necessarily a background thread running w/ a push-based exporter.
This combined push/pull controller can push OTLP while exposing a Prometheus endpoint, for example. I've also used it to self-examine process metrics while pushing OTLP here https://github.com/lightstep/opentelemetry-prometheus-sidecar/blob/719e94051e0d629490d9aee6d9ea9d4d524dc7c2/health/health.go#L136, in a liveness handler.
I think from a spec standpoint we need to be careful not to prescribe language-specific solutions that don't translate well.
(Yes! In the early days of OTel I made proposals about late-binding in OTel-Go. The picture was so different in the other languages that we stopped discussing how to load SDKs, but it leaves behind the kind of confusion that led @victlu to file this. In OTel-Go late-binding works without an SDK dependency, at least.)
Common way for late-binding to work includes using Dependency Injection to avoid instrumented library from taking a dependency on SDK. This obviously works, but I think it would be better if the API spec does not require DI to allow the highlighted use cases. One proposal is to apply EventSource/EventListener pattern or some kind of Pub/Sub model.
One other issue. If Library X and Library Y gets their Meter interface from the global MeterProvider.Default (cause they do not want to take dependency on SDK). Then, we cannot build separate pipeline for Library X vs Library Y due to both using the same MeterProvider. If we use DI, it will be tricky to provision a different global MeterProvider.Default for Library X vs Library Y.
I gathered the general answer is to let each language define best method to allow Library code to gain access to a MeterProvider. Currently there are two options, #1 to use the global Meter Provider, and #2 to employ dependency injection.
I will close this overall Spec issue in favor of allowing each language SIG to determine their path forward.
Most helpful comment
I have a counter example to this where late binding is done explicitly by a library consumer.
Late binding (at least in java) is risky because class loading is lazy and you need some level of knowledge of which classes must be force loaded so they are even statically initialized.
I think from a spec standpoint we need to be careful not to prescribe language-specific solutions that don't translate well. The spirit of the request is to allow apis to grab meter without requiring them to construct an sdk. Let's just say that for now. The mechanism is likely to have lots of nuance based on runtime and I wouldn't want e.g. python to share java limitations