Opentelemetry-specification: Extra features supported by Jaeger clients

Created on 23 Aug 2019 · 7 Comments · Source: open-telemetry/opentelemetry-specification

It would be great if in the future we could decommission Jaeger client libraries, which take non-trivial effort to support in all languages, and replace them with OpenTelemetry SDKs. @bogdandrutu asked me to enumerate additional features supported by Jaeger clients that are not currently supported by OpenTelemetry, to inform the roadmap after v1.

Remotely configurable sampling

Jaeger clients usually consult the Jaeger backend for the sampling strategies to use. This is implemented as polling along the path client -> agent -> collector, usually once a minute. The strategies can be statically configured on the backend, or automatically calculated to meet certain throughput goals. Sampling is controlled at the granularity of service + operation (aka span name), so that services whose endpoints have vastly different QPS (like API gateways) can sample each endpoint appropriately.
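As a rough illustration, here is a minimal Python sketch of a client-side sampler keyed by operation name (the client already represents a single service), whose strategy can be swapped out whenever a new one is polled from the backend. The `SamplingStrategy` shape and all class and method names are hypothetical simplifications, not Jaeger's wire format or client API; fetching and polling are not shown.

```python
import random
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SamplingStrategy:
    # Hypothetical, simplified strategy shape; not the exact Jaeger wire format.
    default_probability: float
    per_operation: Dict[str, float] = field(default_factory=dict)


class PerOperationSampler:
    """Probabilistic sampler whose rate depends on the operation (span) name."""

    def __init__(self, strategy: SamplingStrategy,
                 rng: Optional[random.Random] = None) -> None:
        self._lock = threading.Lock()
        self._strategy = strategy
        self._rng = rng or random.Random()

    def update(self, strategy: SamplingStrategy) -> None:
        # Intended to be called by a polling loop (not shown) when a new
        # strategy arrives from the backend.
        with self._lock:
            self._strategy = strategy

    def probability_for(self, operation: str) -> float:
        with self._lock:
            strategy = self._strategy
        # Operations without their own entry fall back to the service-wide default.
        return strategy.per_operation.get(operation, strategy.default_probability)

    def should_sample(self, operation: str) -> bool:
        # random() returns a value in [0, 1): 1.0 always samples, 0.0 never does.
        return self._rng.random() < self.probability_for(operation)


# An API gateway whose endpoints have very different QPS.
sampler = PerOperationSampler(SamplingStrategy(
    default_probability=0.001,
    per_operation={"GET /health": 0.0, "POST /checkout": 1.0},
))
```

Keeping a per-operation table with a default fallback lets a high-QPS health check be sampled rarely while a low-QPS endpoint is always sampled, instead of one global rate that fits neither.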

Firehose mode

Jaeger trace state contains a flag that indicates a firehose mode, in which traces are written to cheap storage and only accessible by trace ID, without indexing. This is useful when there are other upstream means of locating traces (e.g. trace ID is logged as part of an integration test), and allows higher throughput in the storage layer compared to fully indexed traces.
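To show where the firehose flag lives, here is a sketch that parses Jaeger's uber-trace-id propagation header ({trace-id}:{span-id}:{parent-span-id}:{flags}, where flags is a hex bitmask) and routes spans on the backend. The bit values (sampled 0x01, debug 0x02, firehose 0x08) are given from memory of the Jaeger clients and should be checked against the Jaeger documentation; `storage_tier` and its return labels are hypothetical.

```python
from typing import NamedTuple

# Assumption: flag bit values as used in Jaeger's flags byte.
FLAG_SAMPLED = 0x01
FLAG_DEBUG = 0x02
FLAG_FIREHOSE = 0x08


class JaegerContext(NamedTuple):
    trace_id: str
    span_id: str
    parent_span_id: str
    flags: int


def parse_uber_trace_id(header: str) -> JaegerContext:
    """Parse '{trace-id}:{span-id}:{parent-span-id}:{flags}' (all hex)."""
    trace_id, span_id, parent_span_id, flags = header.strip().split(":")
    return JaegerContext(trace_id, span_id, parent_span_id, int(flags, 16))


def storage_tier(ctx: JaegerContext) -> str:
    """Hypothetical backend routing: firehose traces skip indexing."""
    if ctx.flags & FLAG_FIREHOSE:
        return "trace-id-only"  # cheap storage, lookup by trace ID only
    return "indexed"            # fully indexed (service, operation, tags, ...)


# flags 0x9 = sampled | firehose
ctx = parse_uber_trace_id("4bf92f3577b34da6:00f067aa0ba902b7:0:9")
```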

Setting debug flag

Jaeger trace state has a debug flag that tells the backend to try its best to sample the trace. For example, if the backend implements additional consistent downsampling (for capacity control), traces with the debug flag bypass that downsampling.
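A sketch of what consistent downsampling that respects the debug flag might look like on the backend. "Consistent" here means the decision is a pure function of the trace ID, so every span of a trace is kept or dropped together, even across collectors. Hashing with SHA-256 and the `keep_span` name are choices of this sketch, and the debug bit value 0x02 is an assumption.

```python
import hashlib

FLAG_DEBUG = 0x02  # assumption: debug bit as used in Jaeger's flags byte


def keep_span(trace_id: str, flags: int, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding per trace rather than per span.

    Debug traces always bypass the downsampling.
    """
    if flags & FLAG_DEBUG:
        return True
    # Python's built-in hash() is randomized per process, so use a stable hash
    # to get the same decision on every collector instance.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # in [0, 2**64)
    # Integer comparison avoids float rounding at ratio == 1.0.
    return bucket < int(ratio * 2**64)
```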

Through the tracing API, this is done by setting the sampling.priority=1 tag on the root span.

In addition, the debug flag can be set by the user even before the trace is created, by including a special header jaeger-debug-id: anything. When Jaeger sees this header in an incoming request, it's equivalent to setting the sampling.priority=1 and jaeger-debug-id=$value tags on the span. Storing the header value as a correlation ID allows finding the trace later. For example, I can send a curl request with the header jaeger-debug-id: yuri-test-1 and later look up the trace by that value.
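The tag and header behavior described above can be sketched with a toy span model. `Span`, `start_root_span`, and the flag bit values are hypothetical and for illustration only, not the Jaeger client API: a jaeger-debug-id header on an incoming request becomes the two tags, and a positive sampling.priority tag sets the sampled and debug flags.

```python
from typing import Any, Dict, Mapping

FLAG_SAMPLED = 0x01  # assumption: flag bit values as used by Jaeger clients
FLAG_DEBUG = 0x02
DEBUG_HEADER = "jaeger-debug-id"


class Span:
    """Toy span with just enough state to show the tag-to-flag mapping."""

    def __init__(self, operation: str, flags: int = 0) -> None:
        self.operation = operation
        self.flags = flags
        self.tags: Dict[str, Any] = {}

    def set_tag(self, key: str, value: Any) -> "Span":
        self.tags[key] = value
        if key == "sampling.priority" and int(value) > 0:
            # A positive priority forces sampling and marks the trace as debug.
            self.flags |= FLAG_SAMPLED | FLAG_DEBUG
        return self


def start_root_span(operation: str, headers: Mapping[str, str]) -> Span:
    """Start a new root span for an incoming request (hypothetical helper)."""
    span = Span(operation)
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    debug_id = normalized.get(DEBUG_HEADER)
    if debug_id is not None:
        # Same effect as the application setting both tags itself; the stored
        # value is the correlation ID used to find the trace later.
        span.set_tag("sampling.priority", 1)
        span.set_tag(DEBUG_HEADER, debug_id)
    return span


span = start_root_span("GET /api/orders", {"Jaeger-Debug-Id": "yuri-test-1"})
```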

Setting baggage

Similar to the debug flag, there is a header jaeger-baggage: k=v,k=v that the user can set before the trace even exists.
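Parsing the jaeger-baggage header value could look like this. Trimming whitespace and skipping entries without "=" are choices of this sketch rather than documented Jaeger behavior; note also that in this simple k=v,k=v format a value cannot itself contain a comma.

```python
from typing import Dict


def parse_jaeger_baggage(header_value: str) -> Dict[str, str]:
    """Parse a 'k=v,k=v' header value into baggage items.

    Entries without '=' (or with an empty key) are skipped rather than
    failing the request; whitespace around keys and values is trimmed.
    """
    items: Dict[str, str] = {}
    for entry in header_value.split(","):
        key, sep, value = entry.partition("=")
        key = key.strip()
        if not sep or not key:
            continue
        items[key] = value.strip()
    return items
```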

Baggage restrictions

This one is a bit iffy in terms of usefulness, but Jaeger clients also support a remotely configurable way to restrict which services can set which baggage keys, as well as key/value lengths, etc.
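A minimal sketch of the restriction lookup, assuming a hypothetical remote config shaped as {service: {allowed_key: max_value_length}}, where keys not listed for the service are denied. The class and method names are illustrative, not Jaeger's API, and fetching the config itself is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class BaggageRestriction:
    key_allowed: bool
    max_value_length: int = 0


class BaggageRestrictionManager:
    """Per-service baggage restrictions from a (hypothetical) remote config."""

    def __init__(self, service: str, config: Mapping[str, Mapping[str, int]]) -> None:
        self.service = service
        self._limits: Dict[str, int] = {}
        self.refresh(config)

    def refresh(self, config: Mapping[str, Mapping[str, int]]) -> None:
        # Intended to be called whenever a new config is polled from the backend.
        self._limits = dict(config.get(self.service, {}))

    def get_restriction(self, key: str) -> BaggageRestriction:
        limit = self._limits.get(key)
        if limit is None:
            # Keys not listed for this service may not be set by it.
            return BaggageRestriction(key_allowed=False)
        return BaggageRestriction(key_allowed=True, max_value_length=limit)


manager = BaggageRestrictionManager("checkout", {"checkout": {"tenant": 16}})
```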

Ad-hoc sampling policies

This is currently a work in progress that I mentioned on the Sampling RFC. It's similar to Facebook's feature where users can centrally configure ad-hoc sampling policies to collect data exhibiting certain patterns, e.g. a specific tag, a header, or a combination. Note that this is not after-the-fact sampling like "sample if there is an error or unusual latency"; our ad-hoc sampling is still _mostly_ upfront. The main reason I mention it, even though it doesn't exist yet in Jaeger, is that it requires changes to the Sampler API in the SDK so that the sampler can take into account various pieces of span data, such as tags.
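To make the required Sampler API change concrete: the sampler below receives the span name and attributes at span start, so a centrally configured policy can match on them while the decision is still made upfront. `AdHocPolicy`, `AttributeAwareSampler`, and the first-match-wins rule are hypothetical, not an existing OpenTelemetry or Jaeger API.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional


@dataclass
class AdHocPolicy:
    """Hypothetical central policy: sample spans that match all conditions."""
    match: Dict[str, Any] = field(default_factory=dict)
    operation: Optional[str] = None  # None means "any span name"
    probability: float = 1.0

    def matches(self, name: str, attributes: Mapping[str, Any]) -> bool:
        if self.operation is not None and self.operation != name:
            return False
        return all(attributes.get(k) == v for k, v in self.match.items())


class AttributeAwareSampler:
    """Upfront sampler that sees the span name and attributes at span start."""

    def __init__(self, policies: List[AdHocPolicy], default_probability: float,
                 rng: Optional[random.Random] = None) -> None:
        self.policies = policies
        self.default_probability = default_probability
        self._rng = rng or random.Random()

    def should_sample(self, name: str, attributes: Mapping[str, Any]) -> bool:
        probability = self.default_probability
        for policy in self.policies:
            if policy.matches(name, attributes):
                # First matching policy wins (a design choice of this sketch).
                probability = policy.probability
                break
        return self._rng.random() < probability


sampler = AttributeAwareSampler(
    policies=[AdHocPolicy(match={"customer.id": "42"})],
    default_probability=0.0,
)
```

The point of the sketch is the signature: a sampler that only sees the trace ID cannot implement such policies, so the attributes have to be available when the span starts.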



All 7 comments

Thanks, I will turn this into individual issues to be addressed by the OpenTelemetry implementations.

One quick question from what I read: what is the interaction between baggage and traces?

There isn't much interaction. We usually log the baggage to the span when it is set, but aside from that, baggage is a runtime thing.

One missing piece of functionality is the ability to configure the client via environment variables https://www.jaegertracing.io/docs/1.13/client-features/. This has proven useful for cloud/containerized deployments.
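A minimal sketch of reading client configuration from environment variables. The variable names follow the Jaeger client documentation linked above (JAEGER_SERVICE_NAME, JAEGER_AGENT_HOST, JAEGER_AGENT_PORT, JAEGER_SAMPLER_TYPE, JAEGER_SAMPLER_PARAM, JAEGER_TAGS), but the `ClientConfig` type is hypothetical and the defaults are illustrative, so check them against the docs. Passing the environment in as a mapping keeps the function testable.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class ClientConfig:
    # Hypothetical config type; defaults are illustrative.
    service_name: str
    agent_host: str = "localhost"
    agent_port: int = 6831
    sampler_type: str = "remote"
    sampler_param: float = 0.001
    tags: Dict[str, str] = field(default_factory=dict)


def config_from_env(env: Optional[Mapping[str, str]] = None) -> ClientConfig:
    """Build client config from JAEGER_* environment variables."""
    env = os.environ if env is None else env
    service_name = env.get("JAEGER_SERVICE_NAME")
    if not service_name:
        raise ValueError("JAEGER_SERVICE_NAME must be set")
    cfg = ClientConfig(service_name=service_name)
    cfg.agent_host = env.get("JAEGER_AGENT_HOST", cfg.agent_host)
    cfg.agent_port = int(env.get("JAEGER_AGENT_PORT", cfg.agent_port))
    cfg.sampler_type = env.get("JAEGER_SAMPLER_TYPE", cfg.sampler_type)
    cfg.sampler_param = float(env.get("JAEGER_SAMPLER_PARAM", cfg.sampler_param))
    # JAEGER_TAGS: comma-separated key=value pairs added as process tags.
    for pair in env.get("JAEGER_TAGS", "").split(","):
        key, sep, value = pair.partition("=")
        if sep and key.strip():
            cfg.tags[key.strip()] = value.strip()
    return cfg


cfg = config_from_env({
    "JAEGER_SERVICE_NAME": "frontend",
    "JAEGER_AGENT_HOST": "jaeger-agent",
    "JAEGER_SAMPLER_TYPE": "const",
    "JAEGER_SAMPLER_PARAM": "1",
    "JAEGER_TAGS": "region=eu,version=1.2",
})
```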

@yurishkuro for the:

Baggage restrictions

This one is a bit iffy in terms of usefulness, but Jaeger clients also support a remotely configurable way to restrict which services can set which baggage keys, as well as key/value lengths, etc.

Does the length apply to incoming baggage keys/values or only to the ones set by the process?
For the baggage keys configuration what is the behavior if the code tries to add a key that is not allowed by the config?

@pavolloffay the configuration can live in the Jaeger "exporter". I think "exporter" is the wrong name here; maybe we should rename it to the Jaeger "client".

The idea is that we have the SDK that allows:

  • Setting span processor/exporters.
  • Setting TraceConfig at runtime.
  • Setting distributed context restrictions (not supported yet; this needs to be added).

Then the Jaeger "client" will depend on the SDK and:

  • Reads environment variables, converts them into OpenTelemetry config, and sets the right config in the SDK;
  • Sets the Jaeger exporter as the exporter for the trace SDK;
  • Starts a timer that every 60 seconds reads the sampling config from the Jaeger backend and, if anything has changed, updates the SDK trace config.
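The layering described above can be sketched as follows. `TracerSdk`, `TraceConfig`, and `JaegerClient` are hypothetical stand-ins, not OpenTelemetry or Jaeger APIs; the environment-variable step is omitted, and the sampling fetch is injected as a function. The important detail is that the poller only pushes a new trace config into the SDK when the fetched config differs from the current one.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class TraceConfig:
    """Hypothetical runtime trace config; frozen so configs compare by value."""
    default_probability: float = 0.001
    per_operation: Tuple[Tuple[str, float], ...] = ()


class TracerSdk:
    """Hypothetical SDK surface: exporters plus a runtime-updatable trace config."""

    def __init__(self) -> None:
        self.exporters: List[Callable[[dict], None]] = []
        self.trace_config = TraceConfig()
        self.config_updates = 0

    def add_exporter(self, exporter: Callable[[dict], None]) -> None:
        self.exporters.append(exporter)

    def update_trace_config(self, config: TraceConfig) -> None:
        self.trace_config = config
        self.config_updates += 1


class JaegerClient:
    """Hypothetical Jaeger "client" that depends on the SDK and configures it."""

    def __init__(self, sdk: TracerSdk, exporter: Callable[[dict], None],
                 fetch_sampling: Callable[[], TraceConfig]) -> None:
        self.sdk = sdk
        self.fetch_sampling = fetch_sampling
        sdk.add_exporter(exporter)  # the Jaeger exporter becomes the trace exporter

    def poll_once(self) -> bool:
        """Fetch the sampling config; update the SDK only if it changed."""
        new_config = self.fetch_sampling()
        if new_config == self.sdk.trace_config:
            return False
        self.sdk.update_trace_config(new_config)
        return True

    def start(self, interval_s: float = 60.0) -> threading.Event:
        """Poll every interval_s seconds on a daemon thread; set the event to stop."""
        stop = threading.Event()

        def loop() -> None:
            # Event.wait returns False on timeout and True once stop.set() is called.
            while not stop.wait(interval_s):
                try:
                    self.poll_once()
                except Exception:
                    pass  # keep the current config if the backend is unreachable

        threading.Thread(target=loop, daemon=True).start()
        return stop
```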

@bogdandrutu I have also created a generic issue for SDK configuration https://github.com/open-telemetry/opentelemetry-specification/issues/232. Some configuration is indeed Jaeger-specific; however, some properties apply to the whole SDK: the resource (service name...), the reporter to use, propagation...

Starts a timer that every 60 seconds reads the sampling config from the Jaeger backend and, if anything has changed, updates the SDK trace config.

That would be nice; we didn't have config watchers in Jaeger.

@bogdandrutu

Does the length apply to incoming baggage keys/values or only to the ones set by the process?

We've only implemented restrictions on baggage items set by the process, at the time they are set. No restrictions on propagated baggage.

For the baggage keys configuration what is the behavior if the code tries to add a key that is not allowed by the config?

It is not set, and a log entry is added to the span (a log entry is added in all cases, btw).
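The behavior described in these two answers, sketched with a toy span: restrictions apply only when this process sets a baggage item (propagated baggage is not checked), a disallowed key is not set, and a log entry is added in every case. The `limits` shape and the log field names are hypothetical, and truncating over-length values is an assumption, since the thread does not say how length limits are enforced.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class Span:
    """Toy span holding baggage and log entries."""
    baggage: Dict[str, str] = field(default_factory=dict)
    logs: List[Dict[str, Any]] = field(default_factory=list)


def set_baggage_item(span: Span, key: str, value: str,
                     limits: Mapping[str, int]) -> None:
    """Set a baggage item from this process, applying restrictions.

    `limits` maps each allowed key to its maximum value length (hypothetical
    shape). Baggage received from upstream is not passed through this function.
    """
    entry: Dict[str, Any] = {"event": "baggage", "key": key, "value": value}
    max_len = limits.get(key)
    if max_len is None:
        # Key not allowed by the config: the item is not set.
        entry["invalid"] = True
    else:
        if len(value) > max_len:
            # Assumption: over-length values are truncated rather than rejected.
            value = value[:max_len]
            entry["value"] = value
            entry["truncated"] = True
        span.baggage[key] = value
    # A log entry is added to the span in all cases.
    span.logs.append(entry)


span = Span()
set_baggage_item(span, "user", "alice", limits={"tenant": 4})
set_baggage_item(span, "tenant", "acme-corp", limits={"tenant": 4})
```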
