Spring-cloud-sleuth: Add support for Brave autoconfiguration

Created on 25 Sep 2017 · 20Comments · Source: spring-cloud/spring-cloud-sleuth

If Sleuth integrates with Brave then we would start supporting quite a few options that currently Sleuth doesn't support.

We may open issues for some stickier points:

[x] Provide migration path for name prefixed baggage (possibly by whitelisting names)
[ ] Make a JMS propagation strategy (ex fallback to check underscore convention; inject with underscores)
[x] hash out how to deal with naming conventions (ex http:/foo span namer)

The migration guide is drafted here - https://gist.github.com/marcingrzejszczak/d3c15a0c11dda71970e42c513c9c0e09

enhancement

Source

marcingrzejszczak

👍3

Most helpful comment

@marcingrzejszczak and I will be attacking this imminently, but it will involve significant effort here and in brave, so probably a month or two until completely done.

codefromthecrypt on 3 Jan 2018

👍3

All 20 comments

Would the semantics of this be: if Brave is on the classpath, use it as the tracer implementation; if not, use the existing homegrown Sleuth tracer?

shakuzen on 26 Sep 2017

right now, folks like @devinsba @jonathan-lo @jorgheymans and @ImFlog have or contribute to brave instrumentation, and this can work for example in classic spring environments or spring boot. Examples are grpc and kafka (not stream). There are also frameworks harder for us to support directly, such as dubbo.

In the case of http, if we had the ability to use brave types, we could obviate a lot of templated patterns, such as http sampling and span data policy.

So, to get highest value, there's probably a question of whether to implement a brave shim (which implies a decent amount of brave to implement it w/o brave), or whether to make sleuth a layer on brave (with the dependency locking implications of that).

There are some shortcuts we could take with a layer, which might reduce the amount of code needed, but it will still largely be copy/paste of code that is automatically fixed in brave.

So, it is a directional decision, something to ponder. Any folks have steering input? cc also @dsyer

codefromthecrypt on 26 Sep 2017

IMO the layer makes the most sense in the spring-boot philosophy. Make it easy to quickly get up and running doing tracing using brave/zipkin, provide trace-enable *Template classes and annotations etc, And at the same time facilitate a way to access the full power of brave for more advanced scenarios. This is how most spring-boot integrations work, from my perspective at least.

jorgheymans on 26 Sep 2017

👍1

Personally I've used both quite a bit over the last 2 years. I would love if sleuth was a layer on top of brave to enable all of the existing brave instrumentation.

For us, sleuth is an easy first step toward zipkin, we get log correlation and all of the built in goodies without needing to set up a reporter. If this was then layered on top of brave then it would allow those teams using things with brave instrumentation (grpc, dynamodb, etc..) to get these things as well.

devinsba on 26 Sep 2017

👍3

Our use case was fairly similar - we'd started with Sleuth, which was fantastic as a starting point, but found that we'd needed additional customisation ability from brave.

jonathan-lo on 27 Sep 2017

I also am really interested in this. We are using Sleuth everywhere for simplicity and easy standard tracing. But when we want some custom features it seems odd to switch to Brave. This would improve our possibilities while letting standard use cases to enjoy the built in tracing.

ImFlog on 27 Sep 2017

This is something that we could definitely use.

ghost on 28 Sep 2017

here is an interim solution for sending v2 spans and reusing the same reporter and senders brave uses https://github.com/spring-cloud/spring-cloud-sleuth/pull/720

codefromthecrypt on 3 Oct 2017

added this as it would be easier to setup senders with autoconfiguration. Ex rabbitmq transport for zipkin decorating on top of normal rabbit autoconfig https://github.com/openzipkin/zipkin-reporter-java/issues/71

codefromthecrypt on 9 Oct 2017

@marcingrzejszczak and I will be attacking this imminently, but it will involve significant effort here and in brave, so probably a month or two until completely done.

codefromthecrypt on 3 Jan 2018

👍3

working approach is to remove sleuth autoconfiguration and introduce brave ones until samples pass

codefromthecrypt on 3 Jan 2018

Not sure whether this was suggested before or not.

Why not take the approach of Spring Metrics & isolate this library from zipkin.

This way the library can be augmented to support Zipkin,OpenTracing (or) a new standard, natively instead of supporting just OpenZipkin

thekalinga on 8 Jan 2018

It has been suggested before but not comparing micrometer with OT (flimsy
comparison as micrometer is more like census than OT). Suffice to say the
suggestion is often sortof off the cuff and hypothetical based on merits
mentioned in OT marketing vs actual system interop.

Regardless, the goal isnt to add a hard dependency on OT 0.31.RC2 or any
other number of the variants. To the degree one of the many versions of OT
are needed it can be via the bridge.

There are already numerous means to export data to different systems via
zipkin api (ex jaeger datadog kamon google amazon etc). Other systems can
be added without requiring a hard dep on one of the several versions of OT.
"Just zipkin" is a half truth at best.

Anyway, by focusing on what the actual value we are trying to provide is,
we can better employ the limited hands available to maintaining things. OT
has not proven to reduce maintenance or expand options in a meaningful way
vs the path already in progress.

On 8 Jan 2018 8:43 pm, "Ashok Koyi" notifications@github.com wrote:

Not sure whether this was suggested before or not.

Why not take the approach of Sprint metris https://micrometer.io/ that
isolates this from zipkin.

This way this library can do everything it does to OpenTracing aswell
natively instead of just OpenZipkin

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/spring-cloud/spring-cloud-sleuth/issues/711#issuecomment-355956837,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAD61yw_zODjqitSJFPEJeJcYT6VM-LQks5tIg2HgaJpZM4PiyAF
.

codefromthecrypt on 9 Jan 2018

👍1

Thanks for the clarification. I'll wait things to settle down in this whole space then (from spring point of view).

I just came across this library that is more or less doing everything you see to be doing in this project for open tracing

https://github.com/opentracing-contrib/java-spring-cloud

Keeping it here just for reference

thekalinga on 9 Jan 2018

Yeah that library copy pasted a lot of code from sleuth when beginning. You
can tell as even log statements are copied verbatum

On 9 Jan 2018 7:27 pm, "Ashok Koyi" notifications@github.com wrote:

Thanks for the clarification. I'll wait things to settle down in this
whole space then.

I just came across this library that is more or less doing everything you
see to be doing in this project for open tracing

https://github.com/opentracing-contrib/java-spring-cloud

Keeping it here just for reference

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/spring-cloud/spring-cloud-sleuth/issues/711#issuecomment-356257820,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAD613a32VB8-DYiklXgeRmx3MMTZ_3nks5tI00FgaJpZM4PiyAF
.

codefromthecrypt on 9 Jan 2018

👍2

There seems to be demand for open tracing (for me, logs being inline with the trace is a good add). Probably will wait and watch on how zipkin & opentracing playout

thekalinga on 9 Jan 2018

Thanks for all the feedback. Note that most use log correlation rather than
trying to replace their logging backend with zipkin. This is a primary
feature of sleuth and brave. Have you used either yet?

On 9 Jan 2018 8:25 pm, "Ashok Koyi" notifications@github.com wrote:

There seems to be demand for open tracing (for me, logs being inline with
the trace is a significant tracer). Probably will wait and watch on how
zipkin & opentracing diverge/converge

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/spring-cloud/spring-cloud-sleuth/issues/711#issuecomment-356269877,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAD61yDvIQy_dY9Pr0qugN6vRlZ3nDaKks5tI1q1gaJpZM4PiyAF
.

codefromthecrypt on 9 Jan 2018

@adriancole Agreed.

No. I have not used the setup completely in production as of now. I have to integrate observability in my product (which is why I'm scouting for the right combo)

Here is my view based on the documentation I have read

Log aggregation (& visualisation in UI, lets say Kibana) is helpful for detailed analysis & cannot be replaced with tracing system. As traces are only samples, where as logs are not.
Having said that, Having inline logs that accompany trace will help contextualise the issue much quicker, so long as it does not hurt the performance in any meaningful way. Again, this choice can be made based on the performance requirements of the system with opentracing, but not with zipkin
One thing I forgot to mention such as adaptive sampling from Jaeger (WIP as of today) seems to be a much more compelling reason. I believe Jaeger supports zipkin format aswell, not sure if zipkin API supports this or not

Thanks again for engaging with me in helping me understand the space better. Please correct me if I've gone off rails in my understanding

thekalinga on 9 Jan 2018

Both brave and sleuth do log correlation independently of trace sampling.

With regards to adaptive sampling, zipkin has had an adaptive sampler in
the past 'zipkin-zookeeper' in fact it is still built. However most dont
agree on a coordination mechanism for the adaptive part (eg zookeeper).
Jaeger relies on proprietary agents to coordinate iiuc. This obviously only
works when you can deploy and manage host agents.

Regardless the sampling mechanism whether parameterized or based solely on
trace id is pluggable. This means that whatever mechanism you choose,
there's likely a way to attach it. There is no reason to require an
adaptive sampler as a core project, iotw, as it can plugin via interfaces
like other things.

Bear in mind that the zipkin ecosystem isnt new and folks have been around
the block. There are different ideas and staffing levels which influences
tech. For example, uber chose to fork zipkin internally for a year and
later open source that work as jaeger. They could afford to create and
manage proprietary tech. Most in zipkin ecosystem are lightly staffed so
cannot take on the responsibility of logging infra just to do tracing.

Some who have been doing both have other stories to work on besides
adaptive sampling. For example, we are currently working on a "firehose
mode" which would absorb 100% of the traces obviating adaptive sampling
entirely except for overload conditions. This allows folks to do post
factum decisions and/or support things like a 1hr window where those on
call are likely to have a trace.

Others like census do a local 100% mode with a query endpoint. This acts
like stats endpoints in other services and allows things like viewing
in-flight transactions or late lazy collection even if at worst manual.

Tldr is that there are many working besides the staff on jaeger etc. If
you want to consider all options best to look around as OT is far from the
only game in town even if it is likely the best marketed one.

On 9 Jan 2018 8:52 pm, "Ashok Koyi" notifications@github.com wrote:

@adriancole https://github.com/adriancole Agreed.

No. I have not used the setup completely in production as of now. I have
to integrate observability in my product (which is why I'm scouting for the
right combo)

Here is my view based on the documentation I have read

Log aggregation (& visualisation in UI, lets say Kibana) is helpful
for detailed analysis & cannot be replaced with tracing system. As traces
are only samples, where as logs are not.

Having said that, Having inline logs that accompany trace will help
contextualise the issue much quicker, so long as it does not hurt the
performance significantly. Again, this choice can be made based on the
performance requirements of the system

One thing I forgot to mention such as adaptive sampling from Jaeger
(WIP as of today) seems to be a much more compelling reason. I believe
Jaeger supports zipkin format aswell, so no qualms with zipkin here

Thanks again for your engaging with me in helping me understand the space
better

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/spring-cloud/spring-cloud-sleuth/issues/711#issuecomment-356275606,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAD6164JSXquKPC2MTj_dgC8FaLnTmEyks5tI2EQgaJpZM4PiyAF
.