Spring-cloud-sleuth: Sleuth Reactive Overhead

Created on 8 Jul 2019 · 13 Comments · Source: spring-cloud/spring-cloud-sleuth

@marcingrzejszczak We seem to experience a degradation in Spring Cloud Gateway latency when enabling Sleuth.

I have created a sample project that demonstrates the Sleuth reactive overhead. As a very rough summary, Spring Cloud Gateway response times double when Sleuth is enabled.

For details see:
https://github.com/tony-clarke-amdocs/sleuth-reactive-perf

enhancement in progress

All 13 comments

That's a known issue due to instrumentation of project reactor. We're working with @bsideup to improve it. First reactor netty needs to be updated so that we can stop doing the onEachOperator wrapping and do onLastOperator. You can check out more about it here https://github.com/spring-cloud/spring-cloud-sleuth/issues/1392
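To illustrate why decorating every operator costs more than decorating only the last one, here is a toy model in plain Java. It does not use Reactor or Sleuth: `DecorationCountDemo`, `decorate`, and `build` are hypothetical names, and the counter only stands in for whatever bookkeeping the real instrumentation does per signal. It is a sketch of the general idea, not a measurement of the actual overhead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class DecorationCountDemo {
    // Wrap a stage so every call bumps a counter (a stand-in for per-signal bookkeeping).
    static Function<Integer, Integer> decorate(Function<Integer, Integer> stage,
                                               AtomicInteger counter) {
        return in -> {
            counter.incrementAndGet();
            return stage.apply(in);
        };
    }

    // Compose the stages, decorating either every stage or only the final one.
    static Function<Integer, Integer> build(List<Function<Integer, Integer>> stages,
                                            boolean decorateEach,
                                            AtomicInteger counter) {
        Function<Integer, Integer> pipeline = Function.identity();
        for (int i = 0; i < stages.size(); i++) {
            boolean isLast = (i == stages.size() - 1);
            Function<Integer, Integer> stage = stages.get(i);
            if (decorateEach || isLast) {
                stage = decorate(stage, counter);
            }
            pipeline = pipeline.andThen(stage);
        }
        return pipeline;
    }

    // Pushes one item through a pipeline of stageCount stages and reports
    // how many decorated calls that single item triggered.
    static int decorationsPerItem(int stageCount, boolean decorateEach) {
        List<Function<Integer, Integer>> stages = new ArrayList<>();
        for (int i = 0; i < stageCount; i++) {
            stages.add(x -> x + 1);
        }
        AtomicInteger counter = new AtomicInteger();
        build(stages, decorateEach, counter).apply(0);
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("each: " + decorationsPerItem(5, true));  // each: 5
        System.out.println("last: " + decorationsPerItem(5, false)); // last: 1
    }
}
```

In this model the per-item cost of the "each" strategy grows with the number of stages, while the "last" strategy stays constant.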

OK. I think the project, sleuth-reactive-perf, will help validate any improvements.

First approach to work on this would be this flag: https://github.com/spring-cloud/spring-cloud-sleuth/issues/1478

Reading the doc it says:

The downside of this is that when Project Reactor will change threads, the trace propagation will continue without issues, however anything relying on the ThreadLocal such as e.g. MDC entries can be buggy.

I'm trying to understand this. Can you elaborate a little more? Having invalid values in the MDC sounds like a deal breaker. Aren't the trace IDs themselves stored in the MDC?

Not necessarily. When you're e.g. in the Gateway application, you don't care about custom MDC entries. The idea is to pass the tracing context from the input, to the output. That works well and the performance gain is gigantic (3-4 times faster).

What if your gateway application adds entries to the MDC, so that, for example, Sleuth logs some custom key/values from the MDC?

I don't think that it's a common case that the Gateway does some additional, custom processing where log entries would have MDC entries modified. If that's the case however, you can remain with the current option to ensure that the tracing context is propagated and every single operator is wrapped in a tracing representation, together with setting MDC values.

Also, using onLast doesn't mean that we stop setting the MDC values. We still set them, but Reactor can switch threads, and those MDC entries might then be lost.
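The thread-switch problem can be reproduced without Reactor. The following plain-Java sketch uses a `ThreadLocal` map as a stand-in for the MDC (an assumption: `MDC_LIKE`, `withCapturedContext`, and `readTenantOnWorker` are hypothetical names, not Sleuth or Brave APIs). An entry set on one thread is invisible on a worker thread unless the task is wrapped to capture and restore it, which is roughly the kind of work per-operator wrapping has to repeat.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextCaptureDemo {
    // Hypothetical stand-in for the MDC: each thread sees its own map.
    static final ThreadLocal<Map<String, String>> MDC_LIKE =
            ThreadLocal.withInitial(HashMap::new);

    // Snapshot the caller's entries now; reinstate them around the task later.
    static <T> Callable<T> withCapturedContext(Callable<T> task) {
        Map<String, String> snapshot = new HashMap<>(MDC_LIKE.get());
        return () -> {
            Map<String, String> previous = MDC_LIKE.get();
            MDC_LIKE.set(new HashMap<>(snapshot));
            try {
                return task.call();
            } finally {
                MDC_LIKE.set(previous); // leave the worker thread as we found it
            }
        };
    }

    // Puts "tenant" on the calling thread, then reads it from a worker thread.
    static String readTenantOnWorker(boolean wrap) {
        MDC_LIKE.get().put("tenant", "acme");
        Callable<String> read = () -> MDC_LIKE.get().get("tenant");
        Callable<String> task = wrap ? withCapturedContext(read) : read;
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return worker.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
            MDC_LIKE.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println("unwrapped: " + readTenantOnWorker(false)); // unwrapped: null
        System.out.println("wrapped:   " + readTenantOnWorker(true));  // wrapped:   acme
    }
}
```

The unwrapped task loses the entry because the worker thread has its own empty map. The wrapped task sees it, at the price of a copy and a restore on every invocation.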

cc @smaldini @violetagg @bsideup

That is exactly what we are doing: we add additional entries to the MDC from HTTP headers. So the new flag is not an option, and neither is the current behavior, since it kills performance.
Can't we store the MDC in reactor context instead of thread local? See: here

That would have to be done in Brave cc @adriancole

You can use the https://github.com/spring-cloud/spring-cloud-sleuth/issues/1478 switch to reduce the overhead

Has something changed that we mark this as closed? Previously we said that a change would be required in Brave.

You can use the #1478 switch to reduce the overhead

You can use a switch to use the onLast operator and that will reduce the overhead.

As for having something built into Brave, you would have to file an issue in Brave, I guess. From Sleuth's perspective, we've added a feature to lower the overhead of Reactor instrumentation.

We cannot switch to onLast since it changes the behavior and MDC content is lost. @adriancole any ideas if something can be done?
