Spring-cloud-sleuth: TraceId does not propagate between threads

Created on 1 Oct 2020  路  18Comments  路  Source: spring-cloud/spring-cloud-sleuth

Describe the bug
TraceId does not propagate between threads since version 2.2.2.RELESE of sleuth (Spring Cloud Hoxton.SR3)
I've tested with verison 2.2.1.RELEASE (Hoxton.SR2) - works correctly
On newest version - 2.2.5.RELEASE issue still persists.

Sample
Here's test Controller:

`
package app.testing;

import brave.Span;
import brave.Tracer;
import brave.propagation.TraceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {
    private Tracer tracer;

    @Autowired
    public TraceIdController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.just(getTraceId()))
                .subscribeOn(Schedulers.elastic())
                .map(elasticThreadId -> new TraceIdDto(traceId, elasticThreadId.orElse(null)));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

package app.testing;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class TraceIdDto {
    private String nativeThreadTraceId;
    private String elasticThreadTraceId;
}

`

And junit test that shows the problem:
` package app.testing;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.test.web.reactive.server.WebTestClient;

import static org.assertj.core.api.Assertions.assertThat;

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class TraceIdControllerTest {

    @Autowired
    private WebTestClient webTestClient;

    @Test
    public void testTraceId() {

        TraceIdDto traceIdDto = webTestClient.get().uri("/api/v1/traceId/test")
                .exchange().expectStatus().isOk()
                .expectBody(TraceIdDto.class)
                .returnResult().getResponseBody();

        assertThat(traceIdDto.getNativeThreadTraceId()).isEqualTo(traceIdDto.getElasticThreadTraceId());
    }

}

`

Proposed solution
I've investiagted sleuth code a little bit, and I think the problem is in SleuthContextListener. I've compered this class with ContextRefreshedListener that existed in 2.2.1 version.
It seems like SleuthContextListener sets refreshed and closed flags to false, when appropriate event comes, instead to true.
... listener.refreshed.compareAndSet(false, event instanceof ContextRefreshedEvent); listener.closed.compareAndSet(false, event instanceof ContextClosedEvent); ...

feedback-provided

Most helpful comment

@marcingrzejszczak here you go: https://github.com/jacekmg/sleuth-trace-id-propagation-test
Let me know if that's sufficient for you.

All 18 comments

Are you willing to create a failing test and a PR for this?

I'd rather not. I think, it would be better if someone who knows sleuth code better, did it.

No problem. What you could do is change the way SleuthContextListener works locally and check your code against snapshots to confirm that this is a problem.

I've tested a little bit more, and I was wrong - SleuthContextListener sets the refreshed flag correctly. However ContextUtil retrieves different instance of SleuthContextListener than the one that handles ContextRefreshedEvent. So I'm not sure how to fix the issue at the moment.

That's useful, thanks. I'll try to look at it.

I can't replicate this. It will be difficult to fix this without a sample. Can you please provide one?

So the sample that I provided up above works for you without a problem?

Those are two snippets, can you create a sample as a Java maven project that actually replicates this? That would be super helpful.

Allright, I will try to create a project for you. Gradle instead of maven won't be problem for you?

I'm not a fan of Gradle but sure ;)

@marcingrzejszczak here you go: https://github.com/jacekmg/sleuth-trace-id-propagation-test
Let me know if that's sufficient for you.

Thanks for the sample - it was super useful.

So unfortunately Sleuth and Reactor are a problematic couple from the very beginning. That's mainly Sleuth's fault - we're trying to make things super easy for the user without the user's need to interact with any of the Sleuth's internals. With Reactor we're going away of such an approach (that's why in Sleuth 3.0.0 you'll have the utility operators fro messaging, webflux etc.).

Coming back to your case - the trace id is properly propagated via the reactor context. For some reason it's just not set as a thread local.

What you should be doing is the following (for Sleuth 2.x)

@RestController
public class TraceIdController {

    private Tracer tracer;

    @Autowired
    public TraceIdController(Tracer tracer) {
        this.tracer = tracer;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.subscriberContext()
                    .map(ctx -> Optional.of(ctx.get(TraceContext.class)).map(TraceContext::traceIdString))
                .subscribeOn(Schedulers.elastic())
                .map(traceIdFromNewThread -> new TraceIdDto(traceId, traceIdFromNewThread.orElse(null))));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

and example for Sleuth 3.0.0

.application.yaml - disables the default on each operator wrapping

spring:
  sleuth:
    reactor:
      instrumentation-type: manual

.TraceIdController

package app.controllers;

import app.dtos.TraceIdDto;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.CurrentTraceContext;  // (1)
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.TraceContext;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.cloud.sleuth.instrument.web.WebFluxSleuthOperators;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get(ServerWebExchange serverWebExchange) {
        String traceId = WebFluxSleuthOperators.withSpanInScope(tracer, currentTraceContext, 
            serverWebExchange, () -> getTraceId().orElse(null));  // (2)
        return Mono.defer(Mono::subscriberContext)
                .subscribeOn(Schedulers.elastic())
                .map(context -> WebFluxSleuthOperators.withSpanInScope(context, 
                    () -> new TraceIdDto(traceId, getTraceId().orElse(null)))); // (3)
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceId);
    }

}
  • (1) - new imports to the Sleuth's API
  • (2) - you can retrieve trace information from ServerWebExchange; the lambda will contain the trace information set in thread local
  • (3) - you can retrieve trace information from Reactor's Context; the lambda will contain the trace information set in thread local

That way things will not only just work but actually work much faster.

To clarify, the issue won't be fixed for webflux?
Because it also affects slf4j Logger. I've added logger to the sample project, and we can clearly see that logger in elastic thread do not have access to traceId:

2020-12-04 16:27:45.634  INFO [,32334ea865a2891f,32334ea865a2891f,true] 3747 --- [or-http-epoll-3] app.controllers.TraceIdController        : Traceid retrieval
2020-12-04 16:27:45.679  INFO [,,,] 3747 --- [      elastic-2] app.controllers.TraceIdController        : Traceid retrieval

So it makes sleuth pretty much useless when we are using slf4j logger and webflux.
In other words: In order to use sleuth in logging we would have to implement custom logger that would have subscriberContext as an argument, everytime that we want to log something?

The issue will be fixed for Webflux and Slf4j logger will work. That means that the fix is to use manual operators.

In Sleuth 2.x you'll have to put the span in scope manually. Example

package app.controllers;

import app.dtos.TraceIdDto;
import brave.Span;
import brave.Tracer;
import brave.propagation.CurrentTraceContext;
import brave.propagation.TraceContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private static final Logger log = LoggerFactory.getLogger(TraceIdController.class);

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    @Autowired
    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get() {
        String traceId = getTraceId().orElse(null);
        return Mono.defer(() -> Mono.subscriberContext()
                .map(ctx -> Optional.of(ctx.get(TraceContext.class)))
                .subscribeOn(Schedulers.elastic())
                .map(traceIdFromNewThread -> {
                    try (CurrentTraceContext.Scope scope = this.currentTraceContext.maybeScope(traceIdFromNewThread.orElse(null))) {
                        log.info("Hello");
                        return new TraceIdDto(traceId, getTraceId().orElse(null));
                    }
                }));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceIdString);
    }

}

Result:

2020-12-04 18:02:16.733  INFO [,cfaa55d92a45e52e,cfaa55d92a45e52e,true] 448629 --- [      elastic-2] app.controllers.TraceIdController        : Hello

In Sleuth 3.0.0 within the lambda we put the span in scope which means that we put it into MDC so you can log directly in lamba.

package app.controllers;

import app.dtos.TraceIdDto;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.CurrentTraceContext;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.TraceContext;
import org.springframework.cloud.sleuth.Tracer;
import org.springframework.cloud.sleuth.instrument.web.WebFluxSleuthOperators;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

import java.util.Optional;

@RestController
public class TraceIdController {

    private static final Logger log = LoggerFactory.getLogger(TraceIdController.class);

    private final Tracer tracer;

    private final CurrentTraceContext currentTraceContext;

    public TraceIdController(Tracer tracer, CurrentTraceContext currentTraceContext) {
        this.tracer = tracer;
        this.currentTraceContext = currentTraceContext;
    }

    @GetMapping(value = "/api/v1/traceId/test")
    public Mono<TraceIdDto> get(ServerWebExchange serverWebExchange) {
        String traceId = WebFluxSleuthOperators.withSpanInScope(tracer, currentTraceContext, serverWebExchange, () -> {
            log.info("Hello1");
            return getTraceId().orElse(null);
        });
        return Mono.defer(Mono::subscriberContext)
                .subscribeOn(Schedulers.elastic())
                .map(context -> WebFluxSleuthOperators.withSpanInScope(context, () -> {
                    log.info("Hello2");
                    return new TraceIdDto(traceId, getTraceId().orElse(null));
                }));
    }

    private Optional<String> getTraceId() {
        return Optional.of(tracer)
                .map(Tracer::currentSpan)
                .map(Span::context)
                .map(TraceContext::traceId);
    }

}

and the logs

2020-12-04 18:05:02.936  INFO [,bcc5da9deecfc003,bcc5da9deecfc003] 449021 --- [or-http-epoll-4] app.controllers.TraceIdController        : Hello1
2020-12-04 18:05:02.955  INFO [,bcc5da9deecfc003,bcc5da9deecfc003] 449021 --- [      elastic-2] app.controllers.TraceIdController        : Hello2

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.
Anyway, thanks for fast and clear responses.

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.

You'll have to do that manually, sorry.

By "won't be fixed" - I meant that traceId won't propagate automatically between threads as it was working with Hoxton.SR2. Worrying about whether tracing context will propagate or not, during feature development is quite inconvenient.
Anyway, thanks for fast and clear responses.

@jacekmg Did you find any solution on this. I am facing same issue

@vinodnholkar I didn't find any new solutions, besides manual scope update that @marcingrzejszczak proposed.

Was this page helpful?
0 / 5 - 0 ratings