Spring-cloud-sleuth: Bug Report - WebClient still lose context in flatMap after exchange

Created on 26 Dec 2017  路  13Comments  路  Source: spring-cloud/spring-cloud-sleuth

There seems still be problem after the fix #764 .

flatmap works fine if there is not WebClient involved:

return Mono.just("").flatMap(s -> { 
  log.info("s1 {}", tracer.getCurrentSpan().toString());  // Not null and good
  return Mono.just(""); 
});

However inside the flatMap of WebClient, the context is lost (not tracing going on) and a new traceId is generated if there are subsequent outgoing requests:

return webClient.get().uri(url).exchange()
  .flatMap(r -> {
    // here no traceId nor spanId
    log.info("s1 {}", tracer.getCurrentSpan().toString());  // NullPointerException
    return Mono.just("");
  });

webClient is Autowired and the upstream can get the traceId in the first call.

Version:
spring-boot-starter-parent 2.0.0.M7
spring-cloud-sleuth 2.0.0.M5

bug

Most helpful comment

:+1: works perfectly. thank you.

All 13 comments

Could you please check this after the Brave migration #711 ?

I think I'm experiencing the same/ a similar problem (and it also might be related to https://github.com/spring-cloud/spring-cloud-sleuth/issues/850).

I'm using Spring Boot with Webflux RC01 and Spring Cloud Sleuth M6.

We use a WebClient to send requests (with flatMap after the exchange). The traceId (and I think the span to) is there until the WebClient gets the response after the exchange, a thread change appears and the traceId is gone.

Log:
2018-02-17 09:29:27.992 TRACE [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] o.s.c.s.log.Slf4jCurrentTraceContext : With parent: 711504781224515631 2018-02-17 09:29:27.992 TRACE [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] o.s.c.s.log.Slf4jCurrentTraceContext : With parent: 711504781224515631 2018-02-17 09:29:28.054 DEBUG [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] c.d.t.tex.handlers.TokenExchangeHandler : Processing of request started 2018-02-17 09:29:28.124 DEBUG [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] c.d.t.t.modules.ClassWithWebClient : Sending request with WebClient 2018-02-17 09:29:28.125 DEBUG [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] .s.c.s.i.w.c.TraceExchangeFilterFunction : Creating a client span for the WebClient 2018-02-17 09:29:28.167 TRACE [application-name,48221e9569d3b656,a09a95f85b3fe3e5,true] 13424 --- [ctor-http-nio-3] o.s.c.s.log.Slf4jCurrentTraceContext : With parent: 2588340607130962870 2018-02-17 09:29:28.221 DEBUG [application-name,48221e9569d3b656,23eba2118dc28bb6,true] 13424 --- [ctor-http-nio-3] .s.c.s.i.w.c.TraceExchangeFilterFunction : Reactor Context got injected with the client span RealSpan(48221e9569d3b656/a09a95f85b3fe3e5) 2018-02-17 09:29:28.709 TRACE [application-name,48221e9569d3b656,a09a95f85b3fe3e5,true] 13424 --- [ctor-http-nio-5] o.s.c.s.log.Slf4jCurrentTraceContext : With parent: 2588340607130962870 2018-02-17 09:29:28.792 TRACE [application-name,48221e9569d3b656,a09a95f85b3fe3e5,true] 13424 --- [ctor-http-nio-5] o.s.c.s.log.Slf4jCurrentTraceContext : With parent: 2588340607130962870 2018-02-17 09:29:28.849 DEBUG [application-name,,,] 13424 --- [ctor-http-nio-5] c.d.t.t.modules.ClassWithWebClient : Received response from WebClient
I'm not sure how this all works, but what I'm seeing is that in Slf4jCurrentTraceContext directly after the thread change in line 69 the previousTraceId is loaded from the MDC:
final String previousTraceId = MDC.get("traceId");
Through the MDC these ids seem to be stored thread local, and MDC.get returns null. Later in line 117 these null values are set in the MDC and the traceId in the MDC is overrwritten (and therefore not available for logging):
replace("traceId", previousTraceId);

@sbueringer can you double check if the problem is gone after fixing https://github.com/spring-cloud/spring-cloud-sleuth/issues/850 ?

Sorry for late confirmation however the problem is still there on 2.0.0.M8 .

And being more confused, i found that the spanId is never changing...Not the same behavior in old versions.

log.info("Outside");  // [b2cda26676756307,b2cda26676756307] [XNIO-2 task-1]
return Mono.just(new ResponseEntity<>("GOOD", HttpStatus.OK)).flatMap(ss->{
    log.info("Inside"); //[b2cda26676756307,b2cda26676756307] [XNIO-2 task-1]
    return webClient.get().uri("https://www.google.com").exchange().flatMap(s1->{
        log.info(s1.statusCode().toString()); //[b2cda26676756307,b2cda26676756307] [reactor-http-nio-4]
        return s1.bodyToMono(String.class).flatMap(html ->{
            log.info(String.valueOf(html.length())); // [,] [reactor-http-nio-4]
            return webClient.get().uri("https://www.google.com").exchange().flatMap(s2->{
                log.info(s2.statusCode().toString()); // [2ec715a23375dd4b,2ec715a23375dd4b] [reactor-http-nio-5]
                return s2.bodyToMono(String.class).flatMap(html2 ->{
                    log.info(String.valueOf(html2.length())); // [,] [reactor-http-nio-5]
                    return Mono.just(ss);
                });
            });
        });
    });
});

I can't replicate it with snapshots (I modified your test a little bit)

2018-03-16 10:24:47.691 DEBUG [TraceWebFluxTests,,,] 57738 --- [o-auto-1-exec-7] o.s.c.s.instrument.web.TraceWebFilter    : Received a request to uri [/withFlatMap2]
2018-03-16 10:24:47.691 DEBUG [TraceWebFluxTests,,,] 57738 --- [o-auto-1-exec-7] o.s.c.s.instrument.web.TraceWebFilter    : Handled receive of span RealSpan(b7e97a826a4ef451/b7e97a826a4ef451)
2018-03-16 10:24:47.693  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [o-auto-1-exec-7] o.s.c.s.i.reactor.sample.FlatMapTests    : withFlatMap2
2018-03-16 10:24:47.693  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [o-auto-1-exec-7] o.s.c.s.i.reactor.sample.FlatMapTests    : Outside
2018-03-16 10:24:47.695  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [o-auto-1-exec-7] o.s.c.s.i.reactor.sample.FlatMapTests    : Inside
2018-03-16 10:24:47.697 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [o-auto-1-exec-7] .s.c.s.i.w.c.TraceExchangeFilterFunction : Instrumenting WebClient call
2018-03-16 10:24:47.697 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [o-auto-1-exec-7] .s.c.s.i.w.c.TraceExchangeFilterFunction : Created a client span for the WebClient RealSpan(b7e97a826a4ef451/75481d02d86cfc51)
2018-03-16 10:24:47.700 DEBUG [TraceWebFluxTests,,,] 57738 --- [-auto-1-exec-10] o.s.c.s.instrument.web.TraceWebFilter    : Received a request to uri [/foo]
2018-03-16 10:24:47.700 DEBUG [TraceWebFluxTests,,,] 57738 --- [-auto-1-exec-10] o.s.c.s.instrument.web.TraceWebFilter    : Handled receive of span RealSpan(b7e97a826a4ef451/75481d02d86cfc51)
2018-03-16 10:24:47.701  INFO [TraceWebFluxTests,b7e97a826a4ef451,75481d02d86cfc51,true] 57738 --- [-auto-1-exec-10] o.s.c.s.i.reactor.sample.FlatMapTests    : foo
2018-03-16 10:24:47.702 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,75481d02d86cfc51,true] 57738 --- [-auto-1-exec-10] o.s.c.s.instrument.web.TraceWebFilter    : Found span in context RealSpan(b7e97a826a4ef451/75481d02d86cfc51)
2018-03-16 10:24:47.702 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,75481d02d86cfc51,true] 57738 --- [-auto-1-exec-10] o.s.c.s.instrument.web.TraceWebFilter    : Handled send of RealSpan(b7e97a826a4ef451/75481d02d86cfc51)
2018-03-16 10:24:47.702  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-3] o.s.c.s.i.reactor.sample.FlatMapTests    : First flat map
2018-03-16 10:24:47.704  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-3] o.s.c.s.i.reactor.sample.FlatMapTests    : Second flat map
2018-03-16 10:24:47.704 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-3] .s.c.s.i.w.c.TraceExchangeFilterFunction : Instrumenting WebClient call
2018-03-16 10:24:47.705 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-3] .s.c.s.i.w.c.TraceExchangeFilterFunction : Created a client span for the WebClient RealSpan(b7e97a826a4ef451/801e7f6e64c0787f)
2018-03-16 10:24:47.707 DEBUG [TraceWebFluxTests,,,] 57738 --- [o-auto-1-exec-9] o.s.c.s.instrument.web.TraceWebFilter    : Received a request to uri [/foo]
2018-03-16 10:24:47.707 DEBUG [TraceWebFluxTests,,,] 57738 --- [o-auto-1-exec-9] o.s.c.s.instrument.web.TraceWebFilter    : Handled receive of span RealSpan(b7e97a826a4ef451/801e7f6e64c0787f)
2018-03-16 10:24:47.708  INFO [TraceWebFluxTests,b7e97a826a4ef451,801e7f6e64c0787f,true] 57738 --- [o-auto-1-exec-9] o.s.c.s.i.reactor.sample.FlatMapTests    : foo
2018-03-16 10:24:47.709 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,801e7f6e64c0787f,true] 57738 --- [o-auto-1-exec-9] o.s.c.s.instrument.web.TraceWebFilter    : Found span in context RealSpan(b7e97a826a4ef451/801e7f6e64c0787f)
2018-03-16 10:24:47.709 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,801e7f6e64c0787f,true] 57738 --- [o-auto-1-exec-9] o.s.c.s.instrument.web.TraceWebFilter    : Handled send of RealSpan(b7e97a826a4ef451/801e7f6e64c0787f)
2018-03-16 10:24:47.709  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-6] o.s.c.s.i.reactor.sample.FlatMapTests    : Third flat map
2018-03-16 10:24:47.711  INFO [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-6] o.s.c.s.i.reactor.sample.FlatMapTests    : Fourth flat map
2018-03-16 10:24:47.712 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-6] o.s.c.s.instrument.web.TraceWebFilter    : Found span in context RealSpan(b7e97a826a4ef451/b7e97a826a4ef451)
2018-03-16 10:24:47.712 DEBUG [TraceWebFluxTests,b7e97a826a4ef451,b7e97a826a4ef451,true] 57738 --- [ctor-http-nio-6] o.s.c.s.instrument.web.TraceWebFilter    : Handled send of RealSpan(b7e97a826a4ef451/b7e97a826a4ef451)

Can you confirm that it's working fine with snapshots?

As for having the same span. That's completely deliberate. We are passing the same span unless you create a new one.

ping @latifrons ?

Closing due to lack of feedback

From my point of view logging from flatMap() or map() seems to be working in M8. But after switching to M9 span are exchanged any more as it seems.

Can you provide a sample?

Hi

Sorry for the delay. Here is a sample: https://github.com/joke/sleuth
This might be related to https://github.com/spring-cloud/spring-cloud-sleuth/issues/949

Fixed via https://github.com/spring-cloud/spring-cloud-sleuth/commit/51abe7bba75a9dc9943d8dbbbb8987fcf68dd97b

@joke Can you double check that the snapshots work fine?

:+1: works perfectly. thank you.

Was this page helpful?
0 / 5 - 0 ratings