Spring-cloud-sleuth: Spring Cloud Gateway + Sleuth + Ilford = no traces in logs by default

Created on 3 Jan 2021  路  21Comments  路  Source: spring-cloud/spring-cloud-sleuth

Previously we have noted issues with Sleuth in the rector stack. Essentially the performance was so bad that it was unusable. With the release of Spring Cloud Sleuth (Ilford) the performance has improved a lot. I have updated benchmarks when running with Spring Cloud Gateway. See here. Clearly the manual instrumentation gives the best performance. But the manual instrumentation method requires that all the log invocations are rewritten inside a WebFluxSleuthOperators.withSpanInScope(). Recently the Sleuth project has changed the default instrumentation for Spring Cloud Gateway to be manual. See this commit. This means if you add sleuth as a dependency to spring cloud gateway you don't get the tracing int the logs by default. You would need to switch the instrumentation type to be "decorate-on-each" or "decorate-on-last". Is there a plan to rewrite all the logs to be compatible with manual instrumentation? Otherwise I don't know why "manual" was made the default.

question

Most helpful comment

The gateway is designed to be extensible. Users can (we did) add filters and predicates. These filters and predicates have error logging. But now they won't have the trace id unless we rewrite all the log statements. It's really a big change. OK, we can do it. But it wouldn't make sense unless we can also get the same changes in Spring Cloud Gateway itself.

Logging is done via slf4j with logback as the default implementation.

All 21 comments

Is there a plan to rewrite all the logs to be compatible with manual instrumentation?

There is no such plan.

Otherwise I don't know why "manual" was made the default.

Because of performance and due to the fact that when you use the manual mode and you, as a user, take care of explicitly wrapping the trace related aspects in operators, then the trace propagation actually works. The on each decoration option works most of the time, and on last works sometimes. When I say works I mean context propagation works.

Typically in the gateway you're reusing the existing filters, predicates etc. You don't need any specific logging. The reactive programming was never intended to work fine with thread locals and the current logging implementations in terms of MDC propagation works exactly in that fashion.

cc @rstoyanchev

As a user of Spring Cloud Gateway (with custom filters and predicates) with many customers in production I have to challenge this response.

take care of explicitly wrapping the trace related aspects in operators

But it means we need to rewrite all the logging statements and refactor it to pass the exchange. It's really a big change.

The on each decoration option works most of the time

Can you give an example of when it doesn't work?

Typically in the gateway you're reusing the existing filters, predicates etc. You don't need any specific logging.

This is just not true. When the Gateway logs an error we must have the trace ID available.
@spencergibb would you accept PRs to change the way we do logging in Spring Cloud Gateway so that it works with manual instrumentation?

Can you give an example of when it doesn't work?

Typically it doesn't work when you have a flatMap execution within the flux.

When the Gateway logs an error we must have the trace ID available.

So you're proving my point - some internal Gateway components are logging an error, right? It's not the case that the users should log it themselves. That means that it makes sense to make the Sleuth & Gateway interrop better for that matter. Let's see what @spencergibb says about it. Which Gateway's component is responsible for doing the logging?

The gateway is designed to be extensible. Users can (we did) add filters and predicates. These filters and predicates have error logging. But now they won't have the trace id unless we rewrite all the log statements. It's really a big change. OK, we can do it. But it wouldn't make sense unless we can also get the same changes in Spring Cloud Gateway itself.

Logging is done via slf4j with logback as the default implementation.

The gateway is designed to be extensible. Users can (we did) add filters and predicates. These filters and predicates have error logging. But now they won't have the trace id unless we rewrite all the log statements. It's really a big change. OK, we can do it. But it wouldn't make sense unless we can also get the same changes in Spring Cloud Gateway itself.

If you don't want to do it by hand just flip the switch to have the on each decoration (of course you'll observe the performance degradation).

Logging is done via slf4j with logback as the default implementation.

So you're referring to any logging done via the standard logger mechanism. Then indeed we would have to introduce some logging abstraction that for Sleuth would reference the ServerWebExchange to set the MDC entries.

Maybe we could introduce an interface in gateway that sleuth implements? @tony-clarke-amdocs clearly the performance is needed. Hooks into reactor are a pain. In the case of the gateway I think it's ok if the user needs to be careful about logging.

I was going to submit PR where we have a new interface in the gateway that has two implementations. One implementation will keep the logging as is now and the other will wrap with WebFluxSleuthOperators.withSpanInScope(). I didn't consider having the implementation in sleuth itself, but yeah, maybe that works too. I will submit something soon and we can evolve it.

@tony-clarke-amdocs either way. It would definitely be simpler if it was all in gateway. The user would have to define the bean or set a property. If the alternate impl was in sleuth, no user action would be required.

Lgtm. Once the interface is ready please file a new issue / pr here so that we implement the missing piece! For tracing we'd need to be able to access the ServerWebExchange AFAIR.

@marcingrzejszczak Here is the interface in a pending PR. wdyt?

LGTM - should work I guess

hey @tony-clarke-amdocs , do you have a sample that I could check out where you have custom filters / predicates / etc. that I could play around with to see if the latest decorate_queues feature wouldn't solve your problems without any additional work on the gateway side?

Alas, I don't have something on github that makes use of filters and predicates (that's all company IP). But I do think this could be helpful to measure performance of decorate_queues. Should be pretty simple to update it to check how decorate_queues performs...

If the performance was comparable to manual then would we just deprecate/remove the manual mode?

Correct however you had the issue not really with the performance cause manual mode worked fine. You had the problem with the fact that you didn't see the logging properly. I've seen the performance project that you linked but there were no custom components there that I could test out.

OK. I can test decorate_queues with an internal project and share with you outcome, but won't be able to share any code. Is that helpful?

Yes, that's helpful! Thank you.

I can confirm that some traces and spans are dropped with decorate-queues from the logs (even compared to decorate-on-last).
Sample output with decorate-on-last:

2021-03-17 17:57:43.793 TRACE [apigw,224bc785daaf57c1,224bc785daaf57c1] 1 --- [or-http-epoll-3] GatewayFilterFactory : AAA Value,BBB Value,CCC Value

The very same output with decorate-queues:

2021-03-17 18:16:42.537 TRACE [apigw,,] 1 --- [or-http-epoll-1] GatewayFilterFactory : AAA Value,BBB Value,CCC Value

What does this custom filter do? It mutates some http headers before the downstream service is invoked. In order to achieve this it needs to call two other services. The result of those two service invocations are processed by Mono.zip. This is where the log statement exists.

I went to start.spring.io and generated a new Gateway project (train 2020.0.2 and Boot 2.4.3). This is the code

package com.example.gateway;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import reactor.core.publisher.Mono;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.gateway.filter.GatewayFilter;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.cloud.gateway.handler.predicate.GatewayPredicate;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.Ordered;
import org.springframework.web.server.ServerWebExchange;

@SpringBootApplication
public class GatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }

    @Bean
    RouteLocator myRouteLocator(RouteLocatorBuilder builder, CustomFilter customFilter,CustomPredicate customPredicate) {
        return builder.routes()
                .route("httpbin_route",
                        route -> route
                                .predicate(customPredicate)
                                .filters(f -> f.stripPrefix(1).filters(customFilter))
                                .uri("http://httpbin.org")
                ).build();
    }

    @Bean
    CustomGlobalFilter customGlobalFilter() {
        return new CustomGlobalFilter();
    }

    @Bean
    CustomFilter customFilter() {
        return new CustomFilter();
    }

    @Bean
    CustomPredicate customPredicate() {
        return new CustomPredicate();
    }

}

class CustomGlobalFilter implements GlobalFilter, Ordered {

    private static final Logger log = LoggerFactory.getLogger(CustomGlobalFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        log.info("custom global filter");
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return -1;
    }
}

class CustomFilter implements GatewayFilter, Ordered {

    private static final Logger log = LoggerFactory.getLogger(CustomFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        log.info("custom filter");
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return -1;
    }
}

class CustomPredicate implements GatewayPredicate, Ordered {

    private static final Logger log = LoggerFactory.getLogger(CustomPredicate.class);


    @Override
    public int getOrder() {
        return -1;
    }

    @Override
    public boolean test(ServerWebExchange exchange) {
        log.info("custom predicate");
        return true;
    }
}

this are the properties

spring.sleuth.reactor.instrumentation-type=decorate_queues

when I executed this request

$ http :8080/httpbin/uuid

I got the following output

2021-03-18 14:12:56.602 DEBUG [,,] 119587 --- [or-http-epoll-2] o.s.c.s.instrument.web.TraceWebFilter    : Received a request to uri [/httpbin/uuid]
2021-03-18 14:12:56.605 DEBUG [,,] 119587 --- [or-http-epoll-2] o.s.c.s.instrument.web.TraceWebFilter    : Handled receive of span NoopSpan(0f8f5e4326ea1af9/0f8f5e4326ea1af9)
2021-03-18 14:12:56.615  INFO [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] com.example.gateway.CustomPredicate      : custom predicate
2021-03-18 14:12:56.615 DEBUG [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] o.s.c.g.h.RoutePredicateHandlerMapping   : Route matched: httpbin_route
2021-03-18 14:12:56.617 DEBUG [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] o.s.c.g.h.RoutePredicateHandlerMapping   : Mapping [Exchange: GET http://localhost:8080/httpbin/uuid] to Route{id='httpbin_route', uri=http://httpbin.org:80, order=0, predicate=com.example.gateway.CustomPredicate@669fa836, gatewayFilters=[[[StripPrefix parts = 1], order = 0], com.example.gateway.CustomFilter@6a3ef6d0], metadata={}}
2021-03-18 14:12:56.617 DEBUG [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] o.s.c.g.h.RoutePredicateHandlerMapping   : [03b9c717-1] Mapped to org.springframework.cloud.gateway.handler.FilteringWebHandler@6637802c
2021-03-18 14:12:56.617 DEBUG [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] o.s.c.g.handler.FilteringWebHandler      : Sorted gatewayFilterFactories: [[GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.RemoveCachedBodyFilter@798256c5}, order = -2147483648], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.AdaptCachedBodyGlobalFilter@1c3259fa}, order = -2147482648], [GatewayFilterAdapter{delegate=com.example.gateway.CustomGlobalFilter@1943c1f2}, order = -1], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.NettyWriteResponseFilter@1e1eeedd}, order = -1], com.example.gateway.CustomFilter@6a3ef6d0, [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.ForwardPathFilter@1d8b0500}, order = 0], [[StripPrefix parts = 1], order = 0], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.RouteToRequestUrlFilter@7ec95456}, order = 10000], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.config.GatewayNoLoadBalancerClientAutoConfiguration$NoLoadBalancerClientFilter@428bdd72}, order = 10150], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.WebsocketRoutingFilter@76544c0a}, order = 2147483646], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.NettyRoutingFilter@d70f722}, order = 2147483647], [GatewayFilterAdapter{delegate=org.springframework.cloud.gateway.filter.ForwardRoutingFilter@748a654a}, order = 2147483647]]
2021-03-18 14:12:56.619  INFO [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] com.example.gateway.CustomGlobalFilter   : custom global filter
2021-03-18 14:12:56.619  INFO [,0f8f5e4326ea1af9,0f8f5e4326ea1af9] 119587 --- [or-http-epoll-2] com.example.gateway.CustomFilter         : custom filter
2021-03-18 14:12:56.947 DEBUG [,,] 119587 --- [or-http-epoll-4] ientBeanPostProcessor$TracingDoOnRequest : Handled send of the netty client span [NoopSpan(0f8f5e4326ea1af9/20e50948cd107e77)] with parent [0f8f5e4326ea1af9/0f8f5e4326ea1af9]
2021-03-18 14:12:57.133 DEBUG [,,] 119587 --- [or-http-epoll-4] PostProcessor$AbstractTracingDoOnHandler : Handle receive of the netty client span [NoopSpan(0f8f5e4326ea1af9/20e50948cd107e77)]
2021-03-18 14:12:57.152 DEBUG [,,] 119587 --- [or-http-epoll-2] o.s.c.s.instrument.web.TraceWebFilter    : Handled send of NoopSpan(0f8f5e4326ea1af9/0f8f5e4326ea1af9)

As you can see all Gateway related components have trace and span id.

I appologize but I can't replicate this issue. Also I've managed to achieve the same result when using the standard MANUAL mode.

Yes, your example isn't sufficient to illustrate the problem. But for sure, decorate-queues is dropping trace id and span id in some situations. The only change I made was to switch from from decorate-on-last to decorate-queues and some of the logs lost the trace and span ids.

I'll need a proper example that illustrates the problem

I understand, but I cannot share company code. It's just an FYI to let you know that there is an issue with decorate-queues instrumentation mode. I wish I had time to create a standalone case to illustrate it...but I am unlikely to get the bandwidth to do that anytime soon. I really would prefer to get decorate-queues working because the manual mode is so intrusive.

Was this page helpful?
0 / 5 - 0 ratings