Spring-cloud-sleuth: Zuul : When error occurs, traceId is reused in the next request on same tomcat worker thread with Dalston Release

Created on 11 Jul 2017  路  2Comments  路  Source: spring-cloud/spring-cloud-sleuth

Hi, there seems to be a problem with zuul when error occurs during zuul's routing. It occurs on Spring Cloud Dalston.SR1 release. There are two symptoms.

First, when using Zuul with spring cloud sleuth on Dalston, if any error occurs during routing, span is not closed properly.

2017-07-11 11:15:30.471 DEBUG [zuul-sleuth-problem,,,] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : Received a request to uri [/display/ddd] that should not be sampled [false]
2017-07-11 11:15:30.474 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : No parent span present - creating a new span
2017-07-11 11:15:30.480 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.n.zuul.filters.SimpleRouteLocator  : route matched=ZuulProperties.ZuulRoute(id=dp, path=/display/**, serviceId=null, url=http://localhost:9987/unknown, stripPrefix=true, retryable=null, sensitiveHeaders=[], customSensitiveHeaders=false)
2017-07-11 11:15:30.480  INFO [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.n.zuul.web.ZuulHandlerMapping      : Mapped URL path [/display/**] onto handler of type [class org.springframework.cloud.netflix.zuul.web.ZuulController]
2017-07-11 11:15:30.502 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.zuul.TracePreZuulFilter        : Current span is [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:30.503 DEBUG [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.zuul.TracePreZuulFilter        : New Zuul Span is [Trace: 27f2f52479630c95, Span: 0bec79abcfe9806a, Parent: 27f2f52479630c95, exportable:false]
2017-07-11 11:15:30.503 DEBUG [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.zuul.TracePreZuulFilter        : Result of Zuul filter is [SUCCESS]
2017-07-11 11:15:30.503 DEBUG [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.n.zuul.filters.SimpleRouteLocator  : Finding route for path: /display/ddd
   :
2017-07-11 11:15:30.510 DEBUG [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.n.z.f.r.SimpleHostRoutingFilter    : /unknown/ddd
2017-07-11 11:15:30.510 DEBUG [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.n.z.f.r.SimpleHostRoutingFilter    : localhost 9987 http
2017-07-11 11:15:34.544  WARN [zuul-sleuth-problem,27f2f52479630c95,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.n.z.filters.post.SendErrorFilter   : Error during filtering

com.netflix.zuul.exception.ZuulException: Connect to localhost:9987 [localhost/10.202.32.233, localhost/10.211.55.2, localhost/10.37.129.2, localhost/fe80:0:0:0:9e:b0b6:43e5:1e18%4, localhost/fe80:0:0:0:7dc:c095:41cc:cfd1%11, localhost/fe80:0:0:0:aff3:8c93:1dc8:6d59%14] failed: connect timed out
   :

2017-07-11 11:15:34.548 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : There has already been a span in the request [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:34.548 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : Received a request to uri [/error] that should not be sampled [false]
2017-07-11 11:15:34.548 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : The span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false] was already detached once and we're processing an error
2017-07-11 11:15:34.555 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Handling span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:34.557 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Adding a method tag with value [errorHtml] to a span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:34.557 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Adding a class tag with value [BasicErrorController] to a span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:34.579 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Skipping closing of a span for error controller processing
2017-07-11 11:15:34.586 DEBUG [zuul-sleuth-problem,27f2f52479630c95,27f2f52479630c95,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.zuul.TracePostZuulFilter       : Closing current client span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false]
2017-07-11 11:15:34.588 DEBUG [zuul-sleuth-problem,,0bec79abcfe9806a,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : Won't detach the span [Trace: 27f2f52479630c95, Span: 27f2f52479630c95, Parent: null, exportable:false] since error has already been handled

In the above logs, zuul failed to route in it's router and then TraceFilter and TracePostZuulFilter look like handing closing spans improperly. As you can see the last line, [zuul-sleuth-problem,,0bec79abcfe9806a,false] traceid is gone away.

I saw the related the issue (https://github.com/spring-cloud/spring-cloud-sleuth/issues/563) and the fix(https://github.com/spring-cloud/spring-cloud-sleuth/pull/568). But the similar problem still occurs with this fix on spring cloud Dalston.SR1.

The one big difference betwen previous release and Dalston release is that SendErrorFilter was post filter in previous release (Camden), but from Dalston release, SendErrorFilter is error filter. Therefore, the executing order of TracePostZuulFilter and SendErrorFilter seems to be different in two releases. In Dalston release, SendErrorFilter is executed before TracePostZuulFilter. Therefore the forwarded request /error is also handled before TracePostZuulFilter is executed. I'm not sure that this difference makes the problem again, but it seems to be related.

The second symptom - more important one - is that traceid (span) seem to be remained in the worker thread. So if the following request arrives on the same worker thread after error occurs, the same traceid is used. So trace is continued on all following requests that are being executed on same tomcat worker thread.

2017-07-11 11:16:49.001 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : There has already been a span in the request [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false]
2017-07-11 11:16:49.001 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : Received a request to uri [/error] that should not be sampled [false]
2017-07-11 11:16:49.001 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.sleuth.instrument.web.TraceFilter  : The span [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false] was already detached once and we're processing an error
2017-07-11 11:16:49.003 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Handling span [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false]
2017-07-11 11:16:49.003 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Adding a method tag with value [errorHtml] to a span [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false]
2017-07-11 11:16:49.003 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Adding a class tag with value [BasicErrorController] to a span [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false]
2017-07-11 11:16:49.004 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.web.TraceHandlerInterceptor    : Skipping closing of a span for error controller processing
2017-07-11 11:16:49.005 DEBUG [zuul-sleuth-problem,27f2f52479630c95,5c00090333cf55ab,false] 6481 --- [nio-8765-exec-1] o.s.c.s.i.zuul.TracePostZuulFilter       : Closing current client span [Trace: 27f2f52479630c95, Span: 5c00090333cf55ab, Parent: 0bec79abcfe9806a, exportable:false]

As you can see in the above logs, TrancId 27f2f52479630c95 is used AGAIN for totally different request if the same woker thread (nio-8765-exec-1) is used. It makes our zipkin dashboard show many different requests are collected into the one trace.

I've uploaded the code to reproduce the symptom easily here(https://github.com/yongsungyoon/zuul-sleuth-problem).

If you invoke http://localhost:8765/display/ddd and it will make exception in routing fiilter and then you can see the logs about first symptom.

If you invoke same request more than 10 times (the number of tomcat worker thread looks like 10 in dev environment), you can see the SAME trace id on the logs whenever the same worker thread is used.

I'm using sleuth-core:1.2.1.RELEASE and netflix-core:1.3.1.RELASE that are used in Dalston.SR1.

Thank you in advance.

bug

Most helpful comment

Thanks a lot for a fantastic bug description and sample. You don't even know how easier my job was thanks to this. Thanks again!

All 2 comments

Thanks a lot for a fantastic bug description and sample. You don't even know how easier my job was thanks to this. Thanks again!

You're welcome. Thank you for your quick fix.

Was this page helpful?
0 / 5 - 0 ratings