I'm using Spring Cloud Gateway 2.1.2.RELEASE (Greenwich.SR2) along with a Hystrix filter, and I'm running a series of unit tests to verify the behavior of the circuit breaker mechanism.
If the back-end system does not respond in time (see the property execution.isolation.thread.timeoutInMilliseconds), the Gateway returns HTTP 504, as expected.
If multiple errors of this kind occur, the circuit opens (also as expected), but the Gateway then returns HTTP 500, whereas one would expect 503.
Is there any configuration we are missing here?
I think it is tricky to decide what the right status code should be here. I can see a 503 indicating that the downstream service is unavailable; however, it could also be read as meaning that the gateway itself is unavailable.
Thank you @ryanjbaxter for the prompt response!
Let me also add to the discussion that there is nothing wrong with the Gateway itself; it's just one route (out of a dozen registered ones) which received multiple timeout errors from a single back-end service.
A response with HTTP 500 should prevent the original service consumer from retrying, while a response with HTTP 503 should indicate that a retry is feasible after a while.
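To make the distinction concrete, here is a minimal sketch of how a consumer might act on it. The class and method names are hypothetical and not part of Spring Cloud Gateway or Hystrix. Treating every status other than 503 as non-retryable is a simplifying assumption for illustration only.

```java
// Hypothetical client-side helper; not part of Spring Cloud Gateway or Hystrix.
final class RetryDecision {

    private RetryDecision() {
    }

    /**
     * Returns true when the status signals a temporary condition, so that
     * retrying after a pause is reasonable.
     */
    static boolean isRetryable(int httpStatus) {
        switch (httpStatus) {
            case 503: // Service Unavailable: a retry is feasible after a while
                return true;
            default:  // includes 500; assumed non-retryable in this sketch
                return false;
        }
    }
}
```

With the current behavior, a consumer using a check like this would give up on the route once the circuit opens, because it sees 500 instead of 503.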
On a side note, are you aware whether HTTP 500 is coming straight from Hystrix or is that how the Gateway interprets the com.netflix.hystrix.exception.HystrixRuntimeException ?
I understand, my gut says we should return a 503 but I want to hear what my teammates think as well.
> On a side note, are you aware whether HTTP 500 is coming straight from Hystrix or is that how the Gateway interprets the com.netflix.hystrix.exception.HystrixRuntimeException?
Not off the top of my head
If it helps, below you may find the stacktrace:
Daemon Thread [HystrixTimer-1] (Suspended (breakpoint at line 39 in HystrixRuntimeException))
HystrixRuntimeException.<init>(FailureType, Class<HystrixInvokable>, String, Exception, Throwable) line: 39
HystrixGatewayFilterFactory$RouteHystrixCommand(AbstractCommand<R>).handleFallbackDisabledByEmittingError(Exception, FailureType, String) line: 1052
HystrixGatewayFilterFactory$RouteHystrixCommand(AbstractCommand<R>).getFallbackOrThrowException(AbstractCommand<R>, HystrixEventType, FailureType, String, Exception) line: 878
HystrixGatewayFilterFactory$RouteHystrixCommand(AbstractCommand<R>).handleTimeoutViaFallback() line: 997
AbstractCommand<R>.access$500(AbstractCommand) line: 60
AbstractCommand$12.call(Throwable) line: 609
AbstractCommand$12.call(Object) line: 601
OperatorOnErrorResumeNextViaFunction$4.onError(Throwable) line: 140
OnSubscribeDoOnEach$DoOnEachSubscriber<T>.onError(Throwable) line: 87
OnSubscribeDoOnEach$DoOnEachSubscriber<T>.onError(Throwable) line: 87
AbstractCommand$HystrixObservableTimeoutOperator$1.run() line: 1142
HystrixContextRunnable$1.call() line: 41
HystrixContextRunnable$1.call() line: 37
HystrixContextRunnable.run() line: 57
AbstractCommand$HystrixObservableTimeoutOperator$2.tick() line: 1159
HystrixTimer$1.run() line: 99
Executors$RunnableAdapter<T>.call() line: 511
ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(FutureTask<V>).runAndReset() line: 308
ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.access$301(ScheduledThreadPoolExecutor$ScheduledFutureTask) line: 180
ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 294
ScheduledThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1149
ThreadPoolExecutor$Worker.run() line: 624
Thread.run() line: 748
Note: I'm using the latest version of Hystrix (i.e. 1.5.18).
I believe the issue is with regards to org.springframework.cloud.gateway.filter.factory.HystrixGatewayFilterFactory.
Specifically, the switch statement handles the SHORTCIRCUIT failureType as a generic/default case.
One solution here could be something along the following lines:
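switch (failureType) {
    case TIMEOUT:
        // existing case: surfaces as HTTP 504
        return Mono.error(new TimeoutException());
    case SHORTCIRCUIT:
        // proposed case: surface an open circuit as HTTP 503
        return Mono.error(new ServiceUnavailableException());
    default:
        break; // other failure types keep their current handling
}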
Obviously, the class ServiceUnavailableException does not exist but it should be easy to create one based on the concept of the existing org.springframework.cloud.gateway.support.TimeoutException.
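As a rough, dependency-free model of the intended outcome: the FailureKind enum below is a hypothetical stand-in for a subset of HystrixRuntimeException.FailureType, and the method name is made up for illustration. It is not the Gateway's code, and the default branch simplifies the real handling, which may propagate other statuses.

```java
// Dependency-free model of the proposed mapping; not the Gateway's implementation.
final class FailureStatusMapping {

    /** Hypothetical stand-in for a subset of HystrixRuntimeException.FailureType. */
    enum FailureKind { TIMEOUT, SHORTCIRCUIT, COMMAND_EXCEPTION }

    private FailureStatusMapping() {
    }

    /** Maps a failure kind to the HTTP status the consumer would see. */
    static int toHttpStatus(FailureKind kind) {
        switch (kind) {
            case TIMEOUT:
                return 504; // Gateway Timeout, as observed today
            case SHORTCIRCUIT:
                return 503; // Service Unavailable, the proposed change
            default:
                return 500; // simplification: everything else is a generic error
        }
    }
}
```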
Any thoughts?
Thanks, first I want others to chime in on whether returning a 503 is right.
@spencergibb @TYsewyn @OlgaMaciaszek any thoughts?
IMO it does make sense to return 503 in such cases, and AFAIK we have the failure type SHORTCIRCUIT (HystrixRuntimeException.FailureType) for that.
That new exception would indeed look like the TimeoutException.
Thumbs up from me! 👍
EDIT: We should also look at the Retry-After HTTP header. If this is not passed to the browser in a response then the browser will - in most cases, if not all - just handle the 503 like it's a 500 error.
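For context, a Retry-After value is either a delay in seconds or an HTTP-date. Below is a minimal sketch of how a client could turn it into a wait time. The RetryAfter class and its parse method are hypothetical names for illustration, and passing the current time as a parameter is a design choice that keeps the sketch deterministic.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical client-side helper; not part of Spring Cloud Gateway.
final class RetryAfter {

    private RetryAfter() {
    }

    /**
     * Parses a Retry-After header value relative to the given instant.
     * Returns empty when the header is absent or cannot be parsed.
     */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return Optional.empty();
        }
        String value = headerValue.trim();

        // Form 1: a non-negative number of seconds, e.g. "120"
        if (value.chars().allMatch(Character::isDigit)) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(value)));
            } catch (NumberFormatException e) {
                return Optional.empty(); // value does not fit in a long
            }
        }

        // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try {
            Instant when = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration delay = Duration.between(now, when);
            // A date in the past means the client may retry immediately.
            return Optional.of(delay.isNegative() ? Duration.ZERO : delay);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

A client without the header gets an empty result and has to fall back to its own backoff policy, which is the case the comment above is concerned about.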
There's already a switch to determine if it is short-circuited where a 503 could be returned.
PRs welcome
@spencergibb, @ryanjbaxter : PR created ( https://github.com/spring-cloud/spring-cloud-gateway/pull/1230 )