I posted this on Stack Overflow but didn't receive any attention to it, so I apologize if this is just a dumb configuration issue on my end. I can post more of my code, but I tried to simplify it to the essential bits.
I've just started trying to use Spring Boot with spring-cloud-starter-feign, ribbon, Hystrix, and consul-discovery.
I believe I have everything working as expected except for the very first request, after my application starts up, to a given @FeignClient annotated service will always fail with a Hystrix exception saying that the request has timed out. From what I can tell it seems as though the server list which consul-discovery should be populating only gets initialized _after_ the first request. It looks like the DynamicServerListLoadBalancer.restOfInit method does not get called when this bean is instantiated. Here's some debugging output when this happens, note that the request actually does run, but after the Timeout has happened.
Also, I have set the feign.hystrix.enabled=false configuration and then everything does work OK on the first request as the request isn't wrapped in a Hystrix command. Maybe Hystrix is just timing out too quickly? I can't quite tell what's going on with it.
2016-01-12 21:37:08.922 INFO 22151 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization started
2016-01-12 21:37:08.952 INFO 22151 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 30 ms
2016-01-12 21:38:22.830 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] ---> GET http://routing/RoutingService/api/v1/Accounts/100 HTTP/1.1
2016-01-12 21:38:22.831 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] ---> END HTTP (0-byte body)
2016-01-12 21:38:22.836 INFO 22151 --- [strix-routing-1] s.c.a.AnnotationConfigApplicationContext : Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@537a7631: startup date [Tue Jan 12 21:38:22 CST 2016]; parent: org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@115ca7de
2016-01-12 21:38:22.944 INFO 22151 --- [strix-routing-1] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2016-01-12 21:38:23.450 INFO 22151 --- [strix-routing-1] c.netflix.config.ChainedDynamicProperty : Flipping property: routing.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2016-01-12 21:38:23.510 INFO 22151 --- [strix-routing-1] c.n.u.concurrent.ShutdownEnabledTimer : Shutdown hook installed for: NFLoadBalancer-PingTimer-routing
2016-01-12 21:38:23.633 INFO 22151 --- [strix-routing-1] c.netflix.loadbalancer.BaseLoadBalancer : Client:routing instantiated a LoadBalancer:DynamicServerListLoadBalancer:{NFLoadBalancer:name=routing,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2016-01-12 21:38:23.868 ERROR 22151 --- [nio-8080-exec-9] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.netflix.hystrix.exception.HystrixRuntimeException: account timed-out and no fallback available.] with root cause
java.util.concurrent.TimeoutException: null
at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:596) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.AbstractCommand$9.call(AbstractCommand.java:577) ~[hystrix-core-1.4.21.jar:1.4.21]
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$1.onError(OperatorOnErrorResumeNextViaFunction.java:99) ~[rxjava-1.0.14.jar:1.0.14]
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70) ~[rxjava-1.0.14.jar:1.0.14]
at rx.internal.operators.OperatorDoOnEach$1.onError(OperatorDoOnEach.java:70) ~[rxjava-1.0.14.jar:1.0.14]
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$1.run(AbstractCommand.java:951) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:41) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable$1.call(HystrixContextRunnable.java:37) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.strategy.concurrency.HystrixContextRunnable.run(HystrixContextRunnable.java:57) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.AbstractCommand$HystrixObservableTimeoutOperator$2.tick(AbstractCommand.java:971) ~[hystrix-core-1.4.21.jar:1.4.21]
at com.netflix.hystrix.util.HystrixTimer$1.run(HystrixTimer.java:98) ~[hystrix-core-1.4.21.jar:1.4.21]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_66]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_66]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_66]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
2016-01-12 21:38:24.527 INFO 22151 --- [strix-routing-1] c.netflix.config.ChainedDynamicProperty : Flipping property: routing.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2016-01-12 21:38:24.538 INFO 22151 --- [strix-routing-1] c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client routing initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=routing,current list of Servers=[192.168.34.248:80, 192.168.34.249:80],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:2; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.34.249:80; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Wed Dec 31 18:00:00 CST 1969; First connection made: Wed Dec 31 18:00:00 CST 1969; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
, [Server:192.168.34.248:80; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Wed Dec 31 18:00:00 CST 1969; First connection made: Wed Dec 31 18:00:00 CST 1969; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:org.springframework.cloud.consul.discovery.ConsulServerList@261dac73
2016-01-12 21:38:24.991 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] <--- HTTP/1.1 200 OK (2160ms)
2016-01-12 21:38:24.991 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] Server: Microsoft-IIS/8.5
2016-01-12 21:38:24.992 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account]
2016-01-12 21:38:25.003 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] {
"id": "100"
}
2016-01-12 21:38:25.003 DEBUG 22151 --- [strix-routing-1] c.surescripts.discovery.RoutingService : [RoutingService#account] <--- END HTTP (840-byte body)
2016-01-12 21:38:25.008 INFO 22151 --- [ool-11-thread-1] c.netflix.config.ChainedDynamicProperty : Flipping property: routing.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
Here's my Feign Client:
@FeignClient(value = "routing", configuration = RoutingServiceConfiguration)
interface RoutingService {
@RequestMapping(
method = RequestMethod.GET,
value = "/RoutingService/api/v1/Accounts/{id}"
)
Account account(@PathVariable("id") String id)
}
Looks like maybe a timeout to me (1s?). Did you play with the various config options (look them up in Netflix wikis for Ribbon and Hystrix).
You also didn't show your custom client config. Is that doing anything complicated?
Hi,
I'm seeing the same behavior when I switch from using url property in FeignClient to name/value property for accessing service registered with Eureka.
This does not feel like an actual timeout issue, the error shows up right away.
Maybe there is a property that's not set (like timeout value) that one of components is looking for?
I'm only seeing this behavior when name/value is set to service id of a service registered in Eureka.
Also I get this error for all requests, not just the first one.
That's the same behavior I noticed, just times out instantly (or seemingly so). I haven't had a chance to dig more in to it yet, but I will this weekend.
Adding this property:
hystrix.command.default.execution.timeout.enabled=false
Blocks the call (no more TimeoutException: null errors) until socket timeout expires (SocketTimeoutException: connect timed out).
It looks like Feign client cannot get in touch with Eureka server?
Exception occurs in feign.Client$Default.convertResponse function (Client.java:143, feign-core-8.1.1.jar).
I enabled debugging for Feign client with this configuration change:
import feign.Logger;
@Configuration
public class FooConfiguration {
@Bean
public Logger.Level feignLoggerLevel() {
return Logger.Level.FULL;
}
}
It looks like request being built is using service id as host name:
[MyClient#getDataByIds] ---> GET http://abc/data/ids?ids=1 HTTP/1.1
END HTTP (0-byte body)
Where "abc" is provided as FeignClient annotation param:
@FeignClient("abc") or @FeignClient(value="abc")
If FeignClient annotation has a distinct "url" param why is it trying to use "value" as url?
Maybe this is a dependency issue, i.e. Feign client does not know it's trying to talk to Eureka?
Without a project reproducing the problem I can't be sure, but generally that means eureka isn't enabled or included as a dependency.
@spencergibb I should be able to get a simple project out in GitHub this weekend. I'm sure I have something configured incorrectly.
The OP said he was using consul, but I agree with @spencergibb (discovery is disabled, or the wrong client).
The strange part is that it's only the very first request after the app starts up that fails, and it does actually complete normally, just after Hystrix kills it. All subsequent requests work just fine.
I'm adding this just to document this scenario:
Message "Load balancer does not have available server for client: def" shows up when @FeignClient("def") is used. Tag "def" does not correspond to a valid service id registered with Eureka.
When tag is changed to "abc" (a valid service id) timeout occurs.
@spencergibb Could you help me understand something, when this debug message is printed by Feign client:
[MyClient#getDataByIds] ---> GET http://abc/data/ids?ids=1 HTTP/1.1
Is it before Feign client gets service info from Eureka or after (if it contacts Eureka at all)?
Never mind, I think this is indeed a discovery client problem (or lack of discovery client).
Both error messages (about load balancers and Hystrix timeout) are coming from Ribbon (which is internally using Hystrix commands) after final service uri is resolved (although incorrectly).
So the key thing is to figure out why Feign client is not pulling service uri (based on service id) from Eureka.
The thing that puzzles me most is the fact that enclosing service (the one using Feign client) can register itself with Eureka server without any problems, at least in my case.
@macheeto this issue you're commenting on isn't related to yours.
@spencergibb If both are caused by Feign not using Eureka client then they are related?
@macheeto this is is specifically consul, not eureka. And this issue only deals with the 1st request, after that they work fine.
@spencergibb After a bunch more experimentation with a simplified client, I believe everything really is working OK, and the initial request just takes longer than 1 second, which is the default hystrix timeout. I'm guessing some framework related code lazily does certain tasks so the initial request really does just take longer so the timeout is valid. I apologize for the noise. Thanks for the help!
@jjathman how did you solve this. I am getting the same error
Tried bunch of config options but none seem to make any difference. First few requests error out and then things get restored. I cant afford for any requests to fail hence have to find a way to solve this.
feign:
histrix:
enabled: false
hystrix:
command:
choose:
default:
execution:
timeout:
enabled: false
related issue https://github.com/spring-cloud/spring-cloud-netflix/issues/384 "Feign: trigger early initialization of Feign clients"
@gamefundas have you been able to resolve it?
@mkostin @gamefundas I guess that it's a related question http://stackoverflow.com/questions/32113671/zuulproxy-fails-with-ribboncommand-timed-out-and-no-fallback-available-when-it
@dsyer @jjathman @spencergibb hi,Excuse me, I meet the same problem with you.And I use consul and feign client,too. I know that the first request is slow.But when I improve the configure of time out, it still fails in the first time, do you know the reason?
this is my configure of time out:
@Configuration
public class FeignConfigure {
public static int connectTimeOutMillis = 12000;
public static int readTimeOutMillis = 12000;
@Bean
public Request.Options options() {
return new Request.Options(connectTimeOutMillis, readTimeOutMillis);
}
}
Most helpful comment
@jjathman how did you solve this. I am getting the same error
Tried bunch of config options but none seem to make any difference. First few requests error out and then things get restored. I cant afford for any requests to fail hence have to find a way to solve this.