Spring-cloud-netflix: Retry with Ribbon & Feign client doesn't work sometimes. Throws error "LoadBalancer [service-name]: Error choosing server for key null"

Created on 2 Mar 2018 · 10 Comments · Source: spring-cloud/spring-cloud-netflix

I am using Spring Boot 2.0.0.RC2 and Spring Cloud 2.0.0.M7 with the following setup:

Zuul Gateway - Replica 1
Edge service - Replica 2
Micro service - Replica 2

The configuration is as follows:

ribbon:
  ConnectTimeout: 5000
  ReadTimeout: 10000
  MaxAutoRetries: 0
  MaxAutoRetriesNextServer: 2
  retryableStatusCodes: 404,502,504

hystrix:
  shareSecurityContext: true

feign:
  hystrix:
    enabled: false

health.config.enabled: false
spring.cloud.loadbalancer.retry.enabled: true

zuul:
  retryable: true
  sensitiveHeaders: Cookie
  ignoredServices: '*'
  ribbon:
    eager-load:
      enabled: true
  routes:
    user-service:
      path: /users/**
      stripPrefix: true
    product-edge:
      path: /products/**
      stripPrefix: true



java.lang.IndexOutOfBoundsException: index (2) must be less than size (2)
    at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:310)
    at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:292)
    at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:65)
    at com.netflix.loadbalancer.AbstractServerPredicate.chooseRoundRobinAfterFiltering(AbstractServerPredicate.java:203)
    at com.netflix.loadbalancer.PredicateBasedRule.choose(PredicateBasedRule.java:45)
    at com.netflix.loadbalancer.BaseLoadBalancer.chooseServer(BaseLoadBalancer.java:736)
    at com.netflix.loadbalancer.ZoneAwareLoadBalancer.chooseServer(ZoneAwareLoadBalancer.java:113)
    at com.netflix.loadbalancer.LoadBalancerContext.getServerFromLoadBalancer(LoadBalancerContext.java:481)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:184)
    at com.netflix.loadbalancer.reactive.LoadBalancerCommand$1.call(LoadBalancerCommand.java:180)
    at rx.Observable.unsafeSubscribe(Observable.java:10151)
    at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:94)
    at rx.internal.operators.OnSubscribeConcatMap.call(OnSubscribeConcatMap.java:42)
    at rx.Observable.unsafeSubscribe(Observable.java:10151)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber$1.call(OperatorRetryWithPredicate.java:127)
    at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.enqueue(TrampolineScheduler.java:73)
    at rx.internal.schedulers.TrampolineScheduler$InnerCurrentThreadScheduler.schedule(TrampolineScheduler.java:52)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:79)
    at rx.internal.operators.OperatorRetryWithPredicate$SourceSubscriber.onNext(OperatorRetryWithPredicate.java:45)
    at rx.internal.util.ScalarSynchronousObservable$WeakSingleProducer.request(ScalarSynchronousObservable.java:276)
    at rx.Subscriber.setProducer(Subscriber.java:209)
    at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:138)
    at rx.internal.util.ScalarSynchronousObservable$JustOnSubscribe.call(ScalarSynchronousObservable.java:129)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.subscribe(Observable.java:10247)

I get these exceptions frequently, and they break the retry logic. MaxAutoRetriesNextServer is set to 2, but only 2 servers are available, not 3. In this case the bounds should be checked before a server is chosen.
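
To illustrate the kind of failure the stack trace shows (`index (2) must be less than size (2)`), here is a minimal sketch of a round-robin picker whose shared counter can hold a value computed for a longer list than the one it is later applied to. This is a simplified illustration, not Ribbon's actual code: the class `RoundRobinSketch` and the methods `pickUnchecked` and `pickChecked` are hypothetical names, and the sketch only assumes that the list of eligible servers can shrink between calls (for example, after filtering).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not taken from Ribbon.
final class RoundRobinSketch {
    // Counter shared across calls, regardless of how long each list is.
    private final AtomicInteger nextIndex = new AtomicInteger();

    // Returns the stored counter value without checking it against the
    // current list size, so a value left over from a longer list can escape.
    <T> T pickUnchecked(List<T> eligible) {
        int size = eligible.size();
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % size;
            if (nextIndex.compareAndSet(current, next)) {
                return eligible.get(current); // may throw IndexOutOfBoundsException
            }
        }
    }

    // Same loop, but only returns an index that is valid for this list.
    <T> T pickChecked(List<T> eligible) {
        int size = eligible.size();
        if (size == 0) {
            throw new IllegalStateException("no eligible servers");
        }
        for (;;) {
            int current = nextIndex.get();
            int next = (current + 1) % size;
            if (nextIndex.compareAndSet(current, next) && current < size) {
                return eligible.get(current);
            }
        }
    }

    public static void main(String[] args) {
        List<String> three = Arrays.asList("s1", "s2", "s3");
        List<String> two = Arrays.asList("s1", "s2");

        RoundRobinSketch unchecked = new RoundRobinSketch();
        unchecked.pickUnchecked(three);
        unchecked.pickUnchecked(three); // counter is now 2
        try {
            unchecked.pickUnchecked(two); // index 2 against a list of size 2
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked: " + e);
        }

        RoundRobinSketch checked = new RoundRobinSketch();
        checked.pickChecked(three);
        checked.pickChecked(three);
        System.out.println("checked: " + checked.pickChecked(two));
    }
}
```

The checked variant simply retries until the counter holds an index that is valid for the list it was given, which is one way to apply the bounds check asked for above.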

All 10 comments

We are going to need more information. To start with, nothing in that stack trace points to anything in Spring Cloud Netflix. Is there more in the logs that would point to some code in Spring Cloud Netflix?

Same problem here with Spring Cloud M8.

2745

Apologies, it took some time to get back. Here is the code repo I tried, and here is the complete log of the Zuul gateway.

Once you set up the services (config, registry, gateway, and edge; micro is not needed for now), run the breaker project. It calls the services, the service simulates a timeout, and this error appears.

I am seeing this exact issue with Boot 1.5.10 and Edgware (RELEASE, SR1, SR2) too. I found this merged PR in Ribbon which seems to fix it.

It is a problem introduced fairly recently, in the 2.2.4 release of Ribbon, which is used by spring-cloud-netflix 1.4.x. There is no release > 2.2.4 yet, so downgrading to Dalston was the only viable option for me. Once the next release of Ribbon is out, this will be an easy fix.

Thanks @grelland.

@spencergibb do you think we should downgrade to 2.2.3? It doesn't look like the bug is in that release.

sure, we can ask for a release as well.

This post suggests a quick workaround: disabling Ribbon's circuit breaker.

niws:
  loadbalancer:
    availabilityFilteringRule:
      filterCircuitTripped: false # defaults to true

https://yangdongdong.org/2017/12/31/spring-cloud-feign/

@spencergibb, @ryanjbaxter Is this the right way to go for now?

I am not sure if that property will make a difference or not. You can try overriding the Ribbon dependencies in your application's POM to use version 2.2.3.

@ryanjbaxter, @spencergibb we have the same issue. Can you please help me with how to override the Ribbon dependency in the POM to use version 2.2.3? I do not have an explicit dependency on Ribbon in my POM. Also, the latest version of Ribbon I see in Maven is 2.0.1. Do I need this at the API gateway level?
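
For illustration, a Maven override of this kind might look like the sketch below. It assumes the Ribbon modules arrive transitively through the Spring Cloud starters and that entries declared directly in the application's own `dependencyManagement` section take precedence over versions imported from a BOM. The list of modules is an assumption and may need adjusting to what the application actually pulls in; the thread does not confirm that 2.2.3 works with every release train. Note that the Netflix library is published under the `com.netflix.ribbon` group and is versioned separately from the Spring Cloud starters.

```xml
<!-- Sketch only: pins the Netflix Ribbon modules to 2.2.3 in the application's pom.xml. -->
<!-- The module list is an assumption; merge these entries into any existing dependencyManagement section. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.netflix.ribbon</groupId>
      <artifactId>ribbon</artifactId>
      <version>2.2.3</version>
    </dependency>
    <dependency>
      <groupId>com.netflix.ribbon</groupId>
      <artifactId>ribbon-core</artifactId>
      <version>2.2.3</version>
    </dependency>
    <dependency>
      <groupId>com.netflix.ribbon</groupId>
      <artifactId>ribbon-httpclient</artifactId>
      <version>2.2.3</version>
    </dependency>
    <dependency>
      <groupId>com.netflix.ribbon</groupId>
      <artifactId>ribbon-loadbalancer</artifactId>
      <version>2.2.3</version>
    </dependency>
    <dependency>
      <groupId>com.netflix.ribbon</groupId>
      <artifactId>ribbon-eureka</artifactId>
      <version>2.2.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=com.netflix.ribbon` afterwards should show which Ribbon versions are actually resolved. The override would presumably be needed in each application that logs the error, which in the original report is the Zuul gateway.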
