Elasticsearch: REST Client should retry on 429

Created on 27 Oct 2016 · 6 comments · Source: elastic/elasticsearch

I haven't tested this, but my reading of the code suggests that HTTP 429 is not retried automatically by the RestClient. I think it should be retried automatically.

Citation: https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java#L312-L313

Context: I am evaluating the new REST Java client for use in the Logstash Elasticsearch output

All 6 comments

@jordansissel the line you linked to removes the current host from the list of "active" ones so it won't be used again for a while. Then the request is retried against the next host. See this comment as well:

 //mark host dead and retry against next one

I presume you had some reason to think things are going wrong?
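For reference, here is a self-contained paraphrase of the classification the linked code performs, as of the version discussed here (not a verbatim excerpt; the demo class is illustrative): 502/503/504 mark the host dead and trigger a retry against the next host, while everything else, including 429, marks the host alive and fails definitively.

```java
// Paraphrased sketch of the RestClient's retry-status check (5.x era):
// only gateway/availability errors are retried across hosts.
public class RetryStatusDemo {
    static boolean isRetryStatus(int statusCode) {
        switch (statusCode) {
            case 502: // bad gateway
            case 503: // service unavailable
            case 504: // gateway timeout
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (int status : new int[] {429, 502, 503, 504}) {
            System.out.println(status + " retried across hosts? " + isRetryStatus(status));
        }
    }
}
```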

Ahh, you are right. I misread the code.

(I see that a 429, since it is not a retryStatus value, will mark the host alive and fire onDefinitiveFailure. I can use this to do my own retries on 429s.)
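For anyone wanting to do the same, here is a minimal sketch of retrying 429s client-side, assuming the 5.x-era low-level RestClient where a non-2xx response surfaces as a ResponseException. The wrapper class, maxRetries, and the backoff values are illustrative, not part of the client API:

```java
import java.io.IOException;
import java.util.Collections;

import org.apache.http.HttpEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.ResponseException;
import org.elasticsearch.client.RestClient;

public class RetryOn429 {
    // Retries only 429 responses, with exponential backoff; everything else
    // (success, other errors, running out of attempts) propagates as usual.
    public static Response performWithRetry(RestClient client, String method, String endpoint,
            HttpEntity entity, int maxRetries) throws IOException, InterruptedException {
        long backoffMillis = 100;
        for (int attempt = 0; ; attempt++) {
            try {
                return client.performRequest(method, endpoint,
                        Collections.<String, String>emptyMap(), entity);
            } catch (ResponseException e) {
                int status = e.getResponse().getStatusLine().getStatusCode();
                if (status != 429 || attempt >= maxRetries) {
                    throw e; // not a throttling response, or out of attempts
                }
                Thread.sleep(backoffMillis); // back off before retrying
                backoffMillis *= 2;          // double the wait each time
            }
        }
    }
}
```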

429 is not a retry status. I think this is the same for all of our language clients though. @jordansissel did you mean that this should change? If so, @elastic/es-clients may want to comment on that.

The client failover mechanism is more about failing fast and in a controlled/timely manner than it is about forcing requests to succeed.

https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/falling-over.html
https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/max-retries.html

That is orthogonal to Logstash's needs, which is fine. As we discussed this year, the difference in status handling between beats+logstash and the clients is a good thing :)

From a client's perspective:

429 can be argued both ways, but in many cases retrying it simply amplifies the load exactly when that is least desirable. It could be a queue bounce, in which case we _might_ have better luck on a different node, but we could also just amplify the load on the cluster by failing over: e.g. each single request ends up being N requests as we attempt to fail over fast, registering more and more requests in the queues.

That said:

429 would be a prime candidate for an incremental backoff retry mechanism, but we currently have no such feature in the clients. Should we? Some of us already have bulk helpers that support exponential retries of individual 429 bulk item failures.
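To make the bulk-helper idea concrete, here is a hedged sketch of the item-level part, assuming Jackson for JSON parsing; BulkRetryHelper and the one-action-plus-optional-source-line representation are illustrative, not an existing client API. It picks out the bulk items that failed with 429 so a caller can re-send just those after a backoff:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class BulkRetryHelper {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // actions.get(i) holds the original action line (and source line, if any)
    // for bulk item i; returns the subset whose response item carried a 429.
    public static List<String[]> itemsToRetry(List<String[]> actions, String bulkResponseJson)
            throws IOException {
        JsonNode items = MAPPER.readTree(bulkResponseJson).path("items");
        List<String[]> toRetry = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // each item is wrapped in its op type, e.g. {"index": {"status": 429, ...}}
            JsonNode op = items.get(i).elements().next();
            if (op.path("status").asInt() == 429) {
                toRetry.add(actions.get(i));
            }
        }
        return toRetry;
    }
}
```

A caller would loop: send the bulk body, feed the response through itemsToRetry, sleep with a growing delay, and re-send only the returned actions until the list is empty or a retry budget is exhausted.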

I also think that is more of a feature for a high-level client, rather than a low-level one.
