Elasticsearch: Bulk index with invalid_index_name_exception replies with HTTP status 200

Created on 5 Feb 2018 · 9 comments · Source: elastic/elasticsearch

Elasticsearch version: 5.6.2

Plugins installed: []

JVM version (java -version):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OS version (uname -a if on a Unix-like system):

Linux Ubuntu16 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Actual behaviour: When trying to bulk index a document into an index with an invalid name (one containing whitespace), Elasticsearch returns an 'invalid_index_name_exception'. The JSON response specifies a status code of 400, but the HTTP status code is 200.

Expected behaviour: The HTTP status code should also be 400. When indexing data, we don't want to deserialize the response, as it is more of a fire-and-forget call. The response should only be deserialized when something went wrong.

Steps to reproduce:
Bulk index one or more documents into an index whose name contains an invalid character such as ' ' (space), '"', '*', '\', '<', '|', ',', '>', '/' or '?'.
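For reference, here is a minimal Go sketch of a reproducing request body. It builds the newline-delimited (NDJSON) payload that is sent as the body of a POST /_bulk request with Content-Type application/x-ndjson. The helper name buildBulkBody, the type name "doc" and the sample document are hypothetical; the _type field is included because 5.x indices still use mapping types.

```go
package main

import (
	"encoding/json"
	"strings"
)

// buildBulkBody (hypothetical helper) returns an NDJSON _bulk payload that
// indexes every document into the given index and type. Each action line and
// each source line is terminated by '\n', including the last one, as the
// bulk API requires.
func buildBulkBody(index, docType string, docs []map[string]interface{}) (string, error) {
	var b strings.Builder
	for _, doc := range docs {
		// Action/metadata line, e.g. {"index":{"_index":"my index","_type":"doc"}}
		action := map[string]interface{}{
			"index": map[string]string{"_index": index, "_type": docType},
		}
		actionLine, err := json.Marshal(action)
		if err != nil {
			return "", err
		}
		// Source line holding the document itself.
		sourceLine, err := json.Marshal(doc)
		if err != nil {
			return "", err
		}
		b.Write(actionLine)
		b.WriteByte('\n')
		b.Write(sourceLine)
		b.WriteByte('\n')
	}
	return b.String(), nil
}
```

Calling buildBulkBody("my index", "doc", ...) yields an action line that targets an index name containing a space; sending that payload should reproduce the per-item invalid_index_name_exception inside an HTTP 200 response, as described above.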


MarcMagnin


All 9 comments

The behavior here is by design. A 200 OK status code on a bulk response means that the coordinating node successfully parsed and executed the client request. You need to check the errors field to know if there were any errors handling the individual requests, and indeed parse the response body to know which documents failed and why.

jasontedor on 5 Feb 2018
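The check described above can be sketched in Go with only encoding/json: decode the response, return early when the top-level errors flag is false, and otherwise collect the items whose status is not 2xx. The struct and function names are hypothetical and only a few item fields are modelled; each entry of items is assumed to be an object with a single key naming the action (index, create, update or delete).

```go
package main

import "encoding/json"

// bulkItemResult holds the per-action fields this sketch cares about.
type bulkItemResult struct {
	Index  string          `json:"_index"`
	ID     string          `json:"_id"`
	Status int             `json:"status"`
	Error  json.RawMessage `json:"error,omitempty"` // e.g. {"type":"invalid_index_name_exception",...}
}

// bulkResponse models the top level of a bulk response.
type bulkResponse struct {
	Took   int                         `json:"took"`
	Errors bool                        `json:"errors"`
	Items  []map[string]bulkItemResult `json:"items"`
}

// failedItems decodes a bulk response body and returns the items whose
// status is outside the 2xx range. It returns nil when "errors" is false.
func failedItems(body []byte) ([]bulkItemResult, error) {
	var resp bulkResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if !resp.Errors {
		return nil, nil
	}
	var failed []bulkItemResult
	for _, item := range resp.Items {
		// Each item has a single key naming the action, e.g. "index".
		for _, res := range item {
			if res.Status < 200 || res.Status > 299 {
				failed = append(failed, res)
			}
		}
	}
	return failed, nil
}
```

Note that this decodes the entire body, which is exactly the cost the next comments object to; it is the baseline the cheaper checks further down try to avoid.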

Thanks for your quick reply.
I understand this behavior is by design, but knowing that the coordinating node successfully parsed and executed the client request is not very useful, because regardless of the 200 we still have to look into the details of the response.

The point I want to emphasize is that an invalid_index_name_exception is something that gets fixed once and is then never triggered again. After that, the client would have to deserialize and examine responses that will never contain this error, so it will simply skip the deserialization to avoid wasting memory and CPU in write-intensive scenarios.
Now imagine that an Elasticsearch update changes the list of invalid characters. This could silently break such a client, so the client is forced to always deserialize and examine the response anyway, just to be safe.
That's why a quick check of the HTTP status makes sense: it tells you whether there is something to look at, or whether you can just keep going.

MarcMagnin on 5 Feb 2018

Again, check the errors field in the response for a "quick" check as to whether or not there were any failures handling the individual requests.

jasontedor on 5 Feb 2018


Again, there is nothing "quick" about deserializing a large message just to check an errors field. The only middle ground I can think of is a regex match for a status other than 200 in the body before actually deserializing.
Anyway, thanks for your time.

MarcMagnin on 6 Feb 2018
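The regex pre-check suggested in the comment above could look roughly like this in Go. It is a naive sketch: the pattern and function name are assumptions, and it looks for any 4xx or 5xx per-item status rather than "status != 200", because successful index operations that create a document report 201. It still scans the whole body, so it saves decoding work, not I/O.

```go
package main

import "regexp"

// failureStatus matches a per-item status in the 4xx or 5xx range, e.g.
// "status":400. It tolerates whitespace after the colon.
var failureStatus = regexp.MustCompile(`"status":\s*[45][0-9]{2}`)

// mayContainFailures reports whether a raw bulk response body appears to
// contain a failed item. It is a heuristic pre-check only: a match should be
// followed by a real decode of the body.
func mayContainFailures(body []byte) bool {
	return failureStatus.Match(body)
}
```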

@marcinpm even in a healthy system, you have to check for errors such as rejections. What language are you using? We develop under the assumption that performance-sensitive users will use a forward-only parser, which makes reading the errors flag (the second field) cheap.

Re status codes: sadly, there's no status code that correctly reflects that the bulk request was executed successfully but some of the inner requests failed.

bleskes on 6 Feb 2018


Sorry, I meant @MarcMagnin.

bleskes on 6 Feb 2018

@bleskes Thanks for your answer!
Sorry @jasontedor, I hadn't noticed that there is a global errors flag at the beginning of the response, so a forward-only parser will do the job fast enough.
I'm currently working in Go, so I may give this one a try: https://github.com/buger/jsonparser
That said, I'll probably just go with a very naive byte lookup at the beginning of the stream.
Thanks a lot guys!

MarcMagnin on 6 Feb 2018

I wrote this naive way to get it:

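var took int
var errorFlag bool
// Reads only the prefix of a compact (non-pretty) bulk response, e.g. {"took":30,"errors":false,...
if _, err := fmt.Fscanf(resp.Body, "{\"took\":%d,\"errors\":%t", &took, &errorFlag); err != nil {
	log.Printf("unexpected bulk response prefix: %v", err)
}
fmt.Printf("took=%d errors=%t\n", took, errorFlag)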

This seems to be an efficient way to get the errors flag. Any suggestions?

MarcMagnin on 6 Feb 2018

@MarcMagnin answering your question would require digging deeper into Go semantics and your use cases. I suggest you take this to the forums at http://discuss.elastic.co, as it is out of scope for GitHub (we keep GitHub for issues and feature requests).

bleskes on 6 Feb 2018
