Elasticsearch version (bin/elasticsearch --version): Version: 5.6.2, Build: 57e20f3/2017-09-23T13:16:45.703Z, JVM: 1.8.0_131
Plugins installed: []
JVM version (java -version):
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
OS version (uname -a if on a Unix-like system): Linux ip-10-0-16-177 4.4.0-1035-aws #44-Ubuntu SMP Tue Sep 12 17:27:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description of the problem including expected versus actual behavior:
We have an index which is created daily at midnight UTC. In that index there is a field which is logged as a JSON integer. Sometimes it is correctly dynamically mapped to a long, and other times it is incorrectly mapped as a float. Only one log message is written to this index by the service, and that field is always populated with a long value.
Other fields from the same structure, logged at the same time, are also logged as JSON integers but have their index fields mapped as floats. I was unable to find a recent day-based index where these other fields mapped to anything _other_ than a float.
It's not clear to me why this is the case. From my understanding, the first document indexed when the index is created sets the types of the fields, and the first document should have set that field as a long. In a test where I create a new test index and then POST a document into it, all fields are mapped as long, as expected.
Steps to reproduce:
I cannot reliably reproduce this, as it seems to only be happening in our production environment. I'm mostly looking for guidance as to how a logged JSON int could be mapped to a float, or whether the only way that is possible is that we actually logged a JSON float somehow, in which case I should dig into our production system.
Can you try to identify the document that triggers this behaviour? I suspect it might have e.g. 3.0 instead of 3 as a value, which makes Elasticsearch map the field as a float. Maybe also check whether you have a proxy between your client and Elasticsearch that might rewrite the JSON document and transform something that is an integer when it leaves the client side into a float when it reaches Elasticsearch.
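One way to hunt for such a document, assuming the raw JSON that was sent to Elasticsearch is still available as one document per line (for example in an application log), is to scan it for values that parse as floats. This is only a sketch: the helper `find_float_values` and the field name `count` are made up for illustration, and it relies on Python's `json` module distinguishing `3` (int) from `3.0` or `1e2` (float), which mirrors the integer versus floating-point distinction that dynamic mapping uses.

```python
import json


def find_float_values(lines, field):
    """Return (line_number, value) pairs where `field` parsed as a JSON float.

    `lines` is any iterable of JSON documents, one per line.
    `field` is a top-level key; nested fields would need extra handling.
    """
    hits = []
    for line_number, line in enumerate(lines, start=1):
        doc = json.loads(line)
        value = doc.get(field)
        # json.loads yields int for "3" and float for "3.0" or "1e2".
        if isinstance(value, float):
            hits.append((line_number, value))
    return hits


# Hypothetical sample documents; "count" stands in for the affected field.
sample = ['{"count": 3}', '{"count": 3.0}', '{"count": 1e2}']
print(find_float_values(sample, "count"))  # [(2, 3.0), (3, 100.0)]
```

If the first hit for a given day is also the first document indexed that day, that would explain why the whole daily index ended up with a float mapping.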
cc @elastic/es-search-aggs
@jpountz thanks for the reply. I did spend some time trying to find documents in the index whose value for this field was a float, and was unable to find any. I'm fairly confident this is not occurring, as the field is extracted programmatically from a Thrift struct whose field is of the i32 Thrift type. I do acknowledge, though, that a rogue document indexed with a float value there would be the likely culprit.
We do not have a proxy between our logging application and Elasticsearch either.
Do you have any index or dynamic templates configured?
Is there an application processing the Thrift data structure and converting it to JSON to send to ES? Or are you using something like Logstash?
I ask because various languages have weird edge cases. For example, PHP (everyone's favorite punching bag) will cast an integer to a double if it is greater than PHP_INT_MAX so as to prevent overflow. There are also oddities in how numerics are encoded to JSON in PHP: it implicitly truncates 3.0 to 3 unless you tell it otherwise.
Perhaps your language has similar oddities?
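As an illustration of this kind of oddity in another language (Python is used here only as an example, not because the reporter's service is known to use it), a value that is mathematically an integer can still be serialized as a JSON float if it was produced by true division somewhere upstream:

```python
import json

# True division always yields a float in Python 3, even for whole results.
from_true_division = {"count": 6 / 2}
# Floor division on ints keeps the value an int.
from_floor_division = {"count": 6 // 2}

as_float = json.dumps(from_true_division)
as_int = json.dumps(from_floor_division)

print(as_float)  # {"count": 3.0}
print(as_int)    # {"count": 3}
```

If the first document of the day carried the `3.0` form, dynamic mapping would pick float for that field in that day's index.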
If you know the mappings ahead of time (since you have the Thrift schema), setting explicit mappings in ES would be the easiest solution. If you make the mappings strict, any future rogue documents that contain unmapped fields will throw an exception, and you might be able to identify what the culprit is.
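A minimal sketch of what such an explicit, strict mapping could look like for daily indices on 5.x, expressed as an index template body built in Python. The index pattern `logs-*`, the mapping type `log`, and the field `request_count` are placeholders, not names from this issue; the body would be sent with `PUT _template/<name>`, and on 6.0 and later the top-level `template` key is replaced by `index_patterns`.

```python
import json


def strict_template(pattern, doc_type, long_fields):
    """Build an index template body that maps the given fields as long.

    "dynamic": "strict" makes Elasticsearch reject documents containing
    fields that are not declared under "properties".
    """
    return {
        "template": pattern,  # 5.x key; assumed daily indices match this pattern
        "mappings": {
            doc_type: {
                "dynamic": "strict",
                "properties": {name: {"type": "long"} for name in long_fields},
            }
        },
    }


# Placeholder names, for illustration only.
body = strict_template("logs-*", "log", ["request_count"])
print(json.dumps(body, indent=2))
```

With the field declared as `long` up front, the type no longer depends on which document happens to arrive first after midnight.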
We were unable to identify the cause here; it is most likely a client problem. Closing, but feel free to reopen if you find out more.