Elasticsearch: exists query on numeric attributes with null_value gives unexpected result

Created on 21 Oct 2018  Â·  11Comments  Â·  Source: elastic/elasticsearch

Hi all,
I'm working with elastic 6.2.4.
my index-mapping contains "null_value".

I've notice that exists query gives different result when working with numeric value.

example:

PUT person { "mappings": { "_doc": { "properties": { "nickname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256, "null_value": "null" } } }, "grade":{ "type":"double", "null_value": -1 } } } } }

POST /person/_doc/1 { "name":"David", "nickname":null, "grade":null }

GET person/_search { "size": 0 , "query": { "exists":{ "field":"grade" } } }

gives unexpected result: hits.total: 1.

where...

GET person/_search { "size": 0 , "query": { "exists":{ "field":"nickname" } } }

gives expected result: hits.total: 0.

I think it's a BUG.

:AnalyticAggregations >bug

All 11 comments

Pinging @elastic/es-search-aggs

@azulay7 using the example you pasted above I am not able to reproduce the issue on 6.2.4 or the latest 6.x or master branches. in all cases the receive 0 hits back when performing the exists query on the grade field

Hi. @colings86, thank you for checking this out.
I know for sure that putting "null_value": -1 on the mapping causes the issue.

Can you please re-check yourself, if you put the mapping for grade field?
GET person/_mapping
you should get on grade property:

"null_value": -1

Thank you very much :)

@azulay7 I ran it again on another fresh install of Elasticsearch 6.2.4 but unfortunately I'm still not able to reproduce the error (I also tried again with the latest 6.x and master branches without success too). This is the console output for the commands I ran against Elasticsearch:

➜  11:32:34 elasticsearch-6.2.4 $  curl -XPUT "http://localhost:9200/person?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "nickname": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256,
              "null_value": "null"
            }
          }
        },
        "grade": {
          "type": "double",
          "null_value": -1
        }
      }
    }
  }
}'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "person"
}
➜  11:33:08 elasticsearch-6.2.4 $  curl -XGET "http://localhost:9200/person/?pretty" -H 'Content-Type: application/json'
{
  "person" : {
    "aliases" : { },
    "mappings" : {
      "_doc" : {
        "properties" : {
          "grade" : {
            "type" : "double",
            "null_value" : -1.0
          },
          "nickname" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "null_value" : "null",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1540290787216",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "B8cvUNZ3S32U-4RkPSF4dA",
        "version" : {
          "created" : "6020499"
        },
        "provided_name" : "person"
      }
    }
  }
}
➜  11:33:12 elasticsearch-6.2.4 $ curl -XPOST "http://localhost:9200/person/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "David",
  "nickname": null,
  "grade": null
}'
{
  "_index" : "person",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
➜  11:33:28 elasticsearch-6.2.4 $  curl -XGET "http://localhost:9200/person/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "exists": {
      "field": "grade"
    }
  }
}'
{
  "took" : 44,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

Are you able to reproduce this issue on a completely fresh install of 6.2.4?

Actually I see that I misread your description above. Thetotal.hits is where the error is and above I too see it as 1 but the actual hits returned is empty. This is indeed a bug. Sorry for the confusion earlier

@colings86, Great!! Hope to hear for any progress :)

@azulay7 actually looking again at this its not a bug but expected behaviour.

Firstly the difference in the number of hits returned being 0 and the total.hits: 1 is explained by the size: 0 in the request.

Secondly I would expect the total.hits to be returned as 1 for the grade field because the document does have a value for that field since you configured a null_value. This means the field exists because at index time the null is replaced by the null_value you configured. The same behaviour occurs if you run the search request on the nickname.keyword field which also has a null_value configured. Searching on the nickname field returns total.hits: 0 because there is no null_value configured (in fact null_value is not supported for text fields) so nothing is indexed for the document in the nickname field and so an exists query will return no hits

@colings86 I understand your perspective answer.. but I am bit confused on 'null_value':

  1. first of all, I thought that null_value is being stored under the hood and really repalce null in the document. because if I search for the document I get original null.
    GET person/_search
    result:

"_source": {
"name": "David",
"nickname": null,
"grade": null
}

only when I query aggs, I get "null_value"

GET person/_search { "aggs":{ "nickname":{ "terms": { "field": "nickname.keyword", "size": 10 } } } }

eventually what I am trying to perform is to query exists on thus field, can you advice me ho to do soo?

I thought about doing aggs and count if I have only 1 bucket of null_value, and the total docs of it is equal to my all my document..
but it's a complex and quite expensive than exists query.

thank for attention,
I do really appreciate it :)

@azulay7 The _source is never modified by Elasticsearch and always returns the exact JSON that was used int eh index request at index time, this is why the _source shows null for fields with a null_value. The reason for doing this is that it gives you the original document you indexed rather than a response that contains all the modifications that are made to surface the document better in searches.

If you want to perform an exists query that also filters out documents which have the null value you can combine an exists query with a term query for the null value in a boolean query and this should achieve what you are looking for?

@colings86 Thank you for clarifying this point for me. I understand why elasitc preserve the original
_source.

and yes I'm looking for a query like this. can you add example of your own?
it will be really helpful.

Thanks!

Something liek this should give the desired result:

GET person/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "grade"
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "grade": -1
                }
              }
            ]
          }
        }
      ]
    }
  }
}
Was this page helpful?
0 / 5 - 0 ratings