Elasticsearch: Using a missing filter for attributes of a nested object always returns an empty set

Created on 13 Aug 2013 · 9 Comments · Source: elastic/elasticsearch

Hi there,

I was trying to use the exists/missing filters when I stumbled upon this behavior: when I use the missing filter on a field of a nested object, it always returns an empty set if the containing nested object is itself missing or empty.

Here is my document mapping:

{
  "site": {
    "properties": {
      "host": {
        "type": "string",
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs"
      },
      "ip": {
        "type": "string",
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs"
      },
      ...
      "modules": {
        "type": "nested",
        "properties": {
          "module_id": {
            "type": "integer"
          },
          "name": {
            "type": "string",
            "index": "not_analyzed",
            "omit_norms": true,
            "index_options": "docs"
          },
          ...
        }
      }
    }
  }
}

My document looks like this:

{
  "host": "6c1bb1fb58e8c48cabbd1e4382e55871f31ad776.com",
  "ip" : "0.0.0.0",
  ...
  "modules": [ ]
}

If I now use a query with a nested filter to select every document where modules.name is missing, I only get an empty set.

{
  "query": {
    "filtered": {
      "query": { "match_all": { } },

      "filter": {
        "nested": {
          "path": "modules",
          "query": {
            "filtered": {
              "query": { "match_all": { } },

              "filter": {
                "missing": { "field": "modules.name" }
              }

            }
          }
        }
      }
    }
  }
}

It seems to work if I submit a document which contains a module:

{
  "host": "6c1bb1fb58e8c48cabbd1e4382e55871f31ad776.com",
  "ip" : "0.0.0.0",
  ...
  "modules": [ { "version" : "foo" } ]
}

For documents where the modules array isn't empty, a missing filter that looks for "deeper" missing attributes seems to work, too.

{
  "query": {
    "filtered": {
      "query": { "match_all": { } },

      "filter": {
        "nested": {
          "path": "modules",
          "query": {
            "filtered": {
              "query": { "match_all": { } },

              "filter": {
                "missing": { "field": "modules.foo.bar.baz" }
              }

            }
          }
        }
      }
    }
  }
}

I expected a missing filter to also return documents whose containing nested object is missing or empty.

Update: Wrapping an exists filter in a not filter doesn't return any documents, either.


All 9 comments

Yeah, I've also been a little stumped trying to figure out how to find documents without a nested object. In my case the nested object is an array of objects, and I'd like to find the documents where that array is empty.

Ah I found a solution at http://grokbase.com/t/gg/elasticsearch/13bfq5qbse/missing-filter-with-nested-objects

curl -XPOST "http://ocvli-apw602:9200/test2/IR/_search" -d'
{
    "filter": {
       "not": {
          "nested": {
             "path": "priosenio",
             "filter": {
                "match_all": {}
             }
          }
       }
    }
}'

I just ran into this. The workaround highlighted by @drewish feels pretty clunky, though.

Here's a simple recreation that describes the problem:

PUT t
{
  "mappings": {
    "t": {
      "properties": {
        "foo": {
          "type": "nested"
        }
      }
    }
  }
}

PUT t/t/1
{
  "foo": {
    "bar": "bar"
  }
}

PUT t/t/2
{
  "xyz": "xyz"
}

This request matches doc 1, because it has a nested doc which is missing the field, but not doc 2 because it has no nested docs:

GET t/_search
{
  "query": {
    "nested": {
      "path": "foo",
      "query": {
        "missing": {
          "field": "foo.baz"
        }
      }
    }
  }
}

This workaround works correctly for both docs:

GET t/_search
{
  "query": {
    "not": {
      "nested": {
        "path": "foo",
        "query": {
          "exists": {
            "field": "foo.baz"
          }
        }
      }
    }
  }
}

@martijnvg @jpountz is this fixable?

It is not fixable, unless the missing query can detect it is being used within a nested query, which I would like to avoid at all costs. We don't index missing fields in documents, only existing fields, so the missing query is internally implemented as the negation of an exists query. This raises problems as described here given that putting the not inside of the nested query has a totally different effect than putting it outside as your workaround does.

I think the way to fix this trap would be to deprecate the missing query in favor of explicit negations of the exists query.
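To make the inside/outside distinction concrete, here is a toy model in plain Python. It is not Elasticsearch code. It only assumes what the thread describes: each nested object behaves like its own hidden document, a nested query matches a parent when at least one of its nested documents matches, and missing is the negation of exists. The helper names (exists, missing, nested, not_, search) are made up for illustration, and fields are written relative to the nested object ("baz" rather than "foo.baz") to keep the sketch short.

```python
# Toy model only: mimics the matching behaviour described in the thread,
# not the actual Elasticsearch/Lucene implementation.

def exists(field):
    """Inner query: the nested doc has a value for `field`."""
    return lambda nested_doc: nested_doc.get(field) is not None

def missing(field):
    """Modelled as the negation of exists, as described in the comment above."""
    return lambda nested_doc: not exists(field)(nested_doc)

def nested(path, inner):
    """A parent matches if at least one nested doc under `path` matches `inner`."""
    def match(parent):
        value = parent.get(path, [])
        nested_docs = value if isinstance(value, list) else [value]
        return any(inner(doc) for doc in nested_docs)
    return match

def not_(query):
    """Negate a parent-level query."""
    return lambda parent: not query(parent)

# The two documents from the recreation above.
docs = {
    1: {"foo": {"bar": "bar"}},  # one nested doc, without "baz"
    2: {"xyz": "xyz"},           # no nested docs at all
}

def search(query):
    return sorted(doc_id for doc_id, doc in docs.items() if query(doc))

inside = search(nested("foo", missing("baz")))        # negation inside nested
outside = search(not_(nested("foo", exists("baz"))))  # negation outside nested
print(inside, outside)
```

With the negation inside, doc 2 has no nested documents for the inner query to run against, so nothing can match it. With the negation outside, doc 2 matches because the nested exists query found nothing to match.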

@jpountz ++ makes sense.

Closing in favour of #14112

Since the "not" filter is deprecated, you can use "must_not" inside a bool filter instead.

POST /my_index/my_type/_search
{
  "filter": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "path_to_nested_doc",
            "query": { "match_all": {} }
          }
        }
      ]
    }
  }
}

This works for me

GET /type/_search?pretty=true
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "outcome",
            "query": {
              "exists": {
                "field": "outcome.outcomeName"
              }
            }
          }
        }
      ]
    }
  }
}
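The query above puts the negation outside the nested clause. As a rough sketch, the two placements can be written as small Python helpers that build the request body as a dict. The function names are hypothetical and not part of any client library. The first helper reproduces the shape of the query above; the second moves the must_not inside the nested query and is included only for contrast.

```python
import json

def no_nested_doc_has(path, field):
    """Parents for which NO nested doc under `path` has `field`.
    Also matches parents that have no nested docs at all."""
    return {
        "bool": {
            "must_not": [
                {"nested": {"path": path, "query": {"exists": {"field": field}}}}
            ]
        }
    }

def some_nested_doc_lacks(path, field):
    """Parents with AT LEAST ONE nested doc under `path` that lacks `field`.
    Cannot match parents without nested docs."""
    return {
        "nested": {
            "path": path,
            "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
        }
    }

# Same shape as the request body shown above.
body = {"query": no_nested_doc_has("outcome", "outcome.outcomeName")}
print(json.dumps(body, indent=2))
```

Which helper is appropriate depends on the question being asked: "no nested object has this field" versus "some nested object is missing this field". Only the first also covers documents with an empty or absent nested array.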

Any update on this? The query from @manuprasanth does not return any documents for me, even though some of my nested elements are empty. What I find even stranger is that if I run the same query with must instead of must_not, I get the correct output (my empty elements are not returned).
