Elasticsearch: Support min_children & max_children for nested docs

Created on 9 Mar 2015 · 18 Comments · Source: elastic/elasticsearch

I am opening this as a separate issue since the previous issue was closed with support for parent-child docs (https://github.com/elasticsearch/elasticsearch/issues/6019#issuecomment-77785163).

We would love to have support for min_children & max_children or similar also for nested filters/docs. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2

Thanks, and keep up the great work.

All 18 comments

+1 we should add this! I think we should also open an issue for this in Lucene, because the nested query uses the ToParentBlockJoinQuery Lucene query to do the actual work.

I opened: https://issues.apache.org/jira/browse/LUCENE-6354 to get this in Lucene

+1

+1

+1

+1

+1

+1

+1

This would be a great feature to have. The only reason why we are using parent/child instead of nested mapping is the lack of min_children/max_children options in the nested query. Considering that:

  1. Nested queries are much faster than has_child queries;
  2. Elasticsearch is moving in the one-type-per-index direction;

I would very much like to see this implemented. Please let me know if there's anything I can do to help.

Any update on this? Would be really, really good to have this feature!

@elastic/es-search-aggs

Stalled waiting for https://issues.apache.org/jira/browse/LUCENE-6354 to be completed and merged

+1

+1

I'd love to see this.

To give some context: while the main reason for us to migrate from nested to a parent/child model was indexing speed, we also did so because of the min_children and max_children feature.

However, we have become painfully aware of the cost of has_child queries (joins) as the number of child documents and/or complexity of queries increases. OOM exceptions have become too frequent for comfort.

For stability reasons, we are reconsidering the nested model even if it means decreased indexing speed. Knowing that min_children and max_children are still planned for that model would reassure us.

Thank you!

Note: for comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4.

Alternative solution

To support min_children and max_children for your nested query, all you have to do is to use a function_score query. To make this more concrete: you have an index called Person with the following mapping:

  • first_name (text)
  • email (text)
  • children (nested)
    • first_name (text)
    • last_name (text)
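
For readers who want to reproduce the examples, the mapping listed above could be written roughly as follows. This is a minimal sketch that only builds the request body in Python. It assumes Elasticsearch 7+ typeless mappings, and the lowercase index name `person` is an assumption (Elasticsearch index names must be lowercase).

```python
import json

# Hypothetical index name; Elasticsearch index names must be lowercase.
INDEX_NAME = "person"

# Request body for creating the index, assuming Elasticsearch 7+ (typeless mappings).
person_mapping = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text"},
            "email": {"type": "text"},
            "children": {
                # "nested" keeps each child as its own hidden document,
                # which is what the nested query counts later on.
                "type": "nested",
                "properties": {
                    "first_name": {"type": "text"},
                    "last_name": {"type": "text"},
                },
            },
        }
    }
}

if __name__ == "__main__":
    print(f"PUT /{INDEX_NAME}")
    print(json.dumps(person_mapping, indent=2))
```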

Cases

To effectively support min_children and max_children, there are multiple queries you need to consider:

  • Find all Persons who don't have children.
  • Find all Persons who have at least _n_ children:
    • requires: n > 0
    • if n is 0, you're asking for _any_ Person, whether or not they have children.
  • Find all Persons who have between _n_ and _m_ children:
    • requires: n > 0
    • if n is 0, you're asking for all Persons who have at most _m_ children.
  • Find all Persons who have at most _n_ children.

Depending on the scenario, the request will look different.

Notes about function_score:

  • function_score supports min_score. It filters out any document where the score is lower than the min_score.
  • function_score has a max_boost. It does not filter the documents returned; it caps the score computed by the functions at a specific value. For instance, if the computed score is 500 and max_boost is 50, 50 is returned.
  • If you don't want this function_score to pollute the overall score of the document, apply a boost of 0.
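
The interplay of min_score and max_boost can be illustrated with a small model. This is a simplified sketch, not Elasticsearch code: the function name `function_score_result` is hypothetical, only the `multiply` and `replace` boost modes are covered, and the outer `boost` is ignored.

```python
import math


def function_score_result(query_score, function_result=1.0,
                          boost_mode="multiply", max_boost=math.inf,
                          min_score=None):
    """Simplified model: return the final score, or None if the document is dropped."""
    # max_boost caps what the functions produced; it never removes a document.
    capped = min(function_result, max_boost)
    if boost_mode == "multiply":
        final = query_score * capped
    elif boost_mode == "replace":
        final = capped
    else:
        raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")
    # min_score drops documents whose final score is below the threshold.
    if min_score is not None and final < min_score:
        return None
    return final


if __name__ == "__main__":
    # A function result of 500 is capped to 50 by max_boost.
    print(function_score_result(1.0, 500, boost_mode="replace", max_boost=50))
    # One child at boost 10 scores 10, which is below min_score 20: dropped.
    print(function_score_result(10.0, min_score=20))
    # Two children at boost 10 score 20, which meets min_score 20: kept.
    print(function_score_result(20.0, min_score=20))
```

With no functions defined, the function result is treated as 1, so the final score is simply the query score. That is why a summed nested score can be compared directly against min_score.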

Find all persons who have no children

Explanation: this is the easiest query; you only need to verify that there are no nested documents. It is significantly faster than using function_score.

{
    "query": {
        "bool": {
            "must_not": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    }
                }
            }    
        }
    }
}

Find all persons who have a minimum of _n_ children

Explanation: each matching nested document contributes a score equal to the boost (_10_ here), and the nested query sums these scores (score_mode: sum). The function_score then filters out any person whose total score is below min_score.

Example: find all persons who have a minimum of _2_ children. The boost applied here is _10_ (you can pick any number), so the min_score is _20_ (2 * 10).

_Before Elastic 7_:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

_Elastic 7+_:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
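
The relation min_score = n * boost can be captured in a small helper that generates the Elastic 7+ request body. This is a sketch under assumptions: the function name `min_children_query` and its parameters are hypothetical, and it only builds a dict mirroring the query above without contacting a cluster.

```python
import json


def min_children_query(path, field, min_children, boost=10):
    """Build a request body matching parents with at least `min_children` nested docs."""
    if min_children < 1:
        raise ValueError("min_children must be > 0; 0 would match every parent")
    return {
        "query": {
            "function_score": {
                # Each nested doc scores `boost`, so n children sum to n * boost.
                "min_score": min_children * boost,
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            # constant_score pins every matching child to a score of 1.
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = min_children_query("children", "children.first_name", 2)
    print(json.dumps(body, indent=4))
```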

Find all persons who have between _n_ and _m_ children (n > 0).

Explanation: same as above, except that a script alters the score. The script checks whether the summed score exceeds m * boost; if it does, it returns 0, which guarantees the document is excluded (0 < min_score).

Example: Find all persons who have a minimum of 2 children and a maximum of 5 children (inclusive).

_Before Elastic 7_:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "boost_mode": "replace",
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

_Elastic 7+_:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "functions": [
                {
                    "filter": {
                        "match_all": {
                            "boost": 1
                        }
                    },
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
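
Both thresholds derive from the same boost: min_score is n * boost and the script cutoff is m * boost. The sketch below generates the Elastic 7+ body from n and m. The function name `between_children_query` and its parameters are hypothetical; it keeps only `script` inside `script_score` and relies on `boost_mode: replace` so that the script result becomes the final score.

```python
import json


def between_children_query(path, field, min_children, max_children, boost=10):
    """Build a request body matching parents with min..max nested docs (inclusive)."""
    if not 0 < min_children <= max_children:
        raise ValueError("expected 0 < min_children <= max_children")
    upper = max_children * boost
    # Returning 0 pushes the parent below min_score, which excludes it.
    source = f"if (_score > {upper}) {{ return 0; }} return _score;"
    return {
        "query": {
            "function_score": {
                "min_score": min_children * boost,
                "boost": 1,
                "score_mode": "multiply",
                # "replace" makes the script result the final score.
                "boost_mode": "replace",
                "functions": [
                    {"script_score": {"script": {"source": source, "lang": "painless"}}}
                ],
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = between_children_query("children", "children.first_name", 2, 5)
    print(json.dumps(body, indent=4))
```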

Find all persons who have at most _n_ children (n > 0).

Explanation: this request asks for persons who have no children as well as persons who have between 1 and _n_ children. Expressing this directly in Elasticsearch can be tricky, so it is easier to negate it: exclude all persons who have n + 1 or more children.

Example: find all persons who have at most 2 children. With a boost of _10_, the must_not clause uses a min_score of _30_ ((2 + 1) * 10).

_Before Elastic 7_:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "query": {
                        "nested": {
                            "path": "children",
                            "query": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            },
                            "boost": 10,
                            "score_mode": "sum"
                        }
                    }
                }
            }
        }
    }
}

_Elastic 7+_:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "score_mode": "multiply",
                    "boost_mode": "replace",
                    "query": {
                        "nested": {
                            "path": "children",
                            "boost": 10,
                            "score_mode": "sum",
                            "query": {
                                "constant_score": {
                                    "boost": 1,
                                    "filter": {
                                        "exists": {
                                            "field": "children.first_name"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
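
The negated form can be generated the same way: the must_not clause has to match parents with one child more than the allowed maximum, hence (max + 1) * boost. This is a sketch with a hypothetical function name, `max_children_query`, that mirrors the Elastic 7+ body above.

```python
import json


def max_children_query(path, field, max_children, boost=10):
    """Build a request body matching parents with 0..max_children nested docs."""
    if max_children < 0:
        raise ValueError("max_children must be >= 0")
    return {
        "query": {
            "bool": {
                # Exclude every parent that has max_children + 1 or more nested docs.
                "must_not": {
                    "function_score": {
                        "min_score": (max_children + 1) * boost,
                        "boost": 1,
                        "score_mode": "multiply",
                        "boost_mode": "replace",
                        "query": {
                            "nested": {
                                "path": path,
                                "boost": boost,
                                "score_mode": "sum",
                                "query": {
                                    "constant_score": {
                                        "boost": 1,
                                        "filter": {"exists": {"field": field}},
                                    }
                                },
                            }
                        },
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    body = max_children_query("children", "children.first_name", 2)
    print(json.dumps(body, indent=4))
```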

Hope this helps.

Any update on this?

I'm trying to filter my results based on an exact length. @xethorn I can't seem to get your solution working with filters, could you point me in the right direction?

Here's my search with filters, which don't support scoring:

GET /test/_search
{
  "query" : {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "functions": [
        {
          "script_score": {
            "script": {
                "source": "if (_score > 20) { return - 1; } return _score;"
            }
          }
        }
      ],
      "query": {
        "bool" : {
          "filter": [
            { "range": { "distance": { "lt": 5 }}},
            {
              "nested": {
                "score_mode": "sum",
                "boost": 10,
                "path": "dates",
                "query": {
                  "bool": {
                    "filter": [
                      { "range": { "dates.rooms": { "gte": 1 } } },
                      { "range": { "dates.timestamp": { "lte": 2 }}},
                      { "range": { "dates.timestamp": { "gte": 1 }}}
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

A few more details here: https://stackoverflow.com/questions/63226805/filter-query-by-length-of-nested-objects-ie-min-child

The question was answered on Stack Overflow. For comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4. :)
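
For context on why the filtered search above cannot work as written: clauses in a bool `filter` do not contribute to the score, so the nested sum never reaches min_score. One possible restructuring, not necessarily the one given in the Stack Overflow answer, is to move the nested clause into `must` and wrap the child filters in a `constant_score`. The sketch below is an assumption-laden illustration: the function name `exact_children_query` is hypothetical, and the script returns 0 rather than a negative value because Elasticsearch 7+ rejects negative scores.

```python
import json


def exact_children_query(path, child_filters, parent_filters, count, boost=10):
    """Build a body matching parents with exactly `count` nested docs passing the filters."""
    target = count * boost
    source = f"if (_score > {target}) {{ return 0; }} return _score;"
    return {
        "query": {
            "function_score": {
                "min_score": target,
                "boost_mode": "replace",
                "functions": [{"script_score": {"script": {"source": source}}}],
                "query": {
                    "bool": {
                        # Parent-level conditions stay in filter context (no scoring).
                        "filter": parent_filters,
                        # The nested clause sits in "must" so its summed score is kept.
                        "must": [
                            {
                                "nested": {
                                    "path": path,
                                    "boost": boost,
                                    "score_mode": "sum",
                                    "query": {
                                        "constant_score": {
                                            "boost": 1,
                                            "filter": {"bool": {"filter": child_filters}},
                                        }
                                    },
                                }
                            }
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    body = exact_children_query(
        path="dates",
        child_filters=[
            {"range": {"dates.rooms": {"gte": 1}}},
            {"range": {"dates.timestamp": {"gte": 1, "lte": 2}}},
        ],
        parent_filters=[{"range": {"distance": {"lt": 5}}}],
        count=2,
    )
    print(json.dumps(body, indent=2))
```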
