Feature request:
An aggregation that counts only the parent documents that have matching child documents.
I've been working on a BI system with ES 0.90, and we needed to count "users" that have certain attributes, for instance gender and star sign. A user is a parent-level document and the attributes are child documents.
For the example above, we did this by creating a query for each combination of gender (male / female) and star sign and running each query individually. As one can imagine, this was slow, but the results were exactly what we wanted. The whole run took roughly 2 minutes.
We considered using msearch to get these results in a single request, and we ended up with something similar to this: https://gist.github.com/chaos-generator/9133118
This version runs in about 40 seconds.
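For readers who cannot open the gist, here is a minimal sketch of how such a per-combination msearch body could be generated. It is modelled on the `has_child` msearch request that appears later in this thread, not on the gist itself; the helper name `build_msearch_body` and the `child_type` parameter are hypothetical.

```python
import json
from itertools import product


def build_msearch_body(genders, signs, child_type="personal_data"):
    """Build an NDJSON _msearch body with one count-only search per combination.

    Hypothetical helper: the field names ("gender", "sign") and the child
    type are assumptions taken from the sample data in this thread.
    """
    lines = []
    for gender, sign in product(genders, signs):
        search = {
            "size": 0,  # only the hit total is needed, not the documents
            "query": {
                "bool": {
                    "must": [
                        {"has_child": {"type": child_type,
                                       "query": {"match": {"gender": gender}}}},
                        {"has_child": {"type": child_type,
                                       "query": {"match": {"sign": sign}}}},
                    ]
                }
            },
        }
        lines.append("{}")  # empty header line: use the index from the URL
        lines.append(json.dumps(search, separators=(",", ":")))
    # Each line of an msearch body, including the last, ends with a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body(["male", "female"], ["LEO"])
print(body)
```

The number of searches grows with the product of the attribute values (2 genders x 12 star signs is already 24 searches), which is consistent with this approach staying slow.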
Then Elasticsearch 1.0.0 came along with aggregations, so we simplified our query to this: https://gist.github.com/chaos-generator/9133139
This is lightning fast: we get the results in 200 ms on average, which is ideal for us, BUT we get the total number of child documents with the attributes rather than the number of parent documents.
Our problem, as you can see in the msearch gist, is that we have a parent-level document and child documents. A child document is only updated if another document with exactly the same attributes comes in, so a parent-level user document can have three child documents that all carry gender and star sign. I only want to count the parent document once, rather than each individual child document.
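To make the difference concrete, here is a small plain-Python sketch (no Elasticsearch involved) that contrasts the per-bucket document count an aggregation over child documents reports with the distinct-parent count we actually want. The rows are illustrative data, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative child documents: (child_id, parent_id, gender, sign).
children = [
    ("2", "1", "male", "LEO"),
    ("3", "1", "male", "LEO"),
    ("5", "4", "female", "LEO"),
    ("6", "4", "female", "LEO"),
]


def count_child_docs(rows):
    """What a terms aggregation over children reports: one per child document."""
    counts = defaultdict(int)
    for _child_id, _parent_id, gender, sign in rows:
        counts[(gender, sign)] += 1
    return dict(counts)


def count_distinct_parents(rows):
    """What we want: each parent counted once per (gender, sign) bucket."""
    parents = defaultdict(set)
    for _child_id, parent_id, gender, sign in rows:
        parents[(gender, sign)].add(parent_id)
    return {bucket: len(ids) for bucket, ids in parents.items()}


doc_counts = count_child_docs(children)
parent_counts = count_distinct_parents(children)
print(doc_counts)     # {('male', 'LEO'): 2, ('female', 'LEO'): 2}
print(parent_counts)  # {('male', 'LEO'): 1, ('female', 'LEO'): 1}
```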
As we don't know in advance which attributes our users will search on, we cannot use a script at index time to help with this aggregation. We tried a script at search time, like this: https://gist.github.com/chaos-generator/9133321 , but it didn't work the way we wanted.
You can use this gist to simulate the issue we have: https://gist.github.com/chaos-generator/9143655
Hi @martijnvg and @clintongormley, is there any update on the possibility of implementing this feature? Is it feasible?
We badly need this feature: we currently duplicate all the child data into the parent just to be able to perform the parent aggregation.
Hi @gmenegatti
Currently the parent aggregation is stalled - we ran into a significant barrier to implementation. It may be that we end up reimplementing parent-child completely using a different design, so I'm going to leave this ticket open so that we can revisit it later.
Hi @chaos-generator
While the parent aggregation is not supported, you can use the cardinality aggregation on the _parent field to get an estimated count of matching parents. Here's an example based on the gist you provided:
PUT /my_test/user/1
{
  "user_id": "1",
  "user_name": "John Smith"
}

PUT /my_test/personal_data/_mapping
{
  "personal_data": {
    "_parent": {
      "type": "user"
    }
  }
}

PUT /my_test/personal_data/2?parent=1
{
  "gender": "male",
  "sign": "LEO",
  "DOB": "1979-01-01"
}

PUT /my_test/personal_data/3?parent=1
{
  "gender": "male",
  "sign": "LEO"
}

PUT /my_test/user/4
{
  "user_id": "1",
  "user_name": "Jane Smith"
}

PUT /my_test/personal_data/5?parent=4
{
  "gender": "female",
  "sign": "LEO",
  "DOB": "1979-01-01"
}

PUT /my_test/personal_data/6?parent=4
{
  "gender": "female",
  "sign": "LEO"
}

GET /my_test/_msearch?pretty=true
{}
{"size":0,"query":{"bool":{"must":[{"has_child":{"type":"personal_data","query":{"match":{"gender":"male"}}}},{"has_child":{"type":"personal_data","query":{"match":{"sign":"LEO"}}}}]}}}
{}
{"size":0,"query":{"bool":{"must":[{"has_child":{"type":"personal_data","query":{"match":{"gender":"female"}}}},{"has_child":{"type":"personal_data","query":{"match":{"sign":"LEO"}}}}]}}}

GET /_search?search_type=count
{
  "aggs": {
    "gender": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "sign": {
          "terms": {
            "field": "sign"
          },
          "aggs": {
            "parents": {
              "cardinality": {
                "field": "_parent"
              }
            }
          }
        }
      }
    }
  }
}
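
On the client side, the nested buckets from the request above can be flattened into one parent count per (gender, sign) pair. The following is a minimal sketch, assuming the usual aggregation response layout (`aggregations` -> `buckets` -> sub-aggregation by name); the function name `parent_counts_from_response` is hypothetical, and `sample_response` is hand-written for illustration, not captured from a real cluster.

```python
def parent_counts_from_response(response):
    """Flatten gender -> sign -> parents buckets into {(gender, sign): count}.

    The aggregation names ("gender", "sign", "parents") match the request
    above. The values are estimates, because the cardinality aggregation
    is approximate.
    """
    counts = {}
    for gender_bucket in response["aggregations"]["gender"]["buckets"]:
        for sign_bucket in gender_bucket["sign"]["buckets"]:
            key = (gender_bucket["key"], sign_bucket["key"])
            counts[key] = sign_bucket["parents"]["value"]
    return counts


# Hand-written sample in the assumed response shape (not real output).
sample_response = {
    "aggregations": {
        "gender": {
            "buckets": [
                {"key": "female", "doc_count": 2, "sign": {"buckets": [
                    {"key": "leo", "doc_count": 2, "parents": {"value": 1}},
                ]}},
                {"key": "male", "doc_count": 2, "sign": {"buckets": [
                    {"key": "leo", "doc_count": 2, "parents": {"value": 1}},
                ]}},
            ]
        }
    }
}

flat = parent_counts_from_response(sample_response)
print(flat)
```

In each bucket, `doc_count` is still the number of child documents, while `parents.value` is the estimated number of distinct parents.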
Hi @clintongormley
Now that the parent/child code has been refactored, do you think this can move forward, or will it still be stalled?
Thanks
Hi @gmenegatti
I'm afraid that this aggregator still suffers from the same problem as before the parent/child refactoring.
It is unlikely that we're going to be able to implement a parent aggregation, which steps up from the child aggregation, so I'm going to close this.