When performing an aggregation over a large index is suggested to create search object to not return any hits.
This however can give inconsistent boolean representation of the response in elasticsearch-dsl.
s = Search()
s = s.extra(size=0)
s.aggs.bucket('articles_per_day', 'date_histogram', field='publish_date', interval='day')\
.metric('clicks_per_day', 'sum', field='clicks')
r = s.execute()
bool(r) # false
r.hits.totlal # != 0
Since no hits was returned Response.hits list is empty while Response.hits.total may be a positive number, thus meaning that the search was successful.
We suggest that Response.__bool__ should check hits.total instead then hits.__bool__ to test if the query was succesful.
There is a success() method on the response to check whether a search has been successful, bool was meant to check if there are any hits, meaning will I get anything back if I iterate on the object. Does that make sense?
Sorry, I was not clear.
I mean that it may cause confusion to have bool(response) == False when hits.total > 0 but no hits was actually returned because of size=0. Since is intended to not return any hits but the boolean check is meant to check if it _could_ have returned any.
I think we see what the bool should represent differently, you say:
the boolean check is meant to check if it could have returned any
while the implementation has the meaning (as of now) as:
will I get anything back if I iterate on the object
Both are valid approaches and I see where your expectation is coming from, but I also would find it weird if bool(response) returned True and then iterating over it would bring no results.
I wonder if the best solution is to not implement bool and instead rely on explicit methods. What do you think?
Yes, probabily _explicit_ methods could also make code more readable.
I think we see what the
boolshould represent differently, you say:the boolean check is meant to check if it could have returned any
while the implementation has the meaning (as of now) as:
will I get anything back if I iterate on the object
Both are valid approaches and I see where your expectation is coming from, but I also would find it weird if
bool(response)returnedTrueand then iterating over it would bring no results.I wonder if the best solution is to not implement
booland instead rely on explicit methods. What do you think?
I know this is an old post, but I'm in a similar situation: I perform a query where I'm not interested in any hits, I just want the aggs.
When I perform a truthy check on the Response object, it returns false, even though aggregations are present in it. So wouldn't it be more logical to do something like:
def __nonzero__(self):
return bool(self.hits) or bool(self.aggs)
instead of only checking self.hits?
Your previous statement about iterating makes sense, but then again, returning a falsy result doesn't make sense when there was indeed a correct result..
Response.__bool__ will continue to reflect the hits length. To check whether there are aggs results should use the bool(resp.aggs)