Elasticsearch-dsl-py: Response __bool__ gives not reliable results when using aggregations

Created on 25 Sep 2019  路  6Comments  路  Source: elastic/elasticsearch-dsl-py

When performing an aggregation over a large index is suggested to create search object to not return any hits.
This however can give inconsistent boolean representation of the response in elasticsearch-dsl.

s = Search()
s = s.extra(size=0)
s.aggs.bucket('articles_per_day', 'date_histogram', field='publish_date', interval='day')\
    .metric('clicks_per_day', 'sum', field='clicks')

r = s.execute()
bool(r)       # false
r.hits.totlal # != 0

Since no hits was returned Response.hits list is empty while Response.hits.total may be a positive number, thus meaning that the search was successful.

We suggest that Response.__bool__ should check hits.total instead then hits.__bool__ to test if the query was succesful.

All 6 comments

There is a success() method on the response to check whether a search has been successful, bool was meant to check if there are any hits, meaning will I get anything back if I iterate on the object. Does that make sense?

Sorry, I was not clear.

I mean that it may cause confusion to have bool(response) == False when hits.total > 0 but no hits was actually returned because of size=0. Since is intended to not return any hits but the boolean check is meant to check if it _could_ have returned any.

I think we see what the bool should represent differently, you say:

the boolean check is meant to check if it could have returned any

while the implementation has the meaning (as of now) as:

will I get anything back if I iterate on the object

Both are valid approaches and I see where your expectation is coming from, but I also would find it weird if bool(response) returned True and then iterating over it would bring no results.

I wonder if the best solution is to not implement bool and instead rely on explicit methods. What do you think?

Yes, probabily _explicit_ methods could also make code more readable.

I think we see what the bool should represent differently, you say:

the boolean check is meant to check if it could have returned any

while the implementation has the meaning (as of now) as:

will I get anything back if I iterate on the object

Both are valid approaches and I see where your expectation is coming from, but I also would find it weird if bool(response) returned True and then iterating over it would bring no results.

I wonder if the best solution is to not implement bool and instead rely on explicit methods. What do you think?

I know this is an old post, but I'm in a similar situation: I perform a query where I'm not interested in any hits, I just want the aggs.

When I perform a truthy check on the Response object, it returns false, even though aggregations are present in it. So wouldn't it be more logical to do something like:

def __nonzero__(self):

    return bool(self.hits) or bool(self.aggs)

instead of only checking self.hits?

Your previous statement about iterating makes sense, but then again, returning a falsy result doesn't make sense when there was indeed a correct result..

Response.__bool__ will continue to reflect the hits length. To check whether there are aggs results should use the bool(resp.aggs)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ypkkhatri picture ypkkhatri  路  4Comments

rokcarl picture rokcarl  路  4Comments

gabrielpjordao picture gabrielpjordao  路  3Comments

abuzakaria picture abuzakaria  路  4Comments

MauriJHN picture MauriJHN  路  4Comments