Elasticsearch-dsl-py: Scripting a range aggregation

Created on 27 Apr 2015 · 7 Comments · Source: elastic/elasticsearch-dsl-py

Hi. I have this multi-level aggregation that I would like to convert to the Python DSL. The problem is that I don't know how to code the ranges part in Python.

CURL code

POST /_search
{
    "size": 0,
    "aggs": {
        "by_property" : {
            "terms": {
                "field": "propertyId",
                "size": 0
            },
            "aggs": {
                "twitter_count": {
                    "range": {
                        "field": "twitterAccount.followers",
                        "ranges": [
                            { "to" : 5001},
                            { "from" : 5001, "to" : 10001},
                            { "from" : 10001, "to" : 50001},
                            { "from" : 50001}
                        ]
                    },
                    "aggs" : {
                        "email_addy": {
                            "terms" : {
                                "field": "emails.value",
                                "size": 0
                            }
                        }
                    }
                }
            }
        }
    }
}

Python DSL code

s.aggs.bucket('by_property', 'terms', field='propertyId', size=0) \
    .bucket('twitter_count', 'range', field='twitterAccount.followers')

How do I continue the aggregation and specify which ranges the range aggregation uses?
.ranges( )? body=range{ } ?

vbaii

Most helpful comment

Hi,

the mechanism is always the same: whatever you would put inside the JSON object, just pass in as kwargs. In this case:

s.aggs.bucket('by_property', 'terms', field='propertyId', size=0)\
    .bucket('twitter_count', 'range',
        field='twitterAccount.followers',
        ranges=[
            {'to': 5001},
            {'from': 5001, 'to': 10001},
            {'from': 10001, 'to': 50001},
            {'from': 50001}
        ]
    )
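The four range dicts above follow a regular pattern (an open-ended bucket, then consecutive from/to pairs, then another open-ended bucket), so they can also be generated from a list of boundaries. The following is a minimal sketch in plain Python; `build_ranges` is a hypothetical helper name, not part of elasticsearch-dsl:

```python
def build_ranges(boundaries):
    """Turn sorted boundaries into a list of range dicts.

    The first bucket has only 'to', the last has only 'from',
    and every bucket in between has both.
    """
    if not boundaries:
        raise ValueError("need at least one boundary")
    ranges = [{"to": boundaries[0]}]
    # Pair each boundary with its successor to form the closed buckets.
    for lower, upper in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lower, "to": upper})
    ranges.append({"from": boundaries[-1]})
    return ranges


ranges = build_ranges([5001, 10001, 50001])
```

The resulting list can then be passed as the `ranges=` kwarg of the `.bucket('twitter_count', 'range', ...)` call shown above.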

Note that you can always use Search.from_dict to pass it the JSON you would send with curl and then inspect the resulting object and its repr:

s = Search.from_dict({...})
print(repr(s.aggs['by_property']['twitter_count']))

Hope this helps

HonzaKral on 27 Apr 2015

👍2

All 7 comments

Thanks so much, Honza. I'm new to both Python and Elasticsearch, which compounds my syntax issues. I will also try Search.from_dict. That will come in handy for some other queries down the line. This definitely helped. Thanks again.

vbaii on 27 Apr 2015

👍1

Happy to help

HonzaKral on 27 Apr 2015

I tried using from_dict; however, the query now ignores the index I specified.

client = connections.create_connection(hosts=['http://some_location:9200'])
s = Search(using=client, index="g", doc_type="prop")

body = {
    "query": {
        "match_all": {}
    },
    "aggs": {
        "by_property" : {
            "terms": {
                "field": "propertyId",
                "size": 3
            },
            "aggs": {
                "twitter_count": {
                    "range": {
                        "field": "twitterAccount.followers",
                        "ranges": [
                            { "from" : 1000, "to" : 5000},
                            { "from" : 5000, "to" : 10000},
                            { "from" : 10000, "to" : 50000},
                            { "from" : 50000}
                        ]
                    },
                    "aggs" : {
                        "email_addy": {
                            "terms" : {
                                "field": "emails.value",
                                "size": 3
                            }
                        }
                    }
                }
            }
        }
    }
}
s = Search.from_dict(body)
#s.index("g")
#s.doc_type("prop")
body = s.to_dict()

r = s.execute()
print repr(r)
print
for i in vars(r):
    print i
print
for i in r:
    print vars(i)
print

My results should only include stuff from the g index, but it's pulling across multiple indices, with results like:

{'_meta': {u'index': u'logevents1409', u'score': 1.0, u'id': u'ab3...}, '_d_': {u'status': u'success', u'count': 1, u'project': u'proj1', u'info': {u'args': u'', u'command': u'x:cronjob:hourly', u'queuename': u'cronjob'}, u'date': u'2014-09-01T23:59:59+00:00', u'type': u'jobQueue', u'id': u'rAnDoMlEtErS', u'propertyId': None}}

Do the index and doc_type info get erased/reset when I call s.to_dict()? How and where in the script would I set them back to what I want?

vbaii on 15 May 2015

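Ah, the index parameter is not passed in the body but in the URL, so it won't be in the to_dict output. Search.from_dict also builds a brand-new Search object, so the index and doc_type given to the earlier Search(...) call are not carried over.

Also, you have to write s = s.index('g'), because .index (like all other methods on Search) returns a copy of the Search object; it doesn't mutate it in place.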
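To make the copy-returning behaviour concrete, here is a small sketch using a toy class. `MiniSearch` and its `url()` method are hypothetical stand-ins written for illustration only; they are not the library's implementation, they only mimic the two points above (the index lives outside the body, and `.index()` returns a copy).

```python
import copy


class MiniSearch:
    """Toy stand-in for a Search-like object (hypothetical, illustration only)."""

    def __init__(self, body=None, index=None):
        self._body = dict(body or {})
        self._index = index  # kept apart from the body; it belongs in the URL

    def index(self, name):
        # Return a modified copy; the original object is left untouched.
        clone = copy.copy(self)
        clone._index = name
        return clone

    def to_dict(self):
        # Only the request body is serialized; the index is not part of it.
        return dict(self._body)

    def url(self):
        # Hypothetical helper showing where the index would end up.
        return "/%s/_search" % self._index if self._index else "/_search"


s = MiniSearch({"query": {"match_all": {}}})

s.index("g")             # result discarded: s still targets every index
unchanged_url = s.url()

s = s.index("g")         # rebinding keeps the copy that carries the index
fixed_url = s.url()
body = s.to_dict()
```

With the real library, the equivalent fix for the script above would be to rebind after from_dict, for example `s = Search.from_dict(body).index('g')`.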

HonzaKral on 15 May 2015

Thanks for the help, it saved a lot of time. Can I also ask how to specify a keyed aggregation in the DSL, like in the Elastic docs?

{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "keyed" : true,
                "ranges" : [
                    { "to" : 100 },
                    { "from" : 100, "to" : 200 },
                    { "from" : 200 }
                ]
            }
        }
    }
}

kslsantosh on 20 Jun 2018

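@kslsantosh
you need to write something like this:
```python
search = Search(using=client, index=index, doc_type=doc_type)
# keyed=True is passed as a kwarg, like any other option of the range aggregation
search.aggs.bucket("bucketName", "range", field='fieldName', keyed=True, ranges=[
    {'to': 1},
    {'from': 2, 'to': 3},
    {'from': 4}
])

# result: the aggregations section of the response
# (search.aggs.to_dict() would only return the request, without doc_count)
response = search.execute()
print(response.aggregations.to_dict())
# {'bucketName': {'buckets': {'*-1.0': {'to': 1.0, 'doc_count': 0}, '2.0-3.0': {'from': 2.0, 'to': 3.0, 'doc_count': 0}, '4.0-*': {'from': 4.0, 'doc_count': 777}}}}
```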
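With keyed=True the buckets come back as a dict keyed by range label rather than as a list, so code that reads the result has to handle a different shape. A minimal sketch in plain Python follows; the sample dict is hand-written to mirror the shape of the output above, and `doc_counts` is a hypothetical helper name:

```python
# Hand-written sample shaped like a keyed range aggregation result.
aggregations = {
    "bucketName": {
        "buckets": {
            "*-1.0": {"to": 1.0, "doc_count": 0},
            "2.0-3.0": {"from": 2.0, "to": 3.0, "doc_count": 0},
            "4.0-*": {"from": 4.0, "doc_count": 777},
        }
    }
}


def doc_counts(agg):
    """Map bucket key -> doc_count for keyed (dict) or unkeyed (list) buckets."""
    buckets = agg["buckets"]
    if isinstance(buckets, dict):
        # keyed=True: the range label is the dict key
        return {key: bucket["doc_count"] for key, bucket in buckets.items()}
    # unkeyed: each bucket in the list carries its own "key" field
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = doc_counts(aggregations["bucketName"])
```

Handling both shapes in one helper keeps the calling code the same whether or not the aggregation was requested with keyed=True.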