Elasticsearch-dsl-py: Clean way to make pandas DataFrame from search results?

Created on 15 Sep 2017  路  6Comments  路  Source: elastic/elasticsearch-dsl-py

I would like to create a DataFrame from Elasticsearch Query DSL results. Try as I might, I cannot figure out how to create a DataFrame just from the results. E.g. I have tried:

# Create a basic ES client
client = Elasticsearch(['url'])

# Search
search = Search(using=client)

results = search.execute()

search_dict = results.hits.hits

results_df = pd.DataFrame(search_dict)

However, my DataFrame looks like this:
screenshot_20170915_153533

How can I get the values of the _source column to populate a pandas DataFrame?

Most helpful comment

s = Search()
# add any filters/queries....
s = s.query(...)

results_df = pd.DataFrame((d.to_dict() for d in s.scan()))

should be exactly what you need. Alternatively omit the .scan() if you only want the top 10 results (useful for testing).

Hope this helps

All 6 comments

s = Search()
# add any filters/queries....
s = s.query(...)

results_df = pd.DataFrame((d.to_dict() for d in s.scan()))

should be exactly what you need. Alternatively omit the .scan() if you only want the top 10 results (useful for testing).

Hope this helps

@HonzaKral Can you help me if I need to get more than 10000 rows. Like in python there is a limitation that we cant get more than 10,000 observation. What if i need to get 1 Million observation into pandas or if I can directly dump that into csv or some Nosql

@abhimanyu3 just use the scan method as mentioned above, it will work for any number of results

I am doing the below

es_index = "logstash-2018.08.26"
documento = "your_doc_type"


body = {"from": 0, "size": 100000,
    "query": {
        "constant_score" : {
            "filter" : {
                 "bool" : {
                    "must" : [{
              "range": {"@timestamp" : {
                "gte": "2018-08-26T14:00:08.000Z", 
                "lte": "2018-08-26T16:00:00.000Z"

            }}
          }],
                   "filter": [
                        {"term"  :{"type" :"vx_apache_json"}},
                        {"term"  :{"api" :"viv_signin.php"}},
                        {"term"  :{"domain":"fnwp"}}




                   ]
                 }}}}}

res = helpers.scan(
                client = es,
                scroll = '2m',
                query = body, 
                index = es_index)

and when i trying to get the value of res its giving me

when i am using

for i in res:
 print(i)
I am getting some kind of list of dictionary..

Can you help me in getting directly a pandas df with it

and when i am trying to execute what you suggested it gives me error dict has no attribute scan
and with your method i am also getting Search not defined.

@abhimanyu3 I am sorry, but I cannot help you with any details that are related to pandas or python in general. This issue is for elasticsearch_dsl library. You did get all of the data out of elasticsearch as dicts, to trasnform them into a data frame just pass them into pd.DataFrame as seen in previous examples on this very same ticket. Otherwise please seek help in a forum dedicated to pandas directly on how to transform an interator of dictionaries into a data frame.

Thank you!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

leoliuxd picture leoliuxd  路  4Comments

njoannin picture njoannin  路  3Comments

takaomag picture takaomag  路  3Comments

gabrielpjordao picture gabrielpjordao  路  3Comments

quasiben picture quasiben  路  4Comments