I would like to create a DataFrame from Elasticsearch Query DSL results. Try as I might, I cannot figure out how to create a DataFrame just from the results. E.g. I have tried:
# Create a basic ES client
client = Elasticsearch(['url'])
# Search
search = Search(using=client)
results = search.execute()
search_dict = results.hits.hits
results_df = pd.DataFrame(search_dict)
However, my DataFrame looks like this:

How can I get the values of the _source column to populate a pandas DataFrame?
s = Search()
# add any filters/queries....
s = s.query(...)
results_df = pd.DataFrame((d.to_dict() for d in s.scan()))
should be exactly what you need. Alternatively omit the .scan() if you only want the top 10 results (useful for testing).
Hope this helps
@HonzaKral Can you help me if I need to get more than 10000 rows. Like in python there is a limitation that we cant get more than 10,000 observation. What if i need to get 1 Million observation into pandas or if I can directly dump that into csv or some Nosql
@abhimanyu3 just use the scan method as mentioned above, it will work for any number of results
I am doing the below
es_index = "logstash-2018.08.26"
documento = "your_doc_type"
body = {"from": 0, "size": 100000,
"query": {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [{
"range": {"@timestamp" : {
"gte": "2018-08-26T14:00:08.000Z",
"lte": "2018-08-26T16:00:00.000Z"
}}
}],
"filter": [
{"term" :{"type" :"vx_apache_json"}},
{"term" :{"api" :"viv_signin.php"}},
{"term" :{"domain":"fnwp"}}
]
}}}}}
res = helpers.scan(
client = es,
scroll = '2m',
query = body,
index = es_index)
and when i trying to get the value of res its giving me
when i am using
for i in res:
print(i)
I am getting some kind of list of dictionary..
Can you help me in getting directly a pandas df with it
and when i am trying to execute what you suggested it gives me error dict has no attribute scan
and with your method i am also getting Search not defined.
@abhimanyu3 I am sorry, but I cannot help you with any details that are related to pandas or python in general. This issue is for elasticsearch_dsl library. You did get all of the data out of elasticsearch as dicts, to trasnform them into a data frame just pass them into pd.DataFrame as seen in previous examples on this very same ticket. Otherwise please seek help in a forum dedicated to pandas directly on how to transform an interator of dictionaries into a data frame.
Thank you!
Most helpful comment
should be exactly what you need. Alternatively omit the
.scan()if you only want the top 10 results (useful for testing).Hope this helps