Hello,
System:
I am trying to run a (messy) search like this:
import datetime as dt

from elasticsearch import Elasticsearch, client, ImproperlyConfigured
from elasticsearch_dsl import Search, Q

range_start_time = dt.datetime(month=11, day=29, hour=0, year=2017)
range_end_time = dt.datetime(month=11, day=30, hour=0, year=2017)
es_address = "https://some.address.com/"
es = Elasticsearch([es_address])
s = Search(using=es)
# define query
s = s.query('term', **{"method.keyword": "my_method"})
# filter for time range
s = s.query("range", timestamp={"from": range_start_time.strftime("%Y-%m-%dT%H:%M:%S"),
                                "to": range_end_time.strftime("%Y-%m-%dT%H:%M:%S")})
# filter 1
s = s.query('term', **{"field.keyword": "some_field_keyword"})
# sort
s = s.sort({"timestamp": {"order": "desc"}})
# execute query
response = s.scan()
all_data_1d = {}
for hit in response:
    d = hit.to_dict()  # convert to Python dictionary for easier access by key
    try:
        end_time = dt.datetime.strptime(d["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        time_delta = dt.timedelta(seconds=d["time_delta"])
        start_time = end_time - time_delta
        ### Code
        ### for
        ### getting `key`.
        # Note `key` needs to be generated from data within this `hit`, hence the nesting within the outer hit loop
        key = some_string
        try:
            s_sub = Search(using=es)
            s_sub = s_sub.query('term', **{"key_field.keyword": key})
            s_sub = s_sub.query("range", timestamp={"from": range_start_time.strftime("%Y-%m-%dT%H:%M:%S"),
                                                    "to": range_end_time.strftime("%Y-%m-%dT%H:%M:%S")})
            for hit_sub in s_sub.scan():
                d_sub = hit_sub.to_dict()
                other_time = dt.datetime.strptime(d_sub["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
                # populate data
                all_data_1d[d["some_id"]] = {"name": d["method"],
                                             "end_time": end_time,
                                             "start_time": start_time,
                                             "time_delta": time_delta,
                                             "other_time": other_time}
        except:
            print "Problem with searching for log record with es_key:", key
    except:
        print "Problem with extracting es_key"
I get the following error after about 5-10 mins of successful runtime:
NotFoundError Traceback (most recent call last)
<ipython-input-14-7d4eaeb60678> in <module>()
5 T0 = time.time()
6
----> 7 for hit in response:
8 d = hit.to_dict() # convert to Python dictionary for easier access by key
9 try:
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch_dsl/search.py in scan(self)
673 index=self._index,
674 doc_type=self._doc_type,
--> 675 **self._params
676 ):
677 callback = self._doc_type_map.get(hit['_type'], Hit)
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py in scan(client, query, scroll, raise_on_error, preserve_order, size, request_timeout, clear_scroll, scroll_kwargs, **kwargs)
377 resp = client.scroll(scroll_id, scroll=scroll,
378 request_timeout=request_timeout,
--> 379 **scroll_kwargs)
380
381 for hit in resp['hits']['hits']:
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/client/utils.py in _wrapped(*args, **kwargs)
71 if p in kwargs:
72 params[p] = kwargs.pop(p)
---> 73 return func(*args, params=params, **kwargs)
74 return _wrapped
75 return _wrapper
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/client/__init__.py in scroll(self, scroll_id, body, params)
1031
1032 return self.transport.perform_request('GET', '/_search/scroll',
-> 1033 params=params, body=body)
1034
1035 @query_params()
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/transport.py in perform_request(self, method, url, params, body)
310
311 try:
--> 312 status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
313
314 except TransportError as e:
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py in perform_request(self, method, url, params, body, timeout, ignore)
126 if not (200 <= response.status < 300) and response.status not in ignore:
127 self.log_request_fail(method, full_url, url, body, duration, response.status, raw_data)
--> 128 self._raise_error(response.status, raw_data)
129
130 self.log_request_success(method, full_url, url, body, response.status,
/home/andri/anaconda2/lib/python2.7/site-packages/elasticsearch/connection/base.py in _raise_error(self, status_code, raw_data)
123 logger.warning('Undecodable raw error response from server: %s', err)
124
--> 125 raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
126
127
NotFoundError: TransportError(404, u'search_phase_execution_exception', u'No search context found for id [8664053]')
Note that, empirically, this search succeeds when range_end_time - range_start_time spans a few hours, but fails with the error above when it spans days.
I tried to find relevant posts online, but no luck. Any advice would be appreciated. If this is an Elasticsearch issue (not elasticsearch_dsl), please advise on any next steps you would take to further diagnose (out of the kindness of your heart :) ) and you can close the issue. If further data would help diagnose, resolve, or improve the module, I can provide it.
Thanks,
Andri
You are processing the data while scrolling through Elasticsearch, so I would guess that your scroll simply expires.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#CO27-3
You can either do the processing after finishing the scan, or set the scroll timeout to be more suitable to the time you need to do the processing.
Set the scroll timeout like this: s = s.params(scroll='25m').
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#time-units
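The first option (process after the scan has finished) could look roughly like the sketch below. It assumes `search` behaves like an elasticsearch_dsl Search object with .params() and .scan(); the names drain_scan, process_all and handle_hit are hypothetical helpers made up for illustration, not part of the library. The idea is that only the cheap to_dict() conversion happens while the scroll is open, and the slow per-hit sub-searches run afterwards.

```python
def drain_scan(search, scroll="25m"):
    """Fetch every hit up front so no scroll context is open during slow work.

    `search` is assumed to behave like an elasticsearch_dsl Search object,
    i.e. it offers .params(**kwargs) and .scan().
    """
    # Keep-alive for the scroll, set the same way as suggested in this thread.
    search = search.params(scroll=scroll)
    # Only cheap work happens while scrolling: convert each hit to a dict.
    return [hit.to_dict() for hit in search.scan()]


def process_all(search, handle_hit):
    """Run the slow per-hit work only after the scan has finished.

    `handle_hit` is a hypothetical callback that takes one hit dict and
    returns whatever should be stored for it (e.g. after a sub-search).
    """
    results = {}
    for d in drain_scan(search):
        # "some_id" is the key field used in the original snippet.
        results[d["some_id"]] = handle_hit(d)
    return results
```

Keep in mind that this holds every matching document in memory at once, so for a very large result set raising the scroll timeout may be the more practical choice.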
This indeed sounds like a timing issue, as pointed out by @NirBenor. Please feel free to reopen the issue with additional information if that's not the case. Thanks!
Sorry for not closing this issue earlier. I meant to come back to it after testing out the scroll setting ( s = s.params(scroll='25m') ) suggestion. After testing the change, it did solve the problem I described in this issue. Thanks for the reminder and the assist.
I hope this information on keeping the search context alive will be helpful: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-context
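As that page explains, the keep-alive is renewed on every scroll request, so it only has to cover the time spent processing one batch of hits, not the whole scan. A rough way to pick a value is sketched below. The helper names and the safety factor are my own assumptions, and the per-hit time is something you would have to measure yourself. I believe the scan helper in elasticsearch-py defaults to a 5m keep-alive and 1000 hits per batch, but please check that against your installed version.

```python
import math


def keep_alive_minutes(batch_size, seconds_per_hit, safety_factor=2.0):
    """Estimate a scroll keep-alive, in whole minutes, that covers one batch.

    batch_size      -- hits returned per scroll request (the `size` parameter)
    seconds_per_hit -- measured processing time per hit, sub-searches included
    safety_factor   -- arbitrary headroom multiplier (an assumption, tune it)
    """
    seconds = batch_size * seconds_per_hit * safety_factor
    return max(1, int(math.ceil(seconds / 60.0)))


def scroll_param(batch_size, seconds_per_hit, safety_factor=2.0):
    """Format the estimate as an Elasticsearch time value such as '17m'."""
    return "%dm" % keep_alive_minutes(batch_size, seconds_per_hit, safety_factor)


# Possible usage with the Search object from this thread:
#   s = s.params(scroll=scroll_param(1000, 0.5))
```

Lowering the batch size, if your version of the scan helper accepts a size parameter through s.params(), is the other lever: fewer hits per batch means less time between scroll requests.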