According to the docs, the Terms Query lookup mechanism is meant to be used for filtering on a large number of terms.
I tested it with Elasticsearch 2.3.4 on an index of 2 million documents, and filtering with around 20 thousand terms makes the query quite slow (more than 500 ms).
There is a related issue here: https://github.com/elastic/elasticsearch/issues/18829
If it isn't designed to filter thousands of terms efficiently, then, given that many Twitter users have thousands of followers, I don't think a Twitter example should be used in the official docs. Maybe a performance warning should be added there.
Is there any way to speed up this kind of query? Are there any plans for improvements?
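For context, here is a minimal sketch of the kind of request body the Twitter-style lookup example builds: tweets whose `user` field matches any ID stored in the `followers` array of a single document in a `users` index. It assumes the 2.x-era lookup options (`index`, `type`, `id`, `path`); later versions dropped the mapping type, so check the docs for your version. The helper name `terms_lookup_query` is hypothetical.

```python
import json


def terms_lookup_query(field, lookup_index, lookup_type, doc_id, path):
    """Build a terms query whose values are fetched from another document.

    Assumes the 2.x-era terms lookup options (index/type/id/path).
    """
    return {
        "query": {
            "terms": {
                field: {
                    "index": lookup_index,  # index holding the lookup document
                    "type": lookup_type,    # mapping type (removed in later versions)
                    "id": doc_id,           # ID of the lookup document
                    "path": path,           # field in that document holding the terms
                }
            }
        }
    }


# Twitter-style usage: every follower ID stored in users/user/2 becomes a term,
# so a user with thousands of followers yields a terms filter with thousands of terms.
body = terms_lookup_query("user", "users", "user", "2", "followers")
print(json.dumps(body, indent=2))
```

The lookup only saves sending the terms over the wire; the resulting filter still contains every term from the `followers` array.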
I think we should set expectations in the docs indeed. Large terms filters are problematic in general, even if the lookup mechanism is not used.
Closing this issue, as this has been addressed in terms-query.asciidoc by https://github.com/elastic/elasticsearch/pull/27968
> a Twitter example should not be used in the official docs
@toniov Hi, which example do you mean? Is it this example, or were there some more advanced ones? Thanks!
@fzyzcjy It was just a short example in the lookup mechanism documentation.
@toniov Thanks!
Hello,
According to this post, 20k terms is slow. Yet, according to https://github.com/elastic/elasticsearch/pull/27968/files#diff-4e93d5bf58e29274e79023abea854a60R133, 65k terms is a defensive limit, i.e. possibly not slow. Is this a contradiction, or did something change between the writing of this post and a later version of Elasticsearch?
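To make "defensive limit" concrete, a minimal client-side sketch of such a cap is shown below. The 65,536 value is taken from the ~65k figure cited above; the constant and function names are hypothetical and not part of any Elasticsearch client API.

```python
# Hypothetical cap mirroring the ~65k figure cited from the linked PR.
MAX_TERMS_COUNT = 65536


def check_terms_count(terms, limit=MAX_TERMS_COUNT):
    """Reject a terms list that is larger than a defensive size limit.

    Returns the list unchanged when it is within the limit.
    """
    if len(terms) > limit:
        raise ValueError(f"{len(terms)} terms exceeds the limit of {limit}")
    return terms


# A 20k-term list passes the check; a 70k-term list is rejected.
check_terms_count(["t"] * 20000)
```

Note that a check like this only rejects oversized lists; it says nothing about how fast a query under the limit will run.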
Thanks!