Hitting /_search causes Elasticsearch to search all types in all indices.
With very large indices this is a problem.
We currently have ~30 TB of data across multiple logstash indices.
If someone talks to ES (e.g., using Sense or another method) and doesn't specify the index name, the cluster effectively grinds to a halt until we restart it.
Can we have an option to require specifying at least one index name to the search endpoint?
Thanks!
The setting 'allow_no_indices' requires at least one available index to be specified:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/multi-index.html
The above comment doesn't make sense. Index expansion should be disabled in your case. However, there are only two values for 'expand_wildcards': 'open' for open indices and 'closed' for closed indices. If you don't have closed indices, setting this to 'closed' will make a search request that specifies no index (or * or _all as the index) not execute. I think a third option, 'none', should be added, which disables index expansion.
Doesn't 'allow_no_indices' have to be set in the query?
We want a server-side setting to prevent this problem where people don't
use 'allow_no_indices' with their query.
This can too easily happen by accident right now.
Yes, the 'allow_no_indices' and other multi index settings have to be set in the search request.
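To make the per-request nature of these options concrete, here is a minimal sketch that builds a search URL carrying both parameters in the query string. The base URL and index name are made up for illustration, and `build_search_url` is a hypothetical helper, not part of any Elasticsearch client. Nothing here stops a caller from leaving the options off, which is the concern raised in this thread.

```python
from urllib.parse import urlencode


def build_search_url(base_url, indices, allow_no_indices=False, expand_wildcards="open"):
    """Build a _search URL that carries the multi-index options on the request itself.

    Hypothetical helper: the options only take effect if every caller remembers
    to send them, because they are request parameters and not server settings.
    """
    index_part = ",".join(indices)
    params = urlencode({
        "allow_no_indices": str(allow_no_indices).lower(),  # "true" or "false"
        "expand_wildcards": expand_wildcards,               # "open" or "closed"
    })
    return f"{base_url}/{index_part}/_search?{params}"


# Example with an assumed local node and an assumed daily logstash index name.
url = build_search_url("http://localhost:9200", ["logstash-2014.06.12"])
print(url)
```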
Right, that's exactly the problem :-)
Here's an analogy to the problem I face, where I have many people who will
want to query Elasticsearch directly:
Imagine if the rm command, when run with no arguments,
defaulted to wiping your hard drive.
I can tell people to always use arguments, I can remind them constantly,
but it only takes one person one time to run rm without any arguments,
expecting everything to be safe.. Oops.
Without server-side enforcement here to protect the cluster, we have to
assume that everyone knows and will always be able to use the right
arguments, which isn't possible.
Right, I understand :-) A server-side setting disallowing a search without an index or type specified makes sense.
I'm also looking to enforce well formed queries by disabling _all and wildcard expansion on the index name. I've disabled creation of _all on the index level, but over-eager queries are still able to search every index (when they should be restricted by date range and specify all relevant indices). Any update on this one?
I'm on the fence about this one. It seems simple to add, but then everybody has a different policy that they want to enforce, so a tiny feature becomes complex. This is already easily solved by putting a proxy like nginx in front of ES and specifying the exact rules that you want to implement. I'm not sure that we should go down this road...
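As a rough illustration of the kind of rule a proxy in front of ES would need, here is a sketch of the path check written in Python rather than nginx config. The function name and the policy itself (reject a search with no index, with `_all`, or with a bare `*`) are assumptions for the example; a real deployment would pick its own rules, and narrower patterns such as `logstash-*` are deliberately allowed here.

```python
from urllib.parse import unquote, urlsplit


def is_unscoped_search(request_path):
    """Return True if the path is a search that would hit every index.

    Assumed policy for this sketch: block /_search, /_all/..., and a bare
    '*' index; allow everything else, including partial wildcards.
    """
    path = urlsplit(request_path).path
    segments = [unquote(s) for s in path.split("/") if s]
    # Only look at requests whose final path segment is the search endpoint.
    if not segments or segments[-1] != "_search":
        return False
    index_expr = segments[0]
    if index_expr == "_search":
        return True  # no index given at all
    # The index expression may be a comma-separated list of names.
    return any(name in ("_all", "*") for name in index_expr.split(","))


for p in ("/_search", "/_all/_search", "/logstash-2014.06.12/_search"):
    print(p, "->", "reject" if is_unscoped_search(p) else "allow")
```

A proxy would return an error for the rejected paths and forward the rest unchanged.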
> It seems simple to add, but then everybody has a different policy that they want to enforce, so a tiny feature becomes complex.
I'm not sure the slippery slope argument applies here - it's probably a bad idea to query _all_ indexes in a decently large cluster, and if that is simple to make an option then it's probably worth it. Anyone with a more complex set of requirements can break out nginx.
OTOH if it isn't a simple option to implement, just telling people to break out nginx if they want any filtering isn't that bad. It's not like Elasticsearch _needs_ to be the one to enforce this. It'd just be convenient if it did.
I would agree with Nik.
This is a pain point currently, and even with nginx in place, it is _easy_
to directly access elasticsearch (eg on localhost) and accidentally query
all indices.
Note that I'm not talking about malicious actors, but actual accidents, and
those are otherwise hard to protect against. I think it would be most
appropriate for this single toggle to be in Elasticsearch.
I find it sickening that it's not possible to limit ES to _only_ being reachable through nginx (or something else which can control access in complex ways), even on localhost, but that being the case I have to say I find the answer "use nginx for that", even for something as simple as preventing _all indices being queried, to be consistent and therefore satisfactory.
Side note: if there is a way to make ES require a specific header (/token/anything), to make it more difficult or cumbersome to access without eg nginx in the middle, I'd love to hear about it.
Could it be considered superseded by the soft limit on the number of queried shards at once? #17396
@jpountz I think the soft limit does the job better than disabling querying on all indices - it gets to the root of the problem
Hello,
I have the following case for disabling querying on all indices. I have a multi-tenant Elasticsearch cluster, where tenant1 data is in "index1" and tenant2 data is in "index2". There are two instances of client applications, for tenant1 and tenant2 respectively.
It is strictly forbidden to mix search results for these tenants, i.e. if people of tenant1 see some data that doesn't belong to them and find out it is some other tenant's data, they will go crazy about that.
So right now all search queries are written so that they specify the exact index they need to search against. But it has happened a couple of times that someone forgot to specify the index and some queries returned mixed data. Luckily this was caught early enough.
So I think having some switch that just prevents searching against all indices makes a lot of sense for the above scenario.
P.S. If there is another way to get bulletproof tenant isolation within a cluster, I would really like to know it. Thanks!
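Until something like that exists on the server, one partial mitigation is to route every query through a small client-side helper that refuses to build a search path without exactly one concrete index. This is only a sketch: `tenant_search_path` and `MissingIndexError` are hypothetical names, and a guard like this does nothing for code that bypasses it, so it is not real isolation.

```python
class MissingIndexError(ValueError):
    """Raised when a search is attempted without one explicit index."""


def tenant_search_path(index):
    """Return the search path for a single tenant index, or raise.

    Rejects empty names, _all, wildcards, and comma-separated lists so a
    forgotten or overly broad index fails loudly instead of mixing tenants.
    """
    if index is None or not index.strip():
        raise MissingIndexError("an explicit index name is required")
    if index == "_all" or "*" in index or "," in index:
        raise MissingIndexError(f"index expression {index!r} is not a single index")
    return f"/{index}/_search"


print(tenant_search_path("index1"))
```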
I currently have the same case as @aides, and would love to know how to go about this in Y2020.
Hi @takwas, thanks very much for your interest in Elasticsearch.
You appear to be asking a user question, and we'd like to direct these kinds of things to the Elasticsearch forum.
It is not feasible for us to manage these sorts of questions on GitHub - particularly when they are posted on long-closed issues.