There are a number of places where a naive user can break Elasticsearch very easily. We should add more (dynamically overridable) safeguards that prevent users from hurting themselves.
Note: these limits are grouped under a common policy, to make them easier to document together and to understand their purpose.

Accepted limits:

- from/size
- nested fields per index to 50 (Yannick)
- window_size in rescore API (@nik9000)
- _source fields by default
- from/size in top hits and inner hits (much smaller than a normal query) (MVG)
- scroll expiry time (Jim)

(A settings sketch for several of these follows below.)

For discussion:
Any other ideas?
Limit the max number of shards
I'm wondering if we should do it per index or per cluster. If we do it per index, then we might also want to have a max number of indices per cluster.
Limit the size of a bulk request
I guess it would also apply to multi-get and multi-search.
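For reference, here is a sketch of how several of the accepted limits above surfaced as index-level settings; the setting names and defaults are my recollection of what these proposals became, not something stated in this thread. `index.max_result_window` caps `from + size`, `index.mapping.nested_fields.limit` caps nested fields per index, `index.max_rescore_window` caps the rescore `window_size`, and `index.max_inner_result_window` caps `from + size` in top hits and inner hits:

```
PUT /my-index
{
  "settings": {
    "index.max_result_window": 10000,
    "index.mapping.nested_fields.limit": 50,
    "index.max_rescore_window": 10000,
    "index.max_inner_result_window": 100
  }
}
```

These are soft limits: anyone who understands the cost can raise them per index, which matches the "dynamically overridable" model described in the opening comment.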
Some of this could go into a "sanity checker"-kind of plugin akin to the migration plugin that runs a bunch of tests as well.
That one could warn when e.g. minimum master nodes looks wrong, and when the number of shards/indexes/fields looks silly / approaches the above limits.
@alexbrasetvik that requires the user to actually run the check. Often poor sysadmins are at the mercy of their users. What I'd like to do is prevent users from blowing things up by mistake.
@clintongormley Agreed! I still think there's room for both, though such a tool should be another issue.
For example, a high number of indexes with few documents and identical mappings can be a sign that the user is doing per-user index partitioning when they shouldn't. That will turn into a problem, even if the current values are far from hitting the limits mentioned above.
Any other ideas?
- Limit the max number of indices
- It's effectively covered by limiting shards, but touching too many indices may indicate a logical issue more clearly than the shard count does (e.g., with daily indices it's much easier to realize that a request hitting 5 indices means five days than that it means 25 shards with default counts).
- Limit the _concurrent_ request size
- Request circuit breaker across all concurrent requests
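On the circuit-breaker point: Elasticsearch already has a per-node request circuit breaker that accounts for the memory used by in-flight request structures, and its limit is a dynamic cluster setting. A minimal sketch of tightening it (the 40% value is arbitrary, for illustration only):

```
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.request.limit": "40%"
  }
}
```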
Limit the concurrent request size
This is already available with the thread pools and queue_sizes to limit the number of requests per-node and apply backpressure.
EDIT: I guess I'm reading "size" as "count"; is that what you mean?
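To illustrate the existing mechanism @dakrone mentions, here is a sketch of per-node queue bounds in elasticsearch.yml; the exact setting names have varied across versions, so treat this as an approximation:

```yaml
# elasticsearch.yml
# Bound the number of queued requests per node; once a queue is full,
# further requests are rejected, which applies backpressure to clients.
thread_pool:
  search:
    queue_size: 1000
  bulk:
    queue_size: 200
```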
@dakrone Size of an actual request. For instance, if one request comes in with an aggregation that uses size: 0 at the same time as another, then maybe we should block the second one (or at least delay it).
Another protection to add: check mapping depth #14370
Limit the max value that can be set for queue_size on our search, bulk, index, etc. thread pools, so users can't set them to unlimited, millions, etc.?
Is the size in the terms aggregation covered by this issue?
@makeyang it's covered by https://github.com/elastic/elasticsearch/issues/14046, which is under discussion
Is it reasonable to add a max_doc_number per index?
Is it reasonable to add enable_all_for_search?
Is it reasonable to add a max_doc_number per index?
Well, there's already a hard limit but what are you trying to achieve with this one? And what is the user supposed to do instead of indexing into the same index?
Is it reasonable to add enable_all_for_search?
What problem are you trying to prevent with disabling access to _all? Why not just disable the _all field if you don't want it used?
@clintongormley
Some users even put daily rolling log data into a single index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.
OK, we have a better solution for this that we're thinking about: basically an alias that will generate a new index when it reaches a specified limit (e.g. size, number of docs, time).
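For context, this is essentially the idea that later shipped as the Rollover API. A minimal sketch, with made-up index and alias names:

```
PUT /logs-000001
{
  "aliases": { "logs-write": {} }
}

POST /logs-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000000
  }
}
```

When either condition is met, the rollover call creates the next index in the series and points the alias at it, so writers keep targeting the same alias while data is split across bounded indices.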
enable_all_for_search is not about the _all field; it is about requests like 'http://localhost:9200/_all/_query?q=tag:wow'. When I run one cluster for multiple users, I really don't want users to be able to search _all indices.
Querying all indices is not a problem per se. Rather, the problem is the total number of shards, which is already handled by https://github.com/elastic/elasticsearch/pull/17396
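For reference, the soft limit from that PR is exposed as a dynamic cluster setting; the setting name below is how I recall it from the PR, so treat it as an assumption:

```
PUT /_cluster/settings
{
  "transient": {
    "action.search.shard_count.limit": 100
  }
}
```

With this in place, a single search request that would fan out to more than 100 shards is rejected instead of hitting the whole cluster.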
@clintongormley Thanks a lot, that's all I need.
BTW: when will the better solution you mentioned above be turned into an issue?
Would you consider using cgroups to control the resource usage of search/index/percolator... threads?
Elasticsearch needs to run across Linux/Windows, so maybe there is a quick way: ES only needs to give every thread a thread name (for example, a search thread named search-thread-1). Linux users could then get the thread IDs by grepping for the thread name and put the TIDs into a cgroup.
I'd like to put in a vote for an additional safeguard: some kind of protection on terms queries that have hundreds or thousands of terms. I've seen many times where applications produce terms queries with hundreds or thousands of terms, and it craters Elasticsearch very easily. It'd be nice to have a default cap that truncates the query, i.e. a default terms limit (similar to the default hits limit) that can be increased. Knowing early on that doing this is a problem can help application developers architect their applications to avoid needing such huge terms queries.
@jonaf I like the idea. Do you want to open a separate issue where we can discuss it? We can link it to this meta issue.
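A cap along these lines did later appear as an index-level soft limit. A sketch, assuming the index.max_terms_count setting (the value is arbitrary); as I recall, a query over the limit is rejected rather than silently truncated:

```
PUT /my-index
{
  "settings": {
    "index.max_terms_count": 1024
  }
}
```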
I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?
I also wonder if we should hard limit it, follow Moore's law, and increase it every N years? :) Let's start with 256 and force multiple indices?
I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?
Nice idea. Similarly, for multitenant use cases that may have a ton of single-sharded per-user indices, it can be nice to have a limit or warning when the number of shards per node becomes ridiculous. Not sure what this limit should be based on; perhaps a combination of the number of file descriptors, cores, and heap. But it would be nice to prevent users from ending up with some absurd number of shards per node.
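A sketch of that kind of guard rail, assuming the cluster-wide cluster.max_shards_per_node setting that later versions added (the value is only illustrative; in practice it would be chosen from heap, cores, and file descriptors as suggested above):

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1000
  }
}
```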
@clintongormley I think we missed one rather important aspect when it comes to soft limits. Today the user can override those limits via dynamic settings, which is OK most of the time. But in the case of a cloud hosting infrastructure, the org that runs the infrastructure needs full control over these limits; shouldn't they be able to disable the dynamic override, or even disable setting these settings entirely?
Most of the work has been done, and items that have not been done have an assigned issue so I'll close this issue. Thanks everyone!