Elasticsearch: Add safeguards to prevent simple user errors

Created on 5 Jun 2015 · 24 Comments · Source: elastic/elasticsearch

There are a number of places where a naive user can break Elasticsearch very easily. We should add more (dynamically overridable) safeguards that prevent users from hurting themselves.

Note:

  • We are adding high limits to start so that we don't suddenly disable things that users already do today, but so that sysadmins have tools that they can use to protect their clusters. We can revisit the limits later on.
  • All these settings should be prefixed by `policy.` to make them easier to document together and to understand their purpose. (A sketch of overriding one such limit dynamically follows after these notes.)
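As a rough illustration of what "dynamically overridable" means in practice, a sysadmin could tighten one of these limits at runtime through the cluster settings API. A minimal sketch, using the `search.default_search_timeout` setting that #12149 eventually shipped; the proposed `policy.` prefix was not adopted, so treat the exact names and values as illustrative:

```sh
# Sketch: set a cluster-wide default search timeout at runtime (#12149).
# The setting name search.default_search_timeout is the one that shipped;
# the one-hour value is illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "search.default_search_timeout": "1h"
  }
}'
```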

Accepted limits:

  • [x] #9311 Hard limit on from/size
  • [x] #12149 Global default value for search timeouts (Could be ridiculously high like an hour and it would still help)
  • [x] #17386 Disable fielddata-loading on analyzed text fields by default (Adrien)
  • [x] #17396 Limit the max number of shards to 1000 (Adrien)
  • [x] #17133 Limit the size of all in-flight requests (Daniel)
  • [x] #17357 Limit the number of fields that can be added to a mapping to 1000 (see the settings sketch after this list)
  • [x] #17400 Add maximum mapping depth to 20
  • [ ] Add sane limits for thread size and queue size (Jim)
  • ~#26423 Don't allow search requests greater than (eg) 10MB (Adrien)~
  • [x] #14983 Limit the number of nested fields per index to 50 (Yannick)
  • [x] #17522 Limit window_size in rescore API (@nik9000)
  • [x] #17558 Disable script access to _source fields by default
  • [ ] #18739 Limit the number of shards that can be rerouted at the same time
  • [x] #26492 Hard limit on from/size in top hits and inner hits (much smaller than a normal query) (MVG)
  • [x] #19694 Limit script compilation rate to avoid hard coding of params in scripts
  • [ ] #20705 Max number of shards per node (enforced as total shards per cluster)
  • [ ] #20760 Limit index creation rate
  • [ ] #23268 Add upper limit for scroll expiry time (Jim)
  • [x] #26390 Add upper limit for the number of requested doc-values fields (Christoph)
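Several of the accepted limits above shipped as dynamic per-index settings, so operators can tighten or relax them without a restart. A minimal sketch, using the setting names found in 5.x and later; the index name and values are illustrative:

```sh
# Sketch: per-index soft limits corresponding roughly to #9311, #17357, #17400
# and #14983. The index name "logs-1" and the values are illustrative.
curl -XPUT 'localhost:9200/logs-1/_settings' -H 'Content-Type: application/json' -d '{
  "index.max_result_window": 10000,
  "index.mapping.total_fields.limit": 1000,
  "index.mapping.depth.limit": 20,
  "index.mapping.nested_fields.limit": 50
}'
```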

For discussion:

  • [ ] #29050 Disable certain query types, eg wildcard, span etc?
  • [ ] #14046 Limit on the number of buckets returned by aggs
  • [ ] #9310 Limit the size of the response (eg for very large doc bodies)
  • [ ] Kill slow scripts when the search timeout has lapsed, i.e. a while(true) script should not require a rolling restart to recover from; don't run a script a second time when the first execution takes longer than 1 second
  • [ ] #6470 Disable searching on all indices by default (handled by the max number of shards per search)
  • [ ] #26962 Limit the number of nested Lucene documents per document

Any other ideas?

:Core/Features/Indices APIs :Core/Infra/Core :Search/Mapping :Search/Search >enhancement Meta v6.0.3

Most helpful comment

I also wonder if we should hard limit it and follow Moore's law and increase it every N years? :) Let's start with 256 and force multi-index?

All 24 comments

Limit the max number of shards

I'm wondering if we should do it per index or per cluster. If we do it per index, then we might also want to have a max number of indices per cluster.
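For reference, the per-cluster flavour is roughly what #20705 turned into: a cap on the total number of open shards, enforced per data node across the whole cluster. A hedged sketch of setting it (the value is illustrative):

```sh
# Sketch: cap the number of open shards the cluster will accept, scaled by the
# number of data nodes (cluster.max_shards_per_node, from #20705). Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "cluster.max_shards_per_node": 1000
  }
}'
```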

Limit the size of a bulk request

I guess it would also apply to multi-get and multi-search.
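The in-flight requests circuit breaker from #17133 is the mechanism that ended up covering this: it bounds the combined size of all request bodies currently being processed, whether they are bulk, multi-get or multi-search. A sketch of tightening it, assuming the `network.breaker.inflight_requests.limit` setting that work introduced; the percentage is illustrative:

```sh
# Sketch: limit the combined size of all in-flight requests (bulk, multi-get,
# multi-search, ...). The 50% value is illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "network.breaker.inflight_requests.limit": "50%"
  }
}'
```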

Some of this could go into a "sanity checker"-kind of plugin akin to the migration plugin that runs a bunch of tests as well.

That one could warn when e.g. minimum master nodes looks wrong, and when the number of shards/indexes/fields looks silly / approaches the above limits.

@alexbrasetvik that requires the user to actually run the check. Often poor sysadmins are at the mercy of their users. What I'd like to do is to prevent users from blowing things up by mistake.

@clintongormley Agreed! I still think there's room for both, though such a tool should be another issue.

For example, a high number of indexes with few documents and identical mappings can be a sign that the user is doing per-user index partitioning when they shouldn't. That will turn into a problem, even if the current values are far from hitting the above-mentioned limits.

Any other ideas?

  • Limit the max number of indices

    • It's effectively covered by limiting by shards, but touching too many indices may indicate more of a logical issue than the shard count (e.g., with daily indices, it's much easier to realize that sending a request to 5 indices represents five days rather than 25 shards with default counts).

  • Limit the _concurrent_ request size

    • Request circuit breaker across all concurrent requests

Limit the concurrent request size

This is already available with the thread pools and queue_sizes to limit the number of requests per-node and apply backpressure.

EDIT: I guess I am taking "size" as "count", is that what you mean?
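A quick way to see that per-node backpressure is the cat thread pool API, which reports active threads, queue depth and rejections per pool. A sketch (pool names vary by version, e.g. bulk was later renamed to write):

```sh
# Sketch: inspect thread pool activity, queueing and rejections per node.
curl 'localhost:9200/_cat/thread_pool/search,bulk?v&h=node_name,name,active,queue,rejected'
```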

@dakrone Size of an actual request. For instance, if one request comes in with an aggregation that uses size: 0 at the same time as another, then maybe we should block the second one (or at least delay).
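The closest existing knob to a "request circuit breaker across all concurrent requests" is the request breaker, which accumulates the memory used by per-request data structures (aggregations in particular) across everything currently running. A hedged sketch of lowering it; the percentage is illustrative:

```sh
# Sketch: lower the request circuit breaker so concurrent memory-hungry requests
# trip the breaker earlier instead of exhausting the heap. Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "indices.breaker.request.limit": "40%"
  }
}'
```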

Another protection to add: check mapping depth #14370

Limit the max value that can be set for queue_size for our search, bulk, index, etc. thread pools so users can't set them to unlimited, millions, etc.?

Does the size in the terms aggregation get covered by this issue?

@makeyang it's covered by https://github.com/elastic/elasticsearch/issues/14046, which is under discussion

Is it reasonable to add a max_doc_number per index?
Is it reasonable to add enable_all_for_search?

Is it reasonable to add a max_doc_number per index?

Well, there's already a hard limit but what are you trying to achieve with this one? And what is the user supposed to do instead of indexing into the same index?

Is it reasonable to add enable_all_for_search?

What problem are you trying to prevent with disabling access to _all? Why not just disable the _all field if you don't want it used?
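For reference, disabling the `_all` field as suggested here is a one-line mapping switch in the 2.x/5.x mapping format. A sketch; the index and type names are illustrative:

```sh
# Sketch: disable the _all field at index creation (pre-6.x mappings, where the
# mapping sits under a type name; "logs-1" and "doc" are illustrative).
curl -XPUT 'localhost:9200/logs-1' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "doc": {
      "_all": { "enabled": false }
    }
  }
}'
```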

@clintongormley

  1. Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.
  2. enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put up one cluster for multiple users, I really don't want users to search _all indices.

Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.

OK, we have a better solution for this that we're thinking about - basically an alias that will generate a new index when it reaches a specified limit (eg size, number of docs, time)
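That idea later took shape as the rollover API: point writers at an alias and roll it over to a fresh index once any condition is met. A minimal sketch; the alias name and condition values are illustrative:

```sh
# Sketch: roll the alias over to a new index when any condition is met.
# "logs-write" and the condition values are illustrative.
curl -XPOST 'localhost:9200/logs-write/_rollover' -H 'Content-Type: application/json' -d '{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000000,
    "max_size": "50gb"
  }
}'
```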

enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put up one cluster for multiple users, I really don't want users to search _all indices.

Querying all indices is not a problem per se. Rather, it is the total number of shards, which is already handled by https://github.com/elastic/elasticsearch/pull/17396
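The soft limit referenced here (#17396) caps how many shards a single search request may touch, and it can be set cluster-wide. A sketch, assuming the `action.search.shard_count.limit` setting introduced by that change; the value is illustrative:

```sh
# Sketch: reject search requests that would fan out to more than 1000 shards
# (action.search.shard_count.limit, from #17396). Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "action.search.shard_count.limit": 1000
  }
}'
```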

@clintongormley thanks a lot, that's all I need.
BTW: when will the better solution you mentioned above be turned into an issue?

Would you consider how to use cgroups to control the resource usage of search/index/percolator... threads?

Elasticsearch needs to run across Linux/Windows... so maybe there is a quick way: ES only needs to give every thread a thread name, for example a search thread named search-thread-1, etc. Then Linux users can get the thread IDs by grepping for the thread name and put those TIDs into a cgroup.

I'd like to put in a vote for an additional safeguard: some kind of protection on Terms queries that have hundreds or thousands of terms. I've seen many times where applications will produce Terms queries with hundreds or thousands of terms, and it craters Elasticsearch very easily. It'd be nice to have a default cap and truncate the query, like have a default terms limit (similar to default hits) that can be increased. Knowing that doing this is a problem early on can help application developers to architect their application to avoid needing terms queries that are so huge.
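For reference, later releases did add an index-level cap along these lines (it rejects oversized queries rather than truncating them). A hedged sketch, assuming the `index.max_terms_count` setting (default 65536) available in newer versions; the index name and value are illustrative:

```sh
# Sketch: cap the number of terms allowed in a single terms query
# (index.max_terms_count; newer versions only, value illustrative).
curl -XPUT 'localhost:9200/logs-1/_settings' -H 'Content-Type: application/json' -d '{
  "index.max_terms_count": 10000
}'
```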

@jonaf I like the idea. Do you want to open a separate issue where we can discuss it? We can link it to this meta issue.

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

I also wonder if we should hard limit it and follow Moore's law and increase it every N years? :) Let's start with 256 and force multi-index?

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

Nice idea. Similarly, for multitenant use cases that may have a ton of single-sharded per-user indices, it can be nice to have a limit or warning when the number of shards per node becomes ridiculous. Not sure what this limit would be based on, perhaps a combination of the number of file descriptors, cores and heap. But it would be nice to prevent users from having something like N shards per node, etc.

@clintongormley I think we missed one rather important aspect when it comes to soft limits. Today the user can override those limits via dynamic properties, which is OK most of the time, but in the case of a cloud hosting infrastructure, where the org that runs the infrastructure needs full control over these limits, they should be able to disable the dynamic property, or perhaps disable overriding these settings entirely?

Most of the work has been done, and items that have not been done have an assigned issue so I'll close this issue. Thanks everyone!

