Elasticsearch: Add safeguards to prevent simple user errors

Created on 5 Jun 2015 · 24 Comments · Source: elastic/elasticsearch

There are a number of places where a naive user can break Elasticsearch very easily. We should add more (dynamically overridable) safeguards that prevent users from hurting themselves.

Note:

  • We are adding high limits to start so that we don't suddenly disable things that users already do today, but so that sysadmins have tools that they can use to protect their clusters. We can revisit the limits later on.
  • All these settings should be prefixed by `policy.` to make them easier to document together and to understand their purpose. (A sketch of overriding one such limit dynamically follows after these notes.)
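As a rough illustration of what "dynamically overridable" means in practice, a sysadmin could tighten one of these limits at runtime through the cluster settings API. A minimal sketch, using the `search.default_search_timeout` setting that #12149 eventually shipped; the proposed `policy.` prefix was not adopted, so treat the exact names and values as illustrative:

```sh
# Sketch: set a cluster-wide default search timeout at runtime (#12149).
# The setting name search.default_search_timeout is the one that shipped;
# the one-hour value is illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "search.default_search_timeout": "1h"
  }
}'
```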

Accepted limits:

  • [x] #9311 Hard limit on from/size
  • [x] #12149 Global default value for search timeouts (Could be ridiculously high like an hour and it would still help)
  • [x] #17386 Disable fielddata-loading on analyzed text fields by default (Adrien)
  • [x] #17396 Limit the max number of shards to 1000 (Adrien)
  • [x] #17133 Limit the size of all in-flight requests (Daniel)
  • [x] #17357 Limit the number of fields that can be added to a mapping to 1000 (see the settings sketch after this list)
  • [x] #17400 Add maximum mapping depth to 20
  • [ ] Add sane limits for thread size and queue size (Jim)
  • ~#26423 Don't allow search requests greater than (eg) 10MB (Adrien)~
  • [x] #14983 Limit the number of nested fields per index to 50 (Yannick)
  • [x] #17522 Limit window_size in rescore API (@nik9000)
  • [x] #17558 Disable script access to _source fields by default
  • [ ] #18739 Limit the number of shards that can be rerouted at the same time
  • [x] #26492 Hard limit on from/size in top hits and inner hits (much smaller than a normal query) (MVG)
  • [x] #19694 Limit script compilation rate to avoid hard coding of params in scripts
  • [ ] #20705 Max number of shards per node (enforced as total shards per cluster)
  • [ ] #20760 Limit index creation rate
  • [ ] #23268 Add upper limit for scroll expiry time (Jim)
  • [x] #26390 Add upper limit for the number of requested doc-values fields (Christoph)
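Several of the accepted limits above shipped as dynamic per-index settings, so operators can tighten or relax them without a restart. A minimal sketch, using the setting names found in 5.x and later; the index name and values are illustrative:

```sh
# Sketch: per-index soft limits corresponding roughly to #9311, #17357, #17400
# and #14983. The index name "logs-1" and the values are illustrative.
curl -XPUT 'localhost:9200/logs-1/_settings' -H 'Content-Type: application/json' -d '{
  "index.max_result_window": 10000,
  "index.mapping.total_fields.limit": 1000,
  "index.mapping.depth.limit": 20,
  "index.mapping.nested_fields.limit": 50
}'
```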

For discussion:

  • [ ] #29050 Disable certain query types, eg wildcard, span etc?
  • [ ] #14046 Limit on the number of buckets returned by aggs
  • [ ] #9310 Limit the size of the response (eg for very large doc bodies)
  • [ ] Kill slow scripts when the search timeout has lapsed, i.e. a while(true) script should not require a rolling restart to recover from; don't run a script a second time when the first execution takes longer than 1 second
  • [ ] #6470 Disable searching on all indices by default (handled by the max number of shards per search)
  • [ ] #26962 Limit the number of nested Lucene documents per document

Any other ideas?

:Core/Features/Indices APIs :Core/Infra/Core :Search/Mapping :Search/Search >enhancement Meta v6.0.3

Most helpful comment

I also wonder if we should hard limit it and follow Moore's law and increase it every N years? :) Let's start with 256 and force multi-index?

All 24 comments

Limit the max number of shards

I'm wondering if we should do it per index or per cluster. If we do it per index, then we might also want to have a max number of indices per cluster.
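For reference, the per-cluster flavour is roughly what #20705 turned into: a cap on the total number of open shards, enforced per data node across the whole cluster. A hedged sketch of setting it (the value is illustrative):

```sh
# Sketch: cap the number of open shards the cluster will accept, scaled by the
# number of data nodes (cluster.max_shards_per_node, from #20705). Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "cluster.max_shards_per_node": 1000
  }
}'
```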

Limit the size of a bulk request

I guess it would also apply to multi-get and multi-search.
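The in-flight requests circuit breaker from #17133 is the mechanism that ended up covering this: it bounds the combined size of all request bodies currently being processed, whether they are bulk, multi-get or multi-search. A sketch of tightening it, assuming the `network.breaker.inflight_requests.limit` setting that work introduced; the percentage is illustrative:

```sh
# Sketch: limit the combined size of all in-flight requests (bulk, multi-get,
# multi-search, ...). The 50% value is illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "network.breaker.inflight_requests.limit": "50%"
  }
}'
```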

Some of this could go into a "sanity checker"-kind of plugin akin to the migration plugin that runs a bunch of tests as well.

That one could warn when e.g. minimum master nodes looks wrong, and when the number of shards/indexes/fields looks silly / approaches the above limits.

@alexbrasetvik that requires the user to actually run the check. Often poor sysadmins are at the mercy of their users. What I'd like to do is to prevent users from blowing things up by mistake.

@clintongormley Agreed! I still think there's room for both, though such a tool should be another issue.

For example, a high number of indexes with few documents and identical mappings can be a sign that the user is doing per-user index partitioning when they shouldn't. That will turn into a problem, even if the current values are far from hitting the above-mentioned limits.

Any other ideas?

  • Limit the max number of indices

    • It's effectively covered by limiting by shards, but touching too many indices may indicate more of a logical issue than the shard count (e.g., with daily indices, it's much easier to realize that sending a request to 5 indices represents five days rather than 25 shards with default counts).

  • Limit the _concurrent_ request size

    • Request circuit breaker across all concurrent requests

Limit the concurrent request size

This is already available with the thread pools and queue_sizes to limit the number of requests per-node and apply backpressure.

EDIT: I guess I am taking "size" as "count", is that what you mean?
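A quick way to see that per-node backpressure is the cat thread pool API, which reports active threads, queue depth and rejections per pool. A sketch (pool names vary by version, e.g. bulk was later renamed to write):

```sh
# Sketch: inspect thread pool activity, queueing and rejections per node.
curl 'localhost:9200/_cat/thread_pool/search,bulk?v&h=node_name,name,active,queue,rejected'
```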

@dakrone Size of an actual request. For instance, if one request comes in with an aggregation that uses size: 0 at the same time as another, then maybe we should block the second one (or at least delay).
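The closest existing knob to a "request circuit breaker across all concurrent requests" is the request breaker, which accumulates the memory used by per-request data structures (aggregations in particular) across everything currently running. A hedged sketch of lowering it; the percentage is illustrative:

```sh
# Sketch: lower the request circuit breaker so concurrent memory-hungry requests
# trip the breaker earlier instead of exhausting the heap. Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "indices.breaker.request.limit": "40%"
  }
}'
```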

Another protection to add: check mapping depth #14370

Limit the max value that can be set for queue_size for our search, bulk, index, etc. thread pools so users can't set them to unlimited, millions, etc.?

Does the size in the terms aggregation get covered by this issue?

@makeyang it's covered by https://github.com/elastic/elasticsearch/issues/14046, which is under discussion

Is it reasonable to add a max_doc_number per index?
Is it reasonable to add enable_all_for_search?

Is it reasonable to add a max_doc_number per index?

Well, there's already a hard limit but what are you trying to achieve with this one? And what is the user supposed to do instead of indexing into the same index?

Is it reasonable to add enable_all_for_search?

What problem are you trying to prevent with disabling access to _all? Why not just disable the _all field if you don't want it used?
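For reference, disabling the `_all` field as suggested here is a one-line mapping switch in the 2.x/5.x mapping format. A sketch; the index and type names are illustrative:

```sh
# Sketch: disable the _all field at index creation (pre-6.x mappings, where the
# mapping sits under a type name; "logs-1" and "doc" are illustrative).
curl -XPUT 'localhost:9200/logs-1' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "doc": {
      "_all": { "enabled": false }
    }
  }
}'
```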

@clintongormley

  1. Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.
  2. enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put up one cluster for multiple users, I really don't want users to search _all indices.

Some users even put daily rolling log data into one index, so with a max_doc_number parameter I actually want to force users to think about putting data into multiple indices.

OK, we have a better solution for this that we're thinking about - basically an alias that will generate a new index when it reaches a specified limit (eg size, number of docs, time)
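That idea later took shape as the rollover API: point writers at an alias and roll it over to a fresh index once any condition is met. A minimal sketch; the alias name and condition values are illustrative:

```sh
# Sketch: roll the alias over to a new index when any condition is met.
# "logs-write" and the condition values are illustrative.
curl -XPOST 'localhost:9200/logs-write/_rollover' -H 'Content-Type: application/json' -d '{
  "conditions": {
    "max_age": "7d",
    "max_docs": 100000000,
    "max_size": "50gb"
  }
}'
```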

enable_all_for_search is not about the _all field; it is about 'http://localhost:9200/_all/_query?q=tag:wow'. When I put up one cluster for multiple users, I really don't want users to search _all indices.

Querying all indices is not a problem per se. Rather, it is the total number of shards, which is already handled by https://github.com/elastic/elasticsearch/pull/17396
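The soft limit referenced here (#17396) caps how many shards a single search request may touch, and it can be set cluster-wide. A sketch, assuming the `action.search.shard_count.limit` setting introduced by that change; the value is illustrative:

```sh
# Sketch: reject search requests that would fan out to more than 1000 shards
# (action.search.shard_count.limit, from #17396). Value illustrative.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "persistent": {
    "action.search.shard_count.limit": 1000
  }
}'
```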

@clintongormley thanks a lot, that's all I need.
BTW: when will the better solution you mentioned above be turned into an issue?

Would you consider how to use cgroups to control the resource usage of search/index/percolator... threads?

Elasticsearch needs to run across Linux/Windows... so maybe there is a quick way: ES only needs to give every thread a thread name, for example a search thread named search-thread-1, etc. Then Linux users can get the thread IDs by grepping for the thread name and put those TIDs into a cgroup.

I'd like to put in a vote for an additional safeguard: some kind of protection on Terms queries that have hundreds or thousands of terms. I've seen many times where applications will produce Terms queries with hundreds or thousands of terms, and it craters Elasticsearch very easily. It'd be nice to have a default cap and truncate the query, like have a default terms limit (similar to default hits) that can be increased. Knowing that doing this is a problem early on can help application developers to architect their application to avoid needing terms queries that are so huge.
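For reference, later releases did add an index-level cap along these lines (it rejects oversized queries rather than truncating them). A hedged sketch, assuming the `index.max_terms_count` setting (default 65536) available in newer versions; the index name and value are illustrative:

```sh
# Sketch: cap the number of terms allowed in a single terms query
# (index.max_terms_count; newer versions only, value illustrative).
curl -XPUT 'localhost:9200/logs-1/_settings' -H 'Content-Type: application/json' -d '{
  "index.max_terms_count": 10000
}'
```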

@jonaf I like the idea. Do you want to open a separate issue where we can discuss it? We can link it to this meta issue.

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

I also wonder if we should hard limit it and follow Moore's law and increase it every N years? :) Let's start with 256 and force multi-index?

I think we should also limit the number of shards in an index. If somebody creates an index with 10k shards the node might go nuts immediately. I think we should limit this to 32 or maybe 128?

Nice idea. Similarly, for multitenant use cases that may have a ton of single-sharded per-user indices, it can be nice to have a limit or warning when the number of shards per node becomes ridiculous. Not sure what this limit would be based on, perhaps a combination of the number of file descriptors, cores and heap. But it would be nice to prevent users from having something like N shards per node, etc.

@clintongormley I think we missed one rather important aspect when it comes to soft limits. Today the user can override those limits via dynamic properties, which is OK most of the time, but in the case of a cloud hosting infrastructure, where the org that runs the infrastructure needs full control over these limits, they should be able to disable the dynamic property, or perhaps disable overriding these settings entirely?

Most of the work has been done, and items that have not been done have an assigned issue so I'll close this issue. Thanks everyone!

