Elasticsearch: Production/development mode should be a configurable option

Created on 18 Nov 2016 · 13 comments · Source: elastic/elasticsearch

Currently, developing an app that uses the TCP (:9300) transport triggers "production" mode, with all the annoying checks that a developer doesn't care about or, in some cases (lack of root access to the machine), can't fix (like sysctl settings).

This should be a configurable option, with automated guessing used only when that option is not set.


All 13 comments

I'm sorry, @XANi, but this has been discussed and considered carefully previously (see #20511 for example). See also our blog post.

Currently, developing an app that uses the TCP (:9300) transport triggers "production" mode, with all the annoying checks that a developer doesn't care about or, in some cases (lack of root access to the machine), can't fix (like sysctl settings).

You can develop and test against localhost and not trip the bootstrap checks.
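As a sketch of that approach: binding only to loopback keeps a node in development mode, so failed bootstrap checks are logged as warnings rather than enforced. The port values below are the defaults and are shown purely for illustration:

```yaml
# elasticsearch.yml — development setup bound to loopback only; in this mode
# the bootstrap checks are not enforced (ports shown are the defaults)
network.host: 127.0.0.1
http.port: 9200
transport.tcp.port: 9300
```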

We have server A with ES and dev server B, C and D with app. How does that help?

Not to mention the fact that you are enforcing system settings that are outright stupid on small hosts.

We have added a new discovery type suitable for binding transport to an external interface for the purpose of testing the transport client against Elasticsearch in situations where one might not have permission to modify the configuration of the host to pass the bootstrap checks (for example, CI providers). To summarize the situation:

  • we have added a new discovery type called single-node
  • when a node uses the single-node discovery type, it cannot form a cluster with another node
  • when a node uses the single-node discovery type, the bootstrap checks are disabled
  • users running a node with the single-node discovery type in production should enable the system property es.enable.bootstrap.checks to force the bootstrap checks to run; they are at their own risk if they do not
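The summary above corresponds to a configuration along these lines (a sketch; the external bind address is an illustrative assumption, not part of the thread):

```yaml
# elasticsearch.yml — single-node discovery (available from Elasticsearch 5.4.0);
# this disables the bootstrap checks and prevents the node from forming a
# cluster with any other node
discovery.type: single-node
# illustrative assumption: bind to an external interface so a remote transport
# client (e.g. on a CI provider) can reach the node
network.host: 0.0.0.0
```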

This is documented in the bootstrap checks docs.

This will be available starting with Elasticsearch 5.4.0.

Relates #23585, relates #23595, relates #23598

Hello all, my team and I have the same problem. I think enforcing system settings is a really bad idea... We are using Elasticsearch for research purposes; we don't need those conditions enforced to form a cluster for storing experimental data... Now we have a 20-machine cluster running CentOS (on which we don't have root access) that is completely useless, since Elasticsearch 5.3 does not want to pass the bootstrap checks...

@mastayoda I think you are running into the system call filter check (check the logs); read the docs on the system call filter check and you'll see how to get back up and running. If I'm wrong about this, come back and I'll try to help you.

Hello @jasontedor, thanks for the quick response! Sadly, that does not solve my problem :/

These are my logs:
  [2017-05-02T22:54:55,465][INFO ][o.e.b.BootstrapChecks    ] [pdsl14] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
  ERROR: bootstrap checks failed
  max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
  max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
I may be able to increase the memory setting, but raising the file descriptor limit needs root access. I added this setting to the config:

bootstrap.system_call_filter: false
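For reference, the two checks that failed in the log above map to host-level settings that only root can change, which is exactly the commenter's constraint. A sketch of what a host administrator would apply (the `elasticsearch` user name is an assumption; values come from the error messages):

```
# /etc/sysctl.conf (requires root; apply with `sysctl -p`)
vm.max_map_count = 262144

# /etc/security/limits.conf (requires root; the user name is an assumption)
elasticsearch  soft  nofile  65536
elasticsearch  hard  nofile  65536
```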

Okay, so here's the situation. I see:

Now we have a 20-machine cluster running CentOS (on which we don't have root access) that is completely useless, since Elasticsearch 5.3 does not want to pass the bootstrap checks...

and I assumed this must not be a new cluster. However, if you're running into all of these checks, I have to revisit that assumption. So I ask you: is this a new cluster? Otherwise, how have you not run into this before? None of these checks are new (the only new check in 5.3.0 is the system call filter check, which is why I pointed you there).

Good question, let me elaborate more:

We are building Trueno (a graph database) on top of Elasticsearch (we are aware of all the consistency caveats; we just need a powerful enough storage engine). We chose Elasticsearch for its filter capabilities and the ability to query over edges and vertices as a property graph. Fast-forwarding the story: we gained a tremendous speedup by using Elasticsearch. We were on Elasticsearch 2.3. We also use Spark to compute graph algorithms, which requires loading all the graph data into a GraphX RDD, but Elasticsearch 2.3 did not have the sliced scroll capability, so we migrated to 5.3. We tested the cluster setup in local virtual machines, and indeed we got a HUGE speedup by reading the whole graph in parallel using scroll slicing. Sadly, the bootstrap checks are giving us trouble: since ES 2.3 did not have them, we were able to set up the cluster on the 20 machines without root; now we can't :(

Okay, I want to help you. Can you please open a topic on the forum and link to it from here, and we will try to work through this? I want to level-set expectations, though: the bootstrap checks are not going away. When you open that topic, would you please state the version of CentOS that you're on?

ES 2.3 did not have those

Right, and users lost data because they were not aware that they needed to configure things like the number of file descriptors. A 20-node cluster is a large cluster; it would be a terrible thing to lose it to misconfiguration.

For more about the bootstrap checks, see our blog post on this subject.

@jasontedor Totally agree! Ours is a special case; the bootstrap checks are indeed needed, but it may be a good idea to provide a way to bypass them with a big warning. Thanks so much for your help; I will open a topic.

Right, and users lost data because they were not aware that they needed to configure things like the number of file descriptors. A 20-node cluster is a large cluster; it would be a terrible thing to lose it to misconfiguration.

First, losing data on a standard OS setup should just not happen; that's just bad design. Working slowly or shutting down, sure, but not data loss.

Second, the right way is not to "just raise those variables and hope for the best" but to monitor current usage of system resources and alert on it. Change the cluster state to yellow if usage gets close to the limit (say 80%), but don't say "you have to do this even if your working set is 100 MB and you have the cluster only for redundancy."

@jasontedor this works to bypass the bootstrap checks

  elasticsearch:
    container_name: elasticsearch
#    build: ./elasticsearch
    restart: always
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.1
    ports:
      - 9200:9200
    environment:
      - ENVIRONMENT
      - discovery.type=single-node # when a node uses the discovery type single-node it can not form a cluster with another node and the bootstrap checks are disabled

@jasontedor this DOES NOT work to bypass the bootstrap checks

  elasticsearch:
    container_name: elasticsearch
    build: ./elasticsearch
    restart: always
#    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.1
    ports:
      - 9200:9200
    environment:
      - ENVIRONMENT
      - discovery.type=single-node # when a node uses the discovery type single-node it can not form a cluster with another node and the bootstrap checks are disabled

As soon as I try to add a build directory and use this Dockerfile to add on to the image, the checks fail again. I would think I'd be able to set this to false, but it only accepts true:

es.enable.bootstrap.checks

Is there something I'm missing? This is my Dockerfile:

  FROM docker.elastic.co/elasticsearch/elasticsearch:5.6.1

  ARG ENVIRONMENT

  CMD ["elasticsearch"]

I want to install some plugins and a cert into the container (omitted here).

I should note that the thing I'm trying to bypass is this:

  elasticsearch    | ERROR: [1] bootstrap checks failed
  elasticsearch    | [1]: max virtual memory areas vm.max_map_count [62144] is too low, increase to at least [262144]
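Worth noting: `vm.max_map_count` is a kernel setting shared between the host and its containers, so it cannot be raised from inside the image; it has to be set on the Docker host. A sketch of the usual fix (requires root on the host):

```
# On the Docker host, not in the Dockerfile:
#   sysctl -w vm.max_map_count=262144
# To persist the setting across reboots, in /etc/sysctl.conf:
vm.max_map_count = 262144
```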