Cloud-on-k8s: Set default resource limits for ES/Kibana/APM

Created on 1 Aug 2019 · 12Comments · Source: elastic/cloud-on-k8s

Currently, we:

don't set any default resources for APM & Kibana (no JVM, not stateful)
set a default memory requests for Elasticsearch (2GB to match the 1GB default Elasticsearch JVM heap size) if the user does not provide resources deafults or limits in the Elasticsearch

With no default limits, the Elasticsearch pod can be evicted from a k8s host if the host is out of memory. For stateful workloads such as Elasticsearch, this can be pretty bad.

We discussed 2 approaches to solve this:

1. Set default memory requests AND default memory limits

Just enforce 2GB memory (requests and limits) for the ES pod. We already do this for requests, so that's just also adding it for limits.

2. Remove any default and encourage the user to set one

We would remove our default memory request, and ask the user to provide one. Probably changing our quickstart example from:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.2.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
EOF

to:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.2.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
     podTemplate:
       spec:
         containers:
         - name: elasticsearch
           env:
           - name: ES_JAVA_OPTS
             value: -Xms1g -Xmx1g
           resources:
             requests:
               memory: 2Gi
             limits:
               memory: 2Gi
EOF

discuss v1.0.0-beta1

Source

sebgl

Most helpful comment

I feel like that these defaults are mainly targeting the 'getting started experience' and will never be able to provide a production grade configuration.

Thoughts:
1) A default of 1gb heap (JVM) matches Elasticsearch's other packaging (rpm, etc)
2) A default memory limit of 2gb matches our recommendation to leave half the memory for fs cache
3) 1gb JVM heap is unsuitable for most production use cases, so users will most likely need to change this anyway.

If the failure scenarios for these defaults on k8s (memory limit 2gb) are obvious kinds of failures, such as a pod fails to schedule, etc, then I think it makes sense.

Having no memory limit by default seems dangerous and asking for a PoC to start failing unpredictably due to pod evictions, etc, especially with scenarios like #1716.

Each time I've tried to abstract things in Kubernetes, it feels like doing it wrong. Kubernetes yaml is, typically, painfully verbose, even when using templating tools like Helm. I've experimented with Jsonnet to help abstract things ("memory 2g" would set memory request+limit to 2g and JVM Heap to half that), and that helps quite a bit, I feel there's little we can do automatically in a Kubernetes resource definition.

When I look at the helm chart, there's a vast array of configurations such that abstraction isn't even really possible with all the templating values choices it supports.

While I'd love to set "cpu X memory Y" and compute the rest, I don't know if that really aligns with how users expect Kubernetes operators to behave (I have no experience there). From helm charts, I find abstractions like this are uncommon, which is frustrating for me as a user since I kind of revel in such abstractions and want to build them when they don't exist.

Focusing on abstractions, here's what I had experimented with in jsonnet:

local epsilon = import "epsilon.libjsonnet";

local name = "demo";
local resources = {
  data: { cpu: 2, heap: "4g", memory: "6Gi", disk: "100Gi" },
  master: { cpu: 2, heap: "4g", memory: "6Gi", disk: "30Gi" }
};

epsilon.values("demo", resources)

This would create a cluster with the default replica count (statefulset, replica count 1). This also included an abstraction for disk (which created a volumeClaimTemplate as appropriate).

As much as I want/need abstractions (to minimize mistakes), it's possible a Kubernetes resource is the wrong level to attempt such abstractions given there are so many different Kubernetes environments with weird, wild, and unforeseen configurations under which abstractions will fail.

My current leaning is, therefore, to set defaults (1gb JVM heap, 2GB memory request+limit, some appropriate CPU if needed, etc).

jordansissel on 19 Sep 2019

❤1 👍1

All 12 comments

I would be in favor of option 1.
Option 2 makes it more explicit to the user that these should be overridden (since users probably don't read the doc), but feels much more verbose for a quickstart.

sebgl on 1 Aug 2019

Related to #236, #815, #819, #1097 and #1029.

thbkrkr on 1 Aug 2019

🎉1

I don't think if we expand the quickstart example, we would also need to remove the default requests (or limits). Personally I think I'm okay with both.

As far as I know, the only performance downside of adding a default limit is that the pod will not be able to use >2Gi for the cache. But that's a relatively small downside and I am okay with it.

Another downside: other resources that use pod templates (e.g. deployments, ssets) just create pods and don't set defaults, so you just end up with regular pod defaults. So this is a little bit of a departure from "normal" behavior for types that create pods. That said, we know a little bit more about our use case here than those more generic types, so I think it is justified.

But just to spell out the upside of a default limit:
Having the request == limits means it will run as a Guaranteed QoS and should be in the last group of pods to be evicted in the event of resource pressure.

For expanding the quickstart example, I think starting people closer to a "Production" configuration outweighs the extra verbosity. If people want to change it in the future, they don't need to come back to our documentation to look up how to set resource requests, they can just kubectl edit elasticsearch quickstart and be done. But having a default limit also gets a lot of the same benefit, so I don't feel too strongly. Curious what other people's thoughts are.

anyasabo on 1 Aug 2019

The main concern I have with adding some default behind the scene is that it could break the user experience when there are some Limit Ranges applied in the cluster _(must confess here that maybe I'm more sensitive to that because I have used them to manage some projects in my previous shop)_

tbh I think that applying an empty limits as workaround is a little bit odd:

yaml podTemplate: spec: containers: - name: elasticsearch resources: limits: {}

I don't think that expanding the quickstart example is really that bad and I agree with @anyasabo that it's closer to a "Production" configuration, it is a "quick start" and a "good start" 😄

What should be improved imho is that the user should not have to manage the memory settings in 2 different places, Xmx could be calculated automatically, see https://github.com/elastic/cloud-on-k8s/pull/1029#pullrequestreview-247393459
It would save 3 lines :)

So I would vote for 2 but in the mean time check if the Docker image could be improved to automatically adjust the JVM settings depending on the control groups applied to the container.

barkbay on 7 Aug 2019

~We decided to go with option 2 for now.~ This decision did not stand.

pebrc on 22 Aug 2019

I'm wondering if it could be worth adding an optional extra field to the ES CRD. See memory below:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.2.0
  nodes:
  - name:  master-nodes
    memory: 2Gi
    nodeCount: 3
  - name:  data-nodes
    memory: 32Gi
    nodeCount: 3
EOF

Behaviour:

If memory is set (>0), ECK will set resources limits and resources requests to that value, and JVM heap size to half that value.
If not set (=0), ECK ignores it and we end up in option 2 above.
podTemplate can still be used to further tweak memory settings.
If both podTemplate memory values and memory are set, one gets the priority over the other (priority to memory is probably simpler?).

Benefits:

Less verbose approach
Does not require getting into/understanding podTemplate stuff
Sets sane pod resource vs. JVM heap size ratio

Concerns:

An extra field in the CRD
Possible conflicting values with the podTemplate
Should we do the same for Kibana and APM?

Note this does not prevent to move forward with Option 2 above as a first step.

sebgl on 22 Aug 2019

👍2

Hey,

I'm kind of late to the game here, but wanted to voice my opinion on this.

While I understand we need a good solution here, I don't think I like option 2. that we've decided on, because:

It feels weird to see some random (deeply nested) options set. Our CRD offers quite a few settings, yet only these are explicitly set.
Relation between min and max memory is not explained nor obvious.
Same as above but for relation between jvm and pod memory, yet there is very clear recommendation for those. At the same time I don't think quickstart is a good place to dive into that.
I thought we strive to have sane defaults for everything we can, so users are not overloaded with complexity when it's not needed.
I'm not sure whether it should be our goal to have quickstart sample config production-hardened - ES getting started guide isn't. Also, I wouldn't expect anyone assuming it's something ready to deploy to production as is.

Maybe we should we have ECK production checklist similar to the one ES has? It's also possible that I'm over sensitive on the config looks and that it doesn't matter/is obvious for someone that is already familiar with ES :)

I really like the suggestion @sebgl made. It's clean, incorporates the logic behind recommended ratios, while still allows to be more precise/verbose using the pod template.

david-kow on 26 Aug 2019

👍1

Trying summarise the discussion and the arguments:

Two (and a half) solutions were suggested:

set the limit as well
pods will be QoS Guaranteed (assuming CPU is the same too) 👍
[I am ignoring the argument against this solution, that it necessitates the awkward {} to turn off defaults in case of limit ranger use, because that is already the case with our current implementation and documented as such]

2 a. remove all defaulting and encourage users to set resource requests and limits and JVM options themselves

using just k8s primitives no new ECK specific semantics introduced 👍
close to what a production config would look like
quickstart becomes unnecessary complex and confusing 👎

2 b. introduce a new attribute "memory" (in addition to removing the defaults)

spec:
  version: 7.2.0
  nodes:
  - name:  data-nodes
    memory: 2Gi
    nodeCount: 3

succinct and allows us to derive limits/requests and JVM settings 👍
not the k8s way to specify resources (?)
introduces a second way of setting memory resources competing with the podTemplate and making additional explanation (in the docs) and prioritisation (when choosing what to apply) necessary 👎
does not set CPU so makes it necessary to use both memory and podTemplate resources if one wants to specify CPU. Or one has to not use memory and rely on resources for both CPU and memory and set JVM options separately as well

Thoughts

I feel like that these defaults are mainly targeting the 'getting started experience' and will never be able to provide a production grade configuration.
That maybe helps understanding that the priority of fixing this is maybe not as high as one might think. Having said that, if we can give a better experience in the 'quickstart' case where we don't have to explain why nodes get evicted we should try.
Just setting the limit (solution 1.) seems in that light appealing as it simply improves on our current approach
If we want to do away with magic defaults (solution 2) then we need to look at other areas as well (eg. PDBs) which means we need maybe have a broader discussion around defaults in ECK in general

tl,dr

I would be in favor of option 1.

+1 for now (and let's maybe discuss our approach to defaults)

pebrc on 19 Sep 2019

I feel like that these defaults are mainly targeting the 'getting started experience' and will never be able to provide a production grade configuration.

If the failure scenarios for these defaults on k8s (memory limit 2gb) are obvious kinds of failures, such as a pod fails to schedule, etc, then I think it makes sense.

Having no memory limit by default seems dangerous and asking for a PoC to start failing unpredictably due to pod evictions, etc, especially with scenarios like #1716.

When I look at the helm chart, there's a vast array of configurations such that abstraction isn't even really possible with all the templating values choices it supports.

Focusing on abstractions, here's what I had experimented with in jsonnet:

local epsilon = import "epsilon.libjsonnet";

local name = "demo";
local resources = {
  data: { cpu: 2, heap: "4g", memory: "6Gi", disk: "100Gi" },
  master: { cpu: 2, heap: "4g", memory: "6Gi", disk: "30Gi" }
};

epsilon.values("demo", resources)

This would create a cluster with the default replica count (statefulset, replica count 1). This also included an abstraction for disk (which created a volumeClaimTemplate as appropriate).

My current leaning is, therefore, to set defaults (1gb JVM heap, 2GB memory request+limit, some appropriate CPU if needed, etc).

jordansissel on 19 Sep 2019

❤1 👍1

I feel like that these defaults are mainly targeting the 'getting started experience' and will never be able to provide a production grade configuration.
That maybe helps understanding that the priority of fixing this is maybe not as high as one might think. Having said that, if we can give a better experience in the 'quickstart' case where we don't have to explain why nodes get evicted we should try.

Doesn't not setting any defaults, but adding requests/limits in the quickstart accomplish the same goal but with less downside (e.g. having to read how to set it in the docs rather than starting off with an example you can just edit right there, us having to maintain defaulting code, interaction with limitranges)?

Having no memory limit by default seems dangerous and asking for a PoC to start failing unpredictably due to pod evictions, etc, especially with scenarios like #1716.

That's true, but it's also the case for any native resources you spin up too.

That said, we have knowledge of what our application requires (which appsv1/Deployments dont really) so I can see the argument for adding the defaults. Overall I think I agree with setting the 2Gi request/limit as default, but just wanted to bring up the (somewhat persuasive imo) counter.

anyasabo on 23 Sep 2019

We had some more discussions offline and the decision is

to go with the limit of 2G for Elasticsearch
users can turn defaulting off by either specifying podTemplate.spec.containers[].resources themselves or by setting it to {}
to implement similar default limits for APM and Kibana (we did not discuss limits in detail but Elastic Cloud defaults to 512M for APM server and 1G for Kibana)

pebrc on 24 Sep 2019

https://github.com/elastic/cloud-on-k8s/pull/1810 applies the default 2Gi memory limit for Elasticsearch.

However, since we don't apply any CPU requirements, the pod still ends up in QoS Burstable. Which means it will be evicted before Pods with QoS Guaranteed.
Picking a default CPU value (eg. 1000m) feels a bit more complicated, since it will effectively slow down the ES Pod compared to not setting anything.

sebgl on 27 Sep 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Custom elasticsearch configuration option should be available

Sakib37 · 3Comments

Expose the Kibana keystore

nkvoll · 4Comments

Kibana does not support rolling upgrades

pebrc · 3Comments

TestMutationLessNodes is flaky

barkbay · 4Comments

Could not update cluster license: failed to revert to basic

thbkrkr · 5Comments