Our k8s templates set --cache and --max-sql-memory to 25%, on the assumption that kubernetes sets the cgroup memory limit appropriately (and that we detect this, which is also an issue: #31750). That is apparently not true, so kubernetes deployments commonly exceed their memory limits and crash. We need to update the k8s templates to communicate the memory limit to cockroach in a way it will understand.
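As a rough sketch of the fix (field names from the standard StatefulSet container spec; the sizes are placeholders, not recommendations), the template could both set a pod memory limit and pass explicit flag values, so the two can't drift apart:

```yaml
# Sketch only: caps the container's memory via the k8s resource limit and
# tells cockroach the same budget explicitly instead of relying on
# cgroup-limit detection.
containers:
- name: cockroachdb
  image: cockroachdb/cockroach
  resources:
    requests:
      memory: "8Gi"
    limits:
      memory: "8Gi"            # placeholder; size for your workload
  command:
    - "/cockroach/cockroach"
    - "start"
    - "--cache=2GiB"           # 25% of the 8Gi limit, stated explicitly
    - "--max-sql-memory=2GiB"  # likewise, rather than detected at runtime
```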
We have a 12-node cluster whose nodes are OOMKilled frequently. However, this has had no impact on production uptime or availability. Is this kind of "expected", then, and OK for now? Or could I reduce the number of restarts by providing a higher memory limit?
CockroachDB can tolerate nodes being OOM killed, but it's not good for performance and it's not something that should happen under normal usage. Until we implement an automatic fix for this, you can work around it in one of two ways: give the container a higher memory limit, or replace the default 25% values (in the --cache and --max-sql-memory flags) with fixed amounts appropriate for the container's memory allocation. For example, if you're allocating 8GB of memory to the container, use --cache 2GB --max-sql-memory 2GB.

@timveil, since we were just talking about k8s in production, this is probably the type of thing we need to get into our docs ASAP. I'll work on that.
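The arithmetic above is simple, but for completeness here is a small sketch (the helper name and formatting are my own, not part of CockroachDB) that turns a container allocation into fixed flag values at the recommended 25% each:

```python
def flag_values(container_mem_gb: float, fraction: float = 0.25) -> dict:
    """Compute fixed --cache and --max-sql-memory values for a container.

    Replaces the percentage defaults with explicit sizes, e.g. an 8GB
    container at the default 25% fraction yields 2GB for each flag.
    """
    amount = container_mem_gb * fraction
    size = f"{amount:g}GB"
    return {"--cache": size, "--max-sql-memory": size}

# For an 8GB container: {'--cache': '2GB', '--max-sql-memory': '2GB'}
```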
Not quite - those issues make CockroachDB understand the memory limit of the container it's running in. However, I believe there's still work to do here, because our default k8s configuration templates don't set a memory limit at all (I think; I haven't verified this recently). This issue is about updating the k8s templates, not the database itself.