Kind: Use memory storage for etcd

Created on 7 Sep 2019 · 40 comments · Source: kubernetes-sigs/kind

What would you like to be added:

Configure etcd storage in memory to improve performance.

Why is this needed:

etcd causes very high disk I/O, which can cause performance issues, especially when several kind clusters run on the same system: you end up with many processes writing to disk, causing latency and affecting other applications that use the same disk.

Since https://github.com/kubernetes-sigs/kind/pull/779, the /var filesystem no longer lives on the container filesystem, which improved performance. However, the etcd storage is still on disk, as we can see in the pod manifest:

```
  etcd-data:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/etcd
    HostPathType:  DirectoryOrCreate
```

Ideally, we should have `/var/lib/etcd/` in memory, since the clusters are meant to be created and destroyed and the information doesn't need to be persistent.

I have doubts about the best approach:
* Should this be modified in `kind`, creating a new `tmpfs` volume for etcd? (see the sketch below)
* Can this be modified in `kubeadm`, so we can mount the `etcd-data` volume in memory or in another location of the node that is in memory?
* ...
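For the first option, a rough illustration of what `kind` could do when it creates the control-plane container is below; the flags and image tag are placeholders, not kind's actual invocation:

```
# Placeholder sketch only: back /var/lib/etcd with a tmpfs at container
# creation time, so the etcd data never touches the host disk.
docker run -d --privileged \
  --tmpfs /var/lib/etcd \
  --name kind-control-plane \
  kindest/node:v1.15.3   # example image tag
```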


**NOTES**

etcd accumulated I/O (`iotop -a`):

```
26206 be/4 root 0.00 B 192.00 K 0.00 % 1.04 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26196 be/4 root 0.00 B 224.00 K 0.00 % 0.98 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26288 be/4 root 0.00 B 216.00 K 0.00 % 0.94 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26249 be/4 root 0.00 B 180.00 K 0.00 % 0.88 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26266 be/4 root 0.00 B 52.00 K 0.00 % 0.47 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26187 be/4 root 0.00 B 52.00 K 0.00 % 0.42 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26267 be/4 root 0.00 B 48.00 K 0.00 % 0.37 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26192 be/4 root 0.00 B 60.00 K 0.00 % 0.36 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26263 be/4 root 0.00 B 52.00 K 0.00 % 0.31 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26261 be/4 root 0.00 B 64.00 K 0.00 % 0.28 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
19155 be/4 root 0.00 B 0.00 B 0.00 % 0.19 % [kworker/1:2]
26286 be/4 root 0.00 B 28.00 K 0.00 % 0.18 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26289 be/4 root 0.00 B 32.00 K 0.00 % 0.16 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
578 be/4 root 0.00 B 2.00 M 0.00 % 0.16 % [btrfs-transacti]
26268 be/4 root 0.00 B 28.00 K 0.00 % 0.11 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
```

kind/design kind/feature lifecycle/frozen priority/backlog

Most helpful comment

ok, here is how to run etcd using memory storage, for reference:

1. Create the memory storage:

```
sudo mkdir /tmp/etcd
sudo mount -t tmpfs tmpfs /tmp/etcd
```

2. Mount it on the control-plane nodes:

```
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/etcd
    hostPath: /tmp/etcd
```

All 40 comments

/cc @BenTheElder @neolit123

Can this be modified in kubeadm so we can mount the etcd-data in memory or in another location of the node that's in memory?

kubeadm passes `--data-dir=/var/lib/etcd` to etcd and mounts this directory using hostPath.
we can just try:

```
        emptyDir:
          medium: Memory
```

but this means the kubeadm init / join commands would need to:
1) use phases to skip / customize the "manifests" phase, or
2) deploy etcd, patch the manifest, and restart the static pod (sketched below)
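For illustration, option 2's intended end state would be roughly the following; this is a sketch assuming kubeadm's default paths, not an existing kubeadm feature:

```
# After `kubeadm init` writes /etc/kubernetes/manifests/etcd.yaml, the
# etcd-data volume would be rewritten from the default hostPath to a
# memory-backed emptyDir; the kubelet notices the change and restarts the
# static pod on its own.
MANIFEST=/etc/kubernetes/manifests/etcd.yaml
grep -A3 'etcd-data' "$MANIFEST"   # inspect the current hostPath volume
# A real implementation would do a YAML-aware edit here, ending up with:
#   emptyDir:
#     medium: Memory
```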

etcd causes very high disk I/O, which can cause performance issues, especially when several kind clusters run on the same system: you end up with many processes writing to disk, causing latency and affecting other applications that use the same disk.

k/k master just moved to etcd 3.3.15, while 1.15 uses an older version.
is this a regression? an idle cluster should not have high disk i/o.

if this disk i/o suddenly became a problem this should be in a k/k issue.

Etcd is going to be writing all the constantly updated objects, no? (Eg node status)

It would be trivial to test kind with memory backed etcd by adjusting node creation, but I don't think you'd ever run a real cluster not on disk... 🤔

Etcd is going to be writing all the constantly updated objects, no? (Eg node status)

yeah, data needs to persist to disk to provide consistency

It would be trivial to test kind with memory backed etcd by adjusting node creation, but I don't think you'd ever run a real cluster not on disk... 🤔

Absolutely, real clusters must use disks; this is only meant to be used for testing. My rationale is that these k8s clusters are ephemeral, so the etcd clusters don't need to "persist" data on disk.

Can this be patched with the kind config? It would be enough to pass a different folder than `--data-dir=/var/lib/etcd`.

You can test this more or less with no changes by making a tmpfs on the host and configuring it to mount there on a control plane.

You could also edit the kind control plane creation process to put a tmpfs here on the node

We should experiment, but I think we do eventually want durable etcd for certain classes of testing..

Also worth pointing out:

  • our CI is backed by SSD
  • I'm not aware of any other cluster implementation not backing etcd with disk, including e.g. hack/local-up-cluster

yeah, for the k8s CI it's not a big problem, but for users that run kind locally it is. It took me a while to understand what was slowing down my system until I found that my kind clusters were causing big latency on one of my disks.
I just want to test and document the differences :)
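For anyone trying to pin this down on their own machine, the accumulated per-process view from the issue description and a per-disk view can be had with the following host commands (`iostat` comes from the sysstat package):

```
# Accumulated I/O per process, only showing processes that actually do I/O;
# the etcd processes from the kind nodes show up with their full command line.
sudo iotop -a -o
# Per-device utilisation and latency every 5 seconds, to spot the busy disk.
iostat -x 5
```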

ok, here is how to run etcd using memory storage, for reference:

1. Create the memory storage:

```
sudo mkdir /tmp/etcd
sudo mount -t tmpfs tmpfs /tmp/etcd
```

2. Mount it on the control-plane nodes:

```
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/etcd
    hostPath: /tmp/etcd
```
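Assuming the config above is saved as `kind-config.yaml`, the cluster can be created and the mount checked from the host; the container name below is the default one kind assigns:

```
kind create cluster --config kind-config.yaml
# /var/lib/etcd inside the control-plane node should now be the host tmpfs.
docker exec kind-control-plane mount | grep /var/lib/etcd
```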

/reopen per conversation in slack https://kubernetes.slack.com/archives/CEKK1KTN2/p1570202642295000?thread_ts=1570196798.288800&cid=CEKK1KTN2

I'd like to find a way to make this easier to configure, mainly for people that want to use kind on their laptops and not in CI; etcd constantly writing directly to disk is not adding any benefit in this particular scenario.

You could also edit the kind control plane creation process to put a tmpfs here on the node

I think this will work

We should experiment, but I think we do eventually want durable etcd for certain classes of testing..

I was thinking more about this, and I can't see the "durability" difference between using a folder inside the container and using a tmpfs volume for the etcd data dir; the data will be available as long as the container is alive, no?

However, etcd writing to a tmpfs volume will be a big performance improvement, at the cost of less available memory, of course.

```
/home/aojeagarcia/docker/volumes/5d2d2cab7dcb7c93b9a8a5f8591462caf4fbca5c332e663aa4628702b3d2dc50/_data/lib/etcd/member # du -sh *
1.5M    snap
245M    wal
```
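The same numbers can be read from a running node without digging into Docker's volumes directory (control-plane container name assumed to be kind's default):

```
# Size of the etcd WAL and snapshots inside the control-plane node.
docker exec kind-control-plane du -sh /var/lib/etcd/member/*
```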

However, etcd writing to a tmpfs volume will be a big performance improvement, at the cost of less available memory, of course.

I'd be interested in whether this will prevent me from testing 3 CP setups with kind on my machine;
it doesn't have RAM for 4 CPs :)

I was thinking more about this, and I can't see the "durability" difference between using a folder inside the container and using a tmpfs volume for the etcd data dir; the data will be available as long as the container is alive, no?

It's NOT a folder inside the container, it's on a volume.

When we fix kind to survive host reboots (and we will) then this will break it again.

It also will consume more RAM of course.

It's NOT a folder inside the container, it's on a volume.

https://github.com/kubernetes-sigs/kind/blob/master/pkg/internal/cluster/providers/docker/provision.go#L164-L169

I see it now :man_facepalming:

can this be causing timeouts in the CI with slow disks?

https://github.com/kubernetes-sigs/kind/issues/928#issuecomment-541964546

^^ possibly for istio, but it doesn't look like Kubernetes CI is seeing timeouts at this point. That's not the pattern with the broken pipe.

Even for istio, I doubt it's "because they aren't doing this"; it could be "because they are otherwise using too many IOPS for the allocated disks". IIRC they are also on GCP PD-SSD, which is quite fast.

for CI I think the better pattern I want to try is to use a pool of PDs from some storage class to replace the emptyDir.

I've been mulling how we could do this and persist some of the images in a clean and sane way, but imo this is well out of scope for the kind project.

for CI I think the better pattern I want to try is to use a pool of PDs from some storage class to replace the emptyDir.

I've been mulling how we could do this and persist some of the images in a clean and sane way, but imo this is well out of scope for the kind project.

I think this is only an issue for people using kind on their laptops or workstations; totally agree with you on the CI use case.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

did we wind up testing this in CI?

did we wind up testing this in CI?

nope, what option do you want to test in the CI, using etcd in memory?

nope, what option do you want to test in the CI, using etcd in memory?

yeah, we should see how it actually performs

nope, what option do you want to test in the CI, using etcd in memory?

yeah, we should see how it actually performs

hehe, when I was working on Midonet it used zookeeper as the source of truth, and the CI started to fly once we put it in memory. IIRC etcd and zookeeper need to flush data to guarantee consistency, which means a lot of IOPS; the improvement will be the difference in IOPS between memory and disk (SSD)... that should be considerable

theory is nice, measurements are better :-)
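One quick way to get such a measurement, as a sketch with illustrative parameters rather than an official benchmark, is fio's etcd-like workload of small writes with an fdatasync after each one, run once against a disk-backed directory and once against a tmpfs:

```
mkdir -p /var/tmp/fio-disk /tmp/fio-tmpfs
sudo mount -t tmpfs tmpfs /tmp/fio-tmpfs
for dir in /var/tmp/fio-disk /tmp/fio-tmpfs; do
  # Sequential small writes, syncing after every write, similar to etcd's WAL.
  fio --name=etcd-like --directory="$dir" \
      --rw=write --ioengine=sync --fdatasync=1 \
      --bs=2300 --size=22m
done
# Compare the fdatasync latency percentiles reported for the two runs.
```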


looking forward to it :smile:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/reopen
/lifecycle frozen
/assign

The goal is to do some serious benchmarking, comparing etcd with and without memory storage, to better understand the pros and cons.

The configuration to use etcd in memory, for reference:

```
cat <<EOF > "${ARTIFACTS}/kind-config.yaml"
# config for 1 control plane node and 2 workers (necessary for conformance)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: "/tmp/lib/etcd"
EOF
```
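A minimal first pass at that comparison could look like this, assuming `/tmp` inside the node container is tmpfs-backed (which is what makes the dataDir override above an in-memory setup); cluster names are arbitrary:

```
# Baseline vs. in-memory etcd; cluster names are arbitrary.
time kind create cluster --name etcd-on-disk
time kind create cluster --name etcd-in-memory --config "${ARTIFACTS}/kind-config.yaml"
# How much RAM the in-memory data dir is actually consuming inside the node.
docker exec etcd-in-memory-control-plane df -h /tmp
```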

@aojea: Reopened this issue.

In response to this:

/reopen
/lifecycle frozen
/assign

The goal is to do some serious benchmarking, comparing etcd with and without memory storage, to better understand the pros and cons.

The configuration to use etcd in memory, for reference:

```
cat <<EOF > "${ARTIFACTS}/kind-config.yaml"
# config for 1 control plane node and 2 workers (necessary for conformance)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: "/tmp/lib/etcd"
EOF
```

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

I still think this is a bad idea and conflicts with host reboot support.

Besides losing persistence you also consume more memory, and we're already allowing swap.

In CI the CI SSD should perform fairly well, locally it depends but memory tends to be more of an issue for users than disk.

I'm just curious about the difference and want to document it. I agree that this most likely is not going to be part of KIND, but it can make a difference for users with CIs without SSDs, for example.

I want to understand how much memory etcd allocates too :).

@aojea Thank you for the snippet with tmpfs, now my cluster is running much smoother.

```
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: "/tmp/lib/etcd"
```

Is this supposed to work still? It confers no performance increase for me. None of the manifests in this issue decreased the amount of time it took kind to create a cluster. The configs were accepted without errors, but there was no change in execution time.

I'm terribly interested in helping out with this feature. I've been working on https://github.com/midcontinentcontrols/kindest to assist with microservice development. Etcd initialization is a bottleneck with the dev workflow, and persistence is unnecessary.

None of the manifests in this issue decreased the amount of time it took kind to create a cluster

and that's not the bottleneck when creating a cluster. This patch exists because etcd is very IO-intensive; if you are using slow disks, or a laptop with other apps running, you will notice the difference, but the time it takes to create a cluster does not depend on this.

persistence is unnecessary.

well, for a CI or dev environment it may not be necessary, but any production cluster needs to persist the data 😅

Persistence beyond host reboot was the most highly requested issue in the tracker; people do use kind outside of CI...

For performance improvements to startup, the most impact will be had improving the upstream bootstrapping / upstream component performance. You'll be hard pressed to find a kubeadm environment starting much faster than kind with the node image already downloaded...

Apiserver, kubeadm, kubelet etc. are all upstream and the majority of the boot time is spent on those things coming up.


I've tried using tmpfs for Docker's data-root (silly, yes) and there is no performance benefit to that either, so I am wondering how exactly I should go about optimizing cluster creation. I am able to confirm that my tmpfs grows by about 1.2 GB while my persistent disks are untouched by the cluster creation process. While the tmpfs grows, all cores are basically idle. Sometimes I will see a relevant process (e.g. kubeadm) jump to ~1% usage.

Any ideas? Obviously setting data-root is far from ideal. At this point I'm just trying to figure out how this all should behave.

We've already tried this and taken nearly all of the obvious steps that don't require upstream changes. Boot time is very important to us.

As I said, the upstream Kubernetes components may be optimizable. The bootstrapping process with kubeadm is suspiciously long, but you'll have to track down what's slow yourself; we haven't gotten to this yet.


The bootstrapping process with kubeadm is suspiciously long

If you are not afraid of the security implications, and if it is possible in kubeadm (I really don't know), avoid the certificate generation... Maybe it is possible to include some well-known certificates.

There is an unsafe `--unsafe-no-fsync` flag added in etcd that disables fsync.

FYI: https://github.com/etcd-io/etcd/pull/11946

Yeah, we're very interested in that once it's available in kubeadm's etcd.
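If/when an etcd with that flag ships in kubeadm, it could presumably be passed through the ClusterConfiguration's etcd extraArgs; an untested sketch from a kind config would be:

```
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      extraArgs:
        unsafe-no-fsync: "true"
EOF
kind create cluster --config kind-config.yaml
```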
