What would you like to be added:
Configure etcd storage in memory to improve performance.
Why is this needed:
etcd generates very high disk I/O, which can cause performance issues, especially when several kind clusters run on the same system: you end up with many processes writing to disk, which adds latency and affects other applications using the same disk.
Since https://github.com/kubernetes-sigs/kind/pull/779, the `/var` filesystem no longer runs on the container filesystem, which improved performance. However, etcd storage is still on disk, as the pod manifest shows:
```
etcd-data:
  Type:          HostPath (bare host directory volume)
  Path:          /var/lib/etcd
  HostPathType:  DirectoryOrCreate
```
Ideally, `/var/lib/etcd/` should be in memory, since these clusters are meant to be created and destroyed and the data doesn't need to persist.
I'm not sure about the best approach:
* Should this be changed in `kind`, by creating a new `tmpfs` volume for etcd?
* Can this be changed in `kubeadm`, so we can mount `etcd-data` in memory or in another location on the node that is in memory?
* ...
**NOTES**

etcd I/O accumulated with `iotop -a`:
```
26206 be/4 root 0.00 B 192.00 K 0.00 % 1.04 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26196 be/4 root 0.00 B 224.00 K 0.00 % 0.98 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26288 be/4 root 0.00 B 216.00 K 0.00 % 0.94 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26249 be/4 root 0.00 B 180.00 K 0.00 % 0.88 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26266 be/4 root 0.00 B 52.00 K 0.00 % 0.47 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26187 be/4 root 0.00 B 52.00 K 0.00 % 0.42 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26267 be/4 root 0.00 B 48.00 K 0.00 % 0.37 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26192 be/4 root 0.00 B 60.00 K 0.00 % 0.36 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26263 be/4 root 0.00 B 52.00 K 0.00 % 0.31 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26261 be/4 root 0.00 B 64.00 K 0.00 % 0.28 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
19155 be/4 root 0.00 B 0.00 B 0.00 % 0.19 % [kworker/1:2]
26286 be/4 root 0.00 B 28.00 K 0.00 % 0.18 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
26289 be/4 root 0.00 B 32.00 K 0.00 % 0.16 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
578 be/4 root 0.00 B 2.00 M 0.00 % 0.16 % [btrfs-transacti]
26268 be/4 root 0.00 B 28.00 K 0.00 % 0.11 % etcd --advertise-client-urls=htt~e=/etc/kubernetes/pki/etcd/ca.crt
```
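To reduce an accumulated `iotop -a` capture like the one above to a single number, a small awk filter can total the DISK WRITE column over the etcd threads. A minimal sketch, assuming the column layout shown above (write amount in field 6, its unit in field 7, command name in field 12); `sum_etcd_writes` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: total the accumulated DISK WRITE column of `iotop -a`
# rows for etcd threads, read from stdin.
# Assumed layout (as in the capture above):
#   TID PRIO USER <read> <unit> <write> <unit> <swapin> % <io> % COMMAND...
# so the write amount is field 6, its unit field 7 and the command field 12.
sum_etcd_writes() {
  awk '
    $12 == "etcd" {
      value = $6
      unit = $7
      # Normalise every row to KiB before summing.
      if (unit == "B") { value /= 1024 }
      else if (unit == "M") { value *= 1024 }
      else if (unit == "G") { value *= 1024 * 1024 }
      total += value
      threads++
    }
    END { printf "%d etcd threads wrote %.2f K\n", threads, total }
  '
}
```

Saving the rows above to a file (hypothetical name `rows.txt`) and running `sum_etcd_writes < rows.txt` should report 13 etcd threads and 1228.00 K (about 1.2 MiB), compared with 2.00 M for the `btrfs-transacti` thread.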
/cc @BenTheElder @neolit123
> Can this be changed in `kubeadm`, so we can mount `etcd-data` in memory or in another location on the node that is in memory?

kubeadm passes `--data-dir=/var/lib/etcd` to etcd and mounts this directory using `hostPath`.
We could just try:
```yaml
emptyDir:
  medium: Memory
```
but this means the `kubeadm init` / `join` commands need to either:
1) use phases to skip / customize the "manifests" phase, or
2) deploy etcd, patch the manifest, and restart the static pod.
> etcd generates very high disk I/O, which can cause performance issues, especially when several kind clusters run on the same system: you end up with many processes writing to disk, which adds latency and affects other applications using the same disk.

k/k master just moved to etcd 3.3.15, while 1.15 uses an older version.
Is this a regression? An idle cluster should not have high disk I/O.
If this disk I/O suddenly became a problem, it should be tracked in a k/k issue.
etcd is going to be writing all the constantly updated objects, no? (e.g. node status)
It would be trivial to test kind with memory-backed etcd by adjusting node creation, but I don't think you'd ever run a real cluster not on disk... 🤔

> etcd is going to be writing all the constantly updated objects, no? (e.g. node status)

yeah, data needs to persist to disk to provide consistency

> It would be trivial to test kind with memory-backed etcd by adjusting node creation, but I don't think you'd ever run a real cluster not on disk... 🤔

Absolutely, real clusters must use disks; this is only meant to be used for testing. My rationale is that these k8s clusters are ephemeral, so their etcd clusters don't need to "persist" data on disk.
Can this be patched with the kind config? It would be enough to pass a different folder than `--data-dir=/var/lib/etcd`.
You can test this more or less without changes by creating a tmpfs on the host and configuring it to be mounted there on a control-plane node.
You could also edit the kind control-plane creation process to put a tmpfs there on the node.
We should experiment, but I think we do eventually want durable etcd for certain classes of testing.
Also worth pointing out:
yeah, for k8s CI it is not a big problem, but for users who run kind locally it is. It took me a while to figure out what was slowing down my system, until I found that my kind clusters were causing high latency on one of my disks.
I just want to test and document the differences :)
OK, here is how to run etcd using memory storage, for reference:
```sh
sudo mkdir /tmp/etcd
sudo mount -t tmpfs tmpfs /tmp/etcd
```
```yaml
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/etcd
    hostPath: /tmp/etcd
```
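The same config as one reusable helper, as a minimal sketch: `etcd_tmpfs_config` is a hypothetical name, and it assumes the newer `kind.x-k8s.io/v1alpha4` config API rather than the `v1alpha3` group used above. It only prints the config; the tmpfs still has to be mounted on the host first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a kind config that bind-mounts a host directory
# (expected to already be a tmpfs) over etcd's data dir on one control-plane node.
etcd_tmpfs_config() {
  local host_dir="${1:?usage: etcd_tmpfs_config HOST_DIR}"
  cat <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - containerPath: /var/lib/etcd
    hostPath: ${host_dir}
EOF
}
```

For example: `sudo mkdir -p /tmp/etcd && sudo mount -t tmpfs -o size=512m tmpfs /tmp/etcd`, then `etcd_tmpfs_config /tmp/etcd > kind-etcd-tmpfs.yaml` and `kind create cluster --config kind-etcd-tmpfs.yaml`. The `size=512m` cap is an arbitrary example value; etcd fails once the tmpfs is full, so leave headroom over the WAL size you expect.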
/reopen per conversation in slack https://kubernetes.slack.com/archives/CEKK1KTN2/p1570202642295000?thread_ts=1570196798.288800&cid=CEKK1KTN2
I'd like to find a way to make this easier to configure, mainly for people who want to use kind on their laptops rather than in CI; etcd constantly writing directly to disk adds no benefit in this particular scenario.
> You could also edit the kind control-plane creation process to put a tmpfs there on the node.

I think this will work.

> We should experiment, but I think we do eventually want durable etcd for certain classes of testing.

I was thinking more about this, and I can't see the "durability" difference between using a folder inside the container and using a tmpfs volume for the etcd data dir; the data will be available as long as the container is alive, no?
However, etcd writing to a tmpfs volume will be a big performance improvement, at the cost of less available memory, of course.
```
home/aojeagarcia/docker/volumes/5d2d2cab7dcb7c93b9a8a5f8591462caf4fbca5c332e663aa4628702b3d2dc50/_data/lib/etcd/member # du -sh *
1.5M snap
245M wal
```
> However, etcd writing to a tmpfs volume will be a big performance improvement, at the cost of less available memory, of course.

I'd be interested in whether this will prevent me from testing 3-CP setups with kind on my machine.
It doesn't have RAM for 4 CPs :)
> I was thinking more about this, and I can't see the "durability" difference between using a folder inside the container and using a tmpfs volume for the etcd data dir; the data will be available as long as the container is alive, no?

It's NOT a folder inside the container, it's on a volume.
When we fix kind to survive host reboots (and we will), this will break it again.
It will also consume more RAM, of course.

> It's NOT a folder inside the container, it's on a volume.

I see it now :man_facepalming:
Could this be causing timeouts in CI with slow disks?
https://github.com/kubernetes-sigs/kind/issues/928#issuecomment-541964546
^^ Possibly for Istio; it doesn't look like Kubernetes CI is seeing timeouts at this point. That's not the pattern with the broken pipe.
Even for Istio, I doubt it's "because they aren't doing this", but it could be "because they are otherwise using too many IOPS for the allocated disks". IIRC they are also on GCP PD-SSD, which is quite fast.
For CI, I think the better pattern I want to try is to use a pool of PDs from some storage class to replace the emptyDir.
I've been mulling over how we could do this and persist some of the images in a clean and sane way, but IMO this is well out of scope for the kind project.

> For CI, I think the better pattern I want to try is to use a pool of PDs from some storage class to replace the emptyDir.
> I've been mulling over how we could do this and persist some of the images in a clean and sane way, but IMO this is well out of scope for the kind project.

I think this is only an issue for people using kind on their laptops or workstations; I totally agree with you on the CI use case.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
did we wind up testing this in CI?
> did we wind up testing this in CI?

nope, which option do you want to test in CI, using etcd in memory?

> nope, which option do you want to test in CI, using etcd in memory?

yeah, we should see how it actually performs

> > nope, which option do you want to test in CI, using etcd in memory?
>
> yeah, we should see how it actually performs

hehe, when I was working on MidoNet, it used ZooKeeper as its source of truth, and CI started to fly once we put it in memory. IIRC, etcd and ZooKeeper both need to flush data to guarantee consistency, which means a lot of IOPS; the improvement will be the difference in IOPS between memory and disk (SSD)... that should be considerable.
theory is nice, measurements are better :-)
looking forward to it :smile:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@fejta-bot: Closing this issue.
/reopen
/lifecycle frozen
/assign
The goal is to do serious benchmarking, comparing etcd with and without memory storage, to better understand the pros and cons.
The configuration to use etcd in memory, for reference:
```sh
cat <<EOF > "${ARTIFACTS}/kind-config.yaml"
# config for 1 control plane node and 2 workers (necessary for conformance)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: "/tmp/lib/etcd"
EOF
```
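Pointing `dataDir` at `/tmp/lib/etcd` presumably helps only because `/tmp` inside a kind node container is memory-backed. To my knowledge kind starts its node containers with a tmpfs on `/tmp`, but that is an assumption worth checking, especially when a config seems to make no difference. A minimal sketch, using a hypothetical helper `fstype_for`, that reports the filesystem type of the mount containing a path (the longest matching mount point in `/proc/mounts`-formatted input):

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a path, read /proc/mounts-formatted lines on stdin
# (device mountpoint fstype options dump pass) and print the fstype of the
# longest mount point that contains the path. Later entries win on ties,
# since a later mount on the same point shadows the earlier one.
# Mount points with escaped spaces (\040) are not handled.
fstype_for() {
  local target="${1:?usage: fstype_for PATH}"
  awk -v target="$target" '
    {
      mp = $2
      # "/" contains everything; otherwise the path must equal mp or start with mp "/".
      if (mp == "/" || target == mp || index(target, mp "/") == 1) {
        if (length(mp) >= best_len) { best_len = length(mp); best = $3 }
      }
    }
    END { if (best != "") { print best } else { exit 1 } }
  '
}
```

For example, `docker exec kind-control-plane cat /proc/mounts | fstype_for /tmp/lib/etcd` should print `tmpfs` if the assumption holds (`kind-control-plane` is the default node name for a cluster named `kind`). Where util-linux is available, `findmnt -n -o FSTYPE -T /tmp/lib/etcd` run inside the node gives the same answer.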
@aojea: Reopened this issue.
I still think this is a bad idea and conflicts with host reboot support.
Besides losing persistence, you also consume more memory, and we're already allowing swap.
In CI, the SSD should perform fairly well; locally it depends, but memory tends to be more of an issue for users than disk.
I'm just curious about the difference and want to document it. I agree this is most likely not going to be part of kind, but it can make a difference for users whose CI has no SSDs, for example.
I also want to understand how much memory etcd allocates :).
@aojea Thank you for the snippet with tmpfs; my cluster now runs much more smoothly.
```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  metadata:
    name: config
  etcd:
    local:
      dataDir: "/tmp/lib/etcd"
```
Is this still supposed to work? It gives me no performance increase. None of the manifests in this issue decreased the amount of time it took kind to create a cluster. The configs were accepted without errors, but there was no change in execution time.
I'm very interested in helping out with this feature. I've been working on https://github.com/midcontinentcontrols/kindest to assist with microservice development. etcd initialization is a bottleneck in the dev workflow, and persistence is unnecessary.

> None of the manifests in this issue decreased the amount of time it took kind to create a cluster

That's not the bottleneck when creating a cluster. This patch exists because etcd is very I/O intensive: if you are using slow disks, or a laptop with other apps running, you will notice the difference, but the time to create a cluster does not depend on this.

> persistence is unnecessary.

Well, for a CI or dev environment it may not be necessary, but any production cluster needs to persist its data.
Persistence beyond host reboot was the most highly requested issue in the tracker; people do use kind outside of CI...
For startup performance, the most impact will come from improving the upstream bootstrapping / upstream component performance. You'll be hard pressed to find a kubeadm environment that starts much faster than kind with the node image already downloaded...
The API server, kubeadm, kubelet, etc. are all upstream, and the majority of the boot time is spent on those things coming up.
I've tried using tmpfs for Docker's data-root (silly, yes) and there is no performance benefit to that either, so I am wondering how exactly I should go about optimizing cluster creation. I can confirm that my tmpfs grows by about 1.2 GB while my persistent disks are untouched by the cluster creation process. While the tmpfs grows, all cores are basically idle. Sometimes I see a relevant process (e.g. kubeadm) jump to ~1% usage.
Any ideas? Obviously setting data-root is far from ideal. At this point I'm just trying to figure out how this all should behave.
We've already tried this and taken nearly all of the obvious steps that don't require upstream changes. Boot time is very important to us.
As I said, the upstream Kubernetes components may be optimizable. The bootstrapping process with kubeadm is suspiciously long, but you'll have to track down what's slow yourself; we haven't gotten to this yet.
> The bootstrapping process with kubeadm is suspiciously long

If you are not worried about security, and if it is possible in kubeadm (I really don't know), avoid the certificate generation... Maybe it is possible to include some well-known certificates.
etcd has added an unsafe `--unsafe-no-fsync` flag that disables fsync.
Yeah, we're very interested in that once it's available in kubeadm's etcd.
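Once a kubeadm-deployed etcd supports that flag, one way to pass it through kind is `extraArgs` under `etcd.local` in a kubeadm `ClusterConfiguration` patch. A sketch under stated assumptions: an etcd release that has `--unsafe-no-fsync` (3.5 or later, to my knowledge; check `etcd --help` for your version), and the map-style `extraArgs` of kubeadm's v1beta2/v1beta3 config (v1beta4 uses a list of name/value pairs instead). `etcd_no_fsync_config` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a kind config whose kubeadm patch passes
# --unsafe-no-fsync to the local etcd. Testing only: without fsync, a crash
# of the node can lose or corrupt etcd data.
# Assumes map-style extraArgs (kubeadm v1beta2/v1beta3).
etcd_no_fsync_config() {
  cat <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  etcd:
    local:
      extraArgs:
        unsafe-no-fsync: "true"
EOF
}
```

Usage: `etcd_no_fsync_config > kind-no-fsync.yaml && kind create cluster --config kind-no-fsync.yaml`. Like the tmpfs approach, this trades crash safety for less I/O, so it only makes sense for throwaway clusters.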