I found that etcd's VIRT is really high, so I ran pmap on the process to understand what is happening. Here is what I found:
Address Kbytes RSS Dirty Mode Mapping
0000000000010000 8256 7104 0 r-x-- etcd
0000000000820000 7808 6528 0 r---- etcd
0000000000fc0000 384 384 256 rw--- etcd
0000000001020000 192 192 192 rw--- [ anon ]
000000c000000000 192 192 192 rw--- [ anon ]
000000c41fa10000 200256 151296 151296 rw--- [ anon ]
00003ffd31d00000 9664 4032 4032 rw--- [ anon ]
00003ffd32670000 10485760 44736 0 r--s- db
00003fffb2670000 2624 2496 2496 rw--- [ anon ]
00003fffb2900000 128 64 0 r-x-- [ anon ]
00003fffc4dd0000 192 64 64 rw--- [ stack ]
---------------- ------- ------- -------
total kB 10715456 217088 158528
top command:
# top | grep etcd
28538 root 20 0 10.076g 54756 13608 S 1.3 1.4 2:33.84 etcd
Can someone help me understand what that ~10 GB block is, the one that accounts for most of the space?
Steps to reproduce:
Install Kubernetes via kubeadm on any Ubuntu VM, following: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
/cc @gyuho @xiang90
I have the same issue in my k8s cluster, which runs k8s v1.11.1:
root@gyliu-c11:~/cluster# kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1+icp-ee", GitCommit:"5803c3b1f9422c43a963e0610b3a4cad565e127e", GitTreeState:"clean", BuildDate:"2018-09-04T09:29:02Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1+icp-ee", GitCommit:"5803c3b1f9422c43a963e0610b3a4cad565e127e", GitTreeState:"clean", BuildDate:"2018-09-04T09:29:02Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
root@gyliu-c11:~/cluster# kubectl get nodes
NAME STATUS ROLES AGE VERSION
172.16.250.138 Ready etcd,management,master,proxy 20d v1.11.1+icp-ee
172.16.250.140 Ready worker 20d v1.11.1+icp-ee
root@gyliu-c11:~/cluster# docker images | grep etcd
mycluster.icp:8500/ibmcom/etcd-amd64 v3.2.18 e21fb69683f3 6 months ago 37.2MB
mycluster.icp:8500/ibmcom/etcd v3.2.18 e21fb69683f3 6 months ago 37.2MB
root@gyliu-c11:~/cluster# docker ps | grep etcd
20b9b4a08d8b e21fb69683f3 "etcd --name=etcd0 -…" 2 weeks ago Up 2 weeks k8s_etcd_k8s-etcd-172.16.250.138_kube-system_7ed3edad6e50c7486a61f12e79944a63_0
6e33c51ffd90 mycluster.icp:8500/ibmcom/pause:3.1 "/pause" 2 weeks ago Up 2 weeks k8s_POD_k8s-etcd-172.16.250.138_kube-system_7ed3edad6e50c7486a61f12e79944a63_0
root@gyliu-c11:~/cluster# docker stats 20b9b4a08d8b
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
20b9b4a08d8b k8s_etcd_k8s-etcd-172.16.250.138_kube-system_7ed3edad6e50c7486a61f12e79944a63_0 2.61% 149.6MiB / 15.67GiB 0.93% 0B / 0B 73.9MB / 297GB 22
root@gyliu-c11:~/cluster# ps -ef | grep etcd | grep snapshot
root 23109 23075 3 Sep27 ? 16:16:59 etcd --name=etcd0 --data-dir=/var/lib/etcd --wal-dir=/var/lib/etcd-wal/wal --max-wals=0 --initial-advertise-peer-urls=https://172.16.250.138:2380 --listen-peer-urls=https://0.0.0.0:2380 --listen-client-urls=https://0.0.0.0:4001 --advertise-client-urls=https://172.16.250.138:4001 --cert-file=/etc/cfc/conf/etcd/server.pem --key-file=/etc/cfc/conf/etcd/server-key.pem --client-cert-auth --trusted-ca-file=/etc/cfc/conf/etcd/ca.pem --initial-cluster-token=etcd-cluster-1 --initial-cluster=etcd0=https://172.16.250.138:2380 --peer-cert-file=/etc/cfc/conf/etcd/member-172.16.250.138.pem --peer-key-file=/etc/cfc/conf/etcd/member-172.16.250.138-key.pem --peer-trusted-ca-file=/etc/cfc/conf/etcd/ca.pem --peer-client-cert-auth=true --peer-auto-tls=false --grpc-keepalive-timeout=0 --grpc-keepalive-interval=0 --snapshot-count=10000 --initial-cluster-state=new
top -p 23109
top - 06:46:45 up 22 days, 4:55, 2 users, load average: 3.28, 3.39, 2.77
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.6 us, 4.3 sy, 0.0 ni, 82.7 id, 0.3 wa, 0.0 hi, 0.4 si, 0.7 st
KiB Mem : 16431940 total, 272896 free, 10530900 used, 5628144 buff/cache
KiB Swap: 4190204 total, 1116416 free, 3073788 used. 4826056 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23109 root 20 0 10.161g 139320 12064 S 2.7 0.8 977:00.61 etcd
It's mmap for bolt DB. There's no allocation overhead.
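To see why a large mmap raises VIRT without costing real memory, here is a minimal sketch in Python (not etcd's or bolt's actual code). It assumes Linux, since it reads /proc/self/status, and the helper names read_status_kb and measure_mmap_growth are made up for illustration. It maps a sparse file read-only and compares how much the virtual size and the resident size of the process grow.

```python
import mmap
import tempfile


def read_status_kb(field):
    """Return a field such as VmSize or VmRSS from /proc/self/status, in kB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def measure_mmap_growth(size_bytes):
    """Map a sparse file read-only; return (VmSize growth, VmRSS growth) in kB."""
    with tempfile.TemporaryFile() as f:
        # Extending the file with truncate() creates a sparse file:
        # the size changes but no data blocks are written.
        f.truncate(size_bytes)
        vm_before = read_status_kb("VmSize")
        rss_before = read_status_kb("VmRSS")
        mapping = mmap.mmap(f.fileno(), size_bytes, access=mmap.ACCESS_READ)
        try:
            # The address range is reserved now, but no page has been touched,
            # so nothing has been faulted into physical memory yet.
            vm_after = read_status_kb("VmSize")
            rss_after = read_status_kb("VmRSS")
        finally:
            mapping.close()
    return vm_after - vm_before, rss_after - rss_before


if __name__ == "__main__":
    vm_kb, rss_kb = measure_mmap_growth(256 * 1024 * 1024)
    print(f"VmSize grew by {vm_kb} kB, VmRSS grew by {rss_kb} kB")
```

The virtual size should grow by roughly the size of the mapping while the resident size barely moves, which is the same pattern as the `r--s- db` row in the pmap output: 10485760 KB mapped (exactly 10 GiB), only 44736 KB resident.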
Can someone help me understand what that ~10 GB block is, the one that accounts for most of the space?
@mkumatag pmap -px $PID will shed a little more light on this for you.
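As a rough illustration of how to read that kind of output, the sketch below parses the pmap rows posted at the top of this thread and picks out the largest mapping. The function name parse_pmap_rows and the column handling are assumptions based on the `Address Kbytes RSS Dirty Mode Mapping` layout shown above; other pmap versions or flags may print different columns.

```python
# Rows copied from the pmap output in the first post (header and total omitted).
PMAP_ROWS = """\
0000000000010000 8256 7104 0 r-x-- etcd
0000000000820000 7808 6528 0 r---- etcd
0000000000fc0000 384 384 256 rw--- etcd
0000000001020000 192 192 192 rw--- [ anon ]
000000c000000000 192 192 192 rw--- [ anon ]
000000c41fa10000 200256 151296 151296 rw--- [ anon ]
00003ffd31d00000 9664 4032 4032 rw--- [ anon ]
00003ffd32670000 10485760 44736 0 r--s- db
00003fffb2670000 2624 2496 2496 rw--- [ anon ]
00003fffb2900000 128 64 0 r-x-- [ anon ]
00003fffc4dd0000 192 64 64 rw--- [ stack ]
"""


def parse_pmap_rows(text):
    """Parse 'Address Kbytes RSS Dirty Mode Mapping' rows into dicts (sizes in KB)."""
    rows = []
    for line in text.splitlines():
        # maxsplit=5 keeps mapping names with spaces, such as "[ anon ]", in one field.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        address, kbytes, rss, dirty, mode, mapping = parts
        rows.append({
            "address": address,
            "kbytes": int(kbytes),
            "rss": int(rss),
            "dirty": int(dirty),
            "mode": mode,
            "mapping": mapping.strip(),
        })
    return rows


rows = parse_pmap_rows(PMAP_ROWS)
largest = max(rows, key=lambda row: row["kbytes"])
total_kb = sum(row["kbytes"] for row in rows)

print(f"largest mapping: {largest['mapping']} ({largest['mode']})")
print(f"virtual: {largest['kbytes'] / 1024 / 1024:.2f} GiB, resident: {largest['rss']} KB")
print(f"share of total virtual size: {largest['kbytes'] / total_kb:.1%}")
```

On this data the largest entry is the read-only shared `db` file mapping, which makes up nearly all of the virtual size while contributing only a small part of the RSS total.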
Thanks a lot for the quick help on this, @gyuho @hexfusion.
https://github.com/etcd-io/etcd/blob/7a759c18d294698f537f8be91927354818a71e51/mvcc/backend/backend.go#L40-L43