etcd version: 3.2.24
k8s version: 1.12.4
We have an HA cluster with 3 etcd nodes and are seeing lots of etcd timeout logs:
2019-03-23 17:01:49.987159 W | etcdserver: read-only range request "key:\"xxxx\" " with result "range_response_count:1 size:82" took too long (1.483848046s) to execute
...
2019-03-24 16:47:08.632540 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 45.654372ms)
...
2019-03-24 16:34:27.597162 W | etcdserver: server is likely overloaded
Since most timeout issues are caused by disk performance, we did a lot of investigation into disk performance, but everything looks fine.
+-------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------+------------------+---------+---------+-----------+-----------+------------+
| xxx:4001 | 302ec63cdf6a78a1 | 3.2.24 | 44 MB | false | 3700 | 34725561 |
| xxx:4001 | ac589e4da6ba316a | 3.2.24 | 44 MB | false | 3700 | 34725561 |
| xxx:4001 | 75b98122171add03 | 3.2.24 | 44 MB | true | 3700 | 34725561 |
+-------------------+------------------+---------+---------+-----------+-----------+------------+
etcd check perf
60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.313866s
PASS: Stddev is 0.018807s
PASS
dd bs=1M count=200 if=/dev/zero of=test.dd conv=fsync 2>> ${_mylogfile}
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 1.291 s, 162 MB/s
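(Note that conv=fsync only issues a single fsync at the end, so this mostly measures sequential throughput. A per-write sync test, closer to etcd's WAL pattern, would look something like the line below; oflag=dsync makes dd sync every write, and the 2300-byte block size is just an etcd-WAL-sized guess.)
dd bs=2300 count=1000 if=/dev/zero of=test-sync.dd oflag=dsync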
le "0.001" "438012"
le "0.002" "504743"
le "0.004" "540407"
le "0.008" "546545"
le "0.016" "549721"
le "0.032" "551710"
le "0.064" "553345"
le "0.128" "554658"
le "0.256" "555587"
le "0.512" "556101"
le "1.024" "556281"
le "2.048" "556331"
le "4.096" "556341"
le "8.192" "556341"
le "0.001" "38912"
le "0.002" "167317"
le "0.004" "171912"
le "0.008" "173698"
le "0.016" "175229"
le "0.032" "176239"
le "0.064" "176816"
le "0.128" "177231"
le "0.256" "177498"
le "0.512" "177608"
le "1.024" "177641"
le "2.048" "177645"
le "4.096" "177645"
le "8.192" "177645"
{"log":"2019-06-01 09:40:12.911065 W | etcdserver: read-only range request \"key:\\\"/xxx\\\" \" with result \"error:etcdserver: request timed out\" took too long (10.321355002s) to execute\n","stream":"stderr","time":"2019-06-01T09:40:12.911617679Z"}
Do you have any suggestions on how to investigate next?
Can this disk performance be considered insufficient for etcd?
Can this be the cause of the unexpected etcd timeouts and leader switches?
2019-03-24 16:47:08.632540 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 45.654372ms)
2019-03-24 16:34:27.597162 W | etcdserver: server is likely overloaded
Can we see the full metrics? While the disk is usually the issue, these messages can also be network- and/or CPU-related. [1]
Tried setting --heartbeat-interval=500 --election-timeout=5000; this made the timeouts even worse, up to 10s.
Generally we see the election timeout set to around 5x the heartbeat interval; this is 10x.
1.) The db size is only 40M
This is v3; what about v2, do you have v2 data?
$ ETCDCTL_API=2 etcdctl ls -r /
[1] https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-does-the-etcd-warning-failed-to-send-out-heartbeat-on-time-mean
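For example, the p99 WAL fsync latency can be pulled straight from Prometheus; on a disk that keeps up with etcd it should stay in the single-digit milliseconds. A sketch, assuming Prometheus scrapes etcd and is reachable at the placeholder address below:
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))'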
@hexfusion Thank you.
Below are the full metrics. The disk looks good, even though it does not quite reach 99% of fsyncs within 10ms. Do you think so?
And as you suggested, we will further check CPU and network, and append the results later.
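(For those checks we plan to run something along these lines on each node; x.x.x.x is a placeholder for a peer address, and iperf3 is assumed to be installed, with iperf3 -s already running on the peer:)
# per-core CPU usage, iowait and steal
mpstat -P ALL 1 5
# baseline peer latency
ping -c 20 x.x.x.x
# peer bandwidth
iperf3 -c x.x.x.x -t 10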
# etcd_disk_wal_fsync_duration_seconds_bucket, instance="x.x.x.x:4001", job="etcd", queried at 1555576363.96
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 438012
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 504743
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 540407
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 546545
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 549721
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 551710
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 553345
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 554658
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 555587
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 556101
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 556281
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 556331
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 556341
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 556341
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 556341
# etcd_disk_backend_commit_duration_seconds_bucket, instance="x.x.x.x:4001", job="etcd", queried at 1555576402.956
etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 38912
etcd_disk_backend_commit_duration_seconds_bucket{le="0.002"} 167317
etcd_disk_backend_commit_duration_seconds_bucket{le="0.004"} 171912
etcd_disk_backend_commit_duration_seconds_bucket{le="0.008"} 173698
etcd_disk_backend_commit_duration_seconds_bucket{le="0.016"} 175229
etcd_disk_backend_commit_duration_seconds_bucket{le="0.032"} 176239
etcd_disk_backend_commit_duration_seconds_bucket{le="0.064"} 176816
etcd_disk_backend_commit_duration_seconds_bucket{le="0.128"} 177231
etcd_disk_backend_commit_duration_seconds_bucket{le="0.256"} 177498
etcd_disk_backend_commit_duration_seconds_bucket{le="0.512"} 177608
etcd_disk_backend_commit_duration_seconds_bucket{le="1.024"} 177641
etcd_disk_backend_commit_duration_seconds_bucket{le="2.048"} 177645
etcd_disk_backend_commit_duration_seconds_bucket{le="4.096"} 177645
etcd_disk_backend_commit_duration_seconds_bucket{le="8.192"} 177645
etcd_disk_backend_commit_duration_seconds_bucket{le="+Inf"} 177645
@hexfusion
For the CPU, we checked and there is no CPU starvation.
For the network, is there any best practice besides the ping command?
For the disk, below is the fio result. We have another cluster whose disk only does 2MB/s bandwidth and it is running well, so do you think the disk is slow based on this fio result?
# fio --name=sequential_write_iops_test --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd --size=22m --bs=2300
sequential_write_iops_test: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=2630KiB/s][r=0,w=1171 IOPS][eta 00m:00s]
sequential_write_iops_test: (groupid=0, jobs=1): err= 0: pid=2185: Tue Jun 11 16:44:54 2019
write: IOPS=1192, BW=2678KiB/s (2742kB/s)(21.0MiB/8411msec)
clat (nsec): min=1784, max=6381.4k, avg=326269.48, stdev=417837.65
lat (nsec): min=1867, max=6381.6k, avg=326607.36, stdev=417866.63
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 8],
| 30.00th=[ 10], 40.00th=[ 14], 50.00th=[ 221], 60.00th=[ 474],
| 70.00th=[ 502], 80.00th=[ 545], 90.00th=[ 627], 95.00th=[ 824],
| 99.00th=[ 2212], 99.50th=[ 2573], 99.90th=[ 3556], 99.95th=[ 4490],
| 99.99th=[ 5932]
bw ( KiB/s): min= 2412, max= 2952, per=100.00%, avg=2696.50, stdev=139.83, samples=16
iops : min= 1074, max= 1314, avg=1200.56, stdev=62.18, samples=16
lat (usec) : 2=0.02%, 4=3.34%, 10=28.45%, 20=11.14%, 50=0.73%
lat (usec) : 100=0.12%, 250=6.85%, 500=18.35%, 750=24.97%, 1000=2.72%
lat (msec) : 2=1.99%, 4=1.24%, 10=0.09%
cpu : usr=1.13%, sys=6.72%, ctx=15671, majf=0, minf=31
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,10029,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=2678KiB/s (2742kB/s), 2678KiB/s-2678KiB/s (2742kB/s-2742kB/s), io=21.0MiB (23.1MB), run=8411-8411msec
Disk stats (read/write):
dm-0: ios=5626/10515, merge=0/0, ticks=2969/4828, in_queue=7798, util=89.71%, aggrios=5632/10341, aggrmerge=0/212, aggrticks=2968/4800, aggrin_queue=7762, aggrutil=89.46%
sda: ios=5632/10341, merge=0/212, ticks=2968/4800, in_queue=7762, util=89.6%
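A rough reading of those numbers, assuming the usual ~10ms p99 guidance for etcd's WAL fsync applies:
# IOPS ≈ 1192, so mean fdatasync latency ≈ 1/1192 s ≈ 0.84 ms
# p99 clat ≈ 2.2 ms and p99.9 ≈ 3.6 ms (from the percentile table above)
# both are well under 10 ms, so this test alone does not flag the disk as slow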
@xiang90 @hexfusion any thoughts/recommendations on this issue please? Thanks!
/cc @gyuho @jingyih - do you have any thoughts that can help @haoqing0110 debug the issue further? Thanks!
@haoqing0110 providing the full metrics, including network and snapshot, may help get a better picture. Thanks!
@xiang90 @hexfusion @gyuho @spzala below are the whole metrics.
etcd_network_peer_round_trip_time_seconds_bucket goes up to 0.8 seconds, while https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/performance.md#understanding-performance says "A typical RTT within the United States is around 50ms, and can be as slow as 400ms between continents."
Do you think the network is too slow?
And from the whole picture, any idea what causes the frequent timeout issues?
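(We read the 0.8s figure from the raw buckets; the p99 RTT can also be computed with a query like the one below, where the Prometheus address is a placeholder:)
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m]))'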
# HELP etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds Bucketed histogram of db compaction pause duration.
# TYPE etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds histogram
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="1"} 0
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="2"} 0
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="4"} 133
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="8"} 447
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="16"} 523
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="32"} 571
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="64"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="128"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="256"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="512"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="1024"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="2048"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="4096"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_bucket{le="+Inf"} 592
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_sum 5270
etcd_debugging_mvcc_db_compaction_pause_duration_milliseconds_count 592
# HELP etcd_debugging_mvcc_db_compaction_total_duration_milliseconds Bucketed histogram of db compaction total duration.
# TYPE etcd_debugging_mvcc_db_compaction_total_duration_milliseconds histogram
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="100"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="200"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="400"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="800"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="1600"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="3200"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="6400"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="12800"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="25600"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="51200"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="102400"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="204800"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="409600"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="819200"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_bucket{le="+Inf"} 6085
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_sum 0
etcd_debugging_mvcc_db_compaction_total_duration_milliseconds_count 6085
# HELP etcd_debugging_mvcc_db_total_size_in_bytes Total size of the underlying database physically allocated in bytes. Use etcd_mvcc_db_total_size_in_bytes
# TYPE etcd_debugging_mvcc_db_total_size_in_bytes gauge
etcd_debugging_mvcc_db_total_size_in_bytes 7.7975552e+08
# HELP etcd_debugging_mvcc_delete_total Total number of deletes seen by this member.
# TYPE etcd_debugging_mvcc_delete_total counter
etcd_debugging_mvcc_delete_total 23680
# HELP etcd_debugging_mvcc_events_total Total number of events sent by this member.
# TYPE etcd_debugging_mvcc_events_total counter
etcd_debugging_mvcc_events_total 1.0270154e+07
# HELP etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds Bucketed histogram of index compaction pause duration.
# TYPE etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds histogram
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="0.5"} 0
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="1"} 0
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="2"} 0
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="4"} 653
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="8"} 3003
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="16"} 4567
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="32"} 5907
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="64"} 6002
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="128"} 6045
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="256"} 6068
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="512"} 6083
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="1024"} 6085
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_bucket{le="+Inf"} 6085
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_sum 84300
etcd_debugging_mvcc_index_compaction_pause_duration_milliseconds_count 6085
# HELP etcd_debugging_mvcc_keys_total Total number of keys.
# TYPE etcd_debugging_mvcc_keys_total gauge
etcd_debugging_mvcc_keys_total 8141
# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
# TYPE etcd_debugging_mvcc_pending_events_total gauge
etcd_debugging_mvcc_pending_events_total 0
# HELP etcd_debugging_mvcc_put_total Total number of puts seen by this member.
# TYPE etcd_debugging_mvcc_put_total counter
etcd_debugging_mvcc_put_total 6.496934e+06
# HELP etcd_debugging_mvcc_range_total Total number of ranges seen by this member.
# TYPE etcd_debugging_mvcc_range_total counter
etcd_debugging_mvcc_range_total 6.4044228e+07
# HELP etcd_debugging_mvcc_slow_watcher_total Total number of unsynced slow watchers.
# TYPE etcd_debugging_mvcc_slow_watcher_total gauge
etcd_debugging_mvcc_slow_watcher_total 0
# HELP etcd_debugging_mvcc_txn_total Total number of txns seen by this member.
# TYPE etcd_debugging_mvcc_txn_total counter
etcd_debugging_mvcc_txn_total 34212
# HELP etcd_debugging_mvcc_watch_stream_total Total number of watch streams.
# TYPE etcd_debugging_mvcc_watch_stream_total gauge
etcd_debugging_mvcc_watch_stream_total 198
# HELP etcd_debugging_mvcc_watcher_total Total number of watchers.
# TYPE etcd_debugging_mvcc_watcher_total gauge
etcd_debugging_mvcc_watcher_total 12191
# HELP etcd_debugging_server_lease_expired_total The total number of expired leases.
# TYPE etcd_debugging_server_lease_expired_total counter
etcd_debugging_server_lease_expired_total 43687
# HELP etcd_debugging_snap_save_marshalling_duration_seconds The marshalling cost distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_marshalling_duration_seconds histogram
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.001"} 1371
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.002"} 1386
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.004"} 1390
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.008"} 1395
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.016"} 1403
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.032"} 1403
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.064"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.128"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.256"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="0.512"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="1.024"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="2.048"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="4.096"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="8.192"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_bucket{le="+Inf"} 1404
etcd_debugging_snap_save_marshalling_duration_seconds_sum 0.7202120359999999
etcd_debugging_snap_save_marshalling_duration_seconds_count 1404
# HELP etcd_debugging_snap_save_total_duration_seconds The total latency distributions of save called by snapshot.
# TYPE etcd_debugging_snap_save_total_duration_seconds histogram
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.001"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.002"} 0
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.004"} 204
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.008"} 1269
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.016"} 1361
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.032"} 1376
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.064"} 1388
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.128"} 1401
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.256"} 1403
etcd_debugging_snap_save_total_duration_seconds_bucket{le="0.512"} 1404
etcd_debugging_snap_save_total_duration_seconds_bucket{le="1.024"} 1404
etcd_debugging_snap_save_total_duration_seconds_bucket{le="2.048"} 1404
etcd_debugging_snap_save_total_duration_seconds_bucket{le="4.096"} 1404
etcd_debugging_snap_save_total_duration_seconds_bucket{le="8.192"} 1404
etcd_debugging_snap_save_total_duration_seconds_bucket{le="+Inf"} 1404
etcd_debugging_snap_save_total_duration_seconds_sum 9.981503429000005
etcd_debugging_snap_save_total_duration_seconds_count 1404
# HELP etcd_debugging_store_expires_total Total number of expired keys.
# TYPE etcd_debugging_store_expires_total counter
etcd_debugging_store_expires_total 87
# HELP etcd_debugging_store_reads_total Total number of reads action by (get/getRecursive), local to this member.
# TYPE etcd_debugging_store_reads_total counter
etcd_debugging_store_reads_total{action="get"} 884022
etcd_debugging_store_reads_total{action="getRecursive"} 2
# HELP etcd_debugging_store_watch_requests_total Total number of incoming watch requests (new or reestablished).
# TYPE etcd_debugging_store_watch_requests_total counter
etcd_debugging_store_watch_requests_total 0
# HELP etcd_debugging_store_watchers Count of currently active watchers.
# TYPE etcd_debugging_store_watchers gauge
etcd_debugging_store_watchers 0
# HELP etcd_debugging_store_writes_total Total number of writes (e.g. set/compareAndDelete) seen by this member.
# TYPE etcd_debugging_store_writes_total counter
etcd_debugging_store_writes_total{action="create"} 1.679808e+06
etcd_debugging_store_writes_total{action="set"} 13
etcd_debugging_store_writes_total{action="update"} 976637
# HELP etcd_disk_backend_commit_duration_seconds The latency distributions of commit called by backend.
# TYPE etcd_disk_backend_commit_duration_seconds histogram
etcd_disk_backend_commit_duration_seconds_bucket{le="0.001"} 651122
etcd_disk_backend_commit_duration_seconds_bucket{le="0.002"} 2.383396e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.004"} 2.490115e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.008"} 2.647812e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.016"} 4.402987e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.032"} 4.547933e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.064"} 4.582884e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.128"} 4.604896e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.256"} 4.619558e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="0.512"} 4.627624e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="1.024"} 4.62993e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="2.048"} 4.630297e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="4.096"} 4.630498e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="8.192"} 4.630498e+06
etcd_disk_backend_commit_duration_seconds_bucket{le="+Inf"} 4.630498e+06
etcd_disk_backend_commit_duration_seconds_sum 37533.9014823162
etcd_disk_backend_commit_duration_seconds_count 4.630498e+06
# HELP etcd_disk_backend_defrag_duration_seconds The latency distribution of backend defragmentation.
# TYPE etcd_disk_backend_defrag_duration_seconds histogram
etcd_disk_backend_defrag_duration_seconds_bucket{le="0.1"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="0.2"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="0.4"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="0.8"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="1.6"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="3.2"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="6.4"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="12.8"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="25.6"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="51.2"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="102.4"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="204.8"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="409.6"} 0
etcd_disk_backend_defrag_duration_seconds_bucket{le="+Inf"} 0
etcd_disk_backend_defrag_duration_seconds_sum 0
etcd_disk_backend_defrag_duration_seconds_count 0
# HELP etcd_disk_backend_snapshot_duration_seconds The latency distribution of backend snapshots.
# TYPE etcd_disk_backend_snapshot_duration_seconds histogram
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.01"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.02"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.04"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.08"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.16"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.32"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="0.64"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="1.28"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="2.56"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="5.12"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="10.24"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="20.48"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="40.96"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="81.92"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="163.84"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="327.68"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="655.36"} 0
etcd_disk_backend_snapshot_duration_seconds_bucket{le="+Inf"} 0
etcd_disk_backend_snapshot_duration_seconds_sum 0
etcd_disk_backend_snapshot_duration_seconds_count 0
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 1.0088201e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 1.1847247e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 1.2883169e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 1.3124647e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 1.3217888e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 1.326982e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 1.3309067e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 1.333918e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 1.3355548e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 1.3362181e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 1.3363739e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 1.336514e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 1.336593e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 1.3365931e+07
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 1.3365931e+07
etcd_disk_wal_fsync_duration_seconds_sum 27715.61056288543
etcd_disk_wal_fsync_duration_seconds_count 1.3365931e+07
# HELP etcd_grpc_proxy_cache_hits_total Total number of cache hits
# TYPE etcd_grpc_proxy_cache_hits_total gauge
etcd_grpc_proxy_cache_hits_total 0
# HELP etcd_grpc_proxy_cache_keys_total Total number of keys/ranges cached
# TYPE etcd_grpc_proxy_cache_keys_total gauge
etcd_grpc_proxy_cache_keys_total 0
# HELP etcd_grpc_proxy_cache_misses_total Total number of cache misses
# TYPE etcd_grpc_proxy_cache_misses_total gauge
etcd_grpc_proxy_cache_misses_total 0
# HELP etcd_grpc_proxy_events_coalescing_total Total number of events coalescing
# TYPE etcd_grpc_proxy_events_coalescing_total counter
etcd_grpc_proxy_events_coalescing_total 0
# HELP etcd_grpc_proxy_watchers_coalescing_total Total number of current watchers coalescing
# TYPE etcd_grpc_proxy_watchers_coalescing_total gauge
etcd_grpc_proxy_watchers_coalescing_total 0
# HELP etcd_http_failed_total Counter of handle failures of requests (non-watches), by method (GET/PUT etc.) and code (400, 500 etc.).
# TYPE etcd_http_failed_total counter
etcd_http_failed_total{code="404",method="PUT"} 23
etcd_http_failed_total{code="412",method="PUT"} 558904
etcd_http_failed_total{code="500",method="PUT"} 51
# HELP etcd_http_received_total Counter of requests received into the system (successfully parsed and authd).
# TYPE etcd_http_received_total counter
etcd_http_received_total{method="PUT"} 884021
# HELP etcd_http_successful_duration_seconds Bucketed histogram of processing time (s) of successfully handled requests (non-watches), by method (GET/PUT etc.).
# TYPE etcd_http_successful_duration_seconds histogram
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.0005"} 0
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.001"} 1099
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.002"} 154303
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.004"} 286916
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.008"} 310034
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.016"} 315991
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.032"} 317871
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.064"} 319127
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.128"} 320503
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.256"} 321954
etcd_http_successful_duration_seconds_bucket{method="PUT",le="0.512"} 323331
etcd_http_successful_duration_seconds_bucket{method="PUT",le="1.024"} 324295
etcd_http_successful_duration_seconds_bucket{method="PUT",le="2.048"} 324855
etcd_http_successful_duration_seconds_bucket{method="PUT",le="+Inf"} 325043
etcd_http_successful_duration_seconds_sum{method="PUT"} 3839.231835432929
etcd_http_successful_duration_seconds_count{method="PUT"} 325043
# HELP etcd_mvcc_db_total_size_in_bytes Total size of the underlying database physically allocated in bytes.
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 7.7975552e+08
# HELP etcd_mvcc_db_total_size_in_use_in_bytes Total size of the underlying database logically in use in bytes.
# TYPE etcd_mvcc_db_total_size_in_use_in_bytes gauge
etcd_mvcc_db_total_size_in_use_in_bytes 7.75692288e+08
# HELP etcd_mvcc_hash_duration_seconds The latency distribution of storage hash operation.
# TYPE etcd_mvcc_hash_duration_seconds histogram
etcd_mvcc_hash_duration_seconds_bucket{le="0.01"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.02"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.04"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.08"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.16"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.32"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="0.64"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="1.28"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="2.56"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="5.12"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="10.24"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="20.48"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="40.96"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="81.92"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="163.84"} 0
etcd_mvcc_hash_duration_seconds_bucket{le="+Inf"} 0
etcd_mvcc_hash_duration_seconds_sum 0
etcd_mvcc_hash_duration_seconds_count 0
# HELP etcd_network_client_grpc_received_bytes_total The total number of bytes received from grpc clients.
# TYPE etcd_network_client_grpc_received_bytes_total counter
etcd_network_client_grpc_received_bytes_total 1.2383943468e+10
# HELP etcd_network_client_grpc_sent_bytes_total The total number of bytes sent to grpc clients.
# TYPE etcd_network_client_grpc_sent_bytes_total counter
etcd_network_client_grpc_sent_bytes_total 3.538011601e+11
# HELP etcd_network_peer_received_bytes_total The total number of bytes received from peers.
# TYPE etcd_network_peer_received_bytes_total counter
etcd_network_peer_received_bytes_total{From="0"} 1.22542168e+08
etcd_network_peer_received_bytes_total{From="75b98122171add03"} 1.6351628004e+10
etcd_network_peer_received_bytes_total{From="ac589e4da6ba316a"} 1.4384438437e+10
# HELP etcd_network_peer_round_trip_time_seconds Round-Trip-Time histogram between peers.
# TYPE etcd_network_peer_round_trip_time_seconds histogram
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0001"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0002"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0004"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0008"} 17708
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0016"} 54950
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0032"} 57138
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0064"} 58145
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0128"} 58763
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0256"} 59251
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.0512"} 59660
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.1024"} 60029
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.2048"} 60367
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.4096"} 60619
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="0.8192"} 60748
etcd_network_peer_round_trip_time_seconds_bucket{To="75b98122171add03",le="+Inf"} 60982
etcd_network_peer_round_trip_time_seconds_sum{To="75b98122171add03"} 1688.2817830080123
etcd_network_peer_round_trip_time_seconds_count{To="75b98122171add03"} 60982
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0001"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0002"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0004"} 0
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0008"} 7476
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0016"} 55442
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0032"} 60210
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0064"} 60609
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0128"} 60763
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0256"} 60848
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.0512"} 60877
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.1024"} 60879
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.2048"} 60880
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.4096"} 60880
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="0.8192"} 60880
etcd_network_peer_round_trip_time_seconds_bucket{To="ac589e4da6ba316a",le="+Inf"} 60880
etcd_network_peer_round_trip_time_seconds_sum{To="ac589e4da6ba316a"} 72.32885814600004
etcd_network_peer_round_trip_time_seconds_count{To="ac589e4da6ba316a"} 60880
# HELP etcd_network_peer_sent_bytes_total The total number of bytes sent to peers.
# TYPE etcd_network_peer_sent_bytes_total counter
etcd_network_peer_sent_bytes_total{To="75b98122171add03"} 1.2271172445e+10
etcd_network_peer_sent_bytes_total{To="ac589e4da6ba316a"} 1.1019996274e+10
# HELP etcd_network_peer_sent_failures_total The total number of send failures from peers.
# TYPE etcd_network_peer_sent_failures_total counter
etcd_network_peer_sent_failures_total{To="75b98122171add03"} 50318
etcd_network_peer_sent_failures_total{To="ac589e4da6ba316a"} 2527
# HELP etcd_server_go_version Which Go version server is running with. 1 for 'server_go_version' label with current version.
# TYPE etcd_server_go_version gauge
etcd_server_go_version{server_go_version="go1.8.7"} 1
# HELP etcd_server_has_leader Whether or not a leader exists. 1 is existence, 0 is not.
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
# HELP etcd_server_heartbeat_send_failures_total The total number of leader heartbeat send failures (likely overloaded from slow disk).
# TYPE etcd_server_heartbeat_send_failures_total counter
etcd_server_heartbeat_send_failures_total 3718
# HELP etcd_server_is_leader Whether or not this member is a leader. 1 if is, 0 otherwise.
# TYPE etcd_server_is_leader gauge
etcd_server_is_leader 0
# HELP etcd_server_leader_changes_seen_total The number of leader changes seen.
# TYPE etcd_server_leader_changes_seen_total counter
etcd_server_leader_changes_seen_total 412
# HELP etcd_server_proposals_applied_total The total number of consensus proposals applied.
# TYPE etcd_server_proposals_applied_total gauge
etcd_server_proposals_applied_total 7.7210616e+07
# HELP etcd_server_proposals_committed_total The total number of consensus proposals committed.
# TYPE etcd_server_proposals_committed_total gauge
etcd_server_proposals_committed_total 7.7210616e+07
# HELP etcd_server_proposals_failed_total The total number of failed proposals seen.
# TYPE etcd_server_proposals_failed_total counter
etcd_server_proposals_failed_total 140
# HELP etcd_server_proposals_pending The current number of pending proposals to commit.
# TYPE etcd_server_proposals_pending gauge
etcd_server_proposals_pending 0
# HELP etcd_server_quota_backend_bytes Current backend storage quota size in bytes.
# TYPE etcd_server_quota_backend_bytes gauge
etcd_server_quota_backend_bytes 2.147483648e+09
# HELP etcd_server_slow_apply_total The total number of slow apply requests (likely overloaded from slow disk).
# TYPE etcd_server_slow_apply_total counter
etcd_server_slow_apply_total 428628
# HELP etcd_server_slow_read_indexes_total The total number of pending read indexes not in sync with leader's or timed out read index requests.
# TYPE etcd_server_slow_read_indexes_total counter
etcd_server_slow_read_indexes_total 406
# HELP etcd_server_version Which version is running. 1 for 'server_version' label with current version.
# TYPE etcd_server_version gauge
etcd_server_version{server_version="3.2.24"} 1
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 5.8326e-05
go_gc_duration_seconds{quantile="0.25"} 0.000133039
go_gc_duration_seconds{quantile="0.5"} 0.000179446
go_gc_duration_seconds{quantile="0.75"} 0.000446481
go_gc_duration_seconds{quantile="1"} 0.062807492
go_gc_duration_seconds_sum 56.910016608
go_gc_duration_seconds_count 104984
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 2692
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 5.5689764e+08
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 9.485882952296e+12
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 3.524175e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.1040450261e+10
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.8701056e+07
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 5.5689764e+08
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 4.14138368e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 6.15825408e+08
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 3.057442e+06
# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.029963776e+09
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5612088433545754e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 2.1928748e+07
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.1043507703e+10
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 7200
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 8.56672e+06
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 1.3385728e+07
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 6.21747392e+08
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 3.374761e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 1.900544e+07
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 1.900544e+07
# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.10797132e+09
# HELP grpc_server_handled_total Total number of RPCs completed on the server, regardless of success or failure.
# TYPE grpc_server_handled_total counter
grpc_server_handled_total{grpc_code="Canceled",grpc_method="LeaseKeepAlive",grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"} 25
grpc_server_handled_total{grpc_code="Canceled",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 3
grpc_server_handled_total{grpc_code="OK",grpc_method="Compact",grpc_service="etcdserverpb.KV",grpc_type="unary"} 5325
grpc_server_handled_total{grpc_code="OK",grpc_method="DeleteRange",grpc_service="etcdserverpb.KV",grpc_type="unary"} 2
grpc_server_handled_total{grpc_code="OK",grpc_method="LeaseGrant",grpc_service="etcdserverpb.Lease",grpc_type="unary"} 296550
grpc_server_handled_total{grpc_code="OK",grpc_method="MemberList",grpc_service="etcdserverpb.Cluster",grpc_type="unary"} 130232
grpc_server_handled_total{grpc_code="OK",grpc_method="Put",grpc_service="etcdserverpb.KV",grpc_type="unary"} 353661
grpc_server_handled_total{grpc_code="OK",grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 4.7755144e+07
grpc_server_handled_total{grpc_code="OK",grpc_method="Status",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 69027
grpc_server_handled_total{grpc_code="OK",grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 3.182219e+06
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="LeaseGrant",grpc_service="etcdserverpb.Lease",grpc_type="unary"} 1
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="LeaseKeepAlive",grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"} 489549
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 1380
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 64
grpc_server_handled_total{grpc_code="Unavailable",grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 39530
grpc_server_handled_total{grpc_code="Unknown",grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 353
grpc_server_handled_total{grpc_code="Unknown",grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 1
# HELP grpc_server_msg_received_total Total number of RPC stream messages received on the server.
# TYPE grpc_server_msg_received_total counter
grpc_server_msg_received_total{grpc_method="Compact",grpc_service="etcdserverpb.KV",grpc_type="unary"} 5325
grpc_server_msg_received_total{grpc_method="DeleteRange",grpc_service="etcdserverpb.KV",grpc_type="unary"} 2
grpc_server_msg_received_total{grpc_method="LeaseGrant",grpc_service="etcdserverpb.Lease",grpc_type="unary"} 296551
grpc_server_msg_received_total{grpc_method="LeaseKeepAlive",grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"} 489574
grpc_server_msg_received_total{grpc_method="MemberList",grpc_service="etcdserverpb.Cluster",grpc_type="unary"} 130232
grpc_server_msg_received_total{grpc_method="Put",grpc_service="etcdserverpb.KV",grpc_type="unary"} 353661
grpc_server_msg_received_total{grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 4.7756877e+07
grpc_server_msg_received_total{grpc_method="Status",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 69027
grpc_server_msg_received_total{grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 3.182284e+06
grpc_server_msg_received_total{grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 85862
# HELP grpc_server_msg_sent_total Total number of gRPC stream messages sent by the server.
# TYPE grpc_server_msg_sent_total counter
grpc_server_msg_sent_total{grpc_method="Compact",grpc_service="etcdserverpb.KV",grpc_type="unary"} 5325
grpc_server_msg_sent_total{grpc_method="DeleteRange",grpc_service="etcdserverpb.KV",grpc_type="unary"} 2
grpc_server_msg_sent_total{grpc_method="LeaseGrant",grpc_service="etcdserverpb.Lease",grpc_type="unary"} 296550
grpc_server_msg_sent_total{grpc_method="LeaseKeepAlive",grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"} 489564
grpc_server_msg_sent_total{grpc_method="MemberList",grpc_service="etcdserverpb.Cluster",grpc_type="unary"} 130232
grpc_server_msg_sent_total{grpc_method="Put",grpc_service="etcdserverpb.KV",grpc_type="unary"} 353661
grpc_server_msg_sent_total{grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 4.7755144e+07
grpc_server_msg_sent_total{grpc_method="Status",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 69027
grpc_server_msg_sent_total{grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 3.182219e+06
grpc_server_msg_sent_total{grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 1.0284651e+07
# HELP grpc_server_started_total Total number of RPCs started on the server.
# TYPE grpc_server_started_total counter
grpc_server_started_total{grpc_method="Compact",grpc_service="etcdserverpb.KV",grpc_type="unary"} 5325
grpc_server_started_total{grpc_method="DeleteRange",grpc_service="etcdserverpb.KV",grpc_type="unary"} 2
grpc_server_started_total{grpc_method="LeaseGrant",grpc_service="etcdserverpb.Lease",grpc_type="unary"} 296551
grpc_server_started_total{grpc_method="LeaseKeepAlive",grpc_service="etcdserverpb.Lease",grpc_type="bidi_stream"} 489574
grpc_server_started_total{grpc_method="MemberList",grpc_service="etcdserverpb.Cluster",grpc_type="unary"} 130232
grpc_server_started_total{grpc_method="Put",grpc_service="etcdserverpb.KV",grpc_type="unary"} 353661
grpc_server_started_total{grpc_method="Range",grpc_service="etcdserverpb.KV",grpc_type="unary"} 4.7756877e+07
grpc_server_started_total{grpc_method="Status",grpc_service="etcdserverpb.Maintenance",grpc_type="unary"} 69027
grpc_server_started_total{grpc_method="Txn",grpc_service="etcdserverpb.KV",grpc_type="unary"} 3.182284e+06
grpc_server_started_total{grpc_method="Watch",grpc_service="etcdserverpb.Watch",grpc_type="bidi_stream"} 39731
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} 6073.106
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} 8520.396
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} 8520.396
http_request_duration_microseconds_sum{handler="prometheus"} 1.739440468700005e+08
http_request_duration_microseconds_count{handler="prometheus"} 27809
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} 169
http_request_size_bytes{handler="prometheus",quantile="0.9"} 169
http_request_size_bytes{handler="prometheus",quantile="0.99"} 169
http_request_size_bytes_sum{handler="prometheus"} 4.699618e+06
http_request_size_bytes_count{handler="prometheus"} 27809
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 27809
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} 5712
http_response_size_bytes{handler="prometheus",quantile="0.9"} 40058
http_response_size_bytes{handler="prometheus",quantile="0.99"} 40058
http_response_size_bytes_sum{handler="prometheus"} 1.55730785e+08
http_response_size_bytes_count{handler="prometheus"} 27809
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 153755.16
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 371
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.885937664e+09
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55938282766e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.1863625728e+10
From the above:
1.) The db size is only 40M
This is v3; what about v2, do you have v2 data?
$ ETCDCTL_API=2 etcdctl ls -r /
Not sure if you answered the above question. But your metrics also point to using the v2 store.
etcd_http_failed_total{code="412",method="PUT"} 558904
etcd_http_received_total{method="PUT"} 884021
Can you explain the HTTP PUTs in your metrics? Are you using both the v2 and v3 stores? This would explain a lot.
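One way to see which v2 operations those are, assuming the client port is reachable locally, is the v2 store stats endpoint; a high compareAndSwapFail count there would line up with the 412s:
# v2 store operation counters (setsSuccess, compareAndSwapFail, ...)
curl -s http://127.0.0.1:4001/v2/stats/store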
@hexfusion I ran ETCDCTL_API=2 etcdctl ls -r / in my cluster and only got the content below:
ETCDCTL_API=2 etcdctl ls -r /
/mariadb_lock
It seems there is only one key in v2. Will this affect performance?
Can you explain the HTTP Puts in your metrics?
I run a k8s cluster + etcd, and many applications run on it. What do I need to check to answer this question?
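(One check we can try, assuming the v2 API is served on the same endpoint, is watching the v2 keyspace to see what is actually writing to it:)
ETCDCTL_API=2 etcdctl watch --recursive --forever /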
The issue is resolved by changing to a faster disk. Closing it.