I set up a cluster with three machines. The details of my cluster are below:
```
etcdctl --endpoints https://127.0.0.1:2379 --cert-file /etc/ssl/client.pem --key-file /etc/ssl/client-key.pem --ca-file /etc/ssl/ca.pem member list
65a37c12abfb1cf1: name=etcd_2 peerURLs=https://10.32.xx.xx:2380 clientURLs=https://10.32.xx.xx:2379 isLeader=true
7209761f3009b6cb: name=etcd_1 peerURLs=https://10.32.xx.xx:2380 clientURLs=https://10.32.xx.xx:2379 isLeader=false
bffb8bacef83e2c6: name=etcd_1 peerURLs=https://10.32.xx.xx:2380 clientURLs=https://10.32.xx.xx:2379 isLeader=false
```
and the command I used to set up the cluster is:
```
etcd --name etcd_1 --data-dir /opt/etcd_data/ --initial-advertise-peer-urls https://10.32.xx.xx:2380 --listen-peer-urls https://10.32.xx.xx:2380 --listen-client-urls https://10.32.xx.xx:2379,https://127.0.0.1:2379 --advertise-client-urls https://10.32.xx.xx:2379 --initial-cluster-token etcd-cluster-90 --initial-cluster etcd_0=https://10.32.xx.xx:2380,etcd_1=https://10.32.xx.xx:2380,etcd_2=https://10.32.xx.xx:2380 --initial-cluster-state new --cert-file=/etc/ssl/etcd.pem --key-file=/etc/ssl/etcd-key.pem --peer-cert-file=/etc/ssl/etcd.pem --peer-key-file=/etc/ssl/etcd-key.pem --trusted-ca-file=/etc/ssl/ca.pem --peer-trusted-ca-file=/etc/ssl/ca.pem --peer-client-cert-auth=true
```
When I used benchmark to test the performance of my cluster, I got this result:
```
benchmark --endpoints=https://10.32.xx.xx:2379,https://10.32.xx.xx:2379,https://10.32.xx.xx:2379 --conns=1 --clients=1 --cacert /etc/ssl/ca.pem --cert /etc/ssl/client.pem --key /etc/ssl/client-key.pem put --key-size=8 --sequential-keys --total=10000 --val-size=256
0 / 10000 B ! 0.00%INFO: 2018/11/19 09:50:12 ccResolverWrapper: sending new addresses to cc: [{https://10.32.xx.xx:2379 0
10000 / 10000 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 7m19s
Summary:
Total: 439.3978 secs.
Slowest: 0.1934 secs.
Fastest: 0.0202 secs.
Average: 0.0439 secs.
Stddev: 0.0265 secs.
Requests/sec: 22.7584
Response time histogram:
0.0202 [1] |
0.0375 [6664] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.0549 [484] |∎∎
0.0722 [719] |∎∎∎∎
0.0895 [1792] |∎∎∎∎∎∎∎∎∎∎
0.1068 [130] |
0.1241 [8] |
0.1415 [4] |
0.1588 [169] |∎
0.1761 [26] |
0.1934 [3] |
Latency distribution:
10% in 0.0265 secs.
25% in 0.0266 secs.
50% in 0.0269 secs.
75% in 0.0634 secs.
90% in 0.0759 secs.
95% in 0.0859 secs.
99% in 0.1562 secs.
99.9% in 0.1704 secs.
```
The average QPS is only 22.7584.
I used iperf to test the network bandwidth between my hosts; the result is:
```
Client connecting to 10.32.xx.xx, TCP port 5001
[ 3] local 10.32.xx.xx port 42907 connected with 10.32.xx.xx port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.0 GBytes 9.46 Gbits/sec
```
I used hdparm to test my disk:
```
hdparm -Tt /dev/sda5
/dev/sda5:
Timing cached reads: 20444 MB in 2.00 seconds = 10231.69 MB/sec
Timing buffered disk reads: 1444 MB in 3.00 seconds = 481.31 MB/sec
```
I can't find what is wrong. Why is the QPS so poor?
The version of etcd:
```
# etcd --version
etcd Version: 3.3.10
Git SHA: 27fc7e2
Go Version: go1.10.4
Go OS/Arch: linux/amd64
```
```
benchmark --endpoints=https://10.32.xx.xx:2379,https://10.32.xx.xx:2379,https://10.32.xx.xx:2379 --conns=1 --clients=1 --cacert /etc/ssl/ca.pem --cert /etc/ssl/client.pem --key /etc/ssl/client-key.pem put --key-size=8 --sequential-keys --total=10000 --val-size=256
```
@shuangyangqian does adding `--target-leader` improve latency? You can also try curl on the /metrics endpoint; this will give you more details on metrics like fsync. Feel free to attach the metrics as a file and we can take a look.
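Something along these lines should work (just a sketch; the endpoint and certificate paths are copied from your benchmark command above, so adjust them as needed):

```
# pull the Prometheus metrics from one member over HTTPS and save them to a file
curl --cacert /etc/ssl/ca.pem --cert /etc/ssl/client.pem --key /etc/ssl/client-key.pem \
  https://10.32.xx.xx:2379/metrics > metrics.txt

# the WAL fsync histogram is the most interesting part for write latency
grep etcd_disk_wal_fsync_duration_seconds metrics.txt
```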
```
hdparm -Tt /dev/sda5
/dev/sda5:
Timing cached reads: 20444 MB in 2.00 seconds = 10231.69 MB/sec
Timing buffered disk reads: 1444 MB in 3.00 seconds = 481.31 MB/sec
```
My understanding of hdparm, which seems to be confirmed by the output here, is that it tests reads only, whereas a lot of these issues come from write latency.
Most distros have easy access to fio, which gives a lot more detail. Can you post the output of the command below as well?
```
fio --randrepeat=1 \
  --ioengine=libaio \
  --direct=1 \
  --gtod_reduce=1 \
  --name=etcd-disk-io-test \
  --filename=etcd_read_write.io \
  --bs=4k --iodepth=64 --size=4G \
  --readwrite=randrw --rwmixread=75
```
Hi @hexfusion, thank you for your comment.
```
# etcdctl --endpoints https://10.32.3.90:2379 --ca-file /etc/ssl/ca.pem --cert-file /etc/ssl/etcd.pem --key-file /etc/ssl/etcd-key.pem member list
65a37c12abfb1cf1: name=etcd_2 peerURLs=https://10.32.3.92:2380 clientURLs=https://10.32.3.92:2379 isLeader=true
7209761f3009b6cb: name=etcd_1 peerURLs=https://10.32.3.91:2380 clientURLs=https://10.32.3.91:2379 isLeader=false
bffb8bacef83e2c6: name=etcd_1 peerURLs=https://10.32.3.90:2380 clientURLs=https://10.32.3.90:2379 isLeader=false
```
The leader is etcd_2 (10.32.3.92), so I used `--target-leader` to test:
```
# benchmark --endpoints 10.32.3.92:2379 --target-leader --cacert /etc/ssl/ca.pem --cert /etc/ssl/etcd.pem --key /etc/ssl/etcd-key.pem put --key-size 8 --sequential-keys --total=10000 --val-size=256
INFO: 2018/11/20 10:25:06 ccResolverWrapper: sending new addresses to cc: [{https://10.32.3.92:2379 0
10000 / 10000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 6m55s
Summary:
Total: 415.3413 secs.
Slowest: 0.1840 secs.
Fastest: 0.0254 secs.
Average: 0.0415 secs.
Stddev: 0.0258 secs.
Requests/sec: 24.0766
Response time histogram:
0.0254 [1] |
0.0413 [7163] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
0.0571 [223] |∎
0.0730 [465] |∎∎
0.0888 [1935] |∎∎∎∎∎∎∎∎∎∎
0.1047 [22] |
0.1205 [7] |
0.1364 [6] |
0.1523 [6] |
0.1681 [169] |
0.1840 [3] |
Latency distribution:
10% in 0.0260 secs.
25% in 0.0261 secs.
50% in 0.0263 secs.
75% in 0.0626 secs.
90% in 0.0753 secs.
95% in 0.0761 secs.
99% in 0.1564 secs.
99.9% in 0.1614 secs.
```
The performance doesn't seem to change.
I also used curl to get the metrics from my cluster; I attach the file below.
metrics.txt
```
fio --randrepeat=1 \
--ioengine=libaio \
--direct=1 \
--gtod_reduce=1 \
--name=etcd-disk-io-test \
--filename=etcd_read_write.io \
--bs=4k --iodepth=64 --size=4G \
--readwrite=randrw --rwmixread=75
etcd-disk-io-test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
etcd-disk-io-test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][95.9%][r=165MiB/s,w=55.1MiB/s][r=42.2k,w=14.1k IOPS][eta 00m:02s]
etcd-disk-io-test: (groupid=0, jobs=1): err= 0: pid=22263: Tue Nov 20 10:32:11 2018
read: IOPS=16.8k, BW=65.6MiB/s (68.8MB/s)(3070MiB/46767msec)
bw ( KiB/s): min= 7680, max=173464, per=98.96%, avg=66518.55, stdev=52584.69, samples=93
iops : min= 1920, max=43366, avg=16629.62, stdev=13146.18, samples=93
write: IOPS=5616, BW=21.9MiB/s (23.0MB/s)(1026MiB/46767msec)
bw ( KiB/s): min= 2648, max=56952, per=98.97%, avg=22232.52, stdev=17529.32, samples=93
iops : min= 662, max=14238, avg=5558.09, stdev=4382.29, samples=93
cpu : usr=3.94%, sys=19.03%, ctx=250178, majf=0, minf=928
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwt: total=785920,262656,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=65.6MiB/s (68.8MB/s), 65.6MiB/s-65.6MiB/s (68.8MB/s-68.8MB/s), io=3070MiB (3219MB), run=46767-46767msec
WRITE: bw=21.9MiB/s (23.0MB/s), 21.9MiB/s-21.9MiB/s (23.0MB/s-23.0MB/s), io=1026MiB (1076MB), run=46767-46767msec
Disk stats (read/write):
sda: ios=773737/261609, merge=1226/237, ticks=1992810/988287, in_queue=2981641, util=99.98%
```
@shuangyangqian I appreciate the details; a few points to note.
The majority of the records are under 0.032s and a fair number are greater than 0.064s. These seem higher than what we want to see, which is below 0.016s. My guess is this is HDD vs SSD?
```
# HELP etcd_disk_wal_fsync_duration_seconds The latency distributions of fsync called by wal.
# TYPE etcd_disk_wal_fsync_duration_seconds histogram
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.001"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.002"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.004"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.008"} 0
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.016"} 132
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.032"} 47595
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.064"} 56418
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.128"} 57029
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.256"} 57033
etcd_disk_wal_fsync_duration_seconds_bucket{le="0.512"} 57033
etcd_disk_wal_fsync_duration_seconds_bucket{le="1.024"} 57033
etcd_disk_wal_fsync_duration_seconds_bucket{le="2.048"} 57033
etcd_disk_wal_fsync_duration_seconds_bucket{le="4.096"} 57033
etcd_disk_wal_fsync_duration_seconds_bucket{le="8.192"} 57035
etcd_disk_wal_fsync_duration_seconds_bucket{le="+Inf"} 57035
etcd_disk_wal_fsync_duration_seconds_sum 1602.72095319499
etcd_disk_wal_fsync_duration_seconds_count 57035
```
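As a rough way to read that histogram (a small sketch; it assumes the metrics were saved as metrics.txt, the file attached above), the cumulative bucket counts can be turned into percentages:

```
# print each cumulative etcd_disk_wal_fsync_duration_seconds bucket as a percentage of all fsyncs
total=$(grep '^etcd_disk_wal_fsync_duration_seconds_count' metrics.txt | awk '{print $2}')
grep '^etcd_disk_wal_fsync_duration_seconds_bucket' metrics.txt |
  awk -F'[" ]' -v total="$total" '{ printf "<= %6ss : %5.1f%%\n", $2, 100 * $4 / total }'
```

For the numbers above that works out to roughly 0.2% of fsyncs under 0.016s and about 83% under 0.032s.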
```
read: IOPS=16.8k, BW=65.6MiB/s (68.8MB/s)(3070MiB/46767msec)
write: IOPS=5616, BW=21.9MiB/s (23.0MB/s)(1026MiB/46767msec)
```
This is where fio really shines, IMO. Let's compare these numbers to my local SSD:
```
read: IOPS=112k, BW=438MiB/s (459MB/s)(3070MiB/7013msec)
write: IOPS=37.5k, BW=146MiB/s (153MB/s)(1026MiB/7013msec); 0 zone resets
```
This is why you are getting the latency results you see: the I/O bottleneck is the disk, and this is why we recommend SSD over HDD. I hope this explains things a little bit.
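If you want to double-check on your side, two quick things to try (a rough sketch, untested on your machines; the device name comes from the fio disk stats above and the directory from your --data-dir):

```
# 1) check whether sda is a rotational disk (ROTA 1 = HDD, 0 = SSD)
lsblk -d -o NAME,ROTA /dev/sda

# 2) measure synchronous write latency the way etcd's WAL does (fdatasync after each small write);
#    this writes a ~22 MB test file into the given directory
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/opt/etcd_data --size=22m --bs=2300 --name=etcd-fsync-test
```

The fsync/fdatasync latency percentiles fio prints for that job correspond to what etcd_disk_wal_fsync_duration_seconds measures.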
@hexfusion I really appreciate your reply. Next I will check my disk. By the way, could you give me some links explaining etcd's metrics, such as the etcd_disk_wal_fsync_duration_seconds_bucket you mentioned above?
@shuangyangqian Sure, please review the links in https://github.com/etcd-io/etcd/issues/10276 and let us know if you have any further questions.