Caddy version (caddy -version): v0.8.3 / v0.9.0
OS: Windows Server 2012 R2, running on an Azure 'Standard D3 v2' instance (4 cores, 14 GB memory)
Goal: measure the performance of the caddy reverse proxy middleware using Apache Bench.
Caddyfile_proxy:
http:// {
    errors CaddyErrors.log
    header / {
        -Server
    }
    proxy / localhost:580 {
        proxy_header Host {host}
        proxy_header X-Real-IP {remote}
        proxy_header X-Forwarded-Proto {scheme}
    }
}
Caddyfile_upstream:
http://:580 {
    errors CaddyErrors_upstream.log
    header / {
        -Server
    }
    root WebRoot
}
The WebRoot folder contains the file index.html:
<!DOCTYPE html><html><head><meta charset="utf-8"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="width=device-width,initial-scale=1"><title>1234567 12345</title></head><body></body></html>
The two instances are started with:
caddy.exe -conf=Caddyfile_proxy
and then, in a new shell:
caddy.exe -conf=Caddyfile_upstream
Rationale
We would like to use caddy as a simple reverse proxy in front of one of our backend services. To measure the proxy's performance, I used Apache Bench and got suboptimal results. For the benchmark I created a setup with one caddy instance acting as the proxy and another caddy instance representing the backend (upstream).
To establish a baseline for caddy's performance, I ran ab.exe -n 1000000 -c 1000 -k http://remote-machine-running-caddy:580/ from another machine, hitting the upstream instance (caddy v0.8.3) directly. I got the following result (one of three runs; the other runs were very similar):
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Server Software:
Server Hostname: ***
Server Port: 580
Document Path: /
Document Length: 224 bytes
Concurrency Level: 1000
Time taken for tests: 90.600 seconds
Complete requests: 1000000
Failed requests: 0
Keep-Alive requests: 1000000
Total transferred: 433000000 bytes
HTML transferred: 224000000 bytes
Requests per second: 11037.47 [#/sec] (mean)
Time per request: 90.600 [ms] (mean)
Time per request: 0.091 [ms] (mean, across all concurrent requests)
Transfer rate: 4667.21 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 18
Processing: 0 90 15.3 94 692
Waiting: 0 90 15.3 94 692
Total: 0 90 15.3 94 692
Percentage of the requests served within a certain time (ms)
50% 94
66% 94
75% 94
80% 94
90% 109
95% 109
98% 125
99% 125
100% 692 (longest request)
Then I started the proxy and ran ab.exe -n 1000000 -c 1000 -k http://remote-machine-running-caddy/. I got the following result (again one of three runs):
Server Software:
Server Hostname: ***
Server Port: 80
Document Path: /
Document Length: 224 bytes
Concurrency Level: 1000
Time taken for tests: 236.380 seconds
Complete requests: 1000000
Failed requests: 0
Keep-Alive requests: 1000000
Total transferred: 448000000 bytes
HTML transferred: 224000000 bytes
Requests per second: 4230.47 [#/sec] (mean)
Time per request: 236.380 [ms] (mean)
Time per request: 0.236 [ms] (mean, across all concurrent requests)
Transfer rate: 1850.83 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 20
Processing: 0 236 1088.9 203 62405
Waiting: 0 236 1088.9 203 62405
Total: 0 236 1088.9 203 62405
Percentage of the requests served within a certain time (ms)
50% 203
66% 203
75% 204
80% 219
90% 234
95% 250
98% 283
99% 328
100% 62405 (longest request)
I re-ran the tests against Caddy v0.9.0 and got even worse results. Without proxy:
Server Software:
Server Hostname: ***
Server Port: 580
Document Path: /
Document Length: 224 bytes
Concurrency Level: 1000
Time taken for tests: 147.463 seconds
Complete requests: 1000000
Failed requests: 0
Keep-Alive requests: 1000000
Total transferred: 456000000 bytes
HTML transferred: 224000000 bytes
Requests per second: 6781.37 [#/sec] (mean)
Time per request: 147.463 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 3019.83 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 18
Processing: 0 147 65.4 141 1419
Waiting: 0 147 65.4 141 1419
Total: 0 147 65.4 141 1419
Percentage of the requests served within a certain time (ms)
50% 141
66% 156
75% 172
80% 177
90% 205
95% 250
98% 322
99% 368
100% 1419 (longest request)
Unfortunately I could not get a clean run with the proxy, as most requests failed (see Failed requests below), most likely because of issue #938:
Server Software:
Server Hostname: ***
Server Port: 80
Document Path: /
Document Length: 224 bytes
Concurrency Level: 1000
Time taken for tests: 90.625 seconds
Complete requests: 1000000
Failed requests: 970161
(Connect: 0, Receive: 0, Length: 970161, Exceptions: 0)
Non-2xx responses: 970161
Keep-Alive requests: 1000000
Total transferred: 207116208 bytes
HTML transferred: 22206512 bytes
Requests per second: 11034.47 [#/sec] (mean)
Time per request: 90.625 [ms] (mean)
Time per request: 0.091 [ms] (mean, across all concurrent requests)
Transfer rate: 2231.85 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 19
Processing: 0 90 186.2 49 3207
Waiting: 0 90 186.2 49 3207
Total: 0 90 186.2 49 3207
Percentage of the requests served within a certain time (ms)
50% 49
66% 94
75% 95
80% 109
90% 125
95% 150
98% 406
99% 970
100% 3207 (longest request)
Is this kind of performance expected?
I realised the poor performance might be due to CPU contention between the two caddy instances, so I repeated the tests with caddy_upstream running on a separate machine. The results are better, but the performance drop is still about 50%.
Baseline results for caddy_upstream v0.8.3 without the proxy, command .\ab.exe -n 50000 -c 1000 -k http://remote-machine-upstream/ (note the reduced request count because of issue #938):
Concurrency Level: 1000
Time taken for tests: 5.597 seconds
Complete requests: 50000
Failed requests: 0
Keep-Alive requests: 50000
Total transferred: 21650000 bytes
HTML transferred: 11200000 bytes
Requests per second: 8932.93 [#/sec] (mean)
Time per request: 111.945 [ms] (mean)
Time per request: 0.112 [ms] (mean, across all concurrent requests)
Transfer rate: 3777.30 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 16
Processing: 0 106 33.6 109 535
Waiting: 0 106 33.6 109 535
Total: 0 106 33.6 109 535
Percentage of the requests served within a certain time (ms)
50% 109
66% 109
75% 109
80% 109
90% 109
95% 110
98% 125
99% 267
100% 535 (longest request)
And results with proxy, command .\ab.exe -n 50000 -c 1000 -k http://remote-machine-running-caddy/:
Concurrency Level: 1000
Time taken for tests: 11.020 seconds
Complete requests: 50000
Failed requests: 0
Keep-Alive requests: 50000
Total transferred: 22400000 bytes
HTML transferred: 11200000 bytes
Requests per second: 4537.37 [#/sec] (mean)
Time per request: 220.392 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 1985.10 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 16
Processing: 16 162 92.4 156 3304
Waiting: 16 162 92.4 156 3304
Total: 16 162 92.4 156 3304
Percentage of the requests served within a certain time (ms)
50% 156
66% 187
75% 203
80% 207
90% 240
95% 273
98% 375
99% 461
100% 3304 (longest request)
Edit: results for caddy v0.9.0:
Baseline without proxy:
Concurrency Level: 1000
Time taken for tests: 8.428 seconds
Complete requests: 50000
Failed requests: 0
Keep-Alive requests: 50000
Total transferred: 22800000 bytes
HTML transferred: 11200000 bytes
Requests per second: 5932.95 [#/sec] (mean)
Time per request: 168.550 [ms] (mean)
Time per request: 0.169 [ms] (mean, across all concurrent requests)
Transfer rate: 2642.02 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 16
Processing: 0 162 81.0 156 2728
Waiting: 0 162 81.0 156 2728
Total: 0 162 81.0 156 2728
Percentage of the requests served within a certain time (ms)
50% 156
66% 172
75% 188
80% 203
90% 235
95% 285
98% 358
99% 404
100% 2728 (longest request)
And with proxy:
Concurrency Level: 1000
Time taken for tests: 26.484 seconds
Complete requests: 50000
Failed requests: 0
Keep-Alive requests: 50000
Total transferred: 23550000 bytes
HTML transferred: 11200000 bytes
Requests per second: 1887.95 [#/sec] (mean)
Time per request: 529.675 [ms] (mean)
Time per request: 0.530 [ms] (mean, across all concurrent requests)
Transfer rate: 868.38 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 19.0 0 3009
Processing: 16 404 653.0 313 12093
Waiting: 16 404 653.0 313 12093
Total: 16 404 653.6 313 12093
Percentage of the requests served within a certain time (ms)
50% 313
66% 359
75% 394
80% 418
90% 473
95% 531
98% 740
99% 3317
100% 12093 (longest request)
Ping between the bench, proxy, and upstream machines is at most 2 ms.
Mind using https://github.com/wg/wrk or https://github.com/rakyll/boom?
@abiosoft Thank you for suggesting other tools; I know ab is not exactly state of the art. I repeated the tests with boom (again with the proxy and upstream on different machines) and the relative performance difference looks about the same.
Baseline for Caddy v0.8.3 without proxy (.\boom.exe -n 50000 -c 1000 http://remote-machine-upstream/):
Summary:
Total: 5.5015 secs
Slowest: 5.2515 secs
Fastest: 0.0000 secs
Average: 0.1024 secs
Requests/sec: 9088.4154
Total data: 11200000 bytes
Size/request: 224 bytes
Status code distribution:
[200] 50000 responses
Response time histogram:
0.000 [1536] |∎
0.525 [48399] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.050 [7] |
1.575 [6] |
2.101 [5] |
2.626 [6] |
3.151 [5] |
3.676 [5] |
4.201 [5] |
4.726 [5] |
5.252 [21] |
Latency distribution:
10% in 0.0625 secs
25% in 0.0937 secs
50% in 0.0938 secs
75% in 0.1094 secs
90% in 0.1108 secs
95% in 0.1406 secs
99% in 0.2812 secs
And with proxy (.\boom.exe -n 50000 -c 1000 http://remote-machine-with-caddy/):
Summary:
Total: 11.3033 secs
Slowest: 9.0447 secs
Fastest: 0.0000 secs
Average: 0.1403 secs
Requests/sec: 4423.4731
Total data: 11200000 bytes
Size/request: 224 bytes
Status code distribution:
[200] 50000 responses
Response time histogram:
0.000 [1790] |∎
0.904 [47981] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.809 [0] |
2.713 [0] |
3.618 [13] |
4.522 [215] |
5.427 [0] |
6.331 [0] |
7.236 [0] |
8.140 [0] |
9.045 [1] |
Latency distribution:
10% in 0.0312 secs
25% in 0.0781 secs
50% in 0.1250 secs
75% in 0.1719 secs
90% in 0.2031 secs
95% in 0.2343 secs
99% in 0.3906 secs
Baseline for Caddy v0.9.0 without proxy:
Summary:
Total: 10.1248 secs
Slowest: 9.8588 secs
Fastest: 0.0000 secs
Average: 0.1360 secs
Requests/sec: 4938.3901
Total data: 11200000 bytes
Size/request: 224 bytes
Status code distribution:
[200] 50000 responses
Response time histogram:
0.000 [5909] |∎∎∎∎∎
0.986 [43676] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.972 [10] |
2.958 [47] |
3.944 [38] |
4.929 [7] |
5.915 [67] |
6.901 [31] |
7.887 [93] |
8.873 [0] |
9.859 [122] |
Latency distribution:
25% in 0.0156 secs
50% in 0.0781 secs
75% in 0.1250 secs
90% in 0.1562 secs
95% in 0.1875 secs
99% in 0.3947 secs
And with proxy:
Summary:
Total: 23.5891 secs
Slowest: 11.4876 secs
Fastest: 0.0000 secs
Average: 0.2954 secs
Requests/sec: 2119.6269
Total data: 11200000 bytes
Size/request: 224 bytes
Status code distribution:
[200] 50000 responses
Response time histogram:
0.000 [642] |
1.149 [48942] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
2.298 [96] |
3.446 [26] |
4.595 [46] |
5.744 [27] |
6.893 [12] |
8.041 [15] |
9.190 [15] |
10.339 [15] |
11.488 [164] |
Latency distribution:
10% in 0.0797 secs
25% in 0.1718 secs
50% in 0.2344 secs
75% in 0.3125 secs
90% in 0.4062 secs
95% in 0.4687 secs
99% in 0.7049 secs
Does https://github.com/mholt/caddy/pull/880 play any role in this?
@tomasdeml
With #984, you should be able to pass the ab benchmarks by increasing the keepalive directive in your proxy. By default it is 2, and you should increase this depending on how many concurrent connections you expect. I'm not 100% sure what the correct value is here (I'm not super familiar with the pooling code in net/http/transport.go) so it will take some testing.
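For illustration, here is a sketch of what the proxy block from Caddyfile_proxy might look like with a raised keepalive, assuming the keepalive subdirective syntax introduced by #984; the value 1000 simply matches the benchmark's concurrency level and is a guess, not a tested recommendation:

http:// {
    errors CaddyErrors.log
    proxy / localhost:580 {
        proxy_header Host {host}
        proxy_header X-Real-IP {remote}
        proxy_header X-Forwarded-Proto {scheme}
        keepalive 1000  # assumed syntax: max idle upstream connections
    }
}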
Feel free to reopen this - if you do, I'd also like to see some tests with nginx on the same hardware.
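For context on where the default of 2 comes from: Go's net/http exports it as http.DefaultMaxIdleConnsPerHost, and a transport that does not override MaxIdleConnsPerHost keeps at most that many idle keep-alive connections per upstream host, closing the rest; under 1000 concurrent clients, a proxy on the default transport must open a fresh upstream connection for most requests. A minimal runnable sketch (the value 1000 is illustrative, not a tuned recommendation):

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Idle (keep-alive) connections kept per upstream host by default.
	fmt.Println(http.DefaultMaxIdleConnsPerHost) // prints: 2

	// A transport sized for the benchmark's concurrency level.
	transport := &http.Transport{MaxIdleConnsPerHost: 1000}
	client := &http.Client{Transport: transport}
	_ = client // use this client for proxied upstream requests
}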