Considering that we have to migrate to Microsoft Windows and create the conditions to have unit tests in the web server, it is necessary to migrate the connection pool from epoll to something more generic, like libuv, or something more complete, like libwebsockets or libh2o; the last two libraries use libuv as their basis.
This SPIKE maps all the functions and necessary steps to do the migration.
@mfundul, considering the issue that you created, you gave me the idea to write this SPIKE as documentation that will help not only you, but all future contributors adding new features to our web server, and to serve as the road map to migrate the web server and our socket library to any operating system.
The next table maps the relationship between Netdata functions and the respective functions inside LibH2O and LibWebSockets; when there is no relationship between functions, a dash (-) is used.
Something very important when reading the table: do not make a direct one-to-one association between functions. The functions are compared in the sense that they provide the same functionality, but this does not mean that everything inside a Netdata function will be present in the other libraries.
| Socket action | Netdata File | Netdata Function | Lib H2O | Lib Websocket |
| --------------------------- | ----------------- | ------------------------- | ------------ | --------------------- |
| Create Socket | web/server/web_server.c | api_listen_sockets_setup | - | - |
| Create Socket | libnetdata/socket/socket.c | listen_sockets_setup | - | - |
| Create Socket | libnetdata/socket/socket.c | bind_to_this | - | - |
| Create Socket | libnetdata/socket/socket.c | create_listen_socketsX, create_listen_socket_unix | uv_ipX_addr, uv_tcp_bind, uv_listen | lws_create_context, lws_create_vhost |
| Create Socket | libnetdata/socket/socket.c | listen_add_sockets | - | - |
| Threads | web/server/static/static-threaded.c | socket_listen_main_static_threaded | - | - |
| Threads | libnetdata/socket/socket.c | security_start_ssl | OpenSSL API | lws_create_context, lws_create_vhost |
| Threads | libnetdata/threads/threads.c | netdata_thread_create | uv_thread_create | uv_thread_create |
| Threads | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker_cleanup | - | - |
| Threads | web/server/web_client_cache.c | web_client_cache_destroy | - | - |
| Add socket to pool | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker | - | - |
| Add socket to pool | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker | uv_run | - |
| Add socket to pool | libnetdata/socket/socket.c | poll_events | h2o_uv_socket_create | - |
| Accept client connection | libnetdata/socket/socket.c | poll_add_fd | h2o_accept | lws_service |
| Add socket to pool | libnetdata/socket/socket.c | poll_event_process | uv_accept | lws_service |
| Close socket | web/server/static/static-threaded.c | socket_listen_main_static_threaded_cleanup | - | - |
| Close socket | libnetdata/socket/socket.c | listen_sockets_close | - | - |
| Remove socket from poll | web/server/static/static-threaded.c | web_server_file_del_callback | uv_close | - |
| Remove socket from poll | web/server/static/static-threaded.c | web_server_del_callback | uv_close and h2o callback | - |
| Remove socket from poll | web/server/web_client_cache.c | web_client_release | uv_close | lws_service |
| Remove socket from poll | web/server/web_client.c | web_client_request_done | uv_close | lws_service |
| Remove socket from poll | daemon/global_statistics.c | web_client_disconnected | - | - |
| Remove socket from poll | web/server/web_client_cache.c | web_client_free | - | - |
| Web Server Action | Netdata File | Netdata Function | Lib H2O | Lib Websocket |
| --------------------------- | ----------------- | ------------------------- | ------------ | --------------------- |
| Read socket | web/server/static/static-threaded.c | web_server_rcv_callback | PicoHTTPParser library | lws_callback_function |
| Read socket | web/server/web_client.c | web_client_receive | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_process_request | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | http_request_validate | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_is_request_complete | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_valid_method | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_find_protocol | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | http_header_parser | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_split_path_query | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_map_query_string | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_decode_r | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_parse_query_string | PicoHTTPParser library | lws_callback_function |
| Process request | streaming/rrdpush.c | rrdpush_receiver_thread_spawn | A callback to libuv | lws_callback_function |
| Process request | web/server/web_client.c | web_client_process_url | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_switch_host | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | mysendfile | h2o_send | lws_callback_function |
| Process request | web/server/web_client.c | web_client_api_request | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1 | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_info | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_data | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_chart | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/formatters/rrd2json.c | rrd_stats_api_v1_chart | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_charts | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/formatters/charts2json.c | charts2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_registry | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | registry/registry.c | registry_request_* | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/badges/web_buffer_svg.c | web_client_api_request_v1_badge | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarms | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_alarms2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_log | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_alarm_log2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_variables | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | database/rrdvar.c | health_api_v1_chart_variables2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_count | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_aggregate_alarms | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/allmetrics.c | web_client_api_request_v1_allmetrics | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/shell/allmetrics_shell.c | rrd_stats_api_v1_charts_allmetrics_json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/shell/allmetrics_shell.c | rrd_stats_api_v1_charts_allmetrics_shell | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | backends/prometheus/backend_prometheus.c | rrd_stats_api_v1_charts_allmetrics_prometheus_single_host | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | backends/prometheus/backend_prometheus.c | rrd_stats_api_v1_charts_allmetrics_prometheus_all_hosts | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/health/health_cmdapi.c | web_client_api_request_v1_mgmt_health | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | libnetdata/clocks/clocks.c | now_realtime_timeval | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | now_realtime_timeval | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/static/static-threaded.c | web_server_snd_callback | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | web_client_send | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | web_client_send_data | h2o_send_data | lws_http_transaction_completed,lws_callback_http_dummy |
Link dependencies for each library:
- libh2o: `uv ssl crypto z`
- libwebsockets: `uv ssl crypto z cap`
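The dependency lists above translate into link flags roughly like the following (a sketch; library names are the usual ones, exact paths depend on the installation):

```shell
# Hypothetical link lines for small test programs against each library.
H2O_LIBS="-lh2o -luv -lssl -lcrypto -lz"
LWS_LIBS="-lwebsockets -luv -lssl -lcrypto -lz -lcap"

echo "cc test_h2o.c $H2O_LIBS"
echo "cc test_lws.c $LWS_LIBS"
```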
I wrote a simple web server that reads files from a directory and also reads /etc/netdata/netdata.conf; every request is served through a secure TLS channel.
The following problems were found while testing the libraries:
/usr/local/include/h2o/multithread.h includes pthread.h, so libh2o does not compile with mingw-w64 and probably other Windows compilers. We can take some steps to use the libraries and overcome the problems, for example using switch and if statements to avoid wasting time testing all the options; no less important, we can reduce the number of tests for the reasons available here.
To finish this SPIKE, I decided to run a simple benchmark with both servers written in the previous post to compare the results. I wrote a shell script that executes a simple request for the index page through a TLS tunnel in a loop; initially I set only 40 requests, and I got the following results:
LibH2O:
```
real 0m2.047s
user 0m0.376s
sys 0m0.104s
```
libwebsockets:
```
real 0m0.581s
user 0m0.394s
sys 0m0.128s
```
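A minimal sketch of the benchmark loop described above (the URL, port, and structure are assumptions, not the original script):

```shell
#!/bin/sh
# Serial TLS benchmark sketch: one curl request per iteration, no keep-alive.
URL=${URL:-"https://localhost:19999/"}   # hypothetical index page URL
N=${N:-40}                               # number of requests

i=0
while [ "$i" -lt "$N" ]; do
    # -k accepts a self-signed certificate, -s silences progress output;
    # the body is discarded since only timing matters. Connection errors
    # are ignored so the sketch runs anywhere.
    curl -ks "$URL" -o /dev/null || true
    i=$((i + 1))
done
echo "done: $i requests"
```

Timing it with `time sh bench.sh` yields output in the `real`/`user`/`sys` form shown above.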
After this initial result, I decided to move forward with libwebsockets only, and I ran another test, but this time with 10000 requests:
```
real 2m18.088s
user 1m36.917s
sys 0m28.008s
```
Doing simple math, we can see that for a very simple request against an equally simple web server, we are getting around 72 requests per second.
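For reference, the arithmetic behind that figure, using the numbers from the run above:

```shell
# 10000 requests completed in 2m18.088s of wall-clock time.
awk 'BEGIN { total = 2 * 60 + 18.088; printf "%.1f requests/second\n", 10000 / total }'
# prints "72.4 requests/second"
```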
After getting the results from the previous tests, I decided to do one more benchmark: this time I downloaded the dashboard.js present on Sunday, September 29th, using both Netdata and the simple web server linked with libwebsockets, going directly to a loop of 10000 requests:
libwebsockets:
```
real 2m7.123s
user 1m26.926s
sys 0m27.753s
```
Netdata:
```
real 2m19.897s
user 1m31.669s
sys 0m27.150s
```
The fact that Netdata is slower than libwebsockets in this specific example is expected, because Netdata is a large codebase while the test code has only 421 lines; what really matters here is that Netdata was only about 13 seconds slower (2m19.897s vs 2m7.123s).
a simple benchmark with both servers written in the previous post to compare the results, I wrote a shell script that executes a simple request for the index page in a TLS tunnel, but it does this in a loop, initially I set only 40 requests
LibH2O
real 0m2.047s
user 0m0.376s
sys 0m0.104s
It looks strange to me, could you share your benchmark code?
cc @underhood
libh2o should be very fast
https://h2o.examp1e.net/assets/staticfile612-nginx1910-h2o170.png
I agree Ilya, I will talk with him.
This is not a real web server, it is only a simple test. libh2o does not provide a shared library by default, and its binary was 9 times bigger than the libwebsockets binary; this is something to consider for sure.
I did another benchmark; this time I used the compiled h2o web server and adjusted it to use TLS v1.2.
I again requested the file dashboard.js 10000 times on a specific computer, and I got the following results:
```
real 2m5.329s
user 1m27.190s
sys 0m27.122s
```
As we can see, H2O is faster than Netdata.
So the initial benchmark had a problem related to my original code linked with H2O: it inherited an issue from their examples, because they were the basis for the software I built here.
The final decision on which library to use is pending. @thiagoftsm please define the final result of the investigation.
What is the time-frame for the decision here? IMHO this is not a decision to be taken lightly, as it will possibly stay with netdata for a long time.
@thiagoftsm can you share your benchmarks as @ilyam8 mentioned? Maybe create a git repo with them.
Don't worry much about code quality in the tests (the point is to benchmark, not to see flawless code).
The point is that other people can take a look as well, potentially spotting accidental bottlenecks in the benchmarks.
Do you run all tests single-threaded, e.g. 10000 requests one by one? Is each request one TCP connection, or do you use keep-alive?
Hi @underhood ,
We are finishing the SPIKE today, but we can continue working on this during the next 15 days, because I am not planning to bring this into Netdata before then; I have other things to do in the next two weeks.
Initially I ran the requests in sequence, but I will publish more benchmarks run concurrently. Up to this post everything was done with one TCP connection per request; I did not use keep-alive at any time.
I am bringing another benchmark done with libwebsockets, Netdata and H2O. In this benchmark I added an '&' at the end of the curl command line to run the requests in parallel: the script launches 240 processes, sleeps 1 second, and then launches 240 more. Doing these requests, I got the following results for each web server:
libwebsockets:
```
real 1m19.062s
user 2m23.110s
sys 0m40.969s
```
Netdata:
```
real 1m2.021s
user 2m17.259s
sys 0m38.610s
```
H2O:
```
real 1m29.491s
user 2m21.370s
sys 0m37.072s
```
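A sketch of the concurrent loop described above (the batch size comes from the text; the request command is a placeholder so the sketch runs anywhere, while the real script used a backgrounded curl):

```shell
#!/bin/sh
# Concurrent benchmark sketch: launch 240 background requests, sleep 1s, repeat.
# REQUEST_CMD is a placeholder; the real script used something like
#   curl -ks https://localhost:19999/dashboard.js -o /dev/null
REQUEST_CMD=${REQUEST_CMD:-true}
TOTAL=${TOTAL:-480}      # 10000 in the real run; kept small here
BATCH=240                # processes started before each 1-second pause

i=0
while [ "$i" -lt "$TOTAL" ]; do
    j=0
    while [ "$j" -lt "$BATCH" ] && [ "$i" -lt "$TOTAL" ]; do
        $REQUEST_CMD &   # background the request, as with the trailing '&'
        j=$((j + 1)); i=$((i + 1))
    done
    sleep 1              # let the batch drain before starting the next one
done
wait                     # wait for the last batch to finish
echo "issued $i requests"
```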
After running some tests with curl and bash scripts, I decided to use dedicated benchmarking tools. The first test was done with the Apache HTTP server benchmarking tool (ab), and I got the following results:
```
bash-5.0$ ab -t 100 -n 10000 -c 397 "https://localhost:7891/dashboard.js"
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        localhost
Server Port:            7891
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /dashboard.js
Document Length:        387033 bytes

Concurrency Level:      397
Time taken for tests:   18.121 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      3874780000 bytes
HTML transferred:       3870330000 bytes
Requests per second:    551.83 [#/sec] (mean)
Time per request:       719.421 [ms] (mean)
Time per request:       1.812 [ms] (mean, across all concurrent requests)
Transfer rate:          208811.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2  500  745.5    266    7371
Processing:    16  199  110.7    204     517
Waiting:        1   84   65.7    108     332
Total:         31  698  766.2    512    7701

Percentage of the requests served within a certain time (ms)
  50%    512
  66%    546
  75%    613
  80%    704
  90%   1459
  95%   1572
  98%   3282
  99%   3587
 100%   7701 (longest request)
```
```
bash-5.0$ ab -t 100 -n 10000 -c 397 https://localhost:19999/dashboard.js
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        NetData
Server Hostname:        localhost
Server Port:            19999
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /dashboard.js
Document Length:        387033 bytes

Concurrency Level:      397
Time taken for tests:   8.795 seconds
Complete requests:      10000
Failed requests:        131
   (Connect: 0, Receive: 0, Length: 131, Exceptions: 0)
Total transferred:      3847547221 bytes
HTML transferred:       3844057221 bytes
Requests per second:    1137.03 [#/sec] (mean)
Time per request:       349.156 [ms] (mean)
Time per request:       0.879 [ms] (mean, across all concurrent requests)
Transfer rate:          427223.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2  176  345.9     59    1482
Processing:    16  167   57.2    171     340
Waiting:        1   87   48.0     76     250
Total:         19  343  360.7    233    1663

Percentage of the requests served within a certain time (ms)
  50%    233
  66%    261
  75%    292
  80%    323
  90%   1228
  95%   1373
  98%   1411
  99%   1433
 100%   1663 (longest request)
```
```
bash-5.0$ ab -t 100 -n 10000 -c 397 https://localhost:8081/netdata.conf
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        h2o/2.3.0-DEV
Server Hostname:        localhost
Server Port:            8081
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /netdata.conf
Document Length:        251809 bytes

Concurrency Level:      397
Time taken for tests:   12.774 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2520630000 bytes
HTML transferred:       2518090000 bytes
Requests per second:    782.82 [#/sec] (mean)
Time per request:       507.139 [ms] (mean)
Time per request:       1.277 [ms] (mean, across all concurrent requests)
Transfer rate:          192695.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   32   46.6     21     331
Processing:    59  473   92.3    447     741
Waiting:        1    8    9.3      5     223
Total:         64  505  101.4    472     880

Percentage of the requests served within a certain time (ms)
  50%    472
  66%    509
  75%    556
  80%    587
  90%    654
  95%    702
  98%    758
  99%    802
 100%    880 (longest request)
```
I will call attention to the fact that I am using the number 397 in the concurrency option: with a bigger number it is necessary to change the libwebsockets source code to increase the backlog argument passed to the listen() function.
Just a quick comment during my lunch break:
When I was benchmarking my mempool PR I also noticed 2 things:
First I thought it was my PR breaking things, but then I compiled master and saw the same thing.
I am glad I saw someone else notice this problem as well.
I didn't want to report it because I was not yet sure it wasn't my VM breaking things, etc.
Anyhow, the eye of Sauron has noticed already (I have it on my personal todo list of things to investigate).
Additionally, I noticed netdata seems to not close all its threads properly, at least sometimes; not confirmed yet.
Finally, I am bringing a last benchmark from wrk, a tool presented to me by Ilya (thank you very much @ilyam8!).
```
bash-5.0$ wrk -t4 -c397 -d5m https://localhost:7891/dashboard.js
Running 5m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   467.83ms   44.10ms 774.63ms   87.63%
    Req/Sec   212.15     77.76   474.00     73.28%
  253426 requests in 5.00m, 91.45GB read
Requests/sec:    844.53
Transfer/sec:    312.08MB
```
```
bash-5.0$ wrk -t4 -c397 -d5m https://localhost:19999/dashboard.js
Running 5m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   568.10ms  368.39ms   2.00s    72.14%
    Req/Sec    86.66     86.82   600.00     85.78%
  69150 requests in 5.00m, 24.98GB read
  Socket errors: connect 76, read 112, write 0, timeout 679
Requests/sec:    230.47
Transfer/sec:     85.25MB
```
```
bash-5.0$ wrk -t4 -c397 -d5m https://localhost:8081/dashboard.js
Running 5m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   118.57ms   66.72ms 730.86ms   69.25%
    Req/Sec   844.59    341.29     2.19k    67.88%
  1003814 requests in 5.00m, 362.10GB read
Requests/sec:   3345.42
Transfer/sec:      1.21GB
```
In this post I am bringing the simple version of the files used to test and understand the libraries.
The benchmarks with the H2O server prove that H2O has an awesome basis, but it is not simple to port it to Microsoft Windows, and, no less important, it only ships a static library by default, which would increase the Netdata binary size by around 60%. Another problem is that the H2O examples gave me an initially wrong impression of the library, so it is necessary to go deep into the web server code to understand how to use the library properly. I am not saying it is a problem to study the H2O code, but the fact that the small example in the source code is almost 25 times slower than the web server is something pretty bad.
Libwebsockets does not have the same performance that H2O has; on the other hand, it was prepared to run on different operating systems, it has better documentation, and its examples work fine.
Considering the results and the fact that we want to improve Netdata quality, I think moving to either library would take us in the correct direction; but given that the H2O static library raises the binary size a lot when linked and requires explicit calls into libuv, while libwebsockets simplifies everything for us, I think libwebsockets will improve Netdata and simplify the transition to other operating systems.
Apologies for noticing just now, but the benchmark was incorrect: netdata.conf is a static file for the other two servers, but not for netdata, which generates the output. Please repeat the tests with something truly static, like dashboard.js. The difference is enormous, as you can see below:
```
[christopher@chris-msi netdata]$ wrk -t4 -c397 -d5s http://localhost:19999/netdata.conf
Running 5s test @ http://localhost:19999/netdata.conf
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   962.74ms  423.82ms   1.94s    60.81%
    Req/Sec   110.12    111.51   570.00     85.37%
  1879 requests in 5.03s, 1.02GB read
  Socket errors: connect 0, read 0, write 0, timeout 6
Requests/sec:    373.59
Transfer/sec:    207.80MB
```
```
[christopher@chris-msi netdata]$ wrk -t4 -c397 -d5s http://localhost:19999/dashboard.js
Running 5s test @ http://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.88ms   10.45ms  63.96ms   64.11%
    Req/Sec     3.07k   294.27     4.32k    76.56%
  60991 requests in 5.05s, 22.01GB read
Requests/sec:  12073.95
Transfer/sec:      4.36GB
```
I'd also like to see the netdata CPU and memory charts during the execution of the benchmark.
Performance and efficiency are crucial for us; we won't use a library that degrades them.
No problem, I am glad you found the problem, and I am now able to generate the correct results; I could already see in the first round of tests that the Netdata results improved, as expected.
I decided not to create more comments with the newest results; I am updating the previous request-benchmark results to use the current day's dashboard.js, but I saved the old results with netdata.conf. When I finish updating the access benchmark, I will generate a report with the CPU and memory benchmarks.
I used Netdata with dbengine set as the memory mode to measure CPU usage; for sure this is not ideal, because I am using Netdata itself to measure its own performance, but I decided to move forward anyway, because I trust the results, and since we are open source, everybody can verify that there is no bias in them.
This benchmark was done with wrk.
*(charts: CPU usage of each web server during the wrk benchmark)*
The CPU usage charts show a better result for libwebsockets when compared with h2o.
The final benchmark was done again with wrk, making requests to all three web servers; here we are measuring the memory usage of each web server.
*(charts: memory usage of each web server during the wrk benchmark)*
The memory management of both libwebsockets and H2O in this case is done by libuv; H2O is well known for not calling into the kernel many times to allocate memory.
Considering the latest results, I think we cannot discard the possibility of bringing the general ideas from both libraries into the Netdata core instead of bringing in the libraries themselves, but I keep my view that libwebsockets is the best option, mainly considering that h2o failed to compile with mingw.
We can't use those libraries at all. Taking 4 CPU cores just to serve web requests is a big no. The memory benchmark shows horrible performance too. We need to remember that netdata is doing a lot of other things as well, not just serving that static content. You'd need to set it to memory mode RAM and disable all collectors to have really comparable results.
So I say we forget about what those libraries do and incrementally improve netdata.
Actually, I believe we could easily take this one step further. We can add a blog post, and a link to that post from docs/Performance.md, to showcase how performant netdata really is.
With only the apps plugin enabled, and memory mode RAM with the shortest possible history, we can run the benchmarks again and put the results in a blog post. @joelhans can help with this; let's just have another, fair run for netdata with the exact same test. The tests for the other libraries don't need to be repeated.
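A netdata.conf fragment along these lines would match that setup (a sketch; the option names and values are my assumption of the configuration of that era, not an exact file):

```
[global]
    memory mode = ram
    history = 3600

[plugins]
    apps = yes
    # every other collector disabled for the benchmark run
```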
We can use their ideas instead of the libraries, mainly because H2O has a more self-contained URL parser and it does not destroy the original request.
@underhood is also working on something that can improve our web server; our buffer management is something that did not allow Netdata to do better in this benchmark.
@joelhans, if you have any doubts, let me know.
Joel will just wait for the updated performance metrics and charts after you repeat the test on netdata with "only the apps plugin enabled and memory mode RAM with the shortest possible history". Update the ticket after the test and he can prepare the blog post.
@cakrit, I do not agree that high CPU usage is a bad thing in these tests. In a max-load test, if a server is able to finish quicker by better using the available resources, that is better: we want to see it finish X parallel requests as quickly as possible, so if it can use more cores to do that, without getting blocked by IO or by its own threads, so much the better. Of course we can then limit the usage in netdata so it does not take all cores. But the quicker it can process a request the better (and quicker usually means it uses the available CPU power better). Remember, these tests do not simulate how netdata is normally used; they represent the maximum load we are able to generate to stress test it.
So if a server finishes X requests quicker and generates higher CPU load -> GOOD; if it generates higher CPU load but takes the same time for those X requests -> BAD.
@cakrit, I also have to investigate the netdata issue where the web server stops responding under high parallel load; it is something I noticed, and @thiagoftsm has seen something similar in his tests. I will try to find out as much as possible, but you all know my situation until 1st Jan. I think this is something that seriously needs to be investigated.
@cakrit Regarding
So I say we forget about what those libraries do and incrementally improve netdata.
That might be a way to go, but only as long as we want to support just HTTP/1.1; there is HTTP/2 already, and websockets and HTTP/3 are on the way. Therefore using libraries is much more future proof (as the daemon's features grow). Implementing all of that ourselves (following all the standards, testing with all possible browsers/configs, security matters) will IMHO be way too much work, and is best left to people who want to develop web servers, which is not what netdata wants. I would rather netdata devs focus on netdata instead of having their hands full developing yet another HTTP server. I might change my mind if the benchmarks really show that the performance price is way too high, but sacrificing some performance for all those features might actually be a good thing.
So if server is able to finish X requests quicker and generates higher CPU load -> GOOD, if it generates higher CPU load but takes same time for those X requests ->BAD
I'd argue that this assessment only holds true if the ratio of time to CPU load is the same or lower for the faster completion. If we can finish a set number of requests 20% faster but cause 50% more CPU load while doing so, we have not improved; we have actually gotten worse.
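That trade-off can be made concrete: finishing in 0.8x the wall time at 1.5x the CPU load means 1.2x the total cpu-seconds, i.e. 20% more work for the same requests:

```shell
# Relative cpu-seconds = (relative wall time) * (relative CPU load).
awk 'BEGIN { printf "relative cpu-seconds: %.2f\n", 0.8 * 1.5 }'
# prints "relative cpu-seconds: 1.20"
```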
@Ferroin
Yep, naturally; the point I was trying to make is that a library being able to use 100% of each core is not necessarily a bad thing.
I could add a billionth-digit-of-Pi calculation to the request handler and then add some sleeps and mutexes everywhere in between; the CPU load would then look low while actually taking many more CPU cycles and much more time per request.
I looked at the response times and requests/sec after @underhood 's comments and we're comparing apples with oranges again.
The reason the CPU usage is so low, and the response time and throughput so high, compared to the other two, is that the following settings were left at their default values:
```
[web]
    web server threads = 4
    web server max sockets = 512
```
So we should repeat the tests with different values for these two, to get comparable throughput/latency, and then we can look again at the CPU %.
Of course, the reason we started investigating the other libraries was to not have to do everything ourselves. Let's do a proper benchmark and, if the performance hit is acceptable (if it even exists), by all means we go with one of the options.
There is a huge difference between netdata and any normal web server. netdata should not use all available resources, except perhaps when running containerized, with user-set container resource limits. This is a crucial feature and you can see at the performance doc and the netdata for IoT doc how important it is to netdata.
I will work on the benchmarks again today. I have not released the newest versions of the benchmark yet, because I began to see a great number of gaps in some charts when I was using H2O and Apache and raised the number of requests in the benchmark. After some research and a brief talk with @underhood and Ilya, we concluded that this was not a bug: the gaps appeared because the web servers were not allowing other software to get a slice of processor time.
@cakrit Agreed on apples to oranges, and that is why I suggested making a repo with the benchmarks, so it is clearer what is tested and exactly how.
@cakrit Regarding the CPU use: from your response I can see I did not describe what I meant very well. It is clear that Netdata should not use all available resources, and that was not my point at all.
@thiagoftsm: glad my hunch on the gaps got confirmed and I was able to help.
The next group of benchmarks I am bringing here was done on a different computer, to confirm that the results are not related to the hardware. I used the following environment: memory mode = ram, web server threads = 4, and web server max sockets = 512, with dashboard.js from Netdata on the current date served by all the servers during the requests.
Firstly, I am bringing the CPU results:

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   470.02ms  268.54ms   1.88s    69.03%
    Req/Sec     99.10     92.81   660.00    83.52%
  151684 requests in 8.00m, 54.73GB read
  Socket errors: connect 1, read 25, write 0, timeout 0
Requests/sec:    315.95
Transfer/sec:    116.74MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   426.90ms    3.91ms  443.09ms   91.79%
    Req/Sec    245.62    184.92   740.00    60.71%
  444826 requests in 8.00m, 160.52GB read
Requests/sec:    926.60
Transfer/sec:    342.41MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    80.35ms   97.27ms   1.60s    98.38%
    Req/Sec     1.38k    416.42     3.13k   71.05%
  2588120 requests in 8.00m, 0.91TB read
Requests/sec:   5390.83
Transfer/sec:      1.94GB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   120.26ms   31.74ms  493.43ms   70.92%
    Req/Sec    727.09    209.04     2.35k   71.51%
  1386740 requests in 8.00m, 500.25GB read
  Socket errors: connect 0, read 1046044, write 0, timeout 0
Requests/sec:   2888.60
Transfer/sec:      1.04GB
```
The gaps presented in the charts were due to the fact that Netdata could not get processor time every second, because of the high data sampling rate and the number of processes/threads opened by the servers.
I am splitting the results between CPU and memory to avoid a long comment; now I am bringing the memory results of the benchmark:




The next benchmark had the following change in the Netdata configuration:

```
web server threads = 8
```

With double the threads available, we got the following results for CPU:

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   269.93ms  193.37ms   1.86s    71.80%
    Req/Sec    142.96    129.09     0.88k   77.95%
  254251 requests in 8.00m, 91.78GB read
  Socket errors: connect 0, read 123, write 0, timeout 14
Requests/sec:    529.61
Transfer/sec:    195.77MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   419.63ms    3.77ms  433.95ms   82.02%
    Req/Sec    255.00    143.49   717.00    74.21%
  452524 requests in 8.00m, 163.30GB read
Requests/sec:    942.70
Transfer/sec:    348.35MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.40ms   45.14ms   1.33s    89.24%
    Req/Sec     1.37k    400.26     3.12k   72.07%
  2604138 requests in 8.00m, 0.92TB read
Requests/sec:   5424.55
Transfer/sec:      1.96GB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   115.01ms   33.57ms  450.95ms   70.57%
    Req/Sec    771.78    243.84     2.17k   72.46%
  1470998 requests in 8.00m, 530.64GB read
  Socket errors: connect 0, read 915773, write 0, timeout 0
Requests/sec:   3064.10
Transfer/sec:      1.11GB
```
It is important to notice that when we raised the number of web server threads, we got a performance gain in Netdata, but we also got a performance gain in the others. Netdata again did not reach 100% usage of any CPU.
To finish the group of 8 threads, I am bringing the memory results:




The final benchmark of the group had the following change in the Netdata configuration:

```
web server max sockets = 1024
```

With double the sockets available, we got the following results for CPU:

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   571.16ms  286.26ms   1.99s    67.54%
    Req/Sec    163.43    140.91     1.06k   76.35%
  303609 requests in 8.00m, 109.55GB read
  Socket errors: connect 0, read 60, write 0, timeout 15
Requests/sec:    632.39
Transfer/sec:    233.66MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   444.76ms    3.98ms  463.62ms   76.88%
    Req/Sec    225.64     88.06   545.00    77.09%
  426974 requests in 8.00m, 154.08GB read
Requests/sec:    889.35
Transfer/sec:    328.64MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    75.24ms   70.33ms   1.78s    97.76%
    Req/Sec     1.37k    401.26     3.20k   71.74%
  2591233 requests in 8.00m, 0.91TB read
Requests/sec:   5397.36
Transfer/sec:      1.95GB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    90.67ms   25.21ms  325.89ms   72.77%
    Req/Sec     1.04k    286.85     2.12k   67.71%
  1991809 requests in 8.00m, 718.50GB read
  Socket errors: connect 0, read 165484, write 0, timeout 0
Requests/sec:   4148.77
Transfer/sec:      1.50GB
```
Finally, I am bringing the memory results for 1024 sockets:




Hi,
@thiagoftsm I do think that to be able to interpret the CPU results we __absolutely need the time__ it took each proposed solution to complete the same amount of requests (therefore I would drop the time-limit flag from wrk).
I would imagine something like this:
All web servers were given 1 000 000 requests to handle.
Each request is serving the same static file. Keep alive is off.
Each server was running on 4 threads, client was running on 4 threads in parallel.
Following are the results:
| Server | !! Time total to serve 1mil requests !! | avg. CPU load | max. mem | avg. req/s | avg. time/request |
| --- | --- | --- | --- | --- | --- |
| Netdata | ... | 30% | ... | ... | ... |
| websocket | ... | 200% | ... | ... | ... |
The point I am trying to convey is that if, for example, websockets uses 400% CPU (all four cores) but is able to finish 10x quicker, it is more effective (it uses fewer resources per single request handled, which is IMHO what really matters). But if we just see the CPU load and not the time it took to handle all the requests, we cannot really know.
The second thing to consider is having the server run on just a single thread, with X clients in parallel and the same table as for the previous test. This will let us see how effective each individual thread is, as opposed to how well the server scales with the number of threads.
Hi @underhood ,
I used the Netdata charts to show how the processor and memory behave over time; an average gives us a central value that won't express this exactly. For example, see the latest chart for websockets: the average will give us a value higher than 72 due to the extremes, but for a long time it did not need more than 72MB.
To get an exact number of 1M requests, I will need to use ab instead of wrk.
We used only one file; I will update the description, thanks for the reminder!
That's not @underhood's point, it's that we need to see the results of the wrk to compare latency and throughput.
Repeating what I said, with emphasis to clarify:
So we should repeat the tests with different values in these two, to get comparable throughput/latency and then we can look again at the CPU %.
After getting the results with the other benchmark, I decided to freeze the max sockets value at 2048 and increase the web server threads; the following results were obtained for Netdata.

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   267.74ms  180.09ms   1.53s    69.21%
    Req/Sec    372.22    237.46     1.17k   61.07%
  710387 requests in 8.00m, 256.33GB read
  Socket errors: connect 0, read 12, write 0, timeout 0
Requests/sec:   1479.67
Transfer/sec:    546.72MB
```

```
thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   292.11ms  215.76ms   1.87s    68.19%
    Req/Sec    341.46    203.70     1.21k   62.78%
  652749 requests in 8.00m, 235.53GB read
  Socket errors: connect 0, read 20, write 0, timeout 0
Requests/sec:   1359.62
Transfer/sec:    502.36MB
```

Hello everyone,
After a talk with @cakrit today, we decided that next sprint we will begin to test libh2o; I will try to trim its static library down to what we really need from it.
I am also bringing a table with all the previous results:

Considering that we have defined the road to take, and that I will start on it in the next sprint, I am closing this.
Based on the tests above, the suggestion is indeed to move with libh2o and try to reduce the static library's footprint. If there are any other opinions, please comment so we can close.
As already discussed with @thiagoftsm, I'm absolutely fine with H2O if we can find a way to minimize the amount of code we'll use from H2O, as the lib is massive.
For benchmarking there is a good tool available: https://github.com/httperf/httperf
This produces quite fine-grained statistics and ramps up to a high load-level.
The picohttpparser looks like a good choice for performance, but the code is quite subtle and intricate — so I wonder, how do we handle the dependency?
Netdata has already brought code from other libraries into our tree, keeping the reference to the original developers; you can see this at the top of the file libnetdata/avl/avl.c, so we would not have problems with this.
Do we have a list of which parts of h2o need to be pulled into netdata and which parts we do not need?
Also, the separate repo for picohttpparser does not look as if it matches the pair of source files from the larger h2o repo — do we need to check that the calling conventions and structures are the same? My concern is that if I pull in picohttpparser in parallel with the integration of the h2o library, we need to end up with compatible code.
Right now Netdata only works with HTTP/1.1 and our stream, which is a kind of HTTP method, so our goal for now is to bring in the HTTP parts, and we will probably need websockets too.
I think it would be good to use the picohttpparser bundled with libh2o, because this will help us bring in the rest, in case the two versions are different.
I took another look at h2o - they pull in the picohttpparser directly into their deps/ directory and then keep a copy in their source tree. I think we can do the same.
I agree 100% with you.
Is this ready to close?
I think so, we will move with libh2o.
Agreed.