Netdata: SPIKE Libraries to change sockets and webserver

Created on 12 Sep 2019 · 55 Comments · Source: netdata/netdata

Feature idea summary

Given the need to port Netdata to Microsoft Windows and to make the web server unit-testable, we have to migrate the connection pool from epoll to something more generic, such as libuv, or something more complete, such as libwebsockets or libh2o; the latter two libraries use libuv as their basis.

This SPIKE will map all the functions and necessary steps for the migration.

@mfundul, considering the issue that you created, you gave me the idea to write this spike as documentation that will help not only you, but all future contributors who add new features to our web server.

Expected behavior

To have a road map for migrating the web server and our socket library to any operating system.

feature request

Most helpful comment

@cakrit Regarding

So I say we forget about what those libraries do and incrementally improve netdata.

That might be a way to go, but only as long as we want to support just HTTP/1.1; there is HTTP/2 already, plus websockets and HTTP/3 on the way. Therefore using libraries is much more future-proof as the daemon's features grow. Implementing all of that (following all the standards, testing with all possible browsers/configs, handling security matters) will be far too much work IMHO to be done by Netdata, and is best left to people who want to develop web servers, which is not what Netdata wants. I would rather the Netdata devs focus on Netdata instead of having their hands full developing yet another HTTP server. I might change my mind if benchmarks really show that the performance price is way too high, but sacrificing some performance for all those features might actually be a good thing.

All 55 comments

The next table maps the relationship between Netdata functions and the corresponding functions inside LibH2O and LibWebSockets. When there is no relationship between functions, a dash (-) is used.

Something very important when reading the table: do not make a direct association between functions. The functions are compared in the sense that they provide the same functionality, but this does not mean that everything inside a Netdata function will be present in the other libraries.

Sockets

| Socket action | Netdata File | Netdata Function | Lib H2O | Lib Websocket |
| --------------------------- | ----------------- | ------------------------- | ------------ | --------------------- |
| Create Socket | web/server/web_server.c | api_listen_sockets_setup | - | - |
| Create Socket | libnetdata/socket/socket.c | listen_sockets_setup | - | - |
| Create Socket | libnetdata/socket/socket.c | bind_to_this | - | - |
| Create Socket | libnetdata/socket/socket.c | create_listen_socketsX, create_listen_socket_unix | uv_ipX_addr, uv_tcp_bind, uv_listen | lws_create_context, lws_create_vhost |
| Create Socket | libnetdata/socket/socket.c | listen_add_sockets | - | - |
| Threads | web/server/static/static-threaded.c | socket_listen_main_static_threaded | - | - |
| Threads | libnetdata/socket/socket.c | security_start_ssl | OpenSSL API | lws_create_context, lws_create_vhost |
| Threads | libnetdata/threads/threads.c | netdata_thread_create | uv_thread_create | uv_thread_create |
| Threads | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker_cleanup | - | - |
| Threads | web/server/web_client_cache.c | web_client_cache_destroy | - | - |
| Add socket to pool | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker | - | - |
| Add socket to pool | web/server/static/static-threaded.c | socket_listen_main_static_threaded_worker | uv_run | - |
| Add socket to pool | libnetdata/socket/socket.c | poll_events | h2o_uv_socket_create | - |
| Accept client connection | libnetdata/socket/socket.c | poll_add_fd | h2o_accept | lws_service |
| Add socket to pool | libnetdata/socket/socket.c | poll_event_process | uv_accept | lws_service |
| Close socket | web/server/static/static-threaded.c | socket_listen_main_static_threaded_cleanup | - | - |
| Close socket | libnetdata/socket/socket.c | listen_sockets_close | - | - |
| Remove socket from poll | web/server/static/static-threaded.c | web_server_file_del_callback | uv_close | - |
| Remove socket from poll | web/server/static/static-threaded.c | web_server_del_callback | uv_close and h2o callback | - |
| Remove socket from poll | web/server/web_client_cache.c | web_client_release | uv_close | lws_service |
| Remove socket from poll | web/server/web_client.c | web_client_request_done | uv_close | lws_service |
| Remove socket from poll | daemon/global_statistics.c | web_client_disconnected | - | - |
| Remove socket from poll | web/server/web_client_cache.c | web_client_free | - | - |

Process Request

| Web Server Action | Netdata File | Netdata Function | Lib H2O | Lib Websocket |
| --------------------------- | ----------------- | ------------------------- | ------------ | --------------------- |
| Read socket | web/server/static/static-threaded.c | web_server_rcv_callback | PicoHTTPParser library | lws_callback_function |
| Read socket | web/server/web_client.c | web_client_receive | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_process_request | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | http_request_validate | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_is_request_complete | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_valid_method | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_find_protocol | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | http_header_parser | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_split_path_query | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_map_query_string | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_decode_r | PicoHTTPParser library | lws_callback_function |
| Process request | libnetdata/url/url.c | url_parse_query_string | PicoHTTPParser library | lws_callback_function |
| Process request | streaming/rrdpush.c | rrdpush_receiver_thread_spawn | A callback to libuv | lws_callback_function |
| Process request | web/server/web_client.c | web_client_process_url | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | web_client_switch_host | PicoHTTPParser library | lws_callback_function |
| Process request | web/server/web_client.c | mysendfile | h2o_send | lws_callback_function |
| Process request | web/server/web_client.c | web_client_api_request | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1 | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_info | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_data | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_chart | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/formatters/rrd2json.c | rrd_stats_api_v1_chart | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_charts | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/formatters/charts2json.c | charts2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_registry | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | registry/registry.c | registry_request_* | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/badges/web_buffer_svg.c | web_client_api_request_v1_badge | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarms | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_alarms2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_log | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_alarm_log2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_variables | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | database/rrdvar.c | health_api_v1_chart_variables2json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/web_api_v1.c | web_client_api_request_v1_alarm_count | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | health/health_json.c | health_aggregate_alarms | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/allmetrics.c | web_client_api_request_v1_allmetrics | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/shell/allmetrics_shell.c | rrd_stats_api_v1_charts_allmetrics_json | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/exporters/shell/allmetrics_shell.c | rrd_stats_api_v1_charts_allmetrics_shell | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | backends/prometheus/backend_prometheus.c | rrd_stats_api_v1_charts_allmetrics_prometheus_single_host | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | backends/prometheus/backend_prometheus.c | rrd_stats_api_v1_charts_allmetrics_prometheus_all_hosts | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | web/api/health/health_cmdapi.c | web_client_api_request_v1_mgmt_health | h2o_config_register_path,h2o_create_handler | lws_callback_function |
| Process request | libnetdata/clocks/clocks.c | now_realtime_timeval | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | now_realtime_timeval | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/static/static-threaded.c | web_server_snd_callback | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | web_client_send | h2o_send | lws_http_transaction_completed,lws_callback_http_dummy |
| Process request | web/server/web_client.c | web_client_send_data | h2o_send_data | lws_http_transaction_completed,lws_callback_http_dummy |

Additional dependencies

libh2o: uv ssl crypto z
libwebsockets: uv ssl crypto z cap

I wrote a simple web server that reads files from a directory and also reads /etc/netdata/netdata.conf; every request is served over a secure TLS channel.

Problems

The following problems were found in the test with the libraries:

Libh2o

  • Libh2o has an explicit include of pthread.h inside /usr/local/include/h2o/multithread.h, so mingw-w64 (and probably other Windows compilers) cannot compile it.
  • The default build of H2O only generates a static library; because of this, the webserver I am using to test the library is 2.8 MB.
  • Poor documentation and few examples that do not demonstrate the real potential of H2O.

Libwebsockets

  • We need to link against more libraries than with Libh2o.
  • Libwebsockets has different functions that work together with the lws_callback_function, but the majority of the work is done using only one function.

Solutions for the problems

We can do the following steps to use the libraries and overcome the problems:

Libh2o

  • We can help h2o development and create PRs to help them support Microsoft Windows.
  • We need to change the make rules to generate shared libraries.
  • A common solution for both cases is to bring code from the H2O web server directly into Netdata, respecting the MIT license.
  • They have a complete web server developed with the library, so we can study it.

Libwebsockets

  • For the first problem there is not much we can do; we simply need to link against them. Another possibility would be to copy parts of the code and incorporate them into ours.
  • We can better organize the switch and if statements to avoid wasting time testing all the options; no less important, we can reduce the number of tests related to the reasons available here.

To finish this SPIKE I decided to run a simple benchmark with both servers described in the previous post. I wrote a shell script that executes a simple request for the index page through a TLS tunnel in a loop. Initially I set only 40 requests, and I got the following results:

LibH2O

real    0m2.047s
user    0m0.376s
sys     0m0.104s

Libwebsockets

real    0m0.581s
user    0m0.394s
sys     0m0.128s

After this initial result, I decided to move forward with libwebsockets only, and I ran another test, this time with 10000 requests:

Libwebsockets

real    2m18.088s
user    1m36.917s
sys     0m28.008s

Doing some simple math, we can see that for a very simple request with an equally simple web server, we are getting around 72 requests per second.

After getting the results from the previous tests, I decided to do one more benchmark: this time downloading the dashboard.js present on Sunday, September 29th, using both Netdata and the simple web server linked with libwebsockets. This time I went directly to the loop with 10000 requests:

Libwebsockets

real    2m7.123s
user    1m26.926s
sys     0m27.753s

Netdata

real    2m19.897s
user    1m31.669s
sys     0m27.150s

The fact that Netdata is slower than libwebsockets in this specific example is expected, because Netdata is a big codebase, while the test code has only 421 lines. What really matters here is that Netdata was only about 13 seconds slower.

a simple benchmark with both servers written in the previous post to compare the results, I wrote a shell script that executes a simple request for the index page in a TLS tunnel, but it does this in a loop, initially I set only 40 requests

LibH2O
real 0m2.047s
user 0m0.376s
sys 0m0.104s

It looks strange to me, could you share your benchmark code?
cc @underhood

libh2o should be very fast

https://h2o.examp1e.net/assets/staticfile612-nginx1910-h2o170.png

I agree, Ilya, I will talk with him.
This is not a real web server, it is only a simple test. libh2o does not have a shared library by default, and its binary was 9 times bigger than the libwebsockets binary; this is something to consider for sure.

I did another benchmark; this time I used the compiled h2o web server and adjusted it to use TLS v1.2.
I again requested the file dashboard.js 10000 times on a specific computer, and I got the following results:

H2O webserver

real    2m5.329s
user    1m27.190s
sys     0m27.122s

As we can see, H2O is faster than Netdata.
So the initial benchmark had a problem related to the original code linked with H2O: their examples have issues, and they were the basis for the software I built here.

Final decision on which library to use is pending. @thiagoftsm please define the final result of the investigation.

What is the time-frame for the decision here? IMHO this is not a decision to be taken lightly, as it will possibly stay with netdata a long time.

@thiagoftsm can you share your benches as @ilyam8 mentioned? Maybe create a git repo with them.

Don't worry much about code quality with the tests (point is to benchmark not to see flawless code).

Point is that other people can take a look as well potentially spotting some accidental bottlenecks in the benches.

Do you run all tests single-threaded? e.g. 10000 requests one by one? Each request is one TCP conn or you use keep-alive?

Hi @underhood ,

We are finishing the SPIKE today, but we can continue working on this during the next 15 days, because I am not planning to bring this into Netdata before that interval; I have other things to do in the next two weeks.

Initially I ran the requests in sequence, but I will later publish more benchmarks that were done with concurrency. Up to this post everything was done with one TCP connection per request; I did not use keep-alive at any time.

I am bringing another benchmark done with libwebsockets, Netdata, and H2O. In this benchmark I added an '&' at the end of the curl command line to run processes in parallel; the script starts 240 processes, sleeps 1 second, then starts 240 more. Doing these requests I got the following results for each webserver:

Libwebsockets

real    1m19.062s
user    2m23.110s
sys     0m40.969s

Netdata

real    1m2.021s
user    2m17.259s
sys     0m38.610s

H2O

real    1m29.491s
user    2m21.370s
sys     0m37.072s

After running some tests with curl and a bash script, I decided to use dedicated benchmarking tools. The first test was done with the Apache HTTP server benchmarking tool (ab), and I got the following results:

Libwebsockets

bash-5.0$ ab -t 100 -n 10000 -c 397 "https://localhost:7891/dashboard.js"
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        
Server Hostname:        localhost
Server Port:            7891
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /dashboard.js
Document Length:        387033 bytes

Concurrency Level:      397
Time taken for tests:   18.121 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      3874780000 bytes
HTML transferred:       3870330000 bytes
Requests per second:    551.83 [#/sec] (mean)
Time per request:       719.421 [ms] (mean)
Time per request:       1.812 [ms] (mean, across all concurrent requests)
Transfer rate:          208811.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2  500 745.5    266    7371
Processing:    16  199 110.7    204     517
Waiting:        1   84  65.7    108     332
Total:         31  698 766.2    512    7701

Percentage of the requests served within a certain time (ms)
  50%    512
  66%    546
  75%    613
  80%    704
  90%   1459
  95%   1572
  98%   3282
  99%   3587
 100%   7701 (longest request)

Netdata

bash-5.0$ ab -t 100 -n 10000 -c 397 https://localhost:19999/dashboard.js
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        NetData
Server Hostname:        localhost
Server Port:            19999
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /dashboard.js
Document Length:        387033 bytes

Concurrency Level:      397
Time taken for tests:   8.795 seconds
Complete requests:      10000
Failed requests:        131
   (Connect: 0, Receive: 0, Length: 131, Exceptions: 0)
Total transferred:      3847547221 bytes
HTML transferred:       3844057221 bytes
Requests per second:    1137.03 [#/sec] (mean)
Time per request:       349.156 [ms] (mean)
Time per request:       0.879 [ms] (mean, across all concurrent requests)
Transfer rate:          427223.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2  176 345.9     59    1482
Processing:    16  167  57.2    171     340
Waiting:        1   87  48.0     76     250
Total:         19  343 360.7    233    1663

Percentage of the requests served within a certain time (ms)
  50%    233
  66%    261
  75%    292
  80%    323
  90%   1228
  95%   1373
  98%   1411
  99%   1433
 100%   1663 (longest request)

H2O

bash-5.0$ ab -t 100 -n 10000 -c 397 https://localhost:8081/netdata.conf
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:        h2o/2.3.0-DEV
Server Hostname:        localhost
Server Port:            8081
SSL/TLS Protocol:       TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256
Server Temp Key:        X25519 253 bits
TLS Server Name:        localhost

Document Path:          /netdata.conf
Document Length:        251809 bytes

Concurrency Level:      397
Time taken for tests:   12.774 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      2520630000 bytes
HTML transferred:       2518090000 bytes
Requests per second:    782.82 [#/sec] (mean)
Time per request:       507.139 [ms] (mean)
Time per request:       1.277 [ms] (mean, across all concurrent requests)
Transfer rate:          192695.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2   32  46.6     21     331
Processing:    59  473  92.3    447     741
Waiting:        1    8   9.3      5     223
Total:         64  505 101.4    472     880

Percentage of the requests served within a certain time (ms)
  50%    472
  66%    509
  75%    556
  80%    587
  90%    654
  95%    702
  98%    758
  99%    802
 100%    880 (longest request)

I will call attention to the fact that I am using the number 397 in the concurrency option: with a bigger number it is necessary to change the source code of libwebsockets to increase the backlog argument of the listen() function.

Just a quick comment during my lunch break:

when i was benchmarking my mempool PR I also noticed 2 things:

  1. Sometimes Netdata becomes too slow and unresponsive (this might be the same thing you see with ApacheBench, although I tested with httperf) -> therefore there seems to be something that makes the netdata webserver upset when too many parallel requests are running. - _After testing with other benchmark tools I was not able to reproduce the issue; consider this a problem with httperf._
  2. Keep-alive seems to not be working really well currently (although I have to test again to be sure the fault is not actually on the httperf side) - _same as point 1_

First I thought it was my PR breaking things, but then I compiled master and saw the same thing.
I am glad someone else noticed this problem as well.
I didn't want to report it because I was not yet sure it wasn't my VM breaking things, etc.

Anyhow, the eye of Sauron has noticed already (I have it on my personal todo list of things to investigate).

Additionally I noticed netdata seems to not close all threads properly, at least sometimes - not confirmed yet

Finally, I am bringing a last benchmark from wrk, a tool presented to me by Ilya (thank you very much @ilyam8!).

Libwebsockets

bash-5.0$ wrk -t4 -c397 -d5m https://localhost:7891/dashboard.js
Running 5m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   467.83ms   44.10ms 774.63ms   87.63%
    Req/Sec   212.15     77.76   474.00     73.28%
  253426 requests in 5.00m, 91.45GB read
Requests/sec:    844.53
Transfer/sec:    312.08MB

Netdata

bash-5.0$ wrk -t4 -c397 -d5m https://localhost:19999/dashboard.js
Running 5m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   568.10ms  368.39ms   2.00s    72.14%
    Req/Sec    86.66     86.82   600.00     85.78%
  69150 requests in 5.00m, 24.98GB read
  Socket errors: connect 76, read 112, write 0, timeout 679
Requests/sec:    230.47
Transfer/sec:     85.25MB

H2O

bash-5.0$ wrk -t4 -c397 -d5m https://localhost:8081/dashboard.js
Running 5m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   118.57ms   66.72ms 730.86ms   69.25%
    Req/Sec   844.59    341.29     2.19k    67.88%
  1003814 requests in 5.00m, 362.10GB read
Requests/sec:   3345.42
Transfer/sec:      1.21GB

In this post I am bringing the simple version of the files used to test and understand the libraries.

files.zip

The benchmarks with the H2O server prove that H2O has an awesome basis, but it is not simple to port to Microsoft Windows, and, no less important, the fact that it only ships a static library by default would increase the Netdata binary size by around 60%. Another problem is that the H2O examples gave me an initially wrong impression of the library, so it is necessary to go deep into the web server code to understand how to use it. I am not saying it is a problem to study the H2O code, but the fact that the small example in the source code is almost 25 times slower than the web server is something pretty bad.
Libwebsockets does not have the same performance that H2O has; on the other hand it was prepared to run on different operating systems, it has better documentation, and its examples work fine.
Considering the results, and the fact that we want to improve Netdata quality, I think moving to either library would take us in the correct direction. But given that the H2O static library raises the binary size a lot when linked, and that it requires explicit calls into libuv while libwebsockets simplifies everything for us, I think libwebsockets will improve Netdata and will simplify the transition to other operating systems.

Apologies for noticing just now, but the benchmark was incorrect. netdata.conf is a static file for the other two servers, but not for netdata, which generates the output. Please repeat the benchmarks with something truly static, like dashboard.js. The difference is enormous, as you can see below:

[christopher@chris-msi netdata]$ wrk -t4 -c397 -d5s http://localhost:19999/netdata.conf
Running 5s test @ http://localhost:19999/netdata.conf
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   962.74ms  423.82ms   1.94s    60.81%
    Req/Sec   110.12    111.51   570.00     85.37%
  1879 requests in 5.03s, 1.02GB read
  Socket errors: connect 0, read 0, write 0, timeout 6
Requests/sec:    373.59
Transfer/sec:    207.80MB

[christopher@chris-msi netdata]$ wrk -t4 -c397 -d5s http://localhost:19999/dashboard.js
Running 5s test @ http://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.88ms   10.45ms  63.96ms   64.11%
    Req/Sec     3.07k   294.27     4.32k    76.56%
  60991 requests in 5.05s, 22.01GB read
Requests/sec:  12073.95
Transfer/sec:      4.36GB

I'd also like to see cpu and memory netdata charts during the execution of the benchmark.

Performance and efficiency are crucial for us, we won't use a library that will degrade them.

No problem, I am glad you found the problem, and I can now generate the correct results. I could already see in the first round of tests that Netdata's results improved as expected.
I decided not to create more comments with the newest results; instead I am updating the previous request-benchmark results, using dashboard.js as of the current day, while keeping the old results with netdata.conf. When I finish updating the access benchmark, I will generate a report with the CPU and memory benchmarks.

I used Netdata with dbengine set as the memory mode to measure CPU usage. This is not ideal, because I am using Netdata to measure its own performance, but I decided to move forward anyway, because I trust the results, and since we are open source, everybody can verify that there is no bias in them.
This benchmark was done with wrk.

Netdata

netdata_cpu

Libwebsockets

websockets_cpu

H2O

h2o_cpu

The CPU usage shows a better result for libwebsockets when compared with H2O.

The final benchmark was done again with wrk making requests to all three web servers; here we are measuring the memory usage of each web server.

Netdata

netdata_mem

Websockets

websockets_mem

H2O

h2o_mem

In this case, the memory management of both libwebsockets and H2O was done by libuv; H2O is well known for rarely calling into the kernel to allocate memory.

Considering the latest results, I think we cannot discard the possibility of bringing the general ideas from both libraries into the Netdata core instead of adopting the libraries themselves, but I keep my view that libwebsockets is the best option, mainly considering that H2O failed to compile with MinGW.

We can't use those libraries at all. Taking 4 CPU cores just to serve the web requests is a big no. The memory benchmark shows horrible performance too. We need to remember that netdata is doing a lot of other things as well, not just serving that static content. You'd need to set it to memory mode RAM and disable all collectors to have really comparable results.

So I say we forget about what those libraries do and incrementally improve netdata.

Actually, I believe we could easily take this one step further. We can add a blog post and a link to that post from docs/Performance.md, to showcase how performant netdata really is.

With only the apps plugin enabled and memory mode RAM with the shortest possible history, we can run the benchmarks again and put the results in a blog post. @joelhans can help with this, let's just have another, fair run for netdata with the exact same test. The tests for the other libraries don't need to be repeated.

We can use their ideas instead of the libraries, mainly because H2O has a more compact URL parser and it does not destroy the original request.
@underhood is also working on something that can improve our web server; our buffer management is what kept Netdata from doing better in this benchmark.
@joelhans, if you have any doubts, let me know.

Joel will just wait for the updated performance metrics and charts after you repeat the test on netdata:

With only the apps plugin enabled and memory mode RAM with the shortest possible history

Update the ticket after the test and he can prepare the blog post.

@cakrit I do not agree that high CPU usage is a bad thing in these tests. In a maximum-load test, if a server can finish sooner by making better use of the available resources, that is better (we want to see it finish X parallel requests as quickly as possible, so if it can use more cores to do that, the better, instead of getting blocked by IO or by its own threads). Of course, we can then limit the usage in netdata so that it is not allowed to use all cores. But the quicker it can process a request, the better (and quicker usually means it uses the available CPU power more effectively). Remember that these tests do not simulate normal netdata usage; they represent the maximum load we are able to generate to stress-test it.
So if a server finishes X requests quicker and generates higher CPU load -> GOOD; if it generates higher CPU load but takes the same time for those X requests -> BAD.

@cakrit I also have to investigate the netdata issue where the web server stops responding under high parallel load. It is something I noticed, and @thiagoftsm has seen something similar in his tests. I will try to find out as much as possible, but y'all know my situation until 1 Jan. I think this needs to be seriously investigated.

@cakrit Regarding

So I say we forget about what those libraries do and incrementally improve netdata.

That might be a way to go, but only as long as we want to support just HTTP/1.1; there is HTTP/2 already, plus websockets and HTTP/3 on the way. Therefore using libs is much more future-proof. As features in the daemon grow, implementing all of that will be way too much work IMHO for Netdata (following all the standards, testing with all possible browsers/configs, security matters) and is best left to people who want to develop web servers, which is not what netdata wants. I would rather netdata devs focus on netdata instead of having their hands full developing yet another HTTP server. I might change my mind if the benchmarks really show that the performance price is way too high, but sacrificing some performance for all those features might actually be a good thing.

So if a server finishes X requests quicker and generates higher CPU load -> GOOD; if it generates higher CPU load but takes the same time for those X requests -> BAD

I'd argue that this assessment only holds true if the ratio of time to CPU load is the same or lower for the faster completion. If we can finish some set number of requests 20% faster but cause 50% more CPU load while doing so, we've not improved; we've actually gotten worse.
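This efficiency argument can be made concrete with a quick back-of-the-envelope sketch. The numbers below are purely illustrative and are not taken from any benchmark in this thread:

```python
# Hypothetical figures, only to illustrate the "CPU-seconds per request" idea.
def cpu_seconds_per_request(wall_time_s, avg_cpu_load, requests):
    """Total CPU time burned (wall time x average load) divided by requests served."""
    return wall_time_s * avg_cpu_load / requests

N = 1_000_000
baseline = cpu_seconds_per_request(100.0, 1.00, N)  # 100 s at 100% of one core
faster = cpu_seconds_per_request(80.0, 1.50, N)     # 20% faster, but 50% more CPU load

# Finishing sooner did not help here: the "faster" run burns more CPU per request.
print(faster > baseline)  # True
```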

@Ferroin
Yep, naturally. The point I was trying to make is that a lib being able to use 100% of each core is not necessarily a bad thing.
I could add a billionth-digit-of-Pi calculation to the request handler and then sprinkle sleeps and mutexes everywhere: CPU load would look low while actually taking many more CPU cycles and much more time per request.

I looked at the response times and requests/sec after @underhood's comments and we're comparing apples with oranges again.

The reason the CPU usage is so low and the response time and throughput so high compared to the other two is that the following settings were left with the default values:

[web]
    web server threads = 4
    web server max sockets = 512

So we should repeat the tests with different values in these two, to get comparable throughput/latency and then we can look again at the CPU %.

Of course the reason we started investigating the other libraries was to not have to do everything ourselves. Let's do a proper benchmark and, if it's the performance hit is acceptable (if it even exists), by all means we go with one of the options.

There is a huge difference between netdata and any normal web server. netdata should not use all available resources, except perhaps when running containerized, with user-set container resource limits. This is a crucial feature and you can see at the performance doc and the netdata for IoT doc how important it is to netdata.

I will work on the benchmarks again today. I have not released the newest versions of the benchmark yet, because I began to see a great number of gaps in some charts when I was using H2O and Apache and raised the number of requests. After some research and a brief talk with @underhood and Ilya, we arrived at the conclusion that it was not a bug: the gaps appeared because the web servers were not allowing other software to get a slice of time to process.

@cakrit agreed on apples to oranges, and that is why I suggested making a repo with the benchmarks, so it is clearer what exactly is tested and how.

@cakrit Regarding the CPU use: from your response I can see I did not describe what I meant very well. It is clear that Netdata should not use all available resources; that was not my point at all.

@thiagoftsm: glad my hunch on the gaps got confirmed and I was able to help.

The next group of benchmarks I am bringing here was done on a different computer, to confirm that the results are not hardware-related. I used the following environment:

  • An Intel i5 with 8GB of RAM and 4 physical cores. This computer has hyper-threading.
  • Among the servers running on this computer: Xorg server 1.20-5, PostgreSQL 11.5, Apache 2.4.41.
  • I also started a VM with VirtualBox using 2 cores and 512MB of RAM.
  • Both the computer and the VM run Slackware Current with kernel 4.19.76 and the latest Netdata and libraries available on the current day.
  • With the exception of Apache and Netdata, which were running the whole time, the other web servers were started when the benchmark began and killed when it finished.
  • This benchmark was done with wrk using 4 threads and 397 concurrent connections.
  • Netdata was running with memory mode = ram, web server threads = 4 and web server max sockets = 512.
  • We used the file dashboard.js from Netdata as of the current date on all the servers during the requests.

First, I am bringing the CPU results:

Netdata

netdata_cpu_default

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   470.02ms  268.54ms   1.88s    69.03%
    Req/Sec    99.10     92.81   660.00     83.52%
  151684 requests in 8.00m, 54.73GB read
  Socket errors: connect 1, read 25, write 0, timeout 0
Requests/sec:    315.95
Transfer/sec:    116.74MB

Websockets

websockets_cpu_default

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   426.90ms    3.91ms 443.09ms   91.79%
    Req/Sec   245.62    184.92   740.00     60.71%
  444826 requests in 8.00m, 160.52GB read
Requests/sec:    926.60
Transfer/sec:    342.41MB

H2O

h2o_cpu_default

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    80.35ms   97.27ms   1.60s    98.38%
    Req/Sec     1.38k   416.42     3.13k    71.05%
  2588120 requests in 8.00m, 0.91TB read
Requests/sec:   5390.83
Transfer/sec:      1.94GB

Apache

apache_cpu_default

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   120.26ms   31.74ms 493.43ms   70.92%
    Req/Sec   727.09    209.04     2.35k    71.51%
  1386740 requests in 8.00m, 500.25GB read
  Socket errors: connect 0, read 1046044, write 0, timeout 0
Requests/sec:   2888.60
Transfer/sec:      1.04GB

The gaps in the charts were due to the fact that Netdata could not get processor time every second, because of the high data-sampling rate and the number of processes/threads opened by the servers.

I am splitting the results for CPU and memory to avoid an overly long comment; here are the memory results of the benchmark:

Netdata

netdata_mem_default

WebSockets

websocket_mem_default

H2O

h2o_mem_default

Apache

apache_mem_default

The next Benchmark had the following change in Netdata configuration:

  • web server threads = 8

With double the threads available, we got the following results for CPU:

Netdata

netdata_cpu_8_512

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   269.93ms  193.37ms   1.86s    71.80%
    Req/Sec   142.96    129.09     0.88k    77.95%
  254251 requests in 8.00m, 91.78GB read
  Socket errors: connect 0, read 123, write 0, timeout 14
Requests/sec:    529.61
Transfer/sec:    195.77MB

Websockets

websockets_cpu_8_512

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   419.63ms    3.77ms 433.95ms   82.02%
    Req/Sec   255.00    143.49   717.00     74.21%
  452524 requests in 8.00m, 163.30GB read
Requests/sec:    942.70
Transfer/sec:    348.35MB

H2O

h2o_cpu_8_512

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    72.40ms   45.14ms   1.33s    89.24%
    Req/Sec     1.37k   400.26     3.12k    72.07%
  2604138 requests in 8.00m, 0.92TB read
Requests/sec:   5424.55
Transfer/sec:      1.96GB

Apache

apache_cpu_8_512

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   115.01ms   33.57ms 450.95ms   70.57%
    Req/Sec   771.78    243.84     2.17k    72.46%
  1470998 requests in 8.00m, 530.64GB read
  Socket errors: connect 0, read 915773, write 0, timeout 0
Requests/sec:   3064.10
Transfer/sec:      1.11GB

It is important to notice that when we raised the number of web server threads, Netdata gained performance, but so did the others. Netdata again did not reach 100% usage on any CPU.

To finish the 8-thread group, here are the memory results:

Netdata

netdata_mem_8_512

Websockets

websockets_mem_8_512

H2O

h2o_mem_8_512

Apache

apache_mem_8_512

The final Benchmark of the group had the following change in Netdata configuration:

  • web server max sockets = 1024

With double the sockets available, we got the following results for CPU:

Netdata

netdata_cpu_8_1024

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   571.16ms  286.26ms   1.99s    67.54%
    Req/Sec   163.43    140.91     1.06k    76.35%
  303609 requests in 8.00m, 109.55GB read
  Socket errors: connect 0, read 60, write 0, timeout 15
Requests/sec:    632.39
Transfer/sec:    233.66MB

Websockets

websockets_cpu_8_1024

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:7891/dashboard.js
Running 8m test @ https://localhost:7891/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   444.76ms    3.98ms 463.62ms   76.88%
    Req/Sec   225.64     88.06   545.00     77.09%
  426974 requests in 8.00m, 154.08GB read
Requests/sec:    889.35
Transfer/sec:    328.64MB

H2O

h2o_cpu_8_1024

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:8081/dashboard.js
Running 8m test @ https://localhost:8081/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    75.24ms   70.33ms   1.78s    97.76%
    Req/Sec     1.37k   401.26     3.20k    71.74%
  2591233 requests in 8.00m, 0.91TB read
Requests/sec:   5397.36
Transfer/sec:      1.95GB

Apache

apache_cpu_8_1024

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost/dashboard.js
Running 8m test @ https://localhost/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    90.67ms   25.21ms 325.89ms   72.77%
    Req/Sec     1.04k   286.85     2.12k    67.71%
  1991809 requests in 8.00m, 718.50GB read
  Socket errors: connect 0, read 165484, write 0, timeout 0
Requests/sec:   4148.77
Transfer/sec:      1.50GB

Finally, here are the memory results for 1024 sockets:

Netdata

netdata_mem_8_1024

Websockets

websockets_mem_8_1024

H2O

h2o_mem_8_1024

Apache

apache_mem_8_1024

Hi,
@thiagoftsm I do think that to be able to interpret the CPU results we __absolutely need the time__ it took to complete the same number of requests with each proposed solution (therefore I would drop the time-limit flag from wrk).

I would imagine something like this:

All web servers are given 1,000,000 requests to handle.
Each request serves the same static file. Keep-alive is off.
Each server runs on 4 threads; the client runs on 4 threads in parallel.
The results follow:

| Server | __Total time to serve 1M requests__ | avg. CPU load | max. mem | avg. req/s | avg. time/request |
| --- | --- | --- | --- | --- | --- |
| Netdata | ... | 30% | ... | ... | ... |
| websocket | ... | 200% | ... | ... | ... |

The point I am trying to convey is that if, for example, websockets uses 400% CPU (all four cores) but finishes 10x quicker, it is more efficient: it uses fewer resources per request handled, which is IMHO what really matters. But if we only see the CPU load, and not the time it took to handle all the requests, we cannot really know.

The second thing to consider is having the server run on just a single thread with X clients in parallel, with the same table as in the previous test. This will let us see how effective each individual thread is, as opposed to how well the server scales with the number of threads.

  • Tests with keep-alive (e.g. 1000 requests per connection) might also be interesting, to see how much the server is actually limited by the system's ability to create/close/maintain TCP connections.
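The per-thread effectiveness described above could be summarised as a scaling-efficiency number computed from the single-threaded and multi-threaded throughputs. A sketch with made-up figures:

```python
# Hypothetical throughput (req/s) per web-server thread count -- illustrative only.
throughput = {1: 400.0, 4: 1280.0}

def scaling_efficiency(throughput, threads):
    """Speedup over the single-threaded run, normalised by the thread count."""
    speedup = throughput[threads] / throughput[1]
    return speedup / threads

# 1280/400 gives a 3.2x speedup on 4 threads, i.e. 80% of ideal linear scaling.
print(scaling_efficiency(throughput, 4))
```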

Hi @underhood ,

I used Netdata charts to show how the processor and memory behave over time; an average gives us a central value that does not express this exactly. For example, see the latest chart for websockets: the average will be higher than 72 because of the extremes, but for a long time it did not need more than 72MB.
To get an exact count of 1M requests, I will need to use ab instead of wrk.
We used only one file; I will update the description, thanks for the reminder!

That's not @underhood's point; it's that we need to see the wrk results to compare latency and throughput.

Repeating what I said, with emphasis to clarify:

So we should repeat the tests with different values in these two, to get comparable throughput/latency and then we can look again at the CPU %.

After getting the results with the other benchmark, I decided to freeze the web server max sockets at 2048 and increase the web server threads; I obtained the following results for Netdata.

32 Threads

CPU

netdata_cpu_32_2048

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   267.74ms  180.09ms   1.53s    69.21%
    Req/Sec   372.22    237.46     1.17k    61.07%
  710387 requests in 8.00m, 256.33GB read
  Socket errors: connect 0, read 12, write 0, timeout 0
Requests/sec:   1479.67
Transfer/sec:    546.72MB

Memory

netdata_mem_32_2048

64 Threads

CPU

netdata_cpu_64_2048

thiago@ceres:~$ wrk -t4 -c397 -d8m https://localhost:19999/dashboard.js
Running 8m test @ https://localhost:19999/dashboard.js
  4 threads and 397 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   292.11ms  215.76ms   1.87s    68.19%
    Req/Sec   341.46    203.70     1.21k    62.78%
  652749 requests in 8.00m, 235.53GB read
  Socket errors: connect 0, read 20, write 0, timeout 0
Requests/sec:   1359.62
Transfer/sec:    502.36MB

Memory

netdata_mem_64_2048

Hello everyone,

After a talk with @cakrit today, we decided that next sprint we will begin to test libh2o; I will try to trim their static library down to what we really need.

I am also bringing a table with all the previous results:

discussion

Considering that we have defined the road to take and I will start on it in the next sprint, I am closing this.

Based on the tests above, the suggestion is indeed to move with libh2o and try to reduce the static library's footprint. If there are any other opinions, please comment so we can close.

As already discussed with @thiagoftsm, I'm absolutely fine with H2O if we can find a way to minimize the amount of code we'll use from H2O, as the lib is massive.

For benchmarking there is a good tool available: https://github.com/httperf/httperf
This produces quite fine-grained statistics and ramps up to a high load-level.

picohttpparser looks like a good choice for performance, but the code is quite subtle and intricate, so I wonder how we should handle the dependency:

  1. Pulling the relevant parts of the code into the netdata repo
  2. Extracting it from h2o as a build-time dependency

Netdata has already brought code from other libraries into our tree, keeping the reference to the original developers; you can see this at the top of the file libnetdata/avl/avl.c, so we would not have problems with this.

Do we have a list of which parts of h2o need to be pulled into netdata and which parts we do not need?

Also, the separate repo for picohttpparser does not look as if it matches the pair of source files in the larger h2o repo; do we need to check that the calling conventions and structures are the same? My concern is that if I pull in picohttpparser in parallel with the integration of the h2o library, we need to end up with compatible code.

Right now Netdata only works with HTTP/1.1 plus our streaming protocol, which is a kind of HTTP method, so our goal now is to bring in the HTTP functionality, and we will probably need websockets too.

I think it would be good to use the picohttpparser bundled with libh2o, because this will help us bring in the rest, in case the versions are different.

I took another look at h2o: they pull picohttpparser directly into their deps/ directory and then keep a copy in their source tree. I think we can do the same.

I took another look at h2o: they pull picohttpparser directly into their deps/ directory and then keep a copy in their source tree. I think we can do the same.

I agree 100% with you.

Is this ready to close?

I think so, we will move with libh2o.

Agreed.
