netdata changes since v1.7 release

Created on 10 Sep 2017 · 3 comments · Source: netdata/netdata

key bugs fixed

streaming slaves consuming 100% CPU

A bug in the streaming functionality has been fixed. netdata was not handling all error cases properly, which, under certain conditions, caused a single core on the slaves to run at 100% CPU. On FreeBSD and macOS slaves these conditions were always met, so using FreeBSD or macOS as netdata slaves was completely broken.

There have also been many other improvements to the streaming functionality.
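The notes do not say exactly which error cases were missed. The sketch below illustrates one common way this class of bug happens. If a poll-based reader keeps polling a socket after the peer has hung up, poll() keeps reporting the descriptor as ready and the loop spins at 100% CPU. The sketch treats errors, hang-ups and EOF as terminal results instead of retrying. The helper name wait_for_data is hypothetical, and this is not netdata's streaming code.

```python
import select
import socket

def wait_for_data(sock: socket.socket, timeout_ms: int = 1000):
    """Wait for data on sock; return bytes, or 'timeout', 'closed' or 'error'.

    Hypothetical helper: every non-data outcome is reported to the caller,
    so the caller can stop polling a dead socket instead of spinning.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN | select.POLLPRI)
    events = poller.poll(timeout_ms)
    if not events:
        return "timeout"
    for _fd, mask in events:
        if mask & (select.POLLERR | select.POLLNVAL):
            return "error"  # socket error: reconnect rather than poll again
        if mask & (select.POLLIN | select.POLLPRI):
            data = sock.recv(4096)
            if not data:
                return "closed"  # EOF: polling again would return immediately, forever
            return data
        if mask & select.POLLHUP:
            return "closed"  # hang-up with no pending data
    return "timeout"
```

A caller that gets "closed" or "error" should close the socket and schedule a reconnect with a delay, rather than go straight back to poll().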

missing alarm notifications on netdata masters

A bug affecting alarm notifications on netdata masters has been fixed. netdata was incorrectly mixing up cached alarm state data between the alarms of the mirrored hosts, so under certain conditions alarm notifications were not dispatched. This affected only netdata masters (i.e. netdata servers with more than one host database and health monitoring enabled). The alarms were generated and were visible on the dashboards, but the notifications were not always sent.

It is now fixed.
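The sketch below shows the idea behind this fix. It assumes a hypothetical notifier that remembers the last status it notified for each alarm. If that cache is keyed by (host, alarm) instead of by alarm alone, mirrored hosts cannot overwrite each other's state. The class and method names are illustrative and are not netdata's.

```python
from typing import Dict, Tuple

class NotificationCache:
    """Hypothetical cache of the last alarm status for which a notification was sent."""

    def __init__(self) -> None:
        # Keyed by (hostname, alarm name), so every mirrored host has its own entry.
        self._last_sent: Dict[Tuple[str, str], str] = {}

    def should_notify(self, host: str, alarm: str, status: str) -> bool:
        """Return True if status differs from the last one notified for this host's alarm."""
        key = (host, alarm)
        if self._last_sent.get(key) == status:
            return False
        self._last_sent[key] = status
        return True
```

Suppose the cache were keyed by alarm name only. Once the master reported WARNING for disk_space, a slave's WARNING for disk_space would have been treated as already notified. This resembles the symptom described above.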

netdata API generating corrupted JSON

Another bug has been fixed in the netdata API. netdata assumed that the JSON representation of a chart would be at most 1024 bytes, and it generated corrupted JSON output whenever a chart exceeded that limit.

The limitation has been removed (i.e. there is now no limit).

netdata crashing when starting on systems without writable disks

There was an issue that caused netdata to crash during startup if no usable disks were found.

Fixed.

python.d.plugin URLService did not support HTTP keep-alive

netdata now uses urllib3 (shipped with netdata for both Python v2 and v3) for URLService-based plugins.

This enables HTTP keep-alive on all connections, which allows netdata to keep persistent connections to third-party web applications.

Fixed by @l2isbad

This can be considered an enhancement. However, netdata collects metrics every second, so the lack of HTTP keep-alive forced netdata to reconnect on every iteration, which, in my view, is a bug.
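The sketch below shows why urllib3 provides keep-alive. A single urllib3.PoolManager keeps one connection pool per scheme, host and port. Repeated requests to the same endpoint therefore reuse an open TCP connection instead of reconnecting. The URLs (a local netdata on port 19999) and the fetch helper are assumptions for illustration. This is not the URLService code.

```python
import urllib3

# One manager for the whole lifetime of the collector: its pools keep
# connections open between iterations (HTTP keep-alive).
http = urllib3.PoolManager(num_pools=4, maxsize=1)

def fetch(url: str, timeout: float = 1.0) -> bytes:
    """Hypothetical per-iteration fetch that reuses pooled connections."""
    response = http.request("GET", url, timeout=timeout, retries=False)
    return response.data
```

Calling fetch() once per second against the same host reuses the pooled connection. Creating a new connection on every call would repeat the TCP handshake on every iteration.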

streamed charts with duplicate names

There was an issue with charts that were created with name aliases. When these charts were streamed from netdata slaves to netdata masters, they ended up with duplicated type prefixes in their names (i.e. type.type.name instead of type.name). This is now fixed.
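The sketch below shows the invariant behind this fix. It assumes a chart is identified by a type and a name, and that the name may or may not already carry the type prefix. The type is prepended only when it is not already there, so a streamed name can never become type.type.name. The function name is hypothetical.

```python
def full_chart_name(chart_type: str, name: str) -> str:
    """Return 'type.name' without prefixing the type twice."""
    prefix = chart_type + "."
    if name.startswith(prefix):
        return name  # already fully qualified, e.g. as received from a slave
    return prefix + name
```

A real implementation would more likely stream the unprefixed name in the first place. The prefix check is simply the shortest way to show the invariant.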

compatibility enhancements

  • better support for Oracle Linux, by @schindlerd
  • better support for Alpine Linux
  • various fixes to the build procedure for macOS
  • fping can now run as non-root in the static binary netdata packages

netdata generic enhancements

  • netdata can now listen on UNIX domain sockets (.sock files). This allows a local web server and netdata to communicate without going through the network stack. To enable it, set bind to = unix:/path/to/netdata.sock. This option accepts multiple values, so netdata can listen on multiple UNIX sockets and TCP sockets at the same time.

  • minor fixes at the installer, by @vincele

  • systemd netdata.service now allows setting a negative OOM score for netdata and restarts netdata if it crashes. The new netdata.service is not installed automatically when updating netdata. Either delete /etc/systemd/system/netdata.service and then update/re-install netdata, or copy the file by hand.
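The sketch below shows how a local client could query netdata over a UNIX domain socket. It assumes netdata was configured with bind to = unix:/path/to/netdata.sock. Python's http.client has no UNIX socket mode, so the hypothetical UnixHTTPConnection subclass overrides connect(). The socket path and the /api/v1/info endpoint in the usage line are assumptions.

```python
import http.client
import socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP/1.1 connection over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        # "localhost" only fills the Host: header; no DNS lookup or TCP is involved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get(socket_path: str, path: str) -> bytes:
    """Send a GET for path over the UNIX socket and return the response body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Usage, with netdata listening on that socket: body = get("/path/to/netdata.sock", "/api/v1/info").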

new plugins

  • Added Intel CPU temperature charts on FreeBSD and macOS, by @vlvkobal
  • Added CPU thermal throttling charts on Linux (useful on physical servers and possibly laptops)
  • Added chrony plugin, by @domschl
  • Added Stiebel Eltron plugin to collect metrics from Stiebel Eltron heat pumps and hot water installations via the Stiebel Eltron ISG.

improved plugins

  • web_log bugfixes, enhancements and optimizations (including squid logs), by @l2isbad
  • web_log can now parse HTTP/2 logs in custom_log_format, by @Funzinator
  • redis bugfixes, by @l2isbad
  • haproxy bugfixes, by @l2isbad
  • elasticsearch bugfixes and optimizations, by @l2isbad
  • rabbitmq bugfixes and optimizations, by @l2isbad
  • mdstat bugfixes, by @JeffHenson
  • tomcat improvements, by @Wing924
  • mysql improvements, by @alibo and @l2isbad
  • dovecot improvements
  • postgres improvements, by @facetoe
  • cpufreq: fixed a bug that prevented accurate reporting of CPU frequencies. Accurate reporting works with the acpi-cpufreq driver: the average clock of the CPUs is calculated from the per-frequency time accounting reported by the kernel, by @tycho
  • cpuidle performance improvements (faster under load), by @tycho
  • fail2ban bugfixes, by @l2isbad
  • the SNMP plugin now uses the latest net-snmp, and the corrupted 64-bit counters encountered with certain node.js versions are now fixed.
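The weighted average described in the cpufreq entry can be sketched as follows. On Linux, the cpufreq stats file /sys/devices/system/cpu/cpuN/cpufreq/stats/time_in_state contains lines of the form "<frequency in kHz> <cumulative time at that frequency>". The sketch takes two snapshots and weights each frequency by the time spent at it during the interval, which gives the average clock. The time unit cancels out. This illustrates the technique and is not netdata's C code. The function names are hypothetical.

```python
from typing import Dict

def parse_time_in_state(text: str) -> Dict[int, int]:
    """Parse 'freq_khz time' lines into {freq_khz: cumulative_time}."""
    stats: Dict[int, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[int(parts[0])] = int(parts[1])
    return stats


def average_khz(before: Dict[int, int], after: Dict[int, int]) -> float:
    """Time-weighted average frequency between two snapshots (0.0 if no time elapsed)."""
    weighted = 0
    total = 0
    for freq, t_after in after.items():
        delta = t_after - before.get(freq, 0)  # time spent at freq during the interval
        weighted += freq * delta
        total += delta
    return weighted / total if total else 0.0
```

For example, suppose a CPU spends 30 time units at 800000 kHz and 10 units at 2400000 kHz between the snapshots. The average is (800000·30 + 2400000·10) / 40 = 1200000 kHz, i.e. 1.2 GHz.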

dashboard improvements

  • easypiecharts and gauges can now render arbitrary ranges and animate clockwise or counter-clockwise.

  • container network interfaces are now shown in the container section and are rendered from the container's point of view (i.e. sent = what the container sent). No more veth* garbage on the dashboard.

    The interfaces also appear as eth0 (or whatever name the container sees) inside the container section of the dashboard. netdata maps each veth* interface to the right container using plain cgroups features, so this works with all container managers (docker, lxc, etc.).

  • containers and VMs now have summary gauges on the dashboard

  • netdata traditionally used 1024 bits = 1 kilobit. This is now fixed: 1000 bits = 1 kilobit.

  • netdata charts should now work on WordPress pages.
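The sketch below shows one common way to pair a host veth* interface with the interface a container sees. We assume, without confirmation from these notes, that it resembles what netdata does. Inside the container's network namespace, /sys/class/net/eth0/iflink holds the ifindex of the veth peer. On the host, that peer is the veth* device whose /sys/class/net/<name>/ifindex has the same value. The sketch takes both sides as already-read dictionaries, because collecting the container side requires entering the container's namespace. The function name is hypothetical.

```python
from typing import Dict

def map_veth_to_container(host_ifindex: Dict[str, int],
                          container_iflink: Dict[str, int]) -> Dict[str, str]:
    """Map host veth names to the interface names the container sees.

    host_ifindex:     {host interface name: its ifindex}, read on the host
    container_iflink: {container interface name: its iflink}, read inside
                      the container's network namespace
    """
    by_index = {index: name for name, index in host_ifindex.items()}
    mapping: Dict[str, str] = {}
    for cname, peer_index in container_iflink.items():
        host_name = by_index.get(peer_index)
        # Only veth devices are peers; e.g. the container's lo also has iflink 1.
        if host_name is not None and host_name.startswith("veth"):
            mapping[host_name] = cname
    return mapping
```

Once the mapping is known, the veth counters can be charted under the container with the container-side name, with received and sent swapped. Whatever the host veth receives is what the container sent.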

alarms and notifications

  • alarm-notify.sh now supports a debug mode, enabled with export NETDATA_ALARM_NOTIFY_DEBUG=1, which shows the exact commands it runs to send notifications.

  • alarm-notify.sh now supports setting the sender email address of the emails it sends.

  • emails sent by alarm-notify.sh now include headers to reduce the possibility of them being scored as spam, by @Ferroin

  • network related alarms got new thresholds and improved badges

  • netdata now detects if the system has been suspended and pauses all alarms for 60 seconds on resume, to prevent false alarms (no more false alarms on laptops when they resume).

  • netdata alarms now support filtering based on hostname and O/S (linux, freebsd, macos). This means that netdata masters can now support alarms for slaves of any O/S (i.e. a Linux netdata master can handle alarms for a FreeBSD slave).

  • netdata Slack notifications now show the host that sent the alarm. For example, when netdata-build-server sends an alarm about the host bangalore, the message names netdata-build-server at its lower left corner.
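The sketch below shows one way to detect a resume on Linux. We assume, without confirmation from these notes, that it resembles netdata's approach. CLOCK_MONOTONIC does not advance while the system is suspended, but CLOCK_BOOTTIME does. A jump in the difference between the two clocks therefore means the system slept. The 60-second hold mirrors the pause described in the alarms list. The class name is hypothetical.

```python
import time
from typing import Optional

class SuspendDetector:
    """Pause alarm evaluation for a while after the system resumes."""

    def __init__(self, hold_seconds: float = 60.0, tolerance: float = 1.0) -> None:
        self.hold_seconds = hold_seconds
        self.tolerance = tolerance
        self._last_gap: Optional[float] = None
        self.paused_until = 0.0  # monotonic time until which alarms stay paused

    def check(self, monotonic: float, boottime: float) -> bool:
        """Feed one clock sample; return True if alarms should be paused now."""
        gap = boottime - monotonic  # grows only while the system is suspended
        if self._last_gap is not None and gap - self._last_gap > self.tolerance:
            self.paused_until = monotonic + self.hold_seconds
        self._last_gap = gap
        return monotonic < self.paused_until

    def sample(self) -> bool:
        """Linux-only wrapper using the real clocks (Python 3.7+)."""
        return self.check(time.clock_gettime(time.CLOCK_MONOTONIC),
                          time.clock_gettime(time.CLOCK_BOOTTIME))
```

Calling sample() once per health-check iteration is enough. The first iteration after a resume sees the gap jump and pauses alarms for hold_seconds.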
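The sketch below pictures the hostname and O/S filters as a small matcher. It assumes netdata-style simple patterns: space-separated patterns with * wildcards, a leading ! that negates, and the first matching pattern decides. Here fnmatchcase stands in for the * wildcard. The function names and examples are illustrative and are not netdata's implementation.

```python
from fnmatch import fnmatchcase

def simple_pattern_match(patterns: str, value: str) -> bool:
    """Return True if value matches the space-separated patterns.

    The first pattern that matches decides; a leading '!' turns that match
    into a rejection. If no pattern matches, the result is False.
    """
    for pattern in patterns.split():
        negative = pattern.startswith("!")
        if negative:
            pattern = pattern[1:]
        if fnmatchcase(value, pattern):
            return not negative
    return False


def alarm_applies(os_patterns: str, host_patterns: str, host_os: str, hostname: str) -> bool:
    """An alarm applies to a host only if both its O/S and its hostname match."""
    return (simple_pattern_match(os_patterns, host_os)
            and simple_pattern_match(host_patterns, hostname))
```

On a master, host_os and hostname describe the mirrored host, not the master itself. This is what allows a Linux master to evaluate FreeBSD-only alarms for a FreeBSD slave.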

statsd

  • the number of fractional digits supported by statsd is now configurable (1 to 7).
  • the 95th percentile calculation on statsd histograms and timers was incorrectly averaging the values. It is now fixed.
  • statsd metrics with non-ASCII text were processed by the statsd server, but broke the JSON data generated by netdata. This is fixed by replacing all invalid characters.
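The three statsd items can be illustrated with a short sketch. None of its assumptions come from netdata's code:

- Values are kept as fixed-point integers scaled by 10^d, where d is the configured number of fractional digits (1 to 7).
- The percentile uses the nearest-rank method on the sorted samples, in place of averaging.
- Any character outside printable ASCII, plus space, double quote and backslash, is treated as invalid and replaced with "_".

All function names are hypothetical.

```python
import math
from typing import List

def to_fixed(value: float, decimals: int) -> int:
    """Store value as an integer scaled by 10**decimals (decimals in 1..7)."""
    if not 1 <= decimals <= 7:
        raise ValueError("decimals must be between 1 and 7")
    return round(value * 10 ** decimals)


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sanitize_name(name: str) -> str:
    """Replace characters that are not safe, printable ASCII with '_' (JSON-safe names)."""
    return "".join(ch if 32 < ord(ch) < 127 and ch not in '"\\' else "_" for ch in name)
```

For the samples 1 to 10, averaging all of them would report 5.5, while the nearest-rank 95th percentile is 10.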
discussion

I have promised that v1.8 will bring central health monitoring, but given that there is no progress in that area yet, and we have fixed major bugs in netdata, I think we should release v1.8 asap and plan all new features for v1.9.

As a mostly unrelated side note, I really must applaud the changes to the prometheus scraping API, @ktsaou. It was a small pain to update everything in my grafana, but being able to add entire families with one line instead of 8+ is sooooo much better. Kudos to everyone involved in that process!

released v1.8.0
thanks!
