Icinga2: InfluxdbWriter not closing connections Icinga2 2.10.3 CentOS 7

Created on 1 Mar 2019 · 18 Comments · Source: Icinga/icinga2

Expected Behavior


I would expect Icinga2 to close http connections to Influxdb rather than keeping them open and opening additional ones.

Current Behavior


Icinga2 appears to keep opening new connections to InfluxDB without ever closing them, which leaves thousands of established connections. I have seen three outcomes so far:

  • Icinga2 system runs out of memory and kills Icinga2

  • Icinga2 system runs out of File Descriptors

  • Influxdb system runs out of File Descriptors and influxdb crashes

Possible Solution



I am not sure what the solution is. However, this is new behavior in icinga2 2.10.3

Steps to Reproduce (for bugs)



1. Turn on the influxdb feature and monitor the established TCP connections.
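
One way to watch the connection count while reproducing, as a minimal sketch: the helper below reads `ss -tn` output on standard input and counts established connections whose peer port matches. The function name `count_established_to_port` is made up for illustration, and port 8086 is simply the port from the InfluxdbWriter configuration in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Icinga 2): count ESTAB lines in
# `ss -tn` output (read from stdin) whose peer address ends in :PORT.
count_established_to_port() {
    awk -v port="$1" '
        $1 == "ESTAB" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}
```

Running something like `while sleep 10; do ss -tn | count_established_to_port 8086; done` on the Icinga2 host should show whether the number keeps climbing.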

Context



This issue causes one of three failures: the system kills Icinga2 once all memory is used, Icinga2 runs out of file descriptors and checks fail, or the influxdb process on a remote system crashes.

Your Environment

  • Version used (icinga2 --version):
icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.3-1)

Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-957.5.1.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid
  • Operating System and version:
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Enabled features (icinga2 feature list):
Disabled features: command compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification
  • Icinga Web 2 version and modules (System - About):
Icingaweb2 2.6.2
Modules: doc 2.6.2, grafana 1.3.4, monitoring 2.6.2
  • Config validation (icinga2 daemon -C):
[2019-03-01 07:40:24 -0500] information/cli: Icinga application loader (version: r2.10.3-1)
[2019-03-01 07:40:24 -0500] information/cli: Loading configuration file(s).
[2019-03-01 07:40:24 -0500] information/ConfigItem: Committing config item(s).
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/
[2019-03-01 07:40:24 -0500] information/ApiListener: My API identity: REMOVED
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'snmp-interface' (in /var/lib/icinga2/api/zones/global-config/_etc/services-snmp.conf: 1:0-1:29) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'snmp-storage' (in /var/lib/icinga2/api/zones/global-config/_etc/services-snmp.conf: 20:1-20:28) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'disk-windows' (in /var/lib/icinga2/api/zones/global-config/_etc/services-windows.conf: 1:0-1:27) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'nscp-local-memory' (in /var/lib/icinga2/api/zones/global-config/_etc/services-windows.conf: 9:1-9:33) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'ping6' (in /var/lib/icinga2/api/zones/global-config/_etc/services.conf: 35:1-35:21) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule '' (in /var/lib/icinga2/api/zones/global-config/_etc/services.conf: 282:1-282:66) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 646 Services.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 58 Hosts.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 4 EventCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 FileLogger.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 113 Dependencies.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 7 NotificationCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1400 Notifications.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 9 HostGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 ApiListener.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 34 Downtimes.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 36 Zones.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 41 Endpoints.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 6 ApiUsers.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 4 Users.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 218 CheckCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 2 UserGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 5 ServiceGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-03-01 07:40:25 -0500] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-03-01 07:40:25 -0500] information/cli: Finished validating the configuration file(s).
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

  • This is my /etc/icinga2/features-available/influxdb.conf for the icinga2 feature. I have removed host, username, and password values.

cat /etc/icinga2/features-available/influxdb.conf 
/**
 * The InfluxdbWriter type writes check result metrics and
 * performance data to an InfluxDB HTTP API
 */

library "perfdata"
object InfluxdbWriter "influxdb" {
  host = "*"
  ssl_enable = true
  port = 8086
  database = "icinga2"
  flush_threshold = 1024
  flush_interval = 10s
  username = "*"
  password = "*"
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
    }
  }
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
    }
  }
  enable_send_thresholds = true
  enable_send_metadata = true
}
  • influxd version InfluxDB v1.7.4 (git: 1.7 ef77e72f435b71b1ad6da7d6a6a4c4a262439379)
  • socket stats for system with influxdb feature on
cat /proc/net/sockstat
sockets: used 5663
TCP: inuse 5435 orphan 1 tw 11 alloc 5438 mem 40
UDP: inuse 4 mem 3
UDPLITE: inuse 0
RAW: inuse 3
FRAG: inuse 0 memory 0
  • socket stats for system with influxdb feature off
cat /proc/net/sockstat
sockets: used 270
TCP: inuse 44 orphan 0 tw 19 alloc 45 mem 53
UDP: inuse 4 mem 2
UDPLITE: inuse 0
RAW: inuse 2
FRAG: inuse 0 memory 0
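
To compare the two states without reading the whole file, the `inuse` value on the `TCP:` line can be extracted. This is only a sketch; `tcp_inuse` is a hypothetical helper name, and the figure it prints is system-wide rather than limited to InfluxDB connections.

```shell
#!/bin/sh
# Hypothetical helper: print the TCP "inuse" value from
# /proc/net/sockstat-formatted text read on stdin.
tcp_inuse() {
    awk '$1 == "TCP:" {
        for (i = 2; i < NF; i += 2)
            if ($i == "inuse") print $(i + 1)
    }'
}
```

On a Linux host it would be called as `tcp_inuse < /proc/net/sockstat`. Fed the two listings above, it prints 5435 with the feature on and 44 with it off.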

Here is a graph showing the established TCP connections growing over time. At the first peak, Icinga2 was killed because the system ran out of memory. At the second peak, the influxdb daemon on another server crashed and restarted.
[image: graph of established TCP connections]

Before the upgrade this Icinga2 system would maintain about 33 TCP connections. After the upgrade it peaked at 7,270 TCP connections.

All 18 comments

@Al2Klimov Thank you for taking a look at this.

As I continue to troubleshoot the situation, it appears that the issue is only showing up on systems that are using an ssl connection to influxdb. I am not seeing it on test systems that are not using an ssl connection to influxdb.

ref/IC/12219

Same issue here: it seems one more connection is opened per InfluxDB flush. We downgraded to 2.10.2 for now.
[screenshot, 2019-03-06 11:24:15]

Hello @marcofl!

Please could you test #6990?

Best,
AK

I can confirm the exact same behaviour for our system. As we have sufficient RAM the process does not die but all checks fail with:
Error: Function call 'pipe2' failed with error code 24, 'Too many open files'
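
Error code 24 is EMFILE, so the process has hit its open-file limit. A rough way to see how close a process is to that limit is to compare its open descriptors with the soft limit in `/proc/<pid>/limits`. The helper name `max_open_files_soft` below is hypothetical, and this is a sketch rather than an official diagnostic.

```shell
#!/bin/sh
# Hypothetical helper: print the soft "Max open files" limit from
# /proc/<pid>/limits-formatted text read on stdin.
max_open_files_soft() {
    awk '/^Max open files/ { print $4 }'
}
```

As root, something like `for p in $(pidof icinga2); do echo "$p: $(ls /proc/$p/fd | wc -l) open, soft limit $(max_open_files_soft < /proc/$p/limits)"; done` would then print both numbers for each Icinga2 process.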


Same problem here. InfluxDBWriter with TLS.

After a while, no Icinga checks can be executed because of the 'Too many open files' error. This is a serious bug. We have downgraded to 2.10.2 for now.

Hello guys!

Feel free to test the PR I linked. The faster one of you writes a test protocol, the faster it will be merged.

Best,
AK

> Hello @marcofl!
>
> Please could you test #6990?
>
> Best,
> AK

Sure, can you point me to the correct snapshot package for xenial?

Hello @marcofl!

I'm afraid there isn't any (yet). If there were any, I'd not refer to the PR.

Best,
AK

I'm wondering which changes are involved here, since git diff v2.10.2 v2.10.3 lib/perfdata doesn't show anything relevant. It is likely related to a2ae01e64b4fa00aeb3c652ac9a633bf458ca5ed: with the life support references dropped, the original problem of the streams not being closed at all became more visible.

Can you give this ticket higher priority / bug label maybe? This actually made this version unusable for everyone using the InfluxDB writer...

Yes, I can confirm we experience the exact same issue. Also using Icinga 2.10.3 with InfluxDB and TLS.

> Can you give this ticket higher priority / bug label maybe? This actually made this version unusable for everyone using the InfluxDB writer...

@Al2Klimov created a patch which is on my review list. I am currently at Icinga Camp Berlin, so I will merge this next week at the latest.

Cheers,
Michael

Installed https://github.com/Icinga/icinga2/pull/6990 on our systems that were suffering from this issue.

It has been running since March 11.

The issue appears to have cleared up. We have been observing the TCP connection count and it is NOT increasing. Previously we could see a crash or run out of file handles in 12 hours or less.
Icinga2 system is now functioning normally.

Thank you for your help.
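
For anyone repeating this verification, a small sketch of how periodic samples could be reduced to a single figure: the hypothetical helper `count_delta` reads one connection count per line and prints the difference between the last and the first sample. A value near zero over several hours would be consistent with the fix working, while a steadily growing positive value would suggest the leak is still present.

```shell
#!/bin/sh
# Hypothetical helper: read one connection count per line on stdin and
# print the difference between the last and the first sample.
count_delta() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR) print last - first }'
}
```

The input could come from any counter that is logged at a fixed interval, for example the output of `netstat -patune | grep icinga | wc -l` appended to a file every few minutes.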

> Yes, I can confirm we experience the exact same issue. Also using Icinga 2.10.3 with InfluxDB and TLS.

Me too :-)

$ netstat -patune | grep icinga | wc -l
3829

Nearly all of them are connections to InfluxDB.

This may affect other (TLS) streams too, not only those of the InfluxDB/Elasticsearch features.

Tests

Use the influxdb vagrant box, and modify it a bit for TLS.

 usermod -a -G icinga influxdb

vim /etc/influxdb/influxdb.conf

[http]
https-certificate = "/var/lib/icinga2/certs/icinga2-influxdb.vagrant.demo.icinga.com.crt"
https-private-key = "/var/lib/icinga2/certs/icinga2-influxdb.vagrant.demo.icinga.com.key"
https-enabled = true

systemctl restart influxdb


vim /etc/icinga2/features-enabled/influxdb.conf

  ssl_enable = true

systemctl restart icinga2

The Grafana datasource needs to be changed to server access, HTTPS, and skip TLS verify.

Generate some more load from Icinga:

vim /etc/icinga2/demo/many.conf

const countHosts = 100;

systemctl restart icinga2

Open files

[root@icinga2-influxdb ~]# for p in $(pidof icinga2); do lsof -p $p | grep TCP; done
icinga2 5317 icinga   15u     IPv4             807181       0t0       TCP *:5665 (LISTEN)
icinga2 5317 icinga   16u     IPv4             806410       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:42390->icinga2-influxdb.vagrant.demo.icinga.com:mysql (ESTABLISHED)
icinga2 5317 icinga   19u     IPv4             809574       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34776->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   20u     IPv4             806696       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34736->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   22u     IPv4             807700       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34750->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   23u     IPv4             815710       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34928->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   24u     IPv4             813370       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34882->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   25u     IPv4             823461       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:35094->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga   26u     IPv4             814492       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:34906->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)

....

icinga2 5317 icinga  180u     IPv4             877680       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:36486->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga  181u     IPv4             873421       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:36210->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga  185u     IPv4             876449       0t0       TCP icinga2-influxdb.vagrant.demo.icinga.com:36476->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
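
A listing like this gets long quickly, so grouping it by remote endpoint can help. The sketch below assumes the default `lsof` column layout shown above (NAME in the ninth field, state in the tenth); `established_by_remote` is a hypothetical helper name. In this listing, `d-s-n` appears to be the service name that `lsof` resolves for the InfluxDB port 8086.

```shell
#!/bin/sh
# Hypothetical helper: read `lsof` output on stdin and print the number of
# ESTABLISHED TCP connections per remote endpoint, busiest first.
established_by_remote() {
    awk '$8 == "TCP" && $10 == "(ESTABLISHED)" {
        split($9, ends, "->")      # ends[1] = local, ends[2] = remote
        count[ends[2]]++
    }
    END { for (remote in count) print count[remote], remote }' | sort -rn
}
```

It could be combined with the loop above, for example `for p in $(pidof icinga2); do lsof -p $p; done | established_by_remote`.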

Fix

[screenshot, 2019-03-18 14:33:23]

I can confirm the issue is gone with 2.10.4 for us. Thanks a lot.
