I would expect Icinga2 to close HTTP connections to InfluxDB rather than keeping them open and opening additional ones.
Icinga2 appears to keep opening new connections to InfluxDB without ever closing them, which results in thousands of established connections. I have seen three outcomes so far:
1. The Icinga2 system runs out of memory and kills Icinga2.
2. The Icinga2 system runs out of file descriptors.
3. The InfluxDB system runs out of file descriptors and InfluxDB crashes.
I am not sure what the solution is; however, this behavior is new in Icinga2 2.10.3.
1. Enable the influxdb feature and monitor the established TCP connections.
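To put a number on step 1, here is a minimal Python sketch that counts ESTABLISHED TCP connections whose remote port is 8086 by reading the kernel tables /proc/net/tcp and /proc/net/tcp6. It assumes Linux and that InfluxDB listens on port 8086, as in the InfluxdbWriter configuration in this report. It counts system-wide, not per process, so any other client of port 8086 on the same host is included.

```python
"""Count ESTABLISHED TCP connections to a remote port via /proc/net/tcp(6).

Sketch only: assumes Linux and InfluxDB on port 8086.
"""
from pathlib import Path

TCP_ESTABLISHED = "01"  # state code the Linux kernel prints for ESTABLISHED


def count_established(tcp_table: str, remote_port: int = 8086) -> int:
    """Return how many rows of a /proc/net/tcp(6) dump are ESTABLISHED to remote_port."""
    count = 0
    for line in tcp_table.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        rem_address, state = fields[2], fields[3]
        port = int(rem_address.rsplit(":", 1)[1], 16)  # ports are printed in hex
        if state == TCP_ESTABLISHED and port == remote_port:
            count += 1
    return count


if __name__ == "__main__":
    total = sum(
        count_established(Path(p).read_text())
        for p in ("/proc/net/tcp", "/proc/net/tcp6")
        if Path(p).exists()
    )
    print(f"ESTABLISHED connections to port 8086: {total}")
```

Running it periodically (for example from cron) and graphing the value should show a steady climb on an affected 2.10.3 system.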
This issue causes Icinga2 to be killed by the system once all memory is used, causes Icinga2 to run out of file descriptors so that checks fail, or crashes the InfluxDB process on a remote system.
Output of icinga2 --version:
icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.3-1)
Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-957.5.1.el7.x86_64
Architecture: x86_64
Build information:
Compiler: GNU 4.8.5
Build host: unknown
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
Operating system (/etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
Output of icinga2 feature list:
Disabled features: command compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification
Icingaweb2 2.6.2
Modules: doc 2.6.2, grafana 1.3.4, monitoring 2.6.2
Output of icinga2 daemon -C:
[2019-03-01 07:40:24 -0500] information/cli: Icinga application loader (version: r2.10.3-1)
[2019-03-01 07:40:24 -0500] information/cli: Loading configuration file(s).
[2019-03-01 07:40:24 -0500] information/ConfigItem: Committing config item(s).
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-03-01 07:40:24 -0500] warning/ApiListener: Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/
[2019-03-01 07:40:24 -0500] information/ApiListener: My API identity: REMOVED
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'snmp-interface' (in /var/lib/icinga2/api/zones/global-config/_etc/services-snmp.conf: 1:0-1:29) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'snmp-storage' (in /var/lib/icinga2/api/zones/global-config/_etc/services-snmp.conf: 20:1-20:28) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'disk-windows' (in /var/lib/icinga2/api/zones/global-config/_etc/services-windows.conf: 1:0-1:27) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'nscp-local-memory' (in /var/lib/icinga2/api/zones/global-config/_etc/services-windows.conf: 9:1-9:33) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule 'ping6' (in /var/lib/icinga2/api/zones/global-config/_etc/services.conf: 35:1-35:21) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] warning/ApplyRule: Apply rule '' (in /var/lib/icinga2/api/zones/global-config/_etc/services.conf: 282:1-282:66) for type 'Service' does not match anywhere!
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 646 Services.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 58 Hosts.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 4 EventCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 FileLogger.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 113 Dependencies.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 7 NotificationCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1400 Notifications.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 9 HostGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 ApiListener.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 34 Downtimes.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 36 Zones.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 41 Endpoints.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 6 ApiUsers.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 4 Users.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 218 CheckCommands.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 2 UserGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 5 ServiceGroups.
[2019-03-01 07:40:25 -0500] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-03-01 07:40:25 -0500] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-03-01 07:40:25 -0500] information/cli: Finished validating the configuration file(s).
This is my /etc/icinga2/features-available/influxdb.conf for the icinga2 feature. I have removed host, username, and password values.
cat /etc/icinga2/features-available/influxdb.conf
/**
 * The InfluxdbWriter type writes check result metrics and
 * performance data to an InfluxDB HTTP API
 */
library "perfdata"

object InfluxdbWriter "influxdb" {
  host = "*"
  ssl_enable = true
  port = 8086
  database = "icinga2"
  flush_threshold = 1024
  flush_interval = 10s
  username = "*"
  password = "*"
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      hostname = "$host.name$"
    }
  }
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      hostname = "$host.name$"
      service = "$service.name$"
    }
  }
  enable_send_thresholds = true
  enable_send_metadata = true
}
InfluxDB v1.7.4 (git: 1.7 ef77e72f435b71b1ad6da7d6a6a4c4a262439379)
cat /proc/net/sockstat
sockets: used 5663
TCP: inuse 5435 orphan 1 tw 11 alloc 5438 mem 40
UDP: inuse 4 mem 3
UDPLITE: inuse 0
RAW: inuse 3
FRAG: inuse 0 memory 0
cat /proc/net/sockstat
sockets: used 270
TCP: inuse 44 orphan 0 tw 19 alloc 45 mem 53
UDP: inuse 4 mem 2
UDPLITE: inuse 0
RAW: inuse 2
FRAG: inuse 0 memory 0
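The two sockstat snapshots above can be compared programmatically. A short Python sketch, assuming the usual /proc/net/sockstat layout of a protocol label followed by alternating counter names and values:

```python
from __future__ import annotations


def parse_sockstat(text: str) -> dict[str, dict[str, int]]:
    """Turn /proc/net/sockstat text into {"TCP": {"inuse": ..., ...}, ...}."""
    stats: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "LABEL: name value ..."
        tokens = rest.split()
        # tokens alternate name, value, name, value, ...
        stats[label.strip()] = {name: int(value) for name, value in zip(tokens[::2], tokens[1::2])}
    return stats
```

For example, parse_sockstat(open("/proc/net/sockstat").read())["TCP"]["inuse"] gives the "TCP: inuse" figure to log over time.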
Here is a graph showing the established TCP connections growing over time. At the first peak, Icinga2 was killed because it ran out of memory. At the second peak, the InfluxDB daemon on another server crashed and restarted.

Before the upgrade this Icinga2 system would maintain about 33 TCP connections. After the upgrade it peaked at 7,270 TCP connections.
@Al2Klimov Thank you for taking a look at this.
As I continue to troubleshoot, the issue appears only on systems that use an SSL connection to InfluxDB. I do not see it on test systems that connect to InfluxDB without SSL.
ref/IC/12219
Same issue here: it seems one additional connection is opened per InfluxDB flush. We downgraded to 2.10.2 for now.
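The pattern described here, one extra connection per flush, can be illustrated with a toy Python model. It is not Icinga's C++ code: FakeConnection and ToyWriter are hypothetical stand-ins that only show how open connections accumulate when every flush opens a connection and never closes it. The helper also does the arithmetic under the assumption that flushes happen only on the 10s flush_interval timer: 7,270 leaked connections would then take about 20 hours. Flushes also fire whenever flush_threshold data points are buffered, so busier systems would leak faster, which may explain failures within 12 hours.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnection:
    """Hypothetical stand-in for one TLS connection to InfluxDB."""
    closed: bool = False

    def close(self) -> None:
        self.closed = True


@dataclass
class ToyWriter:
    """Hypothetical writer that opens a new connection for every flush."""
    close_after_flush: bool
    connections: list = field(default_factory=list)

    def flush(self) -> None:
        conn = FakeConnection()  # a new connection per flush
        self.connections.append(conn)
        # ... the buffered points would be sent here ...
        if self.close_after_flush:
            conn.close()  # the fixed behavior: release the connection

    def open_connections(self) -> int:
        return sum(not c.closed for c in self.connections)


def hours_to_accumulate(leaked: int, flush_interval_s: float = 10.0) -> float:
    """Hours until `leaked` connections pile up at one leak per timer flush."""
    return leaked * flush_interval_s / 3600
```

With one leak per 10s flush, an hour of flushing leaves 360 extra connections open in the leaky writer and none in the fixed one.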

Hello @marcofl!
Please could you test #6990?
Best,
AK
I can confirm the exact same behaviour on our system. Because we have sufficient RAM, the process does not die, but all checks fail with:
Error: Function call 'pipe2' failed with error code 24, 'Too many open files'
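Error code 24 is EMFILE: the process has reached its RLIMIT_NOFILE soft limit, so the pipe2() call in the message fails and the check cannot run. A minimal Python sketch for checking the remaining headroom, assuming Linux /proc. Note that resource.getrlimit() reports the limits of the running Python process; for the icinga2 daemon itself, inspect /proc/<pid>/limits and /proc/<pid>/fd as root, with the PID from pidof icinga2.

```python
from __future__ import annotations

import errno
import os
import resource


def open_fd_count(pid: int | str = "self") -> int:
    """Count the file descriptors currently open by a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(open_fds: int, soft_limit: int) -> int:
    """How many more descriptors can be opened before EMFILE (errno 24)."""
    return soft_limit - open_fds


if __name__ == "__main__":
    # On Linux, EMFILE is errno 24: "Too many open files".
    print(f"errno {errno.EMFILE}: {os.strerror(errno.EMFILE)}")
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)  # limits of this process
    if os.path.isdir("/proc/self/fd"):
        used = open_fd_count()  # includes the descriptor listdir itself opened
        print(f"open fds: {used}, soft limit: {soft}, headroom: {fd_headroom(used, soft)}")
```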

Same problem here, using InfluxdbWriter with TLS.
After a while, no Icinga checks can be executed because of the 'Too many open files' error. This is a serious bug. We have downgraded to 2.10.2 for now.
Hello guys!
Feel free to test the PR I linked. The sooner one of you writes up test results, the sooner it will be merged.
Best,
AK
Sure, can you point me to the correct snapshot package for xenial?
Hello @marcofl!
I'm afraid there isn't one (yet). If there were, I wouldn't have referred you to the PR.
Best,
AK
I'm wondering which changes are involved here, since git diff v2.10.2 v2.10.3 lib/perfdata doesn't show anything relevant. It is likely related to a2ae01e64b4fa00aeb3c652ac9a633bf458ca5ed: dropping the life support references makes the original problem, that the streams are never closed at all, more visible.
Can you give this ticket higher priority / bug label maybe? This actually made this version unusable for everyone using the InfluxDB writer...
Yes, I can confirm we experience the exact same issue. Also using Icinga 2.10.3 with InfluxDB and TLS.
@Al2Klimov created a patch which is on my review list. I am currently at Icinga Camp Berlin, so I will merge it next week at the latest.
Cheers,
Michael
We installed https://github.com/Icinga/icinga2/pull/6990 on our systems that were suffering from this issue.
It has been running since March 11.
The issue appears to have cleared up. We have been observing the TCP connection count, and it is NOT increasing. Previously, we would see a crash or run out of file handles within 12 hours.
Icinga2 system is now functioning normally.
Thank you for your help.
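Checking that the connection count is "NOT increasing" can be made objective by fitting a slope to periodic samples, for example hourly counts collected with one of the methods in this thread. A minimal Python sketch using an ordinary least-squares fit; the threshold of 10 connections per hour is an arbitrary assumption. For scale, one leaked connection per 10s flush would show up as a slope of 360 per hour.

```python
from __future__ import annotations

from statistics import fmean


def slope_per_hour(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of (hour, connection_count) samples."""
    xs = [t for t, _ in samples]
    ys = [c for _, c in samples]
    mx, my = fmean(xs), fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples: list[tuple[float, int]], threshold_per_hour: float = 10.0) -> bool:
    """Flag a leak when connections grow faster than the (assumed) threshold."""
    return slope_per_hour(samples) > threshold_per_hour
```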
Yes, I can confirm we experience the exact same issue. Also using Icinga 2.10.3 with InfluxDB and TLS.
Me too :-)
$ netstat -patune | grep icinga | wc -l
3829
Nearly all of these connections go to InfluxDB.
This may affect other (TLS) streams as well, not only those of the InfluxDB/Elasticsearch features.
To reproduce, use the influxdb Vagrant box and modify it for TLS:
usermod -a -G icinga influxdb
vim /etc/influxdb/influxdb.conf
[http]
https-certificate = "/var/lib/icinga2/certs/icinga2-influxdb.vagrant.demo.icinga.com.crt"
https-private-key = "/var/lib/icinga2/certs/icinga2-influxdb.vagrant.demo.icinga.com.key"
https-enabled = true
systemctl restart influxdb
vim /etc/icinga2/features-enabled/influxdb.conf
ssl_enable = true
systemctl restart icinga2
The Grafana data source needs to be changed to use server access, an https URL, and skip TLS verification.
Generate some more load from Icinga:
vim /etc/icinga2/demo/many.conf
const countHosts = 100;
systemctl restart icinga2
[root@icinga2-influxdb ~]# for p in $(pidof icinga2); do lsof -p $p | grep TCP; done
icinga2 5317 icinga 15u IPv4 807181 0t0 TCP *:5665 (LISTEN)
icinga2 5317 icinga 16u IPv4 806410 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:42390->icinga2-influxdb.vagrant.demo.icinga.com:mysql (ESTABLISHED)
icinga2 5317 icinga 19u IPv4 809574 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34776->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 20u IPv4 806696 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34736->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 22u IPv4 807700 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34750->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 23u IPv4 815710 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34928->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 24u IPv4 813370 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34882->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 25u IPv4 823461 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:35094->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 26u IPv4 814492 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:34906->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
....
icinga2 5317 icinga 180u IPv4 877680 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:36486->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 181u IPv4 873421 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:36210->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
icinga2 5317 icinga 185u IPv4 876449 0t0 TCP icinga2-influxdb.vagrant.demo.icinga.com:36476->icinga2-influxdb.vagrant.demo.icinga.com:d-s-n (ESTABLISHED)
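The remote port shows up as d-s-n because lsof resolves port numbers through /etc/services, where 8086/tcp is listed under that name (the IANA registration for the port); running lsof with -P prints numeric ports instead. A Python sketch that tallies ESTABLISHED sockets by remote port from lsof output like the listing above:

```python
from collections import Counter


def established_by_remote_port(lsof_output: str) -> Counter:
    """Count ESTABLISHED sockets per remote port (name or number) in lsof output."""
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        if not line.rstrip().endswith("(ESTABLISHED)"):
            continue  # skip LISTEN sockets and non-socket lines
        # The NAME column looks like "local:port->remote:port (ESTABLISHED)".
        name = line.split()[-2]
        remote = name.split("->", 1)[1]
        counts[remote.rsplit(":", 1)[1]] += 1
    return counts
```

Fed the output of lsof -nP -p <pid> for the icinga2 process, the counter should be dominated by port 8086 on an affected system.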

I can confirm the issue is gone for us with 2.10.4. Thanks a lot.