Environment:
3 Zones: Master, Z1, Z2
Master: 1 Node
Z1: 2 Nodes, same network as the master
Z2: 2 Nodes, different network than the master
The satellites act as command executors for the clients: we mostly use check_by_ssh, check_wmi and the most common SNMP plugins. The affected zone has 585 hosts and 4165 services.
The cluster has been running without problems since May last year, and I didn't change anything in the cluster itself. I only added hosts and services during the migration from the old to the new monitoring solution.
Current Behavior:
Recently, the API log in zone Z1 has been growing very fast, and I cannot find out why.
I have cluster checks enabled, which tell me that all endpoints are connected. I have already set "log_duration = 0" in zones.conf, and I have increased the check interval for the services from 1m to 5m.
I have doubled the CPU cores of the Z1 satellites.
I cannot see any late checks in Icinga Web 2, and the service results reach the master.
The cluster-zone check reports a log lag of less than 1 ms.
The cluster check reports that all endpoints are connected. Both checks run at an interval of 5s.
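For readers unfamiliar with these health checks, here is a minimal sketch of how such checks are commonly defined on the master. It assumes the "cluster" and "cluster-zone" check commands from the Icinga Template Library and a Host object named after the master endpoint; the service names and the assign filter are hypothetical and not taken from the reporter's configuration.

```icinga2
// Sketch only: cluster health checks executed on the master host.
// Assumes the ITL check commands "cluster" and "cluster-zone".
// Service names and the assign filter are hypothetical.
apply Service "cluster" {
  check_command = "cluster"
  check_interval = 5s
  retry_interval = 1s

  // Only the master host runs this check
  assign where host.name == "icinga2-master.domain.de"
}

apply Service "cluster-zone-z1" {
  check_command = "cluster-zone"
  check_interval = 5s

  // Zone whose connection state and log lag are checked
  vars.cluster_zone = "z1"

  assign where host.name == "icinga2-master.domain.de"
}
```

Note that both checks run from the master's point of view, so they say nothing about whether the two endpoints inside an HA zone are connected to each other.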
The API log keeps growing until the disk is full. Old logs are not deleted automatically either, even after a clean restart of the icinga2 service and/or a reboot of the whole server.
On the satellites, I see many messages like these in the logs:
[2019-01-25 09:59:14 +0100] information/ApiListener: Sending replay log for endpoint 'icinga2-master.domain.de' in zone 'master'.
[2019-01-25 09:59:14 +0100] information/ApiListener: Finished sending replay log for endpoint 'icinga2-master.domain.de' in zone 'master'.
Here is my cluster config:
Master zones.conf:
object Endpoint "icinga2-master.domain.de" {
  host = "x.x.x.x"
}

object Endpoint "icinga2-z1-101.domain.de" {
  host = "x.x.x.x"
  log_duration = 0
}

object Endpoint "icinga2-z1-102.domain.de" {
  host = "x.x.x.x"
  log_duration = 0
}

object Endpoint "icinga2-z2-101.domain.de" {
  host = "x.x.x.x"
  log_duration = 0
}

object Endpoint "icinga2-z2-102.domain.de" {
  host = "x.x.x.x"
  log_duration = 0
}

object Zone "master" {
  endpoints = [ "icinga2-master.domain.de" ]
}

object Zone "z1" {
  endpoints = [ "icinga2-z1-101.domain.de", "icinga2-z1-102.domain.de" ]
  parent = "master"
}

object Zone "z2" {
  endpoints = [ "icinga2-z2-101.domain.de", "icinga2-z2-102.domain.de" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
z1-101/102 zones.conf:
object Endpoint "icinga2-master.domain.de" {
  log_duration = 0
}

object Zone "master" {
  endpoints = [ "icinga2-master.domain.de" ]
}

object Endpoint "icinga2-z1-101.domain.de" {
}

object Endpoint "icinga2-z1-102.domain.de" {
}

object Zone "z1" {
  endpoints = [ "icinga2-z1-101.domain.de", "icinga2-z1-102.domain.de" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
Expected Behavior:
No API replay logs in /var/lib/icinga2/api/log/ while all endpoints are connected.
Output of icinga2 --version:
Copyright (c) 2012-2018 Icinga Development Team (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: Ubuntu
Platform version: 16.04.5 LTS (Xenial Xerus)
Kernel: Linux
Kernel version: 4.4.0-116-generic
Architecture: x86_64
Build information:
Compiler: GNU 5.3.1
Build host: 409e1113863b
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
Output of icinga2 feature list:
Icinga Web 2 version and modules (System - About):
Icinga Web 2 version: 2.6.2
Git commit: f9ded02180200d85eb0e553708a7875bc10ba98b
Config validation (icinga2 daemon -C):
icinga2 daemon -C
[2019-01-25 10:39:15 +0100] information/cli: Icinga application loader (version: r2.10.2-1)
[2019-01-25 10:39:15 +0100] information/cli: Loading configuration file(s).
[2019-01-25 10:39:15 +0100] information/ConfigItem: Committing config item(s).
[2019-01-25 10:39:15 +0100] information/ApiListener: My API identity: icinga2-master.domain.de
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 7373 Services.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 928 Hosts.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 6 NotificationCommands.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 10822 Notifications.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 3 HostGroups.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 53 Downtimes.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 38 Comments.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 4 Zones.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 5 Endpoints.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 9 Users.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 247 CheckCommands.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 24 ServiceGroups.
[2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-01-25 10:39:19 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-01-25 10:39:19 +0100] information/cli: Finished validating the configuration file(s).
Maybe this information helps here: https://github.com/Icinga/icinga2/issues/6542
I don't think so.
I am not using the Director.
I don't have any apply rules that do not match any objects.
It is not /var/log/icinga2/icinga2.log that is growing. It is /var/lib/icinga2/api/log/, which normally only fills up when there are connection problems between the satellite and the master (which I don't have).
Additional info: it seems both zones are affected (z1 and z2).
This config doesn't make sense.
object Zone "z1" {
  endpoints = [ "icinga2-z1-101.domain.de", "icinga2-z1-102.domain.de" ]
  parent = "master"
}

object Zone "z2" {
  endpoints = [ "icinga2-z2-101.domain.de", "icinga2-z2-102.domain.de" ]
  parent = "master"
}
Why?
2 zones with 2 satellites each, like in the docs:
object Zone "satellite" {
  endpoints = [ "icinga2-satellite1.localdomain" ]
  parent = "master"
}
The master checks the 4 satellites. The satellites each check the hosts that are in /etc/icinga2/zones.d/z1 and /etc/icinga2/zones.d/z2 via ssh/wmi/nrpe/...
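As a sketch of the config sync described here: objects placed in a zone's directory below /etc/icinga2/zones.d/ on the master are synced to that zone, and that zone's endpoints schedule and execute the checks. The host name, address and file name below are hypothetical.

```icinga2
// Hypothetical file on the master: /etc/icinga2/zones.d/z1/hosts.conf
// Objects in zones.d/z1 are synced to zone "z1", so the z1 satellites
// schedule and execute the checks for this host.
object Host "client01.domain.de" {
  check_command = "hostalive"
  address = "192.0.2.10"
}
```

Because both z1 satellites receive this object, they are expected to share the check load between them, which only works as intended if they are connected to each other.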
Ah, they share the same numbers. Sorry. Then I would check whether the endpoints inside each HA-enabled zone are connected to each other.
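Whether two endpoints connect depends on the host attribute: a node only actively connects to an Endpoint object that has host set. In the satellite zones.conf above, neither z1 endpoint has a host attribute, so the two satellites presumably never connect to each other, and the replay log kept for the unreachable peer under /var/lib/icinga2/api/log/ would keep growing. Below is a minimal sketch of a satellite zones.conf in which icinga2-z1-101 connects to its HA peer; x.x.x.x is a placeholder for the peer's real address, and this is an illustration rather than necessarily the change the reporter made.

```icinga2
// Sketch of /etc/icinga2/zones.conf on icinga2-z1-101.domain.de.
// x.x.x.x is a placeholder for the peer satellite's address.
object Endpoint "icinga2-master.domain.de" {
  // No host: the master connects to this satellite
  log_duration = 0
}

object Zone "master" {
  endpoints = [ "icinga2-master.domain.de" ]
}

object Endpoint "icinga2-z1-101.domain.de" {
  // Local endpoint: no host needed
}

object Endpoint "icinga2-z1-102.domain.de" {
  // host set: this node actively connects to its HA peer
  host = "x.x.x.x"
}

object Zone "z1" {
  endpoints = [ "icinga2-z1-101.domain.de", "icinga2-z1-102.domain.de" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
```

The z2 satellites would presumably need the equivalent change for their own peer endpoint.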
It seems this was the problem, although I don't know why (don't they connect to each other automatically?). What I've done now is:
I am now watching whether the logs grow again and will give feedback later today.
Looks good so far. Thanks a lot, Michi.