Icinga2: /var/lib/icinga2/api/log/ growing although endpoints are connected

Created on 25 Jan 2019 · 8 Comments · Source: Icinga/icinga2

Environment:
3 Zones: Master, Z1, Z2
Master: 1 Node
Z1: 2 Nodes, same network as the master
Z2: 2 Nodes, different network from the master
The satellites act as command executors for the clients. We mostly use check_by_ssh, check_wmi and the well-known SNMP plugins. The problematic zone has 585 hosts and 4165 services.
The cluster has run without problems since May last year, and I didn't change anything in the cluster itself. I only added hosts and services as part of the migration from the old to the new monitoring solution.

Current Behavior:
Recently, the API log in zone Z1 has been growing very fast, and I cannot find out why.
I have cluster checks enabled, and they report that all endpoints are connected. I have already set "log_duration = 0" in zones.conf. I have increased the check interval for the services from 1m to 5m.
I have doubled the CPU cores of the Z1 satellites.
I cannot see any late checks in Icinga Web 2. The service results reach the master.
The cluster-zone check reports a log lag of less than 1 ms.
The cluster check reports that all endpoints are connected. Both checks run at an interval of 5s.
The API log keeps growing until the disk is full. Old logs are not deleted automatically either, even after a clean restart of the icinga2 service and/or a reboot of the whole server.

On the satellites, I see many messages like these in the logs:

[2019-01-25 09:59:14 +0100] information/ApiListener: Sending replay log for endpoint 'icinga2-master.domain.de' in zone 'master'.
[2019-01-25 09:59:14 +0100] information/ApiListener: Finished sending replay log for endpoint 'icinga2-master.domain.de' in zone 'master'.

Here is my cluster config
Master zones.conf

object Endpoint "icinga2-master.domain.de" {
        host = "x.x.x.x"
}
object Endpoint "icinga2-z1-101.domain.de" {
        host = "x.x.x.x"
        log_duration = 0
}
object Endpoint "icinga2-z1-102.domain.de" {
        host = "x.x.x.x"
        log_duration = 0
}
object Endpoint "icinga2-z2-101.domain.de" {
        host = "x.x.x.x"
        log_duration = 0
}
object Endpoint "icinga2-z2-102.domain.de" {
        host = "x.x.x.x"
        log_duration = 0
}
object Zone "master" {
        endpoints = [ "icinga2-master.domain.de" ]
}
object Zone "z1" {
        endpoints = [ "icinga2-z1-101.domain.de","icinga2-z1-102.domain.de" ]
        parent = "master"
}
object Zone "z2" {
        endpoints = [ "icinga2-z2-101.domain.de","icinga2-z2-102.domain.de" ]
        parent = "master"
}
object Zone "global-templates" {
        global = true
}

z1-101/102 zones.conf

object Endpoint "icinga2-master.domain.de" {
  log_duration = 0
}

object Zone "master" {
        endpoints = ["icinga2-master.domain.de"]
}

object Endpoint "icinga2-z1-101.domain.de" {
}

object Endpoint "icinga2-z1-102.domain.de" {
}

object Zone "z1" {
        endpoints = [ "icinga2-z1-101.domain.de","icinga2-z1-102.domain.de" ]
        parent = "master"
}

object Zone "global-templates" {
        global = true
}

Expected Behavior:
No API replay logs piling up in /var/lib/icinga2/api/log/ while all endpoints are connected.

Your Environment

  • Version used (icinga2 --version):
    icinga2 --version
    icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.2-1)

Copyright (c) 2012-2018 Icinga Development Team (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl2.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
Platform: Ubuntu
Platform version: 16.04.5 LTS (Xenial Xerus)
Kernel: Linux
Kernel version: 4.4.0-116-generic
Architecture: x86_64

Build information:
Compiler: GNU 5.3.1
Build host: 409e1113863b

Application information:

General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2

Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var

Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid

  • Enabled features (icinga2 feature list):
    Enabled features: api checker command ido-mysql influxdb mainlog notification
  • Icinga Web 2 version and modules (System - About):
    Icinga Web 2 version: 2.6.2
    Git commit: f9ded02180200d85eb0e553708a7875bc10ba98b

  • Config validation (icinga2 daemon -C):
    icinga2 daemon -C
    [2019-01-25 10:39:15 +0100] information/cli: Icinga application loader (version: r2.10.2-1)
    [2019-01-25 10:39:15 +0100] information/cli: Loading configuration file(s).
    [2019-01-25 10:39:15 +0100] information/ConfigItem: Committing config item(s).
    [2019-01-25 10:39:15 +0100] information/ApiListener: My API identity: icinga2-master.domain.de
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ScheduledDowntime.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 7373 Services.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 928 Hosts.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 FileLogger.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 6 NotificationCommands.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 10822 Notifications.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 3 HostGroups.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ApiListener.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 53 Downtimes.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 38 Comments.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 4 Zones.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 5 Endpoints.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 9 Users.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 247 CheckCommands.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 24 ServiceGroups.
    [2019-01-25 10:39:19 +0100] information/ConfigItem: Instantiated 3 TimePeriods.
    [2019-01-25 10:39:19 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
    [2019-01-25 10:39:19 +0100] information/cli: Finished validating the configuration file(s).

All 8 comments

Maybe this information helps here: https://github.com/Icinga/icinga2/issues/6542

I don't think so.
I am not using the Director.
I don't have any apply rules that fail to match.
It is not /var/log/icinga2/icinga2.log that is growing; it is /var/lib/icinga2/api/log/, which normally only fills up when there are connection problems between a satellite and the master (which I don't have).

Additional info: both zones (Z1 and Z2) seem to be affected.

This config doesn't make sense.

object Zone "z1" {
        endpoints = [ "icinga2-z1-101.domain.de","icinga2-z1-102.domain.de" ]
        parent = "master" }
object Zone "z2" {
        endpoints = [ "icinga2-z2-101.domain.de","icinga2-z2-102.domain.de" ]
        parent = "master" }

Why?
Two zones with two satellites each, like in the docs:

object Zone "satellite" {
endpoints = [ "icinga2-satellite1.localdomain" ]
parent = "master"
}

The master checks the 4 satellites. The satellites each check the hosts defined in /etc/icinga2/zones.d/z1 and /etc/icinga2/zones.d/z2 via SSH/WMI/NRPE/…
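For readers unfamiliar with this layout: with Icinga 2's config sync, objects placed on the master under /etc/icinga2/zones.d/<zone>/ are synced to the endpoints of that zone, which then execute the checks. A minimal sketch of such a file follows. The file name hosts.conf, the host and service names, the address, and the remote plugin path are hypothetical placeholders; "by_ssh" is assumed to be the Icinga Template Library CheckCommand that wraps check_by_ssh.

```
// Hypothetical file on the master: /etc/icinga2/zones.d/z1/hosts.conf
// Everything under zones.d/z1/ is synced to the z1 satellites, which run the checks.

object Host "example-linux-01.domain.de" {   // hypothetical host
  address = "x.x.x.x"                         // placeholder address
  check_command = "hostalive"
}

object Service "disk-root" {                   // hypothetical service name
  host_name = "example-linux-01.domain.de"
  check_command = "by_ssh"                     // ITL wrapper around check_by_ssh
  // Command run on the remote host over SSH (plugin path is an assumption).
  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /"
  check_interval = 5m
}
```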

Ah, they share the same numbers, sorry. In that case I would check whether the endpoints inside each HA-enabled zone are connected to each other.
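One way to check this from inside a zone, sketched here under assumptions: the built-in "cluster" CheckCommand reports the connection state of the endpoints as seen by the node that executes it. Scheduled on the master, it only reflects the master's own connections (which is why the checks above all looked healthy), so this sketch pins it to each satellite with command_endpoint. It assumes Host objects for the satellites exist in the master zone, are named exactly like their Endpoint objects, and carry a hypothetical custom variable vars.is_satellite; the service name is also hypothetical.

```
// Sketch for the master's own zone config. Assumes satellite Host objects
// named like their Endpoint objects and flagged with vars.is_satellite = true.
apply Service "cluster-local-view" {          // hypothetical service name
  check_command = "cluster"                   // built-in cluster connection check
  // Execute on the satellite itself, so the result shows whether its
  // HA peer (and the master) are connected from the satellite's point of view.
  command_endpoint = host.name
  check_interval = 1m
  assign where host.vars.is_satellite
}
```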

It seems this was the problem, although I don't know why (don't they connect to each other automatically?). What I've done now is:

  • on Z1-101: added the host IP to the Endpoint object of Z1-102 in zones.conf
  • on Z1-102: added the host IP to the Endpoint object of Z1-101 in zones.conf
  • on Z2-101: added the host IP to the Endpoint object of Z2-102 in zones.conf
  • on Z2-102: added the host IP to the Endpoint object of Z2-101 in zones.conf

I am now watching whether the logs grow again. I will give feedback later in the day.
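The change described above can be written out as follows for Z1-101; Z1-102 and the two Z2 nodes get the mirror-image edit. This is a sketch based on the satellite zones.conf posted earlier, with "x.x.x.x" standing in for Z1-102's real address. An Icinga 2 endpoint only opens a connection to another endpoint whose Endpoint object has a host attribute, so with neither satellite configured that way the two HA peers apparently never connected, and messages for the unreachable peer presumably kept going into the replay log, since the peer's log_duration was still at its default of 1d.

```
// Z1-101: /etc/icinga2/zones.conf after the fix (sketch).
object Endpoint "icinga2-master.domain.de" {
  log_duration = 0
}

object Zone "master" {
  endpoints = [ "icinga2-master.domain.de" ]
}

// Local endpoint: no host attribute needed.
object Endpoint "icinga2-z1-101.domain.de" {
}

// HA peer: with host set, Z1-101 actively connects to Z1-102.
object Endpoint "icinga2-z1-102.domain.de" {
  host = "x.x.x.x"   // placeholder for Z1-102's address
}

object Zone "z1" {
  endpoints = [ "icinga2-z1-101.domain.de", "icinga2-z1-102.domain.de" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
```

Running `icinga2 daemon -C` before restarting the service validates the edited configuration, as in the environment section above.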

Looks good so far. Thanks a lot, Michi.
