Icinga2: Icinga2 Agent remote instance not connected to master - Connection constantly breaks off

Created on 17 Jul 2018 · 10 Comments · Source: Icinga/icinga2

Icinga2 on Windows Server 2016, connected to a master installation on Ubuntu Server 16.04.4 LTS, constantly redeploys config and restarts. Icinga reports for the connection: "Remote Icinga instance is not connected".

Expected Behavior


Icinga2 agent should stay connected to Icinga2 master. Config should not be redeployed every 1-5 minutes without any changes(?)

Current Behavior


The agent loses its connection (according to icingaweb2). A check is executed and its result reported back to the Icinga2 master only every few minutes.

Possible Solution


Maybe this is related to #6378?

Steps to Reproduce (for bugs)



I'm not sure if this can be reproduced but:

  1. Install Icinga2 agent via Powershell Self Service API

Context


I deployed the Icinga2 agent via PowerShell and the Self Service API through GPOs.
The deployment itself worked flawlessly. However, multiple times a day a few servers lose their connection for a few minutes. Most noticeable are two Windows Server 2016 Core DCs (SRV11 & SRV12), which seem to connect only once every 5 minutes for a check and then lose the connection again.


Icinga2 Master Log:
https://pastebin.com/GvvgwPVE

Icinga2 Master Debuglog (part of it for srv11):
https://pastebin.com/dthQveXt

Icinga2 Agent Log on srv11:
https://pastebin.com/2PJd221Q

Your Environment

  • Version used (icinga2 --version):
    r2.8.4-1
  • Operating System and version:
    Ubuntu Server 16.04.4 LTS (Xenial Xerus)
  • Enabled features (icinga2 feature list):
    api checker debuglog graphite ido-mysql livestatus mainlog notification
  • Icinga Web 2 version and modules (System - About):
    Icinga Web 2 Version
    2.5.3
    businessprocess | 2.1.0
    director | master
    grafana | master
    monitoring | 2.5.3
  • Config validation (icinga2 daemon -C):
[2018-07-17 11:49:40 +0200] warning/icinga-app: Sysconfig file '/etc/sysconfig/icinga2' cannot be read. Using default values.
[2018-07-17 11:49:40 +0200] warning/icinga-app: Sysconfig file '/etc/sysconfig/icinga2' cannot be read. Using default values.
information/cli: Icinga application loader (version: r2.8.4-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: srv20.<customer.domain>
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 12 Zones.
information/ConfigItem: Instantiated 10 Endpoints.
information/ConfigItem: Instantiated 2 FileLoggers.
information/ConfigItem: Instantiated 2 ApiUsers.
information/ConfigItem: Instantiated 1 LivestatusListener.
information/ConfigItem: Instantiated 115 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 213 CheckCommands.
information/ConfigItem: Instantiated 3 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 21 Hosts.
information/ConfigItem: Instantiated 2 UserGroups.
information/ConfigItem: Instantiated 4 Users.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ConfigItem: Instantiated 94 Services.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
Object 'srv11' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 6:1-6:23
  * __name = "srv11"
  * host = "10.1.1.11"
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 7:5-7:22
  * log_duration = 0
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 8:5-8:21
  * name = "srv11"
  * package = "director"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 6
    * last_column = 23
    * last_line = 6
    * path = "/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf"
  * templates = [ "srv11" ]
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 6:1-6:23
  * type = "Endpoint"
  * zone = "srv20.<customer.domain>"

Object 'srv12' of type 'Endpoint':
  % declared in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 26:1-26:23
  * __name = "srv12"
  * host = "10.1.1.12"
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 27:5-27:22
  * log_duration = 0
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 28:5-28:21
  * name = "srv12"
  * package = "director"
  * port = "5665"
  * source_location
    * first_column = 1
    * first_line = 26
    * last_column = 23
    * last_line = 26
    * path = "/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf"
  * templates = [ "srv12" ]
    % = modified in '/var/lib/icinga2/api/packages/director/srv20.<customer.domain>-1531819936-0/zones.d/srv20.<customer.domain>/agent_endpoints.conf', lines 26:1-26:23
  * type = "Endpoint"
  * zone = "srv20.<customer.domain>"

Thanks

area/windows  bug  ref/NC

All 10 comments

Since you are referring to the windows reload issue, did you test the snapshot packages?

Can this problem be reproduced without any script automation? For example, which steps need to be taken to configure the client, especially its local zones.conf file?

What does "constantly redeploys config and restarts" mean exactly? I can see that the Director is involved here. Are there automated configuration deployments at a frequent interval?

I did not try the snapshot version. When I try to install it over 2.8.4, I can only choose a repair installation, which doesn't seem to upgrade it.
Is there any way to tell the Self Service API to upgrade to snapshot builds?

I'll try to reproduce this issue. The steps so far were:

  • Deploy the PowerShell script, with the additional commands taken from Director, through GPO.
  • Wait for the server to connect itself to Director.
  • Commit the config.

"Constantly redeploying config and restarting" means that the log file shows icinga2 on the Windows Server receiving the following messages over and over again, nearly every minute:

information/ApiListener: Applying config update from endpoint 'srv20.<customer.domain>' of zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Sending config updates for endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//.timestamp
[2018-07-18 09:24:58 +0200] information/ApiListener: Finished sending config file updates for endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Syncing runtime objects to endpoint 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/001-director-basics.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Sending replay log for endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/commands.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Finished sending replay log for endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/host_templates.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Finished syncing endpoint 'srv20.<customer.domain>' in zone 'srv20.<customer.domain>'.
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/hostgroups.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/service_apply.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/service_templates.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/servicesets.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/user_templates.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Updating configuration file: C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global//director/usergroups.conf
[2018-07-18 09:24:58 +0200] information/ApiListener: Applying configuration file update for path 'C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global' (12611 Bytes). Received timestamp '2018-07-17 16:55:03 +0200' (1531839303.755495), Current timestamp '1970-01-01 01:00:00 +0100' (0.000000).
[2018-07-18 09:24:58 +0200] information/ApiListener: Restarting after configuration change.
[2018-07-18 09:24:59 +0200] information/Application: Got reload command: Starting new instance.

Other servers don't report deployments that frequently. Because of that, the log files on the affected servers are about 5x as large as on the others.

There are no automated deployments in Director.
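
The restart loop described above can be quantified from the agent log. The following is a minimal Python sketch, not part of the original report: it counts "Restarting after configuration change." lines, measures the gaps between them, and counts config updates whose "Current timestamp" is 0.000000 (1970-01-01). An epoch-zero current timestamp may suggest the agent had no stored timestamp for that zone when the update arrived, though the thread does not confirm this as the cause. The function name `summarize` and the second restart line in the sample are hypothetical.

```python
import re
from datetime import datetime

RESTART = "Restarting after configuration change."
# Epoch seconds printed in parentheses after "Current timestamp '...'".
CURRENT_TS = re.compile(r"Current timestamp '[^']*' \((\d+\.\d+)\)")
# Leading "[YYYY-MM-DD HH:MM:SS +ZZZZ]" stamp; the UTC offset is ignored here.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def summarize(lines):
    """Count restarts and epoch-zero config updates in Icinga 2 log lines."""
    restarts = []
    epoch_zero = 0
    for line in lines:
        ts = CURRENT_TS.search(line)
        if ts and float(ts.group(1)) == 0.0:
            epoch_zero += 1
        if RESTART in line:
            stamp = STAMP.match(line)
            if stamp:
                restarts.append(
                    datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S")
                )
    # Seconds between consecutive restarts, in log order.
    gaps = [(b - a).total_seconds() for a, b in zip(restarts, restarts[1:])]
    return {
        "restarts": len(restarts),
        "epoch_zero_updates": epoch_zero,
        "gaps_seconds": gaps,
    }


# First two lines are taken from the log above; the third is a
# hypothetical later restart, added only to show a gap.
SAMPLE = [
    r"[2018-07-18 09:24:58 +0200] information/ApiListener: Applying configuration file update for path 'C:\ProgramData\icinga2\var/lib/icinga2/api/zones/director-global' (12611 Bytes). Received timestamp '2018-07-17 16:55:03 +0200' (1531839303.755495), Current timestamp '1970-01-01 01:00:00 +0100' (0.000000).",
    "[2018-07-18 09:24:58 +0200] information/ApiListener: Restarting after configuration change.",
    "[2018-07-18 09:25:58 +0200] information/ApiListener: Restarting after configuration change.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same function over the full agent log of an affected and an unaffected server would give a direct comparison of restart frequency.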

After installing Icinga2 for another customer, I noticed that this error seems to occur as soon as I deploy the script via GPO.
If I run the script manually, the hosts stay connected and checks do not flap.

edit:
Icinga Version 2.9.1
IcingaWeb2 2.6.1

Do you need any more information?

Thank you :)

I'd say someone needs to reproduce the problem. I don't know much about the Powershell script and self service API, maybe @LordHepipud can assist here :)

I don't know, maybe it's the same problem.
We have some Windows Server 2016 machines on which I deployed the Icinga2 2.10.2 agent with the PowerShell module.
After a while (hours, or a day), the agent loses its connection and produces a CPU load of 100%.
When I stop the agent, the load returns to normal, and the agent then works normally for a while.
As a workaround I tried uninstalling the agent and installing it manually, but I had the same problem again after a while.
Here's a part of the log:

Context:
    (0) Handling new API client connection

[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.43]:25082): Error: Socket was closed during TLS handshake.


Context:
    (0) Handling new API client connection

[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.44]:62906): Error: Socket was closed during TLS handshake.


Context:
    (0) Handling new API client connection

[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.
[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.44]:64460): Error: Socket was closed during TLS handshake.

The problem only occurs on Windows Server 2016 machines. Update: it also occurs on Windows Server 2012 machines, daily there. It's a critical problem for us, because the affected servers are in a production environment.
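
To see which clients account for the handshake failures in a log like the one above, the critical/ApiListener lines can be grouped by source address. This is a minimal Python sketch under the assumption that the failure lines keep the format shown in the excerpt; the function name `failures_by_client` is hypothetical, and the sample uses the three failure lines quoted above.

```python
import re
from collections import Counter

# Matches e.g. "(from [::ffff:10.18.14.43]:25082)" and captures the address
# without the IPv4-mapped "::ffff:" prefix, plus the source port.
HANDSHAKE_FAILED = re.compile(
    r"critical/ApiListener: Client TLS handshake failed "
    r"\(from \[(?:::ffff:)?([^\]]+)\]:(\d+)\)"
)


def failures_by_client(lines):
    """Return a Counter mapping client address to failed-handshake count."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.43]:25082): Error: Socket was closed during TLS handshake.",
    "[2019-01-23 18:07:46 +0100] warning/TlsStream: TLS stream was disconnected.",
    "[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.44]:62906): Error: Socket was closed during TLS handshake.",
    "[2019-01-23 18:07:46 +0100] critical/ApiListener: Client TLS handshake failed (from [::ffff:10.18.14.44]:64460): Error: Socket was closed during TLS handshake.",
]

if __name__ == "__main__":
    for address, count in failures_by_client(SAMPLE).most_common():
        print(address, count)
```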

ref/NC/599869

Hi,

For those having problems with their Windows Agents, please try setting the NodeName constant and restart Icinga. We may have problems with detecting the hostname.

Best,
Eric
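
As a rough illustration of the suggestion above: the idea is to compare the name the machine detects for itself with the endpoint name configured on the master (for example "srv11"), and to pin it with a NodeName constant in constants.conf if they differ. The Python sketch below uses `socket.getfqdn()` only as a stand-in for hostname detection; it is an assumption that this resembles what Icinga does on Windows. The function name `node_name_report` and the exact string comparison are also assumptions made for illustration.

```python
import socket


def node_name_report(expected_endpoint, detected=None):
    """Compare a detected host name with the endpoint name on the master.

    Returns (matches, detected, const_line), where const_line is the
    constants.conf entry that would pin the name explicitly.
    """
    if detected is None:
        # Stand-in for the agent's own hostname detection (assumption).
        detected = socket.getfqdn()
    matches = detected == expected_endpoint
    const_line = 'const NodeName = "%s"' % expected_endpoint
    return matches, detected, const_line


if __name__ == "__main__":
    # "srv11" is the endpoint name from the object list in the report.
    matches, detected, const_line = node_name_report("srv11")
    print("detected name:", detected)
    if not matches:
        print("differs from endpoint name; candidate constants.conf entry:")
        print(const_line)
```

After changing constants.conf, the agent has to be restarted for the constant to take effect, as noted in the comment above.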

@mikeb93 Please try @lippserd's suggestion.

I have set the NodeName in constants.conf for one of the Windows hosts that show this issue.
I'll observe and report back.

Thanks

This also is a problem seen in the past and resolved with 2.11.
