After running some security software (that should detect security vulnerabilities) Icinga stops responding. To be more specific, icinga responds on port 5665, TCP handshake is done but TLS not - timeout:
curl -kvvv --connect-timeout 10 https://client-host:5665
* Rebuilt URL to: https://client-host:5665/
* Trying 10.0.0.1...
* TCP_NODELAY set
* Connected to client-host (10.0.0.1) port 5665 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* Operation timed out after 10000 milliseconds with 0 out of 0 bytes received
* stopped the pause stream!
* Closing connection 0
curl: (28) Operation timed out after 10000 milliseconds with 0 out of 0 bytes received
I put all logs in context section. Let me know if you need any more information.
Hard to tell as I cannot see inside the security software
I'm using distributed setup with HA master and this issue so far occurs only on client nodes.
Issue occurs on all 3 levels.
I turned on debug logs and find this:
[2019-03-20 08:47:15 +0000] information/ApiListener: New client connection from [10.0.0.2]:55062 (no client certificate)
[2019-03-20 08:47:15 +0000] information/ApiListener: New client connection from [10.0.0.2]:55080 (no client certificate)
[2019-03-20 08:47:15 +0000] information/HttpServerConnection: Unable to process available data, they're already being processed in another thread
[2019-03-20 08:47:15 +0000] information/HttpServerConnection: HTTP client disconnected (from [10.0.0.2]:55000)
[2019-03-20 08:47:15 +0000] warning/JsonRpcConnection: Error while reading JSON-RPC message for identity '': Error: Invalid NetString (missing ,)
(0) icinga2: icinga::NetString::ReadStringFromStream(boost::intrusive_ptr<icinga::Stream> const&, icinga::String*, icinga::StreamReadContext&, bool, long) (+0x34f) [0x783aef]
(1) icinga2: icinga::JsonRpc::ReadMessage(boost::intrusive_ptr<icinga::Stream> const&, icinga::String*, icinga::StreamReadContext&, bool, long) (+0x2e) [0xa5bffe]
(2) icinga2: icinga::JsonRpcConnection::ProcessMessage() (+0x67) [0xa5c107]
(3) icinga2: icinga::JsonRpcConnection::DataAvailableHandler() (+0x88) [0x8b4588]
(4) icinga2: icinga::JsonRpcConnection::Start() (+0x97) [0x9862b7]
(5) icinga2: icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) (+0x5aa) [0x9868da]
(6) icinga2: icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) (+0x21) [0x987a81]
(7) icinga2: icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) (+0x77a) [0x85576a]
(8) libboost_thread-mt.so.1.53.0: <unknown function> (+0xd27a) [0x2adef8e5e27a]
(9) libpthread.so.0: <unknown function> (+0x7e25) [0x2adefab41e25]
(10) libc.so.6: clone (+0x6d) [0x2adefae54bad]
(0) Handling new API client connection
Context:
(0) Handling new API client connection
[2019-03-20 08:47:15 +0000] warning/JsonRpcConnection: API client disconnected for identity ''
Context:
(0) Handling new API client connection
[2019-03-20 08:47:15 +0000] warning/JsonRpcConnection: API client disconnected for identity ''
Context:
(0) Handling new API client connection
Also I noticed that thread pool utilization is increasing and icinga API (TLS handshake) stops responding after hitting 100% (I removed some lines):
grep 'Pool #' /var/log/icinga2/debug.log
[2019-03-22 09:16:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0107218%
[2019-03-22 09:16:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 4.61835e-125%
[2019-03-22 09:16:35 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00593544%
[2019-03-22 09:16:43 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 1.95682e-126%
[2019-03-22 09:16:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00952978%
[2019-03-22 09:16:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 8.29114e-128%
[2019-03-22 09:17:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00476407%
[2019-03-22 09:17:13 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 3.513e-129%
[2019-03-22 09:17:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00937185%
[2019-03-22 09:17:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 1.48833e-130%
[2019-03-22 09:17:35 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00527004%
[2019-03-22 09:17:43 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 6.30609e-132%
[2019-03-22 09:17:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0102098%
[2019-03-22 09:17:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 2.6715e-133%
[2019-03-22 09:18:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00594054%
[2019-03-22 09:18:13 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 1.13192e-134%
[2019-03-22 09:18:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0108238%
[2019-03-22 09:18:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 4.79603e-136%
[2019-03-22 09:18:35 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00556682%
[2019-03-22 09:18:43 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 8.99087%
[2019-03-22 09:18:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0413032%
[2019-03-22 09:18:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 18.9742%
[2019-03-22 09:19:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00604529%
[2019-03-22 09:19:13 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 20.9446%
[2019-03-22 09:19:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0102741%
[2019-03-22 09:19:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 8; Pool utilization: 75.7418%
[2019-03-22 09:19:35 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.109776%
[2019-03-22 09:19:43 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 26.6427%
[2019-03-22 09:19:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.0129722%
[2019-03-22 09:19:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 25.0696%
[2019-03-22 09:20:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00653%
[2019-03-22 09:20:13 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 25.0029%
[2019-03-22 09:20:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00963879%
[2019-03-22 09:20:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 25.0001%
[2019-03-22 09:20:35 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00488279%
[2019-03-22 09:20:43 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 6; Pool utilization: 25%
...
[2019-03-22 09:22:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5041%
[2019-03-22 09:22:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 8; Pool utilization: 33.3418%
...
[2019-03-22 09:25:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5057%
[2019-03-22 09:25:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 13; Pool utilization: 72.1968%
...
[2019-03-22 09:48:20 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.51%
[2019-03-22 09:48:28 +0000] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 46; Pool utilization: 68.9717%
...
[2019-03-22 10:11:50 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5071%
[2019-03-22 10:11:58 +0000] notice/ThreadPool: Pool #2: Pending tasks: 3; Average latency: 0ms; Threads: 64; Pool utilization: 87.5%
...
[2019-03-22 10:16:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5034%
[2019-03-22 10:16:13 +0000] notice/ThreadPool: Pool #2: Pending tasks: 31; Average latency: 0ms; Threads: 64; Pool utilization: 100%
...
[2019-03-22 10:17:05 +0000] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5039%
[2019-03-22 10:17:14 +0000] notice/ThreadPool: Pool #2: Pending tasks: 40; Average latency: 0ms; Threads: 64; Pool utilization: 100%
icinga2 --version):icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.3-1)
Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-514.26.2.el7.x86_64
Architecture: x86_64
Build information:
Compiler: GNU 4.8.5
Build host: unknown
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
CentOS Linux 7 (Core)icinga2 feature list): api mainlog debuglogicinga2 daemon -C) I removed apply rules warnings:[2019-03-22 15:53:46 +0000] information/cli: Icinga application loader (version: r2.10.3-1)
[2019-03-22 15:53:46 +0000] information/cli: Loading configuration file(s).
[2019-03-22 15:53:46 +0000] information/ConfigItem: Committing config item(s).
[2019-03-22 15:53:46 +0000] information/ApiListener: My API identity: brq01-001.s.jumpshot.com
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 2 EventCommands.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 7 NotificationCommands.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 180 HostGroups.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 3 Zones.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 3 Endpoints.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 10 Users.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 264 CheckCommands.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 3 UserGroups.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 12 ServiceGroups.
[2019-03-22 15:53:46 +0000] information/ConfigItem: Instantiated 8 TimePeriods.
[2019-03-22 15:53:46 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-03-22 15:53:46 +0000] information/cli: Finished validating the configuration file(s).
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.object Endpoint "client-host" {
}
object Zone "zone1" {
endpoints = [ "satelite-host" ]
}
object Endpoint "satelite-host" {
}
object Zone "global-templates"{
global = true
}
Closing in favor of #7041 where issues like this are already linked.
@miso231 We have packages with our new network stack ready. Do you have any chance to test them? I'll give you a link then.
@lippserd Yes, I can do it. Just send me the link
Issue is solved in package provided by @lippserd.
Most helpful comment
Issue is solved in package provided by @lippserd.