Icinga2: Unable to start icinga2 with kernel-3.10.0-514.21.2 RHEL7

Created on 20 Jun 2017 · 33 Comments · Source: Icinga/icinga2

General Notes

This seems to be an upstream Kernel regression in RHEL 7 only.

Please read the published advisory and follow our Twitter channel, where we keep posting updates on the matter.

https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/

Original Description

Hello,
I've applied the latest kernel update on my Icinga2 box. After booting the new kernel icinga2 is no longer able to start.
Running the previous kernel version is my current workaround.

Log:
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: execvp: Argument list too long
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: Could not fetch RunAsUser variable. Error ''. Exiting.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service: control process exited, code=exited status=6
Jun 20 08:15:05 icinga.example.com systemd[1]: Failed to start Icinga host/service/network monitoring system.
Jun 20 08:15:05 icinga.example.com systemd[1]: Unit icinga2.service entered failed state.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service failed.

Icinga2 version is 2.6.3
RHEL7.3 with all updates
kernel-3.10.0-514.21.2.el7.x86_64

area/cli bug core/crash queue/important

All 33 comments

Hi there,

I've got the same problem. This happened after upgrading to the newest RHEL kernel / glibc. The following quick-and-dirty fix at least let me start Icinga again.

Change the last line of /usr/sbin/icinga2 to look like this:

exec $ICINGA2_BIN --no-stack-rlimit "$@"
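To check whether the wrapper already carries the flag, something like the following could be used. This is only a sketch: it assumes the wrapper lives at /usr/sbin/icinga2 as described above, and the function name `has_no_stack_rlimit` is made up for this example, not part of Icinga 2.

```shell
# Hypothetical helper (not shipped with Icinga 2): succeeds if the given
# wrapper script has an exec line that passes --no-stack-rlimit.
has_no_stack_rlimit() {
    grep -Eq -- '^[[:space:]]*exec[[:space:]].*--no-stack-rlimit' "$1"
}

# Usage: report the state of the wrapper, if it exists on this host.
wrapper=/usr/sbin/icinga2
if [ -f "$wrapper" ]; then
    if has_no_stack_rlimit "$wrapper"; then
        echo "$wrapper: flag present"
    else
        echo "$wrapper: flag missing"
    fi
fi
```

The check only reads the file; the edit itself is still done by hand.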

When running strace, two systems with different patch levels behave differently:

System with 3.10.0-514.21.1.el7.x86_64:

setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 25 vars */]) = 0
brk(0)                                  = 0x243b000

System with 3.10.0-514.21.2.el7.x86_64:

setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 21 vars */]) = -1 E2BIG (Argument list too long)
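In both traces the wrapper reads RLIMIT_STACK and then lowers the soft limit to 256 KiB before the execve call. As a rough sketch, the limits the current shell would hand to a child process can be printed like this (the function name `show_stack_limits` is illustrative only):

```shell
# Illustrative helper: print the soft and hard stack limits of this shell.
# ulimit reports the values in KiB, or "unlimited".
show_stack_limits() {
    echo "soft=$(ulimit -S -s) hard=$(ulimit -H -s)"
}

show_stack_limits
```

Running it on both systems would show whether they start from the same soft limit, which the getrlimit lines above suggest they do not.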

There seems to be some major change in the behavior of the kernels. Any idea how to change that?

I suppose this is related to https://rhn.redhat.com/errata/RHSA-2017-1484.html. Is this something that must be fixed within Icinga?

Looks like their security fix inadvertently breaks legitimate uses of setrlimit(RLIMIT_STACK, ...).

Thanks for the report, we'll look into that and are therefore postponing today's v2.7 release.

CVE-2017-1000364 seems fixed/applied in Debian too.

https://security-tracker.debian.org/tracker/CVE-2017-1000364

@dnsmichi I applied the patches this morning to our Debian 8 'jessie' system and Icinga 2 is still starting after a reboot.

root@[HOSTNAME]:~# uname -a
Linux [HOSTNAME] 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64 GNU/Linux

Is there anything else I can send you to help with this problem?

@mcktr thanks a lot, it's good to know that Debian does not seem to be affected.

We're currently investigating the RHEL kernel update, diffing the -1 and -2 source RPMs.

stack_guard_gap = 256UL<<PAGE_SHIFT

expendable_stack_area()

Setting 4.5 MB stack size works, 4 MB does not.
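For orientation, the guard gap size can be worked out by hand. The sketch below reads the definition with the left shift used in the upstream kernel (256UL << PAGE_SHIFT) and assumes 4 KiB pages (PAGE_SHIFT = 12), as on x86_64. The function name is made up for the example, and how this figure relates to the 4 MB / 4.5 MB threshold is not established in this thread.

```shell
# Hypothetical helper: size of a 256-page guard gap in bytes.
# $1 is PAGE_SHIFT; defaults to 12 (4 KiB pages), an assumption for x86_64.
guard_gap_bytes() {
    shift_bits=${1:-12}
    echo $(( 256 << shift_bits ))
}

gap=$(guard_gap_bytes 12)
echo "guard gap: $gap bytes ($(( gap / 1024 )) KiB)"
```

With PAGE_SHIFT = 12 this comes out to 1048576 bytes, i.e. 1 MiB.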

We're lowering the stack size so that we don't reserve too much memory for spawned threads. An older version just attempted to set ulimit -u inside the init script, which failed on Debian Jessie in #1006.

Options:

  • remove RLIMIT_STACK and make it a systemd/init script option again
  • increase rlimit to a hardcoded size

Alright, for now you can use @pefmeister's workaround. We'll have a blog post detailing the issues out in the coming days, and 2.7 will come with a long-term solution.

We reported a bug to RedHat mentioning the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1463241

The bug is currently private (I guess that is the default for kernel bugs).

You can reproduce the problem more simply:

$ ulimit -s 1024
$ /bin/true
bash: /bin/true: Argument list too long

$ ulimit -s 4096
$ /bin/true
bash: /bin/true: Argument list too long
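The same test can be looped over several sizes to find where exec starts to fail. A minimal sketch, assuming /bin/true exists; `probe_stack_limits` is a made-up name, and the sizes are picked to cover the 4 MB and 4.5 MB values mentioned earlier:

```shell
# Sketch: try to exec /bin/true under several soft stack limits (in KiB).
# Each attempt runs in a subshell so the limit does not stick to this shell.
# "failed" covers both a refused ulimit call and a failed exec.
probe_stack_limits() {
    for kb in "$@"; do
        if ( ulimit -S -s "$kb" && /bin/true ) 2>/dev/null; then
            echo "$kb KiB: exec ok"
        else
            echo "$kb KiB: failed"
        fi
    done
}

probe_stack_limits 256 1024 4096 4608 8192
```

On an unaffected kernel every line should read "exec ok"; the output on an affected kernel depends on the system, so no expected result is given here.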

RHEL 6 seems to be fine:

[root@rhel6-test ~]# uname -a
Linux rhel6-test.localdomain 2.6.32-696.3.2.el6.x86_64 #1 SMP Wed Jun 7 11:51:39 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel6-test ~]# bash -c "ulimit -s 256; /bin/true; echo 'Works.'"
Works.

Issues to consider:

  • #111
  • #5013

The workaround for systemd also requires the prepare-dirs script to be patched.

diff --git a/etc/initsystem/prepare-dirs b/etc/initsystem/prepare-dirs
index 6c4a08869..5677a787a 100644
--- a/etc/initsystem/prepare-dirs
+++ b/etc/initsystem/prepare-dirs
@@ -13,13 +13,13 @@ else
 fi


-ICINGA2_USER=`$DAEMON variable get --current RunAsUser`
+ICINGA2_USER=`$DAEMON variable get --current RunAsUser --no-stack-rlimit`
 if [ $? != 0 ]; then
         echo "Could not fetch RunAsUser variable. Error '$ICINGA2_USER'. Exiting."
         exit 6
 fi

-ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup`
+ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup --no-stack-rlimit`
 if [ $? != 0 ]; then
         echo "Could not fetch RunAsGroup variable. Error '$ICINGA2_GROUP'. Exiting."
         exit 6

Reference: https://monitoring-portal.org/index.php?thread/41070-problems-with-rhel-7-kernel-update-kernel-3-10-0-514-21-2-and-icinga-2/

CentOS 7 is currently rolling the kernel update onto the mirrors. The main mirror has it available.

[root@icinga2 ~]# uname -a ; ulimit -s 1024 && /bin/true && echo "works"
Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
-bash: /bin/true: Argument list too long

It looks like there are related problems with the kernel update on Debian jessie as well:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865311

We received a test-build from RedHat that works fine in my test environment.

Our advisory is updated with everything that happened.

https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/

Please make sure to open a support case with RedHat and ask for an accelerated fix or a test RPM. This raises awareness so that they release it soon.

The configuration options have been added for v2.7. I would leave this issue open until RedHat/CentOS have released a new kernel update.

New kernel from RH is available.
kernel-3.10.0-514.26.1

No issues so far.
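To see whether a host already runs that build or a later one, the running kernel release can be compared against 3.10.0-514.26.1. This is a rough sketch only: it assumes GNU sort with -V (version sort) is available, the helper name `version_at_least` is made up, and the comparison is only meaningful on RHEL/CentOS 7 kernels.

```shell
# Hypothetical helper: succeeds if version $1 is equal to or newer than $2.
# Relies on GNU sort's -V (version sort), an assumption about the host.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

running=$(uname -r)
if version_at_least "$running" "3.10.0-514.26.1"; then
    echo "$running: not older than 3.10.0-514.26.1"
else
    echo "$running: older than 3.10.0-514.26.1"
fi
```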

I can confirm this, too. Seems to be working with the new kernel. Let's go 2.7!

Thanks for your tests. We'll wait until everything is publicly resolved.

https://bugzilla.redhat.com/show_bug.cgi?id=1463241 is not clear about its state, and CentOS still has the old kernel version.

It is also highly likely that Debian was affected as they recently changed their patch set.
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.30-2%2Bdeb9u2
https://lists.debian.org/debian-security-announce/2017/msg00160.html

There might be more patches or regressions coming in, see e.g. https://github.com/torvalds/linux/commit/98da7d08850fb8bdeb395d6368ed15753304aa0c

Let's wait and see when the kernel problems calm down, then we may start a release cycle for 2.7 again.

A knowledge base entry has been published, saying that a solution is in progress.

Hi,

CentOS has released the new kernel update 3.10.0-514.26.1, and I can confirm that the icinga2 process starts fine.

Catching up after vacation - the CentOS bug tracker item (https://bugs.centos.org/view.php?id=13453) is resolved and RedHat has published multiple Kernel versions too. Tested that inside the Vagrant box, works fine.

[root@icinga2 ~]# uname -a && icinga2 daemon -C
Linux icinga2 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
information/cli: Icinga application loader (version: v2.6.3-399-gc7d71b0)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: icinga2
warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
information/ConfigItem: Instantiated 4 ApiUsers.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 3 Zones.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 1 Endpoint.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 28 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 177 CheckCommands.
information/ConfigItem: Instantiated 1 Downtime.
information/ConfigItem: Instantiated 4 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 157 Hosts.
information/ConfigItem: Instantiated 318 Comments.
information/ConfigItem: Instantiated 1 User.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ConfigItem: Instantiated 161 Services.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

We'll discuss the 2.7 release once everyone involved has returned from holidays, probably next week or so.

Closing here, thanks to everyone involved 👍
