This appears to be an upstream kernel regression that affects RHEL 7 only.
Please read the published advisory and follow our Twitter channel, where we keep posting updates on the matter.
https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/
Hello,
I've applied the latest kernel update on my Icinga2 box. After booting the new kernel icinga2 is no longer able to start.
Running the previous kernel version is my current workaround.
Log:
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: execvp: Argument list too long
Jun 20 08:15:05 icinga.example.com prepare-dirs[2629]: Could not fetch RunAsUser variable. Error ''. Exiting.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service: control process exited, code=exited status=6
Jun 20 08:15:05 icinga.example.com systemd[1]: Failed to start Icinga host/service/network monitoring system.
Jun 20 08:15:05 icinga.example.com systemd[1]: Unit icinga2.service entered failed state.
Jun 20 08:15:05 icinga.example.com systemd[1]: icinga2.service failed.
Icinga2 version is 2.6.3
RHEL7.3 with all updates
kernel-3.10.0-514.21.2.el7.x86_64
Hi there,
I've got the same problem. It happened after upgrading to the newest RHEL kernel / glibc. The following (quick and dirty) fix at least let me start Icinga again.
Change the last line of /usr/sbin/icinga2 to look like this:
exec $ICINGA2_BIN --no-stack-rlimit "$@"
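The manual edit above could also be scripted. This is only a minimal sketch: the helper name `add_no_stack_rlimit` is made up for illustration, and it assumes the wrapper's last line begins with `exec $ICINGA2_BIN`, which may not hold for every package version. It keeps a `.bak` copy and does nothing if the flag is already present.

```shell
#!/bin/sh
# Append --no-stack-rlimit to the final exec line of an Icinga 2 wrapper script.
# add_no_stack_rlimit is a hypothetical helper; it assumes the last line
# of the file starts with: exec $ICINGA2_BIN
add_no_stack_rlimit() {
    file="$1"
    # Do nothing if the last line already carries the flag.
    if tail -n 1 "$file" | grep -q -e '--no-stack-rlimit'; then
        echo "already patched: $file"
        return 0
    fi
    # Keep a backup, then rewrite only the last line ("$" is sed's
    # last-line address; "\$" in the pattern is a literal dollar sign).
    cp -p "$file" "$file.bak" || return 1
    sed '$s/^exec \$ICINGA2_BIN /exec $ICINGA2_BIN --no-stack-rlimit /' \
        "$file.bak" > "$file"
}

# Example (as root, ideally on a test machine first):
#   add_no_stack_rlimit /usr/sbin/icinga2
```

A package update will most likely overwrite the wrapper again, so this is a stopgap rather than a fix.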
When running strace, two systems at different patch levels behave differently:
System with 3.10.0-514.21.1.el7.x86_64:
setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 25 vars */]) = 0
brk(0) = 0x243b000
System with 3.10.0-514.21.2.el7.x86_64:
setrlimit(RLIMIT_NOFILE, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
setrlimit(RLIMIT_NPROC, {rlim_cur=16*1024, rlim_max=16*1024}) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
setrlimit(RLIMIT_STACK, {rlim_cur=256*1024, rlim_max=RLIM64_INFINITY}) = 0
execve("/usr/lib64/icinga2/sbin/icinga2", ["/usr/lib64/icinga2/sbin/icinga2", "--no-stack-rlimit"], [/* 21 vars */]) = -1 E2BIG (Argument list too long)
There seems to be some major change in the behavior of the kernels. Any idea how to change that?
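The two traces differ in what `getrlimit(RLIMIT_STACK)` returns before the limit is lowered. To compare hosts without strace, the limits can be printed directly. A minimal sketch, assuming a Linux `/proc` filesystem; `show_stack_limits` is a hypothetical helper name, not part of Icinga 2:

```shell
#!/bin/sh
# Print the soft and hard stack limits of the current shell and, when
# /proc is available, the "Max stack size" row for a given PID.
# Note: ulimit reports KiB (or "unlimited"), /proc/<pid>/limits reports bytes.
# show_stack_limits is a hypothetical helper name, not part of Icinga 2.
show_stack_limits() {
    pid="${1:-self}"
    echo "shell soft limit: $(ulimit -S -s)"
    echo "shell hard limit: $(ulimit -H -s)"
    if [ -r "/proc/$pid/limits" ]; then
        grep 'Max stack size' "/proc/$pid/limits" || true
    else
        echo "no /proc/$pid/limits on this system"
    fi
}

show_stack_limits
```

Passing the PID of a running service instead of relying on the default shows the limits that process actually inherited, which may differ from those of an interactive shell.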
I suppose this is related to https://rhn.redhat.com/errata/RHSA-2017-1484.html. Is this something that must be fixed within Icinga?
Looks like their security fix inadvertently breaks legitimate uses of setrlimit(RLIMIT_STACK, ...).
Thanks for the report, we'll look into that and are therefore postponing today's v2.7 release.
CVE-2017-1000364 seems fixed/applied in Debian too.
https://security-tracker.debian.org/tracker/CVE-2017-1000364
@dnsmichi I applied the patches this morning to our Debian 8 'jessie' system and Icinga 2 is still starting after a reboot.
root@[HOSTNAME]:~# uname -a
Linux [HOSTNAME] 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u1 (2017-06-18) x86_64 GNU/Linux
Is there anything else I can send you to help with this problem?
@mcktr thanks a lot, it's good to know that Debian does not seem to be affected.
We're currently investigating the RHEL kernel update by diffing the -1 and -2 source RPMs.
stack_guard_gap = 256UL>>PAGE_SHIFT
expendable_stack_area()
Setting 4.5 MB stack size works, 4 MB does not.
We lower the stack size so that we don't reserve too much memory for spawned threads. An older version just attempted to set ulimit -u inside the init script, which failed on Debian Jessie in #1006.
Options:
Alright, for now you can use @pefmeister's workaround. We'll publish a blog post detailing the issues in the coming days, and 2.7 will come with a long-term solution.
We reported a bug to RedHat mentioning the problem:
https://bugzilla.redhat.com/show_bug.cgi?id=1463241
The bug is currently private (I guess that is the default for kernel bugs).
You can reproduce the problem more simply:
$ ulimit -s 1024
$ /bin/true
bash: /bin/true: Argument list too long
$ ulimit -s 4096
$ /bin/true
bash: /bin/true: Argument list too long
RHEL 6 seems to be fine:
[root@rhel6-test ~]# uname -a
Linux rhel6-test.localdomain 2.6.32-696.3.2.el6.x86_64 #1 SMP Wed Jun 7 11:51:39 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel6-test ~]# bash -c "ulimit -s 256; /bin/true; echo 'Works.'"
Works.
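The one-off checks above can be generalized into a small probe that tries several stack limits, including the 4 MB and 4.5 MB values mentioned earlier in the thread. This is a sketch only: `probe_stack_limit` is a made-up name, and the list of sizes is an arbitrary choice. Each probe runs in a subshell so the calling shell keeps its own limit.

```shell
#!/bin/sh
# Probe whether an external binary can still be exec'd under a given soft
# stack limit (value in KiB, as accepted by "ulimit -s").
# probe_stack_limit is a hypothetical helper name for illustration.
probe_stack_limit() {
    kib="$1"
    bin="${2:-/bin/true}"
    rc=0
    (
        # Exit code 2: the limit itself could not be set (e.g. above hard limit).
        ulimit -S -s "$kib" 2>/dev/null || exit 2
        # Exit code 1: the limit was set, but running the binary failed.
        "$bin" >/dev/null 2>&1 || exit 1
    ) || rc=$?
    case "$rc" in
        0) echo "stack limit $kib: exec ok" ;;
        2) echo "stack limit $kib: could not set limit" ;;
        *) echo "stack limit $kib: exec failed" ;;
    esac
}

# 4096 KiB = 4 MB and 4608 KiB = 4.5 MB, the two values compared in this thread.
for kib in 256 1024 4096 4608 8192; do
    probe_stack_limit "$kib"
done
```

Going by the reports in this thread, an affected kernel would be expected to fail for the smaller values, while an unaffected one should report success for all of them.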
Issues to consider:
The workaround for systemd also requires the prepare-dirs script being patched.
diff --git a/etc/initsystem/prepare-dirs b/etc/initsystem/prepare-dirs
index 6c4a08869..5677a787a 100644
--- a/etc/initsystem/prepare-dirs
+++ b/etc/initsystem/prepare-dirs
@@ -13,13 +13,13 @@ else
fi
-ICINGA2_USER=`$DAEMON variable get --current RunAsUser`
+ICINGA2_USER=`$DAEMON variable get --current RunAsUser --no-stack-rlimit`
if [ $? != 0 ]; then
echo "Could not fetch RunAsUser variable. Error '$ICINGA2_USER'. Exiting."
exit 6
fi
-ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup`
+ICINGA2_GROUP=`$DAEMON variable get --current RunAsGroup --no-stack-rlimit`
if [ $? != 0 ]; then
echo "Could not fetch RunAsGroup variable. Error '$ICINGA2_GROUP'. Exiting."
exit 6
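Both patched blocks in the diff follow the same pattern, so the lookup could be folded into a single helper. A minimal sketch, not the upstream script: `fetch_icinga_var` is a hypothetical name, and `$DAEMON` is assumed to be set to the icinga2 binary as in prepare-dirs.

```shell
#!/bin/sh
# Fetch an Icinga 2 variable via "$DAEMON variable get", passing
# --no-stack-rlimit in the same position as the patched prepare-dirs does.
# fetch_icinga_var is a hypothetical helper; DAEMON must be set by the caller.
fetch_icinga_var() {
    name="$1"
    if ! value=$("$DAEMON" variable get --current "$name" --no-stack-rlimit); then
        echo "Could not fetch $name variable. Error '$value'. Exiting." >&2
        return 6
    fi
    printf '%s\n' "$value"
}

# Example usage inside an init helper:
#   ICINGA2_USER=$(fetch_icinga_var RunAsUser) || exit 6
#   ICINGA2_GROUP=$(fetch_icinga_var RunAsGroup) || exit 6
```

The return code 6 mirrors the `exit 6` in the original script, so callers can keep the same exit status.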
Reference: https://monitoring-portal.org/index.php?thread/41070-problems-with-rhel-7-kernel-update-kernel-3-10-0-514-21-2-and-icinga-2/
CentOS 7 is currently rolling the kernel update onto the mirrors. The main mirror has it available.
[root@icinga2 ~]# uname -a ; ulimit -s 1024 && /bin/true && echo "works"
Linux icinga2 3.10.0-514.21.2.el7.x86_64 #1 SMP Tue Jun 20 12:24:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
-bash: /bin/true: Argument list too long
It looks like the kernel update causes related problems on Debian jessie as well:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=865311
Reported CentOS bug: https://bugs.centos.org/view.php?id=13453
We received a test-build from RedHat that works fine in my test environment.
Our advisory is updated with everything that happened.
https://www.icinga.com/2017/06/20/advisory-for-latest-security-updates-on-rhel-7/
Please open a support case with RedHat and ask for an accelerated fix or a test RPM. This raises awareness and makes it more likely that they'll release a fix soon.
The configuration options have been added for v2.7. I would leave this issue open until RedHat/CentOS have released a new kernel update.
New kernel from RH is available.
kernel-3.10.0-514.26.1
No issues so far.
I can confirm this, too. Seems to be working with the new kernel. Let's go 2.7!
Thanks for your tests. We'll wait until everything is publicly resolved.
https://bugzilla.redhat.com/show_bug.cgi?id=1463241 is not clear about its state, CentOS still has the old Kernel version.
It is also highly likely that Debian was affected as they recently changed their patch set.
https://tracker.debian.org/media/packages/l/linux/changelog-4.9.30-2%2Bdeb9u2
https://lists.debian.org/debian-security-announce/2017/msg00160.html
There might be more patches or regressions coming in, see e.g. https://github.com/torvalds/linux/commit/98da7d08850fb8bdeb395d6368ed15753304aa0c
Let's wait and see when the kernel problems calm down; then we may start a release cycle for 2.7 again.
A knowledge base entry has been published, stating that a solution is in progress.
Also related: https://access.redhat.com/solutions/3098341
Hi,
CentOS has released the new kernel update 3.10.0-514.26.1 and I can confirm that the icinga2 process starts fine.
Catching up after vacation - the CentOS bug tracker item (https://bugs.centos.org/view.php?id=13453) is resolved and RedHat has published multiple Kernel versions too. Tested that inside the Vagrant box, works fine.
[root@icinga2 ~]# uname -a && icinga2 daemon -C
Linux icinga2 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
information/cli: Icinga application loader (version: v2.6.3-399-gc7d71b0)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
information/ApiListener: My API identity: icinga2
warning/ApplyRule: Apply rule 'satellite-host' (in /etc/icinga2/conf.d/satellite.conf: 29:1-29:41) for type 'Dependency' does not match anywhere!
information/ConfigItem: Instantiated 4 ApiUsers.
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 3 Zones.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 1 Endpoint.
information/ConfigItem: Instantiated 1 UserGroup.
information/ConfigItem: Instantiated 28 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 177 CheckCommands.
information/ConfigItem: Instantiated 1 Downtime.
information/ConfigItem: Instantiated 4 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 157 Hosts.
information/ConfigItem: Instantiated 318 Comments.
information/ConfigItem: Instantiated 1 User.
information/ConfigItem: Instantiated 3 TimePeriods.
information/ConfigItem: Instantiated 161 Services.
information/ConfigItem: Instantiated 3 ServiceGroups.
information/ConfigItem: Instantiated 1 ScheduledDowntime.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 1 GraphiteWriter.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).
We'll discuss the 2.7 release once everyone involved has returned from holidays, probably next week or so.
Closing here, thanks to everyone involved 👍