Icinga2: Plugins crash when run from icinga2-2.8.3

Created on 25 Apr 2018 · 25 Comments · Source: Icinga/icinga2

We are using Oracle Linux 6.9. After upgrading icinga2 today, ps crashes when it is run by the icinga user.

abrt report

abrt_version:   2.0.8
cgroup:         
cmdline:        /bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'
event_log:      
executable:     /bin/ps
hostname:       silo2
kernel:         3.8.13-118.11.2.el6uek.x86_64
last_occurrence: 1524644388
machineid:      sosreport_uploader-dmidecode=3e09daeef311ed180ecdce08b9798954e1b07b24b7a91ae57195bf48c0f82fa9
pid:            2219
pkg_arch:       x86_64
pkg_epoch:      0
pkg_fingerprint: 72F9 7B74 EC55 1F03
pkg_name:       procps
pkg_release:    45.0.1.el6_9.1
pkg_vendor:     Oracle America
pkg_version:    3.2.8
pwd:            /
time:           Wed 25 Apr 2018 09:15:18 AM CEST
uid:            498
username:       icinga

sosreport.tar.xz: Binary file, 1256500 bytes

core_backtrace:
:{   "signal": 11
:,   "executable": "/bin/ps"
:,   "stacktrace":
:      [ {   "crash_thread": true
:        ,   "frames":
:              [ {   "address": 4206748
:                ,   "build_id": "2ab2498a96e7cfc4942207da4da8376443d1d7ba"
:                ,   "build_id_offset": 12444
:                ,   "file_name": "/bin/ps"
:                }
:              , {   "address": 4203318
:                ,   "build_id": "2ab2498a96e7cfc4942207da4da8376443d1d7ba"
:                ,   "build_id_offset": 9014
:                ,   "file_name": "/bin/ps"
:                } ]
:        } ]
:}

dso_list:
:/lib64/ld-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689
:/lib64/libproc-3.2.8.so procps-3.2.8-45.0.1.el6_9.1.x86_64 (Oracle America) 1499866591
:/bin/ps procps-3.2.8-45.0.1.el6_9.1.x86_64 (Oracle America) 1499866591
:/lib64/libc-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689
:/lib64/libselinux.so.1 libselinux-2.0.94-7.el6.x86_64 (Oracle America) 1475744335
:/lib64/libdl-2.12.so glibc-2.12-1.209.0.3.el6_9.2.x86_64 (Oracle America) 1497950689

environ:
:TERM=screen
:PATH=/sbin:/usr/sbin:/bin:/usr/bin
:PWD=/
:LANG=en_US.UTF-8
:SHLVL=1
:LC_NUMERIC=C
:LC_ALL=C

limits:
:Limit                     Soft Limit           Hard Limit           Units     
:Max cpu time              unlimited            unlimited            seconds   
:Max file size             unlimited            unlimited            bytes     
:Max data size             unlimited            unlimited            bytes     
:Max stack size            262144               unlimited            bytes     
:Max core file size        0                    unlimited            bytes     
:Max resident set          unlimited            unlimited            bytes     
:Max processes             16384                16384                processes 
:Max open files            16384                16384                files     
:Max locked memory         65536                65536                bytes     
:Max address space         unlimited            unlimited            bytes     
:Max file locks            unlimited            unlimited            locks     
:Max pending signals       63680                63680                signals   
:Max msgqueue size         819200               819200               bytes     
:Max nice priority         0                    0                    
:Max realtime priority     0                    0                    
:Max realtime timeout      unlimited            unlimited            us        

maps:
:00400000-00414000 r-xp 00000000 fc:00 1839                               /bin/ps
:00614000-00615000 rw-p 00014000 fc:00 1839                               /bin/ps
:00615000-00635000 rw-p 00000000 00:00 0 
:00cbb000-00cdc000 rw-p 00000000 00:00 0                                  [heap]
:7f877f7bc000-7f877f7be000 r-xp 00000000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f7be000-7f877f9be000 ---p 00002000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9be000-7f877f9bf000 r--p 00002000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9bf000-7f877f9c0000 rw-p 00003000 fc:00 24469                      /lib64/libdl-2.12.so
:7f877f9c0000-7f877fb4a000 r-xp 00000000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fb4a000-7f877fd4a000 ---p 0018a000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd4a000-7f877fd4e000 r--p 0018a000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd4e000-7f877fd50000 rw-p 0018e000 fc:00 3016                       /lib64/libc-2.12.so
:7f877fd50000-7f877fd54000 rw-p 00000000 00:00 0 
:7f877fd54000-7f877fd62000 r-xp 00000000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877fd62000-7f877ff62000 ---p 0000e000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877ff62000-7f877ff63000 rw-p 0000e000 fc:00 4212                       /lib64/libproc-3.2.8.so
:7f877ff63000-7f877ff77000 rw-p 00000000 00:00 0 
:7f877ff77000-7f877ff94000 r-xp 00000000 fc:00 18559                      /lib64/libselinux.so.1
:7f877ff94000-7f8780193000 ---p 0001d000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780193000-7f8780194000 r--p 0001c000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780194000-7f8780195000 rw-p 0001d000 fc:00 18559                      /lib64/libselinux.so.1
:7f8780195000-7f8780196000 rw-p 00000000 00:00 0 
:7f8780196000-7f87801b6000 r-xp 00000000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803a3000-7f87803a7000 rw-p 00000000 00:00 0 
:7f87803b5000-7f87803b6000 rw-p 00000000 00:00 0 
:7f87803b6000-7f87803b7000 r--p 00020000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803b7000-7f87803b8000 rw-p 00021000 fc:00 3008                       /lib64/ld-2.12.so
:7f87803b8000-7f87803b9000 rw-p 00000000 00:00 0 
:7ffcc5fd9000-7ffcc5ffa000 rw-p 00000000 00:00 0                          [stack]
:7ffcc5ffd000-7ffcc5fff000 r-xp 00000000 00:00 0                          [vdso]
:ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

open_fds:
:0:/dev/null
:pos:        0
:flags:        0100002
:1:pipe:[188294541]
:pos:        0
:flags:        01
:2:pipe:[188294542]
:pos:        0
:flags:        01

var_log_messages:
:Apr 25 09:15:18 silo2 kernel: ps[2219]: segfault at 7ffcc5f77ef8 ip 000000000040309c sp 00007ffcc5f77f00 error 6 in ps[400000+14000]
:Apr 25 09:15:18 silo2 abrt[2220]: Saved core dump of pid 2219 (/bin/ps) to /var/spool/abrt/ccpp-2018-04-25-09:15:18-2219 (503808 bytes)
:Apr 25 09:15:22 silo2 kernel: ps[2468]: segfault at 7ffdff0e49e8 ip 000000000040309c sp 00007ffdff0e49f0 error 6 in ps[400000+14000]
:Apr 25 09:15:22 silo2 abrt[2469]: Not saving repeating crash in '/bin/ps'
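
The limits section of the report lists a soft "Max stack size" of 262144 bytes, which is 256 * 1024. As a minimal sketch, the same value can be read from any limits file in /proc/<pid>/limits format. The helper name `soft_stack_limit` is hypothetical and not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the soft "Max stack size" value (bytes or
# "unlimited") from a file in /proc/<pid>/limits format.
# Without an argument it reads /proc/self/limits, i.e. the limits of the
# awk child process, which are inherited from the calling shell.
soft_stack_limit() {
  limits_file=${1:-/proc/self/limits}
  # Columns: "Max" "stack" "size" <soft> <hard> <units>, so the soft
  # limit is field 4. This also matches abrt's ":"-prefixed lines.
  awk '/Max stack size/ { print $4 }' "$limits_file"
}
```

For a running daemon, this could be called as `soft_stack_limit /proc/<pid>/limits` with the PID of the icinga2 process.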

area/checks bug queue/important

All 25 comments

Seems we have that problem too.
From perl script command:
$msg_count = `$path_to_sudo $path_to_exim -bpc`;

Returns error code 11
Same command under icinga user running directly from shell returns code 0.

After downgrade to 2.8.2-1 all works as before.

@olegy89 Also on Oracle?

@Crunsher centos6, centos7

I'm not able to reproduce this on centos7. Can you share your exact Host, Service and CheckCommand object definition?

This works fine:

object Host "c" {
  check_command = "c"
  check_interval = 5s
  retry_interval = 5s
}

object CheckCommand "c" {
  command = [ "/bin/ps", "-eo", "s uid pid ppid vsz rss pcpu etime comm args" ]
}

Hi,

In our case it's the mailq command that fails. It does not fail in all cases with the earlier icinga2 versions. These checks have been running for months, the nagios user is allowed, and so on.

mailq on host is empty

nagios$ mailq
nagios$

running the plugin manually as user nagios works

nagios$ '/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2'
OK: exim mailq (0) is below threshold (2/5)|unsent=0;2;5;0

icingaweb2 reports CRITICAL

CRITICAL: Error code 0 returned from /usr/bin/mailq

icinga2 debug log on client

[2018-04-25 09:36:34 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2': PID 3365
[2018-04-25 09:36:34 +0200] notice/Process: PID 3365 ('/usr/lib/nagios/plugins/check_mailq' '-M' 'exim' '-c' '5' '-w' '2') terminated with exit code 2

syslog on client

Apr 25 09:36:34 lnv-2065 kernel: [ 974.883826] mailq[3366]: segfault at 7fff49eca968 ip 0000559f3db94463 sp 00007fff49eca810 error 6 in exim4[559f3db80000+f3000]

zones.d/director-global/service_apply.conf

````
apply Service "mailq" {
check_command = "mailq"
max_check_attempts = "5"
check_period = "always"
check_interval = 1m
retry_interval = 1m
check_timeout = 10s
enable_notifications = false
enable_active_checks = true
enable_passive_checks = true
enable_event_handler = true
enable_perfdata = true
volatile = false

assign where "Linux Agent via Icinga 2 Core" in host.templates
command_endpoint = host_name
vars.mailq_critical = "5"
vars.mailq_servertype = "exim"
vars.mailq_warning = "2"

import DirectorOverrideTemplate

}
````

It was okay before and has been happening since installing icinga2-2.8.3-1.
We are using Ubuntu 16.04 LTS / Ubuntu 14.04 LTS.

Cheers,
Marianne

We noticed the problem only with the external command 'sudo exim -bpc' and the 'check_ipmi_sensor' plugin.
'ps' works fine. 'eximq', however, does not fail on every host, despite the same versions of icinga and exim.

object CheckCommand "eximq" {
  import "ipv4-or-ipv6"
  command = [ PluginDir + "/base/" + "check_eximq" ]
  arguments = {
    "--critical" = "$critical$"
    "--warning" = "$warning$"
  }
  timeout = "60"
}

cat ./check_eximq
#!/usr/bin/env perl
$msg_count = `sudo exim -bpc`;
print $?;
exit;

sudo -u icinga ./check_eximq
0

Result displayed in icinga web:
Plugin Output
11
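
In Perl, `$?` after backticks holds the raw wait status rather than the plain exit code: the low 7 bits carry the terminating signal and the high byte carries the exit code. A value of 11 is therefore consistent with the child being killed by signal 11 (SIGSEGV), which would match the segfaults reported elsewhere in this thread. The following is a minimal sketch of that decoding. The function name `decode_wait_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode a raw wait status word, as Perl exposes it
# in $?, into a human-readable description.
decode_wait_status() {
  status=$1
  signal=$((status & 127))   # low 7 bits: terminating signal, 0 if none
  exit_code=$((status >> 8)) # high byte: exit code on normal termination
  if [ "$signal" -ne 0 ]; then
    echo "killed by signal $signal"
  else
    echo "exited with code $exit_code"
  fi
}

decode_wait_status 11   # killed by signal 11
decode_wait_status 512  # exited with code 2
```

Note that a shell's own `$?` uses a different convention and typically reports 128 plus the signal number for a signal-terminated child.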

@olegy89 Could you run uname -srvmo on the machine? The problem might be kernel specific

@Crunsher
Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 GNU/Linux
Linux 3.10.0-714.10.2.lve1.5.12.el7.x86_64 #1 SMP Fri Feb 2 00:27:48 EST 2018 x86_64 GNU/Linux
Linux 3.10.0-693.21.1.vz7.46.3 #1 SMP Mon Apr 2 18:21:35 MSK 2018 x86_64 GNU/Linux

@Crunsher affected examples:
Linux 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 GNU/Linux
Linux 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 GNU/Linux

Thanks! So it has nothing to do with the kernel *sigh*

I am able to reproduce this using @sysadmama's config example.

@dnsmichi
we are using check_procs.

apply Service "procs" {
  import "generic-service"

  check_command = "procs"

  assign where host.name == NodeName
}

The commit at fault is bf959371c4505bfe27b0682611c035d64b90efd3
Tickets: #6119 #6215

We've isolated the problem and are preparing 2.8.4 which reverts the regression.

Backported to support/2.8

Thanks for the reports everyone 💪

2.8.4 is published to our package repos.

[root@608c145dffda /]# icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.8.4-1)

Copyright (c) 2012-2017 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 4.9.87-linuxkit-aufs
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

I can confirm, patch is working for check_ipmi_sensor. Thanks !

Yup, the patch is working here too. Thanks! :tada:

Hi,

just for information: I had the same issue with my nagios-plugins-ceph (check_ceph_*) and check_ipmi. It took me some hours to find it, but going back to 2.8.2-1.stretch solved the problem.

@linuxmail 2.8.4 has this fixed

I did a little reading yesterday evening on the faulty patch. For technical reference, it appears to have changed the way the default stack size is set and handled later. This resulted in a stack size that is too low, so specific applications/plugins running in this process/thread space would crash.

We've seen a similar thing with the stack guard patches in the RHEL kernel, where setting the stack size also failed and made applications crash. That experience, together with the fact that the only change located so far is in application.cpp, justifies the immediate revert for production. Future patches in this region will be reviewed long-term and, if not properly proven with test protocols, will likely not get merged.

Cheers,
Michael
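
The stack-size explanation above can be probed from a shell. As a rough sketch, assuming the 256 KiB value mentioned later in this thread, the soft stack limit is lowered inside a subshell and the command is run there, so the limit is inherited by the child without affecting the calling shell. The helper name `run_with_stack_kib` is hypothetical, and whether a given plugin actually crashes under this limit depends on the plugin and the platform.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a temporary soft stack limit.
# $1 is the limit in KiB; the remaining arguments are the command.
# The subshell keeps the lowered limit from leaking into the caller.
run_with_stack_kib() {
  kib=$1
  shift
  ( ulimit -S -s "$kib" && "$@" )
}

# The child sees the lowered soft limit:
run_with_stack_kib 256 sh -c 'ulimit -S -s'   # prints 256
```

A reproduction attempt for the original report might then look like `run_with_stack_kib 256 /bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'`, followed by checking the exit status.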

Hello @dnsmichi
Sorry for this regression. I believe it happened because bf95937 fixed the rlimit stack resetting feature, which let the default rlimit value of 256 * 1024 (hardcoded at https://github.com/Icinga/icinga2/blob/v2.8.4/lib/base/application.cpp#L1503) take effect, and that value is too low for some check commands.
I think we can just fix the logic there (https://github.com/Icinga/icinga2/blob/v2.8.4/lib/base/application.cpp#L249): if the user didn't set the RLimitStack config, we simply don't reset the rlimit value.
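
The proposed logic can be illustrated with a shell analogue. This is only a sketch of the idea, not the code in application.cpp: the function name `apply_stack_limit` and its argument are hypothetical, and an empty argument stands in for "RLimitStack was not configured".

```shell
#!/bin/sh
# Hypothetical shell analogue of the proposed guard: only touch the soft
# stack limit when a value was explicitly configured.
# $1 is the configured limit in KiB, or empty if nothing was configured.
apply_stack_limit() {
  configured_kib=$1
  if [ -n "$configured_kib" ]; then
    ulimit -S -s "$configured_kib" || return 1
  fi
  # Report the soft limit that is now in effect.
  ulimit -S -s
}

# Subshells keep the demonstration from changing the caller's limit.
( apply_stack_limit "" )    # unchanged: prints the inherited soft limit
( apply_stack_limit 256 )   # explicitly configured: prints 256
```

With this shape, an unconfigured setup keeps whatever stack limit it inherited, and only an explicit setting can lower it.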

@tclh123 Feel free to open another PR. We would like to have this fixed, but our hands are full with 2.9.0.

Such a PR must include a test protocol, with and without the patch, covering all the edge cases, and it requires long-term tests. As can be seen, breaking things here has wider implications.
