Core: Syslog-ng stability issues on 20.7

Created on 15 Aug 2020 · 23 Comments · Source: opnsense/core

Not sure what the cause is yet; this needs more investigation. I'll try to update the ticket once we have more details.
Some tickets came in around syslog issues (https://github.com/opnsense/core/issues/4252, https://github.com/opnsense/core/issues/4262), which at first looked related to the legacy syslog (clog), but when syslog-ng is acting on its own it seems to crash during startup in some cases as well.

What I've seen on one of the affected instances is that if syslog-ng crashes during boot, it won't start for the first couple of tries (service syslog-ng start or via the user interface). At the 3rd or 4th try it just starts without issues, and it can then be stopped and started again without problems as long as the machine isn't rebooted.

Another thing that stands out is that if I start syslog-ng in the foreground, it crashes before delivering any console output, using

/usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid -Fdev -v

Unfortunately, running the startup under truss in similar conditions doesn't crash.

truss /usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid -Fdev -v

Another thing I noticed is the service description: we still call syslog-ng "remote syslog". I think we'd better rename both instances to make it explicit that syslog is the legacy one, only used with clog.

Long story short, starting syslog-ng a couple of times manually fixes the problem in the meantime as a workaround, but we do have to figure out at some point why this happens.
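
The manual workaround can be scripted as a simple retry loop. This is only a sketch and not part of OPNsense: the helper name `retry_start`, the `RETRY_DELAY` variable and the attempt count are made up for illustration, and it assumes the exit status of the command you pass in reflects whether the daemon actually came up.

```shell
#!/bin/sh
# retry_start MAX CMD...: run CMD up to MAX times, stopping at the first success.
# Hypothetical helper; the name and the default delay are illustrative only.
retry_start() {
    max="$1"; shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "started on attempt $attempt"
            return 0
        fi
        attempt=$((attempt + 1))
        # Short pause between tries (seconds, override with RETRY_DELAY)
        sleep "${RETRY_DELAY:-1}"
    done
    echo "still failing after $max attempts" >&2
    return 1
}

# Example (as root on an affected box):
# retry_start 5 service syslog-ng start
```

If the start command returns success even though the daemon dies right afterwards, the loop will stop too early; in that case the command passed in would need to include some check that the process is still alive.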

update 16/08/2020

Built a test package including debug symbols using the following in /etc/make.conf:

WITH_DEBUG_PORTS= sysutils/syslog-ng327

I installed it on the same machine that has the issue, but then it gets weirder: where a non-debug build always crashes on boot (also when using syslog-ng328), the debug version seems to be stable on every boot, which makes debugging even more difficult.

If someone has the same issue and wants to try the debug build, just install it using (OPNsense 20.7.1 / amd64):

pkg add -f https://pkg.opnsense.org/FreeBSD:12:amd64/snapshots/misc/syslog-ng327_dbg-3.27.1_1.txz 

update 17/08/2020

Just noticed on one of our machines that if syslog-ng doesn't crash on initial boot, it doesn't handle all messages either. Part of the log data just seems to vanish into thin air until you manage to restart syslog-ng into a normal state.

bug

All 23 comments

I can confirm that on many of our APU2 boxes.
After upgrading to 20.7.x syslog-ng doesn't start at boot and can't be started from the GUI.
If it's started from the shell with /usr/local/etc/rc.d/syslog-ng start it works, but after a reboot it's gone again.

@abplfab can you try the package with debug flags? we probably need a core dump file (mine didn't crash with symbols)

Will try it this evening on some boxes.

Update 17/11/2020 - Have I time warped a few months ahead?

@marjohn56 :) copy-paste and a typo in the first one ... the other option is that I'm writing you from the future, but you probably won't take my word on that ;)

If the second one is true then we are three months further on and still the issue goes unresolved. 😉

let's consider this a typo then, the unresolved option doesn't sound too attractive ;)

Did the pkg install, core dump on reboot..

https://www.dropbox.com/s/lkhy9bul3plh3jj/syslog-ng.core?dl=0

Same here on an apu1: https://transfer.bug.ch/59566a97f42e9/syslog-ng.core (valid 5 days)

@marjohn56 @abplfab thanks, the backtrace looks the same as mine at a first glance

 # gdb syslog-ng
GNU gdb (GDB) 9.2 [GDB v9.2 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from syslog-ng...
(gdb) core  abplfab-syslog-ng.core
[New LWP 100169]
Core was generated by `/usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000348ae6cf5c1 in g_process_perform_supervise () at lib/gprocess.c:1160
#2  g_process_start () at lib/gprocess.c:1434
#3  0x000002fc4987c797 in main (argc=1, argv=0x673325f7a400) at syslog-ng/main.c:278
(gdb) core marjohn56-syslog-ng.core
[New LWP 100113]
Core was generated by `/usr/local/sbin/syslog-ng -p /var/run/syslog-ng.pid'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000005b54a12c5c1 in g_process_perform_supervise () at lib/gprocess.c:1160
#2  g_process_start () at lib/gprocess.c:1434
#3  0x000003fef6273797 in main (argc=1, argv=0x6739ee3fa060) at syslog-ng/main.c:278
(gdb) 
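
For anyone who wants to pull the same backtrace out of a core file without an interactive session, a minimal sketch follows. It assumes gdb is installed and uses its `-batch` and `-ex` options; the helper name `core_backtrace` and the `GDB` override variable are made up for illustration, and the core path in the example is a placeholder.

```shell
#!/bin/sh
# core_backtrace BINARY CORE: print the backtrace of CORE non-interactively.
# Hypothetical helper; set GDB to use a different gdb binary.
core_backtrace() {
    binary="$1"
    core="$2"
    # -batch exits after running the commands, -ex bt runs "bt" once loaded
    "${GDB:-gdb}" -batch -ex bt "$binary" "$core"
}

# Example:
# core_backtrace /usr/local/sbin/syslog-ng path/to/syslog-ng.core
```

Without debug symbols the frames will mostly show up as `??`, so this is only really useful together with the debug build of the package.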

Looks like this was an issue back in 2012, unless I'm reading this wrong,

https://lists.balabit.hu/pipermail/syslog-ng/2012-May/018757.html

just did a fresh install of 20.7 and upgraded to OPNsense 20.7.1-amd64.

randomly getting
(syslog-ng), jid 0, uid 0: exited on signal 11 (core dumped)

running on
Intel(R) Atom(TM) CPU C2358 @ 1.74GHz (2 cores)
Manufacturer Sophos
Product Name SG
Version 125r2

@marjohn56 @abplfab can you try installing https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/misc/syslog-ng327-3.27.1_2.txz ?

pkg add -f https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/misc/syslog-ng327-3.27.1_2.txz

done. if i notice another core dump i should find them under /syslog-ng.core ?

# pkg query %v syslog-ng327
3.27.1_2

i will send dumps if i get any again.

it can also be under /var/, best find it using

find / -name "syslog-ng*core"

Make sure to check the timestamp.
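
To avoid picking up an old dump, the search can be limited to recently modified files. A rough sketch, assuming a find that supports `-mmin` (the FreeBSD and GNU versions both do); the helper name `recent_cores` and the 60 minute default are arbitrary choices for illustration.

```shell
#!/bin/sh
# recent_cores [ROOT] [MINUTES]: list syslog-ng core files under ROOT that
# were modified within the last MINUTES minutes, including their timestamps.
# Hypothetical helper; defaults to searching / for the last 60 minutes.
recent_cores() {
    root="${1:-/}"
    minutes="${2:-60}"
    find "$root" -type f -name "syslog-ng*core" -mmin "-$minutes" \
        -exec ls -l {} + 2>/dev/null
}

# Example: cores written in the 10 minutes since the last reboot
# recent_cores / 10
```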

We probably need to push another version of the package with symbols, this is a version without debug symbols (it didn't die on my end)

Can you guys please post the installed syslog-ng version too:

# pkg query %v syslog-ng327

Thanks,
Franco

Sorry guys, initial pkg update appears to have left v _1 in place, killed it and pkg install again and this time looks OK, three reboots and no core dump.

# pkg query %v syslog-ng327
3.27.1_2

damn paste!!

Fixed the paste :) Ok, that's more like it. We had some versions internally as well so sorry for the confusion.

Live system now has it too, that's still on 20.7.0, no issues after install and start from GUI.

root@fw:~ # pkg query %v syslog-ng327
3.27.1_2
no more crashes...

Syslog-ng peeps are still split over the fix, which is just a proof of concept really so that our users can have a happy syslog experience. Closing this as it will be in 20.7.2. Thanks everyone for helping out. ❤️
