Libelektra: augeas plugin might crash

Created on 27 Oct 2019  路  20Comments  路  Source: ElektraInitiative/libelektra

When doing kdb ls I got a crash in Augeas:

==14172== Callgrind, a call-graph generating cache profiler
==14172== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==14172== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==14172== Command: kdb ls /
==14172== 
==14172== For interactive control, run 'callgrind_control -h'.
==14172== brk segment overflow in thread #1: can't grow to 0x4830000
==14172== (see section Limitations in user manual)
==14172== NOTE: further instances of this message will not be shown
free(): double free detected in tcache 2

Sorry, I crashed by the signal SIGABRT
This should not have happened!

Please report the issue at https://issues.libelektra.org/
==14172== 
==14172== Process terminating with default action of signal 6 (SIGABRT): dumping core
==14172==    at 0x4B097BB: raise (raise.c:51)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x1D631B: catchSignal(int) (main.cpp:110)
==14172==    by 0x4B0983F: ??? (in /lib/x86_64-linux-gnu/libc-2.28.so)
==14172==    by 0x4B097BA: raise (internal-signals.h:84)
==14172==    by 0x4AF4534: abort (abort.c:79)
==14172==    by 0x4B4B507: __libc_message (libc_fatal.c:181)
==14172==    by 0x4B51C19: malloc_printerr (malloc.c:5341)
==14172==    by 0x4B536FC: _int_free (malloc.c:4193)
==14172==    by 0x4B41880: fclose@@GLIBC_2.2.5 (iofclose.c:77)
==14172==    by 0x4F09313: elektraAugeasGet (augeas.c:540)
==14172==    by 0x4908B3C: kdbGet (kdb.c:554)
==14172== 
==14172== Events    : Ir
==14172== Collected : 30693614075
==14172== 
==14172== I   refs:      30,693,614,075
zsh: abort      valgrind --tool=callgrind kdb ls /

The problem was quite obviously introduced by 339d34dfa331926e486340c6343b62266fb28e14 which does not return correctly in the error cases anymore.

A code review is needed if other places with return in macros are affected.

bug urgent

All 20 comments

Could you please give me a reproducible example on how to trigger this error?
I pulled the latest master, installed it (with augeas plugin) and do not experience any problems.

The error obviously only occurs if fopen fails. It could be reproduced by using LD_PRELOAD (e.g. see https://stackoverflow.com/questions/35771395/why-doesnt-ld-preload-trick-catch-open-when-called-by-fopen) with a shared library that fails on the Nth call of fopen. If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

Otherwise, as already written above, I think code review is the more efficient approach to find these problems.

Any progress here? I got the same segfault again.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fdad623e535 in __GI_abort () at abort.c:79
#2  0x000055e1872b330c in catchSignal (signum=<optimized out>) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007fdad623e535 in __GI_abort () at abort.c:79
#6  0x00007fdad6295508 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fdad63a028d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#7  0x00007fdad629bc1a in malloc_printerr (str=str@entry=0x7fdad63a1f58 "free(): double free detected in tcache 2") at malloc.c:5341
#8  0x00007fdad629d6fd in _int_free (av=0x7fdad63d7c40 <main_arena>, p=0x55e18e5b2c30, have_lock=<optimized out>) at malloc.c:4193
#9  0x00007fdad628b881 in _IO_new_fclose (fp=fp@entry=0x55e18e5b2c40) at iofclose.c:77
#10 0x00007fdad5fa3314 in elektraAugeasGet (handle=<optimized out>, returned=0x55e1892bcdd0, parentKey=0x55e1891f61d0)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/augeas/augeas.c:540
#11 0x00007fdad659dc7d in elektraGetDoUpdate (parentKey=0x55e1891f61d0, split=0x55e1892b8320) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:592
#12 kdbGet (handle=<optimized out>, ks=0x55e1891b0390, parentKey=0x55e1891f61d0) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1236
#13 0x000055e187243a2a in kdb::KDB::get (this=<optimized out>, returned=..., parentKey=...)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/bindings/cpp/include/keyset.hpp:592
#14 0x000055e1872612b2 in FindCommand::execute (this=<optimized out>, cl=...) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/find.cpp:33
#15 0x000055e18724194d in main (argc=<optimized out>, argv=0x7ffc15dd7bc8) at /usr/include/c++/8/bits/unique_ptr.h:342

I will add it higher in my priority

Any progress here? It is only about adding a few return statements and such crashes are really severe.

I will Look at it in sunday/monday morning...

Thank you. The faster the better as this (and #3197) are release blockers and lcdproc is waiting for a release.

If you really want to do that, I can help you how to do it. It might also help to find other segfaults or unhelpful error messages.

I fear that you need to help me because I cannot reproduce this error. Do you have a mounted yml file which has some problems in it or why exactly is the augeas plugin called?

Please look at the code. It is easy to see in the code what was done wrongly but such error is difficult to reproduce in all its forms.

Without looking at the code, we can never get Elektra as stable as it was before introducing the new errors. Plenty of regressions were introduced in #2686, this cannot be fixed without understanding the code. return also exists in Java, so the semantics should be clear to you.

Please look at the code.

I did it more than once ... The whole stacktrace tells me that when I try to close the file descriptor, there is a double "free()" call (free(): double free detected in tcache 2). Though I absolutely cannot relate this to anything I have changed.

It is easy to see in the code what was done wrongly

No it is absolutely not for me because this is related to errors in memory management. Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

return also exists in Java, so the semantics should be clear to you.

Yes but not memory management or this kind of file opening/closing mechanism and as far as I understand this error it is related to this

Also your sentence is partly insulting as you assume I am too dumb to find the problem because it is "easy"...

This is your interpretation. "easy" was about that the code changes should not change any semantics, so it is enough to check if semantics were changed. What I said (and still say) is that you should look at the code without any assumptions of free/malloc/memory.

So forget everything about the stacktrace. Only look at 339d34dfa331926e486340c6343b62266fb28e14 and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

So forget everything about the stacktrace. Only look at 339d34d and about where return has been before and where it is missing now. One more hint: Macros are simple textual replacements, returning from them means to return from wherever the macro is used.

Ok thank you, this was the information I was missing. I was on the wrong path before.
I opened a PR here: https://github.com/ElektraInitiative/libelektra/pull/3244

Was this the only occurrence of a macro that had a return in the whole error changes?

Btw, augeas was the only case which I remember that had this sort of macro replacement. I replaced all three occurrences.

I will take an extra look this evening in the old revision to check for that

Thank you!

I have looked through the error settings in v0.8.26 and as far as I can tell it did not occur somewhere else than in the augeas section.

So I think after we merge the PR this can be closed

Thank you for looking! So mounting and augeas were the only regressions.

I think we may close this issue

Looks good, it does not crash anymore, instead I get:

Sorry, module augeas issued the error C01200:
Installation: Lens not found

Actually this is another known bug: the name of the lens should be printed. #203

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sanssecours picture sanssecours  路  3Comments

markus2330 picture markus2330  路  4Comments

markus2330 picture markus2330  路  3Comments

mpranj picture mpranj  路  3Comments

markus2330 picture markus2330  路  4Comments