I recently tried to run nnpackage_run on Android with verbose logging enabled for debugging.
Running nnpackage_run with ONERT_LOG_ENABLE=1 produces a segmentation fault for every model.
$ ONERT_LOG_ENABLE=1 BACKENDS=cpu ./nnfw_alpha/nnpackage_run --nnpackage mobilenet_v2_1.0_224/
[Compiler] [Compiler] ==== Compiler Options ====
[Compiler] backend_list : cpu
[Compiler] trace_filepath :
[Compiler] graph_dump_level : 0
[Compiler] op_seq_max_node : 0
[Compiler] executor : Linear
[Compiler] manual_scheduler_options : (Too many things to print)
[Compiler] he_scheduler : false
[Compiler] he_profiling_mode : false
[Compiler] disable_compile : false
[Compiler] fp16_enable : false
[Compiler] [loadBackend] Successfully loaded 'cpu' - libbackend_cpu.so
Segmentation fault (core dumped)
/data/tombstones/$ adb shell cat /data/tombstones/tombstone_13 | ./ndk-stack -sym ~/ONE/Product/out/lib
********** Crash dump: **********
Build fingerprint: 'samsung/y2qsqw/y2q:10/QP1A.190711.020/G986USQE1ATDA:eng/test-keys'
#00 0x0000000000056d54 /data/local/tmp/nnfw_alpha/nnpackage_run (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char>>::sentry::sentry(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char>>&)+60) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >::sentry::sentry(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&)
/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:263:9
#01 0x0000000000056b20 /data/local/tmp/nnfw_alpha/nnpackage_run (_ZNSt6__ndk124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m+44) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::__put_character_sequence<char, std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*, unsigned long)
/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:722:57
#02 0x000000000004828c /data/local/tmp/nnfw_alpha/nnpackage_run (_ZNSt6__ndk1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc+68) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::operator<<<std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*)
/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:865:12
#03 0x00000000002b0210 /data/local/tmp/nnfw_alpha/lib/libbackend_cpu.so (onert_backend_create+108) (BuildId: 9f1ba80af139a0695d7480d5f8eba6e322a4296b)
onert_backend_create
/home/jhyeo/ONE_alpha/runtime/onert/backend/cpu/cpu.cc:24:3
#04 0x0000000000482de4 /data/local/tmp/nnfw_alpha/lib/libonert_core.so (onert::compiler::BackendManager::loadBackend(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&)+916) (BuildId: 020e9a745fe2a9ba7c018d3e5bae044ae92e4ea0)
onert::compiler::BackendManager::loadBackend(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)
/home/jhyeo/ONE_alpha/runtime/onert/core/src/compiler/BackendManager.cc:108:64
...
Crash dump is completed
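Frames #01 and #02 above are listed under their mangled names. As a small aid for reading such traces, here is a minimal sketch that demangles a symbol with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang). The helper name `demangle` is my own and is not part of ONE or the NDK.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);
  return result;
}
```

Passing the symbol from frame #02, `_ZNSt6__ndk1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc`, should yield the `std::__ndk1::operator<<` overload that takes a `char const*`. The exact spacing of the output can differ between C++ runtimes.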
nnpackage_run crashes here, but the failing call is just a simple std::cout output.
@Samsung/nnfw Does anyone know about this error?
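In gdb, the top frame is the debuggerd signal handler inside linker64, and the faulting frame below it is the ostream sentry constructor: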
0x0000007ff7f349dc in __dl__ZL24debuggerd_signal_handleriP7siginfoPv () from target:/system/bin/linker64
(gdb) up
#1 <signal handler called>
(gdb) up
#2 0x00000055555abf58 in std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >::sentry::sentry (this=0x7fffffc018, __os=...)
at /home/test5/Desktop/test.nnpk/ONE/tools/cross/ndk/r20/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:263
263 if (__os.good())
I plan to look further into the code that links libc++_shared.so.
As a temporary workaround, preloading libc++_shared.so avoids the crash:
ONERT_LOG_ENABLE=1 BACKENDS=cpu LD_PRELOAD=/data/local/tmp/test5/lib/libc++_shared.so ./nnpackage_run --nnpackage mobilenet_v2/
While the process was running, I checked its mappings under /proc/{PID} and found that libc++_shared.so was not loaded.
@dhdh-oh Thanks! It works like a charm.
As we discussed offline, another solution is to select the shared STL in the toolchain file:
# infra/nnfw/cmake/buildtool/cross/toolchain_aarch64-android.cmake
...
+set(ANDROID_STL c++_shared)
and to use ./tools/cross/ndk/r20/ndk/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_shared.so
The Android NDK guide recommends using only one STL per app:
https://developer.android.com/ndk/guides/cpp-support#one_stl_per_app
Warning: The linker can catch some of these issues at build time, but many of these issues will only manifest as a crash or odd behavior at run time.
Below I share my debugging notes, for reference.
I used this version: https://github.com/Samsung/ONE/pull/2794/files
The key location is /ONE/runtime/onert/backend/cpu/cpu.cc:24.
(gdb) b /home/test5/Desktop/test.nnpk/ONE/runtime/onert/backend/cpu/cpu.cc:24
Thread 1 "nnpackage_run" hit Breakpoint 1, onert_backend_create ()
at /home/test5/Desktop/test.nnpk/ONE/runtime/onert/backend/cpu/cpu.cc:24
24 VERBOSE(onert_backend_create) << "'cpu' loaded\n";
The instruction at 0x0000007ff1679fec appears a little further down in the disassembly.
It branches to 0x7ff1626730, which is a PLT entry. Put simply, 'plt' = {stub code that jumps through 'got.plt'},
'got.plt' = {the resolved address of the function, acting like a cache}.
(gdb) disassemble
...
(gdb) b* 0x0000007ff1679fec
0x0000007ff1679fec <+108>: bl 0x7ff1626730 <std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::operator<< <std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*)@plt>
# will jump to 0x7ff1626730
At 0x7ff1626730 there is code that loads addresses into registers x17 and x16.
x17 holds the address of the cout-related function (operator<<) that the stub branches to.
gdb-peda$ disass
Dump of assembler code for function _ZNSt6__ndk1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc@plt:
=> 0x0000007ff1626730 <+0>: adrp x16, 0x7ff1ac4000 <Eigen::TensorEvaluator<Eigen::TensorBroadcastingOp<std::__ndk1::array<long, 5ul> const, Eigen::TensorMap<Eigen::Tensor<unsigned int const, 5, 1, long>, 16, Eigen::MakePointer> const> const, Eigen::ThreadPoolDevice>::evalSubExprsIfNeeded(unsigned int const*)@got.plt>
0x0000007ff1626734 <+4>: ldr x17, [x16, #360]
0x0000007ff1626738 <+8>: add x16, x16, #0x168
0x0000007ff162673c <+12>: br x17
gdb-peda$ i r x17 x16
x17 0x555559d3a0 0x555559d3a0
x16 0x7ff1ac4168 0x7ff1ac4168
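The stub's arithmetic can be checked by hand: `adrp` puts the page address 0x7ff1ac4000 into x16, `ldr x17, [x16, #360]` reads the 'got.plt' slot at page + 360, and `add x16, x16, #0x168` leaves that slot's address in x16. A minimal sketch of the calculation, using a hypothetical helper name `got_slot_address`:

```cpp
#include <cstdint>

// Address of the 'got.plt' slot used by an AArch64 PLT stub of the form
//   adrp x16, <page>
//   ldr  x17, [x16, #offset]
//   add  x16, x16, #offset
//   br   x17
constexpr std::uint64_t got_slot_address(std::uint64_t page, std::uint64_t offset) {
  return page + offset;
}

// Values taken from the disassembly and register dump above (360 == 0x168).
static_assert(got_slot_address(0x7ff1ac4000ULL, 360) == 0x7ff1ac4168ULL,
              "slot address should match x16 after the add");
```

So x16 = 0x7ff1ac4168 is the 'got.plt' slot, and x17 = 0x555559d3a0 is the value stored in it, i.e. the address the stub branches to.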
Where is 0x555559d3a0?
1|y2q:/ # cat /proc/8687/maps
5555555000-5555aff000 r-xp 00000000 103:24 5395 /data/local/tmp/cmake_patch/nnpackage_run
5555b0e000-5555b1b000 r--p 005a9000 103:24 5395 /data/local/tmp/cmake_patch/nnpackage_run
5555b1b000-5555b1f000 rw-p 005b6000 103:24 5395 /data/local/tmp/cmake_patch/nnpackage_run
It is in the code area of nnpackage_run! (The NDK's LLVM libc++ is probably statically linked into it.)
libbackend_cpu.so was intended to use libc++_shared.so, but it is still calling the statically linked functions.
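Looking up an address in the maps by eye is error-prone, so here is a minimal sketch that does the lookup on `/proc/<pid>/maps`-formatted text. `Mapping` and `find_mapping` are hypothetical names, and the parser assumes the usual field layout `start-end perms offset dev inode [path]`.

```cpp
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>

struct Mapping {
  std::uint64_t start;
  std::uint64_t end;  // exclusive
  std::string perms;
  std::string path;   // empty for anonymous mappings
};

// Hypothetical helper: finds the mapping that contains `addr`.
// Lines that do not look like maps entries (e.g. a shell prompt) are skipped.
bool find_mapping(std::istream& maps, std::uint64_t addr, Mapping& out) {
  std::string line;
  while (std::getline(maps, line)) {
    std::istringstream fields(line);
    std::string range, perms, offset, dev, inode, path;
    if (!(fields >> range >> perms >> offset >> dev >> inode)) {
      continue;
    }
    fields >> path;  // optional last field
    const std::size_t dash = range.find('-');
    if (dash == std::string::npos) {
      continue;
    }
    std::uint64_t start = 0;
    std::uint64_t end = 0;
    try {
      start = std::stoull(range.substr(0, dash), nullptr, 16);
      end = std::stoull(range.substr(dash + 1), nullptr, 16);
    } catch (const std::exception&) {
      continue;  // not a hexadecimal address range
    }
    if (addr >= start && addr < end) {
      out.start = start;
      out.end = end;
      out.perms = perms;
      out.path = path;
      return true;
    }
  }
  return false;
}
```

With the three lines from the maps output above and `addr = 0x555559d3a0`, the match is the first entry: the `r-xp` (executable) segment of nnpackage_run.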
https://github.com/Samsung/ONE/pull/2794/files
Let's see. Before libbackend_cpu.so is loaded, there have already been many cout-related function calls, for example the other log lines such as "[Compiler] [Compiler] ==== Compiler Options ====":
Process ./nnpackage_run created; pid = 8687
gdbserver: Unable to determine the number of hardware watchpoints available.
gdbserver: Unable to determine the number of hardware breakpoints available.
Listening on port 7878
Remote debugging from host 127.0.0.1
[Compiler] [Compiler] ==== Compiler Options ====
[Compiler] backend_list : cpu
[Compiler] trace_filepath :
[Compiler] graph_dump_level : 0
[Compiler] op_seq_max_node : 0
[Compiler] executor : Linear
[Compiler] manual_scheduler_options : (Too many things to print)
[Compiler] he_scheduler : false
[Compiler] he_profiling_mode : false
[Compiler] disable_compile : false
[Compiler] fp16_enable : false
[Compiler] [loadBackend] Successfully loaded 'cpu' - libbackend_cpu.so
So 'got.plt' would have been filled with the address of the statically linked function (0x555559d3a0),
and the 'plt' stub jumped through 'got.plt' to it.
What is the effect of https://github.com/Samsung/ONE/pull/2794/files?
Before the patch, the ostream object is corrupted:
gdb-peda$ ni
0x000000555559d3b0 865 return _VSTD::__put_character_sequence(__os, __str, _Traits::length(__str));
gdb-peda$ p __os
$3 = (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> > &) @0x7ff633a4c8: {
<std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >> = <invalid address>,
members of std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >:
_vptr$basic_ostream = 0x0
}
gdb-peda$
After the patch, you can see the complete ostream object:
gdb-peda$ ni
0x000000555559d3b0 864 {
gdb-peda$ p __os
$4 = (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> > &) @0x7ff65756b0: {
<std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >> = {
<std::__ndk1::ios_base> = {
_vptr$ios_base = 0x5555b17360 <vtable for std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >+64>,
static boolalpha = 0x1,
static dec = 0x2,
static fixed = 0x4,
static hex = 0x8,
static internal = 0x10,
static left = 0x20,
static oct = 0x40,
static right = 0x80,
static scientific = 0x100,
static showbase = 0x200,
static showpoint = 0x400,
static showpos = 0x800,
static skipws = 0x1000,
static unitbuf = 0x2000,
static uppercase = 0x4000,
static adjustfield = 0xb0,
static basefield = 0x4a,
static floatfield = 0x104,
static badbit = 0x1,
static eofbit = 0x2,
static failbit = 0x4,
static goodbit = 0x0,
static app = 0x1,
static ate = 0x2,
static binary = 0x4,
static in = 0x8,
static out = 0x10,
static trunc = 0x20,
__fmtflags_ = 0x1002,
__precision_ = 0x6,
__width_ = 0x0,
__rdstate_ = 0x0,
__exceptions_ = 0x0,
__rdbuf_ = 0x7ff6575b58 <std::__ndk1::__cout>,
__loc_ = 0x7ff6576a80 <std::__ndk1::(anonymous namespace)::make<std::__ndk1::locale::__imp, unsigned int>(unsigned int)::buf>,
__fn_ = 0x0,
__index_ = 0x0,
__event_size_ = 0x0,
__event_cap_ = 0x0,
static __xindex_ = {
<std::__ndk1::__atomic_base<int, true>> = {
<std::__ndk1::__atomic_base<int, false>> = {
__a_ = 0x0,
static is_always_lock_free = <optimized out>
}, <No data fields>}, <No data fields>},
__iarray_ = 0x0,
__iarray_size_ = 0x0,
__iarray_cap_ = 0x0,
__parray_ = 0x0,
__parray_size_ = 0x0,
__parray_cap_ = 0x0
},
members of std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >:
__tie_ = 0x0,
__fill_ = 0x20
},
members of std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >:
_vptr$basic_ostream = 0x5555b17338 <vtable for std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >+24>
}
Fixed by #2794.