One: [onert] Running nnpackage_run with verbose log produces segfault on android

Created on 1 Jul 2020 · 7 Comments · Source: Samsung/ONE

I recently tried to run nnpackage_run on Android with verbose logging for debugging.
Running nnpackage_run with ONERT_LOG_ENABLE=1 produces a segmentation fault for every model.

Execution result

$ ONERT_LOG_ENABLE=1 BACKENDS=cpu ./nnfw_alpha/nnpackage_run --nnpackage mobilenet_v2_1.0_224/                                                                                            
[Compiler] [Compiler] ==== Compiler Options ====
[Compiler] backend_list             : cpu
[Compiler] trace_filepath           : 
[Compiler] graph_dump_level         : 0
[Compiler] op_seq_max_node          : 0
[Compiler] executor                 : Linear
[Compiler] manual_scheduler_options : (Too many things to print)
[Compiler] he_scheduler             : false
[Compiler] he_profiling_mode        : false
[Compiler] disable_compile          : false
[Compiler] fp16_enable              : false
[Compiler] [loadBackend] Successfully loaded 'cpu' - libbackend_cpu.so
Segmentation fault (core dumped) 
  • A segmentation fault occurred

Generated crash dump file

$ adb shell cat /data/tombstones/tombstone_13 | ./ndk-stack -sym ~/ONE/Product/out/lib
********** Crash dump: **********
Build fingerprint: 'samsung/y2qsqw/y2q:10/QP1A.190711.020/G986USQE1ATDA:eng/test-keys'
#00 0x0000000000056d54 /data/local/tmp/nnfw_alpha/nnpackage_run (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char>>::sentry::sentry(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char>>&)+60) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
                                                                 std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >::sentry::sentry(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&)
                                                                 /ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:263:9
#01 0x0000000000056b20 /data/local/tmp/nnfw_alpha/nnpackage_run (_ZNSt6__ndk124__put_character_sequenceIcNS_11char_traitsIcEEEERNS_13basic_ostreamIT_T0_EES7_PKS4_m+44) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
                                                                 std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::__put_character_sequence<char, std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*, unsigned long)
                                                                 /ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:722:57
#02 0x000000000004828c /data/local/tmp/nnfw_alpha/nnpackage_run (_ZNSt6__ndk1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc+68) (BuildId: 53cd438c0bccf6ad550e4c14532de31724e1d0c9)
                                                                 std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::operator<<<std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*)
                                                                 /ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:865:12
#03 0x00000000002b0210 /data/local/tmp/nnfw_alpha/lib/libbackend_cpu.so (onert_backend_create+108) (BuildId: 9f1ba80af139a0695d7480d5f8eba6e322a4296b)
                                                                         onert_backend_create
                                                                         /home/jhyeo/ONE_alpha/runtime/onert/backend/cpu/cpu.cc:24:3
#04 0x0000000000482de4 /data/local/tmp/nnfw_alpha/lib/libonert_core.so (onert::compiler::BackendManager::loadBackend(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const&)+916) (BuildId: 020e9a745fe2a9ba7c018d3e5bae044ae92e4ea0)
                                                                        onert::compiler::BackendManager::loadBackend(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)
                                                                        /home/jhyeo/ONE_alpha/runtime/onert/core/src/compiler/BackendManager.cc:108:64
...
Crash dump is completed

https://github.com/Samsung/ONE/blob/7d0d53f9b808cc6a41aa5e3c0f0a3a82794ef4c6/runtime/onert/backend/cpu/cpu.cc#L24

nnpackage_run crashes here, but the line is just a simple std::cout call.

@Samsung/nnfw Does anyone know about this error?

bug

All 7 comments

I found that the linker sends a signal:

0x0000007ff7f349dc in __dl__ZL24debuggerd_signal_handleriP7siginfoPv () from target:/system/bin/linker64                                                       

(gdb) up
#1  <signal handler called>
(gdb) up
#2  0x00000055555abf58 in std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >::sentry::sentry (this=0x7fffffc018, __os=...)
    at /home/test5/Desktop/test.nnpk/ONE/tools/cross/ndk/r20/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/ostream:263
263         if (__os.good())

I plan to look further into the code that links libc++_shared.so.

As a temporary workaround, preload libc++_shared.so:

ONERT_LOG_ENABLE=1 BACKENDS=cpu LD_PRELOAD=/data/local/tmp/test5/lib/libc++_shared.so ./nnpackage_run --nnpackage mobilenet_v2/

While the process was running, I checked /proc/{PID} and found that libc++_shared.so was not loaded.

@dhdh-oh Thanks! It works like a charm.

As we discussed in person,

# infra/nnfw/cmake/buildtool/cross/toolchain_aarch64-android.cmake
...
+set(ANDROID_STL c++_shared)

and use ./tools/cross/ndk/r20/ndk/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_shared.so

This is another solution.

The Android NDK guide recommends using only one STL per app.
https://developer.android.com/ndk/guides/cpp-support#one_stl_per_app

Warning: The linker can catch some of these issues at build time, but many of these issues will only manifest as a crash or odd behavior at run time.


Below I share my debugging notes for reference.
I used this version: https://github.com/Samsung/ONE/pull/2794/files

The key location is /ONE/runtime/onert/backend/cpu/cpu.cc:24.

(gdb) b /home/test5/Desktop/test.nnpk/ONE/runtime/onert/backend/cpu/cpu.cc:24

Thread 1 "nnpackage_run" hit Breakpoint 1, onert_backend_create ()
    at /home/test5/Desktop/test.nnpk/ONE/runtime/onert/backend/cpu/cpu.cc:24
24        VERBOSE(onert_backend_create) << "'cpu' loaded\n";

The instruction at '0x0000007ff1679fec' appears a little later in the disassembly.
It branches to 0x7ff1626730, which is a PLT entry. Put simply, 'plt' holds the code that jumps through 'got.plt',
and 'got.plt' holds the address of the function, like a cache.

(gdb) disassemble
...
(gdb) b* 0x0000007ff1679fec

0x0000007ff1679fec <+108>:   bl      0x7ff1626730 <std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >& std::__ndk1::operator<< <std::__ndk1::char_traits<char> >(std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >&, char const*)@plt>

# will jump to 0x7ff1626730

At 0x7ff1626730 there is code that loads addresses into registers x16 and x17.
x17 holds the address of the function that the cout statement calls.

gdb-peda$ disass
Dump of assembler code for function _ZNSt6__ndk1lsINS_11char_traitsIcEEEERNS_13basic_ostreamIcT_EES6_PKc@plt:
=> 0x0000007ff1626730 <+0>:     adrp    x16, 0x7ff1ac4000 <Eigen::TensorEvaluator<Eigen::TensorBroadcastingOp<std::__ndk1::array<long, 5ul> const, Eigen::TensorMap<Eigen::Tensor<unsigned int const, 5, 1, long>, 16, Eigen::MakePointer> const> const, Eigen::ThreadPoolDevice>::evalSubExprsIfNeeded(unsigned int const*)@got.plt>
   0x0000007ff1626734 <+4>:     ldr     x17, [x16, #360]
   0x0000007ff1626738 <+8>:     add     x16, x16, #0x168
   0x0000007ff162673c <+12>:    br      x17

gdb-peda$ i r x17 x16
x17            0x555559d3a0        0x555559d3a0
x16            0x7ff1ac4168        0x7ff1ac4168
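The register values can be cross-checked against the PLT stub by hand: adrp sets x16 to the page base, and the ldr/add pair applies the byte offset 360 (0x168). A minimal sketch of that arithmetic, using only the numbers from the dump above; the helper name gotSlotAddress and the constant names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: address of the got.plt slot used by an AArch64 PLT stub.
// adrp yields the page base; ldr/add apply the byte offset within that page.
constexpr std::uint64_t gotSlotAddress(std::uint64_t pageBase, std::uint64_t offset) {
  return pageBase + offset;
}

// Values copied from the disassembly and register dump above.
constexpr std::uint64_t kPageBase = 0x7ff1ac4000;  // adrp x16, 0x7ff1ac4000
constexpr std::uint64_t kOffset = 360;             // ldr x17, [x16, #360]

// 360 decimal is the same offset as the 0x168 in "add x16, x16, #0x168".
static_assert(kOffset == 0x168, "ldr and add use the same offset");
// The computed slot address matches the x16 value printed by gdb.
static_assert(gotSlotAddress(kPageBase, kOffset) == 0x7ff1ac4168,
              "slot address matches x16");
```

So x16 points at the got.plt slot, and x17 holds whatever address was stored in that slot, here 0x555559d3a0.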

Where is 0x555559d3a0?

1|y2q:/ # cat /proc/8687/maps                                                                                                                                                
5555555000-5555aff000 r-xp 00000000 103:24 5395                          /data/local/tmp/cmake_patch/nnpackage_run                                                           
5555b0e000-5555b1b000 r--p 005a9000 103:24 5395                          /data/local/tmp/cmake_patch/nnpackage_run                                                           
5555b1b000-5555b1f000 rw-p 005b6000 103:24 5395                          /data/local/tmp/cmake_patch/nnpackage_run

It is in the code (r-xp) region of nnpackage_run, so llvm/libc++ is probably statically linked into the executable.
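The lookup above (which mapping contains a given address) is easy to script. The sketch below assumes the standard maps line layout "start-end perms offset dev inode path" and paths without spaces; Mapping, parseMapsLine and findMapping are hypothetical names, not part of ONE or gdb.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// Hypothetical record for one line of /proc/<pid>/maps.
struct Mapping {
  std::uint64_t start = 0;
  std::uint64_t end = 0;  // exclusive
  std::string perms;
  std::string path;       // empty for anonymous mappings
};

// Parses "start-end perms offset dev inode [path]"; returns false on malformed input.
bool parseMapsLine(const std::string& line, Mapping& out) {
  std::istringstream in(line);
  std::string range, offset, dev, inode;
  Mapping m;
  if (!(in >> range >> m.perms >> offset >> dev >> inode)) return false;
  in >> m.path;  // optional field; stays empty if absent
  const std::string::size_type dash = range.find('-');
  if (dash == std::string::npos) return false;
  try {
    m.start = std::stoull(range.substr(0, dash), nullptr, 16);
    m.end = std::stoull(range.substr(dash + 1), nullptr, 16);
  } catch (const std::exception&) {
    return false;  // not a hexadecimal range
  }
  out = m;
  return true;
}

// Finds the first mapping whose [start, end) range contains addr.
bool findMapping(const std::string& mapsText, std::uint64_t addr, Mapping& out) {
  std::istringstream in(mapsText);
  std::string line;
  while (std::getline(in, line)) {
    Mapping m;
    if (parseMapsLine(line, m) && addr >= m.start && addr < m.end) {
      out = m;
      return true;
    }
  }
  return false;
}
```

Fed the three maps lines above and the address 0x555559d3a0, findMapping reports the r-xp mapping of nnpackage_run, which is the same conclusion reached by reading the ranges manually.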

libbackend_cpu.so was intended to use libc++_shared.so, but it is still calling the statically linked functions.
https://github.com/Samsung/ONE/pull/2794/files
Note that many cout-related functions were already called before libbackend_cpu.so was loaded,
for example to print "[Compiler] [Compiler] ==== Compiler Options ====" and the other log lines:

Process ./nnpackage_run created; pid = 8687
gdbserver: Unable to determine the number of hardware watchpoints available.
gdbserver: Unable to determine the number of hardware breakpoints available.
Listening on port 7878
Remote debugging from host 127.0.0.1
[Compiler] [Compiler] ==== Compiler Options ====
[Compiler] backend_list             : cpu
[Compiler] trace_filepath           :
[Compiler] graph_dump_level         : 0
[Compiler] op_seq_max_node          : 0
[Compiler] executor                 : Linear
[Compiler] manual_scheduler_options : (Too many things to print)
[Compiler] he_scheduler             : false
[Compiler] he_profiling_mode        : false
[Compiler] disable_compile          : false
[Compiler] fp16_enable              : false
[Compiler] [loadBackend] Successfully loaded 'cpu' - libbackend_cpu.so

So the 'got.plt' slot was presumably filled with the address of the statically linked function (0x555559d3a0),
and the 'plt' entry jumped to the address stored in 'got.plt'.

What is the effect of https://github.com/Samsung/ONE/pull/2794/files?

Before (the ostream object is corrupted):

gdb-peda$ ni
0x000000555559d3b0      865         return _VSTD::__put_character_sequence(__os, __str, _Traits::length(__str));
gdb-peda$ p __os
$3 = (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> > &) @0x7ff633a4c8: {
  <std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >> = <invalid address>, 
  members of std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >: 
  _vptr$basic_ostream = 0x0
}
gdb-peda$

After (the complete ostream object is visible):

gdb-peda$ ni
0x000000555559d3b0      864     {
gdb-peda$ p __os
$4 = (std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> > &) @0x7ff65756b0: {
  <std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >> = {
    <std::__ndk1::ios_base> = {
      _vptr$ios_base = 0x5555b17360 <vtable for std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >+64>,
      static boolalpha = 0x1,
      static dec = 0x2,
      static fixed = 0x4,
      static hex = 0x8,
      static internal = 0x10,
      static left = 0x20,
      static oct = 0x40,
      static right = 0x80,
      static scientific = 0x100,
      static showbase = 0x200,
      static showpoint = 0x400,
      static showpos = 0x800,
      static skipws = 0x1000,
      static unitbuf = 0x2000,
      static uppercase = 0x4000,
      static adjustfield = 0xb0,
      static basefield = 0x4a,
      static floatfield = 0x104,
      static badbit = 0x1,
      static eofbit = 0x2,
      static failbit = 0x4,
      static goodbit = 0x0,
      static app = 0x1,
      static ate = 0x2,
      static binary = 0x4,
      static in = 0x8,
      static out = 0x10,
      static trunc = 0x20,
      __fmtflags_ = 0x1002,
      __precision_ = 0x6,
      __width_ = 0x0,
      __rdstate_ = 0x0,
      __exceptions_ = 0x0,
      __rdbuf_ = 0x7ff6575b58 <std::__ndk1::__cout>,
      __loc_ = 0x7ff6576a80 <std::__ndk1::(anonymous namespace)::make<std::__ndk1::locale::__imp, unsigned int>(unsigned int)::buf>,
      __fn_ = 0x0,
      __index_ = 0x0,
      __event_size_ = 0x0,
      __event_cap_ = 0x0,
      static __xindex_ = {
        <std::__ndk1::__atomic_base<int, true>> = {
          <std::__ndk1::__atomic_base<int, false>> = {
            __a_ = 0x0,
            static is_always_lock_free = <optimized out>
          }, <No data fields>}, <No data fields>},
      __iarray_ = 0x0,
      __iarray_size_ = 0x0,
      __iarray_cap_ = 0x0,
      __parray_ = 0x0,
      __parray_size_ = 0x0,
      __parray_cap_ = 0x0
    },
    members of std::__ndk1::basic_ios<char, std::__ndk1::char_traits<char> >:
    __tie_ = 0x0,
    __fill_ = 0x20
  },
  members of std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >:
  _vptr$basic_ostream = 0x5555b17338 <vtable for std::__ndk1::basic_ostream<char, std::__ndk1::char_traits<char> >+24>
}

Fixed by #2794.
