Lightning: Fatal signal 11 received on Mac OS

Created on 29 Jan 2018  ·  22 Comments  ·  Source: ElementsProject/lightning

Issue and Steps to Reproduce

I am running Mac OS X 10.13.2. With the new libwally update, I was able to build c-lightning just fine this morning. However, after I synced my bitcoind testnet daemon and started lightningd, the first command I issued produced this output in the lightningd log:

lightningd(68725): Connected json input
lightningd(68725): FATAL SIGNAL 11 RECEIVED

getinfo output

getinfo hangs, no output.

crash mac

All 22 comments

Can you run with a debugger? gdb lightningd might work, and once it breaks, type in bt to see a backtrace. Then copy-paste it here :-)

@cdecker I'm having the same issues as @bitstein on a Mac. I tried to run lightningd under a debugger, but for me both the daemon and the client hang on any request, so I'm not getting a backtrace.

@bitstein, @cdecker, @yashbhutwala I'm getting the same problem. I managed to get this backtrace with lightning-cli help:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff73cf041a libsystem_kernel.dylib`read + 10
  * frame #1: 0x0000000104730b1a lightningd`check_with_child(batch=0x00007ffeeb540608, p=0x00000001046be000, size=8, is_write=false) at ptr_valid.c:258
    frame #2: 0x000000010473087b lightningd`ptr_valid_batch(batch=0x00007ffeeb540608, p=0x00000001046be000, alignment=1, size=8, write=false) at ptr_valid.c:289
    frame #3: 0x000000010471cf8f lightningd`autodata_make_table(example=0x00000001047d0d70, name="json_command", nump=0x00000001047d1b20) at autodata.c:33
    frame #4: 0x00000001046caedf lightningd`get_cmdlist at jsonrpc.c:281
    frame #5: 0x00000001046cc335 lightningd`find_cmd(buffer="{ \"method\" : \"help\", \"id\" : \"lightning-cli-68421\", \"params\" : [ ] }", tok=0x00007fc51d009978) at jsonrpc.c:312
    frame #6: 0x00000001046cc0de lightningd`parse_request(jcon=0x00007fc51d000760, tok=0x00007fc51d009950) at jsonrpc.c:574
    frame #7: 0x00000001046cbb3c lightningd`read_json(conn=0x00007fc51d0084c0, jcon=0x00007fc51d000760) at jsonrpc.c:667
    frame #8: 0x0000000104726e83 lightningd`next_plan(conn=0x00007fc51d0084c0, plan=0x00007fc51d0084f0) at io.c:59
    frame #9: 0x0000000104727b6c lightningd`do_plan(conn=0x00007fc51d0084c0, plan=0x00007fc51d0084f0, idle_on_epipe=false) at io.c:387
    frame #10: 0x00000001047279c4 lightningd`io_ready(conn=0x00007fc51d0084c0, pollflags=1) at io.c:397
    frame #11: 0x0000000104729397 lightningd`io_loop(timers=0x00007fc51bc028d0, expired=0x00007ffeeb540a00) at poll.c:305
    frame #12: 0x00000001046ccc56 lightningd`main(argc=4, argv=0x00007ffeeb540a68) at lightningd.c:347
    frame #13: 0x00007fff73b9f115 libdyld.dylib`start + 1
    frame #14: 0x00007fff73b9f115 libdyld.dylib`start + 1

Is it possible to run a lightning node in a container but bitcoind on the host? I've tried but to no avail.

@yashbhutwala yes, you just need to make sure that bitcoin-cli from the docker container can talk to bitcoind on the host. Something like the following should do it:

docker run -v $HOME/.bitcoin:/root/.bitcoin --net=host docker/image

That'll mount the host's .bitcoin directory into the container so it can read the RPC config, and --net=host allows bitcoin-cli to connect to bitcoind over the loopback interface.

@cdecker I've already tried that, but the container is still not able to talk to the bitcoind server on my machine. With --net=host, the Docker host seems to have a different IP than my machine.

After some research, it turns out that with Docker on Mac the Docker host is not the same as localhost, and there is no docker0 bridge on macOS.

I get the same FATAL SIGNAL 11 RECEIVED on macOS 10.13.3:

gdb --args lightningd/lightningd --network=testnet --port=9000 --log-level=debug
start
During startup program terminated with signal ?, Unknown signal.

I'm not familiar with gdb. How do I make it spit out more useful information?

Note that lightningd/lightningd --help doesn't crash.

Also, if bitcoind isn't running, I get FATAL SIGNAL 6 RECEIVED and the lightningd process stays active. I then have to kill it with killall -9 lightningd, because cli/lightning-cli stop complains: Connecting to 'lightning-rpc': No such file or directory.

@benharold have you encountered this error during your work on Voltage? Or do you always run c-lightning on a remote linux server?

@Sjors I have always run on a remote linux server.

I get similar backtraces in autodata_make_table -> run_child on FreeBSD, though surprisingly the process seems to keep running and working. It does create core files every time, though (maybe no longer after #922).

While running getinfo, in lightningd:

#5  <signal handler called>
#6  run_child (infd=17, outfd=20) at ccan/ccan/ptr_valid/ptr_valid.c:190
#7  0x00000000004760fc in create_child (batch=0x7fffffffe618) at ccan/ccan/ptr_valid/ptr_valid.c:216
#8  0x0000000000475ce5 in check_with_child (batch=0x7fffffffe618, p=0x733000, size=8, is_write=false) at ccan/ccan/ptr_valid/ptr_valid.c:245
#9  0x0000000000475afb in ptr_valid_batch (batch=0x7fffffffe618, p=0x733000, alignment=1, size=8, write=false) at ccan/ccan/ptr_valid/ptr_valid.c:289
#10 0x000000000046283b in autodata_make_table (example=0x733950, name=0x4f6a6a "json_command", nump=0x735830) at ccan/ccan/autodata/autodata.c:38
#11 0x000000000040e949 in get_cmdlist () at lightningd/jsonrpc.c:286
#12 0x000000000040fdd5 in find_cmd (buffer=0x801a367a0 "{ \"method\" : \"getinfo\", \"id\" : \"lightning-cli-82041\", \"params\" : [ ] }\034", tok=0x801d408c8) at lightningd/jsonrpc.c:346
#13 0x000000000040fbbb in parse_request (jcon=0x801b7d060, tok=0x801d408a0) at lightningd/jsonrpc.c:608
#14 0x000000000040f5ef in read_json (conn=0x801d40600, jcon=0x801b7d060) at lightningd/jsonrpc.c:794
#15 0x000000000046c576 in next_plan (conn=0x801d40600, plan=0x801d40630) at ccan/ccan/io/io.c:59
#16 0x000000000046d219 in do_plan (conn=0x801d40600, plan=0x801d40630, idle_on_epipe=false) at ccan/ccan/io/io.c:387
#17 0x000000000046d084 in io_ready (conn=0x801d40600, pollflags=1) at ccan/ccan/io/io.c:397
#18 0x000000000046ea3b in io_loop (timers=0x801a20108, expired=0x7fffffffe9d0) at ccan/ccan/io/poll.c:305
#19 0x0000000000410717 in main (argc=5, argv=0x7fffffffea78) at lightningd/lightningd.c:35

While running connect, in gossipd:

#5  <signal handler called>
#6  run_child (infd=7, outfd=10) at ccan/ccan/ptr_valid/ptr_valid.c:190
#7  0x000000000043da0c in create_child (batch=0x7fffffffe7a8) at ccan/ccan/ptr_valid/ptr_valid.c:216
#8  0x000000000043d5f5 in check_with_child (batch=0x7fffffffe7a8, p=0x6dd000, size=8, is_write=false) at ccan/ccan/ptr_valid/ptr_valid.c:245
#9  0x000000000043d40b in ptr_valid_batch (batch=0x7fffffffe7a8, p=0x6dd000, alignment=1, size=8, write=false) at ccan/ccan/ptr_valid/ptr_valid.c:289
#10 0x000000000042a14b in autodata_make_table (example=0x6dd6c0, name=0x4b8e01 "type_to_string", nump=0x6decc8) at ccan/ccan/autodata/autodata.c:38
#11 0x0000000000416c01 in type_to_string_ (ctx=0x801a15120, typename=0x4b56ad "struct pubkey", u={}) at common/type_to_string.c:24
#12 0x00000000004069da in connection_out (conn=0x801a1c1e0, reach=0x801be6020) at gossipd/gossip.c:1565
#13 0x0000000000433e86 in next_plan (conn=0x801a1c1e0, plan=0x801a1c240) at ccan/ccan/io/io.c:59
#14 0x0000000000434b29 in do_plan (conn=0x801a1c1e0, plan=0x801a1c240, idle_on_epipe=false) at ccan/ccan/io/io.c:387
#15 0x0000000000434a03 in io_ready (conn=0x801a1c1e0, pollflags=4) at ccan/ccan/io/io.c:403
#16 0x000000000043634b in io_loop (timers=0x801bc60d0, expired=0x7fffffffea08) at ccan/ccan/io/poll.c:305
#17 0x0000000000402c21 in main (argc=1, argv=0x7fffffffeaa8) at gossipd/gossip.c:2029

I'll try to debug, but I couldn't figure out quickly what this code was supposed to do in the first place.

Possibly thanks to #922 I'm now getting a more useful crash log:

lightningd/lightningd --network=testnet --port=10000
2018-02-06T13:20:45.309Z lightningd(71192): Creating database
2018-02-06T13:20:45.449Z lightningd(71192): Server started with public key 027085167b840886f5239e10e8082c944d7b611f329a42b375ee472afe5938f0eb, alias YELLOWWHISPER (color #027085) and lightningd v0.5.2-2016-11-21-1859-gb3534462-dirty
2018-02-06T13:21:15.098Z lightningd(71192): FATAL SIGNAL 11 RECEIVED
2018-02-06T13:21:15.099Z lightningd(71192): error getting backtrace: lightningd/lightningd (2)
2018-02-06T13:21:15.099Z lightningd(71192): error getting backtrace: failed to read executable information (-1)
...
Log dumped in crash.log
2018-02-06T13:21:15.105Z lightningd(71192): FATAL SIGNAL 10 RECEIVED
2018-02-06T13:21:15.106Z lightningd(71192): error getting backtrace: lightningd/lightningd (2)
...
Log dumped in crash.log

That FATAL SIGNAL 11 error happened when I did:

cli/lightning-cli help

It did however return the help text. Not sure why it threw that error twice.

crash.log

In fact, it just keeps going. I was able to generate an address and it detects those funds.

I did get a Fatal signal 11 when I tried to connect to another node, unfortunately nothing in crash.log this time.

2018-02-06T13:31:47.985Z lightning_gossipd(72181): TRACE: req: type WIRE_GOSSIPCTL_PEER_ADDRHINT len 42
2018-02-06T13:31:47.985Z lightning_gossipd(72181): TRACE: req: type WIRE_GOSSIPCTL_REACH_PEER len 35
lightning_gossipd: Fatal signal 11
0x10888a8f9 ???
    ???:0
0x7fff79b39f59 ???
    ???:0
0x1088b20e1 ???
    ???:0
0x1088b1f8b ???
    ???:0
0x1088b1b64 ???
    ???:0
0x1088b197a ???
    ???:0
0x10889e11e ???
    ???:0
0x10888ab46 ???
    ???:0
0x10887a598 ???
    ???:0
0x1088a8012 ???
    ???:0
0x1088a8cfb ???
    ???:0
0x1088a8bc2 ???
    ???:0
0x1088aa526 ???
    ???:0
0x108876524 ???
    ???:0
2018-02-06T13:31:48.205Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x10888a91c
2018-02-06T13:31:48.205Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x7fff79b39f59
2018-02-06T13:31:48.205Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088b20e1
2018-02-06T13:31:48.205Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088b1f8b
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088b1b64
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088b197a
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x10889e11e
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x10888ab46
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x10887a598
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088a8012
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088a8cfb
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088a8bc2
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x1088aa526
2018-02-06T13:31:48.206Z lightning_gossipd(72181): TRACE: backtrace: (null):0 ((null)) 0x108876524
2018-02-06T13:31:48.206Z lightning_gossipd(72181): STATUS_FAIL_INTERNAL_ERROR: FATAL SIGNAL 11
lightning_gossipd: Fatal signal 10
0x10888a8f9 ???
    ???:0
0x7fff79b39f59 ???
    ???:0
0x1088b20e1 ???
    ???:0
0x1088b1f8b ???
    ???:0
0x1088b1b64 ???
    ???:0
0x1088b197a ???
    ???:0
0x10889e197 ???
    ???:0
0x10888ab46 ???
    ???:0
0x10887a598 ???
    ???:0
0x1088a8012 ???
    ???:0
0x1088a8cfb ???
    ???:0
0x1088a8bc2 ???
    ???:0
0x1088aa526 ???
    ???:0
0x108876524 ???
    ???:0
lightningd: lightning_gossipd failed (exit status 2), exiting.

Just to let you know that I finally succeeded in running c-lightning on macOS, creating a node "Folgore ⚡️macOS" with a channel connected to "Eternity Wall" on mainnet:

isghe (compile_macOS_isghe_debug)*$ ./lightning-cli listpeers | jq
{
  "peers": [
    {
      "id": "0271d98f7dfe66198f045690f552a9126abb0aa1585f0061854e780b7e08e6dccd",
      "connected": true,
      "netaddr": [
        "163.172.139.73:9735"
      ],
      "channels": [
        {
          "state": "CHANNELD_NORMAL",
          "owner": "lightning_channeld",
          "short_channel_id": "508273:180:1",
          "funding_txid": "55a5982344ccb0934989ffd39ec5dd3df529d4a714bf9c0023cabb9b2f4fe903",
          "msatoshi_to_us": 50000000,
          "msatoshi_total": 50000000,
          "dust_limit_satoshis": 546,
          "max_htlc_value_in_flight_msat": 18446744073709552000,
          "channel_reserve_satoshis": 0,
          "htlc_minimum_msat": 0,
          "to_self_delay": 144,
          "max_accepted_htlcs": 483
        }
      ]
    }
  ]
}

In a few days, I'll run some tests, clean up the code, and create a PR :-)

I've looked somewhat closer at the ptr_valid code (which I found to cause crashes in my post above) and I found the reason. Apparently, it is using a child process to probe memory addresses for validity (whether they can be read/written). A crash in the child process is thus not an error, but an expected outcome. I am confused as to why it needs to do this, though.
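
As a rough illustration of the child-process probing described above, here is a minimal C sketch. It is not the CCAN ptr_valid code: the backtraces show CCAN talking to its child over file descriptors (run_child(infd, outfd)), whereas this sketch simply forks once per probe for brevity. The name probe_readable is hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of CCAN: reports whether one byte at p
 * can be read. The read is done in a forked child, so a bad address
 * kills only the child and never the caller. */
bool probe_readable(const void *p)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;   /* fork failed: validity could not be established */

    if (pid == 0) {
        /* Child: touch the address. volatile keeps the compiler from
         * dropping the read. A fault here ends the child with a signal. */
        volatile char c = *(const volatile char *)p;
        (void)c;
        _exit(0);       /* the read succeeded */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return false;

    /* A clean exit means the byte was readable; death by signal
     * (typically SIGSEGV or SIGBUS) means it was not. */
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Probing the address of a live local variable should return true, while an unmapped low address should return false. In the second case the child crashes, which is the expected outcome rather than an error.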

On Linux it parses /proc/self/maps instead to achieve this, but that file is not available on most UNIXes, including FreeBSD and Mac OS X.
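
For comparison, here is a minimal sketch of the /proc/self/maps approach on Linux. It is an assumption about how such a check can be written, not the CCAN code. The name maps_readable is hypothetical, and the sketch only accepts ranges that fit inside a single mapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Linux only. Hypothetical helper: true if [p, p + size) lies inside one
 * mapping of this process that has read permission. */
bool maps_readable(const void *p, size_t size)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return false;   /* no procfs: nothing can be proven */

    uintptr_t addr = (uintptr_t)p;
    char line[512];
    bool readable = false;

    /* Each line starts with "start-end perms", e.g. "00400000-0040b000 r-xp". */
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];

        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;   /* not the start of a mapping line */

        if (addr >= start && addr < end && size <= end - addr) {
            readable = (perms[0] == 'r');
            break;
        }
    }

    fclose(f);
    return readable;
}
```

Unlike the fork-based probe, this never triggers a fault, which is why the crashes in this thread show up only on systems without /proc/self/maps.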

Edit: autodata_make_table makes use of this to search (part of) the virtual memory space to find table record tags. This is used to build the table of JSON commands, among other things. It seems a somewhat heavy-handed way to define tables, if you ask me, as if the program is reverse-engineering itself.
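
For contrast, the conventional alternative is an explicit table that needs no memory scanning. A minimal sketch follows. The struct layout, the listed entries, and the name find_command are hypothetical and are not taken from the c-lightning source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command record; the real struct has more fields. */
struct command_entry {
    const char *name;
    const char *description;
};

/* Every command is listed by hand in one place. */
static const struct command_entry command_table[] = {
    { "help",      "List available commands" },
    { "getinfo",   "Show node information" },
    { "listpeers", "Show connected peers" },
};

/* Linear lookup by name; returns NULL if the command is unknown. */
const struct command_entry *find_command(const char *name)
{
    size_t n = sizeof command_table / sizeof command_table[0];

    for (size_t i = 0; i < n; i++) {
        if (strcmp(command_table[i].name, name) == 0)
            return &command_table[i];
    }
    return NULL;
}
```

The trade-off is that each new command must be added to the table manually, which is the bookkeeping that autodata is meant to avoid.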

Yep, autodata has been causing a few issues as of late, so maybe we'll just rip it out completely. Unless you have a fix that we can upstream @laanwj

@isghe any progress on your PR?

Hi @Sjors
my workaround is embarrassing, but as far as I can see it works: ignore the crash by commenting out this line in the function crashdump:

    // status_failed(STATUS_FAIL_INTERNAL_ERROR, "FATAL SIGNAL %u", sig);

Obviously that is not a solution, but it can get you running and debugging on macOS.

I'm not a fan of ignoring crashes on mainnet, but I suppose I can spin up a testnet node on macOS.

Yeah, definitely don't just ignore crashes, you might end up corrupting some persistent state and then you'll definitely lose your funds.

SIGSEGV and SIGBUS are raised whenever jsonrpc is called on Mac. The strange thing is that those signals do not occur if we do not register handlers with sigaction(). Any idea?

I think we can work around this issue by deactivating crashlog. backtrace does not support OS X, which means the crashlog is useless on Mac anyway.
