Lightning: Rusty's mainnet node is offline

Created on 3 Nov 2020 · 8 Comments · Source: ElementsProject/lightning

Issue and Steps to Reproduce

Connecting to it raises an "onion:9735: opening -1 socket gave Protocol not supported" error. This is likely because I don't have Tor running, but even so the error message could be improved. The node seems to have been offline for a couple of days now.

[btc@schmoock ~]$ lightning-cli connect 024...
{
   "code": 401,
   "message": "IPV4_SKIPPED:9735: Connection establishment: Connection refused. [IPV6_SKIPPED]:9735: Connection establishment: Connection refused. TOR_SKIPPED.onion:9735: opening -1 socket gave Protocol not supported. TOR_SKIPPED.onion:9735: opening -1 socket gave Protocol not supported. "
}
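The message above packs every attempted address into one string, which is part of why it is hard to read. As a rough client-side sketch (not anything lightningd provides), the string can be split into one entry per attempt. The helper name `split_connect_errors` is hypothetical, and the "address:port: reason" pattern is an assumption inferred only from the single sample above.

```python
import re

# Assumed shape of one attempt, inferred from the sample message only:
# "<address>:<port>: <reason>"
_ATTEMPT = re.compile(r"^(?P<addr>.*?:\d+): (?P<reason>.*)$")


def split_connect_errors(message):
    """Split a concatenated connect error into (address, reason) pairs.

    Hypothetical helper: parts that do not match the assumed shape are
    returned with address None instead of being dropped.
    """
    attempts = []
    for part in message.split(". "):
        part = part.strip().rstrip(".")
        if not part:
            continue  # trailing empty piece after the final ". "
        match = _ATTEMPT.match(part)
        if match:
            attempts.append((match.group("addr"), match.group("reason")))
        else:
            attempts.append((None, part))
    return attempts
```

Printing one pair per line makes it easier to see that the two clearnet addresses were refused while the two onion addresses failed for a different reason.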

All 8 comments

@rustyrussell you might want to know :D

Yes, I somehow broke it by running clboss on it, for some reason.

Now it came online, offline, online, ... again and then force-closed a channel on me!

Received error from peer: channel SKIPPED: Fulfilled HTLC 375 SENT_REMOVE_HTLC cltv 655636 hit deadline

Yes, Rusty and I have been playing around with it and restarting it quite a bit recently, including due to #4812 / #4813. Running it in valgrind, with CLBOSS using unoptimized autopilot algorithms, also tends to mean really, really, REALLY bad responsiveness from that node LOL.

"hit deadline" means your node did not fail the HTLC in time. That might be because the HTLC is your incoming side of a forward: if your outgoing channel is closed but your node has not yet gotten its timelock branch confirmed, your node may be unable to release its own incoming HTLC (which the rusty node is now citing as the reason to drop the channel with you). Not sure, however; @rustyrussell knows the HTLC machine better.

I think most problems with his node come from performance issues caused by valgrind.
Just rechecked: connecting already took several seconds. Then I ran ping, and that alone took way too long.
For now I can't recommend opening channels to this node, as they are likely to be dropped eventually :/

[btc@schmoock ~]$ lightning-cli connect 024...
{
   "id": "024...",
   "features": "800000000000000000002aaaa2"
}
[btc@schmoock ~]$ lightning-cli ping 024...
# ... this took like a minute or so ...
{
   "totlen": 132
}
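To put a number on "a minute or so", the round trip can be timed from a script. This is a minimal sketch, assuming lightning-cli is on the PATH and the node ID is supplied by the caller; `time_command` is a hypothetical helper, not part of any Lightning tooling.

```python
import subprocess
import time


def time_command(argv, timeout=300):
    """Run a command and return (elapsed_seconds, exit_code, stdout).

    Hypothetical helper: it times the whole process, so the figure
    includes process startup as well as the peer's response time.
    """
    start = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed = time.monotonic() - start
    return elapsed, result.returncode, result.stdout


# Example call (node_id is a placeholder for the peer's full node ID):
# elapsed, code, out = time_command(["lightning-cli", "ping", node_id])
# print(f"ping took {elapsed:.1f}s, exit code {code}")
```

Running this a few times against the same peer gives a rough picture of how responsiveness changes, for example before and after CLBOSS was taken off valgrind.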

Well, Rusty just today took CLBOSS off valgrind, and CLBOSS finally completed the autopilot algorithms that were hogging all the CPU and is relatively low-CPU now. So it should be a little more responsive. Maybe. lightningd remains inside valgrind though.

The rusty node is down again due to some errors that do not seem related to CLBOSS (I disabled CLBOSS and confirmed the node still crashes). It crashes about 3 hours after I restart it, with some failure somewhere in channeld. Sorry, I will investigate the crash tomorrow; I suppose Rusty is asleep now too.

Valgrind is reporting an out-of-memory condition in the bcli plugin. It seems fairly consistent: the bcli running inside Valgrind dies because Valgrind runs out of memory. Not sure if that is because we have a leak in bcli or because of some limitation/bug in Valgrind. I tried attaching gdb to bcli, but gdb does not catch any signals or segfaults, so it seems to be a Valgrind-side problem (which could still be due to bcli using too much memory).

I tried reviewing the past 4 commits modifying plugins/bcli.c, but nothing jumps out at me as a potential leak. The closest is maybe the getblock change, where we remove one unnecessary copy and just reuse the buffer, but my initial reading is that we properly move ownership of the buffer to an object that gets destroyed when the command completes, so it does not look like the culprit. It could also be something introduced in common/ code, which is harder to audit (because there is more of it).

One less CLBOSS user noooooo how can I take over the world now???
