Lightning: Bad commit_sig signature

Created on 22 Oct 2020  ·  8 Comments  ·  Source: ElementsProject/lightning

az0re just reported this on IRC. Unfortunately, GitHub won't let them post here because they use Tor.
I told them to send the (private) signature, transaction, and feerate by email.

lnd-compat

All 8 comments

Very interesting indeed. Looking at the transaction whose signature failed, the transaction that ultimately made it on-chain looks identical to it. The fact that the unilateral close was ours suggests that the two consecutive states look identical, which narrows this down to a feerate change being the cause of the new commitment. However, we also add one HTLC and remove another, and the two are identical except for the payment_hash:

    "htlcs": [
       {
          "direction": "out",
          "id": 95,
          "msatoshi": 125000125,
          "amount_msat": "125000125msat",
          "expiry": 653923,
          "payment_hash": "hashA",
          "state": "SENT_ADD_ACK_REVOCATION"
       },
       {
          "direction": "out",
          "id": 96,
          "msatoshi": 125000125,
          "amount_msat": "125000125msat",
          "expiry": 653923,
          "payment_hash": "hashB",
          "state": "RCVD_REMOVE_HTLC"
       }
    ]

So it could be that the unilateral close was really the one with hashB, and the new (failed) commitment state was the one with hashA. The counterparty is an lnd node, and we have had synchronization issues with lnd in the past during rapid successions of HTLCs, which suggests that we might have desynchronized in between the two commits.

This happened again with a different peer, this time with an empty htlcs field, using git cd7d5cdff9e5efc0dcfb5fdc91e8c80a11daebed. I will send another email with the listpeers object and relevant log excerpt.

This happened yet again, using git bbfcae652cb71d937b7f6dfa5122b16984b028df, this time with totally full HTLC slots. All but two of those HTLCs were incoming, in state RCVD_REMOVE_REVOCATION and the two outgoing HTLCs were in state SENT_ADD_ACK_REVOCATION. This one might be a simple race issue, where both sides see max - 1 HTLCs and so simultaneously think it's OK to add one more HTLC. Then maybe they both issue an HTLC and then are unable to deal with the resulting channel state with more than the max number of allowed in-flight HTLCs. Being a bit flexible with the maximum number of allowed in-flight HTLCs might help.
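The race hypothesized above can be illustrated with a toy model. This is only a sketch of the guess in this comment, not of the actual per-direction HTLC accounting in the protocol or in either implementation; `MAX_HTLCS`, `Side`, and `simulate_race` are hypothetical names, and the limit of 30 is taken from the default mentioned later in this thread.

```python
MAX_HTLCS = 30  # hypothetical shared limit, used only for this illustration


class Side:
    """One channel party with its own local view of in-flight HTLCs."""

    def __init__(self, name, in_flight):
        self.name = name
        self.in_flight = in_flight

    def may_add(self):
        # Each side decides using only what it has seen so far.
        return self.in_flight < MAX_HTLCS


def simulate_race(start):
    """Both sides check the limit before seeing the other's add.

    Returns the number of HTLCs in flight once both adds have crossed.
    """
    a = Side("A", start)
    b = Side("B", start)
    a_ok = a.may_add()  # checked before B's add arrives
    b_ok = b.may_add()  # checked before A's add arrives
    return start + int(a_ok) + int(b_ok)


print(simulate_race(29))  # 31: both sides saw max - 1 and both added one
```

In this model, starting from max - 1 ends up one over the limit, which is the situation the comment speculates neither side handles well.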

This bug has cost me a ton of money, and fixing it is clearly not a priority. Despite my significant investment in C-Lightning infrastructure, I have started planning to migrate to an LND node. I am not willing to keep burning satoshis like this. I will also warn other people not to use C-Lightning on mainnet until this bug is fixed so they don't get burned, either.

OK, the good news is I've caught one of these in my logs, too! Finally...

@cdecker's idea that it's around feerate changes seems to be borne out here. Seems like we were updating fees, both sides were on different levels, and then when we tried updating fees again (while they were still in flux), something b0rked.

There's a simple workaround, however: we can avoid changing fees again until they're completely quiescent: we know the fee logic works in the simple cases, otherwise we wouldn't keep channels open at all.

That means even if LND's or our fee logic doesn't work in complex cases (and I'm still tracking this down to try to get the exact cause here), we won't trigger it in the meantime.

> This bug has cost me a ton of money, and fixing it is clearly not a priority. Despite my significant investment in C-Lightning infrastructure, I have started planning to migrate to an LND node. I am not willing to keep burning satoshis like this. I will also warn other people not to use C-Lightning on mainnet until this bug is fixed so they don't get burned, either.

I totally understand! It's a nasty bug, and because it wasn't happening all the time, it didn't hit the top of our TODO list. However, now I've got some more clues I am pretty sure I can work around it (for existing peers), and make sure that we're actually doing the right thing (fixing LND or c-lightning, whichever is wrong).

> There's a simple workaround, however: we can avoid changing fees again until they're completely quiescent: we know the fee logic works in the simple cases, otherwise we wouldn't keep channels open at all.

What are you referring to when you say quiescent? Do you mean no feerate update in flight, no HTLCs in flight or no commitment? I noticed that reports often share a couple of HTLCs and some are even close to the default 30 concurrent HTLCs. Some of these nodes seem incredibly busy, so waiting for there to be no HTLCs might starve our fee adjustment. Would remembering the last couple of feerates and just trying them out work?

Was planning on "no feerates in flight". In my logs, there are multiple fee changes going on at the same time...
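A minimal sketch of the "no feerates in flight" idea, assuming a single pending fee update at a time: a new feerate is only sent when nothing is in flight, and otherwise only the latest wanted value is remembered. `FeeUpdateGate` and its methods are hypothetical names for illustration and are not taken from the c-lightning source.

```python
class FeeUpdateGate:
    """Defers fee changes while an earlier fee change is still in flight."""

    def __init__(self, feerate):
        self.committed = feerate  # feerate fully acknowledged by both sides
        self.in_flight = None     # feerate sent but not yet settled
        self.wanted = None        # latest feerate deferred while busy

    def request(self, feerate):
        """Return the feerate to send now, or None if nothing is sent."""
        if self.in_flight is not None:
            # Not quiescent: remember only the most recent request.
            self.wanted = feerate
            return None
        if feerate == self.committed:
            return None  # nothing to change
        self.in_flight = feerate
        return feerate

    def settled(self):
        """Call once the in-flight feerate is irrevocably committed.

        Returns a deferred feerate to send next, or None.
        """
        if self.in_flight is None:
            return None
        self.committed = self.in_flight
        self.in_flight = None
        pending, self.wanted = self.wanted, None
        if pending is not None:
            return self.request(pending)
        return None


gate = FeeUpdateGate(1000)
print(gate.request(1500))  # 1500: quiescent, so it is sent
print(gate.request(2000))  # None: deferred, 1500 is still in flight
print(gate.settled())      # 2000: the deferred change goes out now
```

Because this gate waits only on fee updates and not on HTLCs, a busy channel with many HTLCs in flight would not starve fee adjustment in this model.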

> I totally understand! It's a nasty bug, and because it wasn't happening all the time, it didn't hit the top of our TODO list. However, now I've got some more clues I am pretty sure I can work around it (for existing peers), and make sure that we're actually doing the right thing (fixing LND or c-lightning, whichever is wrong).

Super glad this might be resolved soon. I really don't want to migrate to LND, so if I stop bleeding satoshis then I am absolutely going to stay on C-Lightning. Thanks for looking at this!
