Lnd: LND sends an invalid sig on reconnection

Created on 11 Dec 2019  路  13Comments  路  Source: lightningnetwork/lnd

Background

On reconnection, upon LND sends a new signature that doesn't include an htlc that it just acknowledged.

  eclair              lnd
    |                  |
    |                  | <- idle, no pending changes
    |                  |
    |-------add------->|
    |-------sig----->? | <- lnd is unresponsive, eclair disconnects
    |                  |
    |                  |
    |---reestablish--->|
    |<--reestablish----| <- commit number shows that lnd had received the last sig (eclair won't resend it)
    |<------rev--------| <- lnd re-sends the revocation
    |<------fee--------| <- lnd sends a new fee
    |<------sig--------| <- lnd sends a sig that should include add+fee
    |-------err------->| <- eclair rejects, sig includes the latest fee but not the add
    |                  |

Your environment

  • version of lnd: 0.8.1-beta

Logs: see https://github.com/ACINQ/eclair/issues/1234

That kind of issues is pretty annoying because it cause channels to force close. It also takes a lot of time to isolate and debug. Really hope you can fix it rather sooner than later. NB: what happened to the similar https://github.com/lightningnetwork/lnd/issues/3126?

P2 bug interop

Most helpful comment

So we've been able to reproduce this, and seem to have identified the root cause of the issue.

All 13 comments

Hello!

This related issue (https://github.com/ACINQ/eclair/issues/1234#issuecomment-564660054) happened with my lnd server from other side. @btcontract suggested to me write here. If data of channel is there (lnd-05 my node) i can help to get them. I sent logs of LND to @btcontract already (https://github.com/ACINQ/eclair/issues/1234#issuecomment-562555683) but there no channel detailed data. If @Roasbeef (@wpaulino ?) need i am even ready to recompile debug version from your sources for to get channel data.

P.S. When this happened i had 0.8.1-beta. Now i already have LND of 0.8.2-beta-rc2. But i hope it's not more important to help with getting of channel data.

But small noticing:

This channel is already in closedchannels:

        {
            "channel_point": "66c9fd731382c19788f4fc8dccfd5bd2fd42672c19033664033b766617575e68:1",
            "chan_id": "655490349604536321",
            "chain_hash": "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f",
            "closing_tx_hash": "7f5da9e905ef83e6b61afc10f23ea3456717aebb45b1a8a8ed30ad330736aa21",
            "remote_pubkey": "03144fcc73cea41a002b2865f98190ab90e4ff58a2ce24d3870f5079081e42922d",
            "capacity": "16777215",
            "close_height": 606808,
            "settled_balance": "12778187",
            "time_locked_balance": "0",
            "close_type": "REMOTE_FORCE_CLOSE"
        },

If closed channel is already in this list are the data of this channel there yet? 2016 blocks after closing are not elapsed but i thought that monitored channels should be in pendingchannels list. I thought that the closedchannels list is like junk list.

Re #3126, we weren't able to reproduce it, and weren't able to obtain version information for the lnd in that specific instance.

Another one here: https://github.com/ElementsProject/lightning/issues/3341

Don't know if "Zap Technologies" will be more responsive?

Thanks for that second report! That appears to be something slightly different which should've been fixed by https://github.com/lightningnetwork/lnd/pull/2397 (cc @halseth).

We're working on reproducing the first report (given by Pierre above) as we've pinned it to a known version of lnd.

@LNBIG-COM in the comment you link, there are eclair logs. Do you have the lnd logs from before the channel re-establish?

From 2019-12-05 19:03:13 to 2019-12-05 20:18:06 approx.

@LNBIG-COM in the comment you link, there are eclair logs. Do you have the lnd logs from before the channel re-establish?

From 2019-12-05 19:03:13 to 2019-12-05 20:18:06 approx.

Yes, you are right ;-) There are Eclair logs
Here log file reletaed with this channel
This file was created for @btcontract and i sent it. I didn't do a fresh logs. If you will need in fresh logs - i will do but i think time period there is. My zone time is +01 GMT. I don't know time zone of Eclair logs. I think you will guess about time there.

https://drive.google.com/open?id=107VB_haNo4KkO2bbVg-N04QKWjWBd8oB

Thanks. How can you obtain fresh logs, is this something that you can reproduce? Recording the event with logs on debug level debug would be helpful. Logs times seem to match.

Sorry i didn't understand your last message.
I did this file for @btcontract when he asked me

Right now i did at lnd-05:

grep -E 'f6320c98f9016ff224956a74b8916a8b7c9332fc7dab0b9a6ad318bb933aeda9|596165:480:1|685e571766763b03643603192c6742fdd25bfdcc8dfcf48897c1821373fdc967|66c9fd731382c19788f4fc8dccfd5bd2fd42672c19033664033b766617575e68' ~/.lnd/mainnet-lnd-run.log > 1.txt

And got same (byte in byte) same file as above one. I will not attach here same file - you can see logs here:

https://drive.google.com/open?id=107VB_haNo4KkO2bbVg-N04QKWjWBd8oB

I cannot turn on debug mode now because events already happen and i will not get fresh data i think.

I cannot turn on debug mode now because events already happen and i will not get fresh data i think.

Yes, that is what I was asking. I was wondering because you spoke of 'fresh' logs. But indeed, the events already happened.

So we've been able to reproduce this, and seem to have identified the root cause of the issue.

@Roasbeef in lnd slack asked to append these logs. Might be related to this issue.

2 new channels that we created to each other lnd <--> eclair. No pending htlcs seen in "lncli listchannels", one side has zero balance for both channels.

Now that I think of it, the channels were stuck as pending for ~half a day when mempool was empty - I'm pretty sure the funding transaction had many confirmations. I had to restart lnd to get them past pending status.

The logs show endless cycle of connect, tons of channel announcements, disconnect, immediate reconnect. Gists below show a sample loop iteration.

Eclair v0.3.2 sample
https://gist.github.com/vwoo/15855ead742baf3530c5590522435d4e

Of note is "waited for too long for a revocation to remoteCommitNumber=1, disconnecting".

Eclair logs also show essentially the same problem with at least two other nodes (not included in the gist to keep it short).

lnd 0.8.2 sample
https://gist.github.com/vwoo/16c077c52b22dadb17a12ca888b0bd33

@vwoo looking at your logs, this is definitely an unrelated issue. Probably the flood of announcements is blocking or slowing down the sending of the revocation, causing eclair to disconnect after 20s. I know that eclair could do a much better job at rate limiting when sending announcements, but in this instance we are receiving.

It appears your lnd doesn't support feature gossip_queries_ex, causing eclair to ask a huge number of channel_updates which is probably the cause of the flooding:

2019-12-25 01:12:28,707 INFO  f.a.eclair.router.Router  n:02731b798b39a09f9f14e90ee601afb6ebb796d6e5797de14582a978770b33700f - received reply_channel_range with 8000 channels, we're missing 182 channel announcements and 16000 updates, format=UNCOMPRESSED
2019-12-25 01:12:29,096 INFO  f.a.eclair.router.Router  n:02731b798b39a09f9f14e90ee601afb6ebb796d6e5797de14582a978770b33700f - received reply_channel_range with 8000 channels, we're missing 88 channel announcements and 16000 updates, format=UNCOMPRESSED
2019-12-25 01:12:29,312 INFO  f.a.eclair.router.Router  n:02731b798b39a09f9f14e90ee601afb6ebb796d6e5797de14582a978770b33700f - received reply_channel_range with 8000 channels, we're missing 82 channel announcements and 16000 updates, format=UNCOMPRESSED
2019-12-25 01:12:29,682 INFO  f.a.eclair.router.Router  n:02731b798b39a09f9f14e90ee601afb6ebb796d6e5797de14582a978770b33700f - received reply_channel_range with 7201 channels, we're missing 8 channel announcements and 14402 updates, format=UNCOMPRESSED
Was this page helpful?
0 / 5 - 0 ratings