Lnd: When a funding TX takes a long time to confirm, channels get stuck in syncing

Created on 10 Jun 2020  路  20Comments  路  Source: lightningnetwork/lnd

I funded new channels with 1 sat/byte and as expected they took nearly a day to clear. A peer using Eclair has informed me that once cleared the channel was stuck in SYNCING. He had to manually disconnect to get the channel unstuck.

This open issue in ECLAIR repo mentions that it could be related ChannelReestablish. Because the peer already fixed this, I'm uncertain if restarting my LND node would have forced a reestablish, but a restart shouldn't be required in any case.

https://github.com/ACINQ/eclair/issues/1444

channels interop p2p

Most helpful comment

AFAIK the connection would never have been broken - the node was running all day while the mempool was clearing. I'd see if I could get logs for you but I literally just migrated to LND yesterday and still don't quite know what I'm doing.

All 20 comments

@STAWKEYE thanks for the report! Off the top of your head, do you know if the connection was ever broken between the initial funding and the when the transaction confirmed? If the connection remained open, then we wouldn't expect a ChannelReestablish at all. If it was broken, then perhaps there is a deeper issue we can look into around sending ChannelReestablish. I'll keep an eye that issue and hopefully glean some info from the logs.

Because the peer already fixed this, I'm uncertain if restarting my LND node would have forced a reestablish, but a restart shouldn't be required in any case.

Disconnecting and reconnecting is all that should be required to trigger ChannelReestablish, a restart is unnecessary but it would have worked.

AFAIK the connection would never have been broken - the node was running all day while the mempool was clearing. I'd see if I could get logs for you but I literally just migrated to LND yesterday and still don't quite know what I'm doing.

Thanks @STAWKEYE, that gives us a good starting point.

. I'd see if I could get logs for you but I literally just migrated to LND yesterday and still don't quite know what I'm doing.

No problem 馃槀 I can help you get logs if we need to when the time comes

I'm the counter party for that channel. Here are the logs after the tx got a sufficient amount of confirmations:

2020-06-09 20:55:50,257 INFO  f.a.eclair.Diagnostics SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0)
2020-06-09 20:55:50,257 INFO  f.a.e.i.Peer$MessageLogs SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0)
2020-06-09 20:55:50,257 WARN  f.a.e.channel.Channel SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - unhandled event AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0) in state SYNCING 

and here is when I manually disconnected to get the channel unstuck:

2020-06-10 00:52:15,521 INFO  fr.acinq.eclair.io.Peer CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - disconnecting
2020-06-10 00:52:15,521 INFO  fr.acinq.eclair.io.Peer CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - connection lost
2020-06-10 00:52:15,522 INFO  f.a.e.i.ReconnectionTask CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - peer is disconnected
2020-06-10 00:52:16,374 INFO  f.a.e.io.PeerConnection SYN n:039eca59b1fe20fd725b11ae1d6bae8bbf91f09366534db2bedf6d1a123d9b127f - sent announcements: channels=0 updates=15 nodes=1
2020-06-10 00:52:17,377 INFO  f.a.e.io.PeerConnection SYN n:0301b4bd53dcf2363da523469e4888e7453e242995788efe8efc51f0d476fc3ef1 - sent announcements: channels=0 updates=15 nodes=1
2020-06-10 00:52:17,542 INFO  f.acinq.eclair.io.Server CON - connected to /46.165.245.154:34898
2020-06-10 00:52:17,989 INFO  f.a.e.io.PeerConnection CON - connection authenticated with 031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda@46.165.245.154:34898 direction=incoming
2020-06-10 00:52:17,989 INFO  f.a.e.io.PeerConnection CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - using features=000010000000101010001010
2020-06-10 00:52:17,990 INFO  f.a.e.io.PeerConnection CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - peer is using features=000000101010001010100001, networks=
2020-06-10 00:52:17,990 INFO  f.a.e.io.PeerConnection CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - sending sync channel range query with flags_opt=None
2020-06-10 00:52:17,990 INFO  f.a.e.io.PeerConnection CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - rebroadcast will be delayed by 29 seconds
2020-06-10 00:52:17,990 INFO  f.a.eclair.router.Router n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - sending query_channel_range=QueryChannelRange(6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000,0,2147483647,TlvStream(List(),List()))
2020-06-10 00:52:17,990 INFO  f.a.e.i.ReconnectionTask CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - peer is connected
2020-06-10 00:52:17,990 INFO  f.a.e.i.Peer$MessageLogs CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=ChannelReestablish(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,1,0,PrivateKey(0000000000000000000000000000000000000000000000000000000000000000),02da536a8a1156da3449e452cf7024b40631a43e8a4ad60d695fd8f03f6f8da7e9)
2020-06-10 00:52:17,991 INFO  f.a.eclair.Diagnostics CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=ChannelReestablish(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,1,0,PrivateKey(0000000000000000000000000000000000000000000000000000000000000000),02da536a8a1156da3449e452cf7024b40631a43e8a4ad60d695fd8f03f6f8da7e9)
2020-06-10 00:52:18,306 INFO  f.a.eclair.Diagnostics SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0)
2020-06-10 00:52:18,306 INFO  f.a.eclair.Diagnostics CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=ChannelReestablish(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,1,0,PrivateKey(0000000000000000000000000000000000000000000000000000000000000000),024e15e12457d747d318f4d42ad3e135f2f107ce484ad700e05c714c3ef7e4ac84)
2020-06-10 00:52:18,307 INFO  f.a.eclair.Diagnostics n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=FundingLocked(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,02449c86ea4e6d8a6f486c49bf84921f9fa950578db78cab7204d46a62f69b536c)
2020-06-10 00:52:18,307 INFO  f.a.e.i.Peer$MessageLogs SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0)
2020-06-10 00:52:18,307 INFO  f.a.e.i.Peer$MessageLogs CON n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=ChannelReestablish(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,1,0,PrivateKey(0000000000000000000000000000000000000000000000000000000000000000),024e15e12457d747d318f4d42ad3e135f2f107ce484ad700e05c714c3ef7e4ac84)
2020-06-10 00:52:18,307 INFO  f.a.e.i.Peer$MessageLogs n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - IN msg=FundingLocked(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,02449c86ea4e6d8a6f486c49bf84921f9fa950578db78cab7204d46a62f69b536c)
2020-06-10 00:52:18,307 WARN  f.a.e.channel.Channel SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - unhandled event AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,0d2c8a2b7d8e5cb04456b34d68484de0eb1c5ce92906e9477f0196933fd5add933cc6f04136d655474872ae16d347b2b2677d7848675c69bb2a3bd4666023646,ba5b5226feb283df4ed0f49b2990db11a00776912831b8059ac46e137f1346283d9b51ef7a4efd3cf6a8e7d33900e264128d42f8d719fc681350b35aaf6ec9e0) in state SYNCING
2020-06-10 00:52:18,307 INFO  f.a.e.io.PeerConnection SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - setting up gossipTimestampFilter=GossipTimestampFilter(6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000,1591743140,4294967295)
2020-06-10 00:52:18,310 INFO  f.a.e.channel.Channel n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - received their FundingLocked, deferring message
2020-06-10 00:52:18,374 INFO  f.a.e.b.b.ZmqWatcher - triggering WatchConfirmed(Actor[akka://eclair-node-gui/user/$l/switchboard/peer-031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda/$a#1046684304],53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a,ByteVector(34 bytes, 0x0020e91a7c0337f49ebafae520cfe1d5ca3651eb6867dba781129b08346dc0042a78),3,BITCOIN_FUNDING_DEPTHOK)
2020-06-10 00:52:18,377 INFO  f.a.e.io.PeerConnection SYN n:03cd75b4da2dcb7d1f3267f647ee5d17be436f10f02809351f191d37959a4cf290 - sent announcements: channels=0 updates=15 nodes=1
2020-06-10 00:52:18,380 INFO  f.a.e.channel.Channel n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - channelId=4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 was confirmed at blockHeight=633911 txIndex=2048
2020-06-10 00:52:18,388 INFO  f.a.e.i.Peer$MessageLogs n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=FundingLocked(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,02afa92cc399725f5fc90e35c441d55755d859bd1e8889bdb2bb596d40eedcb7de)
2020-06-10 00:52:18,389 INFO  f.a.eclair.Diagnostics n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=FundingLocked(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,02afa92cc399725f5fc90e35c441d55755d859bd1e8889bdb2bb596d40eedcb7de)
2020-06-10 00:52:18,461 INFO  f.a.e.b.b.ZmqWatcher - triggering WatchConfirmed(Actor[akka://eclair-node-gui/user/$l/switchboard/peer-031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda/$a#1046684304],53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a,ByteVector(34 bytes, 0x0020e91a7c0337f49ebafae520cfe1d5ca3651eb6867dba781129b08346dc0042a78),6,BITCOIN_FUNDING_DEEPLYBURIED)
2020-06-10 00:52:18,589 INFO  f.a.e.db.BackupHandler - database backup triggered by channelId=4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 took 200ms
2020-06-10 00:52:18,592 INFO  f.a.e.channel.Channel n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - emitting channel_update=ChannelUpdate(4a3b8ca6cbc3f01ffb1cdb4479b9ccf710750f2618b50e505bb49ebb83a351392e4d88773d008671fae7f4de10d24cdeb11e1a798c27bc0bbd1400bccdb60777,6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000,633911x2048x1,1591743138,1,2,CltvExpiryDelta(120),1 msat,1000 msat,500,Some(400000000 msat),ByteVector(empty)) enabled=false 
2020-06-10 00:52:18,593 INFO  f.a.e.channel.Channel n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - funding tx is deeply buried at blockHeight=633911 txIndex=2048 shortChannelId=633911x2048x1
2020-06-10 00:52:18,603 INFO  f.a.eclair.router.Router SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - received reply_channel_range with 8000 channels, we're missing 4150 channel announcements and 16000 updates, format=UNCOMPRESSED
2020-06-10 00:52:18,709 INFO  f.a.e.i.Peer$MessageLogs SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,60913f52d6f22ccfce295a9d0ba019a19eb0463798515518651c7874f467f57452839761f9ec1d0849eb54f6d1346115fe76df6067e8ec301777a3bb60cc3d77,593e4559719e3c5192fd92c8111b855b9e6a6b08a494e397a369bc9ec03d59cc7430116d64a0c11885bd4c791806bcc0cbdbec895e4e58d802bb54aa365a8d53)
2020-06-10 00:52:18,709 INFO  f.a.eclair.Diagnostics SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda c:4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 - OUT msg=AnnouncementSignatures(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152,633911x2048x1,60913f52d6f22ccfce295a9d0ba019a19eb0463798515518651c7874f467f57452839761f9ec1d0849eb54f6d1346115fe76df6067e8ec301777a3bb60cc3d77,593e4559719e3c5192fd92c8111b855b9e6a6b08a494e397a369bc9ec03d59cc7430116d64a0c11885bd4c791806bcc0cbdbec895e4e58d802bb54aa365a8d53)
2020-06-10 00:52:18,814 INFO  f.a.eclair.router.Router SYN n:031e643c3343d0561fcd7e303b468124e86b9ed3403ce82204bf93cc9d2546beda - received reply_channel_range with 8000 channels, we're missing 793 channel announcements and 16000 updates, format=UNCOMPRESSED
2020-06-10 00:52:18,826 INFO  f.a.e.db.BackupHandler - database backup triggered by channelId=4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 took 234ms
2020-06-10 00:52:18,962 INFO  f.a.e.db.BackupHandler - database backup triggered by channelId=4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 took 136ms

seems like my node is at fault and refusing to take the channel out of syncing. It shouldn't be in syncing and rather stay in wait_for_funding_locked or wait_for_confirmation I'd assume?!

The channel was already stuck in syncing long before the first confirmation.

Ok so it looks like at some point you got disconnected before the funding tx confirmation (this is why the channel is in SYNCING state).

The channel is still SYNCING because we're waiting for a channel_reestablish (since we were disconnected). It looks like we follow this requirement from the spec:

- upon reconnection:
    - if a channel is in an error state:
        - (...)
    - otherwise:
        - MUST transmit channel_reestablish for each channel.
        - MUST wait to receive the other node's channel_reestablish message before sending any other messages for that channel.

So we ignore the announcement_signatures because we didn't receive (or didn't correctly handle) a channel_reestablish to make us go back to the wait_for_funding_locked state.

Could you look for logs before 2020-06-09 20:55:50,257 to verify that 1) there was indeed a disconnection and 2) check that we didn't miss the channel_reestablish message for some reason?

Maybe @STAWKEYE's logs can also help check what was seen from the other side of the channel (if a disconnection was observed, if a channel_reestablish was attempted or not, etc).

In the logs, here is the portion at the creation of the funding tx, and then after the "..." is the first instance of "re-establish" many hours later, possibly when peer forced it to reconnect?
Theseus_Log.txt

I think it would be useful to see the truncated part of the logs, can you look for:

  • a disconnection
  • whether funding_locked was sent/received
  • whether channel_reestablish was sent/received

This is what came up:

2020-06-09 14:43:29.441 [ERR] FNDG: unable to report short chan id: link 4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152 not found
2020-06-09 14:43:29.445 [INF] FNDG: Peer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844) is online, sending FundingLocked for ChannelID(4acdd2fe1063110af6b40a0741d3663e9c772d605b436d0d65b09fd739804152)
2020-06-09 14:43:29.446 [INF] CHBU: Updating on-disk multi SCB backup: num_old_chans=1, num_new_chans=1

2020-06-09 18:52:20.173 [INF] HSWC: ChannelLink(633911:2048:1): resending FundingLocked message to peer
2020-06-09 18:52:20.173 [INF] HSWC: ChannelLink(633911:2048:1): received re-establishment message from remote side
2020-06-09 18:52:20.173 [INF] DISC: GossipSyncer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844): applying gossipFilter(start=2020-06-09 18:52:20.1730941 -0400 EDT m=+35855.866903401, end=2156-07-17 00:20:35.1730941 -0500 EST m=+4295003150.866903401)

2020-06-09 18:52:20.190 [INF] DISC: GossipSyncer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844): sending final chan range chunk, size=5327
2020-06-09 18:52:20.633 [INF] PEER: Already have ChannelPoint(53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a:1), ignoring.
2020-06-09 18:52:20.633 [INF] PEER: Processing retransmitted FundingLocked for ChannelPoint(53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a:1)
2020-06-09 18:52:20.778 [INF] DISC: GossipSyncer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844): fetching chan anns for 100 chans
2020-06-09 18:52:20.866 [INF] DISC: Received new remote channel announcement for 633911:2048:1
2020-06-09 18:52:20.884 [INF] DISC: Fully valid channel proof for short_chan_id=696992515609329665 constructed, adding to next ann batch

2020-06-09 22:17:11.944 [INF] DISC: GossipSyncer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844): sending final chan range chunk, size=5405
2020-06-09 22:17:12.014 [INF] HSWC: ChannelLink(633911:2048:1): received re-establishment message from remote side
2020-06-09 22:17:12.148 [INF] DISC: Received new remote channel announcement for 633911:2048:1
2020-06-09 22:17:12.439 [INF] DISC: GossipSyncer(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844): fetching chan anns for 100 chans

``` 2020-06-09 10:07:15.247 [INF] PEER: disconnecting 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735, reason: Peer 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735 no answer for 5m0s -- disconnecting
2020-06-09 10:07:15.248 [INF] PEER: unable to read message from 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735: read tcp 127.0.0.1:50436->127.0.0.1:9050: use of closed network connection
2020-06-09 10:07:15.348 [INF] DISC: Removing GossipSyncer for peer=02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
2020-06-09 10:07:15.348 [INF] DISC: GossipSyncer(03eb4425070e2d1f3ce13c550e8e02307270039aca1a0603b9c56193923cf66324): applying gossipFilter(start=2020-06-09 10:07:15.3483808 -0400 EDT m=+4351.042190101, end=2156-07-16 15:35:30.3483808 -0500 EST m=+4294971646.042190101)
2020-06-09 10:07:16.945 [INF] SRVR: Established connection to: 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735
2020-06-09 10:07:16.946 [INF] SRVR: Finalizing connection to 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735, inbound=false
2020-06-09 10:07:17.205 [INF] PEER: NodeKey(02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844) loading ChannelPoint(53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a:1)
2020-06-09 10:07:17.206 [WRN] PEER: Unable to find our forwarding policy for channel 53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a:1, using default values
2020-06-09 10:07:17.207 [INF] SRVR: Negotiated chan series queries with 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
2020-06-09 10:07:17.207 [INF] DISC: Creating new GossipSyncer for peer=02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844
2020-06-09 10:07:18.379 [INF] CRTR: Processed channels=0 updates=12 nodes=0 in last 59.9983834s


2020-06-09 18:52:17.686 [INF] PEER: unable to read message from 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735: EOF
2020-06-09 18:52:17.688 [INF] PEER: disconnecting 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735, reason: read handler closed
```

So we ignore the announcement_signatures because we didn't receive (or didn't correctly handle) a channel_reestablish to make us go back to the wait_for_funding_locked state.

AnnouncementSignatures and ChannelReestablish in lnd are not synchronized as tightly as the current version of the spec suggests, it's not uncommon to send AnnouncementSignatures before ChannelReestablish. This behavior has existed as long as I can remember but only recently did this stricter ordering receive clarification in the spec, I presume Eclair had a way of handling this in the past? Curious if the processing might have changed on your end in this regard.

In Theseus_Log.txt I see

2020-06-09 18:52:17.686 [INF] PEER: unable to read message from 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735: EOF
2020-06-09 18:52:17.688 [INF] PEER: disconnecting 02e9046555a9665145b0dbd7f135744598418df7d61d3660659641886ef1274844@31.17.70.80:9735, reason: read handler closed

2020-06-09 18:52:20.172 [INF] HSWC: ChannelLink(633911:2048:1): attempting to re-resynchronize
2020-06-09 18:52:20.173 [INF] HSWC: ChannelLink(633911:2048:1): resending FundingLocked message to peer
2020-06-09 18:52:20.173 [INF] HSWC: ChannelLink(633911:2048:1): received re-establishment message from remote side
2020-06-09 18:52:20.633 [INF] PEER: Processing retransmitted FundingLocked for ChannelPoint(53418039d79fb0650d6d435b602d779c3e66d341070ab4f60a116310fed2cd4a:1
2020-06-09 18:52:20.866 [INF] DISC: Received new remote channel announcement for 633911:2048:1

So from LND's PoV we have

XX Disconnect XX

ChannelReestablish -->
FundingLocked -->
 <-- ChannelReestablish
 <-- FundingLocked
 <-- AnnouncementSignatures

These logs don't show when LND sends its AnnouncmentSignatures (probably on a different log level), but according to the Eclair logs it was received before the ChannelReestablish.

@STAWKEYE thanks for the logs! you can use single back ticks for inline, and

triple back ticks to do blocks

So from LND's PoV we have [...]

Can you clarify where you see that from the lnd logs?
I can't see where lnd says it's sending its channel_reestablish...and from what I understand of the eclair logs, eclair doesn't seem to receive it.

it's not uncommon to send AnnouncementSignatures before ChannelReestablish

I don't think this should be an issue. If AnnouncementSignatures is received before ChannelReestablish we simply ignore it, it will only delay the moment where the channel will be announced but shouldn't be blocking.

@viaj3ro @STAWKEYE this is really hard to debug with only a partial view of the messages exchanged. It would be really helpful if you could both share the logs for what happened before @viaj3ro manually disconnected/reconnected (which seemed to happen at 2020-06-10 00:52:15,521 UTC). What happens after this reconnection isn't very interesting because it's the part that worked correctly (the channel was back up and working after the manual reconnection).

@t-bast Here's the full logs up until the above mentioned time. Unfortunately my logs appear to be in local time (EST) and so your timestamp corresponds to 2020-06-09 20:52 EST if I did that correctly.
Theseus_Log_Full.txt

Can you clarify where you see that from the lnd logs?

It's a little tough to infer, the logs aren't showing PEER which shows the actual wire ordering, they're a bit higher level.

That said, this line is logged before sending channel reestablish:

2020-06-09 18:52:20.172 [INF] HSWC: ChannelLink(633911:2048:1): attempting to re-resynchronize

Also ignore my prior ordering diagram, just noticed that the logs on LND's side is actually out of order with the actual message sequence (we should fix that). We log that we received ChannelReestablish after deciding whether to retransmit FundingLocked, so actually the exchange looked like this, which seems correct apart from not showing when/if LND sent AnnouncementSignatures:

XX Disconnect XX

ChannelReestablish -->
 <-- ChannelReestablish
FundingLocked -->
 <-- FundingLocked
 <-- AnnouncementSignatures

I don't think this should be an issue. If AnnouncementSignatures is received before ChannelReestablish we simply ignore it, it will only delay the moment where the channel will be announced but shouldn't be blocking.

Cool, also considering that the above looks correct, i think we can rule out issues related to AnnouncementSignatures and focus on the initial, unbroken connection and/or prior reconnections.

@STAWKEYE are you able to upload a more complete section of logs to pastebin? I'm happy to grep through them for the relevant info. As @t-bast notes it'd be useful to have older logs, ideally as early as the channel funding (if that's possible).

EDIT: ChannelReestablish -> AnnouncementSignatures

@cfromknecht The prior linked logs (Theseus_Log_Full.txt) was full from the initial setup of LND up until 2020-06-10 00:52:15,521 UTC at which point @viaj3ro manually disconnected/reconnected (which seemed to happen at 2020-06-10 00:52:15,521 UTC).

Here is what I can understand from the lnd logs:

  • Funding initiated around 2020-06-09 09:07:32 EST by lnd
  • Disconnection around 2020-06-09 10:07:15 EST
  • Reconnection right after that
  • Unclear from the lnd logs if channel_reestablish was sent by lnd
  • Funding locked happened around 2020-06-09 14:43:29 EST, sent from lnd to eclair
  • Disconnection around 2020-06-09 18:52:17 EST (manual disconnect by @viaj3ro)
  • Reconnection right after that
  • Channel reestablish received by lnd
  • Funding locked re-sent from lnd to eclair
  • Unclear from the lnd logs if channel_reestablish was sent by lnd

One notable difference is that after the second disconnect/reconnect (which happened around 2020-06-09 18:52:17 EST) there is this log line in lnd that wasn't there for the first disconnect/reconnect:

2020-06-09 18:52:20.173 [INF] HSWC: ChannelLink(633911:2048:1): received re-establishment message from remote side

Eclair should have sent a channel_reestablish at that point, it would be interesting to verify that.
If @viaj3ro could post the logs from the eclair side it would be really helpful, we log all incoming/outgoing messages so it will be easy to check.

I'm traveling atm. Will try to get you the logs asap

@t-bast thanks the details, in grepping through the logs I came to the same conclusions and I think I've pinpointed the issue.

After disconnecting/reconnecting with a pending channel, LND doesn't immediately send ChannelReestablish. Instead it waits on another subsystem to process the FundingLocked message as if the connection had never been broken.

The interaction enters a message-level deadlock because LND is waiting for Eclair to send FundingLocked, but Eclair won't send FundingLocked until it receives a ChannelReestablish from LND. LND-to-LND channels never experience this issue because they always resend FundingLocked if they detect the channel in a pending state, and can handle either ordering of FundingLocked and ChannelReestablish.

The "simple" solution is for LND to move it's funding locked handling into the htlcswitch rather than the fundingmanager, which is a change I've been wanting to make for some time. In 0.10 we did a bit of refactoring in this area in order to prepare for the transition, so it is on the radar.

In practice though this refactor turns out to be somewhat difficult since fundingmanager coordinates two or three other independent subsystems in making a channel active. We also need to consider how to do make the change in backwards compatibility with older LND nodes, otherwise newer LND nodes will enter this same deadlock with older ones as they upgrade.

For some time this coordination logic will be need to be spread across two different locations, further complicated by differing subsystem lifetimes, locking precedence, etc. The amount of effort and testing required to make the change properly is relatively high compared to the gain, which is why it tends to sit pretty low on the list of priorities. That being said, it has been a part of my agenda to prioritize more incremental improvements in this area, but it's probably not going to happen overnight sadly.

Gotcha, thanks for the details! I totally get that refactoring these parts of the code always take more time than expected. This bug is in no way critical, a simple reconnection after the FundingLocked fixes it which is acceptable.

Thanks for all the help in investigating 馃

yeah, as long as the channel can get unstuck manually it's not really a big deal.
In the past I had channels where I wasn't able to get them unstuck. So far I wasn't able to reproduce this behavior though. Will keep an eye on it and report here, if it happens again.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

stevenroose picture stevenroose  路  3Comments

whizz picture whizz  路  3Comments

AnthonyRonning picture AnthonyRonning  路  3Comments

sunnya97 picture sunnya97  路  3Comments

pm47 picture pm47  路  3Comments