Lightning: Problem connecting after a new IP address

Created on 23 Oct 2020 · 14 Comments · Source: ElementsProject/lightning

Issue and Steps to Reproduce

Today I got a new IP address and after a restart my test lightning node wouldn't connect to my main lightning node anymore. Other external lightning nodes connected correctly (to my main lightning node).

So on my test lightning node I ran the following (NodeID is the ID of my main lightning node):

lightning-cli connect NodeID@NewIP:9735
{
   "code": 401,
   "message": "OldIP:9735: Connection establishment: Connection timed out. "
}

This doesn't seem right, it still points to my old IP address!

Removing the gossip_store didn't help.

Only after (at my test lightning node):

lightning-cli disconnect NodeID true

was I able to connect without a problem (with the above command lightning-cli connect NodeID@NewIP:9735)

This behavior doesn't seem right to me.

getinfo output

v0.9.1-80-g71381eb

(Note that getinfo gives the right new IP address)

All 14 comments

Is there any active channel? Which direction (from main to test or vice versa)?

@jsarenik Yes, the channel is active with send and receive both ways.

@sumBTC active both ways, I see. Which of the nodes initiated the channel? No matter how active or balanced the channel is, there is always one who started it.

@jsarenik Don't remember. My main node listpeers says direction: 0. My test node listpeers says direction: 1. That's probably what you want to know, I guess.

@sumBTC Yes. Thanks! Is the test node connected to any other node than just the main node?

@jsarenik no, it's only connected to the main node

Looking at channeld/channeld.c it seems to me that your main node initiated the channel. Though that is just my quick understanding and may be wrong. Feel free to correct me @anyone.

In the following text, A is the main node and B is the test node.

If A opens a channel to B, A reports channel_direction=0. Both cases are described below anyway, and I now think that no matter which node initiated the channel, both will try to reconnect (to the last-known address of the other).

When A changes address (and B has no other channel through which it could receive gossip about A's new address): B may keep trying to reach A at A's previous address, which fails because A has moved, and there are no other operating channels that would spread gossip of A's new node ID <-> address mapping. But A should also try to reach B, and that should work if B's address has not changed. @sumBTC has it?

When B changes address (and is not connected to any other node via another channel): there is no way for A to know B's new address. All the gossip would come through the channel with A, but that channel is not operating because A cannot connect to the address it used before. But B should try to reach A on its last-known address, and that should work. Through that connection B gossips about its new address.

In both cases, exactly what you did (the manual intervention through lightning-cli connect) is the best way to fix it quickly. But it should really be needed only if both A and B change their addresses at the same time. Did they?

That's not correct: the direction does not correspond to the funder/fundee direction. Instead it reflects the order in which the nodes are gossiped about, and it identifies which direction is meant when sending an update. direction will always be 0 on the lexicographically lower node_id, and 1 on the lexicographically higher node_id. In order to identify the funder you can look at the funding_msat field in lightning-cli listpeers. The funder will initially have all funds assigned to itself.
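The ordering rule just described can be sketched in Python. This is only an illustration, assuming node IDs are hex-encoded public keys that are compared byte by byte; the function name channel_direction and the two example IDs are made up here and are not part of lightning-cli.

```python
def channel_direction(own_node_id: str, peer_node_id: str) -> int:
    """Return 0 if our node_id sorts lower than the peer's, else 1.

    Hypothetical helper: it only mirrors the rule stated in the thread.
    """
    own = bytes.fromhex(own_node_id)
    peer = bytes.fromhex(peer_node_id)
    if own == peer:
        raise ValueError("a channel needs two distinct node IDs")
    # Python compares bytes objects lexicographically.
    return 0 if own < peer else 1


# Made-up 33-byte IDs, used only to exercise the comparison.
main_id = "02" + "11" * 32
test_id = "03" + "22" * 32

print(channel_direction(main_id, test_id))
print(channel_direction(test_id, main_id))
```

Note that the result depends only on the two IDs, so it says nothing about which node funded the channel.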

It doesn't really matter which node connects to which other node; the channel will be re-established either way, so looking into direction is a red herring here. Rather, it seems connectd prioritizes the previously known IP address over the one provided in the call to connect (and maybe prioritizes gossip messages over explicitly stated IP addresses). We need to invert that ordering and prefer manually stated addresses over the ones from gossipd and the DB.
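The proposed ordering could look roughly like the following Python sketch. It is not the connectd code: the function name, the three input lists, and the addresses (taken from the documentation IP ranges) are all assumptions made for illustration.

```python
from typing import Iterable, List


def order_candidate_addresses(
    explicit: Iterable[str],
    from_gossip: Iterable[str],
    from_db: Iterable[str],
) -> List[str]:
    """Order addresses to try: explicit first, then gossip, then the DB.

    Hypothetical helper; duplicates keep their earliest (highest-priority)
    position.
    """
    ordered: List[str] = []
    for addr in [*explicit, *from_gossip, *from_db]:
        if addr not in ordered:
            ordered.append(addr)
    return ordered


candidates = order_candidate_addresses(
    explicit=["203.0.113.7:9735"],      # stands in for NewIP from the connect call
    from_gossip=["198.51.100.4:9735"],  # stands in for the stale OldIP from gossip
    from_db=["198.51.100.4:9735"],      # the same stale address stored locally
)
print(candidates)
```

With this ordering the stale address is still tried, but only after the one the user typed.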

@cdecker I have to admit I didn't wait long enough for the gossip_store to rebuild. Would that have solved it in the long run?

@jsarenik Both had the same IP address change (they are on the same server). I now think I should have waited for the gossip_store to rebuild.

Eventually the nodes would have found each other, once they published a node_announcement and synced with the gossip. However, in this case, since you explicitly provided an IP address, we should be using that, with gossip as a fallback.

@sumBTC any resolution on this?

@jsarenik I dunno. I just report issues and the c-lightning devs have to solve them while I'm sipping mai tais on the beach :-)

The issue is that we're already in the process of connecting to that peer.

https://github.com/ElementsProject/lightning/blob/fa1483a00d86303862ca76559d6c8b6837de6a19/connectd/connectd.c#L1459-L1461

This is why disconnect then connect works, it stops the in-process connection attempt.
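A toy model in Python may make that behaviour easier to picture. It is a deliberately simplified sketch and not how connectd is implemented: the class ToyConnector, its methods, and the peer and address strings are all hypothetical.

```python
class ToyConnector:
    """Toy model: at most one in-progress connection attempt per peer."""

    def __init__(self) -> None:
        # node_id -> address the in-progress attempt is using
        self.connecting = {}

    def connect(self, node_id: str, address: str) -> str:
        """Return the address that will actually be tried."""
        if node_id in self.connecting:
            # An attempt is already running, so the new address is ignored.
            return self.connecting[node_id]
        self.connecting[node_id] = address
        return address

    def disconnect(self, node_id: str) -> None:
        """Drop any in-progress attempt for this peer."""
        self.connecting.pop(node_id, None)


c = ToyConnector()
c.connect("NodeID", "OldIP:9735")         # automatic reconnect after restart
print(c.connect("NodeID", "NewIP:9735"))  # still reports the old address
c.disconnect("NodeID")
print(c.connect("NodeID", "NewIP:9735"))  # now the new address is used
```

In this model the explicit connect only takes effect once the pending attempt has been cleared, which matches the disconnect-then-connect workaround reported above.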

I've got an attempted fix for this, working on getting a test for it (unfortunately not the easiest thing to test!)
