xud is unable to shutdown gracefully

Created on 16 Jun 2020  ยท  43Comments  ยท  Source: ExchangeUnion/xud

Xud logs from the start of the application that resulted in memory leaks.
xud-not-able-to-gracefully-shutdown.log

I suspect it's because after swap failure/timeout we're forgetting about cleaning up some listeners.

P1 bug

Most helpful comment

I would like to take a look at this if that's okay for you guys?

All 43 comments

I also noticed that in our security tests Alice is unable to to shutdown gracefully:

=== RUN   TestSecurity
2020/06/17 07:49:42 xud: creating network
2020/06/17 07:49:45 ltcd: launching node...
2020/06/17 07:49:45 ltcd: 200 blocks generated
2020/06/17 07:49:45 lnd-ltc: launching network...
2020/06/17 07:49:53 btcd: launching node...
2020/06/17 07:49:54 btcd: 200 blocks generated
2020/06/17 07:49:54 lnd-btc: launching network...
2020/06/17 07:49:59 xud: launching network...
2020/06/17 07:50:01 Running 4 security tests
=== RUN   TestSecurity/network_initialization
=== RUN   TestSecurity/taker_stalling_on_swapAccepted
&{0xc0006fb260 <nil>} is<nil> e.errorder_id:"284e73f0-b06f-11ea-9dc1-4b0fea2b3766" pair_id:"LTC/BTC" quantity:1000000 peer_pub_key:"02c72be41e310427b60e726d10e9704b252c089da9cde76350888c5c8db696ff29" failure_reason:"DealTimedOut"  swapFailure=== RUN   TestSecurity/maker_stalling_after_1st_htlc
=== RUN   TestSecurity/taker_stalling_after_swap_succeeded
2020/06/17 07:50:58 lnd-btc: network harness teared down
2020/06/17 07:50:58 btcd: harness teared down
2020/06/17 07:50:58 lnd-ltc: network harness teared down
2020/06/17 07:50:58 ltcd: harness teared down
2020/06/17 07:50:58 xud network harness teared down
--- PASS: TestSecurity (76.47s)
    --- PASS: TestSecurity/network_initialization (0.13s)
    --- PASS: TestSecurity/taker_stalling_on_swapAccepted (11.21s)
    --- PASS: TestSecurity/maker_stalling_after_1st_htlc (36.26s)
    --- PASS: TestSecurity/taker_stalling_after_swap_succeeded (1.58s)
=== RUN   TestSecurityUnsettledChannels
2020/06/17 07:50:58 xud: creating network
2020/06/17 07:50:58 ltcd: launching node...
2020/06/17 07:50:59 ltcd: 200 blocks generated
2020/06/17 07:50:59 lnd-ltc: launching network...
2020/06/17 07:51:07 btcd: launching node...
2020/06/17 07:51:07 btcd: 200 blocks generated
2020/06/17 07:51:07 lnd-btc: launching network...
2020/06/17 07:51:13 xud: launching network...
2020/06/17 07:51:15 Running 4 unsettled-channels security tests
=== RUN   TestSecurityUnsettledChannels/network_initialization
=== RUN   TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc
=== RUN   TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc
=== RUN   TestSecurityUnsettledChannels/taker_shutdown_after_2nd_htlc
2020/06/17 07:56:04 xud process (13-Alice) did not shutdown gracefully. process (6759) killed
2020/06/17 07:58:56 lnd-btc: network harness teared down
2020/06/17 07:58:56 btcd: harness teared down
2020/06/17 07:58:56 lnd-ltc: network harness teared down
2020/06/17 07:58:56 ltcd: harness teared down
2020/06/17 07:58:56 xud network harness teared down
--- PASS: TestSecurityUnsettledChannels (478.01s)
    --- PASS: TestSecurityUnsettledChannels/network_initialization (0.14s)
    --- PASS: TestSecurityUnsettledChannels/maker_shutdown_after_1st_htlc (108.08s)
    --- PASS: TestSecurityUnsettledChannels/taker_stalling_after_2nd_htlc (170.64s)
    --- PASS: TestSecurityUnsettledChannels/taker_shutdown_after_2nd_htlc (180.98s)
PASS
ok      github.com/ExchangeUnion/xud-simulation 554.514s

In another non-graceful shutdown (which was eventually killed by the 3 min docker timeout) I observed xud continuing to connect to peers:

17/06/2020 08:31:44.886 [GLOBAL] info: XUD is shutting down
17/06/2020 08:31:44.888 [LND-BTC] info: new status: Disconnected
17/06/2020 08:31:44.889 [LND-LTC] info: new status: Disconnected
17/06/2020 08:31:44.890 [CONNEXT] info: new status: Disconnected
17/06/2020 08:31:44.893 [HTTP] info: http server has closed
17/06/2020 08:31:44.895 [P2P] debug: Peer 025443023c0303eb5671522cda0d49acb0e74c77f1371bf86262bad637835eb9e4 (LaundryBrush): closing socket. reason: Shutdown
17/06/2020 08:31:44.897 [LND-BTC] error: failed to update total outbound capacity: Error: 2 UNKNOWN: Channel Destroyed
    at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
    at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1209:28)
    at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/app/node_modules/grpc/src/client_interceptors.js:847:24)
17/06/2020 08:31:44.898 [LND-LTC] error: failed to update total outbound capacity: Error: 2 UNKNOWN: Channel Destroyed
    at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:91:15)
    at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1209:28)
    at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:568:42)
    at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:618:8)
    at callback (/app/node_modules/grpc/src/client_interceptors.js:847:24)
17/06/2020 08:31:44.899 [RPC] info: gRPC server completed shutdown
17/06/2020 08:31:44.901 [ORDERBOOK] debug: removed all orders for peer 025443023c0303eb5671522cda0d49acb0e74c77f1371bf86262bad637835eb9e4 (LaundryBrush)
17/06/2020 08:31:44.905 [GLOBAL] info: XUD shutdown gracefully
17/06/2020 08:31:44.906 [P2P] info: Peer 025443023c0303eb5671522cda0d49acb0e74c77f1371bf86262bad637835eb9e4 (LaundryBrush) socket closed
17/06/2020 08:31:45.060 [P2P] debug: Connection attempt #1 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - Failure. retrying in 5 sec...
17/06/2020 08:31:50.318 [P2P] debug: Connection attempt #2 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - Failure. retrying in 10 sec...
17/06/2020 08:32:00.591 [P2P] debug: Connection attempt #3 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - Failure. retrying in 20 sec...
17/06/2020 08:32:23.865 [P2P] debug: Connection attempt #4 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 40 sec...
17/06/2020 08:33:04.363 [P2P] debug: Connection attempt #5 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 80 sec...
17/06/2020 08:34:24.706 [P2P] debug: Connection attempt #6 to 032f570ca0fa0f94564fa7d911f185b509357cbc06258946900eb107e78aaab251@evnyc7w6lyqlh4735g2yssvw4pw4rl75ubtphpyf3mwx2zhshd57p4id.onion:28885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 160 sec...

Saw the same today:

=== RUN   TestSecurityUnsettledChannels/taker_shutdown_after_2nd_htlc
2020/06/17 10:29:53 xud process (25-Alice) did not shutdown gracefully. process (6333) killed

that defitely increase travis runtime.

can be easily reproducible with combined setup and buy ETH/BTC trade.

  1. perform combine setup.
  2. send buy ETH/BTC order -> failed swap (because of current connext client behaviour)
  3. shutdown xud -> fail

I would like to take a look at this if that's okay for you guys?

I tried a few scenarios (simnet stop/start during when xud working, don't unlock, don't wait fully initialization) but I couldn't reproduce this, could you please step by step explain how to reproduce?

โžœ  xud git:(master) โœ— bin/xud
16/08/2020 08:55:14.253 [GLOBAL] info: config file loaded
16/08/2020 08:55:14.258 [GLOBAL] info: CUSTOM-XUD
16/08/2020 08:55:14.274 [RPC] info: gRPC server listening on localhost:28886
16/08/2020 08:55:14.370 [DB] info: connected to database /Users/sercanozdemir/.xud/xud-simnet.db
16/08/2020 08:55:14.395 [LND-LTC] debug: loaded tls cert from /Users/sercanozdemir/lndltc/tls.cert
16/08/2020 08:55:14.396 [LND-BTC] debug: loaded tls cert from /Users/sercanozdemir/lndbtc/tls.cert
16/08/2020 08:55:14.399 [LND-LTC] debug: loaded macaroon from /Users/sercanozdemir/lndltc/data/chain/litecoin/simnet/admin.macaroon
16/08/2020 08:55:14.399 [LND-LTC] info: new status: Initialized
16/08/2020 08:55:14.401 [LND-LTC] info: trying to verify connection to lnd at localhost:31009
16/08/2020 08:55:14.403 [LND-BTC] debug: loaded macaroon from /Users/sercanozdemir/lndbtc/data/chain/bitcoin/simnet/admin.macaroon
16/08/2020 08:55:14.404 [LND-BTC] info: new status: Initialized
16/08/2020 08:55:14.404 [LND-BTC] info: trying to verify connection to lnd at localhost:30009
16/08/2020 08:55:14.433 [LND-BTC] debug: pubkey is 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2
16/08/2020 08:55:14.433 [LND-BTC] debug: uris are 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2@hknsjxzibu53ecdstnjn4mqexv7agucdktm4ddra2quqi7sa2jqqv2yd.onion:29735
16/08/2020 08:55:14.434 [LND-BTC] debug: chain is bitcoin-simnet
16/08/2020 08:55:14.437 [LND-LTC] debug: pubkey is 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2
16/08/2020 08:55:14.438 [LND-LTC] debug: uris are 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2@ymjxg7hvxjy5beaq4hdr2242uzzwnjchzryoklyzdqgf6k4ayatqzhad.onion:30735
16/08/2020 08:55:14.438 [LND-LTC] debug: chain is litecoin-simnet
16/08/2020 08:55:14.449 [LND-LTC] debug: new channel maximum outbound capacity: 490000000
16/08/2020 08:55:14.449 [LND-LTC] debug: new channel inbound capacity: 489963800
16/08/2020 08:55:14.450 [LND-LTC] debug: new channel total outbound capacity: 490000000
16/08/2020 08:55:14.450 [LND-LTC] info: new status: ConnectionVerified
16/08/2020 08:55:14.451 [LND-BTC] debug: new channel maximum outbound capacity: 245000000
16/08/2020 08:55:14.452 [LND-BTC] debug: new channel inbound capacity: 244990950
16/08/2020 08:55:14.452 [LND-BTC] debug: new channel total outbound capacity: 245000000
16/08/2020 08:55:14.452 [LND-BTC] info: new status: ConnectionVerified
16/08/2020 08:55:14.453 [GLOBAL] info: Node key is encrypted, unlock with 'xucli unlock', 'xucli create', or 'xucli restore'
^C16/08/2020 08:55:18.775 [GLOBAL] info: XUD is shutting down
16/08/2020 08:55:18.775 [LND-BTC] info: new status: Disconnected
16/08/2020 08:55:18.776 [LND-LTC] info: new status: Disconnected
16/08/2020 08:55:18.776 [CONNEXT] error: cannot set status to Disconnected from NotInitialized
16/08/2020 08:55:18.777 [CONNEXT] error: cannot set status to Disconnected from NotInitialized
16/08/2020 08:55:18.777 [RPC] info: gRPC server completed shutdown
16/08/2020 08:55:18.778 [GLOBAL] info: XUD shutdown gracefully
โžœ  xud git:(master) โœ— 

I tried a few scenarios (simnet stop/start during when xud working, don't unlock, don't wait fully initialization) but I couldn't reproduce this, could you please step by step explain how to reproduce?

โžœ  xud git:(master) โœ— bin/xud
16/08/2020 08:55:14.253 [GLOBAL] info: config file loaded
16/08/2020 08:55:14.258 [GLOBAL] info: CUSTOM-XUD
16/08/2020 08:55:14.274 [RPC] info: gRPC server listening on localhost:28886
16/08/2020 08:55:14.370 [DB] info: connected to database /Users/sercanozdemir/.xud/xud-simnet.db
16/08/2020 08:55:14.395 [LND-LTC] debug: loaded tls cert from /Users/sercanozdemir/lndltc/tls.cert
16/08/2020 08:55:14.396 [LND-BTC] debug: loaded tls cert from /Users/sercanozdemir/lndbtc/tls.cert
16/08/2020 08:55:14.399 [LND-LTC] debug: loaded macaroon from /Users/sercanozdemir/lndltc/data/chain/litecoin/simnet/admin.macaroon
16/08/2020 08:55:14.399 [LND-LTC] info: new status: Initialized
16/08/2020 08:55:14.401 [LND-LTC] info: trying to verify connection to lnd at localhost:31009
16/08/2020 08:55:14.403 [LND-BTC] debug: loaded macaroon from /Users/sercanozdemir/lndbtc/data/chain/bitcoin/simnet/admin.macaroon
16/08/2020 08:55:14.404 [LND-BTC] info: new status: Initialized
16/08/2020 08:55:14.404 [LND-BTC] info: trying to verify connection to lnd at localhost:30009
16/08/2020 08:55:14.433 [LND-BTC] debug: pubkey is 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2
16/08/2020 08:55:14.433 [LND-BTC] debug: uris are 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2@hknsjxzibu53ecdstnjn4mqexv7agucdktm4ddra2quqi7sa2jqqv2yd.onion:29735
16/08/2020 08:55:14.434 [LND-BTC] debug: chain is bitcoin-simnet
16/08/2020 08:55:14.437 [LND-LTC] debug: pubkey is 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2
16/08/2020 08:55:14.438 [LND-LTC] debug: uris are 0316725962eaea39eb70286eb0597f626af28091dff8944ec81b5dd23e8b2f55f2@ymjxg7hvxjy5beaq4hdr2242uzzwnjchzryoklyzdqgf6k4ayatqzhad.onion:30735
16/08/2020 08:55:14.438 [LND-LTC] debug: chain is litecoin-simnet
16/08/2020 08:55:14.449 [LND-LTC] debug: new channel maximum outbound capacity: 490000000
16/08/2020 08:55:14.449 [LND-LTC] debug: new channel inbound capacity: 489963800
16/08/2020 08:55:14.450 [LND-LTC] debug: new channel total outbound capacity: 490000000
16/08/2020 08:55:14.450 [LND-LTC] info: new status: ConnectionVerified
16/08/2020 08:55:14.451 [LND-BTC] debug: new channel maximum outbound capacity: 245000000
16/08/2020 08:55:14.452 [LND-BTC] debug: new channel inbound capacity: 244990950
16/08/2020 08:55:14.452 [LND-BTC] debug: new channel total outbound capacity: 245000000
16/08/2020 08:55:14.452 [LND-BTC] info: new status: ConnectionVerified
16/08/2020 08:55:14.453 [GLOBAL] info: Node key is encrypted, unlock with 'xucli unlock', 'xucli create', or 'xucli restore'
^C16/08/2020 08:55:18.775 [GLOBAL] info: XUD is shutting down
16/08/2020 08:55:18.775 [LND-BTC] info: new status: Disconnected
16/08/2020 08:55:18.776 [LND-LTC] info: new status: Disconnected
16/08/2020 08:55:18.776 [CONNEXT] error: cannot set status to Disconnected from NotInitialized
16/08/2020 08:55:18.777 [CONNEXT] error: cannot set status to Disconnected from NotInitialized
16/08/2020 08:55:18.777 [RPC] info: gRPC server completed shutdown
16/08/2020 08:55:18.778 [GLOBAL] info: XUD shutdown gracefully
โžœ  xud git:(master) โœ— 

Try a swap that fails/times out.

Okay, at the first glance it looks like as an grpc client lndclient hangs when a swap fails/times out or if we shutdown xud during a swap (as soon as swap starts for example). Xud hangs in these situations because of a hanged up socket (with ESTABLISHED statement).

At first I tried to find a way to listen grpc status and implement a complicated logic to shutdown gracefully, but in the end we're already shutting down gracefully only problem is a TCP socket with ESTABLISHED statement, and nobody actually listens it.

So I suggest a very simple solution, can we call process.exit(); at the end of our shutdown method? Tried and worked like a charm for me! Let me know for bad sides if there's any that I couldn't think of.

At first I tried to find a way to listen grpc status and implement a complicated logic to shutdown gracefully, but in the end we're already shutting down gracefully only problem is a TCP socket with ESTABLISHED statement, and nobody actually listens it.

Can you link to the code you believe is preventing the shutdown? Perhaps we can clean up the listener?

At first I tried to find a way to listen grpc status and implement a complicated logic to shutdown gracefully, but in the end we're already shutting down gracefully only problem is a TCP socket with ESTABLISHED statement, and nobody actually listens it.

Can you link to the code you believe is preventing the shutdown? Perhaps we can clean up the listener?

This actually happens on underlying socket, so when we call this https://github.com/ExchangeUnion/xud/blob/2ff63b9f4a3f7d69494c4ade5fdc1ac7ee2cd239/lib/lndclient/LndClient.ts#L210
we think everything is okay, actually everything must be okay, but somehow GRPC fails to switch outgoing TCP port to CLOSED and it hangs on ESTABLISHED if there's a failed swap or an ongoing swap.

You can try yourself by taking xud stuck statement and checking which port is on use by node. For mac;

sudo lsof -i -n -P | grep TCP

And you'll see that there's a random port is ESTABLISHED with 30009 (in my case with lndbtc) As soon as you stop lndbtc in this case, XUD shutdown too

What do you guys think?

At first I tried to find a way to listen grpc status and implement a complicated logic to shutdown gracefully, but in the end we're already shutting down gracefully only problem is a TCP socket with ESTABLISHED statement, and nobody actually listens it.

Can you link to the code you believe is preventing the shutdown? Perhaps we can clean up the listener?

This actually happens on underlying socket, so when we call this

https://github.com/ExchangeUnion/xud/blob/2ff63b9f4a3f7d69494c4ade5fdc1ac7ee2cd239/lib/lndclient/LndClient.ts#L210

we think everything is okay, actually everything must be okay, but somehow GRPC fails to switch outgoing TCP port to CLOSED and it hangs on ESTABLISHED if there's a failed swap or an ongoing swap.

You can try yourself by taking xud stuck statement and checking which port is on use by node. For mac;

sudo lsof -i -n -P | grep TCP

And you'll see that there's a random port is ESTABLISHED with 30009 (in my case with lndbtc) As soon as you stop lndbtc in this case, XUD shutdown too

Could you give me a list of steps to reproduce this? I tried a failing swap on current master branch and to my surprise xud was able to shut down gracefully.

I have a DO setup xud which I changed nothing but set it up from xud.sh script with simnet. Then locally started my xud with current master branch and put some buy orders from DO then I tried to match it with local xud and then as soon as I put a matched order (which starts a swap) I stopped my local XUD as soon as I start swap with it, then it failed to shutdown. Afterwards I checked the ports as I described

I have a DO setup xud which I changed nothing but set it up from xud.sh script with simnet. Then locally started my xud with current master branch and put some buy orders from DO then I tried to match it with local xud and then as soon as I put a matched order (which starts a swap) I stopped my local XUD as soon as I start swap with it, then it failed to shutdown. Afterwards I checked the ports as I described

Hmm, in this scenario I'm not sure if xud should even allow to start a graceful shutdown process since we're in the middle of a swap deal. Thoughts @sangaman?

The original issue was that xud was unable to shutdown gracefully AFTER a failed swap, but I'm no longer able to reproduce it.

I was unable to fail a swap too tho. But probably in the case you mentioned same thing was the problem

I just encountered this again with a SwapTimeOut issue, so I still say process exit would fix this.

image

I just encountered this again with a SwapTimeOut issue, so I still say process exit would fix this.

image

I'd prefer to look into cleaning up the listener first before going with the force process exit solution. It's still a potential memory leak that could accumulate over time.

I just encountered this again with a SwapTimeOut issue, so I still say process exit would fix this.
image

I'd prefer to look into cleaning up the listener first before going with the force process exit solution. It's still a potential memory leak that could accumulate over time.

But socket connection is getting closed as soon as we call process.exit, isn't it?

What Karl means is if we dont clean up listeners in normal operations that could leak memory over time and become a problem. Since this should be the root cause for the remaining ungraceful shutdowns, I would also prefer searching for and fixing up these.

xud not shutting down while swapping makes sense I think. We can check what lnd does as reference @rsercano @sangaman

I have a DO setup xud which I changed nothing but set it up from xud.sh script with simnet. Then locally started my xud with current master branch and put some buy orders from DO then I tried to match it with local xud and then as soon as I put a matched order (which starts a swap) I stopped my local XUD as soon as I start swap with it, then it failed to shutdown. Afterwards I checked the ports as I described

Hmm, in this scenario I'm not sure if xud should even allow to start a graceful shutdown process since we're in the middle of a swap deal. Thoughts @sangaman?

The original issue was that xud was unable to shutdown gracefully AFTER a failed swap, but I'm no longer able to reproduce it.

I think it makes sense not to allow a graceful shutdown when we're mid-swap. We want to wait for a swap resolution at least.

I also don't think we should be resorting to process.exit - at that point we're just doing a forced exit anyhow.

As things relate to swap timeouts, I wonder if we are not exiting gracefully because we are stuck waiting for a response from a SendPayment call to lnd or connext. In which case, again, we wouldn't want to exit, because we want to know what happens with that payment. If it goes through and we are the maker, we need to be ready to claim the payment coming to us.

If we've confirmed that the send payment call has failed for good, but xud is still not shutting down gracefully, that would merit a closer look.

If we've confirmed that the send payment call has failed for good, but xud is still not shutting down gracefully, that would merit a closer look.

In my case, this was the problem, after a swap timeout we still had open TCP port to lnd.

image

I'm not sure how to handle this, since GRPC only gives a close method (which we already call) and not sure if there's an event for this. I'll try to find a way tho, but still in this case we couldn't be sure why the port is open, since there're two possibilities; swap timeout or a swap is ongoing?

Good finding! Ideas what to do? @sangaman

If we've confirmed that the send payment call has failed for good, but xud is still not shutting down gracefully, that would merit a closer look.

In my case, this was the problem, after a swap timeout we still had open TCP port to lnd.

image

I'm not sure how to handle this, since GRPC only gives a close method (which we already call) and not sure if there's an event for this. I'll try to find a way tho, but still in this case we couldn't be sure why the port is open, since there're two possibilities; swap timeout or a swap is ongoing?

Can you provide logs from this? The reason for the timeout matters a lot here. If the swap times out while we are in the middle of a call to send payment or settle an invoice, we would want to keep the socket alive until we get a response.

I used steps from here https://github.com/ExchangeUnion/xud/issues/1637#issuecomment-657485019 and got SwapTimeout before second buy, at first buy. Here're the logs;

097272\",\"reqId\":\"312ecfc1-e2b9-11ea-844d-85fa3224e080\"}}"
20/08/2020 10:46:03.974 [CONNEXT] trace: sending request to /balance/0x0000000000000000000000000000000000000000
20/08/2020 10:46:03.974 [CONNEXT] trace: sending request to /balance/0x69C3d485623bA3f382Fc0FB6756c4574d43C1618
20/08/2020 10:46:06.973 [CONNEXT] trace: sending request to /balance/0x0000000000000000000000000000000000000000
20/08/2020 10:46:06.974 [CONNEXT] trace: sending request to /balance/0x69C3d485623bA3f382Fc0FB6756c4574d43C1618
20/08/2020 10:46:08.091 [SWAPS] trace: failing deal afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647 in state Active & phase SendingPayment due to SwapTimedOut
20/08/2020 10:46:08.092 [SWAPS] debug: deal afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647 failed due to SwapTimedOut
20/08/2020 10:46:08.093 [CONNEXT] warn: could not find paymentId for incoming transfer with hash afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647
20/08/2020 10:46:08.093 [SWAPS] trace: emitting swap.failed event for afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647
20/08/2020 10:46:08.093 [SWAPS] debug: Sending SwapTimedOut error to peer: {"rHash":"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647","failureReason":10}
20/08/2020 10:46:08.094 [ORDERBOOK] error: swap between orders f037f3c0-e2b8-11ea-92e4-91ba11097272 & ff9f9610-e2b8-11ea-844d-85fa3224e080 failed due to SwapTimedOut
20/08/2020 10:46:08.094 [ORDERBOOK] warn: swap for 200000000 failed during order matching due to SwapTimedOut, will repeat matching routine for failed quantity
20/08/2020 10:46:08.095 [ORDERBOOK] debug: repeating matching routine for ff9f9610-e2b8-11ea-844d-85fa3224e080 for failed quantity of 200000000
20/08/2020 10:46:08.095 [ORDERBOOK] debug: placeOrder max time exceeded. order ({"pairId":"ETH/BTC","price":null,"quantity":200000000,"isBuy":true,"localId":"ff9f9610-e2b8-11ea-844d-85fa3224e080","id":"ff9f9610-e2b8-11ea-844d-85fa3224e080","initialQuantity":200000000,"hold":0,"createdAt":1597909477873}) won't be fully matched
20/08/2020 10:46:08.096 [DB] trace: Executing (default): SELECT `id`, `nodePubKey`, `addressesText`, `lastAddressText`, `banned`, `createdAt`, `updatedAt` FROM `nodes` AS `Node` WHERE `Node`.`nodePubKey` = '03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095';
20/08/2020 10:46:08.096 [RPC] trace: call /xudrpc.Xud/PlaceOrder succeeded
20/08/2020 10:46:08.097 [P2P] trace: Sent SwapFailed packet to 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): "{\"body\":{\"failureReason\":10,\"rHash\":\"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647\"},\"header\":{\"id\":\"356610d0-e2b9-11ea-844d-85fa3224e080\"}}"
20/08/2020 10:46:08.102 [DB] trace: Executing (default): INSERT INTO `swapdeals` (`rHash`,`role`,`state`,`phase`,`failureReason`,`rPreimage`,`nodeId`,`orderId`,`localId`,`proposedQuantity`,`quantity`,`takerAmount`,`takerCurrency`,`takerCltvDelta`,`makerCltvDelta`,`makerAmount`,`makerCurrency`,`createTime`,`executeTime`,`completeTime`) VALUES ('afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647',0,1,3,10,'253f095ab871315614dc45a1574b78e8585a16ea122b29fe56a8620ff09a31da',4,'f037f3c0-e2b8-11ea-92e4-91ba11097272','ff9f9610-e2b8-11ea-844d-85fa3224e080',200000000,200000000,200000000,'ETH',200,62,2000000,'BTC',1597909477907,1597909478090,1597909568092) ON CONFLICT (`rHash`) DO UPDATE SET `takerCltvDelta`=EXCLUDED.`takerCltvDelta`,`rHash`=EXCLUDED.`rHash`,`orderId`=EXCLUDED.`orderId`,`proposedQuantity`=EXCLUDED.`proposedQuantity`,`rPreimage`=EXCLUDED.`rPreimage`,`takerCurrency`=EXCLUDED.`takerCurrency`,`makerCurrency`=EXCLUDED.`makerCurrency`,`takerAmount`=EXCLUDED.`takerAmount`,`makerAmount`=EXCLUDED.`makerAmount`,`localId`=EXCLUDED.`localId`,`phase`=EXCLUDED.`phase`,`state`=EXCLUDED.`state`,`role`=EXCLUDED.`role`,`createTime`=EXCLUDED.`createTime`,`makerCltvDelta`=EXCLUDED.`makerCltvDelta`,`quantity`=EXCLUDED.`quantity`,`executeTime`=EXCLUDED.`executeTime`,`completeTime`=EXCLUDED.`completeTime`,`failureReason`=EXCLUDED.`failureReason`,`nodeId`=EXCLUDED.`nodeId`;
20/08/2020 10:46:09.975 [CONNEXT] trace: sending request to /balance/0x0000000000000000000000000000000000000000
20/08/2020 10:46:09.976 [CONNEXT] trace: sending request to /balance/0x69C3d485623bA3f382Fc0FB6756c4574d43C1618
20/08/2020 10:46:12.975 [CONNEXT] trace: sending request to /balance/0x0000000000000000000000000000000000000000
20/08/2020 10:46:12.976 [CONNEXT] trace: sending request to /balance/0x69C3d485623bA3f382Fc0FB6756c4574d43C1618
20/08/2020 10:46:13.073 [P2P] trace: Received SwapFailed packet from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): "{\"body\":{\"failureReason\":10,\"rHash\":\"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647\"},\"header\":{\"id\":\"3858c3f0-e2b9-11ea-92e4-91ba11097272\"}}"
20/08/2020 10:46:13.073 [P2P] debug: received swapFailed due to SwapTimedOut from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): {"rHash":"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647","failureReason":10}
20/08/2020 10:46:13.078 [DB] trace: Executing (default): SELECT `SwapDeal`.`rHash`, `SwapDeal`.`role`, `SwapDeal`.`state`, `SwapDeal`.`phase`, `SwapDeal`.`failureReason`, `SwapDeal`.`errorMessage`, `SwapDeal`.`rPreimage`, `SwapDeal`.`nodeId`, `SwapDeal`.`orderId`, `SwapDeal`.`localId`, `SwapDeal`.`proposedQuantity`, `SwapDeal`.`quantity`, `SwapDeal`.`takerAmount`, `SwapDeal`.`takerCurrency`, `SwapDeal`.`takerPubKey`, `SwapDeal`.`takerCltvDelta`, `SwapDeal`.`makerCltvDelta`, `SwapDeal`.`makerAmount`, `SwapDeal`.`makerCurrency`, `SwapDeal`.`createTime`, `SwapDeal`.`executeTime`, `SwapDeal`.`completeTime`, `Node`.`id` AS `Node.id`, `Node`.`nodePubKey` AS `Node.nodePubKey`, `Node`.`addressesText` AS `Node.addressesText`, `Node`.`lastAddressText` AS `Node.lastAddressText`, `Node`.`banned` AS `Node.banned`, `Node`.`createdAt` AS `Node.createdAt`, `Node`.`updatedAt` AS `Node.updatedAt`, `Order`.`id` AS `Order.id`, `Order`.`nodeId` AS `Order.nodeId`, `Order`.`localId` AS `Order.localId`, `Order`.`initialQuantity` AS `Order.initialQuantity`, `Order`.`pairId` AS `Order.pairId`, `Order`.`price` AS `Order.price`, `Order`.`isBuy` AS `Order.isBuy`, `Order`.`createdAt` AS `Order.createdAt` FROM `swapdeals` AS `SwapDeal` LEFT OUTER JOIN `nodes` AS `Node` ON `SwapDeal`.`nodeId` = `Node`.`id` LEFT OUTER JOIN `orders` AS `Order` ON `SwapDeal`.`orderId` = `Order`.`id` WHERE `SwapDeal`.`rHash` = 'afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647';
20/08/2020 10:46:13.082 [SWAPS] warn: received unexpected swap failed packet for deal with payment hash afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647
20/08/2020 10:46:15.976 [CONNEXT] trace: sending request to /balance/0x0000000000000000000000000000000000000000
20/08/2020 10:46:15.976 [CONNEXT] trace: sending request to /balance/0x69C3d485623bA3f382Fc0FB6756c4574d43C1618
20/08/2020 10:46:17.884 [P2P] trace: Received SwapFailed packet from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): "{\"body\":{\"errorMessage\":\"Timeout has occurred\",\"failureReason\":12,\"rHash\":\"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647\"},\"header\":{\"id\":\"3b36dda0-e2b9-11ea-92e4-91ba11097272\"}}"
20/08/2020 10:46:17.885 [P2P] debug: received swapFailed due to UnknownError from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): {"rHash":"afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647","errorMessage":"Timeout has occurred","failureReason":12}
20/08/2020 10:46:17.887 [DB] trace: Executing (default): SELECT `SwapDeal`.`rHash`, `SwapDeal`.`role`, `SwapDeal`.`state`, `SwapDeal`.`phase`, `SwapDeal`.`failureReason`, `SwapDeal`.`errorMessage`, `SwapDeal`.`rPreimage`, `SwapDeal`.`nodeId`, `SwapDeal`.`orderId`, `SwapDeal`.`localId`, `SwapDeal`.`proposedQuantity`, `SwapDeal`.`quantity`, `SwapDeal`.`takerAmount`, `SwapDeal`.`takerCurrency`, `SwapDeal`.`takerPubKey`, `SwapDeal`.`takerCltvDelta`, `SwapDeal`.`makerCltvDelta`, `SwapDeal`.`makerAmount`, `SwapDeal`.`makerCurrency`, `SwapDeal`.`createTime`, `SwapDeal`.`executeTime`, `SwapDeal`.`completeTime`, `Node`.`id` AS `Node.id`, `Node`.`nodePubKey` AS `Node.nodePubKey`, `Node`.`addressesText` AS `Node.addressesText`, `Node`.`lastAddressText` AS `Node.lastAddressText`, `Node`.`banned` AS `Node.banned`, `Node`.`createdAt` AS `Node.createdAt`, `Node`.`updatedAt` AS `Node.updatedAt`, `Order`.`id` AS `Order.id`, `Order`.`nodeId` AS `Order.nodeId`, `Order`.`localId` AS `Order.localId`, `Order`.`initialQuantity` AS `Order.initialQuantity`, `Order`.`pairId` AS `Order.pairId`, `Order`.`price` AS `Order.price`, `Order`.`isBuy` AS `Order.isBuy`, `Order`.`createdAt` AS `Order.createdAt` FROM `swapdeals` AS `SwapDeal` LEFT OUTER JOIN `nodes` AS `Node` ON `SwapDeal`.`nodeId` = `Node`.`id` LEFT OUTER JOIN `orders` AS `Order` ON `SwapDeal`.`orderId` = `Order`.`id` WHERE `SwapDeal`.`rHash` = 'afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647';
20/08/2020 10:46:17.889 [SWAPS] warn: received unexpected swap failed packet for deal with payment hash afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647
^C20/08/2020 10:46:18.213 [GLOBAL] info: XUD is shutting down
20/08/2020 10:46:18.214 [LND-BTC] info: new status: Disconnected
20/08/2020 10:46:18.214 [LND-LTC] info: new status: Disconnected
20/08/2020 10:46:18.215 [CONNEXT] info: new status: Disconnected
20/08/2020 10:46:18.216 [P2P] debug: Peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): closing socket. reason: Shutdown
20/08/2020 10:46:18.216 [HTTP] info: http server has closed
20/08/2020 10:46:18.217 [RPC] info: gRPC server completed shutdown
20/08/2020 10:46:18.217 [P2P] trace: Sent Disconnecting packet to 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): "{\"body\":{\"reason\":9},\"header\":{\"id\":\"3b6eb680-e2b9-11ea-844d-85fa3224e080\"}}"
20/08/2020 10:46:18.219 [ORDERBOOK] debug: removed all orders for peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 10:46:18.221 [GLOBAL] info: XUD shutdown gracefully

@sangaman

Found another case, I stopped/started connext docker container multiple times, and apparently a TCP socket has stuck on ESTABLISHED

โžœ  xud git:(master) โœ— bin/xud
20/08/2020 12:55:53.977 [GLOBAL] info: config file loaded
20/08/2020 12:55:54.042 [RPC] info: gRPC server listening on localhost:28886
20/08/2020 12:55:54.225 [DB] info: connected to database /Users/sercanozdemir/.xud/xud-simnet.db
20/08/2020 12:55:54.256 [LND-BTC] debug: loaded tls cert from /Users/sercanozdemir/lndbtc/tls.cert
20/08/2020 12:55:54.257 [LND-LTC] debug: loaded tls cert from /Users/sercanozdemir/lndltc/tls.cert
20/08/2020 12:55:54.262 [LND-BTC] debug: loaded macaroon from /Users/sercanozdemir/lndbtc/data/chain/bitcoin/simnet/admin.macaroon
20/08/2020 12:55:54.262 [LND-BTC] info: new status: Initialized
20/08/2020 12:55:54.265 [LND-BTC] info: trying to verify connection to lnd at localhost:30009
20/08/2020 12:55:54.266 [LND-LTC] debug: loaded macaroon from /Users/sercanozdemir/lndltc/data/chain/litecoin/simnet/admin.macaroon
20/08/2020 12:55:54.267 [LND-LTC] info: new status: Initialized
20/08/2020 12:55:54.267 [LND-LTC] info: trying to verify connection to lnd at localhost:31009
20/08/2020 12:55:54.301 [LND-LTC] debug: pubkey is 024aa6f0cfe19b241195479f778390e5b71cf950253ff818b0aff38ca96706ba05
20/08/2020 12:55:54.302 [LND-LTC] debug: uris are 024aa6f0cfe19b241195479f778390e5b71cf950253ff818b0aff38ca96706ba05@g4n7673vbm32uldewnahsn3r7yxdzcwnaevyllsuoz57wzcehqjgi5id.onion:30735
20/08/2020 12:55:54.302 [LND-LTC] debug: chain is litecoin-simnet
20/08/2020 12:55:54.306 [LND-BTC] debug: pubkey is 024aa6f0cfe19b241195479f778390e5b71cf950253ff818b0aff38ca96706ba05
20/08/2020 12:55:54.306 [LND-BTC] debug: uris are 024aa6f0cfe19b241195479f778390e5b71cf950253ff818b0aff38ca96706ba05@33f67f4oehxoicdfivmwlhxeasv6rvsq2mbbtjt3tzcno5ifun3sotqd.onion:29735
20/08/2020 12:55:54.307 [LND-BTC] debug: chain is bitcoin-simnet
20/08/2020 12:55:54.317 [LND-LTC] debug: new channel maximum outbound capacity: 490000000
20/08/2020 12:55:54.317 [LND-LTC] debug: new channel inbound capacity: 489963800
20/08/2020 12:55:54.317 [LND-LTC] debug: new channel total outbound capacity: 490000000
20/08/2020 12:55:54.318 [LND-LTC] info: new status: ConnectionVerified
20/08/2020 12:55:54.327 [LND-BTC] debug: new channel maximum outbound capacity: 247000000
20/08/2020 12:55:54.327 [LND-BTC] debug: new channel inbound capacity: 244990950
20/08/2020 12:55:54.328 [LND-BTC] debug: new channel total outbound capacity: 247000000
20/08/2020 12:55:54.328 [LND-BTC] info: new status: ConnectionVerified
20/08/2020 12:55:54.329 [GLOBAL] info: Node key is encrypted, unlock with 'xucli unlock', 'xucli create', or 'xucli restore'
20/08/2020 12:56:00.071 [GLOBAL] info: Local nodePubKey is 02ab876bd1e58c472533e5b3fa43c90b34e625363ce767b93948bbb4ed329bae32
20/08/2020 12:56:00.089 [CONNEXT] info: new status: Initialized
20/08/2020 12:56:00.090 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:00.098 [CONNEXT] info: new status: Disconnected
20/08/2020 12:56:00.099 [CONNEXT] error: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:00.099 [CONNEXT] error: could not verify connection to connext, retrying in 5000 ms: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:00.099 [P2P] info: Connecting to simnet XU network
20/08/2020 12:56:00.101 [P2P] info: p2p server listening on 0.0.0.0:28885
20/08/2020 12:56:00.104 [HTTP] info: http server listening on localhost:28887
20/08/2020 12:56:00.127 [P2P] info: Connecting to known / previously connected peers
20/08/2020 12:56:00.128 [P2P] debug: creating new outbound socket connection to xud1.simnet.exchangeunion.com:28885
20/08/2020 12:56:00.131 [P2P] debug: creating new outbound socket connection to 104.248.29.249:28885
20/08/2020 12:56:00.131 [P2P] debug: creating new outbound socket connection to 104.248.29.249:28885
20/08/2020 12:56:00.132 [P2P] debug: creating new outbound socket connection to xud.kilrau.com:28885
20/08/2020 12:56:00.210 [P2P] debug: Connected pre-handshake to 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542@xud1.simnet.exchangeunion.com:28885
20/08/2020 12:56:00.239 [P2P] debug: Connected pre-handshake to 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095@104.248.29.249:28885
20/08/2020 12:56:00.245 [P2P] debug: Connected pre-handshake to 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885
20/08/2020 12:56:00.252 [P2P] debug: Connected pre-handshake to 03ebe60e1712fe0fe526e5cbd97770c4471ae7612ea294f7d17379fd5b80d0d5f8@xud.kilrau.com:28885
20/08/2020 12:56:00.314 [P2P] debug: Peer 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542@xud1.simnet.exchangeunion.com:28885 session in-encryption enabled
20/08/2020 12:56:00.331 [P2P] debug: Peer 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow) session out-encryption enabled
20/08/2020 12:56:00.332 [P2P] verbose: opened connection to 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow) at xud1.simnet.exchangeunion.com:28885
20/08/2020 12:56:00.339 [P2P] debug: Peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095@104.248.29.249:28885 session in-encryption enabled
20/08/2020 12:56:00.343 [P2P] debug: received disconnecting packet from 03ebe60e1712fe0fe526e5cbd97770c4471ae7612ea294f7d17379fd5b80d0d5f8@xud.kilrau.com:28885 due to AuthFailureInvalidTarget
20/08/2020 12:56:00.344 [P2P] info: Socket closed prior to handshake with 03ebe60e1712fe0fe526e5cbd97770c4471ae7612ea294f7d17379fd5b80d0d5f8@xud.kilrau.com:28885
20/08/2020 12:56:00.345 [P2P] warn: could not open connection to outbound peer (03ebe60e1712fe0fe526e5cbd97770c4471ae7612ea294f7d17379fd5b80d0d5f8@xud.kilrau.com:28885): Peer 03ebe60e1712fe0fe526e5cbd97770c4471ae7612ea294f7d17379fd5b80d0d5f8@xud.kilrau.com:28885 disconnected from us due to AuthFailureInvalidTarget
20/08/2020 12:56:00.346 [P2P] debug: received disconnecting packet from 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885 due to AuthFailureInvalidTarget
20/08/2020 12:56:00.347 [P2P] info: Socket closed prior to handshake with 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885
20/08/2020 12:56:00.347 [P2P] warn: could not open connection to outbound peer (02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885): Peer 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885 disconnected from us due to AuthFailureInvalidTarget
20/08/2020 12:56:00.348 [P2P] debug: creating new outbound socket connection to 104.248.29.249:28885
20/08/2020 12:56:00.364 [P2P] debug: Peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic) session out-encryption enabled
20/08/2020 12:56:00.365 [P2P] verbose: opened connection to 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic) at 104.248.29.249:28885
20/08/2020 12:56:00.407 [P2P] verbose: received 0 orders from 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow)
20/08/2020 12:56:00.409 [P2P] verbose: received 3 nodes (2 new) from 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow)
20/08/2020 12:56:00.428 [P2P] debug: Connected pre-handshake to 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885
20/08/2020 12:56:00.457 [P2P] verbose: received 0 orders from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 12:56:00.458 [P2P] verbose: received 3 nodes (2 new) from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 12:56:00.521 [P2P] debug: received disconnecting packet from 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885 due to AuthFailureInvalidTarget
20/08/2020 12:56:00.521 [P2P] info: Socket closed prior to handshake with 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885
20/08/2020 12:56:00.522 [P2P] warn: could not open connection to outbound peer (02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885): Peer 02ea79f3a2a26deaee328839638df9dc8e2886ec863485f9ebb65c7f3de52400d7@104.248.29.249:28885 disconnected from us due to AuthFailureInvalidTarget
20/08/2020 12:56:00.522 [P2P] info: Completed start-up connections to known peers
20/08/2020 12:56:03.213 [CONNEXT] error: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:03.217 [CONNEXT] error: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:05.091 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:05.093 [CONNEXT] error: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:05.093 [CONNEXT] error: could not verify connection to connext, retrying in 5000 ms: Error: connect ECONNREFUSED 127.0.0.1:25040
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1128:14)
20/08/2020 12:56:09.913 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:09.914 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:10.097 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:10.101 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:10.101 [CONNEXT] error: could not verify connection to connext, retrying in 5000 ms: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:15.103 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:15.108 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:15.109 [CONNEXT] error: could not verify connection to connext, retrying in 5000 ms: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:15.199 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:15.201 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:20.109 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:20.113 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:20.113 [CONNEXT] error: could not verify connection to connext, retrying in 5000 ms: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:23.603 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:23.604 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
20/08/2020 12:56:25.117 [CONNEXT] info: trying to verify connection to connext
20/08/2020 12:56:25.263 [CONNEXT] info: new status: ConnectionVerified
20/08/2020 12:56:30.784 [GLOBAL] info: Stopped monitoring Connext balances for automatic funding and channel creation
20/08/2020 12:57:00.403 [P2P] verbose: received 0 orders from 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow)
20/08/2020 12:57:00.449 [P2P] verbose: received 1 orders from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 12:57:00.450 [P2P] verbose: received 0 orders from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 12:57:00.450 [P2P] verbose: received 0 orders from 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 13:06:30.418 [CONNEXT] error: connext server error 500: Internal Server Error
20/08/2020 13:06:30.419 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.419 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.420 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.420 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.421 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.422 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.423 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.424 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.425 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:06:30.425 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.747 [CONNEXT] error: connext server error 500: Internal Server Error
20/08/2020 13:33:13.747 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.748 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.766 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.766 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.766 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.766 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.766 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.767 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.768 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.769 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.769 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.769 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.769 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.773 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.773 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.773 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.774 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.774 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.774 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.774 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.775 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.775 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.775 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
20/08/2020 13:33:13.775 [CONNEXT] error: failed to update total outbound capacity: 8.5 - connext server error 500: Internal Server Error
^C20/08/2020 13:40:39.386 [GLOBAL] info: XUD is shutting down
20/08/2020 13:40:39.387 [LND-BTC] info: new status: Disconnected
20/08/2020 13:40:39.388 [LND-LTC] info: new status: Disconnected
20/08/2020 13:40:39.388 [CONNEXT] info: new status: Disconnected
20/08/2020 13:40:39.389 [P2P] debug: Peer 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow): closing socket. reason: Shutdown
20/08/2020 13:40:39.390 [P2P] debug: Peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic): closing socket. reason: Shutdown
20/08/2020 13:40:39.391 [HTTP] info: http server has closed
20/08/2020 13:40:39.391 [RPC] info: gRPC server completed shutdown
20/08/2020 13:40:39.393 [ORDERBOOK] debug: removed all orders for peer 03ece33a30db1dbce4b62fa96a5e9541138a24997ef5672eebed2d332270e39542 (OzoneYellow)
20/08/2020 13:40:39.394 [ORDERBOOK] debug: removed all orders for peer 03ad8a98151caa68e41412359b9b9c1e1467f1ed88b4bd3e70d8fad4ab795b3095 (NeglectAcoustic)
20/08/2020 13:40:39.396 [GLOBAL] info: XUD shutdown gracefully

Screenshot of TCP port;

image

Update; After 5 minutes it shutdown without an issue.

It's not clear to me why the first logs timed out/failed in the first place, it looks like the log lines there start off midswap, but it looks like we're in the middle of sending payment and I'd guess we're waiting for a response to a send payment call. That seems desirable, I think, although we can improve the logging to say something to this effect rather than "xud shutdown gracefully" when we really haven't.

20/08/2020 10:46:08.091 [SWAPS] trace: failing deal afd763796a02efc80a7f6d85cfed64bccb8d9888aa01aa47f0b8d81b757e8647 in state Active & phase SendingPayment due to SwapTimedOut

Second one looks like flakiness with the connext client and long-running requests, it's alsoo hard to make perfect sense of it but 25040 is the connext port and we're making numerous balance and health checks in the logs it seems.

One thing I'm thinking that might make sense is to track the async requests we're waiting on to connext and possibly also lnd. For any "non-essential" requests, like a health check or balance check, we can just terminate upon graceful shutdown. That would take some refactoring but shouldn't be too bad. The send payment and settle invoice calls are mission critical though and we need to let them run their course, but making the connext error handling more robust would probably help.

There's also a question I'm thinking of now of whether we want to allow a graceful xud shutdown at all when we're in the middle of a swap recovery or have an outgoing payment as maker that's still in pending state.

For connext connections, it appears that connection eventually shuts down without an issue, even if it takes minutes (~3-5).

I'm not sure about terminating non-essential requests, IMHO they shouldn't take long or they shouldn't stay open since we call close on these channels, I'm not sure if that's the right thing to do here.

As you mentioned I guess the right thing to do for all cases would be preventing the shutdown if there's an ongoing swap? I'll try to add a check to shutdown to check if there's a swap ongoing, and will prevent the shutdown if there's.

What I noticed is, that quite often xud continues attempting to connect to pears oafter the shutdown command was fired, causing it to be force killed:

01/09/2020 12:49:32.632 [GLOBAL] info: XUD is shutting down
01/09/2020 12:49:32.633 [LND-BTC] info: new status: Disconnected
01/09/2020 12:49:32.634 [LND-LTC] info: new status: Disconnected
01/09/2020 12:49:32.636 [CONNEXT] info: new status: Disconnected
01/09/2020 12:49:32.641 [HTTP] info: http server has closed
01/09/2020 12:49:32.646 [P2P] debug: Peer 03fe8eed12784d2bc532a6943bd0fb5c86c55b82802650a6f6240df22617689ab1 (MediaSolution): closing socket. reason: Shutdown
01/09/2020 12:49:32.649 [P2P] debug: Peer 0282675ded0c0efe029faa8e371e07a930f8cc01c1c1f9d2558d11475fb229310f (ApologyEngine): closing socket. reason: Shutdown
01/09/2020 12:49:32.655 [ORDERBOOK] debug: removed all orders for peer 03fe8eed12784d2bc532a6943bd0fb5c86c55b82802650a6f6240df22617689ab1 (MediaSolution)
01/09/2020 12:49:32.659 [ORDERBOOK] debug: removed all orders for peer 0282675ded0c0efe029faa8e371e07a930f8cc01c1c1f9d2558d11475fb229310f (ApologyEngine)
01/09/2020 12:49:32.664 [RPC] info: gRPC server completed shutdown
01/09/2020 12:49:32.669 [GLOBAL] info: XUD shutdown gracefully
Sep 01 12:49:32.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:32.794 [P2P] debug: Connection attempt #2 to 03dc5dbe77106209287a404d2bd5648a8c8a4ffaa8f86d2cfa5c22e662e65e8809@7ohqlrhsxtqcq5guncqd67kn3qdl2bief6wwajaxp6q5joboh4mzlvid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:35.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:35.141 [P2P] debug: Connection attempt #2 to 02d7348d32c1e97f38c582946e4430a924c7a274d03051865a7e6a7d542cce642b@vyf5w5wmjsupjsmp5ilgjv5hgw4dbis6yheezvwhqrgljoo3zohnvoqd.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:35.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:35.501 [P2P] debug: Connection attempt #2 to 03283fd0f332c65fa09ab8e1baf12af2f54fa8a68e0d9f9cda3f80a7d5913d350f@2jeodabdh22bndi2yljgatjtlj5dyuwek6kkxsop6zvugsk5f6vkr3id.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:36.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:36.229 [P2P] debug: Connection attempt #2 to 02343e43afc1ca9d06f00f543009a34bc0f9be1962c284b959cb0f9fdb26515b95@zxu7lmdqwsudcsrj6klreephrowfbpdiushivrkj34vrso6cmfatctid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:36.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:36.342 [P2P] debug: Connection attempt #2 to 02e2fb25a63e6e1df25a883e3d19e8048131571808eba13e9ada68231625d1f8c0@tpe4x2e66ozdjwd7onxpevhfffsuttbso4ro2sk4p4zpmo3slbze5dyd.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:36.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:36.584 [P2P] debug: Connection attempt #2 to 028c352553d75089feaa3346c45fa0424d2a3b28b17d2eb00aab6bf2f8fb01df1e@ds24ehollfizdzr7ywj46k6g6qhqzi5gog2pp4oubxmque3quqthprid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
Sep 01 12:49:40.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:40.183 [P2P] debug: Connection attempt #2 to 02981221c795310b8725c31da5edba787a65825691871457861a9123030be3aa17@k44jcf66lcsgkkmtu3huzt7bmx5bxvfnc2antsxfron7ri555eaynwqd.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 10 sec...
01/09/2020 12:49:47.165 [P2P] debug: Connection attempt #1 to 0396e1bf69912270616f91fba3a6934051f465b54e7cec72513156f3d6980e1a3a@um42mivd25ilwoul2gxs3rnxu7r2uhwqv5edxhk76bnnjsyfzbosfyid.onion:18885 failed: Proxy connection timed out. retrying in 5 sec...
Sep 01 12:49:51.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:51.611 [P2P] debug: Connection attempt #3 to 03dc5dbe77106209287a404d2bd5648a8c8a4ffaa8f86d2cfa5c22e662e65e8809@7ohqlrhsxtqcq5guncqd67kn3qdl2bief6wwajaxp6q5joboh4mzlvid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...
Sep 01 12:49:53.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:53.354 [P2P] debug: Connection attempt #3 to 02e2fb25a63e6e1df25a883e3d19e8048131571808eba13e9ada68231625d1f8c0@tpe4x2e66ozdjwd7onxpevhfffsuttbso4ro2sk4p4zpmo3slbze5dyd.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...
Sep 01 12:49:53.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:53.889 [P2P] debug: Connection attempt #3 to 028c352553d75089feaa3346c45fa0424d2a3b28b17d2eb00aab6bf2f8fb01df1e@ds24ehollfizdzr7ywj46k6g6qhqzi5gog2pp4oubxmque3quqthprid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...
Sep 01 12:49:53.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:53.963 [P2P] debug: Connection attempt #3 to 03283fd0f332c65fa09ab8e1baf12af2f54fa8a68e0d9f9cda3f80a7d5913d350f@2jeodabdh22bndi2yljgatjtlj5dyuwek6kkxsop6zvugsk5f6vkr3id.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...
Sep 01 12:49:55.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:55.306 [P2P] debug: Connection attempt #3 to 02343e43afc1ca9d06f00f543009a34bc0f9be1962c284b959cb0f9fdb26515b95@zxu7lmdqwsudcsrj6klreephrowfbpdiushivrkj34vrso6cmfatctid.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...
Sep 01 12:49:56.000 [notice] Closed 1 streams for service [scrubbed].onion for reason resolve failed. Fetch status: No more HSDir available to query.
01/09/2020 12:49:56.206 [P2P] debug: Connection attempt #3 to 02d7348d32c1e97f38c582946e4430a924c7a274d03051865a7e6a7d542cce642b@vyf5w5wmjsupjsmp5ilgjv5hgw4dbis6yheezvwhqrgljoo3zohnvoqd.onion:18885 failed: Socks5 proxy rejected connection - HostUnreachable. retrying in 20 sec...

That wasn't the case for my own trials, that somehow there's another case with P2P which prevents shutdown, is there a way to reproduce easily?

start an older xud with tons of previously connected peers, best an old simnet xud, then immediately shutdown

Unfortunately I couldn't understand, don't know what you mean by "older xud" with current code base can you give me an environment that I can reproduce and see what's happening?

Sure, my simnet environment is in that state, I bet if you keep your environment online for some days and @raladev does his testing with tons of new simnet environments you'll get manya dead peers like me above and problems to shutdown too.

Oh gotcha, will keep my xud open :)

After a few days my xud failed to shutdown but there was no logs at all after gracefully shutdown log. Just the open ports, and seems like both connext & lnd connections failed to close. I'll try a few stuff.

I tried to kill docker containers when xud stuck on shutdown but it still failed to shutdown ๐Ÿค”

21/09/2020 09:08:10.203 [GLOBAL] info: XUD shutdown gracefully
21/09/2020 09:08:37.335 [CONNEXT] error: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
21/09/2020 09:08:37.335 [CONNEXT] error: failed to update total outbound capacity: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
21/09/2020 09:08:37.335 [CONNEXT] error: failed to update total outbound capacity: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
21/09/2020 09:08:37.335 [CONNEXT] error: failed to update total outbound capacity: Error: socket hang up
    at connResetException (internal/errors.js:570:14)
    at Socket.socketOnEnd (_http_client.js:440:23)
    at Socket.emit (events.js:215:7)
    at endReadableNT (_stream_readable.js:1183:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)

Just a quick question: was xud successfully shutting down before connext implementation, anyone remember?

That's really hard to tell. But why just not disable connext and see where that gets you?

Actually stucking on shutdown happens randomly so it's really hard to find out by disabling connext now, I checked ports as soon as I closed docker containers, there were no bound port, :/ I'll check how we integrate connext and what are the listeners that could cause such an issue on a spare time

Update This definitely can be related with connext, just now connext started to throw 500 and I tried to stop xud but it didn't gracefully shutdown so I killed my docker process and suddenly xud shutdown too with the following logs:

image

Update: With a very small test that uses eventemitter of nodejs as we do in the xud, I tried to send CTRL+C, and program shutdowns gracefully, so this cant be happening because of our internal listeners:

const { EventEmitter } = require('events');
const timerEventEmitter = new EventEmitter();
timerEventEmitter.emit("update");

let currentTime = 0;

// This will trigger the update event each passing second
setInterval(() => {
    currentTime++;
    timerEventEmitter.emit('update', currentTime);
}, 1000);

let listener = (time) => {
    console.log('Message Received from publisher');
    console.log(`${time} seconds passed since the program started`);
};

timerEventEmitter.on('update', listener);

setInterval(() => {
    timerEventEmitter.removeListener(listener);
}, 10000);

Output:

โžœ  test node test.js   
Message Received from publisher
1 seconds passed since the program started
Message Received from publisher
2 seconds passed since the program started
Message Received from publisher
3 seconds passed since the program started
^C
โžœ  test    

Since I do not have a single clean shutdown these days, could it be as simple as xud continuing wiht p2p connection attempts even though the shutdown procedure should prevent that? All my xud shutdown logs look like this: https://paste.ubuntu.com/p/VnrZPBXcGH/ @LePremierHomme @sangaman @erkarl @rsercano

Well after @sangaman's suggestion I checked once again and seems like there're multiple listeners still while closing, I'm creating a PR to remove them for server (p2p) and httpserver, let's discuss there.

Can you take over looking into this in parallel? @LePremierHomme

Closed by #1994.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

moshababo picture moshababo  ยท  6Comments

kilrau picture kilrau  ยท  6Comments

moshababo picture moshababo  ยท  3Comments

kilrau picture kilrau  ยท  6Comments

moshababo picture moshababo  ยท  7Comments