Parity-ethereum: Secret Store nodes eventually disconnect

Created on 30 Jan 2019  ·  10 Comments  ·  Source: openethereum/parity-ethereum

_Before filing a new issue, please provide the following information._

  • Parity Ethereum version: v2.3.0-nightly
  • Operating system: Linux
  • Installation: built from source
  • Fully synchronized: yes
  • Network: PoA
  • Restarted: yes

_Your issue description goes here below. Try to include actual vs. expected behavior and steps to reproduce the issue._

Our setup consists of a PoA network with 3 nodes that have the Secret Store API enabled; these nodes also act as authorities in the PoA. We also have other clients connected to the Ethereum network. The problem we are observing is that everything works as expected, but after the nodes have been running for some time (on the order of hours), we see errors in the Secret Store logs and the encryption calls fail.
After restarting the parity process on the Secret Store nodes, the problem disappears for some hours. I have checked the status of the instances and tried to make the connections manually while the nodes are failing, and everything seems fine and working.
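
For illustration, this is a minimal standalone Rust sketch of the kind of manual connection check mentioned above (not Parity code; the address is a placeholder for a peer's <node_ip>:12000):

// Open a TCP connection to a peer's Secret Store port with a timeout,
// the same port the connection timeouts below are logged for.
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn main() {
    // Placeholder address; substitute the failing peer's IP and Secret Store port (12000).
    let peer: SocketAddr = "10.0.0.1:12000".parse().expect("valid socket address");
    match TcpStream::connect_timeout(&peer, Duration::from_secs(5)) {
        Ok(_) => println!("TCP connect to {} succeeded", peer),
        Err(err) => println!("TCP connect to {} failed: {}", peer, err),
    }
}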

These are the logs we can see in the secret store nodes:

Jan 29 07:21:49 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:49 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:21:59 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:59 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:06 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:06 UTC    7/25 peers      6 MiB chain   87 MiB db  0 bytes queue  235 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Jan 29 07:22:09 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:09 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:19 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:19 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:29 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:29 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:36 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:36 UTC    7/25 peers      6 MiB chain   87 MiB db  0 bytes queue  235 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Jan 29 07:22:39 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:39 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
...

And this is another node, run with the logger options -l jsonrpc,secretstore_net,privatetx=trace:

Jan 27 21:18:05 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:05 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Jan 27 21:18:07 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:07 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Jan 27 21:18:10 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:10 UTC Verifier #3 INFO import  Imported #50952 0x9aec…4c45 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 WARN secretstore  GenerateServerKey request /shadow/85c28c30810d4d00821f39e55513d935f6273a6fd620479db645a7e05db8afb2/1794541472c108396084922b527b114a563a08e6d46d71c1cbc85595f4b8f2f855bea0b69a936dfe7941657e09aeb632c3f0ba30e1f404143fdb405dca6e36bc01/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 INFO secretstore_net  0xb98e…fec0: generation session completed
Jan 27 21:18:15 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 WARN secretstore  GenerateServerKey request /shadow/3113f6b38f5c414f8e9eb9263b413c410051d34634a04ea28d3aba56b8c2f404/af40958ddbdb71abcb8b4e5a33a2bdd8c494edd81f3ee346dfec38e0d6fe0ecb0b6f829a7409966db7fa9edf236f9fc7e2c7a409d332a99c577ca23fd566aa7800/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 INFO secretstore_net  0xb98e…fec0: generation session completed
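
For context, the failing GenerateServerKey requests in the log above are server-key generation calls against the Secret Store HTTP API (exposed on http_port 12001 in the configuration below). A minimal Rust sketch of such a call, assuming the POST /shadow/<server_key_id>/<signature>/<threshold> form of the endpoint and using placeholder values (reqwest is used here only for illustration):

// Illustrative dependency: reqwest = { version = "0.11", features = ["blocking"] }
use reqwest::blocking::Client;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholders: a hex-encoded server key id, a hex signature of that id, and threshold 0,
    // mirroring the /shadow/<key_id>/<signature>/<threshold> paths seen in the log above.
    let server_key_id = "85c28c30810d4d00821f39e55513d935f6273a6fd620479db645a7e05db8afb2";
    let signed_key_id = "REPLACE_WITH_HEX_SIGNATURE_OF_SERVER_KEY_ID";
    let url = format!("http://127.0.0.1:12001/shadow/{}/{}/0", server_key_id, signed_key_id);

    let response = Client::new().post(&url).send()?;
    println!("status: {}", response.status());
    println!("body:   {}", response.text()?);
    Ok(())
}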

This is the configuration file we are using for the secret store nodes:

[parity]
chain = "/etc/parity/secretstore/chain.json"
base_path = "/parity_data"

[ui]
disable = true

[rpc]
disable = true
interface = "0.0.0.0"
cors = ["all"]
hosts = ["all"]
apis = ["web3", "eth", "net", "parity", "traces", "rpc", "personal", "parity_accounts", "signer", "parity_set"]

[ipc]
disable = true

[dapps]
disable = true

[websockets]
disable = true
port = 8546
interface = "local"
origins = ["none"]
apis = ["all"]
hosts = ["all"]

[mining]
engine_signer = "0xf4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
reseal_on_txs = "none" # Authorities reseal automatically
usd_per_tx = "0" # Allows for free transactions.

[network]
nat = "extip:<public_ip>"
port = 30303
bootnodes = [
  "enode://e90d77ad3065d0767abe08cdd47746c29fae578e8d1c06aacc30da720dd7fd8c8aa0e1e2ca2bb20c4739edff82e5e7405a382cf17af6a4366e2540f72e46e829@172.31.48.123:30303",
  "enode://204b11794d16f382bab24a7cdde5ec1273b8c6c2b23e64560d7ef1707708158a45300e920607aa413633e45c5eea472c17f474866ce5901515249abfbe9c4e1c@172.31.0.123:30303",
  "enode://6881fbbf8f2400c05c61b696af7bb2be04ab22ce64d5221906b9c1c67f2bf976b71cd1d98be7743dfc139a7b590fe230212195714e71f2204252caf68f6f94c0@172.31.16.123:30303"
]
discovery = true

[ipfs]
enable = false

[snapshots]
disable_periodic = true

[secretstore]
self_secret = "f4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
disable = false
disable_http = false      # This node will expose a Secret Store HTTP API
http_interface = "all"  # The HTTP API is available locally only
http_port = 12001          # The Secret Store HTTP API is available at this port
acl_contract = "6bd5fdc37b9c87ba73dda230e5dc18e9fda71ff9"
server_set_contract = "none"
nodes = [
  "7cb22f9a17e8880f8009b519b3506fa9d52fe0ff315f68d0dc0be9576f116818be5f3843c3e4e57aecea40bdd14c8539f5714a51d3254c2955d8a9f1ea9723dc@<public_ips>:12000",
  "b98ec1ea6bd4cb64ac2b73fdda87a4276d027c879237490589a475dae71ebfad7598c7355a5afe2f45f9f007b502563769699e722ed3eba0d1478adbf832fec0@<public_ips>:12000",
  "fc10afc336daf751ea236acaf8053cb709d5a55b1379db7aeb2a9fa76604d2d837ccda4d348bbc5a41fc8a270db8c9feae19d4f86757ffc9789da4e455dbb0ab@<public_ips>:12000"
]
interface = "all"
port = 12000               # port used to communicate with other Secret Store nodes
path = "/parity_data/secret_store"

[account]
...

If I can provide any further information, please do not hesitate to ask.
Thanks!

F2-bug 🐞

All 10 comments

Hey!

According to this log line:

Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab

it is node 0xfc10…b0ab that has lost its connection to 0x7cb2…23dc. So it would be good to see the logs from those two nodes, if possible (you've attached the log from 0xb98e…fec0, which is well-connected to both of them).

Could you also please specify which git commit you are using for your tests?

Also, if you leave the nodes for some time after the disconnect, do they reconnect again? Does pinging (at least) work during the disconnect period?

Hi @svyatonik, thanks for the answer!

Logs for node 0xb98e…fec0:

Feb 04 08:45:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 WARN secretstore  GenerateServerKey request /shadow/c7d9abcb6c9bca27c91c98dee17ca59de350b891f6a332c4e10a6b2b25d47052/fbe03244a10cf33cb1f8d0a59f0d9002586adc87d94b177d4dd3c7a0ba7d5b471c9a40001782d0ac861c917c94645872a73a780ddda254f0716fbf5d5e496dd001/1 has failed with: node required for this operation is currently disconnected
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 INFO secretstore_net  0xb98e…fec0: generation session completed
Feb 04 08:45:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:25 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:35 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:36 UTC IO Worker #0 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC Verifier #0 INFO import  Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:46:05 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:05 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:06 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:06 UTC IO Worker #1 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:15 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0x7cb2…23dc
Feb 04 08:46:26 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:26 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Feb 04 08:46:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:35 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 08:46:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:36 UTC IO Worker #2 INFO import     5/25 peers      2 MiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:40 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:40 UTC Verifier #2 INFO import  Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)

Logs for node 0xfc10…b0ab:

Feb 04 08:45:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:18 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:23 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:23 UTC 0xfc10…b0ab: generation session read error 'node required for this operation is currently disconnected' when requested for session from node 0xb98e…fec0
Feb 04 08:45:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:45 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:46 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:46 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:48 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:48 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:56 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:56 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:18 UTC    5/25 peers      3 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:40 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:40 UTC Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)

Logs for node 0x7cb2…23dc:

Feb 04 08:45:05 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:05 UTC Imported #67734 0x389a…32cf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:21 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:45:45 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:51 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:51 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:21 UTC    5/25 peers      2 MiB chain   88 MiB db  0 bytes queue  227 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 08:46:25 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:25 UTC 0x7cb2…23dc: generation session failed with timeout

Thanks!

@jcortejoso Could you please share the whole log (since restart) of node 0xfc10…b0ab (the node that gets a lot of "network error 'early eof' when reading message from node 0x7cb2…23dc" errors)? I'd like to see what caused this in the first place.

Hi @svyatonik. I have attached the log files. Now the node that seems to disconnect is 0xb98e…fec0. I think these are the most relevant lines:

Feb 04 11:21:15 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:15 UTC IO Worker #3 INFO import     5/25 peers    176 KiB chain   89 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 04 11:21:24 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:24 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 11:21:25 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:25 UTC Verifier #1 INFO import  Imported #67960 0xcda4…b8bf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: executing maintain procedures
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN secretstore_net  0xb98e…fec0: keep alive timeout for node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net  0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:40230
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN ethcore_secretstore::key_server_cluster::client_sessions::generation_session  0xb98e…fec0: generation session failed because 0x7cb2…23dc connection has timeouted
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore  GenerateServerKey request /shadow/7f00dfe7756149c492bcd603a8d1cc8c25e77f3d5cc14464a106deea545a2653/7022c89dfd2fbe82bdcc148b0e31ca6d51f83493ee546b9efcc6e372a54c3c814384bf7cc12486f212f252f41d13f9b4a55fcc8c0d0436df548fef645ab4b00101/0 has failed with: node required for this operation is currently disconnected
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Generation.PublicKeyShare from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore_net  0xb98e…fec0: generation session read error 'no active session with given id' when requested for session from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net  0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: inserting connection to 0x7cb2…23dc at 18.210.120.241:12000. Connected to 2 of 2 nodes
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 WARN secretstore_net  0xb98e…fec0: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net  0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:12000

Thanks!
log_0x7cb2…23dc.txt
log_0xb98e…fec0.txt
log_0xfc10…b0ab.txt

Thanks, @jcortejoso, I think that's enough info for now. It definitely seems like the issues are in the SS code: (1) the node gets disconnected because of a timeout (though it seems the KeepAliveResponse messages are queued and processed right after the disconnect), and (2) the node is unable to recover from the disconnect (i.e. to reconnect).

It seems I have found a reason: the SS HTTP listener is currently written in a way that does not work well with a runtime shared with other components (it blocks the entire runtime thread until the request is served). And since the runtime is now (since #9888, my bad :/) shared between Ethereum sync, the SS HTTP listener, and the SS network, some messages required for session completion cannot be processed until the session is completed (you can see this in the logs: right after a session completes with a timeout error, the queued messages are processed).
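
To illustrate the failure mode with a minimal sketch (written against current tokio, not the actual Parity code): when a handler blocks a worker thread of a shared runtime, other futures on that runtime, such as the keep-alive/message-processing loop, cannot make progress until the blocking call returns.

use std::time::{Duration, Instant};

// One worker thread makes the starvation easy to see; the same effect appears
// on a larger shared pool when enough handlers block at once.
#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    let start = Instant::now();

    // Stands in for Secret Store network message processing (keep-alives, session messages).
    let messages = tokio::spawn(async move {
        for i in 0..3 {
            tokio::time::sleep(Duration::from_millis(100)).await;
            println!("processed message {} at {:?}", i, start.elapsed());
        }
    });

    // Stands in for a blocking HTTP request handler: it occupies the worker thread,
    // so the message task above is stalled until the blocking call returns.
    let blocking_listener = tokio::spawn(async move {
        std::thread::sleep(Duration::from_secs(2)); // blocking wait on an async worker thread
        println!("blocking request served at {:?}", start.elapsed());
    });

    let _ = tokio::join!(messages, blocking_listener);
}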

@jcortejoso Could you please check whether everything works with commit 1fdfa1e6c6833ec18e171e67e5925e17a243a21b?

Hi @svyatonik. We have been running our SS cluster built from that commit for the past 2 hours. So far so good. I will update this issue if there is any problem, or confirm if everything works as expected.

Thanks!

We have been testing our PoA more intensively these days (running from commit https://github.com/paritytech/parity-ethereum/commit/1fdfa1e6c6833ec18e171e67e5925e17a243a21b), and I have the feeling that the Secret Store reconnects correctly but the Ethereum network does not. This is an example of the logs:

Feb 22 00:42:00 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:00 UTC Imported #87438 0x308c…cc5e (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:42:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:29 UTC    5/25 peers      5 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:42:59 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:59 UTC    5/25 peers      5 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:43:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:25 UTC Imported #87439 0xcbe5…52ca (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:43:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:29 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:44:04 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:04 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:44:15 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:15 UTC Imported #87440 0xad07…dbff (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:44:39 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:39 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:45:14 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:14 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 00:45:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:25 UTC Imported #87441 0x5886…7813 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:45:49 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:49 UTC    5/25 peers      6 MiB chain   88 MiB db  0 bytes queue   11 KiB sync  RPC:  0 conn,    0 req/s,    0 µs
Feb 22 01:00:06 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:06 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:07 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:07 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:17 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:17 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:18 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:18 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:29 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:30 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:30 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:44 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:44 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:45 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:45 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:01:35 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:35 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:01:36 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:36 UTC 0xfc10…b0ab: encryption session completed
...

Maybe because it is an old commit, there is some already-fixed bug that is causing the issue. We will now test from the branch https://github.com/paritytech/parity-ethereum/tree/fix_ss_blocking_wait

Hey @jcortejoso! There could certainly have been fixes for PoA sync since that commit. Could you please share the results of the fix_ss_blocking_wait testing (if they're already available)?
