Our setup consists of a PoA network with 3 nodes that have the Secret Store API enabled; these nodes also act as authorities in the PoA. We also have other clients connected to the Ethereum network. The problem we are observing is that everything works as expected, but after running for some time (on the order of hours), we see errors in the Secret Store logs and the encryption calls fail.
After restarting the parity process on the Secret Store nodes, the problem disappears for some hours. I have checked the status of the instances and tried to make the connections manually while the nodes are failing, and everything seems to be up and reachable.
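A minimal sketch of such a manual check, assuming it amounts to a plain TCP connect to the Secret Store port (12000 in the config below); the peer address is a placeholder:
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn main() {
    // Placeholder peer address (TEST-NET range); substitute the real <node_ip>:12000.
    let peer: SocketAddr = "203.0.113.10:12000".parse().unwrap();
    match TcpStream::connect_timeout(&peer, Duration::from_secs(5)) {
        Ok(_) => println!("TCP connect to {} succeeded", peer),
        Err(e) => println!("TCP connect to {} failed: {}", peer, e),
    }
}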
These are the logs we can see on the Secret Store nodes:
Jan 29 07:21:49 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:49 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:21:59 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:21:59 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:06 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:06 UTC 7/25 peers 6 MiB chain 87 MiB db 0 bytes queue 235 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Jan 29 07:22:09 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:09 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:19 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:19 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:29 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:29 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
Jan 29 07:22:36 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:36 UTC 7/25 peers 6 MiB chain 87 MiB db 0 bytes queue 235 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Jan 29 07:22:39 ip-172-31-16-123.ec2.internal docker[20371]: 2019-01-29 07:22:39 UTC 0xfc10…b0ab: timeout when establishing outbound connection with <node_ip>:12000
...
And here is another node with the logger options -l jsonrpc,secretstore_net,privatetx=trace:
Jan 27 21:18:05 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:05 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Jan 27 21:18:07 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:07 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Jan 27 21:18:10 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:10 UTC Verifier #3 INFO import Imported #50952 0x9aec…4c45 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-3 WARN secretstore GenerateServerKey request /shadow/85c28c30810d4d00821f39e55513d935f6273a6fd620479db645a7e05db8afb2/1794541472c108396084922b527b114a563a08e6d46d71c1cbc85595f4b8f2f855bea0b69a936dfe7941657e09aeb632c3f0ba30e1f404143fdb405dca6e36bc01/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 INFO secretstore_net 0xb98e…fec0: generation session completed
Jan 27 21:18:15 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:15 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-2 WARN secretstore GenerateServerKey request /shadow/3113f6b38f5c414f8e9eb9263b413c410051d34634a04ea28d3aba56b8c2f404/af40958ddbdb71abcb8b4e5a33a2bdd8c494edd81f3ee346dfec38e0d6fe0ecb0b6f829a7409966db7fa9edf236f9fc7e2c7a409d332a99c577ca23fd566aa7800/0 has failed with: node required for this operation is currently disconnected
Jan 27 21:18:17 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:17 UTC tokio-runtime-worker-0 INFO secretstore_net 0xb98e…fec0: generation session completed
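For context, the failing GenerateServerKey requests in the WARN lines above are server-key generation calls to the Secret Store HTTP API (the /shadow/<server_key_id>/<signature>/<threshold> path, served on http_port from the config below). A minimal client-side sketch of such a call, assuming the reqwest crate with its blocking feature enabled; the key id, signature and node address are placeholders rather than values from this issue:
fn main() -> Result<(), reqwest::Error> {
    // Assumed SS HTTP endpoint: http_port = 12001 in the config below, placeholder host.
    let node = "http://127.0.0.1:12001";
    // Placeholders: hex-encoded server key id and the requester's signature of that id.
    let server_key_id = "<hex server key id>";
    let signature = "<hex signature of the key id>";
    let threshold = 0;

    // POST /shadow/{server_key_id}/{signature}/{threshold} asks the cluster to generate
    // a server key - the request type shown failing in the logs above.
    let url = format!("{}/shadow/{}/{}/{}", node, server_key_id, signature, threshold);
    let response = reqwest::blocking::Client::new().post(url).send()?;
    println!("status: {}", response.status());
    Ok(())
}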
This is the configuration file we are using for the secret store nodes:
[parity]
chain = "/etc/parity/secretstore/chain.json"
base_path = "/parity_data"
[ui]
disable = true
[rpc]
disable = true
interface = "0.0.0.0"
cors = ["all"]
hosts = ["all"]
apis = ["web3", "eth", "net", "parity", "traces", "rpc", "personal", "parity_accounts", "signer", "parity_set"]
[ipc]
disable = true
[dapps]
disable = true
[websockets]
disable = true
port = 8546
interface = "local"
origins = ["none"]
apis = ["all"]
hosts = ["all"]
[mining]
engine_signer = "0xf4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
reseal_on_txs = "none" # Authorities reseal automatically
usd_per_tx = "0" # Allows for free transactions.
[network]
nat = "extip:<public_ip>"
port = 30303
bootnodes = [
"enode://e90d77ad3065d0767abe08cdd47746c29fae578e8d1c06aacc30da720dd7fd8c8aa0e1e2ca2bb20c4739edff82e5e7405a382cf17af6a4366e2540f72e46e829@172.31.48.123:30303",
"enode://204b11794d16f382bab24a7cdde5ec1273b8c6c2b23e64560d7ef1707708158a45300e920607aa413633e45c5eea472c17f474866ce5901515249abfbe9c4e1c@172.31.0.123:30303",
"enode://6881fbbf8f2400c05c61b696af7bb2be04ab22ce64d5221906b9c1c67f2bf976b71cd1d98be7743dfc139a7b590fe230212195714e71f2204252caf68f6f94c0@172.31.16.123:30303"
]
discovery = true
[ipfs]
enable = false
[snapshots]
disable_periodic = true
[secretstore]
self_secret = "f4a2dbd5d67ef3019a7e4b188b92128c86d76bf9"
disable = false
disable_http = false # This node will expose a Secret Store HTTP API
http_interface = "all" # The HTTP API is exposed on all interfaces
http_port = 12001 # The Secret Store HTTP API is available at this port
acl_contract = "6bd5fdc37b9c87ba73dda230e5dc18e9fda71ff9"
server_set_contract = "none"
nodes = [
"7cb22f9a17e8880f8009b519b3506fa9d52fe0ff315f68d0dc0be9576f116818be5f3843c3e4e57aecea40bdd14c8539f5714a51d3254c2955d8a9f1ea9723dc@<public_ips>:12000",
"b98ec1ea6bd4cb64ac2b73fdda87a4276d027c879237490589a475dae71ebfad7598c7355a5afe2f45f9f007b502563769699e722ed3eba0d1478adbf832fec0@<public_ips>:12000",
"fc10afc336daf751ea236acaf8053cb709d5a55b1379db7aeb2a9fa76604d2d837ccda4d348bbc5a41fc8a270db8c9feae19d4f86757ffc9789da4e455dbb0ab@<public_ips>:12000"
]
interface = "all"
port = 12000 # port used to communicate with other Secret Store nodes
path = "/parity_data/secret_store"
[account]
...
If I can provide any further information please do not hesitate to ask.
Thanks!
Hey!
According to this log line:
Jan 27 21:18:11 ip-172-31-0-123.ec2.internal docker[13073]: 2019-01-27 21:18:11 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
it is node 0xfc10…b0ab that has lost its connection to 0x7cb2…23dc. So it'd be good to see the logs from those nodes, if possible (you've attached the log from 0xb98e…fec0, which is well-connected to both of them).
Could you also please specify which git commit you are using for your tests?
Also, if you leave the nodes for some time after the disconnect, do they reconnect again? Does (at least) pinging work during the disconnect period?
Hi @svyatonik, thanks for the answer.
The commit is https://github.com/paritytech/parity-ethereum/commit/4fec2f2fc26c0daf95f4d91cbbf55eeca74888fe
About the connection, I would say the internal pings work (as seen in the logs of 0xb98e…fec0). I only had the logger configured for this node, but I have now configured the same log level on the other two.
After restarting the parity process on the three nodes, everything started working again.
Logs for node 0xb98e…fec0:
Feb 04 08:45:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:15 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: received message Generation.ConfirmInitialization from 0x7cb2…23dc
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: sent message Generation.InitializeSession to 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: received message Generation.SessionError(node required for this operation is currently disconnected) from 0xfc10…b0ab
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-0 WARN secretstore GenerateServerKey request /shadow/c7d9abcb6c9bca27c91c98dee17ca59de350b891f6a332c4e10a6b2b25d47052/fbe03244a10cf33cb1f8d0a59f0d9002586adc87d94b177d4dd3c7a0ba7d5b471c9a40001782d0ac861c917c94645872a73a780ddda254f0716fbf5d5e496dd001/1 has failed with: node required for this operation is currently disconnected
Feb 04 08:45:23 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:23 UTC tokio-runtime-worker-1 INFO secretstore_net 0xb98e…fec0: generation session completed
Feb 04 08:45:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:25 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:45:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:35 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:45:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:36 UTC IO Worker #0 INFO import 5/25 peers 2 MiB chain 89 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC Verifier #0 INFO import Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:45 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:45 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 08:45:55 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:45:55 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0xfc10…b0ab
Feb 04 08:46:05 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:05 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:46:06 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:06 UTC IO Worker #1 INFO import 5/25 peers 2 MiB chain 89 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:46:15 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:15 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:46:25 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:25 UTC tokio-runtime-worker-0 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAlive from 0x7cb2…23dc
Feb 04 08:46:26 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:26 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAlive from 0xfc10…b0ab
Feb 04 08:46:35 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:35 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 08:46:36 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:36 UTC IO Worker #2 INFO import 5/25 peers 2 MiB chain 89 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:46:40 ip-172-31-0-123.ec2.internal docker[2288]: 2019-02-04 08:46:40 UTC Verifier #2 INFO import Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Logs for node 0xfc10…b0ab:
Feb 04 08:45:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:18 UTC 5/25 peers 3 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:45:23 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:23 UTC 0xfc10…b0ab: generation session read error 'node required for this operation is currently disconnected' when requested for session from node 0xb98e…fec0
Feb 04 08:45:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:45 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:46 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:46 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:45:48 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:48 UTC 5/25 peers 3 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:45:56 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:45:56 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:06 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:06 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:16 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:16 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:18 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:18 UTC 5/25 peers 3 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:46:26 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:26 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:36 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:36 UTC 0xfc10…b0ab: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 08:46:40 ip-172-31-16-123.ec2.internal docker[1110]: 2019-02-04 08:46:40 UTC Imported #67736 0x2d8d…c111 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Logs for node 0x7cb2…23dc:
Feb 04 08:45:05 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:05 UTC Imported #67734 0x389a…32cf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:21 UTC 5/25 peers 2 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:45:45 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:45 UTC Imported #67735 0xccb2…83a2 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 08:45:51 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:45:51 UTC 5/25 peers 2 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:46:21 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:21 UTC 5/25 peers 2 MiB chain 88 MiB db 0 bytes queue 227 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 08:46:25 ip-172-31-48-123.ec2.internal docker[1786]: 2019-02-04 08:46:25 UTC 0x7cb2…23dc: generation session failed with timeout
Thanks!
@jcortejoso Could you please share the whole log (since restart) of node 0xfc10…b0ab (the node that gets a lot of "network error 'early eof' when reading message from node 0x7cb2…23dc" errors)? I'd like to see what caused this in the first place.
Hi @svyatonik. I attach the log files. Now the node that seems to disconnect is 0xb98e…fec0. I think these are the most relevant lines:
Feb 04 11:21:15 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:15 UTC IO Worker #3 INFO import 5/25 peers 176 KiB chain 89 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 04 11:21:24 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:24 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 11:21:25 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:25 UTC Verifier #1 INFO import Imported #67960 0xcda4…b8bf (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: executing maintain procedures
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN secretstore_net 0xb98e…fec0: keep alive timeout for node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 TRACE secretstore_net 0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:40230
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-3 WARN ethcore_secretstore::key_server_cluster::client_sessions::generation_session 0xb98e…fec0: generation session failed because 0x7cb2…23dc connection has timeouted
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore GenerateServerKey request /shadow/7f00dfe7756149c492bcd603a8d1cc8c25e77f3d5cc14464a106deea545a2653/7022c89dfd2fbe82bdcc148b0e31ca6d51f83493ee546b9efcc6e372a54c3c814384bf7cc12486f212f252f41d13f9b4a55fcc8c0d0436df548fef645ab4b00101/0 has failed with: node required for this operation is currently disconnected
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Generation.PublicKeyShare from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 WARN secretstore_net 0xb98e…fec0: generation session read error 'no active session with given id' when requested for session from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-2 TRACE secretstore_net 0xb98e…fec0: received message Cluster.KeepAliveResponse from 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: inserting connection to 0x7cb2…23dc at 18.210.120.241:12000. Connected to 2 of 2 nodes
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 WARN secretstore_net 0xb98e…fec0: network error 'early eof' when reading message from node 0x7cb2…23dc
Feb 04 11:21:34 ip-172-31-0-123.ec2.internal docker[9403]: 2019-02-04 11:21:34 UTC tokio-runtime-worker-1 TRACE secretstore_net 0xb98e…fec0: removing connection to 0x7cb2…23dc at 18.210.120.241:12000
Thanks!
log_0x7cb2…23dc.txt
log_0xb98e…fec0.txt
log_0xfc10…b0ab.txt
Thanks, @jcortejoso - I think that's enough info for now. It definitely seems like the issues are in the SS code: (1) the node gets disconnected because of a timeout (though it seems KeepAliveResponse messages are queued and processed right after the disconnect) and (2) the node is unable to recover from the disconnect (i.e. to reconnect).
Seems like I found the reason: the SS HTTP listener is currently written in a way that does not work well with a runtime shared with other components (i.e. it blocks the entire runtime thread until the request is served). And since the runtime is now (since #9888 - my bad :/) shared between Ethereum sync + the SS HTTP listener + the SS network, some messages required for session completion cannot be processed until the session is completed (you can see it in the logs: right after the session completes with a timeout error, the queued messages are processed).
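Not Parity code, but a minimal sketch of that failure mode, assuming a current tokio runtime and made-up task names: a handler that blocks its worker thread delays every other task sharing the runtime, which matches the logs where queued KeepAlive/session messages are only processed once the session has already timed out.
use std::time::{Duration, Instant};

// A single shared worker stands in for the runtime shared by sync + SS HTTP listener + SS network.
#[tokio::main(flavor = "multi_thread", worker_threads = 1)]
async fn main() {
    let started = Instant::now();

    // Stands in for secretstore_net message processing: should finish after ~100 ms.
    let net_task = tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(100)).await;
        println!("net message processed after {:?}", started.elapsed());
    });

    // Stands in for the SS HTTP listener blocking its worker thread until the request is served.
    let http_task = tokio::spawn(async move {
        std::thread::sleep(Duration::from_secs(2)); // blocking wait - the problematic pattern
        println!("http request served after {:?}", started.elapsed());
    });

    let _ = tokio::join!(net_task, http_task);
    // With the single shared worker, "net message processed" prints only after ~2 s instead
    // of ~100 ms: the blocking handler starved the other task.
}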
@jcortejoso Could you please check whether everything works with commit 1fdfa1e6c6833ec18e171e67e5925e17a243a21b?
Hi @svyatonik. We have been running our SS cluster built from that commit for the last 2 hours. So far so good. I will update if there is any issue or if everything works as expected.
Thanks!
We have been testing our PoA more intensively these days (running from commit https://github.com/paritytech/parity-ethereum/commit/1fdfa1e6c6833ec18e171e67e5925e17a243a21b) and I have the feeling that the Secret Store reconnects correctly but the Ethereum network does not. This is an example of the logs:
Feb 22 00:42:00 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:00 UTC Imported #87438 0x308c…cc5e (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:42:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:29 UTC 5/25 peers 5 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:42:59 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:42:59 UTC 5/25 peers 5 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:43:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:25 UTC Imported #87439 0xcbe5…52ca (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:43:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:43:29 UTC 5/25 peers 6 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:44:04 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:04 UTC 5/25 peers 6 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:44:15 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:15 UTC Imported #87440 0xad07…dbff (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:44:39 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:44:39 UTC 5/25 peers 6 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:45:14 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:14 UTC 5/25 peers 6 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 00:45:25 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:25 UTC Imported #87441 0x5886…7813 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
Feb 22 00:45:49 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 00:45:49 UTC 5/25 peers 6 MiB chain 88 MiB db 0 bytes queue 11 KiB sync RPC: 0 conn, 0 req/s, 0 µs
Feb 22 01:00:06 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:06 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:07 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:07 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:17 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:17 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:18 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:18 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:29 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:29 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:30 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:30 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:00:44 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:44 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:00:45 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:00:45 UTC 0xfc10…b0ab: encryption session completed
Feb 22 01:01:35 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:35 UTC 0xfc10…b0ab: generation session completed
Feb 22 01:01:36 ip-172-31-16-123.ec2.internal docker[28103]: 2019-02-22 01:01:36 UTC 0xfc10…b0ab: encryption session completed
...
Maybe, because it is an old commit, there is some already-fixed bug causing the issue. We will now test a build from the branch https://github.com/paritytech/parity-ethereum/tree/fix_ss_blocking_wait.
Hey, @jcortejoso! There could certainly have been fixes for PoA sync since that commit. Could you please share the results of the fix_ss_blocking_wait testing (if they're already available)?