Parity-ethereum: PoA network partition upon change of contract-based ValidatorSet

Created on 6 Feb 2019  ·  18Comments  ·  Source: openethereum/parity-ethereum

  • Parity Ethereum version: 2.3.2 / 2.2.9
  • Operating system: Linux
  • Installation: Dockerized: parity/parity:{{version}}
  • Fully synchronized: yes
  • Network: custom PoA
  • Restarted: yes

We're trying to set up a PoA blockchain along the https://wiki.parity.io/Demo-PoA-tutorial with the following specifics:

// chain.json
"stepDuration": "5"
"transactionPermissionContractTransition": 60,
"transactionPermissionContract": <ADDR>,
// authority.toml
usd_per_tx = "0"
tx_queue_size = 32768
tx_queue_mem_limit = 8096
min_gas_price = 0
gas_floor_target = "20000000"
force_sealing = true

We start up the first node, deploy the provided contracts for validators (reporting) and permissions and scale up to 6 nodes with 1 authority and 5 non-authority nodes.

Then we change non-validators into validators step by step & slowly (30 blocks time in between). The network keeps being in sync, all nodes see each other as peers.

This is how turning the first non-authority node into an authority node looks like. Pay attention to the benign behaviour reporting, probably a race condition as the change of validator set might not be propagated fast enough / takes effect too quickly. In any case, the mining goes on:

2019-02-06 17:34:56 UTC Imported #1997 0x229d…9780 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:01 UTC Imported #1998 0x6518…4caa (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:06 UTC Imported #1999 0xc04d…9c5c (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:06 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:35:17 UTC    5/25 peers    724 KiB chain  396 KiB db  0 bytes queue  243 KiB sync  RPC:  0 conn,    4 req/s,  170 µs
2019-02-06 17:35:19 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:35:19 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd]
2019-02-06 17:35:19 UTC Applying validator set change signalled at block 2000
2019-02-06 17:35:19 UTC Imported #2000 0x7109…7375 (1 txs, 0.10 Mgas, 2994 ms, 0.71 KiB)
2019-02-06 17:35:21 UTC Transaction mined (hash 0xef8eb22e5b7213d2f4b6492adf1b5a0c2296ba44952e2a22896e3c8a9b746fde)
2019-02-06 17:35:21 UTC Imported #2001 0x8336…ca00 (5 txs, 0.14 Mgas, 8 ms, 1.41 KiB)
2019-02-06 17:35:26 UTC Imported #2002 0xbd3a…6490 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:31 UTC Imported #2003 0xfee6…813d (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:35:36 UTC Imported #2004 0x3204…0ef5 (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
````
Upon adding the fifth (or sixth or anyhow random?) validator node the network breaks apart with three types of different behaviour on the nodes:

### Partition 1: sync stalls on 2 nodes
Happens on latest added validator and initial validator (random?). They still have peers, but sync is broken and cannot be reccovered.

2019-02-06 17:54:21 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd, 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9, 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, 0xdd996d1d72a237d349b9901e03296ba104447379]
2019-02-06 17:54:21 UTC Imported #2229 0x4343…74f0 (1 txs, 0.10 Mgas, 5 ms, 0.71 KiB)
2019-02-06 17:54:26 UTC Imported #2230 0x9314…29ec (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:54:31 UTC Applying validator set change signalled at block 2229
2019-02-06 17:54:31 UTC Imported #2231 0x0655…2c92 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:32 UTC 5/25 peers 856 KiB chain 470 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 2 req/s, 184 µs
2019-02-06 17:54:36 UTC Imported #2232 0x9937…95e8 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:41 UTC Imported #2233 0x81ce…fc71 (1 txs, 0.03 Mgas, 1 ms, 0.74 KiB)
2019-02-06 17:54:45 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
vchain-testnet-health013 | "35.221.126.209:30303"
vchain-testnet-health013 | "35.203.79.193:30303"
vchain-testnet-health013 | "35.204.33.208:30303"
vchain-testnet-health013 | "35.204.33.208:30304"
vchain-testnet-health013 | "35.203.79.193:34996"
2019-02-06 17:54:54 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:54:54 UTC Imported #2234 0xa6ea…aeb1 (0 txs, 0.00 Mgas, 2180 ms, 0.57 KiB)
2019-02-06 17:55:02 UTC 5/25 peers 857 KiB chain 473 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 0 req/s, 184 µs
2019-02-06 17:55:15 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:55:22 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:55:23 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 17:55:24 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:55:26 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 17:55:32 UTC 5/25 peers 857 KiB chain 473 KiB db 12 KiB queue 243 KiB sync RPC: 0 conn, 0 req/s, 189 µs
2019-02-06 17:55:37 UTC Syncing #2234 0xa6ea…aeb1 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 5+ 0 Qed #2237 5/25 peers 857 KiB chain 473 KiB db 18 KiB queue 243 KiB sync RPC: 0 conn, 0 req/s, 189 µs
2019-02-06 17:55:42 UTC Syncing #2234 0xa6ea…aeb1 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 5+ 0 Qed #2237 5/25 peers 857 KiB chain 473 KiB db 18 KiB queue 243 KiB sync RPC: 0 conn, 2 req/s, 189 µs
2019-02-06 17:55:45 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:55:47 UTC Syncing #2234 0xa6ea…aeb1 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 6+ 0 Qed #2238 5/25 peers 857 KiB chain 473 KiB db 20 KiB queue 243 KiB sync RPC: 0 conn, 5 req/s, 189 µs
2019-02-06 17:55:52 UTC Syncing #2234 0xa6ea…aeb1 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 6+ 0 Qed #2238 5/25 peers 857 KiB chain 473 KiB db 20 KiB queue 243 KiB sync RPC: 0 conn, 2 req/s, 189 µs


### Partition 2: 3 nodes keep mining 
2 validators and the 1 remaining non-validator throw a dump, lose 1 peer but keep in sync:

2019-02-06 17:54:21 UTC Signal for transition within contract. New list: [0xf33cf66829fc4ccb0934f772c66f86115a76b8c2, 0xee1fd1869d3715bcb164966dcac9125d9fa499dd, 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9, 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, 0xdd996d1d72a237d349b9901e03296ba104447379]
2019-02-06 17:54:21 UTC Imported #2229 0x4343…74f0 (1 txs, 0.10 Mgas, 2 ms, 0.71 KiB) + another 1 block(s) containing 0 tx(s)
2019-02-06 17:54:23 UTC 5/25 peers 865 KiB chain 472 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 0 req/s, 182 µs
2019-02-06 17:54:26 UTC Imported #2230 0x9314…29ec (0 txs, 0.00 Mgas, 0 ms, 0.57 KiB)
2019-02-06 17:54:31 UTC Applying validator set change signalled at block 2229
2019-02-06 17:54:31 UTC Imported #2231 0x0655…2c92 (0 txs, 0.00 Mgas, 4 ms, 0.57 KiB)
2019-02-06 17:54:36 UTC Imported #2232 0x9937…95e8 (0 txs, 0.00 Mgas, 1 ms, 0.57 KiB)
2019-02-06 17:54:41 UTC Imported #2233 0x81ce…fc71 (1 txs, 0.03 Mgas, 1 ms, 0.74 KiB)
2019-02-06 17:54:49 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 17:54:52 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:54:52 UTC Imported #2234 0xa6ea…aeb1 (0 txs, 0.00 Mgas, 9 ms, 0.57 KiB)
2019-02-06 17:54:53 UTC 5/25 peers 867 KiB chain 475 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 0 req/s, 174 µs
2019-02-06 17:54:58 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 17:55:01 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 17:55:03 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 17:55:05 UTC Validator 0xff9e…2cf9 could not be reported Transaction import error: Transaction error (Already imported)
2019-02-06 17:55:05 UTC Stage 4 block verification failed for #2232 (0xca80…b9ac)
Error: Error(Engine(NotProposer(Mismatch { expected: 0x40f0706e72a18d974dc256bb66c4a75c6e6f4cf7, found: 0xff9ed5590167bbca40b52064679b7dd18f4c2cf9 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: Some(stack backtrace:
0: 0x55e0353281ad -
1: 0x55e035326b62 -
2: 0x55e035076381 -
3: 0x55e0350766b5 -
4: 0x55e034a2868f -
5: 0x55e034a32577 -
6: 0x55e0347c5fcd -
7: 0x55e03483434a -
8: 0x55e03431ba8e -
9: 0x55e034904ab1 -
10: 0x55e0348b02b8 -
11: 0x55e0348b25ca -
12: 0x55e03478ae6a -
13: 0x55e034a10a0d -
14: 0x55e0353b04e9 -
15: 0x55e0349c07be -
16: 0x55e03539a79d -
17: 0x55e035385355 -
18: 0x7f9dab39e6b9 -
19: 0x7f9daaebe41c -
20: 0x0 - ) } })
...
2019-02-06 18:37:22 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:37:22 UTC Imported #2438 0x3ca9…930a (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:37:38 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:37:39 UTC Reported benign validator misbehaviour 0xdd99…7379
2019-02-06 18:37:41 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:37:41 UTC Transaction culled (hash 0x6e6929706fcd7e329005224ab8ae4dd908c1878b8a24d74153a86627912bfd88)
2019-02-06 18:37:41 UTC Imported #2439 0x607b…b6f5 (3 txs, 0.09 Mgas, 2779 ms, 1.08 KiB)
2019-02-06 18:37:47 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:37:47 UTC Imported #2440 0x2a59…1ea3 (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:37:49 UTC 4/25 peers 973 KiB chain 685 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 4 req/s, 185 µs
2019-02-06 18:38:04 UTC Reported benign validator misbehaviour 0xdd99…7379
2019-02-06 18:38:05 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:38:05 UTC Transaction culled (hash 0x75558e5b9f8e1ae9173a4dcfd0f402aa52a30a305f002855fea0cb49d0d7992a)
2019-02-06 18:38:05 UTC Imported #2441 0x491e…75ea (3 txs, 0.09 Mgas, 2207 ms, 1.08 KiB)
2019-02-06 18:38:08 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:38:12 UTC Reported benign validator misbehaviour 0xff9e…2cf9
2019-02-06 18:38:12 UTC Imported #2442 0xba4b…30ee (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:38:19 UTC 4/25 peers 974 KiB chain 861 KiB db 0 bytes queue 243 KiB sync RPC: 0 conn, 4 req/s, 185 µs


### Partition 3: 1 node mins their own fork

2019-02-06 18:39:24 UTC Syncing #2365 0x3ee8…a11e 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 624 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 0 req/s, 175 µs
2019-02-06 18:39:28 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:39:32 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:39:33 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:39:33 UTC Syncing #2365 0x3ee8…a11e 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 624 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 7 req/s, 175 µs
2019-02-06 18:39:34 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:39:34 UTC Transaction culled (hash 0xd81e114ef144b57a8e51bb84bfd12efd12fb00f05d96793098dbfa7da62aecd8)
2019-02-06 18:39:34 UTC Imported #2366 0x7a26…994b (1 txs, 0.03 Mgas, 13 ms, 0.74 KiB)
2019-02-06 18:39:43 UTC Syncing #2366 0x7a26…994b 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 625 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 0 req/s, 175 µs
2019-02-06 18:39:52 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:39:53 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:39:53 UTC Syncing #2366 0x7a26…994b 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 625 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 7 req/s, 176 µs
2019-02-06 18:39:54 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:39:54 UTC Imported #2367 0xae37…ef18 (0 txs, 0.00 Mgas, 2 ms, 0.57 KiB)
2019-02-06 18:39:58 UTC Error serving connection: Error { kind: Parse(Header), cause: None }
2019-02-06 18:40:03 UTC Syncing #2367 0xae37…ef18 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 625 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 0 req/s, 176 µs
2019-02-06 18:40:12 UTC Reported benign validator misbehaviour 0x40f0…4cf7
2019-02-06 18:40:13 UTC Reported benign validator misbehaviour 0xf33c…b8c2
2019-02-06 18:40:13 UTC Syncing #2367 0xae37…ef18 0.00 blk/s 0.0 tx/s 0.0 Mgas/s 0+ 0 Qed #2232 3/25 peers 1 MiB chain 625 KiB db 0 bytes queue 66 KiB sync RPC: 0 conn, 7 req/s, 176 µs
2019-02-06 18:40:14 UTC Reported benign validator misbehaviour 0xee1f…99dd
2019-02-06 18:40:14 UTC Transaction culled (hash 0x460c8a8c58c7c11186d8906e8ae2bbdbc5ac93cd5005737e97de8b810c04772e)
2019-02-06 18:40:14 UTC Imported #2368 0xde8f…7880 (1 txs, 0.03 Mgas, 2 ms, 0.74 KiB)
```

F2-bug 🐞 M4-core ⛓ P5-sometimesoon 🌲

Most helpful comment

Pinging @varasev @afck in case this could have explained some of their problems at POA Networks.

All 18 comments

(cc @HCastano - lots of stuff to dig through)

I can reproduce the issue without a validator contract in place.

When switching the configuration at a future block

            "validators" : {
                "multi": {
                        "0": { "list": ["0x634Aa3713fc7A61061B1699E7bCcC819f2168B5f",
                                        "0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0xbEfa81Fa6832d8aCb3890113375423fe2f921BEE"] },
                        "101100": { "list": ["0x634Aa3713fc7A61061B1699E7bCcC819f2168B5f",
                                        "0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0xbEfa81Fa6832d8aCb3890113375423fe2f921BEE",
                                        "0x6737f7bbc98dc5558b425a238c00ef377e1bde45"]},
                        "101850": { "list": ["0xd03599FA9a60Eb59Ce59D510Cd2e90b7c2B47487",
                                        "0x6737f7bbc98dc5558b425a238c00ef377e1bde45",
                                        "0x93A75e6601E981628FF089f8eE86dEC3c5d247A1",
                                        "0xB1D772314abc1ef770AAb7ce4550C1B508Bc931A"]                                       
                                      }                                        
                }
            }

I get the same errors and the chain breaks into several partitions.

2019-02-13 14:02:39 UTC Stage 4 block verification failed for #101108 (0x1447…c7de)
node001 | Error: Error(Engine(NotProposer(Mismatch { expected: 0xbefa81fa6832d8acb3890113375423fe2f921bee, found: 0xd03599fa9a60eb59ce59d510cd2e90b7c2b47487 })), State { next_error: None, backtrace: InternalBacktrace { backtrace: Some(stack backtrace:
...

Had the issue both at block #101100 and #10185; strangely it was always the first validator node in the array that reported a mismatch.

cc @joshua-mir

@mathiasfrey Would you be able to share the code you're using to set up your network?

@mathiasfrey How come there's only node in your docker-compose.yml? I would've expected all six to be there

@HCastano , my nodes are spread across the globe. I checked out my code on different machines with public IPs as this represents the target setup.

In this specific setup I had 3 machines with two instances each distinguished by port (e.g. you can see the same IP but two different ports in the bootnodes):

  • Amsterdam :30303
  • Amsterdam :30304
  • Montreal :30303
  • Montreal :30304
  • Tokyo :30303
  • Tokyo :30304

I've managed to replicate this locally with two nodes running v2.3.2. In the chain spec I set up one node to be an authority from block 0, and the other to get added to the validator set at a future block. When that block was hit (if you were doing this with a contract you'd need to wait for the block to be finalized instead of simply reached) neither node appears to recognize the change in the validator set.

If I take the original authority offline, the "new" authority doesn't produce any blocks. If I then bring the original authority back online and restart the "new" authority, they fork from one another.

Hi @HCastano I have noticed that if I set
"immediateTransitions": true
when the block is hit, my validator recognizes the change.

Hey @danzipie! You shouldn't need that flag set to true for the validator set to update. I found a bug in the code path where it's not set earlier today, currently working on a fix :)

Although I should add, the bug I found is only for the case where you have a fixed validator set (e.g set in the chain spec). I need to do a big more digging to find out what's going on when a contract is being used.

I was out conferencing last week, will definitely make time to work on this this week

@mathiasfrey Hey, I haven't had any luck replicating this issue while using a contract. It is possible that your contract setup also had a static validator change at some point?

Could you also share more about the setup you were using with the contract, like the full chainspec, and the set of contracts you were using?

@HCastano you're correct. We always started with a static list and then switched over to a contract. One of the example chain specs attached.
chain.txt

@mathiasfrey Hey, I set up another test network with your chainspec and v2.4.3 of the client and I'm not seeing any issues with the contract based validator set. Have you seen this issue since I put in that fix a few weeks ago?

@HCastano we just tested the nightly build with a changing validator set and it always worked. We're going to check the change coming from a validator set to a smart contract validator set next week. Thanks!

Pinging @varasev @afck in case this could have explained some of their problems at POA Networks.

Fixed in 2.3.6 and 2.4.1

@DemiMarie-parity thanks for the ping!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

famfamfam picture famfamfam  ·  3Comments

jurijbajzelj picture jurijbajzelj  ·  3Comments

bryaan picture bryaan  ·  3Comments

danfinlay picture danfinlay  ·  3Comments

jacogr picture jacogr  ·  4Comments