Parity-ethereum: Rpc with --no-ancient-blocks fails with truffle-based deployment on 2.7.2+

Created on 21 Apr 2020 · 29 Comments · Source: openethereum/parity-ethereum

After the upgrade, Truffle seems unable to detect mined transactions and the migration gets stuck.

  • parity: 2.7.2 and OpenEthereum: v3.0.0-alpha.1-nightly-d4b5720-20200401
  • Operating system: Linux, Ubuntu 18
  • Installation: binary install
  • Fully synchronized: no, gateway mode
  • Network: custom PoA
  • Restarted: yes

Expected behavior (works fine with parity: 2.5.13)

node_modules/.bin/truffle migrate --network test
You can improve web3's peformance when running Node.js versions older than 10.5.0 by installing the (deprecated) scrypt package in your project

Compiling your contracts...
===========================
> Everything is up to date, there is nothing to compile.

Starting migrations...
======================
> Network name:    'test'
> Network id:      43
> Block gas limit: 67108864 (0x4000000)

1_initial_migration.js
======================

   Deploying 'Migrations'
   ----------------------
   > transaction hash:    0x81630621c7a822e21c345abeb97c1fb5b6526f1f6e04b1b1662dfb2f2aec1dc3
   > Blocks: 2            Seconds: 4
   > contract address:    0x73E65c8B69B13564b96F5033A8a6C57052F042dA
   > block number:        9054936
   > block timestamp:     1587489850
   > account:             0x5C93042C9f3a18059C19Fb253376911AAA984C1F
   > balance:             0
   > gas used:            263677 (0x405fd)
   > gas price:           0 gwei
   > value sent:          0 ETH
   > total cost:          0 ETH


   > Saving migration to chain.
   > Saving artifacts
   -------------------------------------
   > Total cost:                   0 ETH
...
all migrated fine

Actual behavior after the upgrade: Truffle seems unable to detect mined transactions and the migration gets stuck

parity: 2.7.2 and OpenEthereum: v3.0.0-alpha.1-nightly-d4b5720-20200401

node_modules/.bin/truffle migrate --network test
You can improve web3's peformance when running Node.js versions older than 10.5.0 by installing the (deprecated) scrypt package in your project

Compiling your contracts...
===========================
> Everything is up to date, there is nothing to compile.

Starting migrations...
======================
> Network name:    'test'
> Network id:      43
> Block gas limit: 67108864 (0x4000000)

1_initial_migration.js
======================

   Deploying 'Migrations'
   ----------------------
   > transaction hash:    0x5b6526f1f6e04b1b1662dfb2f2aec1dc381630621c7a822e21c345abeb97c1fb
   > Blocks: <INFINITY>            Seconds: <INFINITY>
...
stuck at the step above

Command line:

 parity --base-path=./node --chain=./config/chain.json --config=./config/gateway.toml --bootnodes="<>" --no-ancient-blocks --unlock=5c93...4c1f --password=./node/password.txt

Config:

[parity]
chain = "./config/chain.json"
mode = "active"
auto_update_delay = 1000
auto_update_check_frequency = 1000
release_track = "stable"

[network]
port = 30303
discovery = true
allow_ips = "all"
reserved_only = false

[rpc]
disable = false
port = 8501
interface = "all"
cors = ["all"]
apis = ["web3", "eth", "pubsub", "net", "parity", "parity_set", "parity_pubsub", "rpc", "personal"]
hosts = ["all"]

[websockets]
disable = false
port = 8502
interface = "all"
apis = ["web3", "eth", "pubsub", "net", "parity", "parity_set", "parity_pubsub", "rpc"]
hosts = ["all"]

[ipc]
disable = false
apis = ["web3", "eth", "pubsub", "net", "parity", "parity_set", "parity_pubsub", "rpc"]

[dapps]
disable = true

[secretstore]
disable = true

[mining]
force_sealing = false
reseal_on_txs = "all"
reseal_min_period = 700
reseal_max_period = 900
work_queue_size = 1024
relay_set = "lenient"
usd_per_tx = "0"
usd_per_eth = "0"
price_update_period = "hourly"
gas_floor_target = "0x4000000"
gas_cap = "0"
tx_queue_size = 16384
tx_queue_per_sender = 4096
tx_queue_mem_limit = 0
tx_gas_limit = "0x4000000"
tx_time_limit = 500

[footprint]
tracing = "auto"
pruning = "auto"
pruning_history = 128
pruning_memory = 64
cache_size_db = 1024
cache_size_blocks = 64
cache_size_queue = 256
cache_size_state = 256
db_compaction = "ssd"
fat_db = "auto"
scale_verifiers = true
num_verifiers = 2

[snapshots]
disable_periodic = true

Chain:
PoA network with 5 validators and the following params:

  "params": {
    "networkID": "0x2b",
    "maximumExtraDataSize": "0x20",
    "minGasLimit": "0x1388",
    "gasLimitBoundDivisor": "0x400",
    "wasmActivationTransition": 0,
    "maxTransactionSize": "0x4b000",
    "eip150Transition": 0,
    "eip160Transition": 0,
    "eip161abcTransition": 0,
    "eip161dTransition": 0,
    "eip98Transition": 0,
    "eip658Transition": 0,
    "eip155Transition": 0,
    "validateReceiptsTransition": 0,
    "validateChainIdTransition": 0,
    "eip140Transition": 0,
    "eip211Transition": 0,
    "eip214Transition": 0,
    "eip145Transition": 0,
    "eip1014Transition": 0,
    "eip1052Transition": 0,
    "maxCodeSizeTransition": 0,
    "maxCodeSize": "0x6000"
  },
F3-annoyance 💩

All 29 comments

@illya-havsiyevych, I checked and it is working for me on 2.5.13, 2.7.2 and 3.0.0-alpha with the same configuration. Please take a look at https://github.com/adria0/oe_stuff/tree/master/issue_11645: first start the chain with oe/start.sh (you need to put the node path there), then run the migration with truffle/start.sh

Thanks for your effort, I will certainly check by the end of the day.
I might push some changes to your config.

  • modified the configs to be a bit closer to our env, https://github.com/adria0/oe_stuff/pull/1
  • still not able to reproduce the issue on a new network, but I will probably be able to once a snapshot has been created
  • stay tuned

OK, another update: we hit the issue once a node starts to respond in the following way,
i.e. "Looks like you disabled ancient block download, unfortunately the information you're trying to fetch doesn't exist in the db and is probably in the ancient blocks."

   ⠹ Blocks: 0            Seconds: 0
   > {
   >   "jsonrpc": "2.0",
   >   "id": 18,
   >   "method": "eth_getTransactionReceipt",
   >   "params": [
   >     "0x742889b32908fe1c89ba30084bbc8001be2c9f01c06289c5f5b8dc84312805c4"
   >   ]
   > }
 <   {
 <     "jsonrpc": "2.0",
 <     "error": {
 <       "code": -32000,
 <       "message": "Looks like you disabled ancient block download, unfortunately the information you're trying to fetch doesn't exist in the db and is probably in the ancient blocks."
 <     },
 <     "id": 18
 <   }

I still don't know how long it would take to reproduce on this test env,
probably once warp sync takes place.

@adria0
The issue can be reproduced now.
Please check the pull request.
You need node data for 35k+ blocks.

> ./clean.up   - unzip the node data
> ./start.sh   - start 3 PoA nodes in the background

wait until the nodes are in sync

./gw.sh         - start `gw-node` in the foreground

and in another console, try the truffle migration

OK, yes, there is a regression between 2.5.13 and 2.7.2.

--warp-barrier is only needed to reproduce the issue faster. In normal operation warp sync is always on, so we would hit the issue sooner or later.

@illya-havsiyevych, so it seems that the problem is with --no-ancient-blocks, then?

Yes, so the issue is as follows:

given

  • a network with warp sync and periodic snapshots enabled

when

  • a node is started or restarted and snapshot-based warp sync takes place

and

  • the --no-ancient-blocks param is used

and

  • version is 2.7.2+

then

  • we have an issue
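
The conditions above can be written down as a single predicate. This is only an illustrative sketch: `NodeSetup`, its fields and `matches_report` are hypothetical names, not anything in the client, and the 2.7.2 threshold is simply the version reported in this thread.

```rust
/// Hypothetical description of a node, using only the conditions listed above.
pub struct NodeSetup {
    /// The node bootstrapped through snapshot-based warp sync.
    pub warp_synced: bool,
    /// The node runs with `--no-ancient-blocks`.
    pub no_ancient_blocks: bool,
    /// Client version as (major, minor, patch).
    pub version: (u32, u32, u32),
}

/// True when every reported condition holds at once.
pub fn matches_report(node: &NodeSetup) -> bool {
    // Tuples compare lexicographically, so (2, 7, 2) <= (3, 0, 0).
    node.warp_synced && node.no_ancient_blocks && node.version >= (2, 7, 2)
}
```

Dropping any one of the three conditions (for example running 2.5.13, or syncing ancient blocks) makes the predicate false, which matches the observations reported so far.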

@dvdplm, any idea?

OK, I checked the RPC logs on the gateway, and the transaction is sent and mined successfully.

Truffle tries to retrieve the transaction receipt before it is mined, and the following error is returned:

2020-05-05 11:33:28 UTC http.worker30 DEBUG rpc  Response: {"jsonrpc":"2.0","error":{"code":-32000,"message":"Looks like you disabled ancient block download, unfortunately the information you're trying to fetch doesn't exist in the db and is probably in the ancient blocks."},"id":17}.

It tries twice and then stops retrying. I checked this error message: it was introduced in 2.6 by https://github.com/openethereum/openethereum/pull/10608
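
For clients that control their own polling loop, one possible workaround is to treat this particular error like a "not mined yet" answer and keep polling. The sketch below is not what Truffle does. `ReceiptReply`, `wait_for_receipt` and the closure-based `fetch` are hypothetical names; a real client would send `eth_getTransactionReceipt` over its transport and sleep between attempts.

```rust
/// Outcome of one `eth_getTransactionReceipt` call, reduced to what matters here.
#[derive(Debug, Clone, PartialEq)]
pub enum ReceiptReply {
    /// `result` holds a receipt; only the block number is kept for illustration.
    Mined(u64),
    /// `result` is null: the transaction has not been mined yet.
    NotYet,
    /// JSON-RPC error object: code and message.
    RpcError(i64, String),
}

/// Calls `fetch` up to `max_attempts` times and returns the block number of
/// the receipt. The "ancient block" error is retried instead of being fatal.
pub fn wait_for_receipt<F>(mut fetch: F, max_attempts: u32) -> Result<u64, String>
where
    F: FnMut() -> ReceiptReply,
{
    for _ in 0..max_attempts {
        match fetch() {
            ReceiptReply::Mined(block) => return Ok(block),
            // Null result: keep polling.
            ReceiptReply::NotYet => {}
            // Assumption: the message text identifies the block-gap error.
            ReceiptReply::RpcError(_, ref msg) if msg.contains("ancient block") => {}
            // Any other RPC error is reported to the caller.
            ReceiptReply::RpcError(code, msg) => {
                return Err(format!("rpc error {}: {}", code, msg));
            }
        }
    }
    Err(format!("no receipt after {} attempts", max_attempts))
}
```

Matching on the message text is fragile, since -32000 is a generic server error code; it is used here only because the thread shows no more specific code for this case.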

OK, from a quick check it looks like it DID help.

But can we check the code logic itself? I'm looking at https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/impls/eth.rs#L836
which handles the transaction receipt and for some reason calls .and_then(errors::check_block_gap(&*self.client, self.options));
without passing a receipt
https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/impls/eth.rs#L836
only to check block_gap. You will need to help me with the Rust magic here, as I'm a freshman: how do we get response?

    move |response| {
        if response.is_none() {

https://github.com/openethereum/openethereum/blob/d778558dcccf25cd6a0654f16c73bde7758b0c32/rpc/src/v1/helpers/errors.rs#L257

or in master
https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/helpers/errors.rs#L290

So I guess that using check_block_gap to validate a transaction_receipt response https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/impls/eth.rs#L826
might be invalid.

... also, we (and others) use more than HTTP RPC: we also use ws and ipc. But the config option allow_missing_blocks is available only for rpc.

  move |response| {
      if response.is_none() {

check_block_gap is a higher-order generic function, that is, a function that returns a parametrized function. AFAICS, in this case the variable response has a generic type parametrized by T (an Option, given the is_none() call).
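
To make the "function that returns a function" idea concrete, here is a minimal sketch. It is not the actual OpenEthereum code: `GapCheck`, its fields and the `String` error are hypothetical stand-ins, and only the name `check_block_gap`, the `allow_missing_blocks` option and the `is_none()` check come from this thread.

```rust
/// Hypothetical stand-in for the inputs the real function reads from the
/// client and from the RPC options.
#[derive(Clone, Copy)]
pub struct GapCheck {
    /// Mirrors the `allow_missing_blocks` option mentioned in this thread.
    pub allow_missing_blocks: bool,
    /// Assumption: true when the database lacks ancient blocks.
    pub db_has_gap: bool,
}

/// Returns a closure suitable for `Result::and_then`.
/// `T` is whatever the RPC method looked up: a block, a receipt, and so on.
pub fn check_block_gap<T>(cfg: GapCheck) -> impl Fn(Option<T>) -> Result<Option<T>, String> {
    move |response| {
        // `None` only says "not found"; the closure cannot tell whether the
        // item sits in the gap or simply does not exist yet.
        if response.is_none() && !cfg.allow_missing_blocks && cfg.db_has_gap {
            return Err("ancient blocks are not available".to_string());
        }
        Ok(response)
    }
}
```

Chained as `lookup.and_then(check_block_gap(cfg))`, the closure receives the lookup result as `response`. In this sketch a `None` receipt for a transaction that is not mined yet becomes an error whenever the database has a gap, which is the kind of behavior being questioned in this thread.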

   > {
   >   "jsonrpc": "2.0",
   >   "id": 63,
   >   "method": "eth_getTransactionReceipt",
   >   "params": [
   >     "0x158c9d7827ecfd10f243d35c1a2186a1eee1ebed5b973b81910d06573c8d09a8"
   >   ]
   > }
 <   {
 <     "jsonrpc": "2.0",
 <     "result": {
 <       "blockHash": "0x25be28b3c382ffeb146e5e7589856169fac6c94d77b06a8165ad10b717b9ef9f",
 <       "blockNumber": "0x9077",
 <       "contractAddress": "0xd9096d2473506e7aa0686d3b95dc9ab33e684bc6",
 <       "cumulativeGasUsed": "0x2e043",
 <       "from": "0xcfa3ae1840e38d1e54b0ef6300d6e91b22964a75",
 <       "gasUsed": "0x2e043",
 <       "logs": [],
 <       "logsBloom": "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
 <       "status": "0x1",
 <       "to": null,
 <       "transactionHash": "0x74293cb611600f3361bd7ec43db6d1acb8a27b76ae1db1bc6e282f18d366250c",
 <       "transactionIndex": "0x0"
 <     },
 <     "id": 60
 <   }

Here is a log (if I forced allow_missing_blocks = true).

I'm assuming some compatible representation of transaction_receipt is passed to check_block_gap as response at https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/helpers/errors.rs#L290

So why is response.is_none() true?
Is it really a transaction_receipt?

... and again, if response is expected to be a block response, then using check_block_gap is incorrect in this case and it has to be removed from https://github.com/openethereum/openethereum/blob/master/rpc/src/v1/impls/eth.rs#L826

Here is a log (if I forced allow_missing_blocks = true).

How did you do it, via configuration parameters or by modifying the code? Does the Truffle deployment work now? Are you using IPC/ws?

PS. Any reason for the close / reopen?

sec, will send a pull

PS. any reason to close / reopen ?

Touchpad problems under Sway, mainly :sweat_smile:

@adria0 Please check https://github.com/adria0/oe_stuff/pull/2
So using allow_missing_blocks = true avoids the issue with HTTP-based RPC ONLY.
All other providers (the most important ones for us) still have the issue.

PS. We have no plans to fork and fix it in the code.

Any updates here, please?

@illya-havsiyevych, in the weekly call (it is open, you can participate) we talk about which issues to work on. (Very) generally speaking, I think that we prioritize

  • Critical issues
  • EIPs to implement for the next network upgrade
  • Issues that have a general impact

Some of the issues take weeks to solve, and the queue of items grows day by day, so we are asking the community for help with this.

Is it possible for you just to use the node without --no-ancient-blocks?

In our use case we often assume a node in gw-mode is lightweight, easy and fast to boot, and --no-ancient-blocks has a huge impact here, i.e.:

  • with it, a node in a network with 17m blocks is up, ready and stops any background sync within minutes
  • without it, that takes hours

Community help: we might do it at some point, but Rust is not in our skill set, so it might take us some time to step in.

In any case, knowing a fair ETA is key. If F3-annoyance really means "you're never going to fix
that", it will be a signal for us to react faster.

I agree with you that the wiki should explain how the tasks are chosen, it will help.
F3-annoyance also means "I'm going to fix it because there are not any critical tasks to do", but I am not in this situation.
