Go-ethereum: geth slow to produce blocks with transactions, mines empty blocks

Created on 24 Nov 2020  Â·  8Comments  Â·  Source: ethereum/go-ethereum

System information

Geth version: 1.9.21-stable
OS & Version: Ubuntu 18.04 LTS

Hardware: basically a top-notch cloud provider instance (dedicated, not virtual).
2 TB SSD, 64 GB RAM (geth RAM settings: max allowed, 21 GB).
Intel Xeon CPU E5-2680 v3 @ 2.50GHz

Expected behavior

Geth node is running in mining mode.
Upon receiving a new block N, the node instantly produces new work: block N+1, empty (txs=0 fees=0).
After that, the node packs block N+1 with pending transactions, and further produces new work (say txs=123 fees=0.123 ETH).
Expected: the delay in producing a transaction-filled block should be reasonable on top hardware, say 100-200 msec.

Actual behavior

The delay in producing a transaction-filled block is on the order of 1 second.
Specifically, on a sample size of 100k blocks (~15 days)

  • ~80% of the time it takes approximately 1 second
  • ~10% it takes 2 seconds or more
  • ~10% of the time it takes < 0.2 seconds. These blocks don't seem to be special, so I don't understand what's blocking geth from producing blocks in under 0.2 seconds all the time.

geth logger messages sample

Scroll to the right, the important part is the elapsed=... measurement
INFO [11-24|20:28:59.066] Imported new chain segment blocks=1 txs=191 mgas=12.467 elapsed=150.020ms mgasps=83.103 number=11323200 hash="f22f1f…323eab" dirty=3.24GiB INFO [11-24|20:28:59.067] Commit new mining work number=11323201 sealhash="ae5470…f5a484" uncles=0 txs=0 gas=0 fees=0 elapsed="995.16µs" INFO [11-24|20:29:00.188] Commit new mining work number=11323201 sealhash="7cb550…6fe4a8" uncles=0 txs=72 gas=12468185 fees=0.4014497929 elapsed=1.121s INFO [11-24|20:29:02.211] Commit new mining work number=11323201 sealhash="514281…fa440b" uncles=0 txs=75 gas=12458990 fees=0.4658125257 elapsed=145.129ms INFO [11-24|20:29:05.250] Commit new mining work number=11323201 sealhash="0a10cd…c4c127" uncles=0 txs=161 gas=12472545 fees=0.4883934447 elapsed=183.175ms INFO [11-24|20:29:06.163] Imported new chain segment blocks=1 txs=158 mgas=12.496 elapsed=181.371ms mgasps=68.898 number=11323201 hash="42c9fe…0c77ff" dirty=3.24GiB INFO [11-24|20:29:06.165] Commit new mining work number=11323202 sealhash="9dc1cd…fe0464" uncles=0 txs=0 gas=0 fees=0 elapsed=1.492ms INFO [11-24|20:29:07.255] Commit new mining work number=11323202 sealhash="218e37…4a828e" uncles=0 txs=66 gas=12472855 fees=0.391307477 elapsed=1.091s INFO [11-24|20:29:09.049] Imported new chain segment blocks=1 txs=70 mgas=12.481 elapsed=133.089ms mgasps=93.781 number=11323202 hash="9cca69…1f3c16" dirty=3.24GiB INFO [11-24|20:29:09.051] Commit new mining work number=11323203 sealhash="088da5…3a1b9c" uncles=0 txs=0 gas=0 fees=0 elapsed=1.216ms INFO [11-24|20:29:10.215] Commit new mining work number=11323203 sealhash="bbd829…c16f8e" uncles=0 txs=72 gas=12461722 fees=0.427861034 elapsed=1.165s

geth launch command

~/go-ethereum/build/bin/geth --datadir=/my-ssd/gethdatadir --syncmode=fast --cache=32000 --maxpeers 50 --txpool.globalslots=64000 --txpool.pricelimit 20000000000 --http --http.api eth,net --ws --ws.origins='http://127.0.0.1:8546' --ws.api eth,net --miner.etherbase '0xADDRESS' --mine --miner.threads=0 &>> ~/logs.txt

bug

All 8 comments

Thanks for reporting @bogatyy. Could you paste the command your using to start geth with, so we can repro easier?

IIRC there's some built-in delay there, so we don't just send new work to the nodes again and again, on the basis that there'a non-negigible delay where updating the work package has two drawbacks:

  • A small downtime for the miner,
  • A window of time where nodes may report results for what the miner suddenly considers 'stale' work package, since he just swapped it to something else, and thus drops it.

@rjl493456442 may know more?

Thank you for taking a look!

@MariusVanDerWijden Added the geth launch command into the initial report above.

@holiman Could you point me to where the delay is implemented?
I'm tentatively skeptical that that's the issue here.

IIUC there are 3 types of "new work" block templates produced by geth nodes, in order:

  • Empty block, immediately upon importing someone else's newly mined block – desirable to avoid mining uncles (costly!)
  • Filled block, immediately upon forming it – desirable to avoid mining empty blocks (costly and bad for ecosystem)
  • Filled blocks with slightly more expensive transactions. These updates are only performed once every 2-3 seconds (lines 4-5 in my logfile), and I think that's where the delay you mentioned comes into play?

@holiman actually it's not. Basically there is no downside for switching the mining block.

  • All the mining tasks created will be cached for 7 blocks. It means even the submission is stale, it will be sent out to become an uncle.
  • I think there is no BIG downtime for miner. The switch time in the GPU level is less than 1ms. @chfast had some optimization for the mining software for switching the mining block.

And we don't have the in-buit delay. Whenever we finish packing the mining block, it's sent out immediately.

@bogatyy yes, the interval for recreating the mining block can be configured by --miner.recommit. But we don't have the deliberate delay for the first full block.

@wsnbbxy Do you have any thoughts about it? wsnbbxy is the operator of the SparkPool which uses the Geth as the mining pool node.

@bogatyy 80% more than 1s is abnormal. It may be that the value of txpool.globalslots is too large, try reducing it

@wsnbbxy Thank you! Your suggestion indeed fixed it.

  • At globalslots=64000, it took 1 second on average
  • At globalslots=4096 (default value), it takes ~250 msec on average
  • At globalslots=2048, it takes ~150 msec on average

That said, I do not understand why does it take so long. Presumably because TransactionsByPriceAndNonce needs to sort all the transactions in the mempool?
But it creates a heap out of the mempool and extracts transactions one-by-one, so the complexity should be O(txs_in_block * log_2(globalslots)) = O(100 * log_2(64000)) - that should be very little.
Maybe passing all the mempool transactions by value?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

carver picture carver  Â·  3Comments

JMaxU picture JMaxU  Â·  3Comments

VoR0220 picture VoR0220  Â·  3Comments

aakilfernandes picture aakilfernandes  Â·  3Comments

phpsamsb picture phpsamsb  Â·  3Comments