Go-ethereum: geth progress when switching to trie download

Created on 16 Apr 2020 · 9 Comments · Source: ethereum/go-ethereum

Dear ETH Community,

I am trying to sync an Ethereum node from scratch in the default (fast) mode, without success so far. To give some context, this is not my first attempt, more like the 4th or 5th. This attempt has now been running for more than 24 hours, and the earlier ones ran for a few days (3-5) before I killed them and tried again.

System information

Geth version: Geth/v1.9.13-stable-cbc4ac26/linux-amd64/go1.14.2
OS & Version: Ubuntu 18.04.4 LTS, Linux 5.3.0-45-generic, x86-64
Commit hash : none

Memory: 16 GB

$ lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               60
Model name:          Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
Stepping:            3
CPU MHz:             1197.391
CPU max MHz:         3500.0000
CPU min MHz:         800.0000
BogoMIPS:            4988.47
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
$ sudo smartctl -a /dev/sda
=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZ7TD512HAGM-000L1
Serial Number:    S151NYAF401503
LU WWN Device Id: 5 002538 500000000
Add. Product Id:  00000000
Firmware Version: DXT06L0Q
User Capacity:    512'110'190'592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Apr 18 01:17:01 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

{
  currentBlock: 9892652,
  highestBlock: 9892756,
  knownStates: 315252660,
  pulledStates: 315249558,
  startingBlock: 0
}

Here are logs that might help:
geth-0.log
geth-1.log
geth-2.log
geth-3.log
geth-4.log
geth-5.log
geth-6.log
geth-7.log

Thank you in advance for any help provided!

Best,

CedMaire

All 9 comments

There are roughly 490.000.000 state entries, and you have synced 315.249.558 of them.

Your node does not know in advance how many state entries there are, so it cannot show you this information. There are plans to improve this.
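As a rough check, the pulledStates value from the sync status in the issue can be compared against the ~490.000.000 estimate. The Go sketch below is illustrative only: syncProgress is a hypothetical helper, not part of Geth, and because the trie keeps growing while you sync, the estimate is a moving target and the percentage is approximate.

```go
package main

import "fmt"

// syncProgress is a hypothetical helper: it returns the percentage of state
// entries already pulled relative to an externally supplied estimate of the
// total. The node itself cannot know the total, so the estimate is an input.
func syncProgress(pulled, totalEstimate uint64) float64 {
	if totalEstimate == 0 {
		return 0 // no meaningful estimate available
	}
	return 100 * float64(pulled) / float64(totalEstimate)
}

func main() {
	pulled := uint64(315249558)   // pulledStates from the reported sync status
	estimate := uint64(490000000) // rough total quoted in the reply (April 2020)
	fmt.Printf("state download roughly %.1f%% done\n", syncProgress(pulled, estimate))
	// prints "state download roughly 64.3% done"
}
```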

https://github.com/ethereum/go-ethereum/issues/14647
https://eips.ethereum.org/EIPS/eip-2029
https://github.com/hayorov/ethereum-sync-metrics

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default sync mode for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks and only verifies the associated proofs of work. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case: since no transaction was executed, we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross-checked with the latest blocks. This phase is called the state trie download, and it actually runs concurrently with the block downloads; alas, it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are, however, insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts have not been tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second (about 3000 changes every 15 seconds). The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable, since it cannot cryptographically prove anything about any accounts.
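The 200-per-second figure is simple arithmetic on the per-block numbers above. The short Go sketch below redoes it and, under the same rough assumptions (about 1000 deletions and 2000 insertions per 15 second block, held constant all day), also estimates the net daily growth that makes the trie a moving target. The helper names are illustrative.

```go
package main

import "fmt"

// churnPerSecond converts per-block trie node changes into changes per second.
func churnPerSecond(deleted, added int, blockTimeSeconds float64) float64 {
	return float64(deleted+added) / blockTimeSeconds
}

// netGrowthPerDay is the net number of trie nodes added per day, assuming the
// same per-block deletions/additions and block time hold all day.
func netGrowthPerDay(deleted, added int, blockTimeSeconds float64) float64 {
	blocksPerDay := 24 * 60 * 60 / blockTimeSeconds
	return float64(added-deleted) * blocksPerDay
}

func main() {
	// Approximate figures from the comment: ~1000 deletions and
	// ~2000 insertions per ~15 second block.
	fmt.Printf("%.0f changes per second\n", churnPerSecond(1000, 2000, 15))
	fmt.Printf("%.0f net new nodes per day\n", netGrowthPerDay(1000, 2000, 15))
	// prints "200 changes per second" and "5760000 net new nodes per day"
}
```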

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless stream of "Imported new state entries [...]" logs. You'll need to wait that out too before your node truly comes online.


Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can reference up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along, most of the downloaded nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not; rather, it's discovering and downloading the trie as it goes along.
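The discovery process described above can be modelled as a breadth-first walk that starts with nothing but the root hash. The Go sketch below uses a small, hypothetical 16-ary tree; syntheticTrie and syncTrie are illustrative names, not Geth's actual sync code, which schedules requests across many peers and also has to cope with the trie changing underneath it. It shows that the processed count keeps rising while the pending count first grows and then shrinks, and that the total size only becomes known once nothing is left pending.

```go
package main

import "fmt"

// syntheticTrie is a hypothetical stand-in for the state trie: a complete
// 16-ary tree in which node n has children 16n+1 ... 16n+16 if n < internal.
// The real trie is far larger and irregular; this only models discovery.
type syntheticTrie struct {
	internal int // number of nodes that have children
}

// children returns the child IDs revealed by downloading node id.
// Leaves reveal nothing new.
func (t syntheticTrie) children(id int) []int {
	if id >= t.internal {
		return nil
	}
	kids := make([]int, 16)
	for i := range kids {
		kids[i] = 16*id + 1 + i
	}
	return kids
}

// syncTrie downloads the trie breadth-first, starting from the root only.
// After each batch it reports how many nodes were processed and how many
// known-but-not-yet-downloaded nodes are pending.
func syncTrie(t syntheticTrie, batch int, report func(processed, pending int)) int {
	pending := []int{0} // at the start we only know the state root
	processed := 0
	for len(pending) > 0 {
		n := batch
		if n > len(pending) {
			n = len(pending)
		}
		var discovered []int
		for _, id := range pending[:n] {
			discovered = append(discovered, t.children(id)...)
		}
		processed += n
		pending = append(pending[n:], discovered...)
		if report != nil {
			report(processed, len(pending))
		}
	}
	return processed
}

func main() {
	trie := syntheticTrie{internal: 273} // depth-3 tree: 1 + 16 + 256 + 4096 nodes
	total := syncTrie(trie, 384, func(processed, pending int) {
		fmt.Printf("processed=%d pending=%d\n", processed, pending)
	})
	fmt.Println("total nodes, known only at the end:", total)
}
```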

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck; you have just finished the block download phase and are waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long? I have good bandwidth.

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.

Not only is storing the data very suboptimal, but due to the 200 modifications per second and the pruning of past data, we cannot even download it in a properly pre-processed way that would make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You should, however, be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node, an SSD is your only option.

Hey, I am syncing a full Geth node with fast sync.
I am currently seeing:

INFO [05-06|08:50:11.884] Upgrading chain index                    type=bloombits                         percentage=32
INFO [05-06|08:50:12.164] Initializing fast sync bloom             items=22109422 errorrate=0.000 elapsed=20m32.846s
INFO [05-06|08:50:19.129] Imported new block headers               count=1    elapsed=87.639ms   number=10011176 hash=5733be…e3293
INFO [05-06|08:50:20.172] Initializing fast sync bloom             items=22149838 errorrate=0.000 elapsed=20m40.855s
INFO [05-06|08:50:23.304] Upgrading chain index                    type=bloombits                         percentage=32
INFO [05-06|08:50:28.175] Initializing fast sync bloom             items=22203289 errorrate=0.000 elapsed=20m48.858s

I mistakenly set my cache to 1024. Is there a way to increase it to 4096 by stopping the sync process for a while?

@sssubik Yes, sync can be stopped and resumed at will.

Hey @karalabe, thanks! I stopped the node and increased my cache size.

INFO [05-07|09:49:22.196] Imported new state entries               count=768  elapsed=7.911ms      processed=428953567 pending=11253 retry=0   duplicate=19952 unexpected=74470
INFO [05-07|09:49:22.676] Imported new state entries               count=384  elapsed=3.031ms      processed=428953951 pending=13183 retry=0   duplicate=19952 unexpected=74470
INFO [05-07|09:49:24.278] Imported new state entries               count=765  elapsed=4.512ms      processed=428954716 pending=14090 retry=0   duplicate=19952 unexpected=74470
INFO [05-07|09:49:25.150] Imported new state entries               count=837  elapsed=6.031ms      processed=428955553 pending=14142 retry=3   duplicate=19952 unexpected=74470
INFO [05-07|09:49:26.029] Imported new state entries               count=837  elapsed=7.429ms      processed=428956390 pending=14348       

I have been getting this for a couple of days of syncing now. The "pending" count increases and then decreases again; on average it stays the same. What is the issue?
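One way to judge whether such a sync is progressing is to watch the processed counter rather than pending. The Go sketch below parses the first and last of the log lines quoted above and computes the average number of entries imported per second. The helpers parseStateLine and entriesPerSecond are hypothetical; the parser assumes the exact field layout shown in this excerpt, which may differ across Geth versions, and it ignores the year because these log timestamps do not include one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// stateLog holds the fields we care about from one log line.
type stateLog struct {
	at        time.Time
	processed uint64
	pending   uint64
}

// parseStateLine extracts the timestamp and the processed/pending counters
// from an "Imported new state entries" line in the format quoted above.
func parseStateLine(line string) (stateLog, error) {
	var out stateLog
	start, end := strings.Index(line, "["), strings.Index(line, "]")
	if start < 0 || end < start || !strings.Contains(line, "Imported new state entries") {
		return out, fmt.Errorf("not a state import line: %q", line)
	}
	ts, err := time.Parse("01-02|15:04:05.000", line[start+1:end])
	if err != nil {
		return out, err
	}
	out.at = ts
	var gotProcessed, gotPending bool
	for _, field := range strings.Fields(line[end+1:]) {
		key, value, ok := strings.Cut(field, "=")
		if !ok {
			continue // plain words such as "Imported"
		}
		switch key {
		case "processed":
			out.processed, err = strconv.ParseUint(value, 10, 64)
			gotProcessed = err == nil
		case "pending":
			out.pending, err = strconv.ParseUint(value, 10, 64)
			gotPending = err == nil
		}
		if err != nil {
			return out, err
		}
	}
	if !gotProcessed || !gotPending {
		return out, fmt.Errorf("missing processed/pending in %q", line)
	}
	return out, nil
}

// entriesPerSecond is the average import rate between two log lines.
func entriesPerSecond(first, last stateLog) float64 {
	secs := last.at.Sub(first.at).Seconds()
	if secs <= 0 || last.processed < first.processed {
		return 0
	}
	return float64(last.processed-first.processed) / secs
}

func main() {
	first, err := parseStateLine("INFO [05-07|09:49:22.196] Imported new state entries count=768 elapsed=7.911ms processed=428953567 pending=11253 retry=0 duplicate=19952 unexpected=74470")
	if err != nil {
		panic(err)
	}
	last, err := parseStateLine("INFO [05-07|09:49:26.029] Imported new state entries count=837 elapsed=7.429ms processed=428956390 pending=14348")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.0f entries/s, pending %d -> %d\n", entriesPerSecond(first, last), first.pending, last.pending)
	// prints "736 entries/s, pending 11253 -> 14348"
}
```

In this short excerpt the processed counter rises steadily even though pending goes up, which is what the breadth-first discovery described earlier in the thread would produce.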

Hi @karalabe - I just wanted to check in and ask for your thoughts on the latest situation for syncing a mainnet node from scratch? I started syncing 14 days ago and I am seeing everything as described in the thread here.

I am just wondering whether it's now the norm that a sync takes 14+ days and still doesn't complete the state trie phase. Naturally, I am concerned about how long the sync is taking. I've checked everything I can think of (SSD read/write speeds, bandwidth, CPU, etc.), and all seem to meet the recommended specs.

Interestingly, I fired up a mainnet node using a service from QuikNode and that synced in just 6 hours - so I guess they are syncing with their own private nodes to spin up a dedicated node.

If you have any suggestions or know of a case where the sync was faster than what I am experiencing, that would be really helpful so that we can share the process.

Syncing our AWS nodes generally takes around 8-9 hours. Currently fast sync is quite upload and latency intensive, meaning it gets super slow if your upload bandwidth is small, your latency is large (e.g. satellite and co). SSD IOPS are also important.

We have a prototype of a new sync algorithm that should be lightyears better, but there are still some things I need to stabilize before doing a phased release.

Thank you @karalabe for your comments, and it's great to hear that you are syncing a node in AWS in about 8-9 hours. What specifications (using EC2, I presume?) are you using to do that?

Very exciting also to hear about the new sync algo developments. Many thanks!

@karalabe I finally managed to sync a mainnet node in around 15 hours using an i3.2xlarge AWS EC2 instance (8 cores, 61 GiB RAM, 1.9 TiB NVMe SSD). It's way larger than what is needed, but the main point is that it's part of the family of high I/O compute instances offered by AWS. I also selected that instance type based on the Geth team's benchmarks reported here.

I started geth with the command: geth --syncmode=fast --nousb --cache=4096 --maxpeers=50 (I think these options are mostly defaults anyway).

...about 15 hours later, the logs read "Fast sync complete, auto disabling", which was great to see! :)
