Go-ethereum: full sync / archive node - never finishes syncing

Created on 2 Mar 2018 · 24 Comments · Source: ethereum/go-ethereum

I have been trying to get geth (1.8.1) to sync in full archive mode with the following flags:
--gcmode=archive --syncmode=full

Geth version: v1.8.1
OS & Version: Linux

I tried it on several different server configurations; the last try was on a machine with 96 GB of RAM and a 2 TB SSD. It seems to get more or less stuck after a week. At least, progress slows down so much that I do not expect it to finish within the next few months. A plain fast sync is no problem and finishes in less than a day. But I need the archive data.

Any hints on how to get a geth node fully synced in archive mode?

For my application I do not need all historical states; let's say the last 1 million blocks would be enough.
Any hints on how to force geth to do a fast sync to e.g. block 4m and then continue in archive mode?

All 24 comments

use --syncmode=light

Thanks, but I don't think this will help me. Syncmode light has even less information than fast. I need the full historical data because I want to do some statistical analysis on the internal transactions of some accounts. As the timespan is around 1.2 million blocks, I think the best option is to have all the data on my machine and fetch it directly from my geth.

@fishpepper, I am running a full node now for a similar analysis purpose. It takes 1-2 weeks but works for me. The big issue I find is having a very fast and stable Internet connection. My server is much smaller, with 16 GB of RAM.

@fishpepper I am (probably) trying to do the same as you. I'm using geth 1.7 with the syncmode = full option and it is syncing. Very slowly, but it is. It has been running for almost 2 weeks now, at 700 GB and still around block 5012253. So far, I have been able to get all the historical data I needed. Let me know if you want to talk further. Maybe we can help each other.

Might be related to #16202

I started a fresh node with --gcmode=archive --syncmode=full to help. Here are some numbers so far (not yet at 100%):

timestamp,block_number,block_timestamp
2018-03-03T18:18:42 UTC,0,
2018-03-04T13:21:22 UTC,2425291,2016-10-12 07:27:22
2018-03-06T12:31:05 UTC,3687600,2017-05-11 08:14:17
2018-03-07T07:40:54 UTC,4002766,2017-07-10 11:09:12

Updated info from the same node, if it helps. It can take a week (=

2018-03-03T18:18:42 UTC,0,
2018-03-04T13:21:22 UTC,2425291,2016-10-12 07:27:22
2018-03-06T12:31:05 UTC,3687600,2017-05-11 08:14:17
2018-03-07T07:40:54 UTC,4002766,2017-07-10 11:09:12
2018-03-09T00:27:49 UTC,4274116,2017-09-14 19:28:56
2018-03-09T09:06:57 UTC,4316963,2017-09-27 19:13:19

Seems like it does indeed proceed but very slowly. After 12 days on the 96gig ram machine I am finally at block 4732597... I will report back how long it takes to finish.

I am considering hosting a copy of the geth data with full history on S3. Would you be interested in such a file, @fishpepper?

hey @ycdk , out of curiosity, why are you hosting on S3? Just for backup? I thought I'd store the files on S3 to save on block storage but (I believe) there is no way to make geth work with the blockchain data on S3, right?

@dmenin, I mean a tgz backup of the geth folder for others to share. I think S3 FUSE or NFS is possible but would be too slow for many use cases :-1:

@ycdk I'm interested as well

Any hints on how to get a geth node fully synced in archive mode?

@fishpepper You can try this: sync a node in "fast" mode and keep it running; use another node in "full" (archive) mode, running without peer discovery, with the first node as the only peer.
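A minimal sketch of the two-node setup suggested above, assuming both nodes run on the same machine. The data directories, port numbers, and enode URL are placeholders (assumptions, not from this thread). `--nodiscover` and `--maxpeers` are standard geth flags, and `admin.addPeer` is the console call for adding a peer by hand. The script only prints the commands; it does not start anything.

```python
import shlex


def fast_node_cmd(datadir, port=30303):
    """Command for the first node: a normal fast sync with peer discovery on."""
    return ["geth", f"--datadir={datadir}", "--syncmode=fast", f"--port={port}"]


def archive_node_cmd(datadir, port=30304):
    """Command for the second node: full/archive sync, discovery disabled."""
    return [
        "geth",
        f"--datadir={datadir}",
        "--syncmode=full",
        "--gcmode=archive",
        "--nodiscover",   # do not look for peers on the network
        "--maxpeers=1",   # only the manually added fast node
        f"--port={port}",
    ]


def add_peer_js(enode_url):
    """Console snippet to run on the archive node once the fast node is up."""
    return f'admin.addPeer("{enode_url}")'


# Placeholder values: adjust paths, ports, and the enode URL to your setup.
# The real enode URL is printed by the fast node at startup.
print(shlex.join(fast_node_cmd("/data/geth-fast")))
print(shlex.join(archive_node_cmd("/data/geth-archive")))
print(add_peer_js("enode://<fast-node-id>@127.0.0.1:30303"))
```

The idea is that the archive node then downloads blocks from a local peer instead of from the network. Whether this actually shortens the sync depends on where the bottleneck is; later comments in this thread point at disk I/O.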

Any hints on how to force geth to do a fast sync to e.g. block 4m and then continue in archive mode?

I don't think this is available yet. Would make a good "feature request" issue IMO.

So timing is complete for this run: ~19 days (about 3 weeks) to sync.

2018-03-03T18:18:42 UTC,0,
2018-03-04T13:21:22 UTC,2425291,2016-10-12 07:27:22
2018-03-06T12:31:05 UTC,3687600,2017-05-11 08:14:17
2018-03-07T07:40:54 UTC,4002766,2017-07-10 11:09:12
2018-03-09T00:27:49 UTC,4274116,2017-09-14 19:28:56
2018-03-09T09:06:57 UTC,4316963,2017-09-27 19:13:19
2018-03-12T02:10:02 UTC,4674202,2017-12-04 12:35:45
2018-03-14T00:31:07 UTC,4799680,2017-12-26 09:17:07
2018-03-18T12:37:30 UTC,5046756,2018-02-07 11:24:43
2018-03-22T10:59:56 UTC,5300736,2018-03-22 10:56:50
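To make the slowdown in these numbers easier to see, here is a small sketch that turns the checkpoints above into blocks-per-hour figures for each interval. The data is copied from the table; the function names are my own.

```python
from datetime import datetime

# (wall-clock time in UTC, block number), copied from the table above
CHECKPOINTS = [
    ("2018-03-03T18:18:42", 0),
    ("2018-03-04T13:21:22", 2425291),
    ("2018-03-06T12:31:05", 3687600),
    ("2018-03-07T07:40:54", 4002766),
    ("2018-03-09T00:27:49", 4274116),
    ("2018-03-09T09:06:57", 4316963),
    ("2018-03-12T02:10:02", 4674202),
    ("2018-03-14T00:31:07", 4799680),
    ("2018-03-18T12:37:30", 5046756),
    ("2018-03-22T10:59:56", 5300736),
]


def parse(checkpoints):
    """Convert timestamp strings into datetime objects."""
    return [(datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S"), n) for ts, n in checkpoints]


def sync_rates(checkpoints):
    """Return (start_block, end_block, blocks_per_hour) for each interval."""
    points = parse(checkpoints)
    rates = []
    for (t0, b0), (t1, b1) in zip(points, points[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        rates.append((b0, b1, (b1 - b0) / hours))
    return rates


def total_days(checkpoints):
    """Wall-clock days between the first and last checkpoint."""
    points = parse(checkpoints)
    return (points[-1][0] - points[0][0]).total_seconds() / 86400


for start, end, rate in sync_rates(CHECKPOINTS):
    print(f"{start:>8} -> {end:>8}: {rate:10.0f} blocks/hour")
print(f"total: {total_days(CHECKPOINTS):.1f} days")
```

The rate falls from over 100,000 blocks per hour in the first interval to a few thousand per hour in the last ones, so most of the wall-clock time is spent on the final ~1 million blocks.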

@ycdk if you are still planning to host a copy of the geth data, let me know.
It seems I have no choice but to run an archive node for a few tests, and I'm not looking forward to 3 weeks of syncing... ;)

Hi - I did see one important thing. SSDs make a huge difference in sync speed compared to HDDs, at least up to block 3.9M. With an HDD it was going at about 20% of the speed of an SSD. After block ~3.9M they seem to be equally slow.

I'm 18 days in, on block 5035692, for my archival sync. The geth client outputs a status update every 8 seconds (from looking at the source on GitHub), and there are spans where I only confirm 1 block, and others where I confirm 5-8 in those seconds. Not sure if the slow patches are because of the increased interest in Eth from Nov '17 to Jan '18, or CryptoKitties, or if it's just what peers have available for me to confirm, but yeah, the number of blocks I'm adding per hour is pretty variable and never very large.
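As a rough worked example of what those status lines mean: 1 block per 8-second update is 450 blocks per hour, and 8 blocks per update is 3,600 per hour. The sketch below also subtracts the blocks the chain itself produces in the meantime; the 15-second block time is my assumption for illustration, not a figure from this thread.

```python
STATUS_INTERVAL_S = 8     # seconds between status lines, as reported above
CHAIN_BLOCK_TIME_S = 15   # ASSUMPTION: rough mainnet block time, not from the thread


def blocks_per_hour(blocks_per_update, interval_s=STATUS_INTERVAL_S):
    """Import rate implied by the number of blocks seen per status update."""
    return blocks_per_update * 3600 / interval_s


def net_gain_per_day(blocks_per_update,
                     interval_s=STATUS_INTERVAL_S,
                     chain_block_time_s=CHAIN_BLOCK_TIME_S):
    """Blocks gained on the chain head per day (imported minus newly produced)."""
    imported = blocks_per_hour(blocks_per_update, interval_s) * 24
    produced = 86400 / chain_block_time_s
    return imported - produced


for n in (1, 5, 8):
    print(f"{n} block(s)/update: {blocks_per_hour(n):.0f} blocks/hour, "
          f"net gain {net_gain_per_day(n):.0f} blocks/day")
```

Under that assumption, during the slow patches the node gains on the chain head by only about half of what it imports, which fits the impression that the sync barely moves.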

I'm using an overpowered VM on GCP, with an SSD and a lot of idle cores, running with --cache=8192. Geth never pushes my load average over ~1.8 no matter the settings, so honestly more than a dual-core machine doesn't even seem to matter.

I don't suppose there's anything that can be done to speed up an archival sync? I've restarted the client before to see if I can get better peers or something but that doesn't seem to do much. Would opening an external port like 30303 help?

I can confirm that I experienced the same behaviour. I just stopped my sync at block 5211425 because I'm reaching 1 TB of data, and what I have should be enough for what I need to do. In terms of machine power, I think you are right as well: I'm using an AWS m4.xlarge and it is largely underused.

For anyone else who comes here, AWS instances like m4.xlarge are much slower than instances like i3.2xlarge because of the latency of accessing storage. If you look at the CPU's iowait times, that should become clear (use iostat on Ubuntu, in the sysstat package).
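If iostat is not installed, the same iowait figure can be approximated on Linux from two samples of /proc/stat. This is a minimal sketch, not a replacement for iostat; the function names are my own, and it assumes the standard field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Share of CPU time spent waiting on I/O between two samples (0.0 to 1.0)."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first 8 counters: guest time (fields 9 and 10) is
    # already included in user/nice and would be double counted.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


def measure(interval_s=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, interval_s apart, and return the iowait share."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_fraction(before, after)
```

Calling `measure(5.0)` while geth is syncing gives the iowait share over five seconds. A consistently high value alongside idle CPU would suggest that storage latency, not CPU, is the limit.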

That said, I still see both Geth and Parity underutilise resources on large machines; even with an i3.2xlarge there's idle CPU even though caches are bumped up and AFAIK blocks are being cached in advance. Not sure what the bottleneck is 😡

This issue was partially already solved by introducing a read cache in one of the recent releases since it was opened (https://github.com/ethereum/go-ethereum/pull/18087).

A further tweak is pending merge that completely removes the write cache for archive mode, granting more memory for the read cache (https://github.com/ethereum/go-ethereum/pull/18991).

A last fix is also coming that will not just blindly flush the dirty trie nodes to disk, but also store some of them (not sure how many) in the read cache.

All in all there's only so much we can do before sync time gets throttled by disk IO, but these changes should definitely help. Will keep this thread updated when things get merged.

As of the 1.9 release of geth, we have massively improved the ability to run an archive node in particular. I don't think this ticket is productive anymore, since it's mostly about historic releases. It's probably better to open a new ticket if this is still a problem.
(Of course, it's still difficult to sync, but it's an ongoing problem that we're always actively working on.)

I have a fully synced Ethereum node with sync mode full, updated to the latest version of geth. Is there any way to upgrade it to an archive node without having to start from scratch?
@karalabe

@Sreeram1993
I have the same problem.
Have you solved it?

@Sreeram1993 I'm currently trying to do something like this. Do you have any ideas?
