Geth version: 1.9.7
OS & Version: Linux (Kubernetes on Azure) - running clique/POA
Commit hash : stable
After a transient failure to write blocks to freezer the node restarts (as expected), and should be able to sync with it's peers and catch up to the rest of the chain.
Node truncates freezer to a very old block (due to a missing full block) and when it receives a batch of blocks from a peer, it is unable to sync those blocks with a missing parent error. As seen in the logs below, the node experiences a temporary failure to flush entries to the freezer. On a restart, it finds that:
15057124057233605This is a non-archive node, but even then it is strange that the most recent full block should be so far behind. This results in the freezer getting truncated and the node attempts to sync the most recent blocks. However, that action fails because of a missing parent error on a batch of 26 most recent blocks. The same error keeps repeated across a rolling window of blocks. Eventually, we were able to recover the node only after performing a fresh sync (wiping all chain data).
INFO [12-23|22:44:14.269] Deep froze chain segment blocks=12 elapsed=179.503ms number=150533 hash=d41e42…b264c1
INFO [12-23|22:45:14.452] Deep froze chain segment blocks=12 elapsed=182.750ms number=150545 hash=103e90…ef8568
INFO [12-23|22:46:42.021] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=2.394ms mgasps=0.000 number=240564 hash=b5f25c…030f1b dirty=87.87KiB
INFO [12-23|22:46:47.022] Imported new chain segment blocks=1 txs=0 mgas=0.000 elapsed=2.617ms mgasps=0.000 number=240565 hash=fe3a1b…a8e538 dirty=87.87KiB
CRIT [12-23|22:47:21.184] Failed to flush frozen tables err="[sync /qdata/ethereum/geth/chaindata/ancient/headers.cidx: bad file descriptor]"
WARN [12-23|22:47:27.767] Head state missing, repairing chain number=240571 hash=ceb1b7…c0d84f
INFO [12-23|22:47:27.932] Deep froze chain segment blocks=13 elapsed=695.626ms number=150570 hash=3cb347…a1c6bd
DEBUG[12-23|22:48:28.127] Ancient blocks frozen already number=240571 hash=ceb1b7…c0d84f frozen=150571
DEBUG[12-23|22:49:28.128] Ancient blocks frozen already number=240571 hash=ceb1b7…c0d84f frozen=150571
DEBUG[12-23|22:50:28.128] Ancient blocks frozen already number=240571 hash=ceb1b7…c0d84f frozen=150571
INFO [12-23|22:50:40.494] Rewound blockchain to past state number=33604 hash=8bdb53…4cef1b
INFO [12-23|22:50:40.638] Loaded most recent local header number=240571 hash=ceb1b7…c0d84f td=240572 age=3m23s
INFO [12-23|22:50:40.638] Loaded most recent local full block number=33604 hash=8bdb53…4cef1b td=33605 age=1w4d23h
INFO [12-23|22:50:40.638] Loaded most recent local fast block number=240571 hash=ceb1b7…c0d84f td=240572 age=3m23s
WARN [12-23|22:51:11.596] Truncating freezer table database=/qdata/ethereum/geth/chaindata/ancient table=diffs items=150571 limit=33605
WARN [12-23|22:51:11.750] Truncating freezer table database=/qdata/ethereum/geth/chaindata/ancient table=headers items=150571 limit=33605
WARN [12-23|22:51:11.866] Truncating freezer table database=/qdata/ethereum/geth/chaindata/ancient table=hashes items=150571 limit=33605
WARN [12-23|22:51:11.953] Truncating freezer table database=/qdata/ethereum/geth/chaindata/ancient table=bodies items=150571 limit=33605
WARN [12-23|22:51:12.044] Truncating freezer table database=/qdata/ethereum/geth/chaindata/ancient table=receipts items=150571 limit=33605
DEBUG[12-23|22:51:13.280] Inserting downloaded chain items=49 firstnum=240566 firsthash=6180ef…9759bf lastnum=240614 lasthash=71d5bb…52d486
DEBUG[12-23|22:51:13.606] Pruned ancestor, inserting as sidechain number=240566 hash=6180ef…9759bf
DEBUG[12-23|22:51:13.619] Injected sidechain block number=240572 hash=ade660…1158f5 diff=1 elapsed=127.799µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.622] Injected sidechain block number=240573 hash=e9e493…494f1e diff=1 elapsed=96.699µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.624] Injected sidechain block number=240574 hash=81aa3d…1cde52 diff=1 elapsed=101.899µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.626] Injected sidechain block number=240575 hash=bc0ce7…617837 diff=1 elapsed=86.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.628] Injected sidechain block number=240576 hash=3bbdea…d91170 diff=1 elapsed=85.099µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.630] Injected sidechain block number=240577 hash=7b80a6…ebc272 diff=1 elapsed=148.099µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.633] Injected sidechain block number=240578 hash=3ff37a…36382c diff=1 elapsed=91.699µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.635] Injected sidechain block number=240579 hash=dfebcb…78379f diff=1 elapsed=96.199µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.637] Injected sidechain block number=240580 hash=a02165…a74d1c diff=1 elapsed=79.799µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.639] Injected sidechain block number=240581 hash=3b4cc0…7e87d5 diff=1 elapsed=108.499µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.641] Injected sidechain block number=240582 hash=cde4a8…e2983f diff=1 elapsed=87.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.644] Injected sidechain block number=240583 hash=0146ad…71b09e diff=1 elapsed=111.199µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.646] Injected sidechain block number=240584 hash=2d2e1e…bd08dd diff=1 elapsed=138.599µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.648] Injected sidechain block number=240585 hash=b35b4b…c04883 diff=1 elapsed=111.099µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.651] Injected sidechain block number=240586 hash=a38f0e…68a7af diff=1 elapsed=88.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.653] Injected sidechain block number=240587 hash=19247a…2c90f0 diff=1 elapsed=78.799µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.655] Injected sidechain block number=240588 hash=391dda…476f21 diff=1 elapsed=106.498µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.657] Injected sidechain block number=240589 hash=46463c…70c966 diff=1 elapsed=150.098µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.660] Injected sidechain block number=240590 hash=aee9e4…e46c1e diff=1 elapsed=63.799µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.661] Injected sidechain block number=240591 hash=efe965…ea8c03 diff=1 elapsed=68.099µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.664] Injected sidechain block number=240592 hash=b1b592…92e578 diff=1 elapsed=63.6µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.666] Injected sidechain block number=240593 hash=da4bd5…5b6677 diff=1 elapsed=95.399µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.668] Injected sidechain block number=240594 hash=71b9b8…1f1687 diff=1 elapsed=66.5µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.670] Injected sidechain block number=240595 hash=8bf6a9…553deb diff=1 elapsed=79.2µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.672] Injected sidechain block number=240596 hash=d0cf30…3a04b1 diff=1 elapsed=98.499µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.674] Injected sidechain block number=240597 hash=1bfb25…5bce90 diff=1 elapsed=83.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.676] Injected sidechain block number=240598 hash=f69b76…a99954 diff=1 elapsed=83.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.679] Injected sidechain block number=240599 hash=04c285…abf8e1 diff=1 elapsed=91.599µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.681] Injected sidechain block number=240600 hash=d48283…acdab2 diff=1 elapsed=77.3µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.683] Injected sidechain block number=240601 hash=55bb47…6692de diff=1 elapsed=84.999µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.685] Injected sidechain block number=240602 hash=f7de83…70ba11 diff=1 elapsed=67.699µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.690] Injected sidechain block number=240603 hash=02b770…dd89fd diff=1 elapsed=55.1µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.694] Injected sidechain block number=240604 hash=389ac9…1f111c diff=1 elapsed=67.199µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.698] Injected sidechain block number=240605 hash=542fc1…83922e diff=1 elapsed=57.499µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.702] Injected sidechain block number=240606 hash=c5c2e3…b1c1eb diff=1 elapsed=77.899µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.706] Injected sidechain block number=240607 hash=6dcb2c…dfc10f diff=1 elapsed=81.599µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.710] Injected sidechain block number=240608 hash=052e05…9c5a5f diff=1 elapsed=116.699µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.714] Injected sidechain block number=240609 hash=a391d4…6ba0bc diff=1 elapsed=59.1µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.717] Injected sidechain block number=240610 hash=90cfcb…d9012e diff=1 elapsed=187.199µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.721] Injected sidechain block number=240611 hash=06fb4d…8cadee diff=1 elapsed=53.1µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.725] Injected sidechain block number=240612 hash=6affa9…095b4a diff=1 elapsed=67.699µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.729] Injected sidechain block number=240613 hash=a678cf…ecda76 diff=1 elapsed=101.899µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
DEBUG[12-23|22:51:13.732] Injected sidechain block number=240614 hash=71d5bb…52d486 diff=1 elapsed=86.899µs txs=0 gas=0 uncles=0 root=86bc5d…3ef70d
INFO [12-23|22:51:14.035] Network syncing, will start miner afterwards
DEBUG[12-23|22:51:18.023] Downloaded item processing failed on sidechain import index=49 err="missing parent"
DEBUG[12-23|22:51:18.023] Reset ancient limit to zero
DEBUG[12-23|22:51:18.023] Synchronisation terminated elapsed=4.751s
DEBUG[12-23|22:51:18.023] Reset ancient limit to zero
WARN [12-23|22:51:18.023] Synchronisation failed, dropping peer peer=9ef106b661dd09ab err="retrieved hash chain is invalid"
DEBUG[12-23|22:51:18.023] Removing Ethereum peer peer=9ef106b661dd09ab
We do not know what causes the original issue with flushing contents to freezer DB. However, if one could re-create such a failure via a unit test, it should be pretty straight forward to see the rest of the failure.
None
Maybe this is instigated by containerization/k8s? Are you using kubectl proxy by any chance?
No, we are not. In general, there is no failure to sync blocks when adding new nodes to an existing chain (even one which is > 90k blocks). So direct and ongoing push to the freezer DB works just fine. The CRIT error when syncing data to the freezer could be due to the underlying filesystem -- but that shouldn't mean that a subsequent recovery/catch up isn't possible, should it?
Well if you're hosting a node, you have to sync (which requires downloading things off the Ethereum blockchain). So a failure in the underlying filesystem could instigate an error in syncing. So if the filesystem is broken/there are issues with it, then a subsequent recovery would probably error out as well, right?
I mentioned in my initial summary that we are running a private chain running clique/POA. Also a re-sync of the same node works just fine (stop geth, wipe data and sync again)
This is the root of you problems:
CRIT [12-23|22:47:21.184] Failed to flush frozen tables err="[sync /qdata/ethereum/geth/chaindata/ancient/headers.cidx: bad file descriptor]"
Apparently, this failure used the CRIT log level, which is not good -- since it causes an abnormal shutdown, which can cause further corruption.
I suspect that the missing parent error comes from a mismatch between what's in leveldb versus what's in ancient db.
Btw, you're running a fairly old release (20 releases old). I think it's probably time to upgrade before trying again :)
Oh, I just saw that you reported this against v1.9.7.
In https://github.com/ethereum/go-ethereum/pull/21409 , which was merged for 1.9.20, @karalabe reworked a lot of the repair cases, to ensure that geth properly handled all the quirky scenarios that can happen on an abrupt shutdown.
Whatever happened on 1.9.7, it's not really actionable, because it's too old, and with the major refactorings done since then, there's really no point in digging further on this issue.
I'll close this, recommend that you use a more recent version of geth, and if you experience a similar issue again, please file a new ticket.
Oh and I just saw that @karalabe already commented just that :/
Most helpful comment
Btw, you're running a fairly old release (20 releases old). I think it's probably time to upgrade before trying again :)