Hoping I can get some help with this issue:
Geth
Version: 1.9.13-stable
Architecture: amd64
Protocol Versions: [65 64 63]
Go Version: go1.13.4
Operating System: linux
Running a private network with 2 nodes on AWS servers - one region went down, and when I try to bring the node back up I get this error:
Error: invalid mix digest
WARN [01-02|12:32:27.063] Synchronisation failed, dropping peer peer=8538963b9b1151ef err="retrieved hash chain is invalid"
The other node is working fine, creating and verifying contracts - but I really need more nodes running and synchronizing.
I tried to export and import the chaindata from the running node, but as soon as I start the second node I get "invalid chain data".
Is there any way to recover from this?
Please provide more information. E.g. what do you mean by "one region went down" - did geth have an unclean shutdown?
Yes there was an unclean shutdown - the EC2 instance terminated.
Unfortunately you will most likely need to resync then
I am happy to resync, but when I do, it stops with the "invalid chain data" message. I tried full, fast and light sync, plus export/import - the second node just does not want to sync.
So you deleted the data-dir and it still does not want to sync?
Yes, correct - completely restarted the second node from geth init genesis.json. It was syncing fine and then hit this error.
the other node (the only one) is working fine - mining and verifying
did you delete the data dir?
Seems something is wrong with your cache - I think init does not delete it. Also, do you use --ethash.cachedir or --ethash.dagdir?
Ideally specify all your CLI arguments
Yes, I deleted the data dir. Full CLI:
nohup geth --datadir ./datadir --keystore ./keystore --networkid=88888001 --nodiscover --syncmode=full --maxpeers 3 --nousb --verbosity 3 --cache=2048 --ethash.dagdir=./.ethash &
can you remove --ethash.dagdir=./.ethash and see if it works then?
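E.g. the same command minus that flag (Geth should then fall back to its default DAG directory, ~/.ethash on Linux):
nohup geth --datadir ./datadir --keystore ./keystore --networkid=88888001 --nodiscover --syncmode=full --maxpeers 3 --nousb --verbosity 3 --cache=2048 &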
Is it safe to delete the dagdir? I only have one node running now, with production smart contracts - I cannot afford to corrupt it.
@atrana make a test environment for your node
actually just did it now - removed the dagdir and restarted... looking ok so far
ok cool. Best of luck! @atrana
It started and regenerated the DAGs fine, and individually the node is working, but as soon as I add a peer and start syncing I get this error:
INFO [01-04|15:00:02.345] Looking for peers peercount=0 tried=1 static=1
ERROR[01-04|15:00:02.549]
Chain config: {ChainID: 88888001 Homestead: 0 DAO:
Number: 12122670
Hash: 0x5a29e4da0cfce27ac3eca41203c7d854eb3c4c47f73279dc6f96cef68a7ac63d
Error: invalid mix digest
WARN [01-04|15:00:02.567] Synchronisation failed, dropping peer peer=8538963b9b1151ef err="retrieved hash chain is invalid"
I would recommend doing a disk-check. I suspect that the DAG data is corrupt, and since it got corrupt again it seems likely that it's due to a disk issue.
We could also verify this: if you post a shasum of the files in the dagdir / cachedir, we can compare against the correct versions.
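Something like this - the file names below are placeholders, the real ones encode the ethash revision and seed hash:
shasum ./.ethash/full-R23-*                # DAG files
shasum ./datadir/geth/ethash/cache-R23-*   # cache files, assuming the default cache dir under the datadir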
Oh wait -- what engine are you using? I can't believe you ran ethash for 12122670 blocks?
Also, 1.9.13 suffers from a bug in mining ethash: https://github.com/ethereum/go-ethereum/security/advisories/GHSA-v592-xf75-856p . If you're at block 12M, you would definitely hit it.
We currently run 2 nodes on a private network - one of them is also constantly mining. Is there another way to ensure transactions and smart contracts are processed on demand?
Normally people use clique proof-of-authority networks, instead of ethash proof-of-work mining.
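A clique genesis looks roughly like this - chainId taken from your setup, the signer address in extraData is a placeholder:
{
  "config": {
    "chainId": 88888001,
    "clique": {
      "period": 5,
      "epoch": 30000
    }
  },
  "difficulty": "1",
  "gasLimit": "8000000",
  "extraData": "0x<64 zeros><signer address, 40 hex chars><130 zeros>",
  "alloc": {}
}
The extraData packs 32 vanity bytes, then the list of initial signer addresses, then 65 bytes reserved for the seal.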
It may well be that the mining node hit the 4gb threshold quite some while ago, and has mined "bad blocks" for a while now.. Which means that you're in a pretty bad situation, basically. Either you need to keep the mining bug in there, or throw away a few million blocks...
I think this error has been around since 23 Dec. I can see in the logs that the peer was dropped back then. We didn't pick up on the problem until 1st Jan (peeps being away on holidays)
I am happy to rewind back a few million blocks - how do I do that?
I have just been reading more on PoA networks and I think this suits us better, as our network has no value but requires proof of transactions. I think when we started this nearly 2 years ago, clique was not available in geth (or maybe we overlooked it). Is there any way to migrate from PoW to PoA?
No, there's no way to migrate, you'd have to start over from scratch. That said, it's possible to populate the genesis alloc with arbitrary state: balances, code and storage.
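E.g. something like this in the new genesis.json - the address, code and storage values are just placeholders:
"alloc": {
  "0x0123456789012345678901234567890123456789": {
    "balance": "0xde0b6b3a7640000",
    "code": "0x6080604052...",
    "storage": {
      "0x0000000000000000000000000000000000000000000000000000000000000000": "0x0000000000000000000000000000000000000000000000000000000000000001"
    }
  }
}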
Sounds like you'd have to rewrite your code for that.
We might bring up a clique network for all new customers (we are a small fintech startup) and slowly migrate existing customers over. But for now, coming back to the question above - how do I throw away the "bad blocks" so that I can upgrade to v1.9.24 and start mining and syncing again?
A full-sync will verify every header, and get stuck on the first bad one. A fast-sync will not verify the PoW on every header, so it might "land" somewhere after a bad block.
So first of all, you need to figure out which the first bad block is. After that, you can do a setHead to that block, or full-sync to that point and then start mining
You could also, with some custom code, ensure that every header's PoW is verified during fast-sync, by changing fsHeaderCheckFrequency to 1 in downloader.go. Then a fast-sync would pinpoint the first bad block.
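If I remember the 1.9.x tree right, it's this constant in eth/downloader/downloader.go:
fsHeaderCheckFrequency = 100 // Verification frequency of the downloaded headers during fast sync
Change the 100 to 1 and the downloader verifies the PoW of every header instead of every 100th.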
Thanks @holiman - so from the console I would use debug.setHead('#blocknum')?
Yes. I can't guarantee it will work perfectly, but that's the way to do it. It's obviously not something that is considered part of the normal use case -- it's a last-ditch approach to correcting a bad error, but @karalabe put a lot of effort into making setHead behave correctly.
It probably requires a restart after the operation is finished.
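From the attached console it would look something like this - the block number goes in as a hex string, 0xB71B00 (12,000,000) is just an example:
> debug.setHead("0xB71B00")
null
> eth.blockNumber
12000000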
@atrana Regarding the invalid mix digest, I'm almost sure it's because of the 4GB DAG issue. You used the old Geth to generate the DAG, which becomes invalid once the DAG size exceeds 4GB.
You can use the latest released Geth to regenerate the DAG. Then, with the correct DAG, apply the approach from @holiman to wipe all the mined bad blocks.
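If you don't want to wait for the generation at node startup, geth also has a makedag subcommand (going from memory on the exact usage, check geth makedag --help) - block number and output dir here are just your values as an example:
geth makedag 12122670 ./.ethash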
Thanks for the advice, guys. These are the steps I am planning to take over the weekend:
1. Upgrade both nodes to the latest Geth release.
2. Delete the old DAG dir and regenerate the DAG with the new version.
3. Find the first bad block via a full sync on the second node.
4. debug.setHead to the block before it on the mining node, then restart.
5. Resync the second node and start mining again.
Does that sound right?