We have a couple thousand LES nodes running and forming consensus externally in our blockchain network (Syscoin).
Related to https://github.com/ethereum/go-ethereum/issues/19559 and https://github.com/ethereum/go-ethereum/issues/21337: we have seen a new CHT, which you must have updated, that is only 6 days old. This may end up causing issues on our network, because LES nodes do not download blocks before the CHT (the geth client simply does not return blocks before the CHT; I guess there are no peers serving those block headers). This is why I was asking whether we can refrain from updating CHTs too close to the tip. What was the motivation for making it so close to the tip?
Can we make this configurable so that we always look back at least X weeks of headers?
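To make the request concrete, here is a minimal sketch of the selection rule I have in mind. It is not geth code: the function name, the `min_lookback_blocks` parameter, and the 32768-block section size are assumptions for illustration only.

```python
# Hypothetical sketch, not geth's implementation.
SECTION_SIZE = 32768  # assumed number of blocks covered by one CHT section


def pick_checkpoint_section(head_number, min_lookback_blocks, section_size=SECTION_SIZE):
    """Return the index of the newest complete section whose last block is
    at least min_lookback_blocks behind head_number, or None if none qualifies."""
    if head_number < min_lookback_blocks:
        return None
    # Newest block number the checkpoint is allowed to cover.
    cutoff = head_number - min_lookback_blocks
    # Sections are [0, size-1], [size, 2*size-1], ...; count the complete ones.
    complete_sections = (cutoff + 1) // section_size
    if complete_sections == 0:
        return None
    return complete_sections - 1
```

With a rule like this, a light client would start from an older section whenever the newest one falls inside the lookback window, so the headers in that window would still be downloaded during sync.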
To further clarify this question: https://github.com/ethereum/go-ethereum/issues/21337#issuecomment-662212817
Even if we were verifying against the latest checkpoint, we don't have the block header data needed to verify it using a Merkle path to the latest CHT, so we could not verify it anyway. The problem is that when our nodes notice geth headers are missing (anything within about 3 weeks of blocks or newer that is not continuously linked), they request them from geth, at which point geth never returns. The nodes are then stuck in a data availability conundrum: they know there is data missing that probably should be validated, and they cannot get it even though geth is functional and returning recent block headers.
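For reference, the "not continuously linked" check is roughly the following. This is a simplified sketch rather than our actual node code; the function name and the `headers` layout (block number mapped to a `(hash, parent_hash)` pair) are assumptions.

```python
# Hypothetical sketch of a header continuity check over a block window.
def find_gaps(headers, start, end):
    """headers maps block number -> (hash, parent_hash).

    Returns (absent, unlinked): block numbers in [start, end] that are not
    held locally, and block numbers whose parent_hash does not match the
    hash of the locally held previous header."""
    absent = []
    unlinked = []
    for number in range(start, end + 1):
        entry = headers.get(number)
        if entry is None:
            absent.append(number)
            continue
        parent = headers.get(number - 1)
        # The link can only be checked when the previous header is held too.
        if number > start and parent is not None and entry[1] != parent[0]:
            unlinked.append(number)
    return absent, unlinked
```

Every number reported as absent is what the node then requests from geth, and those are the requests that never come back.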
In a fresh sync the data won't be available, and when the node then tries to fetch blocks prior to the CHT it fails: geth gives no response to those getblockbynumber queries. When someone moves from Eth to Sys, they usually have a week to do it, or to try to cancel and get a refund. Because of this we need at least that many blocks available so cancellation fraud proofs can be done effectively, while also restricting bandwidth by limiting how far back we verify those cross-chain transactions (which can be verified via a Merkle path to the latest CHT, sure, but we still need the block header to re-hash it, and that doesn't seem possible in this context).
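To illustrate why the header itself is required, here is a simplified sketch of Merkle-path verification. It deliberately swaps in a plain binary Merkle tree with SHA-256 in place of the Merkle Patricia trie and Keccak-256 that the real CHT uses, and all function names are made up for the example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    # Duplicate the last node when the level has an odd number of nodes.
    if len(level) % 2:
        level = level + [level[-1]]
    return level, [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        _, level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Return the list of (sibling_hash, sibling_is_left) pairs for a leaf."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        padded, parents = _next_level(level)
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = parents
        index //= 2
    return proof


def verify_header(header_bytes, proof, trusted_root):
    # The leaf hash is recomputed from the header bytes themselves, so the
    # proof is useless to a node that cannot obtain the header.
    node = _h(header_bytes)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == trusted_root
```

The trusted root and the proof are not enough on their own: `verify_header` has to hash the header bytes, which is exactly the data our nodes cannot fetch.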
As a short-term fix, can we remove the latest CHT? We are finding major potential issues with it on mainnet. I say potential because it has the ability to stall our network in the right circumstances. Second, can we add a configuration option so we can set how many blocks or CHTs we want to look back? Third, maybe we can see whether geth can and should serve headers prior to any checkpoint (in light mode), where they aren't downloaded on sync but are still served on request. I think changing the CHT to be at least 2 or 3 weeks back will help in the short term to put out the fire, and then we can try to solve the problem via config params.
I know we can run our own geth fork, but I would rather not, as we want to stay on the official branch unless there are really good reasons, such as you finding no reason to make the change (but I think it's a good change or feature request; @rjl493456442 has already told me you have plans for the config change).
Thanks for reading!
> The problem is, when our nodes notice geth headers are missing (anything around 3w of blocks or newer that are not continuously linked) it will request it from geth, at which point geth never returns and thus the nodes are stuck in a data availability conundrum where they know theres data missing, that probably should be validated and they cannot get the data even though geth is functional and returning recent block headers
Can you provide some more logs? With the latest CHTs, you can always fetch and verify all past block headers. If that doesn't work, please open an issue and we will fix it.
All in all, the best approach is to make the syncing start point configurable, and we plan to implement this feature.
But I don't think the claim that the CHTs are too new is valid. There are enough blocks on top of them, and we hold the assumption that the blocks covered by the CHT are "un-reorgable". So the CHT is safe to update.
And the CHT mechanism always lets you fetch a past block header by number, so I don't see why your system would get stuck.