At the moment Lighthouse will keep blocks and states from non-canonical forks in the hot database indefinitely. This wastes disk space, and represents a potential resource exhaustion attack vector.
When the chain finalizes (or on some subset of finalization events), we should prune all blocks and states that don't descend from the finalized state. This could probably be done efficiently using the head tracker, and maybe some help from fork choice (for determining if a head is descended from the finalized state).
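A rough sketch of the idea, not Lighthouse's implementation: given the set of known heads, walk each one backwards and mark everything on a walk that doesn't end at the finalized block. `Root`, the `parents` map, and `abandoned_blocks` are hypothetical names for illustration; the real store and head tracker have different interfaces, and a real version would bound the finalized-chain walk (for example at the previous finalized slot) rather than walking all the way back.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a 32-byte block root (assumption; Lighthouse uses Hash256).
type Root = u64;

/// Returns the roots of blocks reachable from `heads` that do not descend
/// from `finalized_root`. `parents` maps each stored block root to its
/// parent root; a root that is absent from the map is not in the store.
fn abandoned_blocks(
    parents: &HashMap<Root, Root>,
    heads: &[Root],
    finalized_root: Root,
) -> HashSet<Root> {
    // The finalized block plus every ancestor of it still in the store.
    let mut finalized_chain = HashSet::new();
    let mut current = finalized_root;
    while let Some(&parent) = parents.get(&current) {
        finalized_chain.insert(current);
        current = parent;
    }

    let mut abandoned = HashSet::new();
    for &head in heads {
        let mut visited = Vec::new();
        let mut current = head;
        // Walk backwards until we reach the finalized chain or run out of blocks.
        while !finalized_chain.contains(&current) {
            match parents.get(&current) {
                Some(&parent) => {
                    visited.push(current);
                    current = parent;
                }
                None => break,
            }
        }
        // Only a walk that ends exactly at the finalized block descends from it.
        if current != finalized_root {
            abandoned.extend(visited);
        }
    }
    abandoned
}
```

A head that joins the canonical chain below the finalized block (or whose ancestry is missing) has all of its fork-only blocks collected for pruning; states would need the same treatment.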
I'm claiming this issue :)
Awesome!
Since we don't keep forward-pointers from parent block to child block, we're most likely going to need to start at the head block and iterate backwards through its ancestors. There are three ways we can iterate backwards:
1. Use block.state_root to read the corresponding BeaconState from the database and use the state.block_roots field to iterate backwards (using state.state_roots to load the next state once you run out of block roots). The ReverseBlockRootIterator achieves this.
2. Use block.parent_root to load the previous block root. Repeat.
   - Pro: doesn't require loading a BeaconState.
   - Con: requires loading a whole block just to read parent_root (could be avoided with a custom Store method).
3. The ForkChoice struct stores the parent for each block and could expose a function (akin to ProtoArrayForkChoice::block_slot) that provides fn parent_root(&self, block_root: &Hash256) -> Option<Hash256>.
   - Pro: fast, since it's reading from a HashMap then doing a Vec access.
   - Con: proto_array_fork_choice is only guaranteed to hold non-finalized blocks. If you need to prune blocks that have been finalized you'll probably have to default back to recursive database lookups (like the previous option).

After writing this, I'm thinking that the second option (using block.parent_root) is how I would move forward. I'd avoid loading BeaconState from disk since it doesn't scale well with large validator counts, and I'm concerned that using proto_array_fork_choice is a little too complex.
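To make the fork choice option concrete, here is a minimal sketch of what a "HashMap then Vec access" parent lookup could look like. `ForkChoiceSketch`, `Node`, and `insert` are made-up names, and `Root` is a `u64` stand-in for Hash256; the layout is an assumption for illustration and is not copied from proto_array_fork_choice.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte block root (assumption; Lighthouse uses Hash256).
type Root = u64;

/// One entry per known block; the parent is stored as an index into `nodes`.
struct Node {
    root: Root,
    parent: Option<usize>,
}

/// Hypothetical, heavily simplified fork choice structure.
#[derive(Default)]
struct ForkChoiceSketch {
    indices: HashMap<Root, usize>,
    nodes: Vec<Node>,
}

impl ForkChoiceSketch {
    /// Registers a block. An unknown or absent parent is recorded as `None`.
    fn insert(&mut self, root: Root, parent_root: Option<Root>) {
        let parent = parent_root.and_then(|p| self.indices.get(&p).copied());
        self.indices.insert(root, self.nodes.len());
        self.nodes.push(Node { root, parent });
    }

    /// One HashMap lookup to find the node, then Vec accesses for the parent.
    fn parent_root(&self, block_root: &Root) -> Option<Root> {
        let index = *self.indices.get(block_root)?;
        let parent_index = self.nodes.get(index)?.parent?;
        self.nodes.get(parent_index).map(|node| node.root)
    }
}
```

The lookup itself is cheap, but it returns `None` for any block the structure no longer tracks, which is the limitation noted above for finalized blocks.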
Perhaps you have some thoughts @michaelsproul? :)
I agree that using the parent root seems like the best option at the moment, in terms of simplicity and performance. I suspect that the burden on disk reads won't be too severe, as blocks are reasonably sized and we won't be pruning super often. The ParentBlockRootIterator iterates using the parent_root strategy, and might be useful.
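For illustration, the parent_root strategy boils down to something like the sketch below. This is not the actual ParentBlockRootIterator; `Store`, `Block`, `get_block`, and `ParentRootIter` are hypothetical in-memory stand-ins, with `Root` as a `u64` in place of Hash256.

```rust
use std::collections::HashMap;

/// Stand-in for a 32-byte block root (assumption; Lighthouse uses Hash256).
type Root = u64;

/// Only the fields the walk needs.
#[derive(Clone)]
struct Block {
    slot: u64,
    parent_root: Root,
}

/// Hypothetical stand-in for the on-disk block store.
struct Store {
    blocks: HashMap<Root, Block>,
}

impl Store {
    /// Loads a full block by root, as a database read would.
    fn get_block(&self, root: &Root) -> Option<Block> {
        self.blocks.get(root).cloned()
    }
}

/// Yields (root, block) pairs from a starting block back through its ancestors.
struct ParentRootIter<'a> {
    store: &'a Store,
    next_root: Root,
}

impl<'a> Iterator for ParentRootIter<'a> {
    type Item = (Root, Block);

    fn next(&mut self) -> Option<Self::Item> {
        let root = self.next_root;
        // Stops as soon as a block is missing from the store.
        let block = self.store.get_block(&root)?;
        self.next_root = block.parent_root;
        Some((root, block))
    }
}
```

Each step costs one full block read, which is the disk burden mentioned above; a store method that returns only the parent root would avoid decoding the rest of the block.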
Ooo, @adaszko fixed this! Closing! 🎉