Lighthouse: Prune forks from the database

Created on 4 Mar 2020  Â·  4Comments  Â·  Source: sigp/lighthouse

Description

At the moment Lighthouse will keep blocks and states from non-canonical forks in the hot database indefinitely. This wastes disk space, and represents a potential resource exhaustion attack vector.

Expected Behaviour

When the chain finalizes (or on some subset of finalization events), we should prune all blocks and states that don't descend from the finalized state. This could probably be done efficiently using the head tracker, and maybe some help from fork choice (for determining if a head is descended from the finalized state).

enhancement security

Most helpful comment

I agree that using the parent root seems like the best option at the moment, in terms of simplicity and performance. I suspect that the burden on disk reads won't be too severe, as blocks are reasonably sized and we won't be pruning super often. The ParentBlockRootIterator iterates using the parent_root strategy, and might be useful.

All 4 comments

I'm claiming this issue :)

Awesome!

Since we don't keep forward-pointers from parent block to child block, we're most likely going to need to start and the head block and iterate backwards through it's ancestors. There are three ways we can iterate backwards:

  • From the head (start) block use block.state_root to read the corresponding BeaconState from the database and use the state.block_roots field to iterate backwards (using state.state_roots to load the next state once you run out of block roots). The ReverseBlockRootIterator achieves this.

    • Pro: You get 8,192 block roots from a state (in mainnet spec) so this single state will likely serve most normal circumstances.

    • Pro: simple, straight-forward.

    • Con: involves reading a beacon state from db. This is Badâ„¢ since it involves reading from disk and doing lots de-serialization.

    • Con: pruning several heads will result in several big DB reads.

  • From the head (start) block use block.parent_root to load the previous block root. Repeat.

    • Pro: simple straight-forward.

    • Pro: never loads a BeaconState

    • Con: results in one disk read per pruned block.

    • Con: unnecessary de-serialization of block fields other than parent_root (could be avoided with a custom Store method).

  • Our ForkChoice struct stores the parent for each block and could expose a function (akin to ProtoArrayForkChoice::block_slot) that provides fn parent_root(&self, block_root: &Hash256) -> Option<Hash256>.

    • Pro: finding the parent for some block root only involves reading a HashMap then doing a Vec access.

    • Con: will involve taking a read-lock on the fork choice which might delay other functions (if you're careful about this the impact can be minimal).

    • Con: more complicated since the proto_array_fork_choice is only guaranteed to hold non-finalized blocks. If you need to prune blocks that have been finalized you'll probably have to default back to recursive database lookups (like the previous option).

After writing this, I'm thinking that the second option (using block.parent_root) is how I would move forward. I avoid loading BeaconState from disk since it doesn't scale well with large validator counts and I'm concerned that using proto_array_fork_choice is a little too complex.

Perhaps you have some thoughts @michaelsproul? :)

I agree that using the parent root seems like the best option at the moment, in terms of simplicity and performance. I suspect that the burden on disk reads won't be too severe, as blocks are reasonably sized and we won't be pruning super often. The ParentBlockRootIterator iterates using the parent_root strategy, and might be useful.

Ooo, @adaszko fixed this! Closing! 🎉

Was this page helpful?
0 / 5 - 0 ratings

Related issues

paulhauner picture paulhauner  Â·  4Comments

nickoneill picture nickoneill  Â·  3Comments

paulhauner picture paulhauner  Â·  4Comments

paulhauner picture paulhauner  Â·  4Comments

michaelsproul picture michaelsproul  Â·  4Comments