Parity-ethereum: Create RPC method `GetSlice(path hex, depth int, root hash, storage bool)`

Created on 2 Nov 2018  ·  10 Comments  ·  Source: openethereum/parity-ethereum

What is this?

This is an enhancement issue to focus efforts and gather feedback

Motivation

In Mustekala, we are working on a p2p network of light clients. Its key feature is that nodes consume and seed subsets of the state, called slices.

Specifications

Slice definition

A slice consists of a subset of the merkle trie starting from an arbitrary head: a trie node found by following the given path from the state or storage root, delivered together with all of its children down to a specified depth.

Example: The following slice

    [`0x0ab3`, 
    4, 
    `0xddc8b0234c2e0cad087c8b389aa7ef01f7d79b2570bccb77ce48648aa61c904d`,
    false]

is a slice of the state trie whose root is 0xddc8b0…, starting at the node found at the path [0, a, b, 3] and delivering all of its children down to a depth of 4.
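
For illustration, here is a minimal Go sketch of issuing that request over JSON-RPC with the go-ethereum rpc client. The method name `eth_getSlice` is an assumption made for the example; the actual name would be settled in the implementation/EIP.

    package main

    import (
        "context"
        "fmt"
        "log"

        "github.com/ethereum/go-ethereum/rpc"
    )

    func main() {
        client, err := rpc.Dial("http://localhost:8545")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        // Raw response; see the Output section below for the expected shape.
        var slice map[string]interface{}
        err = client.CallContext(context.Background(), &slice,
            "eth_getSlice", // hypothetical method name
            "0ab3",         // path: nibbles from the state root to the head of the slice
            4,              // depth: walk at most 4 levels below the head
            "0xddc8b0234c2e0cad087c8b389aa7ef01f7d79b2570bccb77ce48648aa61c904d", // state root
            false,          // storage: false => slice the state trie
        )
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(slice)
    }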

Why these "slices"? What are you talking about?

  • Enables browsers and low-resource devices to consume and become seeders of light chunks of data. The average state slice, with 600 trie nodes, is 128 kB in size.

  • You can find all the information you need about an account (balance, nonce, EVM code reference and storage root) inside a slice. The same applies to smart contracts and storage slices, as the value of a key is mapped to the same slice id across blocks (same key, same path from the storage root).

  • Enables the emergence of clusters of nodes consuming and seeding particular slices, increasing readiness for the N+1th node to join. In other words, the more an area of the state is in demand (and hence the slices attached to that area), the more available it becomes.

  • Likewise, as clients in the network will be seeders (_can't stress that enough_), the burden on full clients to serve data will decrease dramatically.

  • Generally speaking, for any given slice, only a small fraction of its nodes changes from block to block. This opens up memory-management opportunities for the seeders of said slices.

RPC Method

Input

GetSlice(path hex, depth int, root hash, storage bool)

  • path: String representing the traversal in nibbles, from the given root to the head of the slice.
  • depth: Maximum number of steps to walk down from the head.
  • root: The hash of the root of the state or storage trie from where we are taking the slice.
  • storage: Boolean. Set to true to slice a storage trie. When false, the slice is taken from a state trie and additional values from the state are returned (see "Additional output for state slices" below).

As an anecdotal data point, as of 2018-11-02, if we choose paths of 4 nibbles and a depth of 10, we can cover the whole state trie with 65,536 slices of 128 kB each. The choice of path/depth for storage slices is a little more complicated, as it depends on the choices the smart contract developers made for the keys in their DB schema.
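
As a quick sanity check of that figure, a small Go sketch enumerating every 4-nibble prefix (assuming the hex nibble alphabet) yields exactly 16^4 = 65,536 slice heads:

    package main

    import "fmt"

    func main() {
        const nibbles = "0123456789abcdef"
        var prefixes []string
        for _, a := range nibbles {
            for _, b := range nibbles {
                for _, c := range nibbles {
                    for _, d := range nibbles {
                        // Each prefix such as "0ab3" addresses one slice head.
                        prefixes = append(prefixes, string([]rune{a, b, c, d}))
                    }
                }
            }
        }
        fmt.Println(len(prefixes)) // 65536
    }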

Output

  • slice-id: string specifying <path>-<depth>-<root>

  • trie-nodes:

    • stem: ["hash:value"]: Nodes from the root to the head. Enables the client to verify the slice
    • head: ["hash:value"]: Head node of the slice
    • slice: ["hash:value"]: Children from the head, up to the specified depth
  • metadata:

    • time-ms:
      • 00-trie-loading: In milliseconds
      • 01-fetch-stem-keys: In milliseconds
      • 02-fetch-slice-keys: In milliseconds
      • 03-fetch-leaves-info: In milliseconds
    • nodes-number:
      • 00-stem-and-head-nodes: count
      • 01-max-depth: depth of the deepest child found
      • 02-total-trie-nodes: count
      • 03-leaves: count
      • 04-smart-contracts: count
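
A sketch of how a client might model this response in Go. The field names mirror the JSON keys described above; the types themselves are assumptions, not a finalized wire format.

    // Hypothetical package, for illustration only.
    package slices

    // SliceResponse mirrors the output described above.
    type SliceResponse struct {
        SliceID   string    `json:"slice-id"` // "<path>-<depth>-<root>"
        TrieNodes TrieNodes `json:"trie-nodes"`
        Metadata  Metadata  `json:"metadata"`
    }

    type TrieNodes struct {
        Stem  map[string]string `json:"stem"`  // nodes from the root to the head, keyed by hash
        Head  map[string]string `json:"head"`  // the head node of the slice
        Slice map[string]string `json:"slice"` // children of the head, up to the requested depth
    }

    type Metadata struct {
        TimeMS      map[string]float64 `json:"time-ms"`      // e.g. "00-trie-loading"
        NodesNumber map[string]int     `json:"nodes-number"` // e.g. "02-total-trie-nodes"
    }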

Additional output for state slices

While a smart contract account contains a field defined for its EVM code, this field is only a content-addressed reference to the actual bytes. Each leaf in the slice must therefore be parsed; for every leaf whose EVM code field is non-empty, the code must be retrieved from the database using that reference and returned inside the response.

Also, for convenience, we return the storage root of each smart contract, so clients do not have to traverse the slice for this value.

  • leaves: A JSON object whose keys are the full path to the leaf. Each leaf contains:

    • storage-root: Hash

    • evm-code: String representation in hexadecimal
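
A sketch of the extra object for state slices, extending the response types above (again, field names are assumptions):

    package slices

    // StateSliceLeaves is the additional "leaves" object returned when
    // storage == false. Map keys are the full hex path to each leaf.
    type StateSliceLeaves struct {
        Leaves map[string]LeafInfo `json:"leaves"`
    }

    type LeafInfo struct {
        StorageRoot string `json:"storage-root"` // root hash of the account's storage trie
        EVMCode     string `json:"evm-code"`     // deployed contract bytecode, hex-encoded
    }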

Challenges

  • Cache: Either an enhancement of the current caching system for trie nodes _or_ a special cache for slices must be built to avoid calls to the database as much as possible. As mentioned above, while the contents of a slice vary little across blocks, a clever cache eviction process must be in place for those trie nodes that are no longer needed (a rough sketch follows this list).

  • Deltas: We may want in the future to have a separate RPC method to return the differences (deltas) between two slices.

  • EIP: This method should be written up as an EIP so that it can become part of the standard RPC protocol.
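
For the cache point above, here is a rough Go sketch of one possible approach, a reference-counted node cache where a trie node is evicted once no tracked slice references it any longer. All names are hypothetical.

    package slicecache

    // NodeCache keeps trie nodes alive for as long as some tracked slice uses them.
    type NodeCache struct {
        nodes map[string][]byte // node hash -> rlp-encoded node
        refs  map[string]int    // node hash -> number of tracked slices referencing it
    }

    func NewNodeCache() *NodeCache {
        return &NodeCache{nodes: map[string][]byte{}, refs: map[string]int{}}
    }

    // Retain records that a slice (e.g. freshly built for a new root) uses these nodes.
    func (c *NodeCache) Retain(nodes map[string][]byte) {
        for hash, node := range nodes {
            c.nodes[hash] = node
            c.refs[hash]++
        }
    }

    // Release drops one reference per node and evicts nodes no slice needs anymore.
    func (c *NodeCache) Release(hashes []string) {
        for _, hash := range hashes {
            c.refs[hash]--
            if c.refs[hash] <= 0 {
                delete(c.refs, hash)
                delete(c.nodes, hash)
            }
        }
    }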

Conclusion

We are confident that this approach will contribute to the goal of a truly decentralized blockchain, as well as facilitate the participation of browsers and low-resource devices as seeders of data, reducing the load on full clients.

In a fully deployed model, during each computed block, full nodes would serve a number of slices to kitsunet clients, to be relayed over the network, creating effective scalability for read operations on the ethereum database and providing data availability at face value.

A3-stale 🍃 F8-enhancement 🎊 M4-core ⛓ P7-nicetohave 🐕


All 10 comments

@5chdn @dryajov @kumavis

This is an example of the current output we have from our go-ethereum fork

https://gist.github.com/hermanjunge/e430de12d7124208ad6052182ec6866f

Adding @tbaut as he is working on the light client effort as well.

Also pinging @holgerd77.

Hi @dryajov - we talked about this at devcon, didn't we?

Its key feature is that nodes consume and seed subsets of the state called slices.

How does this work if the state changes every 15 seconds?

A slice consists on a merkle trie starting from an arbitrary head

The average state slice size with 600 trie-nodes is 128 kB

How does this perform if a slice contains crypto-kitties?

Did you implement something like this in Geth yet? If yes, could you link your work here? Also, did you consider writing an EIP for this?

Hi @5chdn,

Implementation in geth:

https://github.com/ethereum/go-ethereum/compare/master...MetaMask:wip/slicer

Output example

https://gist.github.com/hermanjunge/e430de12d7124208ad6052182ec6866f

How does this work if the state changes every 15 seconds?

Example of a slice id:

0ab3-4-0xddc8b0234c2e0cad087c8b389aa7ef01f7d79b2570bccb77ce48648aa61c904d

which is path-depth-root. We can thus track by path, while slices change based on their new root, which we can obtain from the block headers.
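
In code terms, a tiny helper (purely illustrative) makes that invariant explicit: for a tracked path and depth, only the root component of the id changes from block to block.

    package main

    import "fmt"

    // sliceID builds the "<path>-<depth>-<root>" identifier.
    func sliceID(path string, depth int, root string) string {
        return fmt.Sprintf("%s-%d-%s", path, depth, root)
    }

    func main() {
        // Same path and depth every block; the root comes from the new block header.
        fmt.Println(sliceID("0ab3", 4,
            "0xddc8b0234c2e0cad087c8b389aa7ef01f7d79b2570bccb77ce48648aa61c904d"))
    }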

How does this perform if a slice contains crypto-kitties?

We apply the same logic for storage tries: We compute the slice where the keys of interest of that database reside, and then the client keeps track of that path.

Afri, currently I am bashing my head against the keyboard implementing this in Rust :smile_cat:. We can always make a quick call to clarify anything. Let me know!

@5chdn yes, I showed you a quick demo of how this works.

This works for both state and storage tries, and slices are just a few KB; we can also control the slice size by adjusting its depth.

For the state trie, we have a deterministic address space of 65000 slices. This technique uses the smallest unique prefix approach to generate the slices, in our case we use 4 chars as our prefix.

Same goes for contract storage, with the complication that contracts don't have this nice uniform address space and are in general sparse; our intuition is that this affects large contracts less than it does small ones.

So for contracts such as CryptoKitties, we can calculate the storage key that is being looked up on demand, and request the network to start tracking it (the on-demand pubsub topics are documented here - https://github.com/MetaMask/kitsunet-docs/pull/1). The downside is that the first request will be relatively slow. One way of mitigating this is to provide a way for contract authors to publish maps of their contract storage; also, for standard contracts (ERC20, etc.) we can generate the maps ourselves.
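
For example, here is a sketch (using go-ethereum's crypto and common packages) of locating the storage slice for a single ERC20 balance, assuming the standard Solidity mapping layout and a 4-nibble slice path; the holder address and the balances slot are made-up values.

    package main

    import (
        "fmt"

        "github.com/ethereum/go-ethereum/common"
        "github.com/ethereum/go-ethereum/crypto"
    )

    func main() {
        // Hypothetical example: a balances mapping stored at slot 0 of some ERC20 contract.
        holder := common.HexToAddress("0x1234567890123456789012345678901234567890")
        slot := common.Hash{} // slot 0

        // Solidity mapping layout: storage key = keccak256(pad32(key) ++ pad32(slot)).
        storageKey := crypto.Keccak256Hash(common.LeftPadBytes(holder.Bytes(), 32), slot.Bytes())

        // The storage trie is a secure trie, so its path is keccak256(storageKey);
        // the first 4 nibbles of that path select the slice to track.
        path := crypto.Keccak256Hash(storageKey.Bytes()).Hex()[2:6]
        fmt.Println("storage key:", storageKey.Hex())
        fmt.Println("slice path: ", path)
    }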

Also, it's worth noting that, once a slice is tracked by the network, subsequent access to it should be significantly faster.

Adding @vpulim as he is working on ethereumjs-client with an emphasis on light client functionality; there has already been some exchange on this with @dryajov on our Gitter channel.

Thanks @holgerd77. I’ve been following this thread and once we implement state downloading in ethereumjs-client, we can support the GetSlice RPC method.

Hi @hermanjunge, really dig this work and am currently working on exposing the getSlice endpoint from vulcanizeDB.

I'm curious what the motivation for returning the nodes as ["hash:value"] in a map[string]string is, and how opposed you would be to changing that. My impression is it would be more useful to map the nodes to their hex path, e.g. return map[string][]byte where []byte is the rlp encoded node. Otherwise, it seems the client loses any ability to discern the paths for all of the nodes returned to them for the stem and slice.
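To make the two shapes concrete, a quick sketch of the difference (types are illustrative only):

    package slices

    // Shape described in the proposal: nodes keyed by their hash.
    type TrieNodesByHash struct {
        Stem  map[string]string `json:"stem"`
        Head  map[string]string `json:"head"`
        Slice map[string]string `json:"slice"`
    }

    // Suggested alternative: nodes keyed by their hex path from the root,
    // so the client can tell where each returned node sits in the trie.
    type TrieNodesByPath struct {
        Stem  map[string][]byte `json:"stem"`
        Head  map[string][]byte `json:"head"`
        Slice map[string][]byte `json:"slice"`
    }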

