Substrate: Offchain Workers: deterministic bookkeeping

Created on 30 Sep 2019 · 11 comments · Source: paritytech/substrate

cc @tomusdrw , @jimpo

Offchain workers are a feature of Substrate allowing us to provide code in the Runtime which may be non-deterministic, and is intended to be executed for each _new_ block to perform tasks such as:

querying a price feed
making an HTTP request
creating a DHT entry
replicating a file

For non-deterministic tasks such as the above, it would be bad to execute the offchain logic on ancient blocks while performing a major synchronization. You might end up re-executing logic initially triggered years ago, to absolutely no effect. Because of this, offchain workers are designed not to execute for every block in the chain, even though a full node imports every block.

I've encountered two use-cases which fall into another category of execution: deterministic bookkeeping. These are situations where the computation is deterministic but data-heavy, and we want to move data (typically trie nodes) out of the chain state, where only the trie roots are kept. For these use-cases, the current operation of offchain workers does not seem to be sufficient.

Example 1: Merkle Mountain Ranges (MMR)

#2053, https://github.com/mimblewimble/grin/blob/master/doc/mmr.md

For many kinds of auxiliary blockchain protocols, it's important to be able to prove that some ancient block header is an ancestor of the finalized chain head. MMRs provide a good way of doing that.

We want to write a runtime module to keep track of the _peaks_ (roots) of a bunch of different Merkle tries. There will be at most about log2(N) of these for N blocks (and on the order of N trie nodes in total). You can add to the MMR with only the peaks, and prove ancestry if you have all the nodes.

We'd want full nodes to keep track of all of the MMR nodes by keeping them in offchain storage. However, if even one block in the chain is not executed, it is possible to end up in a situation where ancestry can no longer be proven.
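
To make the split between peaks and nodes concrete, here is a minimal sketch of an append-only MMR. The names `Mmr`, `hash_pair` and `peaks_and_nodes` are hypothetical, leaves are plain `u64` values standing in for block header hashes, and `DefaultHasher` is only a dependency-free stand-in for a real cryptographic hash. `peaks` plays the role of what the runtime module would keep in chain state; `nodes` plays the role of what a full node would keep in offchain storage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a cryptographic hash. `DefaultHasher` is NOT collision
/// resistant; it is used only to keep this sketch dependency-free.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical MMR, not an existing Substrate type.
#[derive(Default)]
struct Mmr {
    /// (height, hash) of each peak, tallest first. Enough to append.
    peaks: Vec<(u32, u64)>,
    /// Every node ever created (leaves and parents). Needed for proofs.
    nodes: Vec<u64>,
}

impl Mmr {
    /// Append one leaf using only the current peaks.
    fn push(&mut self, leaf: u64) {
        self.nodes.push(leaf);
        let mut current = (0u32, leaf);
        // Merge with the rightmost peak for as long as heights match.
        while let Some(&(height, hash)) = self.peaks.last() {
            if height != current.0 {
                break;
            }
            self.peaks.pop();
            let parent = hash_pair(hash, current.1);
            self.nodes.push(parent);
            current = (height + 1, parent);
        }
        self.peaks.push(current);
    }
}

/// Build an MMR over `n` leaves and report (number of peaks, number of nodes).
fn peaks_and_nodes(n: u64) -> (usize, usize) {
    let mut mmr = Mmr::default();
    for block_number in 0..n {
        mmr.push(block_number);
    }
    (mmr.peaks.len(), mmr.nodes.len())
}
```

In this sketch, `peaks_and_nodes(11)` gives 3 peaks and 19 nodes. Every `push` adds entries to `nodes`, so a node that skips even one block never records them, which is the failure mode described above.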

Example 2: Historical Slashing

srml-staking and srml-session are designed so that validators and nominators can be slashed for a long bonding duration while they wait for their money to be withdrawable. Keeping months' worth of historical validator sets, session keys, and nominator assignments on-chain is too heavy, so we instead keep a trie root encoding the historical validator sets for every session. Full nodes are intended to keep this trie root.

For slashing, the situation isn't as severe. However, for security it would be best to have as many full nodes as possible be able to report misbehavior. If a full node doesn't execute the off-chain worker, it may not have the trie nodes necessary to issue a report of a misbehavior that it witnesses - reducing the effectiveness of fishermen.

Final notes

For these kinds of deterministic bookkeeping tasks, it would be really useful to have a category of offchain execution which is guaranteed to be run on every block. This could also be done by having an alternate set of storage APIs available to on-chain execution, which places storage into the off-chain DB.

Warp sync also obviously plays a big role in usability of a blockchain client. We don't want it to happen that only nodes which have performed a full sync have all of the bookkeeping trie data. In the MMR case, it would mean that only those kinds of nodes could give out ancestry proofs. In the Historical Slashing case, it would mean that recently warp-synced nodes could not report misbehavior.

Given that this data is all trie-based, with roots in the runtime, it would be nice to be able to warp sync it as well. This may not be too difficult with the right runtime APIs, but it is something to keep in mind.

I6-refactor 🧹 Z0-unconfirmed


rphmeier

👍3

All 11 comments

I very much appreciate the issue as this matter of re executing ancient offchain code was a question that I had as well for some time.

Yet I don't quite get what the issue is recommending. I asked this question in an SCL session (who controls which nodes run the offchain code, and when?) and the apparent answer was no one: the code itself should restrict this and prevent re-execution of the offchain code by those who do not have the access right, and, at times where it does not make any sense.

Given that I understood this correctly:
The only remaining aspect is the node types that run the offchain code. From your issue I infer that what you recommend boils down to allowing the offchain code itself to define who will execute it, as opposed to now where the client is hardcoded to run it only when a full node is running/syncing.

Am I correct here?

kianenigma on 1 Oct 2019

it would be really useful to have a category of offchain execution which is guaranteed to be run on every block

rphmeier on 2 Oct 2019

👍1

I had some thoughts on how we could extend the API for offchain workers and let them decide whether they are run or not, roughly:

#[api_version(2)]
trait OffchainWorker {
  /// Entry point; receives the import context described by `Params`.
  fn offchain_worker(params: Params);
}

struct Params {
  /// Is the node currently doing a major sync (i.e. we are not fully in sync)
  pub is_major_sync: bool,
  /// Is the imported block the new best block
  pub is_new_best: bool,
  /// The import route (i.e. what blocks were retracted/enacted)
  pub import_route: TreeRoute<Block>,
  /// Should we only run a subset of offchain workers (handled by `Executive`)
  pub filter: Option<Vec<Module>>,
}
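
As a rough illustration of how a worker could use such a context to decide whether to run, here is a minimal sketch. It assumes a simplified `Params` with only the two boolean fields (`import_route` and `filter` are left out), and `WorkerKind` and `should_run` are hypothetical names, not part of any existing Substrate API. Running side-effect workers only on the new best block is also just an assumption made for the example.

```rust
/// Simplified stand-in for the proposed `Params`.
#[derive(Clone, Copy, Debug)]
struct Params {
    is_major_sync: bool,
    is_new_best: bool,
}

/// Hypothetical classification of offchain workers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WorkerKind {
    /// Non-deterministic side effects: HTTP requests, price feeds, ...
    SideEffect,
    /// Deterministic bookkeeping that must see every block.
    Bookkeeping,
}

/// Decide whether a worker of the given kind should run for this import.
fn should_run(kind: WorkerKind, params: Params) -> bool {
    match kind {
        // Bookkeeping must never skip a block, even during a major sync.
        WorkerKind::Bookkeeping => true,
        // Assumption: side effects only make sense at the tip of the chain.
        WorkerKind::SideEffect => !params.is_major_sync && params.is_new_best,
    }
}
```

With this split, a node in major sync would still run the MMR or historical-slashing bookkeeping for each imported block, while skipping price feeds and HTTP requests until it has caught up.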

And answering to @kianenigma

the code itself should restrict this and prevent re-execution of the offchain code by those who do not have the access right, and, at times where it does not make any sense.

Actually, I thought the assumption is that it's totally fine for anyone to run offchain workers, I don't think the code itself should be restrictive beyond some basic requirements for it to run (like key available, etc). The thing is that whatever this offchain worker produces is either:

Just a waste of time cause it was not needed.
Rejected by the chain cause some higher-level conditions were not satisfied.

The filter option mentioned above could be used by clients to restrict which offchain workers they want to run via the CLI, or to have the node restrict them, as it knows way more than is available from within the offchain worker.

tomusdrw on 2 Oct 2019

Letting the offchain workers decide for themselves whether it's necessary to run every block seems fine to me

What about runtime APIs that pass data to the offchain worker? For instance, in the runtime we compute a trie root (which means computing all the trie nodes). We could just pass these nodes to the offchain worker, because if we don't, then we have to re-do the trie calculation in offchain logic.

rphmeier on 4 Oct 2019

@rphmeier yeah, that would be useful too. Something like a data stash that can be pushed to during block import and later retrieved in offchain workers. I guess it's just a matter of adding a pair of sr-io methods.

tomusdrw on 4 Oct 2019

ok it seems what we want is:

<Module as OffchainWorker>::generate_extrinsic has some knowledge of its context so it can skip things when it is in major sync

note: I felt a bit torn between having these conditions on-chain and off-chain; for example, if one module becomes obsolete, you might want to stop executing its offchain work in any context.
Maybe it is fine to have both off-chain and on-chain conditions. We can start with the off-chain condition:
// Should we only run a subset of offchain workers (handled by `Executive`)
pub filter: Option<Vec<Module>>,
and on-chain conditions that have context: is_new_best, is_major_sync and import_route. We also have to make this context extensible without breaking old offchain workers.

an on-chain decision allows new offchain workers (introduced by a runtime upgrade) to make their own decisions
an off-chain decision allows the choice to be changed afterwards for old, unnecessary workers

probably improvements to the API available inside offchain workers for storing data, for instance a fork-aware data structure, in-memory storage...
A way to send data from inside the runtime for use in the offchain worker,
probably using sr-io::put_for_offchain_workers(data)
or better sr-io::put_for_offchain_workers(key, value)
EDIT: I'm working on this right now

thiolliere on 25 Oct 2019

@thiolliere I have a fork-aware data structure in #3774. Some adjustments and additions are still needed to make it efficient for any kind of data, but what remains is basically the easy/cool part.

cheme on 25 Oct 2019

👍1

<Module as OffchainWorker>::generate_extrinsic has some knowledge of its context so it can skip things when it is in major sync

I'm only familiar with offchain workers at a high level.

This issue is not only about generating extrinsics, but about executing the offchain worker in general. The use-cases I mentioned above don't have anything to do with generating extrinsics.

I think fork-aware data storage is really only necessary if we do pruning of the offchain DB.

rphmeier on 25 Oct 2019

This issue is not only about generating extrinsics, but about executing the offchain worker in general. The use-cases I mentioned above don't have anything to do with generating extrinsics.

oh yes, it seemed to me that <Module as OffchainWorker>::generate_extrinsic should just be renamed offchain_worker, or work. There is no need to generate an extrinsic in this method; we can just do some work. I think we should generalize the usage of this function.

thiolliere on 26 Oct 2019

Yes, the method is misnamed; it comes from an old concept where the return type of the offchain worker was supposed to be Vec<Extrinsic>. It's only used internally (i.e. Executive dispatching to multiple modules), so we can safely rename it without breaking compatibility.

tomusdrw on 26 Oct 2019

👍1

It seems the use cases for this are getting more urgent to tackle. I spoke with @rphmeier and he proposed that we might have a simple sp_io method, available during block execution, that allows writing (and writing only) directly to the offchain worker database.
Note that such a method should most likely be opt-in from the CLI, and if disabled it should simply be a no-op. Writes should also be buffered so that they don't affect block execution time (preferably at all), but we should make sure they are committed before the OCW for that block actually runs.
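
The behaviour described here can be sketched roughly as follows. This is only an illustration under stated assumptions: `OffchainWriteBuffer`, `set` and `commit` are hypothetical names rather than an actual sp_io interface, and a `HashMap` stands in for the offchain worker database.

```rust
use std::collections::HashMap;

/// Hypothetical write-only handle exposed to block execution.
struct OffchainWriteBuffer {
    /// Mirrors the opt-in CLI switch; when false, writes are dropped.
    enabled: bool,
    /// Writes collected during block execution, in order.
    pending: Vec<(Vec<u8>, Vec<u8>)>,
}

impl OffchainWriteBuffer {
    fn new(enabled: bool) -> Self {
        Self { enabled, pending: Vec::new() }
    }

    /// Called during block execution. There is deliberately no read method,
    /// so block execution cannot depend on offchain state.
    fn set(&mut self, key: &[u8], value: &[u8]) {
        if self.enabled {
            self.pending.push((key.to_vec(), value.to_vec()));
        }
    }

    /// Called by the node after block execution and before the offchain
    /// worker for that block runs. Returns the number of writes applied.
    fn commit(&mut self, db: &mut HashMap<Vec<u8>, Vec<u8>>) -> usize {
        let count = self.pending.len();
        for (key, value) in self.pending.drain(..) {
            db.insert(key, value);
        }
        count
    }
}
```

Keeping the handle write-only means the result of block execution stays the same whether or not the node opted in, and buffering moves the actual database work out of the block execution path.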

tomusdrw on 3 Mar 2020

👍1
