tendermint 🚀 - Run CheckTx before voting on blocks

Note this also relates to the MaxGas problem (#2310) - if we run CheckTx before voting, we can easily enforce the MaxGas tendermint side. Otherwise we need the app to do it or we do something else funky

ebuchman on 12 Sep 2018

I'm highly in favor of enforcing running check Tx before voting on blocks. It feels very off to me that validators may not actually check anything about the txs themselves before commiting. An adversarial validator in this model can then waste many peoples computation with useless things, its better to have to the validators sanitize this, and not have it get into the history of the entire chain / incur arbitrary costs on others. Otherwise a validator could make every full node / syncing node verify block_size signatures, and make each light client store another header.

Note, p2p relays should not have run check tx here though, just verify that signature from the validator proposing it matches up.

ValarDragon on 12 Sep 2018

when receiving new comming tx , checkTx is also called by mempool, which will alter the checkTxState in the ABCI server. Thus the app may need to maintain another checkTxState which would not change along with the handling of incoming txs. Consider the following codes in ethermint:

// Commit commits the block and returns a hash of the current state
// #stable - 0.4.0
func (app *EthermintApplication) Commit() abciTypes.ResponseCommit {
    app.logger.Debug("Commit") // nolint: errcheck
    state, err := app.getCurrentState()
    if err != nil {
        app.logger.Error("Error getting latest state", "err", err) // nolint: errcheck
        return abciTypes.ResponseCommit{
            Code: uint32(errors.CodeInternal),
            Log:  err.Error(),
        }
    }
    app.checkTxState = state.Copy() //this checkTxState will change along with incoming tx
    blockHash, err := app.backend.Commit(app.Receiver())
    if err != nil {
        // nolint: errcheck
        app.logger.Error("Error getting latest ethereum state", "err", err)
        return abciTypes.ResponseCommit{
            Code: uint32(errors.CodeInternal),
            Log:  err.Error(),
        }
    }

    return abciTypes.ResponseCommit{
        Data: blockHash[:],
    }
}

we may need another

app.checkTxStatePending = state.Copy()    //which will change with the incoming tx
app.checkTxState =state.Copy()   ////which would not change with the incoming tx

accordingly, CheckTx interface may should adopt RequestCheckTx and add a flag to let the app to determine using checkTxState or checkTxStatePending . The flag can also be used to determine whether to do verifySignature mentioned in #2127 . For example:

type RequestCheckTx struct {
    Tx                   []byte   `protobuf:"bytes,1,opt,name=tx,proto3" json:"tx,omitempty"`
       type  byte   // 0 : use checkTxStatePending,verify signature.  1: use checkTxStatePending, don't verify signature.  2:use checkTxState,verify signature
    XXX_NoUnkeyedLiteral struct{} `json:"-"`
    XXX_unrecognized     []byte   `json:"-"`
    XXX_sizecache        int32    `json:"-"`
}

I don't know if I understand correctly?

james-ray on 14 Sep 2018

We'd need to define how to do it. This change will require an ADR before we implement.

Back in https://github.com/tendermint/tendermint/issues/2175#issuecomment-412110007 I suggested we introduce a new ABCI connection to handle these CheckTx calls, which would tend to imply another state.

An alternative might just be to call these CheckTx on the Consensus connection using the same state as the DeliverTx will use. Then in BeginBlock, apps can decide whether they want to clear that state or not (most will, but others can cache the checks they did in CheckTx eg. so they dont have to recheck signatures in DeliverTx).

ebuchman on 14 Sep 2018

The second solution seems easier to implement with less modification. Just need all developers to clear some state such as nonce or balance in BeginBlock to the original state before checking the block. Let me try for it. I'll test in my local enviroment, if it's passed in CircleCI, Then we can also decide if solution 1 is better.

james-ray on 15 Sep 2018

I have written an ADR markdown draft, please help to review it.

james-ray on 19 Sep 2018

Consider a ConsensusParam that determines if we run CheckTx on proposal blocks before voting for them. This would enforce that blocks are fully valid (assuming CheckTx is implemented correctly).

In favor. As @ValarDragon mentioned, this prevents certain kinds of proposer DoS - it also facilitates more efficient application implementation by allowing caching of CheckTx checks, which are presently duplicated.

cwgoes on 20 Sep 2018

Why not just have the proposer run CheckTx on all the txs before proposing a block. If there's an invalid tx, it should be caught earlier, before a block has been propogated through the network. The proposer proposing invalid txs is not a big deal, because:

It's not like they can clog the chain with invalid txs. If they were malicious, they could just propose empty blocks instead. They can clog their percentage of the throughput.
It can be made a slashing condition to propose a block that doesn't pass CheckTx.

sunnya97 on 20 Sep 2018

It's not like they can clog the chain with invalid txs. If they were malicious, they could just propose empty blocks instead. They can clog their percentage of the throughput.

No, if they were malicious they would fill it with invalid txs, and waste peoples computation on signature verification. They would pack it with as many signature verifications as they could. This is more troublesome than proposing an empty block.

Not sure how I feel about the slashing condition idea. I'm currently leaning towards blocking it completely, and #postlaunch also make proposing one with invalid check's a slashing condition. You don't want archival nodes or the p2p network to be getting stressed over bad txs.

ValarDragon on 20 Sep 2018

If you're worried about minimizing the number of signature checks, then you definitely don't want to ask validators to be doing CheckTx before voting on a block. Because now they're double checking the signatures, once during that CheckTx, and then again during DeliverTx!

sunnya97 on 20 Sep 2018

Its a matter of minimizing verifications for the network. The network isn't verifying the signatures / running CheckTx on the proposed block, its only the validators. Full nodes would verify it in the following block once its committed via Deliver Tx.

If validators are upset about the double verification (which they shouldn't be as it pales for them in comparison to bandwidth), its an easy optimization #postlaunch. This would be for saving space in archive nodes, and for saving computation of the entire network.

ValarDragon on 20 Sep 2018

It slows down consensus, because then Tendermint has to talk to the state machine during the consensus process. The ABCI paradigm is to separate consensus from state machine; commit the block as fast as possible, and then pass the committed block to the state machine. I think making the proposer responsible for proposing a valid block is better as only one person has to do the CheckTx, not everyone. And because he will get slashed for invalid proposed Txs, there will not be many committed invalid Txs in practice, making it not much of a burden on full nodes.

sunnya97 on 20 Sep 2018

Why not just have the proposer run CheckTx on all the txs before proposing a block

This effectively happens already in the mempool. An honest proposer will propose a block where all txs pass CheckTx because they are already properly ordered and checked in the mempool.

This is mostly about guarding against malicious/incorrect proposers and reducing the complexity around tracking that a block has bad txs in it

It's not like they can clog the chain with invalid txs. If they were malicious, they could just propose empty blocks instead. They can clog their percentage of the throughput.

Why not? They can fill their blocks to the brim with invalid txs. This is useless spam.

It can be made a slashing condition to propose a block that doesn't pass CheckTx.

Sure, but it's significant added complexity, and so far not in the prelaunch roadmap ...

Because now they're double checking the signatures, once during that CheckTx, and then again during DeliverTx!

If Tendermint guarantees txs pass CheckTx, apps could start changing their DeliverTx logic to not recheck things already checked in CheckTx. This might lead to better separation/clarity between these two ABCI message types!

ebuchman on 20 Sep 2018

It slows down consensus, because then Tendermint has to talk to the state machine during the consensus process.

This is definitely true, but the question is how much, and it may not be true if CheckTx and DeliverTx start to diverge as mentioned above.

Also, we're considering executing blocks in full before voting on them for a Tendermint 2.0, as this would reduce most of the off-by-X nonsense that app devs are currently plagued with.

ebuchman on 20 Sep 2018

👍1

This effectively happens already in the mempool. An honest proposer will propose a block where all txs pass CheckTx because they are already properly ordered and checked in the mempool.

The underlying state might change since when the tx first went into the mempool.

reducing the complexity around tracking that a block has bad txs in it

Why do you have to track blocks that had bad txs? Isn't that already done by the Error Code?

Why not? They can fill their blocks to the brim with invalid txs. This is useless spam.

Not if they get slashed for it.

Sure, but it's significant added complexity, and so far not in the prelaunch roadmap ...

Is this a major concern pre-launch?

If Tendermint guarantees txs pass CheckTx, apps could start changing their DeliverTx logic to not recheck things already checked in CheckTx. This might lead to better separation/clarity between these two ABCI message types!

Not opposed to this, but I'd still rather have CheckTx only run on committed blocks, not active proposals.

Also, we're considering executing blocks in full before voting on them for a Tendermint 2.0, as this would reduce most of the off-by-X nonsense that app devs are currently plagued with.

This gets rid of one of the main efficiencies of ABCI. Now you're executing txs for every round, rather than every height. I think once the off-by-one stuff is handled once, most developers will never have to deal with it

sunnya97 on 20 Sep 2018

👍1

The underlying state might change since when the tx first went into the mempool.

This is why we replay all txs in the mempool after each block is committed.

Why do you have to track blocks that had bad txs? Isn't that already done by the Error Code?

We have to at least notify the application so they can punish the proposer.

Is this a major concern pre-launch?

We have to do at least one of:

run checktx before voting
add a bit-array of invalid txs to the header

Otherwise we can have junk blocks and we wont be able to punish for it. The second option I think would also require more up front design work ...

This gets rid of one of the main efficiencies of ABCI. Now you're executing txs for every round, rather than every height. I think once the off-by-one stuff is handled once, most developers will never have to deal with it

That's right, but it's also mitigated by:

most heights are expected to finish in one round
most future rounds are expected to repropose a previously proposed block, so CheckTx will only need to be run the first time around

So it's unclear how valuable this optimization really is, given the complexity it adds (ie. this conversation, need to slash invalid proposals, off-by-one stuff)

ebuchman on 20 Sep 2018

I genuinely don't think having txs execute each round is a bad thing. It seems like an unnecessary optimization. Recall the only times where we get to a second round are: proposer is offline, there is high overall network delay. In the first situation, there is a clear solution - have the person who would be the second proposer build their own "block candidate". Thats a relatively easy #postlaunch thing to do. In the second situation, the time to recheck a new block is not a big deal, as network delay is the clear bottleneck at hand.

ValarDragon on 20 Sep 2018

This is a fine discussion. We can cache CheckTx signature verifications and state lookups so that the up-front cost of doing these things during consensus is redeemed by faster execution post commit (during DeliverTx).

The reason why we shouldn't do this is because it significantly increases blockchain commit latency. The bulk of the time required to process a block for most applications and the hub is not in the logic, but in signature verifications and state lookups (due to cache misses). By requiring CheckTx during consensus, we are making block commit times longer than they need to be. DeliverTx during consensus would make this worse.

apps could start changing their DeliverTx logic to not recheck things already checked in CheckTx. This might lead to better separation/clarity between these two ABCI message types!

Even after this proposal, for CheckTx you'd still need to run the bulk of the logic of CheckTx in DeliverTx to deduct fees and set nonces, so I don't think it would help clarify distinctions between CheckTx and DeliverTx as already specified.

Also, we're considering executing blocks in full before voting on them for a Tendermint 2.0, as this would reduce most of the off-by-X nonsense that app devs are currently plagued with.

For the same reason as above (namely reducing commit time) we should not do this for launch.

That said, I think there is a way to make this work by pipelining block processing during the prevote step post-prevote but pre-precommit, but (a) this requires a more formal consideration as it changes consensus dynamics, which might even have adverse consequences for future plans like BLS, (b) for optimum performance this creates a tight coupling between consensus step time (e.g. the time of 1 prevote step) vs block processing time. If block processing time were much longer than consensus step time, even with this sort of pipelining the block commit latency would increase.

As is, we have an alternative upgrade path whereby DeliverTx computation happens on validators while consensus for the next block is also happening (another form of pipelining). There are at least two ways to make this work... one involves semi-sticky validators (e.g. each proposer proposes say 10 blocks instead of switching after 1 block), another is for a subset of applications where there are less dependency assumptions between subsequent transactions that span two sequential blocks.

jaekwon on 21 Sep 2018

Even after this proposal, for CheckTx you'd still need to run the bulk of the logic of CheckTx in DeliverTx to deduct fees and set nonces, so I don't think it would help clarify distinctions between CheckTx and DeliverTx as already specified.

Why is this true? You could have the CheckTx deduct fees and set nonces, and let DeliverTx do the rest (assuming we don't reset state on BeginBlock, or BeginBlock happens before these CheckTxs).

For the same reason as above (namely reducing commit time) we should not do this for launch.

Running DeliverTx before prevotes was never in the cards before launch - it's a consideration for Tendermint 2.0 and will probably require some form of pipelining as indicated to make it efficient.

The reason why we shouldn't do this is because it significantly increases blockchain commit latency.

I'm not convinced the throughput on Cosmos in the short term is going to be high enough that running CheckTx before voting will really be a problem. If we don't do it, then we need a strategy for handling blocks with invalid txs or blocks that exceed the max gas, unless we're fine with not punishing proposers for spamming us.

Note it's actually insufficient to just include a bit-array of "invalid txs" in the header because that doesn't distinguish between txs that are genuinely invalid (ie. failed CheckTx) and txs that simply failed to achieve their intended effect (eg. pass CheckTx but fail DeliverTx). We would also need to handle blocks that exceed MaxGas at the application level by returning non-zero codes for all txs past the one that exceeded the MaxGas. Running CheckTx before voting would solve both issues.

ebuchman on 21 Sep 2018

Note it's actually insufficient to just include a bit-array of "invalid txs" in the header

Just realized this is kind of irrelevant. If the app returns a non-zero code, it doesn't need to wait for the next block to punish the proposer. It can just punish it right away, or at the end block.

ebuchman on 21 Sep 2018

Why is this true? You could have the CheckTx deduct fees and set nonces, and let DeliverTx do the rest (assuming we don't reset state on BeginBlock, or BeginBlock happens before these CheckTxs).

It's complicating the system and potentially making bugs much easier. As an example, nonce and fee values during DeliverTx would no longer reflect the current value but some arbitrary future value after all CheckTxs have run, unless I'm misunderstanding your proposal. AnteHandler should run before each transaction, it's the less surprising thing to do, and leaves optionality for additional features that otherwise wouldn't be sensible.

If we don't do it, then we need a strategy for handling blocks with invalid txs or blocks that exceed the max gas, unless we're fine with not punishing proposers for spamming us.

Tendermint could tell the app during BeginBlock what max gas to use, and the app can be responsible for failing txs that exceed the gas amount. This is easy for the app.

I'm not convinced the throughput on Cosmos in the short term is going to be high enough that running CheckTx before voting will really be a problem.

Lets design for maximum usage, not just for the hub, but for other applications as well. There will be times when blocks are full. During this time, latency will increase, otherwise it means we haven't set the max gas high enough. This latency still exists after commit in the current design, but when this becomes a frequent occurrence we have an easy upgrade path like I mentioned with semi-sticky validators -- optionality.

Just re-iterating Sunny's point - this proposal's design creates unnecessary wasted time/overhead when commits don't happen on the first round.

For smart wallets that take advantage of for fast confirmation of transaction commit, it increases avg mempool->commit latency (by the duration of signature checks and state reads).

If the app returns a non-zero code, it doesn't need to wait for the next block to punish the proposer. It can just punish it right away, or at the end block.

+1

jaekwon on 21 Sep 2018

re-iterating Sunny's point - this proposal's design creates unnecessary wasted time/overhead when commits don't happen on the first round.

Considering about the potential extra check time if not commited in round 0. I think maybe it's better to adopt the invalidTxs in the header to punish malicious proposer.
maybe we can use an array such as invalidTxs: [ 1,3,7] which contains the invalid txs' Indexes of the block instead of using a bitArray?
Also , maybe we can use a OutOfGasTxIndex : 800 to show the txs after 800 are all not executed due to out of block limited gas.

And I think define a reCheckTx interface method or add a flag in checkTx to tell the app whether now it's processing an incoming tx or a recheck tx is an optimization. Because in recheckTx, the app doesn't need to check the signature again. So it should be aware of the situation of calling. discussed in #2127

james-ray on 21 Sep 2018

👍1

As an example, nonce and fee values during DeliverTx would no longer reflect the current value but some arbitrary future value after all CheckTxs have run, unless I'm misunderstanding your proposal.

No, my bad, what I said didn't really make sense for this reason.

Tendermint could tell the app during BeginBlock what max gas to use, and the app can be responsible for failing txs that exceed the gas amount. This is easy for the app.

The app controls MaxGas, so we don't need to send it in BeginBlock

I think maybe it's better to adopt the invalidTxs in the header to punish malicious proposer.

Ok. Sounds like we shouldn't move forward on this for now. Proposer punishment will be left completely up to the application for now. We should still complete the ADR summarizing this discussion and the decision.

maybe we can use an array such as invalidTxs: [ 1,3,7] which contains the invalid txs' Indexes of the block instead of using a bitArray?
Also , maybe we can use a OutOfGasTxIndex : 800 to show the txs after 800 are all not executed due to out of block limited gas.

These are cool ideas. Let's bring them to #2383.

I'm going to close this for now. Thanks everyone!

ebuchman on 22 Sep 2018

Just to re-iterate, it looks like punishing bad proposers for invalid txs and/or blocks exceeding the max gas can happen entirely on the application side without any changes to Tendermint. We should continue discussion about invalid-tx-bit-array in #2383, but this would only be relevant for lite-clients, still needs some design, and may not need to happen in the very short term.

@ttmc would love to hear your thoughts on this. Were CheckTx-before-voting and invalid-tx-bit-arrays important for you?

ebuchman on 22 Sep 2018

@ttmc would love to hear your thoughts on this. Were CheckTx-before-voting and invalid-tx-bit-arrays important for you?

I think we can live without them, but I'll ask my colleagues for their opinions: @vrde @kansi @ldmberman

ttmc on 22 Sep 2018

Tendermint: Run CheckTx before voting on blocks

All 25 comments

Related issues