Take as the desideratum of our lite client commit verification algorithm: for a lite client which checks in at least once per unbonding period to be successfully fooled, some fraction of bonded or unbonding stake must necessarily have committed an equivocation, and will be punished if the lite client publishes that equivocation to the chain ("fooling a lite client is always costly").
As I understand it, the current mechanism by which our lite client verification algorithm bisects headers, checks the difference in validator sets, and skips intermediary header verification violates this property: it is possible, in certain cases with sufficient validator set flux, for a portion of stake to successfully fool a lite client at zero cost.
Consider the following example:
A Tendermint chain started at block 0 and is currently at block 20, with all blocks within a single unbonding period. At block 10, the chain had some validator set V_10, and at block 20 the chain had some validator set V_20. There is less than 1/3 overlap between V_10 and V_20 - i.e., in between blocks 10 and 20, the validator set has changed by 2/3.
Consider a lite client with the correct root-of-trust for block 10, now connecting to a full node and trying to verify the state at block 20 using the bisection optimization.
An honest full node will report V_20 correctly and the lite client will notice that the validator set has changed by a sufficient amount, request intermediary headers, and check that the appropriate next-validator-set hashes were signed.
However, a dishonest full node working in conjunction with 2/3 of the validator set of block 10 can instead report V'_20 where V'_20 overlaps with V_10 by 2/3. The lite client will calculate that the validator set has changed by less than 1/3, not request intermediary headers, and accept whatever state root V'_20 signed.
The lite client can now be costlessly fooled - V'_20 can sign whatever it wants, without committing any equivocation at all, since at least 2/3 of V'_20 aren't actually bonded at block 20!
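To make the failure concrete, here is a minimal Python sketch of the skipping rule this attack exploits. All names and data shapes here are hypothetical, not the actual Tendermint lite client code: the point is only that the client compares the claimed new set against its trusted set, so a fake `V'_20` assembled from now-unbonded members of `V_10` sails through.

```python
# Hypothetical sketch of the vulnerable skipping rule; names are illustrative,
# not the actual Tendermint lite client API.

def overlap_power(trusted_vals, new_vals):
    """Voting power of trusted validators that also appear in new_vals."""
    new_addrs = {v["addr"] for v in new_vals}
    return sum(v["power"] for v in trusted_vals if v["addr"] in new_addrs)

def can_skip_bisection(trusted_vals, claimed_vals):
    """Skip intermediary headers if < 1/3 of trusted power has changed."""
    total = sum(v["power"] for v in trusted_vals)
    changed = total - overlap_power(trusted_vals, claimed_vals)
    return 3 * changed < total

# V_10: the set the client trusts; each validator has equal power.
v10 = [{"addr": a, "power": 1} for a in "ABCDEF"]
# Honest V_20 shares only 1 of 6 members with V_10 -> bisection required.
v20_honest = [{"addr": a, "power": 1} for a in "AUVWXY"]
# Fake V'_20 reuses 5 of 6 now-unbonded V_10 members -> the check is fooled.
v20_fake = [{"addr": a, "power": 1} for a in "ABCDEZ"]

print(can_skip_bisection(v10, v20_honest))  # False: client would bisect
print(can_skip_bisection(v10, v20_fake))    # True: skips, accepts forged root
```

The forged set never needs a single signature from anyone actually bonded at height 20, which is exactly the zero-cost property described above.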
I haven't read all the lite client code, so this might not be exactly correct, but I think this class of attack works for any case where the lite client skips intermediary header verification by checking validator set overlap and where a now-unbonded fraction of a validator set of an earlier block could sign a later block header without committing an equivocation.
The only way I see to preserve the safety of bisection is to slash validators for signing headers while unbonding - Tendermint would need to track validators in the unbonding state, and if any signature of a header of a height when they were in that state is discovered (not necessarily a double-signature, any signature at all), Tendermint must consider that a slashable fault and report it to the state machine for punishment (presumably equivalent to punishment for regular equivocation).
As a short term quick fix, we could also disable lite client bisection and verify all headers.
cc @ebuchman @jaekwon @sunnya97
So my POV on this has been that counterfactual slashing should ideally be implemented in the SDK, not in Tendermint.
It's a detail of the generalized Byzantine behavior that needs to be expressible in the SDK, and it is specific to the proof-of-stake implementation.
Vitalik was pointing out that another possible way to do things is to delay voting power changes for the signing window.
> So my POV on this has been that counterfactual slashing should ideally be implemented in the SDK, not in Tendermint.
At first glance this seems plausible - the SDK is already tracking the unbonding validators and has their consensus signing keys, we would only need to add parsing for the relevant Tendermint data structures and then evidence of a signature-while-unbonding could be submitted via transaction.
> it's a detail of the generalized Byzantine behavior that needs to be expressible in the SDK and specific to the proof of stake implementation.
It does seem somewhat application-specific, but so far we've tried to write a Tendermint-generic lite client, so any Tendermint lite client (desiring this safety property) will need to worry about this.
Great discussion.
For now can we assume (and DISCLAIM) that the validator set won't change too much within the unbonding period? I don't think it would be a bad assumption for the cosmos hub after launch. It could be a problem for some applications but I don't foresee any concrete problems on the horizon.
It could/should be an offense to sign a block header which includes the wrong ValidatorsHash (or something like that). That could just be handled by Tendermint as another kind of Evidence, with vote(s) and a header w/ an invalid .ValidatorsHash for the given .Height, and the SDK can deal with punishing.
> For now can we assume (and DISCLAIM) that the validator set won't change too much within the unbonding period? I don't think it would be a bad assumption for the cosmos hub after launch. It could be a problem for some applications but I don't foresee any concrete problems on the horizon.
This seems likely in practice - and maybe even desirable to enforce in the state machine - but at the moment we don't have any guaranteed bound; if delegators changed their mind the entire validator set could change in two blocks.
If you mean that we could release the lite client as-is, with a disclaimer that malicious-full-node security is only provided under the assumption that the validator set has not changed too much, that doesn't seem too dangerous to me for the short term.
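If we did want to enforce such a bound in the state machine, a minimal sketch might look like the following. The parameter and names are hypothetical, not current SDK behavior:

```python
# Illustrative guard (hypothetical parameter and names, not current SDK
# behavior): reject a validator-set update that would move more than
# MAX_CHANGE of the old total voting power out of the set.

MAX_CHANGE = 1 / 3  # hypothetical bound per update window

def power_changed(old_vals, new_vals):
    """Fraction of old total power whose holders left the set."""
    total = sum(old_vals.values())
    departed = sum(p for addr, p in old_vals.items() if addr not in new_vals)
    return departed / total

def apply_update(old_vals, new_vals):
    if power_changed(old_vals, new_vals) > MAX_CHANGE:
        raise ValueError("validator set changed too fast for lite clients")
    return new_vals

old = {"A": 50, "B": 30, "C": 20}
apply_update(old, {"A": 50, "B": 30, "D": 20})   # C (20%) left: allowed
try:
    apply_update(old, {"D": 40, "E": 40, "F": 20})  # everyone left: rejected
except ValueError as err:
    print(err)
```

The awkward part, as noted above, is that without some such rule delegations alone can turn over the entire set in two blocks.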
> It could/should be an offense to sign a block header which includes the wrong ValidatorsHash (or something like that). That could just be handled by Tendermint as another kind of Evidence, with vote(s) and a header w/ an invalid .ValidatorsHash for the given .Height, and the SDK can deal with punishing.
That seems like it deals with this case (and maybe would be easier to implement?) - in general, however, I do wonder if we should just slash unbonding validators for signing any block header with their consensus keys. I don't see any reason they would have for doing so, and it reduces the set of possible signatures the lite client security model has to reason about.
> If you mean that we could release the lite client as-is, with a disclaimer that malicious-full-node security is only provided under the assumption that the validator set has not changed too much, that doesn't seem too dangerous to me for the short term.
That's what I mean, yes.
> That seems like it deals with this case (and maybe would be easier to implement?) - in general, however, I do wonder if we should just slash unbonding validators for signing any block header with their consensus keys. I don't see any reason they would have for doing so, and it reduces the set of possible signatures the lite client security model has to reason about.
That seems like another kind of offense we should punish for... but even bonded validators can sign bad blocks w/ the wrong ValidatorHash, which should be punished because it could contribute to an attack.
Opened #3259 to track an immediate fix at some performance hit.
One concern is how this evidence would be detected and included in blocks in the first place, since attackers just need to send it to light clients, not to the full nodes. If specific lite clients are being targeted and simultaneously eclipsed, the evidence may never get out in time. I think this is what justifies it being an application-specific concern, since the client needs proof that the signers are still validators, ie. that the validator set changes slowly, or that it's determined sufficiently far in advance. For apps that allow it to change too quickly, bisection may not be possible?
So for the Tendermint lite client to safely update validator sets using bisection, it will need to rely on some injected function from the application. We can do this easily for the SDK since it's in-process, but it won't yet work with non-Go apps unless we expose the lite API out-of-process. We can also push for Tendermint lite client implementations in all the languages as the way to unlock bisection :unlock:
Note we can't use a new evidence type for this without also adding better evidence handling to ABCI (eg. CheckEvidence). For apps in other languages, they'd have to implement it as txs (and verify the vote signature).
Whoops finger slipped.
> but even bonded validators can sign bad blocks w/ the wrong ValidatorHash, which should be punished because it could contribute to an attack.
We should probably add an evidence type for this, but it also won't necessarily be detected in the consensus pkg for anyone other than the proposer, so will similarly need to be reported by lite clients themselves
> One concern is how this evidence would be detected and included in blocks in the first place, since attackers just need to send it to light clients, not to the full nodes. If specific lite clients are being targeted and simultaneously eclipsed, the evidence may never get out in time.
The light client will have a trust period which is less than the unbonding period, and the gap is how much time they have to publish what it's seen, which may have been malicious.
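As a numeric illustration of the gap between trusting period and unbonding period (the durations here are examples only; the real values are chain parameters):

```python
from datetime import datetime, timedelta

# Illustrative parameters; real values are chain-specific.
UNBONDING_PERIOD = timedelta(days=21)
TRUSTING_PERIOD = timedelta(days=14)   # must be strictly less than unbonding

def still_trusted(header_time, now):
    """A trusted header expires once the trusting period has elapsed."""
    return now - header_time < TRUSTING_PERIOD

def publication_window():
    """Time left, after trust in a header lapses, to publish suspect votes
    before their signers can have fully unbonded (and become unslashable)."""
    return UNBONDING_PERIOD - TRUSTING_PERIOD

now = datetime(2019, 8, 1)
print(still_trusted(now - timedelta(days=10), now))  # True
print(still_trusted(now - timedelta(days=15), now))  # False
print(publication_window())                          # 7 days, 0:00:00
```

The safety argument hinges on `publication_window()` being positive: it is the client's budget for getting evidence on-chain while the signers are still slashable.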
> I think this is what justifies it being an application-specific concern, since the client needs proof that the signers are still validators, ie. that the validator set changes slowly, or that it's determined sufficiently far in advance. For apps that allow it to change too quickly, bisection may not be possible?
Not sure what you mean, this works for arbitrary validator set changes.
> So for the Tendermint lite-client to safely update validator sets using bisection, it will need to rely on some injected function from the application.
It just needs to persist the votes that it sees, and later it needs to broadcast everything in case any votes were punishable (e.g. signed w/ the wrong valset hash). So it doesn't need anything from the app during bisection -- the malicious evidence can be submitted to the chain later.
This evidence needs to be in-app for now since Tendermint doesn't know about unbonding periods yet, but I imagine this can migrate into Tendermint in the future.
> but it also won't necessarily be detected in the consensus pkg for anyone other than the proposer,
Why would the proposer detect it?
It's not just bisection, but the entire lite client which is unsafe without "counterfactual slashing".
We want the following to hold: Arbitrary app state commits require > 2/3 bonded to be slashed.
Yet, if we don't have counterfactual slashing, only +1/3 is required to commit an invalid NextValidatorSet, and then the next block can have arbitrary state. (Since 1/3 can precommit an invalid header w/ an invalid NextValidatorSet, and while this doesn't need to get committed, later only an additional +1/3 double-signing is necessary to precommit the same invalid header.)
So, we need counterfactual slashing, not just for bisection, but for +2/3 protection against arbitrary state commits.
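The arithmetic of this argument, as a small sketch with illustrative powers:

```python
# Numeric sketch of the argument above: without punishing signatures on
# invalid headers, an arbitrary state commit costs far less than +2/3 in
# slashable stake. Powers are illustrative.

total = 100
group_a = 34   # precommits an invalid header w/ an invalid NextValidatorSet
group_b = 34   # later double-signs to complete the commit of that header

# Together they control the > 2/3 needed to commit the invalid header:
assert 3 * (group_a + group_b) > 2 * total

# But only group_b's equivocation is slashable today; group_a signed an
# invalid header yet never produced conflicting votes.
slashable = group_b
assert 3 * slashable < 2 * total   # well under the desired > 2/3 bound
```

So the desired property ("arbitrary commits require > 2/3 slashed") fails by roughly half without punishing the invalid-header signatures themselves.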
Hmm, that makes sense, though at least in the no-bisection case the +1/3 double-signing the invalid header would be slashable, if I understand the scenario correctly.
> Arbitrary app state commits require > 2/3 bonded to be slashed.
Is this a useful property? 2/3 of stake can commit arbitrary things, but they can also censor any evidence, so I don't know if making invalid app state commits slashable actually adds much security (although it could encourage individual validators to better defend against accidentally signing invalid app state commits).
> The light client will have a trust period which is less than the unbonding period, and the gap is how much time they have to publish what it's seen, which may have been malicious.
There are a few potential issues here:
It seems that counterfactual slashing would require fork accountability mechanisms.
Note the following scenario: if at some later point in time, f faulty processes create a precommit for B in round r, they can create a fork, and they haven't equivocated, since they hadn't voted in round r. They don't need to create a fork on the main chain; they can use it to trick a lite client. We can detect this only with a fork accountability algorithm, as we need to look at the complete transition history, where this new precommit message will appear.

How do we cope with this kind of attack? A lite client that thinks the block committed at height h is B will ask another node for the header at height h and realize that it is A. In this case the client needs to ask a full node for the commit for A. Note that looking at the commit for A and the commit for B, we can't say who is faulty. The only way we can conclude this is via a fork accountability algorithm that injects all those messages into the validators' history and detects that there are f+1 faulty processes.
If I'm not mistaken, counterfactual slashing itself is not enough!
In the original case outlined by Chris, +2/3 of V_10 are no longer part of V_20, and thus can counterfactually sign V'_20. But consider if they were still part of V_20, only they now account for < 1/3. In this case, they may still fool a light client by signing for V'_20. It wouldn't be counterfactual because they are a legitimate part of the validator set at V_20, and it is not double signing, since they didn't need to sign for V_20 (ie the chain can progress without them). Thus even if we implemented counterfactual slashing, it would not be sufficient to cover this bisection attack.
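A numeric sketch of this second case (illustrative powers only):

```python
# Numeric sketch: the attackers are still bonded in V_20 but hold < 1/3 of
# its power, so the honest chain commits height 20 without their votes. Their
# only height-20 signatures are on the fork, so no conflicting vote pair by
# the same key exists -> no equivocation, and no counterfactual signing either.

total_power_v20 = 100
attacker_power = 30            # < 1/3 of V_20: chain progresses without them
honest_commit_power = total_power_v20 - attacker_power

# The honest commit at height 20 still clears > 2/3 with zero attacker votes:
assert 3 * honest_commit_power > 2 * total_power_v20

# And the attackers stay under the 1/3 needed to be required signers:
assert 3 * attacker_power < total_power_v20
```

In other words, neither proposed evidence type (equivocation, signing-while-unbonded) fires, which is why this case needs its own category.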
We discovered this on a call a few weeks back but have not written it down yet.
Here is a spreadsheet about the different failure cases (fork, invalid state) for different clients (full node, sequential lite client, bisecting lite client) across different conditions (static validator set, changing validator set): https://docs.google.com/spreadsheets/d/1ZeMgE6EEi9EdWJz0t2IzZ6t7YdYRDnv9BpimnSnws6Q/edit?usp=sharing . It could probably be improved, but it helps lay things out.
So I believe the things we need to punish for are:
Note this means we don't need to punish for counterfactual signing in general, only if it's for invalid state, but perhaps would be best to punish for any counterfactual signing regardless. Then we have three distinct evidence types:
I believe the same argument applies about the issue Jae raised for the sequential lite client - if you want +2/3 to be slashable for signing invalid state, you need to punish for invalid state, otherwise, you could trick a sequential lite client into accepting invalid state with only +1/3 double signing.
> In the original case outlined by Chris, +2/3 of `V_10` are no longer part of `V_20`, and thus can counterfactually sign `V'_20`. But consider if they were still part of `V_20`, only they now account for < 1/3. In this case, they may still fool a light client by signing for `V'_20`. It wouldn't be counterfactual because they are a legitimate part of the validator set at `V_20`, and it is not double signing, since they didn't need to sign for `V_20` (ie the chain can progress without them). Thus even if we implemented counterfactual slashing, it would not be sufficient to cover this bisection attack.
This seems correct to me (and a novel case, or at least as of when discussed a few weeks ago).
This is another reason we really need to formalize the light client model relative to the messages a full node could potentially see and the conclusions it would come to, as compared to the light client, so that we can exhaustively analyze all these cases. Zarko has also identified another case which I believe does not fall into any of your three final categories (one validator equivocating, others performing an "amnesia attack") but is potentially a concern for the light client: https://github.com/tendermint/tendermint/pull/3840/files#diff-874e3fc9d8b56bd1636f20df18caf47eR99
Unless we add more verification data to Commits, I think amnesia is just double signing from the light client perspective, except it can't detect it until a +1/3 attack succeeds (unlike equivocation, which is easily detected). The light client doesn't care about rounds, just heights. See https://github.com/tendermint/tendermint/pull/3840#discussion_r310805299
> Unless we add more verification data to Commits, I think amnesia is just double signing from the light client perspective, except it can't detect it until a +1/3 attack succeeds (unlike equivocation, which is easily detected). The light client doesn't care about rounds, just heights. See #3840 (comment)
The difference is in the "amount" of fault accountability, since only one faulty validator needs to actually equivocate. That's a different dimension than just being fooled or not, but it seems like it might be relevant since it could mean that the cost of attack is low.
The amnesia is still detectable, but only after a fork has actually succeeded. There's no real difference between (f+1 equivocating) and (1 equivocating & f amnesia) other than that the latter requires a bit more work (ie. a full fork accountability protocol) to detect. So both should still be fully accountable; the latter just requires more resources and only works after a full attack succeeds (whereas with equivocation we can detect any amount even if the fork doesn't succeed).
I think I mean something else by

> the "amount" of fault accountability
than you do; I just mean that the current slashing logic (which does not slash for amnesia) would result in far less punishment in the latter case; that would be remedied if we do slash for amnesia.
Given that punishing validators for signing invalid headers is apparently necessary for bisection, how do we actually do that? Suppose there is a correct header at height 10, H_10. Note that it's not a crime to sign a different header, say H'_10, than the one that actually gets committed, even in the same round, because there could be conflicting (but otherwise valid) proposals. So we can't just detect the conflicting signed header, we have to actually show that it is invalid with respect to block execution, in particular that it has the wrong validator set (is that enough?). Since the validator set for height 10 is included in the prior block, ie. H_9.NextValidatorSet, is it sufficient to check that H'_10.LastBlockID.Hash = H_9.Hash()? Is it possible for there to be a valid conflicting H'_9 at this point, for instance one that included a different set of votes for height 8 in its LastCommit? I'm not sure yet because of all the off-by-ones. If we can't rely on such a check, I think we'd need to make some breaking change. Possible solutions might include:
1) include the previous block's NextValidatorSetHash in the vote.SignBytes
2) limit how much a validator set can change over some time
3) add a challenge game to force validators to prove their headers are valid
Will keep thinking about this ...
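For the chaining check discussed above (whether a conflicting H'_10 can be shown not to descend from the real H_9), here is a toy sketch. Field names and the hash function are stand-ins, and it only illustrates the shape of the check, not whether the check is sufficient:

```python
import hashlib

# Toy sketch with hypothetical field names and a stand-in hash, not the real
# Merkle hashing of Tendermint headers. H_9's NextValidatorsHash fixed the
# height-10 validator set, so a conflicting H'_10 that does not chain to H_9
# is provably not backed by the real set.

def header_hash(header):
    # stand-in for the real header hash
    return hashlib.sha256(repr(sorted(header.items())).encode()).hexdigest()

def chains_to(prev_header, header):
    return header["last_block_id_hash"] == header_hash(prev_header)

h9 = {"height": 9, "next_validators_hash": "hash_of_valset_10"}
real_h10 = {"height": 10, "last_block_id_hash": header_hash(h9)}
fake_h10 = {"height": 10, "last_block_id_hash": "something_else"}

print(chains_to(h9, real_h10))  # True
print(chains_to(h9, fake_h10))  # False
```

As the comment above notes, the open question is whether a valid conflicting H'_9 could exist, which this sketch deliberately does not answer.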