Eth2.0-specs: Invalid empty attestation signatures consensus split

Created on 8 Apr 2020 · 18Comments · Source: ethereum/eth2.0-specs

Some context:

v0.11.1 had a test helper function change an empty signature from 0xc000... to 0x0000. Completely zero, instead of valid but empty.
Up to v0.11.0 attestations with no participants were considered valid for some reason, and we had a testcase covering it. (I don't agree with it being a valid case though)
v0.11.1 also misses the post-state file for some reason, making the empty aggregation bits an anvalid case. And clients pass it, not because they added the rule, but because the 0x0000 signature is invalid.
I stumbled on this after testing it in a 3rd implementation (ZRNT, Go still, but with different state representation, more like remerkleable)

The scope of the bug:

The BLS spec has a pre-condition that states that verifying with 0 pubkeys are undefined behavior. Precondition: n >= 1, otherwise return INVALID. (Thanks @hwwhww for retrieving this from the specs)
The previous Eth2 spec versions considered empty attestations included in blocks valid. There was an empty_aggregation_bits test case that described it as valid: https://github.com/ethereum/eth2.0-spec-tests/tree/v0.11.0/tests/minimal/phase0/operations/attestation/pyspec_tests/empty_aggregation_bits
Py-ecc previously handled the 0 pubkeys case as a valid case; the 0xc00000.... signature is valid for no pubkeys.
The network spec requires attestations on gossip to have exactly 1 bit set, so the network spec is not affected
The beacon spec just verifies the empty signature. Not all empty signatures are valid. 0x000... is not. 0xc000... is.
Prysm and ZRNT ignore the signature if no bits are set. (ZRNT with a TODO statement): Prysm ZRNT
Lighthouse puts the attestation signatures in a SignatureSet, and then verifies it as a whole. This would allow the 0xc000 signature to pass without pubkeys I think. But not other invalid signatures.
Other clients have to be checked, there could be more edge cases. If we can agree on what it should be, I will write new spec tests to cover it.

Action points:

Empty signatures are a dangerous new edge case to attestation processing. Requiring the BLS library to handle the undefined behavior well, or the client to implement the edge case correctly (not like Prysm and ZRNT). We need to be more explicit about this behavior, and possible make it invalid. There should be no reason to include no-participant attestations on chain. Size is not the problem as proposers can already fill their block with duplicates, but just having the empty-signature case at all is bad.
A chain split could be caused between Prysm and Lighthouse if someone mines a block with an attestation containing an empty bitfield, but an invalid signature. (Which could be a valid looking one still, but not valid for empty set of pubkeys). Prysm will stay on that chain, lighthouse will split away from that chain.

Labeling it as a bug, although the spec is not necessarily wrong, it is definitely a case that should get attention to avoid clients splitting chains in the future.

@djrtwo @CarlBeek

bug phase0 BLS

Source

protolambda

Most helpful comment

Copy-paste the discussion on #1799 https://github.com/ethereum/eth2.0-specs/pull/1799/files#r424421674

@mratsim:
Do we define the empty signature somewhere? Is that 0xc000...

@protolambda:
The python here would mean all zeroes (every ssz type is all zeroes by default). But agree that this is way to vague and can be confused with an empty signature.

@hwwhww:
I remember we had EMPTY_SIGNATURE constant in 2019. I would be happy to add it back.

@mratsim:
We can have, NO_SIGNATURE(0x0000...) and EMPTY_SIGNATURE (0xc000...) for clarity

@kirk-baird:
I've been treating EMPTY_SIGNATURE as (0x0000...) as I believe that's what we called in it the spec in 2019. I'm happy to do a re-name and make

NO_SIGNATURE (0x0000...) (always verifies false)

EMPTY_SIGNATURE (0xc000...)

However, we have to be careful here as AggregatePublicKeys([]) will always verify false according to the BLS Standard.

So EMPTY_SIGNATURE will only verify true against a list of EMPTY_PUBLIC_KEYS. We must have 1..* EMPTY_PUBLIC_KEYS verifying true. (Note: or some other combination of public keys that add to 0 mod r).

@mratsim:
Or EMPTY_SIGNATURE(0x0000) and ZERO_SIGNATURE (0xc000) I'm not married to any name.

So far, we have two non-signature cases in phase 1:

BeaconBlockBody.light_client_signature, when len(signer_pubkeys) == 0
- It is currently handled with the following code when verifying:
  
  python if len(signer_pubkeys) == 0: assert block_body.light_client_signature == BLSSignature() return else: assert bls.FastAggregateVerify(signer_pubkeys, signing_root, signature=block_body.light_client_signature)
ShardTransition.proposer_signature_aggregate, when len(shard_blocks) == 0
- It’s currently assert bls.AggregateVerify(pubkeys, signing_roots, signature=transition.proposer_signature_aggregate)

My 2 cents

Optional 3 is good for attestation aggregation, however, for phase 1 cases, IMO I'm inclined to apply option 2 for these cases.

It's clear that AggregateVerify(PKs, messages, signature)/FastAggregateVerify(PKs, message, signature) is invalid when PKs = [] according to the BLS standard, so we cannot set the non-signature to 0xc0... and make it pass the verification.
0x00.. is an invalid signature in BLS verification, and it can remind us to explicitly take care of the non-signature case. We can define some Eth2-only helpers:

def optional_fast_aggregate_verify(pubkeys, message, signature) -> bool:
    if len(pubkeys) == 0:
        return signature == NO_SIGNATURE
    else:
        return bls.FastAggregateVerify(pubkeys, message, signature)

def optional_aggregate_verify(pubkeys, messages, signature) -> bool:
    if len(pubkeys) == 0:
        return signature == NO_SIGNATURE
    else:
        return bls.AggregateVerify(pubkeys, messages, signature)

So it would be crystal clear that PKs == [] is possible. 🙂

hwwhww on 14 May 2020

👍2

All 18 comments

There are going to be places in the protocol in the future where there will be a single aggregate signature, and not a list of aggregates; for example even in phase 1 there's the light client aggregate. And so in such a situation, the degenerate case is not going to be an empty list of attestations as it is elsewhere, it will be a single aggregate signature that is empty. To avoid requiring higher-layer extra logic around that case, it seems to me reasonable to make zero a valid signature for an empty list of pubkeys and messages. This case would get actually triggered when eg. somehow no one from the light client committee shows up to get included in a block.

We could add a separate rule banning empty attestations from phase 0 if desired; seems pointless to allow them.

vbuterin on 8 Apr 2020

We could add a separate rule banning empty attestations from phase 0 if desired; seems pointless to allow them.

But leaving them in does not break anything, and introducing breaking changes at this point should imo only be done to fix serious issues.

Preventing 0 attestations barely affects the scope of behavior and attack surface.

A rational actor would find more valuable attestations anyway, and not want to pack the block overly full, increasing the size of the block and time to compute.

As for the argument of “packing” the block with crap, the proposer could easily just put duplicate attestations (of high participation and thus more computational load) or just put a bunch of single bit attestations.

djrtwo on 8 Apr 2020

I am not concerned about the spam as much, but about it being a weird edge case that already has clients to disagree now, and likely in the future if we leave it as-is. Being more explicit about validity conditions would be good, especially because the BLS spec itself has its own requirements (the >= 1 case). And many libraries, such as the two Go version, simply crash or give undefined answers on empty pubkey list inputs. Having clients improvise to interpret the spec is what caused the current potential chain split.

protolambda on 8 Apr 2020

But leaving them in does not break anything, and introducing breaking changes at this point should imo only be done to fix serious issues.

That's fair. Especially with compression, an empty attestation is close to zero overhead anyway. So I'd be happy with specifying that BLS aggregate signatures with 0 pubkeys are in general allowed.

vbuterin on 8 Apr 2020

Giving some feedback as per @protolambda's request :)

it seems to me reasonable to make zero a valid signature for an empty list of pubkeys and messages

This also seems reasonable to me. I assume the intention is that _only_ 0x00..00 is valid for 0 pubkeys? I prefer this otherwise we have some weird form of signature malleability.

I am not concerned about the spam

Same for me.

Being more explicit about validity conditions would be good

Agreed.

paulhauner on 9 Apr 2020

👍1

Being more explicit about validity conditions would be good, especially because the BLS spec itself has its own requirements (the >= 1 case).

This would make sense and I would agree with this, being explicit here is a definite plus. I would be ok with verifying that a signature was 0xc00... if no bits were set in the accompanying bitfield.

nisdas on 15 Apr 2020

👍1

There are 2 layers where invalid signatures are handled:

Low-level: Milagro, MCL
High-level: The client

Currently due to testing, fuzzing (and invalid signatures in deposits?) we have a wrapper to handle valid signatures at a low-level (i.e. point on the G2 curve) and invalid signatures but that we accept because of protocol or test/fuzzing requirements. We may add in the future another case with lazy loaded signatures (i.e. we stored it in a database on shutdown and reload it in the future).

type
  BlsValueType* = enum
    Real
    OpaqueBlob

  BlsValue*[N: static int, T] = object
    # N = 48 for public keys and N = 96 for Signature
    case kind*: BlsValueType
    of Real:
      blsValue*: T
    of OpaqueBlob:
      blob*: array[N, byte]

In the low-level layer, it is very easy to check if a signature is valid or not: it's not a point on the curve and the underlying library will not parse it. For example 0x0000... (full-zero) is invalid while 0xc000... is valid and a point at infinity. This is derived from elliptic curve maths and so does not need to be specified.

At the high-level layer, there is a huge potential for divergence and as @nisdas said, being explicit is important as are test cases.
In particular what is the zero signature:

a point at infinity 0xc000
a binary zero blob 0x0000
in which case is it valid?
- when accompanied by an empty aggregation bitset?

mratsim on 23 Apr 2020

👍2

Have we reached a conclusion yet?

From my point of view I think we have two options which I prefer option 1:

Option 1

Request a change to the BLS specs such that if n = 0 i.e. PublicKeys = [] then [] aggregates to the identity. As it makes aggregation simpler i.e.

aggregate_public_key = 0
for key in public_keys:
    aggregate_public_key += key

Rather than specifically handling the n = 0 case.

If this were the case we would simply have

AggregateSignature = 0x00... always return false as it is not a valid point on curve.
AggregateSignature = 0xc00... return true for PublicKeys [] and [0xc00...] (i.e. PublicKey at point 0)

Option 2

Leave BLS spec as is and handle in the Ethereum layer the case where AggregateSignature = 0x00 and publickeys = [] specifically to return true before interacting with the BLS layer.

kirk-baird on 29 Apr 2020

I am opposed to Option 1 as it requires changes in the BLS spec (big or not, it's friction), and it doesn't solve for the real issue: clients have to get this right and not rely on libraries to work-around. If one library accepts 0 pubkeys, the other doesn't, and yet another is somewhere in between, then clients have to track those edge cases to stay in consensus. And properly implement work arounds.

Option 2 is better, as it does not require anything from the BLS spec, and is super straigtforward to implement and test:

0 participants is invalid case, unless signature is 0x000...
> 0 participants is valid only if the signature is valid for those pubkeys.

And better (IMHO) would be an Option 3, to remove one branch of decision making:

0 participants is an invalid case, no exceptions
> 0 participants is valid only if the signature is valid for those pubkeys.

protolambda on 29 Apr 2020

I am opposed to Option 1 as it requires changes in the BLS spec (big or not, it's friction), and it doesn't solve for the real issue: clients have to get this right and not rely on libraries to work-around. If one library accepts 0 pubkeys, the other doesn't, and yet another is somewhere in between, then clients have to track those edge cases to stay in consensus. And properly implement work arounds.

From this option all logic would now be in the BLS layer and none in the Ethereum layer which is appealing. However, as you mentioned you would like public keys [] to return false which this will not.

I like this option mostly as I think the BLS spec is not implementation friendly in its current form and I would like to change it anyway.

And better (IMHO) would be an Option 3, to remove one branch of decision making:

0 participants is an invalid case, no exceptions
> 0 participants is valid only if the signature is valid for those pubkeys.

I do see the appeal of option 3 over option 2 in avoiding changes to BLS spec. And in that AggregateSignature 0x00.. will always be false and as the BLS spec is PublicKeys [] will also always be false.

kirk-baird on 29 Apr 2020

From this option all logic would now be in the BLS layer and none in the Ethereum layer which is appealing. However, as you mentioned you would like public keys [] to return false which this will not.

I like this option mostly as I think the BLS spec is not implementation friendly in its current form and I would like to change it anyway.

Do you anticipate that it would be easy enough to get this included in the spec ? I would prefer to have everything under our control if possible,and the best way to do that would be to handle this in the application layer.

I would prefer option 3 as it is mostly straightforward to implement and I do not see any valid case for having an attestation with 0 participants. Its effectively just junk in the block that client's will have to process and verify.

nisdas on 29 Apr 2020

Do you anticipate that it would be easy enough to get this included in the spec ? I would prefer to have everything under our control if possible,and the best way to do that would be to handle this in the application layer.

I'm confident at this stage if we wished to change the BLS spec it would get done. Though we are looking to freeze phase 0 and BLS spec is a bottleneck so maybe it's best to go with option 3 which as you say will also give us more control.

Additionally, if we go with option 3 and handle this case without interacting with BLS then it will be irrelevant if BLS makes any changes in this regard.

kirk-baird on 29 Apr 2020

👍1

Ok, I'm making a PR for option 3 (empty set of participants is invalid as attestation to go onchain). If we need it to behave differently for some unforeseen phase 1+ case, we can simply start allowing the type of attestation we need, with backwards compatibility in regards to attestations already on-chain. Also, we indeed avoid being affected by BLS changes and library issues in any way, which is great.

protolambda on 1 May 2020

👍2

What about the issue brought up by vitalik early in the thread? We know of cases where empty signatures will have to be included on chain.

In such a case, I suppose we have to be explicit -- e.g. if light_sig == 0x00: return at the top of process light sigs function.

EDIT: The fix in #1780 is specific to this one issue. If we go Option 3, we'll need to address each type of potentially empty aggregate specifically. This is probably fine

djrtwo on 4 May 2020

Copy-paste the discussion on #1799 https://github.com/ethereum/eth2.0-specs/pull/1799/files#r424421674

@mratsim:
Do we define the empty signature somewhere? Is that 0xc000...

@protolambda:
The python here would mean all zeroes (every ssz type is all zeroes by default). But agree that this is way to vague and can be confused with an empty signature.

@hwwhww:
I remember we had EMPTY_SIGNATURE constant in 2019. I would be happy to add it back.

@mratsim:
We can have, NO_SIGNATURE(0x0000...) and EMPTY_SIGNATURE (0xc000...) for clarity

@kirk-baird:
I've been treating EMPTY_SIGNATURE as (0x0000...) as I believe that's what we called in it the spec in 2019. I'm happy to do a re-name and make

NO_SIGNATURE (0x0000...) (always verifies false)

EMPTY_SIGNATURE (0xc000...)

However, we have to be careful here as AggregatePublicKeys([]) will always verify false according to the BLS Standard.

So EMPTY_SIGNATURE will only verify true against a list of EMPTY_PUBLIC_KEYS. We must have 1..* EMPTY_PUBLIC_KEYS verifying true. (Note: or some other combination of public keys that add to 0 mod r).

@mratsim:
Or EMPTY_SIGNATURE(0x0000) and ZERO_SIGNATURE (0xc000) I'm not married to any name.

So far, we have two non-signature cases in phase 1:

BeaconBlockBody.light_client_signature, when len(signer_pubkeys) == 0
- It is currently handled with the following code when verifying:
  
  python if len(signer_pubkeys) == 0: assert block_body.light_client_signature == BLSSignature() return else: assert bls.FastAggregateVerify(signer_pubkeys, signing_root, signature=block_body.light_client_signature)
ShardTransition.proposer_signature_aggregate, when len(shard_blocks) == 0
- It’s currently assert bls.AggregateVerify(pubkeys, signing_roots, signature=transition.proposer_signature_aggregate)

My 2 cents

Optional 3 is good for attestation aggregation, however, for phase 1 cases, IMO I'm inclined to apply option 2 for these cases.

It's clear that AggregateVerify(PKs, messages, signature)/FastAggregateVerify(PKs, message, signature) is invalid when PKs = [] according to the BLS standard, so we cannot set the non-signature to 0xc0... and make it pass the verification.
0x00.. is an invalid signature in BLS verification, and it can remind us to explicitly take care of the non-signature case. We can define some Eth2-only helpers:

def optional_fast_aggregate_verify(pubkeys, message, signature) -> bool:
    if len(pubkeys) == 0:
        return signature == NO_SIGNATURE
    else:
        return bls.FastAggregateVerify(pubkeys, message, signature)

def optional_aggregate_verify(pubkeys, messages, signature) -> bool:
    if len(pubkeys) == 0:
        return signature == NO_SIGNATURE
    else:
        return bls.AggregateVerify(pubkeys, messages, signature)

So it would be crystal clear that PKs == [] is possible. 🙂

hwwhww on 14 May 2020

👍2

Yes, being explicit like that is great, and avoids weird edge cases between implementations in the BLS spec, as readers have many different assumptions about 0 pubkeys, and the BLS spec does not cover it. With phase0 I prefer option 3, as it is easier to open it back up, than restrict it (since existing signatures still would need to be valid). For phase 1 that approach looks good.

protolambda on 14 May 2020

@protolambda I think it's okay to close this issue?

hwwhww on 20 Jul 2020

@hwwhww yes, this was resolved. There are still surprises with the infinity-point pubkey/signature for implementers though, especially when changing to new BLS libraries (BLST). @paulhauner was looking into this. If you've any edge cases or things that should be documented/tested, please make a new issue and we'll look into it.

protolambda on 20 Jul 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Justifed state in LMD GHOST

djrtwo · 3Comments

Eth1 voting when block_hash is junk

paulhauner · 4Comments

Add "design goals" to the README?

ralexstokes · 3Comments

Clarification on the initiating client of `Hello`

mhchia · 3Comments

Proof of Possession and BLSVerify clarification

mratsim · 5Comments