The main spec should explicitly document how consensus critical data gets serialized (e.g before it gets signed or hashed). Additionally, test-vectors should be provided in the spec (not only in code/tests).
It is critical for implementations in other languages to understand how exactly this data gets serialized. I'm aware of the move from amino to protobuf but this doesn't make this issue obsolete: protobuf does not guarantee deterministic serialization, especially not between different language implementations.
It is essential to provide a brief yet complete specification of any consensus critical data serialzation. Providing a similar description and test-vectors in the spec as this would be cool:
https://github.com/davidiw/libra/blob/a101ebf5253a4c5da2cc1cf289147e562cd8c01c/common/canonical_serialization/README.md (note this doc is still WIP).
Alternatively, the spec can refer to the (soon to be) proto messages and explicitly mention in what deterministic fashion they are expected to be serialized (e.g. mention that the field order is binding, unknown fields are forbidden and maybe a few other things that potentially could introduce non-determinism).
Test-vectors should still be provided for every type and particularly for all edge cases. To increase readability, these test-vectors could also be referenced instead of explicitly listed in the spec.
The main spec should explicitly document how consensus critical data gets serialized (e.g before it gets signed or hashed).
been going through the spec I hope to have a PR ready tomorrow for review. There are quite a few updates that will be made and a encoding section similar to what the blockchain section currently has will be made for all reactors.
@adlerjohn reminded me of the existence of this related doc by Regen / @aaronc:
@marbar3778 do you think we should move this issue to the spec repo?
Most helpful comment
@marbar3778 do you think we should move this issue to the spec repo?