Currently, an arbitrary avm can be deployed, because no checks are performed on it. The same applies to verification scripts.
I think basic checks should be enforced on Neo 3.0 scripts, for example, parsing the opcode structure to verify (at deploy time) that all opcodes exist and are minimally well-formed (for opcodes with static parameters, check those too). This avoids future problems and increases network script quality, since invalid opcodes cannot be stored only to be randomly enabled by some future usage.
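A minimal sketch of what such a deploy-time check could look like (in Python for illustration; the `OPCODES` table and operand sizes are hypothetical placeholders, not the real Neo 3.0 opcode set):

```python
# Hypothetical sketch of a deploy-time well-formedness check for an avm script.
# OPCODES maps each known opcode byte to the size of its static operand
# (0 = no operand); the values below are placeholders, not Neo's real table.
OPCODES = {
    0x00: 0,   # e.g. an opcode with no operand
    0x0C: 1,   # e.g. an opcode with a 1-byte static parameter
    0x41: 4,   # e.g. a SYSCALL-style opcode with a 4-byte interop hash
}

def is_well_formed(script: bytes) -> bool:
    """Return True if every opcode exists and its static operand fits."""
    i = 0
    while i < len(script):
        op = script[i]
        if op not in OPCODES:     # unknown opcode: reject at deploy time
            return False
        i += 1 + OPCODES[op]      # skip the opcode plus its static operand
    return i == len(script)      # an operand must not run past the end
```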
Interops can be checked as well, if they exist at the time the deploy is made.
Finally, as a next step, unreachable code could also be detected in advance, avoiding wasted space in network contracts.
Even if we don't check at deploy time, the wrong contract will fail when it is executed, won't it?
Yes, it will fail at runtime, but parsing the avm is straightforward, and it makes no sense to allow a contract deploy with nonexistent opcodes (as it may become a future attack vector via previously unversioned opcodes). The check would probably also catch mistakes when the wrong compiler is used, and it would keep the networks cleaner (including TestNet 3.0). Malformed scripts also complicate the lives of blockchain explorers, which need to give precise information on the workings of every script (how do you interpret a malformed avm?).
What do you mean by the "attack"?
You publish a contract using an opcode that doesn't exist yet, and later that opcode is implemented.
Someone can always publish a contract with failing behavior (due to a nonexistent opcode) that changes in the future when that new opcode is introduced. This can easily happen, for example, when a new VM is tested on TestNet and is later introduced on MainNet. In the meantime, a failing contract can be deployed on MainNet, and its behavior changes when the new VM is rolled out there. This may lead to different storage values, notifications, transaction output states, basically everything. It may happen intentionally or unintentionally; the point is that it is easily prevented by simply parsing the avm at deploy time.
This doesn't prevent all possible future breaking changes, but at least blockchain explorers and users can reliably parse all existing scripts.
Someone publishes a contract with nonexistent opcodes, and you fail the deployment. But later, the opcodes are introduced, and then the deployment succeeds. What's the difference?
That's why two things are needed: (1) storing transaction final states (HALT, FAULT, ...) in a Patricia tree or in the block header/body, so these can never change; (2) a minimal versioning system in the VM (or application layer) that asserts which opcode exists in which block range (example: https://github.com/neo-project/neo-vm/issues/142)
Storing tx execution states is quite easy and not a big overhead (a single byte per tx), and this alone could already resolve the situation you mentioned (and 99% of others). The rest can only be dealt with efficiently by (2).
Note that I'm not proposing to store tx outputs (which may be bigger than a byte) or notifications... but at least HALT/FAULT should be very clear. Where best to put it, I don't know... it could be in the headers or the block body (but this requires pre-execution), or in some state tree released afterwards. I think this is more related to this issue: https://github.com/neo-project/neo/issues/302
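To illustrate point (2), here is a rough sketch of a per-opcode block-range table (the `OPCODE_RANGES` entries and activation heights are invented for illustration):

```python
# Hypothetical sketch of point (2): each opcode is valid only within a block
# range. The opcodes and activation heights below are made up for illustration.
OPCODE_RANGES = {
    0x00: (0, None),          # valid since genesis, never deprecated
    0x4E: (1_500_000, None),  # introduced by a VM upgrade at block 1,500,000
}

def opcode_exists(op: int, block_height: int) -> bool:
    """Check whether an opcode is defined at a given block height."""
    if op not in OPCODE_RANGES:
        return False
    start, end = OPCODE_RANGES[op]
    return start <= block_height and (end is None or block_height < end)
```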
If you store transaction final states, the attack disappears in both cases.
Ok, just to be very precise in the discussion here: the proposal is not to avoid any attacks (those can be discussed in other threads), but simply to guarantee that the proposed script is "minimally well-formed" at the moment it is put on the chain.
This helps blockchain explorers, provides a cleaner chain, and introduces only the minimal overhead of parsing the avm at verification/deploy time.
Checking the scripts on deployment will reduce the TPS, but checking them in explorers won't. They can do the same thing, I think.
A fair point, Erik. RPC nodes could do this job too and help filter the network better. There are many ways to do this; let's leave it for the future :+1:
Maybe we can create a hash, or something like that, and check it before deploy, during validation, so you need to provide both the script and the hash.
Maybe we can add a CRC at the end of the script.
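A minimal sketch of that idea, assuming a plain CRC32 appended as a 4-byte little-endian trailer (this is not an existing Neo format, just an illustration):

```python
import zlib

def append_crc(script: bytes) -> bytes:
    """Append a CRC32 of the script as a 4-byte little-endian trailer."""
    return script + zlib.crc32(script).to_bytes(4, "little")

def check_crc(script_with_crc: bytes) -> bool:
    """Verify the trailer before accepting a deploy."""
    script, crc = script_with_crc[:-4], script_with_crc[-4:]
    return zlib.crc32(script).to_bytes(4, "little") == crc
```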
I think the idea of an RPC endpoint to do this check is better; this way we can both ensure the code is correct and not waste resources on double checks.
@igormcoelho what do you think about creating an RPC method for this verification?
In fact, it's not "integrity" in the sense of "corruption"; it's about someone using opcodes that don't exist at the moment (so it cannot be considered a valid script to deploy). Since all opcodes are known at compile time (thankfully, no OP_EVAL), we can be sure the script is well-formed. The cost should be linear in the number of operations, meaning it's super fast in my opinion.
There's still one thing we would need to do to verify a deploy without actually executing it. Suppose someone pushes a script onto the stack... then its operands... then reverses things, swaps, sends items to the altstack... and only then invokes Contract.Create. The truth is, we don't really know what the parameters of a SYSCALL are when they come from the stack (so we won't be able to validate the "correctness" of any deploy).
In my opinion, some SYSCALLs should explicitly require data to be passed as SYSCALL parameters, not taken from the stack, and the script is such a case (another case, which gave me this idea, came from a discussion with Erik on the Callback proposal).
It could perhaps look like this (example with 03 explicit parameters):
`SYSCALL Contract.Create 03 MY_SCRIPT_HERE CONTRACT_NAME PARAMS ...`
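A rough sketch of how a verifier could statically read such an explicit-parameter SYSCALL (the byte layout and `read_explicit_syscall` helper below are invented for illustration, not an agreed encoding):

```python
# Hypothetical layout: SYSCALL byte, 4-byte interop hash, 1-byte parameter
# count, then each parameter as a 2-byte little-endian length plus payload.
SYSCALL = 0x41  # placeholder opcode value

def read_explicit_syscall(script: bytes, i: int):
    """Return (interop_hash, params, next_offset), all readable statically."""
    assert script[i] == SYSCALL
    interop = script[i + 1:i + 5]
    count = script[i + 5]
    i += 6
    params = []
    for _ in range(count):
        length = int.from_bytes(script[i:i + 2], "little")
        params.append(script[i + 2:i + 2 + length])
        i += 2 + length
    return interop, params, i
```

Because the script and the other parameters sit inline in the bytecode, a verifier can inspect them statically, without simulating any stack operations.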
The "bad" thing is that it won't be possible to deploy an unknown contract, via some random stack operations, passed by user invocation.
This can be used to reject "malformed" scripts on Entry, during verification @erikzhang.