Depends on
CC @axic @chriseth @cdetrio @holiman @ekpyron
@cdetrio I don't have access to the go-vm-fuzzing git repo that the ABIencoderv2 fuzzer you shared depends on (log below):
$ cd abiencoderv2-fuzzer
$ make
go build -buildmode=c-archive github.com/ethereum/go-ethereum/go-vm-fuzzing
can't load package: package github.com/ethereum/go-ethereum/go-vm-fuzzing: cannot find package "github.com/ethereum/go-ethereum/go-vm-fuzzing" in any of:
/snap/go/3417/src/github.com/ethereum/go-ethereum/go-vm-fuzzing (from $GOROOT)
/home/bhargava/go/src/github.com/ethereum/go-ethereum/go-vm-fuzzing (from $GOPATH)
Makefile:6: recipe for target 'go-vm-fuzzing.a' failed
make: *** [go-vm-fuzzing.a] Error 1
$ git clone https://github.com/ethereum/go-ethereum/go-vm-fuzzing.git
Cloning into 'go-vm-fuzzing'...
remote: Not Found
fatal: repository 'https://github.com/ethereum/go-ethereum/go-vm-fuzzing.git/' not found
Might be this one: https://github.com/guidovranken/go-ethereum/tree/go-vm-fuzzing
This is the most recent version of the vm-fuzzer: https://github.com/ethereum/ethereum-vm-fuzzer. It is continually updated to the most recent versions.
Looping in @cryptomental here too
Hi @cryptomental
At a high level, my understanding is that you are doing multi-client differential fuzzing. So I imagine your test harness looks something like this:
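#include <geth headers>
#include <parity headers>
...
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Pass the size explicitly: fuzzer input is not NUL-terminated
    std::string program(reinterpret_cast<const char *>(data), size);
    assert(Geth::execute(program) == Parity::execute(program));
    return 0;
}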
Our use case is slightly different. Assuming my code snippet roughly captures your use case, here is what we would like to do:
#include <some solidity headers>
#include <some libfuzzer headers>
#include <some geth headers>
...
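int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    // Pass the size explicitly: fuzzer input is not NUL-terminated
    std::string program(reinterpret_cast<const char *>(data), size);
    auto EVMBytecode = solidity::compile(program);
    // All programs must return true
    assert(Geth::execute(EVMBytecode) == 1);
    return 0;
}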
Do you think something like this is feasible? In the end, we'd have one big fuzzing binary blob that generates random input, tests the harness above, and reports failures.
Thank you @holiman. IIRC, we should be writing a client for the latest and greatest ethereum-vm-fuzzer that is linked above.
@chriseth @ekpyron and I discussed to some extent what an initial test stub could look like for the ABIencoderv2. The main conclusion from our discussion is that we should first figure out what the testing environment is going to look like before we get back to actually writing the ABIencoderv2 test harness.
Thinking about it, such a test environment for ABIencoderv2 may also help us fuzz Solidity end-to-end (parsing, optimization, code generation, and execution). We could, for example, assert equality of unoptimized and optimized Geth traces for random (but parseable) Solidity code.
@holiman was kind enough to loop in @cryptomental from the security testing team who may be able to help us better understand the test environment.
Hi @bshastry @chriseth @ekpyron @holiman. I'd be happy to help you set this up once we agree on the test harness. @bshastry Geth vs. Parity differential fuzzing is a bit more complex than your high-level code, but you almost got it right. We also extract a prestate from the input and fill the EVM with accounts and balances before the code is run; see GethVMRunner::Run() in vmrunner.cpp for the details. We then check whether the state, stack, and errors of both EVM implementations match.
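To make the comparison step concrete, here is a rough sketch of checking that two runs agree on state, stack, and error. The `RunResult` struct and `sameOutcome` function are made-up names for illustration, and the field representations are assumptions; the real types in the vm-fuzzer are different.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical summary of one EVM run; not the vm-fuzzer's actual types.
struct RunResult {
    std::string stateRoot;          // assumed: hex hash of the post-execution state
    std::vector<std::string> stack; // assumed: final stack items, hex encoded
    std::string error;              // empty when execution succeeded
};

// Two runs agree only if state, stack, and error all match.
bool sameOutcome(RunResult const& a, RunResult const& b)
{
    return std::tie(a.stateRoot, a.stack, a.error) ==
           std::tie(b.stateRoot, b.stack, b.error);
}
```

A differential harness would fill one `RunResult` per client from the same prestate and code, then abort as soon as `sameOutcome` returns false.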
If I understand your use case correctly, the goal would be to compile fuzzed source code with the Solidity compiler, once without and once with optimization, and feed the resulting bytecode into the EVM using a fuzzer. This could indeed be a perfect way to check whether Geth execution traces are the same for the optimized and unoptimized Solidity output.
The second use case would be to fuzz-test the ABIencoderv2 itself, without differential fuzzing. The fuzzer would feed random input into the ABIencoderv2 engine, maximizing encoder code coverage, and check that the encoder always produces either an output or an error without crashing.
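The optimized-vs-unoptimized check could be sketched roughly as follows. The `Compiler` and `Executor` hooks are hypothetical stand-ins for the Solidity compiler and Geth's EVM (neither API is shown here), and `Observation` is an assumed placeholder for whatever is compared in practice, be it a trace or just the final output.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Bytecode = std::vector<uint8_t>;
using Observation = std::string; // assumed: serialized trace or final output

// Hypothetical hooks; real versions would wrap solc and Geth's EVM.
using Compiler = std::function<Bytecode(std::string const& source, bool optimize)>;
using Executor = std::function<Observation(Bytecode const& code)>;

// Compiles `source` with and without the optimizer and reports whether
// both binaries produce the same observation when executed.
bool optimizerPreservesBehaviour(
    std::string const& source, Compiler const& compile, Executor const& execute)
{
    Bytecode plain = compile(source, false);
    Bytecode optimized = compile(source, true);
    return execute(plain) == execute(optimized);
}
```

Passing the compiler and executor in as callables keeps the check itself independent of how the two components end up being linked into the fuzzing binary.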
Please let me know if I got it right.
P.S. If anyone is interested, https://github.com/Dor1s/libfuzzer-workshop/tree/master/lessons is a great workshop that explains the main idea behind libFuzzer and the test harness. I am not sure about the ABIencoderv2 internals, but we could consider structure-aware fuzzing if needed: https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md
@cryptomental You are right. I tend to think the Geth execution trace is optional for the ABIencoderv2 use case because we plan to construct the fuzzed program in such a way that we already know the expected result. So we assume that asserting on the program output is sufficient for now, e.g., we always expect the program to return true.
Having said that, we are still very early in this project, so it might not hurt to have access to traces if need be, e.g., for debugging.
The first priority at the moment is ABIencoderv2 fuzzing. I thought of the differential use case since it seems to be a straightforward extension of the ABI fuzzing use case. IIUC, both use cases share common test infrastructure.
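As a sketch of the "always return true" expectation mentioned above: if we assume the test function returns a single `bool`, its ABI-encoded return data is one 32-byte word with the value 1. The helper name `returnsTrue` is made up for illustration, and how the return data is obtained from the EVM is left open.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: true only if `output` is exactly the ABI encoding
// of the boolean `true`, i.e. 31 zero bytes followed by 0x01.
bool returnsTrue(std::vector<uint8_t> const& output)
{
    if (output.size() != 32)
        return false;
    bool leadingZeros = std::all_of(
        output.begin(), output.end() - 1, [](uint8_t b) { return b == 0; });
    return leadingZeros && output.back() == 1;
}
```

The harness would then treat anything else (empty output after a revert, a zero word, or extra data) as a failure worth reporting.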
Hi @bshastry, that makes sense. I would like to experiment a bit with this. Could you provide a simple example of a Solidity program (ABIencoder input) and the resulting EVM bytecode that you would like to fuzz, along with an example EVM configuration? Perhaps initially we could use solc and Geth's EVM runner to validate the EVM bytecode. My idea was to install solc with sudo snap install solc --edge, run solc --bin program.sol and solc --optimize --bin program.sol, and then run go/src/github.com/ethereum/go-ethereum/cmd/evm/evm --code program.bin with additional EVM parameters (see https://github.com/ethereum/go-ethereum/blob/master/cmd/evm/runner.go#L43), but I am not sure if this is the way to go.
After you send an example use case with instructions, I could try to hook it up to the ethereum-vm-fuzzer in the way you described at https://github.com/ethereum/solidity/issues/6438#issuecomment-480180897.
Thank you @cryptomental
Update: I'm working on a minimal PoC harness that accepts Solidity input, compiles it to EVM bytecode, runs it on Geth, and asserts on a false return value or other program errors (e.g., revert).
@bshastry can this be closed?
Ah yes. Sorry for the considerable delay.
Do we fuzz against the Isabelle implementation of the encoding by now, though? I'd still say that'd be a major improvement on top of the initial ABIv2 fuzzing we had and we should really do it...
@ekpyron That is still in progress and unrelated to this issue, which was originally created for the initial fuzzing setup for ABIv2 coding that we now use.
Maybe I should create a separate issue to track the Isabelle fuzzing...
Alright, sounds good! Let me know if we should talk about the Isabelle fuzzing once more. There has been some progress in the Isabelle implementation: the newest version now also works nicely for unreasonably huge array lengths, so we should be able to fuzz random mutations of valid encodings (e.g. by flipping random bits in them), which would be nice...
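For illustration, the bit-flipping mutation could look roughly like the sketch below. `flipBit` is a made-up name, not something from the Solidity fuzzers or the Isabelle implementation, and choosing the bit position is assumed to be left to the fuzzer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of `encoding` with one bit flipped.
// `bitIndex` is reduced modulo the total number of bits, so any value the
// fuzzer picks maps to a valid position. Empty input is returned unchanged.
std::vector<uint8_t> flipBit(std::vector<uint8_t> encoding, std::size_t bitIndex)
{
    if (encoding.empty())
        return encoding;
    bitIndex %= encoding.size() * 8;
    encoding[bitIndex / 8] ^= static_cast<uint8_t>(1u << (bitIndex % 8));
    return encoding;
}
```

Starting from a valid encoding and applying one or more such flips yields inputs that are mostly well-formed, which is what makes them interesting for comparing a decoder against a reference implementation.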