{
for {} div(create(0, 1, 0), shl(msize(), 1)) {}
{
}
}
Not sure why the yul interpreter is evaluating create twice (trace below)
Trace:
CREATE(0, 1, 0)
CREATE(0, 1, 0)
while the optimized version of the program.
{
for { } shr(msize(), create(0, 1, 0)) { }
{ }
}
reports this trace
Trace:
CREATE(0, 1, 0)
tripping the fuzzer in the process.
The rule list responsible for this is the following
https://github.com/ethereum/solidity/blob/324cc71b13a2e1ebf1fe34397d085832b89076a4/libevmasm/RuleList.h#L489-L496
$ yulrun --input-file <test.yul>
...
$ solc --strict-assembly --optimize <test.yul>
$ yulrun --input-file <test_optimized.yul>
So the problem seems to be order of evaluation here
msize() is evaluated before create() which has the effect of passing the for-loop condition (since create returns a non-zero address value, 0xcccccc+ arg[1] to be precise and the divisor is 1 because msize() equals zero)msize() is evaluated after create(), making the divisor greater than the address returned by create (divisor becomes 4294967296, and dividend is 0xcccccc+1 = 13421773 in decimal). This falsifies the for loop conditionBecause in the unoptimized case the for-loop condition is evaluated twice, we get two traces of CREATE. OTOH, in the optimized program, we get a single trace of CREATE since for-loop is evaluated only once.
My guess is this is because the rule list only has "removes non-constant argument" instead of "moves non-constant argument" and that's false for DIV(X, SHL(Y, 1)) -> SHR(Y, X).
The same will is true for MUL(X, SHL(Y, 1)) -> SHL(Y, X).
But those are the only two I found after a quick look through the list.
We can just set the flag to true and maybe also adjust the explanation of the flag. Please also change the comment in ExpressionSimplifier.cpp that talks about the distinction.
Analysis for likelihood of appearance:
The sub-expressions X and Y need to be non-movable opcodes (or larger expression trees) with side-effects. If an opcode is part of an expression tree, it needs to return a single value.
keccak256 (only side-effect on msize)extcodesizebalancereturndatasizeextcodehashmloadsloadpcmsizecreatecallcallcodedelegatecallstaticcallcreate2For the new optimizer, this also includes calls to user-defined functions, so the effect looks rather large at first sight. On the other hand, the expression simplifier is usually applied to the "split form", a kind of three-address-code. In this form, arguments to opcodes and function calls are usually variables and thus the order of execution is not changed by exchanging these variables.
The only place where this bug could surface is the conditions of for loops (as also exemplified by the example). It could be rather uncommon to put non-movable expressions in for loop conditions. The existing generated code does not do that and user-written code would probably not rely on the order of execution in such expressions.
Here, the opcode stream is chopped up into blocks at certain opcodes. These opcodes include all opcodes that have side-effects. So this does not apply to the old optimizer at all.
The other rule where this issue applies
does not cause a problem in the for-loop-condition case because the for-loop condition value does not change across iterations due to the left shift (for division/right shift this is different though as noted before). So for the mul case, it makes the yul interpreter "time out" on the infinite loop.
Unoptimized (where address is value returned by the create call)
{
// condition = address * 1 = true for first iteration
// = address * 32 = true for subsequent iterations
for {} mul(create(0, 1, 0), shl(msize(), 1)) {} {
}
}
optimized form
{
// condition = address * 32 = true for all iterations
for {} shl(msize(), create(0, 1, 0)) {} {
}
}
Hm, but those expressions can be embedded in more complex expressions, can't they?
yeah, I would vote for better safe than sorry since we already anticipate the problem
Oh and by the way: The title of this issue is too strict - there can be way more severe problems than just msize when you take user-defined functions into account.
function f() -> a { a := add(mload(0), 1) }
for {} lt(mul(f(), shl(msize(), 1)),1) {} { /* some trace output */ }
Without optimization mul(f(), shl(msize(), 1)) is zero, hence loop is executed.
With optimization shl(msize(), f()) is 32, hence loop is not executed.
@ekpyron Nice! Shouldn't it be something like
function f() -> a { a := add(mload(0), 1) }
for {} sub(mul(f(), shl(msize(), 1)),1) {} { /* some trace output */ }
without optimization -> loop not taken
with optimization -> loop taken
Not sure - If I was correct about the values of the expressions being 0 or 32, then lt(exp,1) or iszero(exp) for that matter, would work - but I haven't checked in detail, the point is just: this can happen for the other rule as well.
We can just set the flag to true and maybe also adjust the explanation of the flag. Please also change the comment in ExpressionSimplifier.cpp that talks about the distinction.
If we set the optimization rule "movable" flag for the order altering DIV/MUL rules to true (as a fix), there is a different issue that I don't understand
This program
{
let x_0
x_0 := 0
for {let i_0} lt(i_0,6) {i_0 := add(i_0, 2)} {
x_0 := x_0
}
}
fails this assertion
@bshastry That seems quite unrelated... could you output the yul code at the point when the assertion fails?
@ekpyron You're right. This is an unrelated issue. Created #7416