Solidity: [yul] optimizer can produce different side-effects because of the order of evaluation of for-loop conditional expression that contains `msize()` or a user-defined function that has a side-effect

Created on 12 Sep 2019 · 14Comments · Source: ethereum/solidity

Description

{
  for {} div(create(0, 1, 0), shl(msize(), 1)) {}
  {
  }
}

Not sure why the yul interpreter is evaluating create twice (trace below)

Trace:
  CREATE(0, 1, 0)
  CREATE(0, 1, 0)

while the optimized version of the program.

{
    for { } shr(msize(), create(0, 1, 0)) { }
    { }
}

reports this trace

Trace:
  CREATE(0, 1, 0)

tripping the fuzzer in the process.

The rule list responsible for this is the following
https://github.com/ethereum/solidity/blob/324cc71b13a2e1ebf1fe34397d085832b89076a4/libevmasm/RuleList.h#L489-L496

Environment

Compiler version: latest develop

Steps to Reproduce

$ yulrun --input-file <test.yul>
...
$ solc --strict-assembly --optimize <test.yul>
$ yulrun --input-file <test_optimized.yul>

bug

Source

bshastry

All 14 comments

So the problem seems to be order of evaluation here

In the unoptimized test, msize() is evaluated before create() which has the effect of passing the for-loop condition (since create returns a non-zero address value, 0xcccccc+ arg[1] to be precise and the divisor is 1 because msize() equals zero)
In the optimized test, msize() is evaluated after create(), making the divisor greater than the address returned by create (divisor becomes 4294967296, and dividend is 0xcccccc+1 = 13421773 in decimal). This falsifies the for loop condition

Because in the unoptimized case the for-loop condition is evaluated twice, we get two traces of CREATE. OTOH, in the optimized program, we get a single trace of CREATE since for-loop is evaluated only once.

bshastry on 12 Sep 2019

My guess is this is because the rule list only has "removes non-constant argument" instead of "moves non-constant argument" and that's false for DIV(X, SHL(Y, 1)) -> SHR(Y, X).
The same will is true for MUL(X, SHL(Y, 1)) -> SHL(Y, X).
But those are the only two I found after a quick look through the list.

ekpyron on 12 Sep 2019

We can just set the flag to true and maybe also adjust the explanation of the flag. Please also change the comment in ExpressionSimplifier.cpp that talks about the distinction.

chriseth on 12 Sep 2019

👍1

Analysis for likelihood of appearance:

The sub-expressions X and Y need to be non-movable opcodes (or larger expression trees) with side-effects. If an opcode is part of an expression tree, it needs to return a single value.

keccak256 (only side-effect on msize)
extcodesize
balance
returndatasize
extcodehash
mload
sload
pc
msize
create
call
callcode
delegatecall
staticcall
create2

New optimizer

For the new optimizer, this also includes calls to user-defined functions, so the effect looks rather large at first sight. On the other hand, the expression simplifier is usually applied to the "split form", a kind of three-address-code. In this form, arguments to opcodes and function calls are usually variables and thus the order of execution is not changed by exchanging these variables.

The only place where this bug could surface is the conditions of for loops (as also exemplified by the example). It could be rather uncommon to put non-movable expressions in for loop conditions. The existing generated code does not do that and user-written code would probably not rely on the order of execution in such expressions.

Old optimizer

Here, the opcode stream is chopped up into blocks at certain opcodes. These opcodes include all opcodes that have side-effects. So this does not apply to the old optimizer at all.

chriseth on 12 Sep 2019

The other rule where this issue applies

https://github.com/ethereum/solidity/blob/324cc71b13a2e1ebf1fe34397d085832b89076a4/libevmasm/RuleList.h#L472-L479

does not cause a problem in the for-loop-condition case because the for-loop condition value does not change across iterations due to the left shift (for division/right shift this is different though as noted before). So for the mul case, it makes the yul interpreter "time out" on the infinite loop.

Unoptimized (where address is value returned by the create call)

{
  // condition = address * 1 = true for first iteration
  //                = address * 32 = true for subsequent iterations
  for {} mul(create(0, 1, 0), shl(msize(), 1)) {} {
  }
}

optimized form

{
  // condition = address * 32 = true for all iterations
  for {} shl(msize(), create(0, 1, 0)) {} {
  }
}

bshastry on 12 Sep 2019

Hm, but those expressions can be embedded in more complex expressions, can't they?

chriseth on 12 Sep 2019

👍1

yeah, I would vote for better safe than sorry since we already anticipate the problem

bshastry on 12 Sep 2019

Oh and by the way: The title of this issue is too strict - there can be way more severe problems than just msize when you take user-defined functions into account.

chriseth on 12 Sep 2019

👍1

function f() -> a { a := add(mload(0), 1) }
for {} lt(mul(f(), shl(msize(), 1)),1) {} { /* some trace output */ }

Without optimization mul(f(), shl(msize(), 1)) is zero, hence loop is executed.
With optimization shl(msize(), f()) is 32, hence loop is not executed.

ekpyron on 12 Sep 2019

❤1

@ekpyron Nice! Shouldn't it be something like

function f() -> a { a := add(mload(0), 1) }
for {} sub(mul(f(), shl(msize(), 1)),1) {} { /* some trace output */ }

without optimization -> loop not taken
with optimization -> loop taken

bshastry on 12 Sep 2019

Not sure - If I was correct about the values of the expressions being 0 or 32, then lt(exp,1) or iszero(exp) for that matter, would work - but I haven't checked in detail, the point is just: this can happen for the other rule as well.

ekpyron on 12 Sep 2019

👍1

We can just set the flag to true and maybe also adjust the explanation of the flag. Please also change the comment in ExpressionSimplifier.cpp that talks about the distinction.

If we set the optimization rule "movable" flag for the order altering DIV/MUL rules to true (as a fix), there is a different issue that I don't understand

This program

{
  let x_0
  x_0 := 0
  for {let i_0} lt(i_0,6) {i_0 := add(i_0, 2)} {
    x_0 := x_0
  }
}

fails this assertion

https://github.com/ethereum/solidity/blob/b747c26788c60d9d4487329f47b96d20df68dfb4/libyul/optimiser/BlockHasher.cpp#L116

bshastry on 12 Sep 2019

@bshastry That seems quite unrelated... could you output the yul code at the point when the assertion fails?

ekpyron on 12 Sep 2019

@ekpyron You're right. This is an unrelated issue. Created #7416

bshastry on 12 Sep 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

New ABI decoder fails on arrays of dynamic types as calldata

axic · 4Comments

Cannot build latest develop on OS X Sierra

VoR0220 · 4Comments

Basic Crowdfunding contract from eth tutorial problem

Dohtar1337 · 4Comments

inline functions

chriseth · 3Comments

grammar.txt: missing rule for 'NEW Identifier()'

walter-weinmann · 4Comments