solc 0.5.4 with optimizer runs is failing to properly optimize simple functions such as those in SafeMath as compared to using assembly.
See https://github.com/OpenZeppelin/openzeppelin-solidity/pull/1635
Could you be more specific? What is your expectation and what does the compiler do?
I'm not sure exactly what the compiler output is in either case, but
function div(uint256 a, uint256 b) internal pure returns (uint256) {
require(b > 0);
uint256 c = a / b;
return c;
}
is using 38 more gas than:
function div(uint256 a, uint256 b) internal pure returns (uint256 r) {
assembly {
if iszero(b) { revert(0, 0) }
r := div(a, b)
}
}
There are more examples in the link given. Almost all SafeMath functions are not compiling in an optimal way
Unoptimised:
div(uint256,uint256): 407
div2(uint256,uint256): 343
Optimised:
div(uint256,uint256): 328
div2(uint256,uint256): 268
Where div2 is the hand written assembly and both are public functions in a contract.
The unoptimised output:
/* "opt.sol":140:292 function div2(uint256 a, uint256 b) pure public returns (uint256 r) {... */
tag_7:
/* "opt.sol":197:206 uint256 r */
0x00
/* "opt.sol":243:244 b */
dup2
/* "opt.sol":236:245 iszero(b) */
iszero
/* "opt.sol":233:235 if */
iszero
tag_12
jumpi
/* "opt.sol":258:259 0 */
0x00
/* "opt.sol":255:256 0 */
dup1
/* "opt.sol":248:260 revert(0, 0) */
revert
/* "opt.sol":233:235 if */
tag_12:
/* "opt.sol":283:284 b */
dup2
/* "opt.sol":280:281 a */
dup4
/* "opt.sol":276:285 div(a, b) */
div
/* "opt.sol":271:285 r := div(a, b) */
swap1
pop
/* "opt.sol":223:290 {... */
swap3
swap2
pop
pop
jump // out
/* "opt.sol":13:138 function div(uint256 a, uint256 b) pure public returns (uint256) {... */
tag_10:
/* "opt.sol":69:76 uint256 */
0x00
/* "opt.sol":96:97 0 */
dup1
/* "opt.sol":92:93 b */
dup3
/* "opt.sol":92:97 b > 0 */
gt
/* "opt.sol":84:98 require(b > 0) */
iszero
iszero
tag_14
jumpi
0x00
dup1
revert
tag_14:
/* "opt.sol":104:113 uint256 c */
0x00
/* "opt.sol":120:121 b */
dup3
/* "opt.sol":116:117 a */
dup5
/* "opt.sol":116:121 a / b */
dup2
iszero
iszero
tag_15
jumpi
invalid
tag_15:
div
/* "opt.sol":104:121 uint256 c = a / b */
swap1
pop
/* "opt.sol":134:135 c */
dup1
/* "opt.sol":127:135 return c */
swap2
pop
pop
/* "opt.sol":13:138 function div(uint256 a, uint256 b) pure public returns (uint256) {... */
swap3
swap2
pop
pop
jump // out
The main difference I see is the check for division-by-zero, which you are missing in the hand written version.
We could consider tracking value ranges of variables and due to the require statement the code generator could avoid adding the check again. We do this in the SMTChecker but that state is not used anywhere else.
Btw, what is the point of div in SafeMath? The compiler generates runtime assertion for division by zero since mid-2016.
The newest SafeMath prefers to generate the revert opcode instead of the invalid opcode. The previous version of div() in SafeMath did not do anything besides return a/b. I didn't realize that the invalid check was created by the compiler and not the EVM (which will output 0 in the case of a zero divisor)
Okay it makes sense that the hand-written version is less gas then. One possible compiler optimization that stands out is optimizing to a>0 be iszero(iszero(a)) instead of iszero(iszero(gt(a,0))) for any unsigned types.
But that's sort of a separate issue, feel free to close this one if it is no longer helpful for you
iszero(iszero(gt(a,0)))
Should be picked up by the optimiser and replaced with gt(a,0) according to the rule list.
Update: the optimiser picks it up correctly.
I think we can add the following new optimiser rules:
lt(x, 0) -> 0 -- since this cannot ever be true, slt can be used for two's complement <0 checksgt(x, 0) -> iszero(iszero(x)) -- same here, there's sgt for signed comparisongt(x, ~0)) -> 0The middle rule makes sense because iszero(iszero(iszero(iszero(x)))) will be picked up and reduced into iszero(iszero(x)).
What are the gas costs of just using division in Solidity compared to the assembly?
Value range / condition tracking has been on the roadmap, and I think for the yul optimizer we should really implement it. It should also not be that difficult because we already track the values of variables.
Not sure if this is the case, but is the sequence
iszero
tag_#
jumpi
reducable to simply
tag_#
jumpi
since jumpi as I understand it already checks for a non-zero value
That is true, but I think our current framework can only optimize expressions and not statements. We could add this to the peephole optimizer, though.
We could add this to the peephole optimizer, though.
I think that sounds good.
From https://github.com/OpenZeppelin/openzeppelin-solidity/pull/1678 (I didn't want to creat a new issue, since the topic is the same)
As shown by @wjmelements here, when compiling on both v0.4.24 and v0.5.5, Solidity seems to emit less efficient bytecode when a returned variable is declared inside a function, as opposed to the return value being named in the function declaration and assigned to):
// This one costs more gas to call
function add1(uint256 a, uint256 b) internal pure returns (uint256) {
uint256 c = a + b;
require(c >= a);
return c;
}
// This one is cheaper to call
function add2(uint256 a, uint256 b) internal pure returns (uint256 c) {
c = a + b;
require(c >= a);
}
The difference is not very large (around 400 and 800 gas, depending on the compiler version, when calling the function in question 100 times, so a 1% gas difference for a transaction that only calls said function in a loop), but it may still be worthwhile looking into to find out where this discrepancy comes from.
The difference seemse to be 5 gas for optimized code and 13 for unoptimized code (which I would not consider here).
> solc --gas --optimize /tmp/x.sol
======= /tmp/x.sol:C1 =======
Gas estimation:
construction:
87 + 32600 = 32687
external:
add1(uint256,uint256): 277
======= /tmp/x.sol:C2 =======
Gas estimation:
construction:
81 + 32000 = 32081
external:
add2(uint256,uint256): 272
> solc --gas /tmp/x.sol
======= /tmp/x.sol:C1 =======
Gas estimation:
construction:
87 + 38400 = 38487
external:
add1(uint256,uint256): 352
======= /tmp/x.sol:C2 =======
Gas estimation:
construction:
87 + 37400 = 37487
external:
add2(uint256,uint256): 339
Since the return variable is an actual variable that can be referenced, it occupies a stack slot. The old optimizer (the new cannot yet be used here) is based on basic blocks and it only optimizes what happens to the stack from block start to block end - it will not modify the stack layout between blocks and thus it cannot remove this stack slot.
I'm closing this now, since everything discussed here that can be implemented with reasonable effort has been implemented. Value range tracking is on the roadmap for the yul optimizer: #6252
Ah, that makes sense then, considering how the optimizer works. Will the Yul optimizer be able to modify the stack layout across blocks?
Also, is there a rough estimate for when you'd expect solc to be able to compile to Yul to make use if its optimizer?
Thanks!
Yes, the yul optimizer was designed to allow cross-block and even cross-function optimizations. The traditional optimizer is a little better on optimizing stack layout, but I think we will get there also with the yul optimizer. I cannot really tell how solidity -> yul will progress, but we currently estimate end of the year for a usable prototype.
Here is the first PR on this road: https://github.com/ethereum/solidity/pull/2320
Most helpful comment
From https://github.com/OpenZeppelin/openzeppelin-solidity/pull/1678 (I didn't want to creat a new issue, since the topic is the same)
As shown by @wjmelements here, when compiling on both v0.4.24 and v0.5.5, Solidity seems to emit less efficient bytecode when a returned variable is declared inside a function, as opposed to the return value being named in the function declaration and assigned to):
The difference is not very large (around 400 and 800 gas, depending on the compiler version, when calling the function in question 100 times, so a 1% gas difference for a transaction that only calls said function in a loop), but it may still be worthwhile looking into to find out where this discrepancy comes from.