This was reported on the users forum, and I don't want it to get lost. Basically, replacing try! with ? resulted in a ~20% performance loss in benchmarks:
https://github.com/ratel-rust/ratel-core/pull/48#issuecomment-261301646
I've reproduced, but not further investigated, these findings. Is that expected right now? It's not a good argument for adopting the question mark :)
I hadn't read the implementation until now; I thought they expanded to the exact same thing as try!().
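For reference, try!() at the time expanded to roughly this (paths simplified from the libcore definition):

macro_rules! try {
    ($expr:expr) => (match $expr {
        // Matches on the Result and early-returns the error,
        // converting it with From::from along the way.
        Ok(val) => val,
        Err(err) => return Err(From::from(err)),
    })
}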
Might just need to stick a couple of #[inline]s on the Carrier methods.
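Something like this, say (a standalone sketch with a stand-in trait, since Carrier itself is unstable; the attribute placement is the point):

// Stand-in for the unstable Carrier trait, just to show where the
// #[inline] attributes would go.
trait Carrier {
    type Success;
    type Error;
    fn from_success(s: Self::Success) -> Self;
    fn from_error(e: Self::Error) -> Self;
    fn translate<T>(self) -> T
        where T: Carrier<Success = Self::Success, Error = Self::Error>;
}

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    #[inline]
    fn from_success(u: U) -> Self { Ok(u) }

    #[inline]
    fn from_error(e: V) -> Self { Err(e) }

    #[inline]
    fn translate<T>(self) -> T
        where T: Carrier<Success = U, Error = V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}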
I'm curious why this issue doesn't even get a label like I-Slow.
@sfackler I tried this (#[inline] on the trait method definitions and the impl methods), but didn't seem to get any different bench numbers.
I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>. In librustc/hir/lowering.rs it will be desugared to:
match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}
The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):
impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success = U, Error = V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}
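Note that when ? is used in a function that itself returns Result<U, V>, the target type T is also Result<U, V>, so after monomorphization translate is just an identity spelled as a match. Written out by hand (hypothetical function name):

// What translate::<Result<U, V>>(res) boils down to: it deconstructs
// the value and immediately rebuilds an identical one.
fn translate_for_result<U, V>(r: Result<U, V>) -> Result<U, V> {
    match r {
        Ok(u) => Ok(u),
        Err(e) => Err(e),
    }
}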
In lowering.rs we have this piece of the desugaring process:
// Carrier::translate(<expr>)
let discr = {
    // expand <expr>
    let sub_expr = self.lower_expr(sub_expr);
    let path = &["ops", "Carrier", "translate"];
    let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
    P(self.expr_call(e.span, path, hir_vec![sub_expr]))
};
Suppose we remove translation and implement the piece simply like this:
// Carrier::translate(<expr>)
let discr = {
    // expand <expr>
    P(self.lower_expr(sub_expr))
};
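With translate gone, a use of res? in a Result-returning function reduces to essentially the same early-return match that try!() produces; a hand-expanded sketch (hypothetical names):

// Hand-expanded version of `Ok(res?)` without the translate step.
fn desugared<U, V>(res: Result<U, V>) -> Result<U, V> {
    let val = match res {
        Ok(val) => val,
        Err(err) => return Err(From::from(err)),
    };
    Ok(val)
}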
If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution; the real problem is that rustc and LLVM don't optimize away the redundant match.
I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
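(In case the link rots, here is my approximation of that example; the exact types in the linked snippet may differ.)

// An identity-style function that goes through the same kind of match
// that Carrier::translate performs for Result.
#[no_mangle]
pub fn identity(r: Result<u64, u64>) -> Result<u64, u64> {
    match r {
        Ok(v) => Ok(v),
        Err(e) => Err(e),
    }
}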
Compile it to ASM in Release mode. The function identity compiles to:
identity:
    .cfi_startproc
    xorl    %eax, %eax
    cmpq    $0, (%rsi)
    movq    8(%rsi), %rcx
    setne   %al
    movq    %rax, (%rdi)
    movq    %rcx, 8(%rdi)
    movq    %rdi, %rax
    retq
You can see the unnecessary cmpq and setne instructions. Ideally, the function would simply return the input argument with a few move instructions, nothing more.
I'm a total beginner at rustc, so I have no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?
@arielb1 I've implemented a test version of the "variant copy prop" by hacking into CopyPropagation. However, @eddyb has unfortunately :wink: said that it's much better to make a more general version that can propagate copies of aggregates (somehow).
I think this issue is fixed. The generated asm is the same for try! and ?.
https://play.rust-lang.org/?gist=625d88df305ace951a088e1cee2ec13a
Edit: Typo
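(Roughly, the comparison in that playground has this shape; the exact types there may differ:)

// The two forms being compared; both should now compile to the same code.
#[no_mangle]
pub fn try_op(a: Result<u64, u64>) -> Result<u64, u64> {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: Result<u64, u64>) -> Result<u64, u64> {
    Ok(try!(a))
}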
Even if this is fixed: Is there a unit test which prevents regression?
We could have a codegen test for this.
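A sketch of what such a test could look like under src/test/codegen (the function name and CHECK patterns are only illustrative):

// compile-flags: -O
#![crate_type = "lib"]

// CHECK-LABEL: @question_mark_is_trivial
#[no_mangle]
pub fn question_mark_is_trivial(r: Result<u64, u64>) -> Result<u64, u64> {
    // CHECK-NOT: br
    // CHECK: ret
    Ok(r?)
}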
Someone tag this as E-needstest, please...
It looks like they both have the same bad code generation now; this needs looking into.
Here's an example where it is compared with the identity function; returning Ok(x?) should be the same as the identity for the Result. playground link
But it seems this example is not a good condensation of the issue, because I can't find any previous Rust version (on rust.godbolt) that has the desired identity-function codegen even for the try!() macro. It doesn't show a difference between try! and ?, but it does show something we can fix to improve them both.
Edited: Updated to another example code link (configurable Result type).
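(My guess at the shape of the linked comparison; the types are placeholders:)

// Comparing Ok(x?) against a plain identity function: ideally both
// compile to exactly the same code.
type R = Result<u32, u32>;

#[no_mangle]
pub fn identity(x: R) -> R {
    x
}

#[no_mangle]
pub fn with_question_mark(x: R) -> R {
    Ok(x?)
}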
@bluss I think https://reviews.llvm.org/D37216 should fix this, but it's a little stuck in the LLVM review queue.
@arielb1 it was reverted https://github.com/llvm-mirror/llvm/commit/c87c1c0cd88b18eee5668841b4bfcc6d1260afd3.
@mati865 I know that. It has a bug that I couldn't quite pin down.
Still not fixed (tested nightly on playpen)
This seems to be fixed (in stable). Tried the play link above.
; Function Attrs: noinline norecurse nounwind readnone uwtable
define { i64, i64 } @try_op(i64, i64) unnamed_addr #2 {
    %3 = tail call { i64, i64 } @try_macro(i64 %0, i64 %1) #2
    ret { i64, i64 } %3
}

try_op:
    jmp     try_macro

try_macro:
    xorl    %eax, %eax
    testq   %rdi, %rdi
    setne   %al
    movq    %rsi, %rdx
    retq
You can still provoke it into making a difference, introducing conditionals or more copies for the ? operator than for the try!() macro, if you change the payload types of the Result, for example using (i32, i32) or String.
type T = (i32, i32);
type E = T;
type R = Result<T, E>;

#[no_mangle]
pub fn try_op(a: R) -> R {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: R) -> R {
    Ok(try!(a))
}
Looks like it regressed: https://godbolt.org/z/awKD_U