This was reported on the users forum, and I don't want it to get lost. Basically, replacing try! with ? resulted in a ~20% performance loss in benchmarks:
https://github.com/ratel-rust/ratel-core/pull/48#issuecomment-261301646
I've reproduced, but not further investigated, these findings. Is that expected right now? It's not a good argument for adopting the question mark :)
I hadn't read the implementation until now; I thought they expanded to the exact same thing as try!().
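For reference, try!() at the time expanded to roughly this (paths simplified from the libcore definition):

macro_rules! try {
    ($expr:expr) => (match $expr {
        // Matches on the Result and early-returns the error,
        // converting it with From::from along the way.
        Ok(val) => val,
        Err(err) => return Err(From::from(err)),
    })
}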
Might just need to stick a couple of #[inline]s on the Carrier methods.
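Something like this, say (a standalone sketch with a stand-in trait, since Carrier itself is unstable; the attribute placement is the point):

// Stand-in for the unstable Carrier trait, just to show where the
// #[inline] attributes would go.
trait Carrier {
    type Success;
    type Error;
    fn from_success(s: Self::Success) -> Self;
    fn from_error(e: Self::Error) -> Self;
    fn translate<T>(self) -> T
        where T: Carrier<Success = Self::Success, Error = Self::Error>;
}

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    #[inline]
    fn from_success(u: U) -> Self { Ok(u) }

    #[inline]
    fn from_error(e: V) -> Self { Err(e) }

    #[inline]
    fn translate<T>(self) -> T
        where T: Carrier<Success = U, Error = V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}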
I'm curious why this issue doesn't even get a label like I-Slow.
@sfackler I tried this (#[inline] on the trait method definitions and the impl methods), but didn't seem to get any different bench numbers.
I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>. In librustc/hir/lowering.rs it will be desugared to:
match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}
The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):
impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success = U, Error = V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}
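Note that when ? is used in a function that itself returns Result<U, V>, the target type T is also Result<U, V>, so after monomorphization translate is just an identity spelled as a match. Written out by hand (hypothetical function name):

// What translate::<Result<U, V>>(res) boils down to: it deconstructs
// the value and immediately rebuilds an identical one.
fn translate_for_result<U, V>(r: Result<U, V>) -> Result<U, V> {
    match r {
        Ok(u) => Ok(u),
        Err(e) => Err(e),
    }
}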
In lowering.rs we have this piece of the desugaring process:
// Carrier::translate(<expr>)
let discr = {
    // expand <expr>
    let sub_expr = self.lower_expr(sub_expr);
    let path = &["ops", "Carrier", "translate"];
    let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
    P(self.expr_call(e.span, path, hir_vec![sub_expr]))
};
Suppose we remove translation and implement the piece simply like this:
// Carrier::translate(<expr>)
let discr = {
    // expand <expr>
    P(self.lower_expr(sub_expr))
};
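With translate gone, a use of res? in a Result-returning function reduces to essentially the same early-return match that try!() produces; a hand-expanded sketch (hypothetical names):

// Hand-expanded version of `Ok(res?)` without the translate step.
fn desugared<U, V>(res: Result<U, V>) -> Result<U, V> {
    let val = match res {
        Ok(val) => val,
        Err(err) => return Err(From::from(err)),
    };
    Ok(val)
}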
If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution; the real problem is that rustc and LLVM don't optimize away the redundant match.
I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
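(In case the link rots, here is my approximation of that example; the exact types in the linked snippet may differ.)

// An identity-style function that goes through the same kind of match
// that Carrier::translate performs for Result.
#[no_mangle]
pub fn identity(r: Result<u64, u64>) -> Result<u64, u64> {
    match r {
        Ok(v) => Ok(v),
        Err(e) => Err(e),
    }
}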
Compile it to ASM in Release mode. The function identity compiles to:
identity:
    .cfi_startproc
    xorl    %eax, %eax
    cmpq    $0, (%rsi)
    movq    8(%rsi), %rcx
    setne   %al
    movq    %rax, (%rdi)
    movq    %rcx, 8(%rdi)
    movq    %rdi, %rax
    retq
You can see the unnecessary cmpq and setne instructions. Ideally, the function would simply return the input argument with a few move instructions, nothing more.
I'm a total beginner at rustc, so I have no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?
@arielb1 I've implemented a test version of the "variant copy prop" by hacking into CopyPropagation. However, @eddyb has unfortunately :wink: said that it's much better to make a more general version that can propagate copies of aggregates (somehow).
I think this issue is fixed. The generated asm is the same for try! and ?.
https://play.rust-lang.org/?gist=625d88df305ace951a088e1cee2ec13a
Edit: Typo
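(Roughly, the comparison in that playground has this shape; the exact types there may differ:)

// The two forms being compared; both should now compile to the same code.
#[no_mangle]
pub fn try_op(a: Result<u64, u64>) -> Result<u64, u64> {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: Result<u64, u64>) -> Result<u64, u64> {
    Ok(try!(a))
}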
Even if this is fixed: Is there a unit test which prevents regression?
We could have a codegen test for this.
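A sketch of what such a test could look like under src/test/codegen (the function name and CHECK patterns are only illustrative):

// compile-flags: -O
#![crate_type = "lib"]

// CHECK-LABEL: @question_mark_is_trivial
#[no_mangle]
pub fn question_mark_is_trivial(r: Result<u64, u64>) -> Result<u64, u64> {
    // CHECK-NOT: br
    // CHECK: ret
    Ok(r?)
}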
Someone tag this as E-needstest, please...
It looks like they both have the same bad code generation now; this needs looking into.
Here's an example where it is compared with the identity function; returning Ok(x?) should be the same as the identity for the Result. playground link
But it seems this example is not a good condensation of the issue, because I can't find any previous Rust version (on rust.godbolt) that has the desired identity-function codegen even for the try!() macro. It doesn't show a difference between try! and ?, but it does show something we can fix to improve them both.
Edited: Updated to another example code link (configurable Result type).
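(My guess at the shape of the linked comparison; the types are placeholders:)

// Comparing Ok(x?) against a plain identity function: ideally both
// compile to exactly the same code.
type R = Result<u32, u32>;

#[no_mangle]
pub fn identity(x: R) -> R {
    x
}

#[no_mangle]
pub fn with_question_mark(x: R) -> R {
    Ok(x?)
}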
@bluss I think https://reviews.llvm.org/D37216 should fix this, but it's a little stuck in the LLVM review queue.
@arielb1 it was reverted https://github.com/llvm-mirror/llvm/commit/c87c1c0cd88b18eee5668841b4bfcc6d1260afd3.
@mati865 I know that. It has a bug that I couldn't quite pin down.
Still not fixed (tested nightly on playpen)
This seems to be fixed (in stable). Tried the play link above.
; Function Attrs: noinline norecurse nounwind readnone uwtable
define { i64, i64 } @try_op(i64, i64) unnamed_addr #2 {
    %3 = tail call { i64, i64 } @try_macro(i64 %0, i64 %1) #2
    ret { i64, i64 } %3
}

try_op:
    jmp     try_macro

try_macro:
    xorl    %eax, %eax
    testq   %rdi, %rdi
    setne   %al
    movq    %rsi, %rdx
    retq
You can still provoke it into making a difference, introducing conditionals or more copies for the ? operator than for the try!() macro, if you change the payload types of the Result, for example using (i32, i32) or String.
type T = (i32, i32);
type E = T;
type R = Result<T, E>;

#[no_mangle]
pub fn try_op(a: R) -> R {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: R) -> R {
    Ok(try!(a))
}
Looks like it regressed: https://godbolt.org/z/awKD_U