Rust: Reduced performance when using question mark operator instead of `try!`

Created on 22 Nov 2016  路  18Comments  路  Source: rust-lang/rust

This was reported on the users forum , and I don't want it to get lost. Basically, replacing try! by ? resulted in ~20% performance loss in benchmarks:

https://github.com/ratel-rust/ratel-core/pull/48#issuecomment-261301646

I've reproduced, but not further investigated, these findings. Is that expected right now? It's not a good argument for adopting the question mark :)

A-LLVM C-enhancement E-needs-test I-slow WG-codegen

Most helpful comment

I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>.
In librustc/hir/lowering.rs it will be desugared to:

match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}

The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success=U, Error=V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}

In lowering.rs we have this piece of desugaring process:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        let sub_expr = self.lower_expr(sub_expr);

                        let path = &["ops", "Carrier", "translate"];
                        let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
                        P(self.expr_call(e.span, path, hir_vec![sub_expr]))
                    };

Suppose we remove translation and implement the piece simply like this:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        P(self.lower_expr(sub_expr))
                    };

If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution and the problem is that rustc and LLVM don't optimize away the redundant match statement.

I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
Compile into ASM in Release mode. Function identity compiles to:

identity:
    .cfi_startproc
    xorl    %eax, %eax
    cmpq    $0, (%rsi)
    movq    8(%rsi), %rcx
    setne   %al
    movq    %rax, (%rdi)
    movq    %rcx, 8(%rdi)
    movq    %rdi, %rax
    retq

You can see there are unncesseary cmpq and setne instructions. The function should ideally (in order to be fast) simply return the input argument using a few move instructions, that's all.

I'm a total beginner at rustc, so no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?

All 18 comments

I didn't read the implementation until now, I though they expanded to the exact same thing as try!()

Might just need to stick a couple of #[inline]s on Carrier methods.

I'm curious why this issue doesn't even get a label like I-Slow.

@sfackler I tried this (#[inline] on the trait method definitions and the impl methods), but didn't seem to get any different bench numbers.

I've found out why this is slow. Suppose you have an expression res? of type Result<U, V>.
In librustc/hir/lowering.rs it will be desugared to:

match Carrier::translate(res) {
    Ok(val) => val,
    Err(err) => return Carrier::from_error(From::from(err)),
}

The real culprit is Carrier::translate. It contains a match expression itself, which doesn't get optimized away, even though it's implemented for Result<U, V> as the identity function (in src/libcore/ops.rs):

impl<U, V> Carrier for Result<U, V> {
    type Success = U;
    type Error = V;

    fn from_success(u: U) -> Result<U, V> {
        Ok(u)
    }

    fn from_error(e: V) -> Result<U, V> {
        Err(e)
    }

    fn translate<T>(self) -> T
        where T: Carrier<Success=U, Error=V>
    {
        match self {
            Ok(u) => T::from_success(u),
            Err(e) => T::from_error(e),
        }
    }
}

In lowering.rs we have this piece of desugaring process:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        let sub_expr = self.lower_expr(sub_expr);

                        let path = &["ops", "Carrier", "translate"];
                        let path = P(self.expr_std_path(unstable_span, path, ThinVec::new()));
                        P(self.expr_call(e.span, path, hir_vec![sub_expr]))
                    };

Suppose we remove translation and implement the piece simply like this:

                    // Carrier::translate(<expr>)
                    let discr = {
                        // expand <expr>
                        P(self.lower_expr(sub_expr))
                    };

If we do that, there will be no slowdown in ratel-core due to question mark operators (I benchmarked it).
Of course, this is a non-solution and the problem is that rustc and LLVM don't optimize away the redundant match statement.

I created a simple example that demonstrates the issue: https://is.gd/Q7PGzm
Compile into ASM in Release mode. Function identity compiles to:

identity:
    .cfi_startproc
    xorl    %eax, %eax
    cmpq    $0, (%rsi)
    movq    8(%rsi), %rcx
    setne   %al
    movq    %rax, (%rdi)
    movq    %rcx, 8(%rdi)
    movq    %rdi, %rax
    retq

You can see there are unncesseary cmpq and setne instructions. The function should ideally (in order to be fast) simply return the input argument using a few move instructions, that's all.

I'm a total beginner at rustc, so no idea how to proceed further. Should we perhaps detect identity functions within a MIR optimization pass?

@arielb1 I've implemented a test version of the "variant copy prop" by hacking into CopyPropagation. However @eddyb has unfortunately :wink: said that it's much better to make a more general version of that that can propagate copies of aggregates (somehow).

I think this issue is fixed.

Generated asm is same for try! and ?.

https://play.rust-lang.org/?gist=625d88df305ace951a088e1cee2ec13a

Edit: Typo

Even if this is fixed: Is there a unit test which prevents regression?

We could have a codegen test for this.

Someone tag this as E-needstest please...

It looks like they both have the same bad code generation now, needs looking into.

Example where it is compared with the identity function. Returning Ok(x?) is the same as the identity for the result. playground link

But it seems this example is not a good condensation of the issue, because I can't find any previous Rust version (in rust.godbolt) that has the desired identity function code gen even for the try!() macro.

It doesn't show the difference between try and ?, but it shows something we can fix to improve them both.


Edited: Update to another example code link (configurable Result type).

@bluss

I think https://reviews.llvm.org/D37216 should fix this, but it's a little stuck in the LLVM review queue.

@mati865

I know that. It has a bug that I couldn't quite pin out.

This seems to be fixed (in stable). Tried the play link above.

; Function Attrs: noinline norecurse nounwind readnone uwtable
define { i64, i64 } @try_op(i64, i64) unnamed_addr #2 {
  %3 = tail call { i64, i64 } @try_macro(i64 %0, i64 %1) #2
  ret { i64, i64 } %3
}

try_op:
    jmp try_macro

try_macro:
    xorl    %eax, %eax
    testq   %rdi, %rdi
    setne   %al
    movq    %rsi, %rdx
    retq

You can still provoke it to make a difference and introduce conditionals or more copies for the ? operator than for the try!() macro if you change the payload types for the Result, for example using (i32, i32) or String.

type T = (i32, i32);
type E = T;
type R = Result<T, E>;

#[no_mangle]
pub fn try_op(a: R) -> R {
    Ok(a?)
}

#[no_mangle]
pub fn try_macro(a: R) -> R {
    Ok(try!(a))
}

Looks like it regressed: https://godbolt.org/z/awKD_U

Was this page helpful?
0 / 5 - 0 ratings