Rust: Rustc fails to optimize a common option usage pattern

Created on 30 Jan 2020  路  5Comments  路  Source: rust-lang/rust

Consider following functions

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| t >= b)
     .unwrap_or(false)
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

The first pattern is what we often write and the second one is the most efficient manually unrolled version. Surprisingly rustc fails to optimize the former one into the latter as you can see in godbolt listing:

example::unwrap_combinators:
        xor     eax, eax
        cmp     edx, esi
        setle   al
        test    edi, edi
        mov     ecx, 2
        cmovne  ecx, eax
        cmp     cl, 2
        setne   al
        and     al, cl
        ret

example::unwrap_manual:
        test    edi, edi
        setne   cl
        cmp     esi, edx
        setge   al
        and     al, cl
        ret

P.S. Yes, I'm aware of map_or

A-LLVM A-layout A-mir-opt A-mir-opt-inlining C-bug I-slow T-compiler

Most helpful comment

@Aaron1011 I can confirm that, with #68528, unwrap_or becomes eligible for MIR inlining in unwrap_combinators, and the two functions compile to the same assembly with -Z mir-opt-level=3.

All 5 comments

It looks like Rust's niche-filling optimization is defeating LLVM's optimizations.

In unwrap_combinators, the temporary Option<bool> uses the niche in bool for the discriminant, resulting in the following layout:

0 == Some(false)
1 === Some(true)
2 == None

LLVM ends up inlining the calls to map and unwrap_or, but is unable to optimize the particular combination of icmp and select that ends up in the LLVM IR.

This can be seen by changing the type to a tuple of (bool, ()), which inhibits Rust's niche-filling optimization: (playground):

pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
    a.map(|t| (t >= b, ()))
     .unwrap_or((false, ()))
     .0
}

pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
    match a {
        Some(t) => t >= b,
        None => false
    }
}

which generates the following ASM:

playground::unwrap_combinators:
    testl   %edi, %edi
    setne   %cl
    cmpl    %esi, %edx
    setle   %al
    andb    %cl, %al
    retq

playground::unwrap_manual:
    testl   %edi, %edi
    setne   %cl
    cmpl    %edx, %esi
    setge   %al
    andb    %cl, %al
    retq

LLVM has all of the information it needs to optimize the original IR - however, it doesn't seem to have an instcombine special-case that would allow it to do so.

Unfortunately, neither map nor unwrap_or gets MIR-inlined with -Z mir-opt-level=3, due to both of them having too high of a computed cost:

[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } } [cost=204 > threshold=100]
[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline]     final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } } [cost=123 > threshold=100]

The fact that LLVM decides to inline these functions suggets that we might be overly conservative in how we calculating inlining cost.

Hopefully, this situation will be improved by https://github.com/rust-lang/rust/pull/68528, which specifically calls out unwrap_or as having improved MIR generation.

@Aaron1011 I can confirm that, with #68528, unwrap_or becomes eligible for MIR inlining in unwrap_combinators, and the two functions compile to the same assembly with -Z mir-opt-level=3.

Should this issue be closed now? Since it's been solved by #68528. Or maybe since it's still under mir-opt-3, and therefore unstable, it's still worth left open? :)

This won't be fixed until MIR inlining becomes more usable and should remain open. I would like to see an "I-slow-fixed-by-MIR-inlining" tag so issues like this and #66234 can be triaged more efficiently.

That makes a lot of sense :slightly_smiling_face:

Was this page helpful?
0 / 5 - 0 ratings