Consider following functions
pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
a.map(|t| t >= b)
.unwrap_or(false)
}
pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
match a {
Some(t) => t >= b,
None => false
}
}
The first pattern is what we often write and the second one is the most efficient manually unrolled version. Surprisingly rustc fails to optimize the former one into the latter as you can see in godbolt listing:
example::unwrap_combinators:
xor eax, eax
cmp edx, esi
setle al
test edi, edi
mov ecx, 2
cmovne ecx, eax
cmp cl, 2
setne al
and al, cl
ret
example::unwrap_manual:
test edi, edi
setne cl
cmp esi, edx
setge al
and al, cl
ret
P.S. Yes, I'm aware of map_or
It looks like Rust's niche-filling optimization is defeating LLVM's optimizations.
In unwrap_combinators
, the temporary Option<bool>
uses the niche in bool
for the discriminant, resulting in the following layout:
0 == Some(false)
1 === Some(true)
2 == None
LLVM ends up inlining the calls to map
and unwrap_or
, but is unable to optimize the particular combination of icmp
and select
that ends up in the LLVM IR.
This can be seen by changing the type to a tuple of (bool, ())
, which inhibits Rust's niche-filling optimization: (playground):
pub fn unwrap_combinators(a: Option<i32>, b: i32) -> bool {
a.map(|t| (t >= b, ()))
.unwrap_or((false, ()))
.0
}
pub fn unwrap_manual(a: Option<i32>, b: i32) -> bool {
match a {
Some(t) => t >= b,
None => false
}
}
which generates the following ASM:
playground::unwrap_combinators:
testl %edi, %edi
setne %cl
cmpl %esi, %edx
setle %al
andb %cl, %al
retq
playground::unwrap_manual:
testl %edi, %edi
setne %cl
cmpl %edx, %esi
setge %al
andb %cl, %al
retq
LLVM has all of the information it needs to optimize the original IR - however, it doesn't seem to have an instcombine special-case that would allow it to do so.
Unfortunately, neither map
nor unwrap_or
gets MIR-inlined with -Z mir-opt-level=3
, due to both of them having too high of a computed cost:
[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5043 ~ core[ca81]::option[0]::{{impl}}[0]::map[0]), substs: [i32, bool, [[email protected]:3:11: 3:21 b:&i32]], bb: bb0, location: SourceInfo { span: slow.rs:3:5: 3:22, scope: scope[0] } } [cost=204 > threshold=100]
[DEBUG rustc_mir::transform::inline] checking whether to inline callsite CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } }
[DEBUG rustc_mir::transform::inline] consider_optimizing(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] should_inline(CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } })
[DEBUG rustc_mir::transform::inline] final inline threshold = 100
[DEBUG rustc_mir::transform::inline] NOT inlining CallSite { callee: DefId(2:5040 ~ core[ca81]::option[0]::{{impl}}[0]::unwrap_or[0]), substs: [bool], bb: bb1, location: SourceInfo { span: slow.rs:3:5: 4:23, scope: scope[0] } } [cost=123 > threshold=100]
The fact that LLVM decides to inline these functions suggets that we might be overly conservative in how we calculating inlining cost.
Hopefully, this situation will be improved by https://github.com/rust-lang/rust/pull/68528, which specifically calls out unwrap_or
as having improved MIR generation.
@Aaron1011 I can confirm that, with #68528, unwrap_or
becomes eligible for MIR inlining in unwrap_combinators
, and the two functions compile to the same assembly with -Z mir-opt-level=3
.
Should this issue be closed now? Since it's been solved by #68528. Or maybe since it's still under mir-opt-3
, and therefore unstable, it's still worth left open? :)
This won't be fixed until MIR inlining becomes more usable and should remain open. I would like to see an "I-slow-fixed-by-MIR-inlining" tag so issues like this and #66234 can be triaged more efficiently.
That makes a lot of sense :slightly_smiling_face:
Most helpful comment
@Aaron1011 I can confirm that, with #68528,
unwrap_or
becomes eligible for MIR inlining inunwrap_combinators
, and the two functions compile to the same assembly with-Z mir-opt-level=3
.