For this code:
pub struct A([i32; 4]);
impl A {
pub fn foo(self) -> Self { self }
pub fn bar(self) -> Self { self }
}
rustc beta (1.28.0) with opt-level=3 generates:
example::A::bar:
movups xmm0, xmmword ptr [rsi]
movups xmmword ptr [rdi], xmm0
mov rax, rdi
ret
example::A::foo:
push rbx
mov rbx, rdi
call example::A::bar@PLT
mov rax, rbx
pop rbx
ret
rustc 1.27.1 and rustc beta with opt-level=s generates:
example::A::foo:
movups xmm0, xmmword ptr [rsi]
movups xmmword ptr [rdi], xmm0
mov rax, rdi
ret
example::A::bar:
movups xmm0, xmmword ptr [rsi]
movups xmmword ptr [rdi], xmm0
mov rax, rdi
ret
The beta with opt-level=3 seems to be anti-inlining the identical functions.
cc @rust-lang/compiler
(I've not attempted a reproduction locally)
This might be due to #49479, the time frame seems right and the code change observed here is precisely what MergeFunctions does. Some observations:
opt-level=sunnamed_addr attribute, MergeFunctions could turn one function into a link-time alias for the other instead of creating a thunk that callscc @nox
@rkruppe Huh, I thought we used unnamed_addr already, and didn't make any promises?
@eddyb Evidently we're not using it. I don't know of any reason to make promises here, but since we de facto don't merge addresses, who knows whether anyone in the wild is relying on it for crazy shenanigans 🤷♂️
Interestingly, the LLVM IR the sample code results in appears to be using both unnamed_addr and tail_call, but the assembly is the same as in the parent post.
define void @bar(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16) %self) unnamed_addr #0 {
; ...
}
; ...
define void @foo(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
tail call void @bar(%A* noalias nocapture sret dereferenceable(16) %0, %A* noalias nocapture readonly dereferenceable(16) %1) #0
ret void
}
The lack of a tail call is caused by the sret attribute. Relevant LLVM code.
unnamed_addr doesn't have the expected effect because here the only possible outcomes are:
bar has local linkage, then it is erased (all uses are replaced with foo).bar becomes a thunk (tail-)calling foo (which isn't lowered as an actual tail call because of sret, as above). This is what happened in the parent post.foo and bar remain unchanged (all local uses of bar are replaced with foo, but the definition is kept for external linkage).There is no attempt to alias bar with foo. Two things should probably be fixed in LLVM code:
isThunkProfitable should be less optimistic (currently it returns true for all functions longer than 2 instructions). This will, at least, avoid the bloat seen in the parent post.writeThunkOrAlias as described here should replace writeThunk. Then we can be even more space-efficient by making bar an alias for foo.I will look into submitting a patch to LLVM.
For the record, I've submitted a couple of patches for MergeFuncs to improve handling in some cases and add alias support. They're accepted, but not landed yet:
https://reviews.llvm.org/D53262
https://reviews.llvm.org/D53271
https://reviews.llvm.org/D53285
The alias functionality will be behind a -mergefunc-use-aliases flag, because historically there have been issues here with lacking linker support. We can try enabling this flag for rust and see if there are any issues.
The patches have all landed upstream. After the next LLVM update, we can enable use of aliases by passing -mergefunc-use-aliases in https://github.com/rust-lang/rust/blob/f1d61837d1cf058cfbd0902b0bf79a2657b81187/src/librustc_codegen_llvm/llvm_util.rs#L54
Most helpful comment
unnamed_addrdoesn't have the expected effect because here the only possible outcomes are:barhas local linkage, then it is erased (all uses are replaced withfoo).barbecomes a thunk (tail-)callingfoo(which isn't lowered as an actual tail call because ofsret, as above). This is what happened in the parent post.fooandbarremain unchanged (all local uses ofbarare replaced withfoo, but the definition is kept for external linkage).There is no attempt to alias
barwithfoo. Two things should probably be fixed in LLVM code:isThunkProfitableshould be less optimistic (currently it returns true for all functions longer than 2 instructions). This will, at least, avoid the bloat seen in the parent post.writeThunkOrAliasas described here should replacewriteThunk. Then we can be even more space-efficient by makingbaran alias forfoo.I will look into submitting a patch to LLVM.