rustc beta (1.28.0) generates unnecessary call instruction

Created on 23 Jul 2018  ·  10Comments  ·  Source: rust-lang/rust

For this code:

pub struct A([i32; 4]);
impl A {
    pub fn foo(self) -> Self { self }
    pub fn bar(self) -> Self { self }
}

rustc beta (1.28.0) with opt-level=3 generates:

example::A::bar:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

example::A::foo:
        push    rbx
        mov     rbx, rdi
        call    example::A::bar@PLT
        mov     rax, rbx
        pop     rbx
        ret

rustc 1.27.1 and rustc beta with opt-level=s generates:

example::A::foo:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

example::A::bar:
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rdi], xmm0
        mov     rax, rdi
        ret

The beta with opt-level=3 seems to be anti-inlining the identical functions.

I-slow P-medium T-compiler regression-from-stable-to-stable

Most helpful comment

unnamed_addr doesn't have the expected effect because here the only possible outcomes are:

  1. If bar has local linkage, then it is erased (all uses are replaced with foo).
  2. Otherwise, if creating a thunk is profitable, bar becomes a thunk (tail-)calling foo (which isn't lowered as an actual tail call because of sret, as above). This is what happened in the parent post.
  3. Otherwise, foo and bar remain unchanged (all local uses of bar are replaced with foo, but the definition is kept for external linkage).

There is no attempt to alias bar with foo. Two things should probably be fixed in LLVM code:

  1. isThunkProfitable should be less optimistic (currently it returns true for all functions longer than 2 instructions). This will, at least, avoid the bloat seen in the parent post.
  2. writeThunkOrAlias as described here should replace writeThunk. Then we can be even more space-efficient by making bar an alias for foo.

I will look into submitting a patch to LLVM.

All 10 comments

cc @rust-lang/compiler

(I've not attempted a reproduction locally)

This might be due to #49479, the time frame seems right and the code change observed here is precisely what MergeFunctions does. Some observations:

  • it's counter intuitive that this doesn't happen with opt-level=s
  • it's silly that it doesn't generate a tail call, wonder why that happens
  • if we decided that distinct source-level functions are allowed to have identical addresses and told LLVM about it with the unnamed_addr attribute, MergeFunctions could turn one function into a link-time alias for the other instead of creating a thunk that calls

cc @nox

@rkruppe Huh, I thought we used unnamed_addr already, and didn't make any promises?

@eddyb Evidently we're not using it. I don't know of any reason to make promises here, but since we de facto don't merge addresses, who knows whether anyone in the wild is relying on it for crazy shenanigans 🤷‍♂️

Interestingly, the LLVM IR the sample code results in appears to be using both unnamed_addr and tail_call, but the assembly is the same as in the parent post.

define void @bar(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16) %self) unnamed_addr #0 {
 ; ...
}
; ...
define void @foo(%A* noalias nocapture sret dereferenceable(16), %A* noalias nocapture readonly dereferenceable(16)) unnamed_addr #0 {
  tail call void @bar(%A* noalias nocapture sret dereferenceable(16) %0, %A* noalias nocapture readonly dereferenceable(16) %1) #0
  ret void
}

The lack of a tail call is caused by the sret attribute. Relevant LLVM code.

unnamed_addr doesn't have the expected effect because here the only possible outcomes are:

  1. If bar has local linkage, then it is erased (all uses are replaced with foo).
  2. Otherwise, if creating a thunk is profitable, bar becomes a thunk (tail-)calling foo (which isn't lowered as an actual tail call because of sret, as above). This is what happened in the parent post.
  3. Otherwise, foo and bar remain unchanged (all local uses of bar are replaced with foo, but the definition is kept for external linkage).

There is no attempt to alias bar with foo. Two things should probably be fixed in LLVM code:

  1. isThunkProfitable should be less optimistic (currently it returns true for all functions longer than 2 instructions). This will, at least, avoid the bloat seen in the parent post.
  2. writeThunkOrAlias as described here should replace writeThunk. Then we can be even more space-efficient by making bar an alias for foo.

I will look into submitting a patch to LLVM.

For the record, I've submitted a couple of patches for MergeFuncs to improve handling in some cases and add alias support. They're accepted, but not landed yet:

https://reviews.llvm.org/D53262
https://reviews.llvm.org/D53271
https://reviews.llvm.org/D53285

The alias functionality will be behind a -mergefunc-use-aliases flag, because historically there have been issues here with lacking linker support. We can try enabling this flag for rust and see if there are any issues.

The patches have all landed upstream. After the next LLVM update, we can enable use of aliases by passing -mergefunc-use-aliases in https://github.com/rust-lang/rust/blob/f1d61837d1cf058cfbd0902b0bf79a2657b81187/src/librustc_codegen_llvm/llvm_util.rs#L54

Was this page helpful?
0 / 5 - 0 ratings