Returning a newly created struct from a function causes an unnecessary memcpy even if the function is inlined:
use std::mem::uninitialized;
#[repr(C)]
pub struct SomeStruct(u64, u32, u16, u8, usize, u64, u64, u32, usize, u16, u32, usize, isize);
extern {
pub fn SomeExternFun(val: *mut SomeStruct);
}
#[inline(always)]
unsafe fn ret_val() -> SomeStruct {
let mut val: SomeStruct = uninitialized();
SomeExternFun(&mut val);
val
}
#[inline(always)]
unsafe fn by_ref(val: &mut SomeStruct) {
SomeExternFun(val);
}
pub unsafe fn call_ret_val() {
let val = ret_val();
println!("{}", val.0);
}
pub unsafe fn call_by_ref() {
let mut val: SomeStruct = uninitialized();
by_ref(&mut val);
println!("{}", val.0);
}
playground::call_ret_val: # @playground::call_ret_val
# %bb.0:
sub rsp, 184
lea rdi, [rsp + 16]
call qword ptr [rip + SomeExternFun@GOTPCREL]
movups xmm0, xmmword ptr [rsp + 80]
movaps xmmword ptr [rsp + 160], xmm0
movups xmm0, xmmword ptr [rsp + 16]
movups xmm1, xmmword ptr [rsp + 32]
movups xmm2, xmmword ptr [rsp + 48]
movups xmm3, xmmword ptr [rsp + 64]
movaps xmmword ptr [rsp + 144], xmm3
movaps xmmword ptr [rsp + 128], xmm2
movaps xmmword ptr [rsp + 112], xmm1
movaps xmmword ptr [rsp + 96], xmm0
lea rax, [rsp + 96]
mov qword ptr [rsp], rax
mov rax, qword ptr [rip + core::fmt::num::<impl core::fmt::Display for u64>::fmt@GOTPCREL]
mov qword ptr [rsp + 8], rax
lea rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.2]
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 24], 2
lea rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.3]
mov qword ptr [rsp + 32], rax
mov qword ptr [rsp + 40], 1
mov rax, rsp
mov qword ptr [rsp + 48], rax
mov qword ptr [rsp + 56], 1
lea rdi, [rsp + 16]
call qword ptr [rip + std::io::stdio::_print@GOTPCREL]
add rsp, 184
ret
# -- End function
playground::call_by_ref: # @playground::call_by_ref
# %bb.0:
push rbx
sub rsp, 144
lea rbx, [rsp + 64]
mov rdi, rbx
call qword ptr [rip + SomeExternFun@GOTPCREL]
mov qword ptr [rsp], rbx
mov rax, qword ptr [rip + core::fmt::num::<impl core::fmt::Display for u64>::fmt@GOTPCREL]
mov qword ptr [rsp + 8], rax
lea rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.2]
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 24], 2
lea rax, [rip + .Lanon.865785a9e3ffc8ff7f56f0b35003ddee.3]
mov qword ptr [rsp + 32], rax
mov qword ptr [rsp + 40], 1
mov rax, rsp
mov qword ptr [rsp + 48], rax
mov qword ptr [rsp + 56], 1
lea rdi, [rsp + 16]
call qword ptr [rip + std::io::stdio::_print@GOTPCREL]
add rsp, 144
pop rbx
ret
# -- End function
Removing this kind of memcpy is generally the job of call slot optimization. I believe in this case it will fail because SomeExternFun() may capture the pointer.
Not totally sure whether that restriction needs to apply if the destination is an alloca, as we're just changing one alloca for another.
Yeah, I think that as-is LLVM is not permitted to optimize away that memcpy. SomeExternFun() func may capture the pointer and println captures the pointer and together they could observe whether or not that memcpy has been optimized away.
UItimately the really issue here is the println, which captures the pointer and prevents SROA from working. See https://github.com/rust-lang/rust/issues/50519#issuecomment-447566009. If you write println!("{}", {val.0}); instead it optimizes properly.
If you write
println!("{}", {val.0});instead it optimizes properly.
Unfortunately, this is not a solution when we need to pass a pointer again - for example, replacing println with another call to SomeExternFun brings memcpy back.
@MSxDOS I'm not totally certain, but I think it's not legal to optimize the memcpy away in this case. Assuming you have code like this:
unsafe fn ret_val() -> SomeStruct {
let mut val: SomeStruct = uninitialized();
SomeExternFun(&mut val);
val
}
pub unsafe fn call_ret_val() {
let val = ret_val();
SomeExternFun2(&val);
}
Then SomeExternFun() could store the pointer passed to it and SomeExternFun2() could store the pointer passed to it and then compare the two pointers. With the memcpy the pointers will differ, without it they will be the same, changing observable behavior.
Maybe someone (@rkruppe or @RalfJung?) could comment on whether the result of that pointer comparison might be unspecified behavior? I'm pretty sure it's well defined under LLVM semantics (making elision of this memcpy illegal for LLVM), but possibly in rustc we could have relaxed rules (something akin to RVO)?
The pointer passed to SomeExternFun is dead and dangling by the time it gets compared, so it doesn't seem unreasonable to say that a new, live pointer could re-use the same address.
Cc @eddyb
Is unsafe operation a necessary condition to reproduce this?
I think this is directly related to the unnecessary memcpy when using Box::new() (https://github.com/rust-lang/rust/issues/49733#issuecomment-433150210)
You can do
unsafe {
let mut fast_box: Box<T> = Box::new(std::mem::uninitialized());
(fast_box.as_mut() as *mut T).write(/* stuff */);
}
to write directly to the allocation as when using box syntax, but if you try to wrap it in a fuction and return the box, the memcpy is back :angry: -> Playground
This will probably be solved by #47954 (which will hopefully get revived in the next few months).
That's assuming there's a copy/move in the MIR, that we can get rid of.
The first example was fixed by https://github.com/rust-lang/rust/pull/72205 :tada:
https://github.com/rust-lang/rust/issues/57077#issuecomment-488240316 - this still copies stuff.
https://rust.godbolt.org/z/PGCmf7 - here call_eat_guid_val initializes the struct in the stack while call_eat_guid_ref just references it from a read-only section.
Most helpful comment
The first example was fixed by https://github.com/rust-lang/rust/pull/72205 :tada:
https://github.com/rust-lang/rust/issues/57077#issuecomment-488240316 - this still copies stuff.
https://rust.godbolt.org/z/PGCmf7 - here
call_eat_guid_valinitializes the struct in the stack whilecall_eat_guid_refjust references it from a read-only section.