I have a big Copy
type:
#[derive(Clone, Copy)]
struct HugeCopyType {
arr: [i64; 1024],
}
I'm passing it to a function by value, and the function does not modify the argument which means it should be equivalent to passing it by reference:
fn process(arg: HugeCopyType) {
let sum = arg.arr.iter().sum::<i64>();
println!("{}", sum);
}
I'm calling the function from a loop:
let hct = HugeCopyType {
arr: [0; 1024],
};
for _ in 0..512 {
process(hct);
}
I expected Rust to be smart about this and convert the argument to be passed by reference, however it seems like that is not the case and it ends up being memcpy
'd over and over inside the loop (see godbolt link above).
Changing the argument to pass by reference manually gets rid of the memcpy
s.
Rust: 1.26.0 (stable) and also beta and nightly.
Since the function takes the argument by value it can make arbitrary mutations to the value. Passing by reference would cause those mutations to be reflected in the calling function and subsequent invocations of process. The function is being passed a pointer, just a pointer to a fresh copy of the value.
The point is, the function doesn't modify the value (it doesn't even take mut arg
), but the copy is still not optimized away.
Whether the function actually mutates the argument really really shouldn't not affect its ABI. It's probably possible: we don't guarantee ABI stability for "Rust" ABI, shims can fix the problems with function pointers / vtables, and we could stick a "needs copy of this argument?" flag into metadata. But it's not desirable: it's a ton of work and really messes with incremental compilation and is just overall a huge mess.
There is a way to make the ABI the same whether or not the function mutates the value (unconditionally take pointer to the caller's original data, make a copy locally if you want to mutate it) but that has far-reaching repercussions and I don't know for sure whether it's actually correct. I also think it's not an unequivocal win: while it would avoid copies in this case, it adds extra copies compared to the status quo if the value being passed is a temporary rather than a copy of a longer-lived allocation.
There's also the issue that passing a pointer to the caller's actual local variable would be observable through the address of the parameter (consider foo(arg: HugeCopy, by_ref: &HugeCopy)
and foo
does ptr::eq(&arg, by_ref)
). I'm not sure whether that is actually a deal breaker but it seems quite unfortunate.
I completely understand the appeal of having a clear ABI boundary between function calls, but it shouldn't get in the way of _zero-cost abstractions_, Rust's core principle. Ownership, borrowing and Copy
semantics are the abstractions imposed on us by Rust, and Rust should try to keep them as "zero-cost" as possible.
Functions and function calls are abstractions too. _This is a hot path_. Normally, the function will get completely _inlined_ (doesn't that affect ABI and incremental compilation..?); in this case it's impossible or undesirable, but we should still benefit from as many compiler optimizations as possible. (I'm thinking about it more in terms of "inlining" the info whether the function needs its argument memcopied, if we can't inline the whole function.)
On another note, I really don't think huge Copy
types are even a good idea. Copy
_means_ a small primitive value that is so cheap to .clone()
we shouldn't need to do it explicitly. An int or a reference, yes, a huge buffer that needs to be memcopied around, no. Maybe rustc/clippy could at least lint on implicit memcopies of huge Copy
types, if they don't already. After all, making expensive operations _visibly expensive_ is another of Rust's principles.
I agree with the points @bugaevc makes.
The Copy
docs currently say:
Generally speaking, if your type can implement Copy, it should.
Which is the advise I've been following thus far. It also makes sense to pass Copy
types by value everywhere, since it's less restrictive than passing by reference in the general case, and because for small types (up to a couple of pointers in size) it can indeed lead to better performance due to not having to dereference extra pointers.
However, with the current state of affairs it looks like I have to _guess_ whether I should pass a Copy
type by reference or by value. Should I pass a type containing three ints by value? Four ints? Actually, the answer depends on lots of other context like the target architecture (number of registers), number and sizes of other parameters to every given function and so on. In this case even a lint doesn't really help, which is why I'm expecting the compiler to figure out the best way of passing a value in every given situation on its own.
I don't have very detailed opinions on this (and even if I had, I don't have the time right now to write a novel), but some points briefly:
#[inline]
attribute or the like). Under this proposal, all codegen units that mention the changed function at all need to be recompiled if the "needs to copy arg?" flag of a function changes, even those where inlining the function wouldn't be possible in the first place.We already have a lint in clippy that tells you to change small &T
arguments to T
if T: Copy
. We could totally add the opposite for large T
and suggest taking &T
. This makes sure you do get the optimization for the relevant cases without causing a lot of compiler optimization headaches.
We already have a lint in clippy that tells you to change small &T arguments to T if T: Copy. We could totally add the opposite for large T and suggest taking &T. This makes sure you do get the optimization for the relevant cases without causing a lot of compiler optimization headaches.
That's great! How do you decide if the type is small enough, though? Since AFAIU it depends on the platform and other circumstances.
How do you decide if the type is small enough, though? Since AFAIU it depends on the platform and other circumstances.
We do check the platform :D and hope for the best. If you have any improvement suggestions we're all ears.
I'll go ahead and close this in favor of the discussed clippy lint. Feel free to reopen if you think this issue is valuable. This would need far-reaching changes to incremental compilation and how the compiler works at the ABI level though.
Most helpful comment
We already have a lint in clippy that tells you to change small
&T
arguments toT
ifT: Copy
. We could totally add the opposite for largeT
and suggest taking&T
. This makes sure you do get the optimization for the relevant cases without causing a lot of compiler optimization headaches.