This is a tracking issue for the RFC hint::bench_black_box
(rust-lang/rfcs#2360).
Steps:
Unresolved questions:
[ ] const fn
: it is unclear whether bench_black_box
should be a const fn
. If it
were, that would hint that it cannot have any side-effects, or that it cannot
do anything that const fn
s cannot do.
[ ] Naming: during the RFC discussion it was unclear whether black_box
is the
right name for this primitive but we settled on bench_black_box
for the time
being. We should resolve the naming before stabilization.
Also, we might want to add other benchmarking hints in the future, like
bench_input
and bench_output
, so we might want to put all of this
into a bench
sub-module within the core::hint
module. That might
be a good place to explain how the benchmarking hints should be used
holistically.
Some arguments in favor or against using "black box" are that:
_box
has nothing to do with Box
or box
-syntax, which might be confusingAlternative names suggested: pessimize
, unoptimize
, unprocessed
, unknown
,
do_not_optimize
(Google Benchmark).
I believe the main implementation work remaining is to enhance the documentation of hint::black_box
as well as to rename it.
An idea I wish I had earlier (but we can still try it out):
wasm
/miri
support)--test
crate:Note that because this is a warning, and not an error, it doesn't add to the dreaded post-monomorphization errors. We could also just do 2. but not 1, if we need to.
#[inline]
and/or generics could then be used to delay the codegen until the --test
crate, so it can do its intended purpose.
Would be interesting to know what the benchmarking frameworks out there use, in terms of test runners, and if they end up in a rustc --test
crate (i.e. via cargo test
) or just a plain executable.
The term "fence" may also be appropriate, as it doesn't prevent optimisation of the value inside it, but rather, doesn't allow optimisations to cross the barrier.
For example, vec.push(bench_black_box(convoluted_way_to_compute_zero()))
could still be optimised to vec.push(bench_black_box(0))
, but vec.push
couldn't use the knowledge that the value is zero to optimise itself.
Another argument in favor of black_box
: not only is it a common term in computer programming, but already used a lot in all kinds of documentation/resources about Rust. For example, benchmarking libraries such as Criterion use black_box
and plenty of StackOverflow questions and answers use black_box
. Also, I doubt that anyone would really think of Box
when reading black_box
, and even if, this confusion should be resolved quickly and I can't see how it would lead to bugs. I don't really see a good reason to change an established name for a questionable benefit.
I don't want to retreat ground that has been covered to exhaustion and beyond in the RFC discussion, but since all discussion here including the issue text is not even mentioning the key point that was blocking the black_box
name it seems necessary to quote it again:
I'm strongly opposed to adding a function to the language called black_box that doesn't guarantee the expected semantics of a black_box function in other languages and systems. There are a number of important unsafe operations that rely on the ability to create a "black box" from the compiler to show that e.g. a pointer escapes or some other operation is not elided.
The semantics this concern was in reference to is (roughly) what's in the accepted RFC: this function is not actually guaranteed to be a "black box" to anyone, least of all the optimizer -- more explicitly: it is allowed to be implemented as completely transparent identity function, and behaves as such for purposes of e.g. whether a program is UB. Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods), anyone who wants the black_box
name to be accepted has to convincingly argue that the aforementioned concern is actually a non-issue for some reason. You're invited to do that, but be aware that quibbling over e.g. the potential for confusion with Box
isn't going to help with that.
By the way, the same concern probably applies to most names mentioned as alternatives so far, as they all stress the "blocks optimizations / blinds the compiler" angle without simultaneously highlighting that it's is only suitable for benchmarks, where surprisingly aggressive optimizations only lead to confusing results rather than UB or other serious wrongness.
const fn: it is unclear whether bench_black_box should be a const fn. If it
were, that would hint that it cannot have any side-effects, or that it cannot
do anything that const fns cannot do.
We can implement this using the approach that kind of got consensus in https://github.com/rust-lang/rust/pull/64683:
fn bench_black_box_rt<T>(x: T) -> T { asm!("....") }
const fn bench_black_box_ct<T>(x: T) -> T { x }
pub const fn bench_black_box<T>(x: T) {
const_eval_select(bench_black_box_ct, bench_black_box_rt)
}
cc @Ralf @oli-obk
(Hello. I ended up reading this issue because I tried to solve the problem black_box
is aimed at a different way, and found some to-me-surprising compiler behaviour - #74476. @the8472 helpfully referred me to black_box
.)
Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods),
Sorry to pick up on this. I definitely do not want to re-litigate a decision which has already been made. However:
I skimread a lot of these discussions including https://github.com/rust-lang/rust-memory-model/issues/45 . Much of the compiler internals went over my head; and as a result I wasn't able to see why this had been decided this way. Of course I looked in the RFC which ought to mention this under _Alternatives_, but it doesn't seem to discuss it at all.
It seems to me that regardless of whether the semantics question is regarded as settled, it needs to be addressed in the RFC text. _Especially_ if it was controversial! I.e. the _Alternatives_ should have a section explaining why the proposal does not provide a useful semantic guarantee. The best description of the most desirable guarantee seems to me to be in https://github.com/rust-lang/rfcs/issues/1484#issuecomment-182392175 or https://github.com/rust-lang/rust-memory-model/issues/45#issuecomment-374369870. Therefore it would probably be best for the RFC to quote one of those, or to paraphrase them, acknowledge that that would be the most useful thing to provide, and then explain why the RFC proposes something different.
Since the existing RFC was merged, maybe this would need to be done as part of an attempt at stabilisation.
With the current very weak semantics, I strongly agree with @hanna-kruppe's concerns over the misleading name.
I also agree with @hanna-kruppe that black_box
is probably a misleading name for a function with the semantics in the RFC. To try and move discussion forward, here are some alternative suggestions that focus specifically on what the function _does_, rather than what it's "used for" or what it might do:
assume_used
mock_use
use_and
use_then
emulate_use
mark_used
I particularly like std::hint::assume_used
. For the RFC example, this (I think) reads quite well (and accurate):
fn push_cap(v: &mut Vec<i32>) {
for i in 0..4 {
assume_used(v.as_ptr());
v.push(bench_black_box(i));
assume_used(v.as_ptr());
}
}
It also works even when used as an identity function:
let x = assume_used(x);
Is still _fairly_ clear doesn't _change_ the value in any way.
I went ahead and created a PR that renames the method to assume_used
in https://github.com/rust-lang/rust/pull/74853 to encourage further directed discussion.
@jonhoo I wonder how would it go along with other assume-like functions in
the standard library. There is assume
intrinsic and a family of assume_init
functions, for those it is responsibility of the caller to guarantee the thing
being assumed. In the proposed assume_used
, the role of the callee.
@tmiasko I replied to your comment over in https://github.com/rust-lang/rust/pull/74853#issuecomment-665019065. Figured it'd be good to keep all the discussion about this particular name suggestion to one PR.
I was seeing some weird effects from using black_box
in conjunction with counting instructions, and then I remembered that black_box
's current incarnation has to place the value on the stack.
So I tried making my own, and ended up with this, which is sadly limited to values that fit in a register (godbolt):
pub fn better_black_box(mut x: u64) -> u64 {
unsafe { asm!("/* {x} */", x = inout(reg) x); }
x
}
From my brief testing, it seems to result in 3 less instructions than std::hint::black_box
, at this time, as counted by the CPU (what perf stat
calls instructions:u
, although I'm using rdpmc
directly to get my data).
Might be interesting to see if we could specialize the version in libstd (based on the type) so that it's cheaper, for at least integers, but maybe there's not really any point to that. cc @Amanieu
You don't even need specialization, you can just check size_of::<T>
to see if it fits in a usize
and then transmute_copy
it. (godbolt)
You need to avoid the case when T
may have destructors, as transmute_copy
will unsafely duplicate the value, but other than that, seems like it could work great, thanks! (we could presumably also have cases for types smaller than usize
, or types the size of two usize
s, like &[T]
, as long as we don't introduce extra work beyond forcing the values into registers)
Adding options(nostack)
removes the push
and pop
instructions as well:
asm!("/* {x} */", x = inout(reg) int_x, options(nostack));
example::better_black_box:
mov rax, rdi
#APP
#NO_APP
ret
My latest suggestion from #74853 was consume_and_produce
. The idea being that it is a function that consumes the given value in some unspecified way, and produces a value in some unspecified way (that happens to be identical to the consumed value).
This hasn't seen some activity in a while. Has an acceptable default implementation for this been found? It looks like the latest "best implementation" would be a combination of the proposals by @eddyb, @Amanieu, and @m-ou-se
I made some modifications and created the following godbolt:
fn better_black_box<T>(mut x: T) -> T {
unsafe {
if mem::size_of::<T>() <= mem::size_of::<usize>() {
let man_x = mem::ManuallyDrop::new(x);
let mut int_x: usize = mem::transmute_copy(&man_x);
asm!("/* {x} */", x = inout(reg) int_x, options(nostack));
x = mem::transmute_copy(&int_x);
} else {
asm!("/* {x} */", x = in(reg) &mut x);
}
}
x
}
It's a small modification from their code that makes it work for anything that'll fit in a usize
(not sure if it's completely sound), and also fixes the destructor issue that @eddyb brought up, I think.
Has any consensus been come to on the name? If not, I'd like to humbly propose my own option: bench_identity
. I think this name works well as it describes the actual behavior of the function (it behaves as an identity) and the purpose of the function (its use in benchmarking).
let mut int_x: usize = mem::transmute_copy(&man_x);
This line is unsound if x
is smaller than usize
, see the documentation for transmute_copy
.
@Amanieu that's what I figured. Would we then need to do a separate branch for each integer size? Is there any way to get it to work for 3,5,6,7-byte types? I tried using a byte-array but you can't pass that directly to asm. Perhaps I could
I'll have to try that out. Would that be sound?
I'd just use the fallback path if it doesn't fit in a usize
exactly.
Well as an exercise I created a version which works for any size up to 8. godbolt
For 3,5,6,7 it only saves 1 instruction, but it saves 5 stack operations. Not sure how much of a difference that makes.
Keep in mind that your code is a very artificial benchmark. In practice, the different components of a struct are likely to be in separate registers. In that case it is usually faster to store everything to memory than to do bit shuffling to fit everything in one register.
Oh absolutely. It would certainly require some more realistic benchmarks to confirm this as a viable optimization. It was just fun to implement as an exercise.
Most helpful comment
I don't want to retreat ground that has been covered to exhaustion and beyond in the RFC discussion, but since all discussion here including the issue text is not even mentioning the key point that was blocking the
black_box
name it seems necessary to quote it again:The semantics this concern was in reference to is (roughly) what's in the accepted RFC: this function is not actually guaranteed to be a "black box" to anyone, least of all the optimizer -- more explicitly: it is allowed to be implemented as completely transparent identity function, and behaves as such for purposes of e.g. whether a program is UB. Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods), anyone who wants the
black_box
name to be accepted has to convincingly argue that the aforementioned concern is actually a non-issue for some reason. You're invited to do that, but be aware that quibbling over e.g. the potential for confusion withBox
isn't going to help with that.By the way, the same concern probably applies to most names mentioned as alternatives so far, as they all stress the "blocks optimizations / blinds the compiler" angle without simultaneously highlighting that it's is only suitable for benchmarks, where surprisingly aggressive optimizations only lead to confusing results rather than UB or other serious wrongness.