Rust: Tracking issue for RFC 2360, `hint::bench_black_box`

Created on 2 Sep 2019  路  23Comments  路  Source: rust-lang/rust

This is a tracking issue for the RFC hint::bench_black_box (rust-lang/rfcs#2360).

Steps:

  • [ ] Implement the RFC
  • [ ] Stabilization PR

Unresolved questions:

  • [ ] const fn: it is unclear whether bench_black_box should be a const fn. If it
    were, that would hint that it cannot have any side-effects, or that it cannot
    do anything that const fns cannot do.

  • [ ] Naming: during the RFC discussion it was unclear whether black_box is the
    right name for this primitive but we settled on bench_black_box for the time
    being. We should resolve the naming before stabilization.

    Also, we might want to add other benchmarking hints in the future, like
    bench_input and bench_output, so we might want to put all of this
    into a bench sub-module within the core::hint module. That might
    be a good place to explain how the benchmarking hints should be used
    holistically.

    Some arguments in favor or against using "black box" are that:

    • pro: [black box] is a common term in computer programming, that conveys
      that nothing can be assumed about it except for its inputs and outputs.
      con: [black box] often hints that the function has no side-effects, but
      this is not something that can be assumed about this API.
    • con: _box has nothing to do with Box or box-syntax, which might be confusing

    Alternative names suggested: pessimize, unoptimize, unprocessed, unknown,

    do_not_optimize (Google Benchmark).


B-RFC-approved C-tracking-issue E-easy Libs-Tracked T-compiler T-libs

Most helpful comment

I don't want to retreat ground that has been covered to exhaustion and beyond in the RFC discussion, but since all discussion here including the issue text is not even mentioning the key point that was blocking the black_box name it seems necessary to quote it again:

I'm strongly opposed to adding a function to the language called black_box that doesn't guarantee the expected semantics of a black_box function in other languages and systems. There are a number of important unsafe operations that rely on the ability to create a "black box" from the compiler to show that e.g. a pointer escapes or some other operation is not elided.

The semantics this concern was in reference to is (roughly) what's in the accepted RFC: this function is not actually guaranteed to be a "black box" to anyone, least of all the optimizer -- more explicitly: it is allowed to be implemented as completely transparent identity function, and behaves as such for purposes of e.g. whether a program is UB. Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods), anyone who wants the black_box name to be accepted has to convincingly argue that the aforementioned concern is actually a non-issue for some reason. You're invited to do that, but be aware that quibbling over e.g. the potential for confusion with Box isn't going to help with that.

By the way, the same concern probably applies to most names mentioned as alternatives so far, as they all stress the "blocks optimizations / blinds the compiler" angle without simultaneously highlighting that it's is only suitable for benchmarks, where surprisingly aggressive optimizations only lead to confusing results rather than UB or other serious wrongness.

All 23 comments

I believe the main implementation work remaining is to enhance the documentation of hint::black_box as well as to rename it.

An idea I wish I had earlier (but we can still try it out):

  1. use an intrinsic (required anyway for nice wasm/miri support)
  2. when codegening the intrinsic in a non---test crate:

    • don't generate the inline assembly (so LLVM can optimize right through it)

    • emit a warning that it should only be used in a test crate

Note that because this is a warning, and not an error, it doesn't add to the dreaded post-monomorphization errors. We could also just do 2. but not 1, if we need to.

#[inline] and/or generics could then be used to delay the codegen until the --test crate, so it can do its intended purpose.

Would be interesting to know what the benchmarking frameworks out there use, in terms of test runners, and if they end up in a rustc --test crate (i.e. via cargo test) or just a plain executable.

The term "fence" may also be appropriate, as it doesn't prevent optimisation of the value inside it, but rather, doesn't allow optimisations to cross the barrier.

For example, vec.push(bench_black_box(convoluted_way_to_compute_zero())) could still be optimised to vec.push(bench_black_box(0)), but vec.push couldn't use the knowledge that the value is zero to optimise itself.

Another argument in favor of black_box: not only is it a common term in computer programming, but already used a lot in all kinds of documentation/resources about Rust. For example, benchmarking libraries such as Criterion use black_box and plenty of StackOverflow questions and answers use black_box. Also, I doubt that anyone would really think of Box when reading black_box, and even if, this confusion should be resolved quickly and I can't see how it would lead to bugs. I don't really see a good reason to change an established name for a questionable benefit.

I don't want to retreat ground that has been covered to exhaustion and beyond in the RFC discussion, but since all discussion here including the issue text is not even mentioning the key point that was blocking the black_box name it seems necessary to quote it again:

I'm strongly opposed to adding a function to the language called black_box that doesn't guarantee the expected semantics of a black_box function in other languages and systems. There are a number of important unsafe operations that rely on the ability to create a "black box" from the compiler to show that e.g. a pointer escapes or some other operation is not elided.

The semantics this concern was in reference to is (roughly) what's in the accepted RFC: this function is not actually guaranteed to be a "black box" to anyone, least of all the optimizer -- more explicitly: it is allowed to be implemented as completely transparent identity function, and behaves as such for purposes of e.g. whether a program is UB. Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods), anyone who wants the black_box name to be accepted has to convincingly argue that the aforementioned concern is actually a non-issue for some reason. You're invited to do that, but be aware that quibbling over e.g. the potential for confusion with Box isn't going to help with that.

By the way, the same concern probably applies to most names mentioned as alternatives so far, as they all stress the "blocks optimizations / blinds the compiler" angle without simultaneously highlighting that it's is only suitable for benchmarks, where surprisingly aggressive optimizations only lead to confusing results rather than UB or other serious wrongness.

const fn: it is unclear whether bench_black_box should be a const fn. If it
were, that would hint that it cannot have any side-effects, or that it cannot
do anything that const fns cannot do.

We can implement this using the approach that kind of got consensus in https://github.com/rust-lang/rust/pull/64683:

fn bench_black_box_rt<T>(x: T) -> T { asm!("....") }
const fn bench_black_box_ct<T>(x: T) -> T { x }
pub const fn bench_black_box<T>(x: T) { 
    const_eval_select(bench_black_box_ct, bench_black_box_rt)
}

cc @Ralf @oli-obk

(Hello. I ended up reading this issue because I tried to solve the problem black_box is aimed at a different way, and found some to-me-surprising compiler behaviour - #74476. @the8472 helpfully referred me to black_box.)

Assuming nobody wants to re-litigate these semantics (doesn't seem like it so far, thank gods),

Sorry to pick up on this. I definitely do not want to re-litigate a decision which has already been made. However:

I skimread a lot of these discussions including https://github.com/rust-lang/rust-memory-model/issues/45 . Much of the compiler internals went over my head; and as a result I wasn't able to see why this had been decided this way. Of course I looked in the RFC which ought to mention this under _Alternatives_, but it doesn't seem to discuss it at all.

It seems to me that regardless of whether the semantics question is regarded as settled, it needs to be addressed in the RFC text. _Especially_ if it was controversial! I.e. the _Alternatives_ should have a section explaining why the proposal does not provide a useful semantic guarantee. The best description of the most desirable guarantee seems to me to be in https://github.com/rust-lang/rfcs/issues/1484#issuecomment-182392175 or https://github.com/rust-lang/rust-memory-model/issues/45#issuecomment-374369870. Therefore it would probably be best for the RFC to quote one of those, or to paraphrase them, acknowledge that that would be the most useful thing to provide, and then explain why the RFC proposes something different.

Since the existing RFC was merged, maybe this would need to be done as part of an attempt at stabilisation.

With the current very weak semantics, I strongly agree with @hanna-kruppe's concerns over the misleading name.

I also agree with @hanna-kruppe that black_box is probably a misleading name for a function with the semantics in the RFC. To try and move discussion forward, here are some alternative suggestions that focus specifically on what the function _does_, rather than what it's "used for" or what it might do:

  • assume_used
  • mock_use
  • use_and
  • use_then
  • emulate_use
  • mark_used

I particularly like std::hint::assume_used. For the RFC example, this (I think) reads quite well (and accurate):

fn push_cap(v: &mut Vec<i32>) {
    for i in 0..4 {
        assume_used(v.as_ptr());
        v.push(bench_black_box(i));
        assume_used(v.as_ptr());
    }
}

It also works even when used as an identity function:

let x = assume_used(x);

Is still _fairly_ clear doesn't _change_ the value in any way.

I went ahead and created a PR that renames the method to assume_used in https://github.com/rust-lang/rust/pull/74853 to encourage further directed discussion.

@jonhoo I wonder how would it go along with other assume-like functions in
the standard library. There is assume intrinsic and a family of assume_init
functions, for those it is responsibility of the caller to guarantee the thing
being assumed. In the proposed assume_used, the role of the callee.

@tmiasko I replied to your comment over in https://github.com/rust-lang/rust/pull/74853#issuecomment-665019065. Figured it'd be good to keep all the discussion about this particular name suggestion to one PR.

I was seeing some weird effects from using black_box in conjunction with counting instructions, and then I remembered that black_box's current incarnation has to place the value on the stack.

So I tried making my own, and ended up with this, which is sadly limited to values that fit in a register (godbolt):

pub fn better_black_box(mut x: u64) -> u64 {
    unsafe { asm!("/* {x} */", x = inout(reg) x); }
    x
}

From my brief testing, it seems to result in 3 less instructions than std::hint::black_box, at this time, as counted by the CPU (what perf stat calls instructions:u, although I'm using rdpmc directly to get my data).


Might be interesting to see if we could specialize the version in libstd (based on the type) so that it's cheaper, for at least integers, but maybe there's not really any point to that. cc @Amanieu

You don't even need specialization, you can just check size_of::<T> to see if it fits in a usize and then transmute_copy it. (godbolt)

You need to avoid the case when T may have destructors, as transmute_copy will unsafely duplicate the value, but other than that, seems like it could work great, thanks! (we could presumably also have cases for types smaller than usize, or types the size of two usizes, like &[T], as long as we don't introduce extra work beyond forcing the values into registers)

Adding options(nostack) removes the push and pop instructions as well:

            asm!("/* {x} */", x = inout(reg) int_x, options(nostack));
example::better_black_box:
        mov     rax, rdi
        #APP
        #NO_APP
        ret

My latest suggestion from #74853 was consume_and_produce. The idea being that it is a function that consumes the given value in some unspecified way, and produces a value in some unspecified way (that happens to be identical to the consumed value).

This hasn't seen some activity in a while. Has an acceptable default implementation for this been found? It looks like the latest "best implementation" would be a combination of the proposals by @eddyb, @Amanieu, and @m-ou-se

I made some modifications and created the following godbolt:

fn better_black_box<T>(mut x: T) -> T {
    unsafe {
        if mem::size_of::<T>() <= mem::size_of::<usize>() {
            let man_x = mem::ManuallyDrop::new(x);
            let mut int_x: usize = mem::transmute_copy(&man_x);
            asm!("/* {x} */", x = inout(reg) int_x, options(nostack));
            x = mem::transmute_copy(&int_x);
        } else {
            asm!("/* {x} */", x = in(reg) &mut x);
        }
    }
    x
}

It's a small modification from their code that makes it work for anything that'll fit in a usize (not sure if it's completely sound), and also fixes the destructor issue that @eddyb brought up, I think.

Has any consensus been come to on the name? If not, I'd like to humbly propose my own option: bench_identity. I think this name works well as it describes the actual behavior of the function (it behaves as an identity) and the purpose of the function (its use in benchmarking).

let mut int_x: usize = mem::transmute_copy(&man_x);

This line is unsound if x is smaller than usize, see the documentation for transmute_copy.

@Amanieu that's what I figured. Would we then need to do a separate branch for each integer size? Is there any way to get it to work for 3,5,6,7-byte types? I tried using a byte-array but you can't pass that directly to asm. Perhaps I could

  1. create a usize-sized zeroed byte-array
  2. transmute_copy man_x into an x-sized portion of it
  3. from_ne_bytes into a usize
  4. pass the usize through asm
  5. to_ne_bytes back into an array
  6. transmute_copy same x-sized portion of array back into x

I'll have to try that out. Would that be sound?

I'd just use the fallback path if it doesn't fit in a usize exactly.

Well as an exercise I created a version which works for any size up to 8. godbolt

For 3,5,6,7 it only saves 1 instruction, but it saves 5 stack operations. Not sure how much of a difference that makes.

Keep in mind that your code is a very artificial benchmark. In practice, the different components of a struct are likely to be in separate registers. In that case it is usually faster to store everything to memory than to do bit shuffling to fit everything in one register.

Oh absolutely. It would certainly require some more realistic benchmarks to confirm this as a viable optimization. It was just fun to implement as an exercise.

Was this page helpful?
0 / 5 - 0 ratings