Rust: Bad codegen with explicitly nonzero trailing/leading zeros

Created on 14 Oct 2018 · 13Comments · Source: rust-lang/rust

The compiler should be able to remove intrinsics::cttz's test when the input is explicitly nonzero.

https://play.rust-lang.org/?version=nightly&mode=release&edition=2015&gist=3d298b145a80d2a7584048de73ffc94c

if x == 0 { 0 } else { cttz(x) }

The cttz version produces three test instructions while cttz_nonzero produces one.

The LLVM IR show that cttz/trailing_zeros use hardcoded is_zero_undef values.
call i32 @llvm.cttz.i32(i32 %x, i1 false)

(The same is true of leading_zeros/trailing_zeros/ctlz/ctlz_nonzero)

A-LLVM I-slow

Source

TheIronBorn

Most helpful comment

FWIW started working on this yesterday https://reviews.llvm.org/D60846

dlrobertson on 18 Apr 2019

❤3

All 13 comments

LLVM patch to fix this: https://reviews.llvm.org/D55786

nikic on 18 Dec 2018

🎉1

Verified fixed by LLVM upgrade.

nikic on 27 Jan 2019

This is not yet fixed (or regressed?) for u64: https://rust.godbolt.org/z/qZ9E0u

pub unsafe fn fn_cttz(x: u64) -> u32 {
    if x == 0 { 0 } else { cttz(x) as u32 }
}

Still leads to:

example::fn_cttz:
        test    rdi, rdi
        je      .LBB1_1
        je      .LBB1_3
        bsf     rax, rdi
        ret
.LBB1_1:
        xor     eax, eax
        ret
.LBB1_3:
        mov     eax, 64
        ret

Can we reopen this?

LukasKalbertodt on 14 Apr 2019

In your example the additional truncation prevents conversion into a select, leaving an explicit branch on zero. It looks like isKnownNonZero() only checks dominating conditions for pointer types, it does not recognize that the cttz argument cannot be zero.

For now, you can work around this by returning a u64 instead of a u32.

nikic on 14 Apr 2019

The problem is that u64::trailing_zeros returns a u32. And that's the only stable interface to this feature right now :/

LukasKalbertodt on 14 Apr 2019

@LukasKalbertodt Using x.trailing_zeros() as u64 and returning that should do the trick.

nikic on 14 Apr 2019

IR test case:

declare i64 @llvm.cttz.i64(i64, i1)

define i32 @test(i64 %x) {
start:
  %c = icmp eq i64 %x, 0
  br i1 %c, label %exit, label %non_zero

non_zero:
  %ctz = call i64 @llvm.cttz.i64(i64 %x, i1 false)
  %ctz32 = trunc i64 %ctz to i32
  br label %exit

exit:
  %res = phi i32 [ %ctz32, %non_zero ], [ 0, %start ]
  ret i32 %res
}

If someone is interested in tackling this issue:

The existing fold for handling a known non-zero argument to cttz is here: https://github.com/llvm-mirror/llvm/blob/581b638b1686c57586b44424d1a1fb0649315ce5/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1329-L1339

The isKnownNonZero() function calls isKnownNullFromDominatingCondition() here: https://github.com/llvm-mirror/llvm/blob/581b638b1686c57586b44424d1a1fb0649315ce5/lib/Analysis/ValueTracking.cpp#L2069 This function is capable of determining that %x is non-zero in the above case. However, it is currently only used for pointer types.

Fixing this is in principle as easy as moving this check out of the isPointerTy() branch. However, there is a tricky compile-time vs optimization quality tradeoff here, because this is a potentially expensive operation. It might be that reviewers will prefer to handle this in some other way.

Edit: If this approach doesn't pan out, then an alternative would be to add this to the CorrelatedValuePropagation pass, by checking if the ConstantRange of the argument doesn't contain zero.

nikic on 14 Apr 2019

isKnownNonNullFromDominatingCondition does contain some code that checks that the argument is used with a nonnull attribute. I think we could split out the section that checks for icmp eq <ty> <val>, 0 which is also then used in isKnownNonNullFromDominatingCondition. Then we could use something like.

if (V->getType()->isPointerTy()) {
  ...
} else {
  newFunction(...)
}

dlrobertson on 17 Apr 2019

FWIW started working on this yesterday https://reviews.llvm.org/D60846

dlrobertson on 18 Apr 2019

❤3

This has now landed as https://reviews.llvm.org/D69571.

nikic on 31 Oct 2019

🎉1

seems like this is not fixed.
see https://rust.godbolt.org/z/D_z3TX

andjo403 on 23 Mar 2020

@andjo403 It's fixed in LLVM 10, while Rust is still on LLVM 9.

nikic on 23 Mar 2020

👍2

think this can be closed now see https://rust.godbolt.org/z/D_z3TX where all versions generate the same instructions

andjo403 on 11 Jul 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Comments for macro expansions

Robbepop · 3Comments

detect when "unconstrained type parameters" could be provided explicitly to a fn call

nikomatsakis · 3Comments

core::marker::Send and raw pointers

drewcrawford · 3Comments

decl_macro: imports don't work in cross-crate invocations

tikue · 3Comments

#![allow(unknown_lints)] does not work if there is a unknow lint at the same level since nightly-2017-08-11-x86_64-unknown-linux-gnu

overvenus · 3Comments