Rust: Missed optimization: extra branch with slicing and integer overflow checks

Created on 24 Feb 2019 · 13 comments · Source: rust-lang/rust

Godbolt: https://rust.godbolt.org/z/PXY1lr

Basically the issue is that you compute s.position + n, check that it doesn't overflow, and then verify that s.position + n is not less than s.position, which of course it always is, since you already checked for overflow! This is probably an LLVM optimization bug, but I'm starting here :-)

Command line arguments: -C opt-level=3 -C debug-assertions

Source:

pub struct S1<'a> {
    data: &'a [u8],
    position: usize,
}

pub fn f1<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
    // With debug assertions, `s.position + n` is overflow-checked; the range
    // index then checks start <= end (redundant after the overflow check)
    // and end <= data.len().
    let d = &s.data[s.position..s.position + n];
    s.position += n;
    return d;
}

ASM:

example::f1:
        push    rax
        mov     rax, qword ptr [rdi + 16]
        add     rsi, rax
        jb      .LBB0_4                 # overflow check on s.position + n
        mov     rdx, rsi
        sub     rdx, rax
        jb      .LBB0_5                 # redundant: end < start is impossible after the overflow check
        mov     rcx, qword ptr [rdi + 8]
        cmp     rcx, rsi
        jb      .LBB0_6                 # bounds check: panics if end > data.len()
        add     rax, qword ptr [rdi]
        mov     qword ptr [rdi + 16], rsi
        pop     rcx
        ret
.LBB0_4:
        lea     rdi, [rip + .L__unnamed_1]
        call    qword ptr [rip + _ZN4core9panicking5panic17h8f422ca74d186618E@GOTPCREL]
        ud2
.LBB0_5:
        mov     rdi, rax
        call    qword ptr [rip + _ZN4core5slice22slice_index_order_fail17h7ab7c9c6113ae4b8E@GOTPCREL]
        ud2
.LBB0_6:
        mov     rdi, rsi
        mov     rsi, rcx
        call    qword ptr [rip + _ZN4core5slice20slice_index_len_fail17h3a0f2c4c66447e54E@GOTPCREL]
        ud2

str.0:
        .ascii  "/tmp/compiler-explorer-compiler119124-62-359u5j.hrbs4/example.rs"

str.1:
        .ascii  "attempt to add with overflow"

.L__unnamed_1:
        .quad   str.1
        .quad   28
        .quad   str.0
        .quad   64
        .long   8
        .long   33
Labels: A-LLVM, E-needs-test, I-slow, WG-codegen

All 13 comments

Slightly reduced testcase:

define i64 @test(i64* %p, i64 %n) nounwind {
  %x = load i64, i64* %p
  %y = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 %n)
  %z = extractvalue { i64, i1 } %y, 1   ; overflow flag
  br i1 %z, label %bb3, label %bb1

bb1:
  %a = extractvalue { i64, i1 } %y, 0   ; %x + %n
  %b = icmp ult i64 %a, %x              ; same condition as %z, hence always false here
  br i1 %b, label %bb3, label %bb2

bb2:
  store i64 %a, i64* %p
  ret i64 0

bb3:
  ret i64 1
}

declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

LLVM does not realize that %z and %b are the same here.
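
For readers who don't speak IR, here is a Rust rendering of the same pattern (the function shape is illustrative, not taken from the issue): once checked_add has succeeded, the sum < x comparison can never be true, so an ideal optimizer would delete that branch entirely.

pub fn test(p: &mut u64, n: u64) -> u64 {
    let x = *p;
    match x.checked_add(n) {      // llvm.uadd.with.overflow
        None => 1,                // bb3: overflow
        Some(sum) => {
            if sum < x {          // %b: can never be true at this point
                1                 // bb3 again
            } else {
                *p = sum;         // bb2
                0
            }
        }
    }
}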

Ok, looks like the thing to do is to file an LLVM bug -- do you know offhand if they're ok with Rust reproducers, or do they prefer C ones?

Ok, ended up converting to C because that seemed easy: https://bugs.llvm.org/show_bug.cgi?id=40845

Submitted https://bugs.llvm.org/show_bug.cgi?id=40846 additionally for the IR pattern above (which is an issue separate from the missing add.with.overflow+sub.with.overflow combining). Just verified that fixing this does indeed eliminate the slice_index_order_fail check in this example.

@nikic am I correct in reading that this has been waiting on a review of the LLVM patch?

@alex Needs a rebase. More generally though, I think this is fighting a losing battle. uadd.with.overflow and usub.with.overflow are not considered canonical IR for a reason: They optimize badly in the middle end.

I think we'd be better off with generating the canonical variants instead (which are a + b; (a + b) < a and a - b; a < b). LLVM will still form uadd.with.overflow/usub.with.overflow during CGP, but this should optimize better in the middle end (including avoiding the issue here).
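
For reference, a minimal Rust sketch of those canonical forms (the helper names are mine, purely illustrative):

fn checked_add_canonical(a: usize, b: usize) -> Option<usize> {
    let sum = a.wrapping_add(b);                        // a + b
    if sum < a { None } else { Some(sum) }              // (a + b) < a
}

fn checked_sub_canonical(a: usize, b: usize) -> Option<usize> {
    if a < b { None } else { Some(a.wrapping_sub(b)) }  // a < b
}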

As we already have a good benchmark for the overhead of overflow checks in #58475, I think it might be worth trying to rerun that without using the unsigned overflow intrinsics.

From the second LLVM bug above:

Fixed in https://github.com/llvm/llvm-project/commit/8db5143b1a1521915c842ebef23cb9fe8fe607ce.

I'm keeping this issue open, because the negated variation of this is not handled yet.

🎉

🎉 once this makes its way into Rust's LLVM (is there a good way to track this?) I'll verify that the assembly in this bug was improved, and if so I'll rebase the rustc patch and we can get some new numbers!

Ok, confirmed that the recent LLVM 10 update removes this branch: https://rust.godbolt.org/z/B38vDX

Shall this be closed?

Actually, that was probably premature. We should add a codegen test.

@Mark-Simulacrum is there a good example of what a test looks like? I can spin one up pretty easily if someone can point me at how to add one.

@alex Codegen tests are located in src/test/codegen. One test that is thematically related: slice-position-bounds-check.rs. They use LLVM FileCheck to verify generated LLVM IR.

Codegen tests can be run with:

./x.py test --stage 1 src/test/codegen
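
For the record, a minimal sketch of what such a test might look like, assuming the usual compile-flags/FileCheck conventions of that directory (the file name, function name, and CHECK lines are illustrative, not the test that actually landed):

// compile-flags: -C opt-level=3 -C debug-assertions

#![crate_type = "lib"]

pub struct S1<'a> {
    data: &'a [u8],
    position: usize,
}

// CHECK-LABEL: @advance
// CHECK-NOT: slice_index_order_fail
#[no_mangle]
pub fn advance<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
    let d = &s.data[s.position..s.position + n];
    s.position += n;
    d
}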

Thanks much, that's exactly what I needed! PR is up now.
