Rust: Missed optimization: extra branch with slicing and integer overflow checks

Created on 24 Feb 2019 · 13 comments · Source: rust-lang/rust

Godbolt: https://rust.godbolt.org/z/PXY1lr

Basically the issue is that you compute s.position + n, check that it doesn't overflow, and then verify that s.position + n is not less than s.position, which of course it always is, since you already checked for overflow! This is probably an LLVM optimization bug, but I'm starting here :-)

Command line arguments: -C opt-level=3 -C debug-assertions

Source:

pub struct S1<'a> {
    data: &'a [u8],
    position: usize,
}

pub fn f1<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
    // With debug assertions, `s.position + n` is overflow-checked; the range
    // index then checks start <= end (redundant after the overflow check)
    // and end <= data.len().
    let d = &s.data[s.position..s.position + n];
    s.position += n;
    return d;
}

ASM:

example::f1:
        push    rax
        mov     rax, qword ptr [rdi + 16]
        add     rsi, rax
        jb      .LBB0_4                 # overflow check on s.position + n
        mov     rdx, rsi
        sub     rdx, rax
        jb      .LBB0_5                 # redundant: end < start is impossible after the overflow check
        mov     rcx, qword ptr [rdi + 8]
        cmp     rcx, rsi
        jb      .LBB0_6                 # bounds check: panics if end > data.len()
        add     rax, qword ptr [rdi]
        mov     qword ptr [rdi + 16], rsi
        pop     rcx
        ret
.LBB0_4:
        lea     rdi, [rip + .L__unnamed_1]
        call    qword ptr [rip + _ZN4core9panicking5panic17h8f422ca74d186618E@GOTPCREL]
        ud2
.LBB0_5:
        mov     rdi, rax
        call    qword ptr [rip + _ZN4core5slice22slice_index_order_fail17h7ab7c9c6113ae4b8E@GOTPCREL]
        ud2
.LBB0_6:
        mov     rdi, rsi
        mov     rsi, rcx
        call    qword ptr [rip + _ZN4core5slice20slice_index_len_fail17h3a0f2c4c66447e54E@GOTPCREL]
        ud2

str.0:
        .ascii  "/tmp/compiler-explorer-compiler119124-62-359u5j.hrbs4/example.rs"

str.1:
        .ascii  "attempt to add with overflow"

.L__unnamed_1:
        .quad   str.1
        .quad   28
        .quad   str.0
        .quad   64
        .long   8
        .long   33
Labels: A-LLVM, E-needs-test, I-slow, WG-codegen

All 13 comments

Slightly reduced testcase:

define i64 @test(i64* %p, i64 %n) nounwind {
  %x = load i64, i64* %p
  %y = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 %n)
  %z = extractvalue { i64, i1 } %y, 1   ; overflow flag
  br i1 %z, label %bb3, label %bb1

bb1:
  %a = extractvalue { i64, i1 } %y, 0   ; %x + %n
  %b = icmp ult i64 %a, %x              ; same condition as %z, hence always false here
  br i1 %b, label %bb3, label %bb2

bb2:
  store i64 %a, i64* %p
  ret i64 0

bb3:
  ret i64 1
}

declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

LLVM does not realize that %z and %b are the same here.
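
For readers who don't speak IR, here is a Rust rendering of the same pattern (the function shape is illustrative, not taken from the issue): once checked_add has succeeded, the sum < x comparison can never be true, so an ideal optimizer would delete that branch entirely.

pub fn test(p: &mut u64, n: u64) -> u64 {
    let x = *p;
    match x.checked_add(n) {      // llvm.uadd.with.overflow
        None => 1,                // bb3: overflow
        Some(sum) => {
            if sum < x {          // %b: can never be true at this point
                1                 // bb3 again
            } else {
                *p = sum;         // bb2
                0
            }
        }
    }
}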

Ok, looks like the thing to do is to file an LLVM bug -- do you know offhand if they're ok with Rust reproducers, or do they prefer C ones?

Ok, ended up converting to C because that seemed easy: https://bugs.llvm.org/show_bug.cgi?id=40845

Submitted https://bugs.llvm.org/show_bug.cgi?id=40846 additionally for the IR pattern above (which is an issue separate from the missing add.with.overflow+sub.with.overflow combining). Just verified that fixing this does indeed eliminate the slice_index_order_fail check in this example.

@nikic am I correct in reading that this has been waiting on a review of the LLVM patch?

@alex Needs a rebase. More generally though, I think this is fighting a losing battle. uadd.with.overflow and usub.with.overflow are not considered canonical IR for a reason: They optimize badly in the middle end.

I think we'd be better off with generating the canonical variants instead (which are a + b; (a + b) < a and a - b; a < b). LLVM will still form uadd.with.overflow/usub.with.overflow during CGP, but this should optimize better in the middle end (including avoiding the issue here).
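
For reference, a minimal Rust sketch of those canonical forms (the helper names are mine, purely illustrative):

fn checked_add_canonical(a: usize, b: usize) -> Option<usize> {
    let sum = a.wrapping_add(b);                        // a + b
    if sum < a { None } else { Some(sum) }              // (a + b) < a
}

fn checked_sub_canonical(a: usize, b: usize) -> Option<usize> {
    if a < b { None } else { Some(a.wrapping_sub(b)) }  // a < b
}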

As we already have a good benchmark for the overhead of overflow checks in #58475, I think it might be worth trying to rerun that without using the unsigned overflow intrinsics.

From the second LLVM bug above:

Fixed in https://github.com/llvm/llvm-project/commit/8db5143b1a1521915c842ebef23cb9fe8fe607ce.

I'm keeping this issue open, because the negated variation of this is not handled yet.

🎉

🎉 once this makes its way into Rust's LLVM (is there a good way to track this?) I'll verify that the assembly in this bug was improved, and if so I'll rebase the rustc patch and we can get some new numbers!

Ok, confirmed that the recent LLVM 10 update removes this branch: https://rust.godbolt.org/z/B38vDX

Shall this be closed?

Actually, that was probably premature. We should add a codegen test.

@Mark-Simulacrum is there a good example of what a test looks like? I can spin one up pretty easily if someone can point me at how to add one.

@alex Codegen tests are located in src/test/codegen. One test that is thematically related: slice-position-bounds-check.rs. They use LLVM FileCheck to verify generated LLVM IR.

Codegen tests can be run with:

./x.py test --stage 1 src/test/codegen
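
For the record, a minimal sketch of what such a test might look like, assuming the usual compile-flags/FileCheck conventions of that directory (the file name, function name, and CHECK lines are illustrative, not the test that actually landed):

// compile-flags: -C opt-level=3 -C debug-assertions

#![crate_type = "lib"]

pub struct S1<'a> {
    data: &'a [u8],
    position: usize,
}

// CHECK-LABEL: @advance
// CHECK-NOT: slice_index_order_fail
#[no_mangle]
pub fn advance<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
    let d = &s.data[s.position..s.position + n];
    s.position += n;
    d
}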

Thanks much, that's exactly what I needed! PR is up now.
