Rust: regression: unit test `coretest::num::test_i32f64` fails on some targets

Created on 16 Sep 2016  路  10Comments  路  Source: rust-lang/rust

Found using smoke.

Affected targets:

  • arm-unknown-linux-gnueabi
  • mips-unknown-linux-musl
  • mipsel-unknown-linux-musl

STR

# NOTE assumes transparent QEMU emulation

$ rustup component add rust-src

$ rustc \
    --crate-name coretest \
    --target $TARGET \
    --test \
    $(rustc --print sysroot)/lib/rustlib/src/rust/src/libcoretest/lib.rs

$ ./coretest

Output

$ ./coretest

(..)

failures:

---- num::test_i32f64 stdout ----

    thread 'num::test_i32f64' panicked at 'assertion failed: `(left == right)` (left: `-536870912`, right: `-2147483648`)', /home/travis/.multirust/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcoretest/num/mod.rs:162

failures:

    num::test_i32f64

test result: FAILED. 591 passed; 1 failed; 2 ignored; 0 measured

Comments

  • This test passes if the crate is compiled with optimizations (-O) enabled.
  • This test passes if compiled with the nightly from 2016-09-13.
  • Given that this test passed two days ago and the list of affected targets, IMO the prime suspect
    is the recent change related to compiler-rt intrinsics: #35021. Because this test is likely to
    involve some compiler-rt intrinsic for the affected targets (no FPU).
  • Disclaimer: This test was executed under QEMU. There is a chance this is actually a QEMU
    bug/failure.

    Meta

$ rustc -V
rustc 1.13.0-nightly (6ffdda1ba 2016-09-14)

cc @alexcrichton @brson

A-codegen P-high T-compiler T-libs regression-from-stable-to-beta

Most helpful comment

After spending much time to establish a test environment, I can reproduce the issue and have found at least one bug in the compiler-rt software floating point routines.

In particular, this line:

https://github.com/rust-lang/compiler-rt/blob/master/lib/floatsidf.c#L35

to save you the trouble of following the link, its this:

    if (a < 0) {
        sign = signBit;
        a = -a;
    }

That does not properly handle the case where a is already the minimum (negative) value.

It needs to instead start by promoting a to the wider type before doing things like negation on it.

  • Update: see comments below. In particular, effort has been expended to handle this case, but I still think I'm seeing evidence of something being mishandled down the line.

(Incidentally, I found this bug relatively quickly because I decided to take the C code, once I found it, and port it to Rust before doing things like instrumenting it to inspect the intermediate results. And of course (ta da!) our arithmetic overflow checking catches this case.)

All 10 comments

Yay smoke!

Float ABI mismatch maybe? We recently switched MIPS to a different float ABI in rustc side, perhaps compiler-rt simply didn鈥檛 follow the suite?

triage: P-high

More details:

  • This test is about converting an i32 to f64. Targets without FPU lower this operation to the
    aeabi_i2d intrinsic on ARM or to the floatsidf one on other architectures. (Both intrinsics
    have the exact same C implementation, they just have different names on different architectures)
  • Before #35021 was merged, this test was using the libgcc_s.so implementation of this intrinsic
    (at least on the affected targets).
  • After #35024 was merged, this test started using the rust-lang/compiler-rt implementation of
    this intrinsic.

I'm inclined to think that the compiler-rt implementation of this particular intrinsic has some
bug.

@nagisa

Float ABI mismatch maybe?

It's possible. We may not be compiling compiler-rt with the right gcc flags or something like
that. But I would expect way more test failures if the cause of the problem was ABI mismatch. This
was the only test failure from running the unit tests of all the standard crates (see this list).
For example, the arm-unknown-linux-gnueabi relies on a bunch of intrinsics (both float and
non-float related) but only failed on this test and not on the other tests that also involve a
integer -> float conversion.

After spending much time to establish a test environment, I can reproduce the issue and have found at least one bug in the compiler-rt software floating point routines.

In particular, this line:

https://github.com/rust-lang/compiler-rt/blob/master/lib/floatsidf.c#L35

to save you the trouble of following the link, its this:

    if (a < 0) {
        sign = signBit;
        a = -a;
    }

That does not properly handle the case where a is already the minimum (negative) value.

It needs to instead start by promoting a to the wider type before doing things like negation on it.

  • Update: see comments below. In particular, effort has been expended to handle this case, but I still think I'm seeing evidence of something being mishandled down the line.

(Incidentally, I found this bug relatively quickly because I decided to take the C code, once I found it, and port it to Rust before doing things like instrumenting it to inspect the intermediate results. And of course (ta da!) our arithmetic overflow checking catches this case.)

our arithmetic overflow checking catches this case

Bravo! Overflow checks pay off. :clap:

and port it to Rust

If you still have your port around, we'd be happy to integrate it into rustc-builtins, where we are porting compiler-rt intrinsics to Rust. We'll integrate rustc-builtins into rust-lang/rust at some point the future (hopefully, soon).

@japaric yeah I'm still actively working on the port.

(That negation issue I raised above is only the first arithmetic overflow that was detected along the control flow; I'm now dissecting a second overflow to determine if it is a mistake in the port or another bug in the original code.)

(further update: comments in code further down indicate that they are aware of potential for INT_MIN to leak in, which means I was wrong up above; but still, seems like something is wrong with this code.)

Oh is negating INT_MIN is undefined behavior in C (for two's complement representation)? ... so even if the code down below tries to compensate for it, it is still a big bug to have that line there?

Seems like if I revise the code to convert the parameter to an unsigned int at the outset and use ~a + 1 for negation when necessary (the portable way to negate that "works" on INT_MIN), then the problem goes away.

Was this page helpful?
0 / 5 - 0 ratings