Rust: powerpc-unknown-linux-gnu fails to compile simple crates

Created on 22 May 2018 · 115 Comments · Source: rust-lang/rust

Example:

Compiling rand v0.4.2
Expected no forward declarations!
!262 = <temporary!> !{}
scope points into the type hierarchy
!327 = !DILocation(line: 1, scope: !255)
scope points into the type hierarchy
!329 = distinct !DILexicalBlock(scope: !255, file: !256, line: 388, column: 8)
scope points into the type hierarchy
!346 = !DILocation(line: 388, scope: !255)
scope points into the type hierarchy
!351 = !DILocation(line: 404, scope: !255)
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `rand`.
  nightly-powerpc-unknown-linux-gnu unchanged - rustc 1.28.0-nightly (cb20f68d0 2018-05-21)
  beta-powerpc-unknown-linux-gnu installed - rustc 1.27.0-beta.5 (84b5a46f8 2018-05-15)

Additionally, I cannot install stable on powerpc and powerpc64le because rust-doc is not available.

A-LLVM C-bug O-PowerPC P-medium T-compiler regression-from-stable-to-stable

All 115 comments

@lu-zero think you can possibly bisect to find the source of the problem? The cargo-bisect-rustc tool may help:

https://github.com/rust-lang-nursery/cargo-bisect-rustc

rustc-1.24.0 seems working. Now I can build cargo-bisect-rustc and try.

If in the meantime you could unbreak 1.26.0, that would be great btw :)

I used cargo-bisect-rustc as testcase.

nightly-2018-01-13 managed to build most of its dependencies and fail at num_cpus.

Expected no forward declarations!
!382 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `num_cpus`.

rustc-1.25.0 seems to work fine as well.

I believe we're waiting on a bisection here to a nightly range or a specific PR. Given that we're T-3 weeks from release, we'll want that bisection soon.

I gave @nikomatsakis access to the instance since I was unable to get a result less coarse than what I posted before.

Now rust-1.26.1 installs and fails as well:

Expected no forward declarations!
!335 = <temporary!> !{}
scope points into the type hierarchy
!340 = !DILocation(line: 1, scope: !332)
scope points into the type hierarchy
!345 = distinct !DILexicalBlock(scope: !332, file: !24, line: 1128, column: 8)
scope points into the type hierarchy
!406 = !DILocation(line: 1128, scope: !332)
scope points into the type hierarchy
!412 = !DILocation(line: 1191, scope: !332)
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `version_check`.

I'm assigning @nikomatsakis and nominating for compiler team. 1.26.1 did not contain any fixes for this issue so it's unsurprising that it continues to not work.

To be clear, I do have access to the machine, but I also have very little time at the moment to really take advantage of that and do investigation =) (powerpc is not a Tier 1 platform, so it's not clear how much time to allocate here.) That said, I would like it to work!

@lu-zero

nightly-2018-01-13 managed to build most of its dependencies and fail at num_cpus.

are you saying that the nightly builds prior to this worked great?

No, no nightly picked up by the bisect program worked. All failed in different ways.

@lu-zero could you share access to your instance with me as well? I want to try to help with the bisection if I can.

Sure

manually bisecting since that's how I roll.

Did a manual trackback from the stable version 1.25 to an approximate nightly that was the basis of that stable.

rustup default nightly-2018-02-11 works. (Version info: "rustc 1.25.0-nightly (45fba43b3 2018-02-10)", for those that prefer that.)

So now that I have that starting point, I'll bisect.

Bisecting indicates that nightly-2018-02-25 may be the injection point.

I've been using a fresh cargo project that just adds rand = "0.4.2" as a dependency.

pnkfelix@videolan-ubuntu-be1:~/Mozilla/issue-50960$ cat Cargo.toml
[package]
name = "issue-50960"
version = "0.1.0"
authors = ["pnkfelix"]

[dependencies]
rand = "0.4.2"

pnkfelix@videolan-ubuntu-be1:~/Mozilla/issue-50960$ rustup default nightly-2018-02-24 && cargo clean && cargo build
info: using existing install for 'nightly-2018-02-24-powerpc-unknown-linux-gnu'
info: default toolchain set to 'nightly-2018-02-24-powerpc-unknown-linux-gnu'

  nightly-2018-02-24-powerpc-unknown-linux-gnu unchanged - rustc 1.26.0-nightly (063deba92 2018-02-23)

   Compiling libc v0.2.41
   Compiling rand v0.4.2
   Compiling issue-50960 v0.1.0 (file:///home/pnkfelix/Mozilla/issue-50960)
    Finished dev [unoptimized + debuginfo] target(s) in 6.6 secs
pnkfelix@videolan-ubuntu-be1:~/Mozilla/issue-50960$ rustup default nightly-2018-02-25 && cargo clean && cargo build
info: using existing install for 'nightly-2018-02-25-powerpc-unknown-linux-gnu'
info: default toolchain set to 'nightly-2018-02-25-powerpc-unknown-linux-gnu'

  nightly-2018-02-25-powerpc-unknown-linux-gnu unchanged - rustc 1.26.0-nightly (28a1e4ffe 2018-02-24)

   Compiling libc v0.2.41
   Compiling rand v0.4.2
Expected no forward declarations!
!189 = <temporary!> !{}
scope points into the type hierarchy
!191 = !DILocation(line: 1, scope: !186)
scope points into the type hierarchy
!193 = !DILocation(line: 464, scope: !186)
scope points into the type hierarchy
!194 = !DILocation(line: 465, scope: !186)
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `rand`.

To learn more, run the command again with --verbose.

My usual next step is to bisect in the range between the two commits given above (which will require manually building because our per-commit storage does not keep artifacts from so long ago I believe).

There are 93 commits in the range 063deba92~..28a1e4ffe (inclusive), but only four bors commits:

fklock-Oenone:rust.git fklock$ git log --format=oneline --author=bors 063deba92~..28a1e4ffe
28a1e4ffefa2620ad9f4179ea339833448874fd3 Auto merge of #48510 - Manishearth:rollup, r=Manishearth
6070d3e47e5e9f15575a3bd33583358b52bc6eda Auto merge of #48476 - Manishearth:rollup, r=Manishearth
b0a8620ed639d5085d7e1cca3626681a6e4e328e Auto merge of #48487 - Mark-Simulacrum:appveyor-split, r=Mark-Simulacrum
063deba92e44809125a433ca6e6c1ad0993313bf Auto merge of #47799 - topecongiro:fix-span-of-visibility, r=petrochenkov

Manually skimming over the titles of the rollups did not reveal any smoking guns to me, unfortunately. (I am still planning to move forward with bisection over the commit series.)

Even more unfortunately, I seem to be having problems actually doing a build of rustc on the machine to which @lu-zero granted me access. I can't even figure out what exactly is going wrong: at first I was running out of space, because I had a bunch of artifacts from the above bisection lying around in ~/.rustup, but now I should have plenty of space, yet x.py is failing very early in the build invocation:

running: /home/pnkfelix/Mozilla/rust.git/build/powerpc64-unknown-linux-gnu/stage0/bin/cargo build --manifest-path /home/pnkfelix/Mozilla/rust.git/src/bootstrap/Cargo.toml --verbose
Traceback (most recent call last):
  File "./x.py", line 20, in <module>
    bootstrap.main()
  File "/home/pnkfelix/Mozilla/rust.git/src/bootstrap/bootstrap.py", line 763, in main
    bootstrap()
  File "/home/pnkfelix/Mozilla/rust.git/src/bootstrap/bootstrap.py", line 743, in bootstrap
    build.build_bootstrap()
  File "/home/pnkfelix/Mozilla/rust.git/src/bootstrap/bootstrap.py", line 621, in build_bootstrap
    run(args, env=env, verbose=self.verbose)
  File "/home/pnkfelix/Mozilla/rust.git/src/bootstrap/bootstrap.py", line 143, in run
    ret = subprocess.Popen(args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I asked for more space, hopefully it will be available soon

@lu-zero wait, is your theory that I did run out of space?

Oh, maybe I ran out of space in the tmpfs ... (no, I think I misread the df output; 0% is good, not bad...)

I built many copies of rustc while developing the first AltiVec support on the equivalent LE machine, and I had to ask for more space as well. I have not checked on the BE host yet, though.

I have observed this particular issue on Debian powerpc as well but I haven't found time yet to bisect.

If anyone needs access to a very fast powerpc porterbox, let me know. I can create accounts for anyone interested working on this issue.

The system now has plenty of space btw :)

Ping @pnkfelix, we're getting close to the 1.27.0 release.

Oh, yeah. That should definitely slip into the 1.27.0 release.

Edit: Never mind, confused this with a different issue. I assume this one will take a bit longer.

After explicitly setting build = "powerpc-unknown-linux-gnu" in the config.toml for my rust build, I have managed to make a local build of rust.

But I haven't had a chance to look into more fine-grained bisection of the problem yet (or even verify that I can replicate the problem via my local build).

Okay, in a local build, I actually see very similar messages even when "just" compiling libc. Investigating.

(Yikes, the messages are coming from a rustc build based off of 063deba92e44809125a433ca6e6c1ad0993313bf which I had previously believed to be the "reliable baseline" to compare against...)

Ping @pnkfelix, 1.27.0 will be released in a bit more than a week.

I have tried bisecting the problem but, due to the strong non-linearity of the git history, I haven't managed to find the offending commit yet.

Anyone got any tips on how to bisect the issue?

@glaubitz you should only bisect merge commits by @bors
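A minimal sketch of that advice, assuming the candidate list comes from something like `git log --format='%H %an'` (one "hash author" pair per line). The helper name `bors_commits` is made up for illustration; it only filters text and does not run git. The two sample lines use hashes and authors that appear elsewhere in this thread.

```rust
/// Keeps only the commit hashes whose author is `bors`.
/// Each input line is expected to look like "<hash> <author name>",
/// e.g. the output of `git log --format='%H %an'` (an assumption of this sketch).
fn bors_commits(log: &str) -> Vec<&str> {
    log.lines()
        // Split each line into (hash, author); lines without a space are skipped.
        .filter_map(|line| line.trim().split_once(' '))
        .filter(|(_, author)| author.trim() == "bors")
        .map(|(hash, _)| hash)
        .collect()
}

fn main() {
    let log = "\
28a1e4ffefa2620ad9f4179ea339833448874fd3 bors
beb9a0dfc52ebda4f8db4e5d439e08e4f3a43a39 Brian Anderson
063deba92e44809125a433ca6e6c1ad0993313bf bors";

    // Only the two bors merges remain as bisection candidates.
    for hash in bors_commits(log) {
        println!("{}", hash);
    }
}
```

Restricting the search to these merges matters because intermediate commits inside a merged branch are not guaranteed to build on their own.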

@pietroalbini Thanks for the hint, I'll give it a try.

I just tried bisecting using bisect-rust but that didn't work:

buildd@kapitsa:/srv/tmp/rust$ bisect --end 84203cac67e65ca8640b8392348411098c856985 --start d3ae9a9e08edf12de0ed82af57ba2a56c26496ea --test /srv/tmp/test.sh
Error: Expected author Brian Anderson to be bors for beb9a0dfc52ebda4f8db4e5d439e08e4f3a43a39
buildd@kapitsa:/srv/tmp/rust$

This won't make it into 1.27.

@pietroalbini Did you manage to bisect the problem? I'm still struggling to find the merge conflict which broke it. I'm manually bisecting now, using only bors commits. But it takes forever.

@glaubitz I'm currently busy with exams, and I don't have access to a powerpc machine. I'll try working something out in the next few weeks.

@pietroalbini Ok. I was just wondering whether you have an idea on how to tackle the bisecting. And I have a fast PowerPC (POWER7) machine in case you need one.

triage: P-medium

Discussed in the @rust-lang/compiler triage meeting. Obviously there hasn't been any progress since last week. @Mark-Simulacrum is going to give some directions for how to speed up the bisection. In the meantime, we're going to lower the priority here to P-medium since powerpc is not a fully supported architecture.

@glaubitz You can use https://crates.io/crates/rustup-toolchain-install-master (probably cross compiling it, I guess) to install rustc versions. You might be able to use https://github.com/rust-lang-nursery/cargo-bisect-rustc as well.

Let me know if I can help out more.

@Mark-Simulacrum Yes, I am cross-building rustc on amd64 before copying it over to my POWER machine to test-build it. However, that process is still very time-consuming and since bisecting doesn't work at all, the manual search has been going on for days. I still haven't managed to find a commit after the 1.24.0 tag which works :(.

I wonder if https://github.com/rust-lang/rust/pull/48782 is the culprit :-/

I wonder if #48782 is the culprit :-/

I just tried 2789b067da2ac921b86199bde21dd231ace1da39 which is the commit before that and the problem is still there.

It must be between 1.24 and 1.25 as it broke with 1.25.

@pnkfelix

I have tested these four commits that you mentioned:

28a1e4f Auto merge of #48510 - Manishearth:rollup, r=Manishearth
6070d3e Auto merge of #48476 - Manishearth:rollup, r=Manishearth
b0a8620 Auto merge of #48487 - Mark-Simulacrum:appveyor-split, r=Mark-Simulacrum
063deba Auto merge of #47799 - topecongiro:fix-span-of-visibility, r=petrochenkov

They all build fine on my Debian PowerPC box with Rust 1.24.1.

And 45fba43b3d5b4d1944268cf973099bfacb11bf4c is definitely broken for me.

Edit: My test was incorrect, 45fba43b3d5b4d1944268cf973099bfacb11bf4c works. More digging.

Correction, that commit is broken. I made the mistake of not testing the stage2 compiler. Getting closer now.

I have downloaded nightly builds now to perform manual bisecting.

Here's my result:

2017-12-20 - bad
2017-01-02 - bad
2017-12-01 - good
2017-12-10 - bad
2017-12-05 - good
2017-12-07 - good
2017-12-09 - bad - ad3543db3
2017-12-08 - good - c8ddf2852

So, the bug was introduced between c8ddf2852 and ad3543db3. Bisecting now.
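The narrowing step above can be sketched as follows: given a set of dated good/bad observations, take the latest good nightly and the earliest bad nightly after it. This is only an illustration; the `Observation` type and `regression_window` function are hypothetical names, and the sample data is limited to the December 2017 entries from the list.

```rust
/// One manual test of a nightly: ISO date plus whether the build worked.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Observation<'a> {
    date: &'a str, // "YYYY-MM-DD"; ISO dates order correctly as plain strings
    good: bool,
}

/// Returns (latest good date, earliest bad date after it), or `None`
/// if there is no good observation or no later bad one.
fn regression_window<'a>(obs: &[Observation<'a>]) -> Option<(&'a str, &'a str)> {
    let last_good = obs.iter().filter(|o| o.good).map(|o| o.date).max()?;
    let first_bad = obs
        .iter()
        .filter(|o| !o.good && o.date > last_good)
        .map(|o| o.date)
        .min()?;
    Some((last_good, first_bad))
}

fn main() {
    let obs = [
        Observation { date: "2017-12-20", good: false },
        Observation { date: "2017-12-01", good: true },
        Observation { date: "2017-12-10", good: false },
        Observation { date: "2017-12-05", good: true },
        Observation { date: "2017-12-07", good: true },
        Observation { date: "2017-12-09", good: false },
        Observation { date: "2017-12-08", good: true },
    ];
    if let Some((good, bad)) = regression_window(&obs) {
        println!("regression window: {} (good) .. {} (bad)", good, bad);
    }
}
```

The sketch assumes a single regression point; a result set where a good nightly follows a bad one would need a closer look by hand.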

Ok, I found the culprit. The last good commit is: 5f4b09ee480aab38e466700563e2a6276f9a73e7.

After that, we have this series:

commit 539e1717728f7a5ed0b5ed9bad4ab7260117e600 (HEAD)
Author: Michael Woerister <michaelwoerister@posteo>
Date:   Fri Dec 8 10:17:17 2017 +0100

    incr.comp.: Fix merge fallout.

commit f5bd1ca6786dd2e375353b5f031f77eb21727efb
Author: Michael Woerister <michaelwoerister@posteo>
Date:   Thu Dec 7 12:31:40 2017 +0100

    incr.comp.: Make Span decoding more consistent so it doesn't mess up -Zincremental-verify-ich

commit 1c0e611dff9090b06f69a177721e01d24edc82b7
Author: Michael Woerister <michaelwoerister@posteo>
Date:   Thu Dec 7 12:29:53 2017 +0100

    Remove some svh-tests from run-pass.

    These were already broken for debug builds.

commit c5dd9f5301d3eb1ff09c97ea611c7bee41971f78
Author: Michael Woerister <michaelwoerister@posteo>
Date:   Mon Dec 4 12:47:16 2017 +0100

    incr.comp.: Hash spans unconditionally for full accuracy.

commit 829a349739cd8798db007fcad223752f17b86613
Author: Michael Woerister <michaelwoerister@posteo>
Date:   Mon Dec 4 20:08:25 2017 +0100

    incr.comp: Cache results of more queries.

The first four (in chronological order) commits don't build at all, unfortunately. The last commit, 539e1717728f7a5ed0b5ed9bad4ab7260117e600, builds and shows the regression.

So, the issue was introduced with this changeset.

CC @michaelwoerister

Interesting. Some questions:

  • Can you bootstrap the compiler on that platform? Or is it cross-compiled from somewhere else?
  • Is this with incremental compilation enabled?
  • Does it occur on the very first build of a crate or only when re-compiling it?
  • Are you testing with a compiler that has debug-assertions enabled?
  • Is it always crashing in LLVM? Or sometimes earlier in the compiler?
  • Does it crash when compiling without debuginfo?

Can you bootstrap the compiler on that platform? Or is it cross-compiled from somewhere else?

I can bootstrap it natively if I use a working stage0 compiler that is older than the 2017-12-09 nightly; nightlies from 2017-12-09 onward are broken. For my tests, I was cross-compiling from x86_64 and then testing natively on powerpc.

Is this with incremental compilation enabled?

I'm using the default setting. I tried disabling incremental compilation with CARGO_INCREMENTAL=0 but that didn't have any effect. Is there any other way to disable incremental compilation? FWIW, on sparc64, we patched cargo to never use incremental compilation as it suffers from #49773 there.

Does it occur on the very first build of a crate or only when re-compiling it?

It occurs on the first build when trying to build rustc with the cross-compiled rustc, for example.

Are you testing with a compiler that has debug-assertions enabled?

Not at the moment. I initially downloaded the snapshots from https://static.rust-lang.org/dist/index.html to find which nightly introduced the regression and it was 2017-12-09 which showed the bug first and no longer allowed building the compiler natively using the downloaded binaries.

Then I just cross-built various commits from x86_64 with just --host=powerpc-unknown-linux-gnu, no other options provided. I partially did the bisecting manually, because the commits above were unbuildable and had to be skipped.

Is it always crashing in LLVM? Or sometimes earlier in the compiler?

The error always looks like this:

buildd@kapitsa:/srv/debian/rust$ rm -rf build/* && ./configure --host=powerpc-unknown-linux-gnu --build=powerpc-unknown-linux-gnu --target=powerpc-unknown-linux-gnu --enable-local-rust --local-rust-root=/srv/debian/stage2/ && ./x.py build
configure: processing command line
configure: 
configure: build.rustc          := /usr/bin/rustc
configure: build.cargo          := /usr/bin/cargo
configure: build.host           := ['powerpc-unknown-linux-gnu']
configure: build.build          := powerpc-unknown-linux-gnu
configure: build.rustc          := /srv/debian/stage2//bin/rustc
configure: build.cargo          := /srv/debian/stage2//bin/cargo
configure: build.target         := ['powerpc-unknown-linux-gnu']
configure: build.configure-args := ['--host=powerpc-unknown-linux-gnu', '--build= ...
configure: 
configure: writing `config.toml` in current directory
configure: 
configure: run `python /srv/debian/rust/x.py --help`
configure: 
Updating submodules
   Compiling itoa v0.3.4
   Compiling serde v1.0.27
   Compiling cc v1.0.4
   Compiling libc v0.2.39
   Compiling getopts v0.2.15
   Compiling quote v0.3.15
   Compiling lazy_static v0.2.11
   Compiling cfg-if v0.1.2
   Compiling dtoa v0.4.2
   Compiling unicode-xid v0.0.4
   Compiling num-traits v0.1.41
   Compiling synom v0.11.3
Expected no forward declarations!
!46 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `quote`.
warning: build failed, waiting for other jobs to finish...
Expected no forward declarations!
!14 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `synom`.
warning: build failed, waiting for other jobs to finish...
Expected no forward declarations!
!52 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `getopts`.
warning: build failed, waiting for other jobs to finish...
Expected no forward declarations!
!23 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `num-traits`.
warning: build failed, waiting for other jobs to finish...
Expected no forward declarations!
!57 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `cc`.
warning: build failed, waiting for other jobs to finish...
Expected no forward declarations!
!62 = <temporary!> !{}
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `serde`.

To learn more, run the command again with --verbose.
failed to run: /srv/debian/stage2//bin/cargo build --manifest-path /srv/debian/rust/src/bootstrap/Cargo.toml
Build completed unsuccessfully in 0:00:18
buildd@kapitsa:/srv/debian/rust$

Does it crash when compiling without debuginfo?

Haven't tried that yet. Do you mean that the cross-compiled compiler should be built with debuginfo or should I enable/disable debuginfo when testing the cross-built compiler?

OK, it looks like the error occurs during a bootstrap, which by default should:

  • not use incremental compilation, and
  • not generate debuginfo.

Building the compiler under test with both LLVM assertions and debug-assertions would be a good next step. You can do that by passing --enable-debug-assertions and --enable-llvm-assertions to configure. Adding --enable-debuginfo-lines is also a good idea. The debuginfo-lines option will give you more readable backtraces when running under RUST_BACKTRACE=1.

Haven't tried that yet. Do you mean that the cross-compiled compiler should be built with debuginfo or should I enable/disable debuginfo when testing the cross-built compiler?

Building the former with debuginfo-lines = true is always a good idea but what I meant was the latter: Test whether the compiler crashes if it produces a program that does not contain debuginfo. It looks like it's already disabled though.

I think the only commit from that range that could influence non-incremental builds is https://github.com/rust-lang/rust/commit/f5bd1ca6786dd2e375353b5f031f77eb21727efb. Maybe we are not getting endianness right during metadata encoding/decoding. It's still weird that that should result in the observed LLVM errors though.

Have there been LLVM upgrades around the same time?

Building the compiler under test with both LLVM assertions and debug-assertions would be a good next step. You can do that by passing --enable-debug-assertions and --enable-llvm-assertions to configure. Adding --enable-debuginfo-lines is also a good idea. The debuginfo-lines option will give you more readable backtraces when running under RUST_BACKTRACE=1.

I will do this now.

Building the former with debuginfo-lines = true is always a good idea but what I meant was the latter: Test whether the compiler crashes if it produces a program that does not contain debuginfo. It looks like it's already disabled though.

Ok.

Have there been LLVM upgrades around the same time?

I think the upgrade to LLVM 6 happened later.

Here's some more debugging output. Looks like a failure in LLVM:

For more information on this warning you can consult
https://github.com/rust-lang/cargo/issues/5330
   Compiling unicode-xid v0.1.0
   Compiling itoa v0.4.1
   Compiling serde v1.0.40
   Compiling cfg-if v0.1.2
   Compiling dtoa v0.4.2
   Compiling ordermap v0.3.5
   Compiling cc v1.0.10
   Compiling libc v0.2.40
   Compiling num-traits v0.2.2
   Compiling fixedbitset v0.1.9
   Compiling lazy_static v0.2.11
   Compiling build_helper v0.1.0 (file:///srv/debian/rust/src/build_helper)
   Compiling getopts v0.2.17
   Compiling proc-macro2 v0.3.6
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `fixedbitset`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `build_helper`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `getopts`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `ordermap`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `cc`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `libc`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `proc-macro2`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `num-traits`.
warning: build failed, waiting for other jobs to finish...
rustc: /local_scratch/glaubitz/rust/rust/src/llvm/lib/IR/Metadata.cpp:635: void llvm::MDNode::resolveCycles(): Assertion `!N->isTemporary() && "Expected all forward declarations to be resolved"' failed.
error: Could not compile `serde`.

To learn more, run the command again with --verbose.
failed to run: /srv/debian/stage2//bin/cargo build --manifest-path /srv/debian/rust/src/bootstrap/Cargo.toml
Build completed unsuccessfully in 0:00:21

Yes, that's an error in LLVM.

If you revert the changes from https://github.com/rust-lang/rust/commit/f5bd1ca6786dd2e375353b5f031f77eb21727efb, does it still break? That's the only thing that affects non-incremental builds here.

I have a similar problem building rustc on PowerPC (trying to bootstrap from rustup).

For example the failed command can be:

/root/.cargo/bin/rustc 
--crate-name num_traits /root/.cargo/registry/src/github.com-eae4ba8cbf2ce1c7/num-traits-0.2.2/src/lib.rs 
--crate-type lib 
--emit=dep-info,link 
-C debug-assertions=off 
-C overflow-checks=on 
-C metadata=b5fb8610320c145f 
-C extra-filename=-b5fb8610320c145f 
--out-dir /var/cache/acbs/build/acbs.vr6az_qf/rustc-1.27.2-src/build/bootstrap/debug/deps 
-L dependency=/var/cache/acbs/build/acbs.vr6az_qf/rustc-1.27.2-src/build/bootstrap/debug/deps 
--cap-lints allow 
-Cdebuginfo=2

I ran this manually, it returned 1 with error:

Expected no forward declarations!                                                      
!47 = !DILocation(line: 77, scope: !43)                                                
LLVM ERROR: Broken function found, compilation aborted!

Interestingly, the issue disappeared if I remove -Cdebuginfo=2 or specify -Cdebuginfo=0.

I can reproduce this by even compiling an empty program.

root@stable32_210001 [ test ] # cat empty_prog.rs
fn main() {}
root@stable32_210001 [ test ] # ls
empty_prog.rs
root@stable32_210001 [ test ] # ~/.cargo/bin/rustc -C debuginfo=2 empty_prog.rs
Expected no forward declarations!
!9 = <temporary!> !{}
scope points into the type hierarchy
!10 = !DILocation(line: 1, scope: !5)
LLVM ERROR: Broken function found, compilation aborted!
root@stable32_210001 [ test ] ! ls
empty_prog.crate.allocator.rcgu.o  empty_prog.empty_prog1.rcgu.o  empty_prog.rs
empty_prog.crate.metadata.rcgu.o   empty_prog.empty_prog3.rcgu.o
empty_prog.empty_prog0.rcgu.o      empty_prog.empty_prog4.rcgu.o
root@stable32_210001 [ test ] ! tar caf ../empty_prog.result.tar.xz *
root@stable32_210001 [ test ] # rm empty_prog.c* empty_prog.empty_prog*
root@stable32_210001 [ test ] # ls
empty_prog.rs
root@stable32_210001 [ test ] # ~/.cargo/bin/rustc empty_prog.rs
root@stable32_210001 [ test ] # ls
empty_prog  empty_prog.rs
root@stable32_210001 [ test ] #

empty_prog.result.tar.xz uploaded here: empty_prog.zip

If you revert the changes from f5bd1ca, does it still break? That's the only thing that affects non-incremental builds here.

Testing this now. Checking out 539e1717728f7a5ed0b5ed9bad4ab7260117e600 and then just reverting f5bd1ca6786dd2e375353b5f031f77eb21727efb.

Interestingly, the issue disappeared if I remove -Cdebuginfo=2 or specify -Cdebuginfo=0.

Oh, this is interesting. So far, I have been unable to get any newer version of Rust working on powerpc.

Ok, two new observations. Reverting f5bd1ca6786dd2e375353b5f031f77eb21727efb does not help. However, passing --disable-debuginfo makes it go away.

After debugging a whole day, now I understand what happened.

I wrote this patch: https://pastebin.aosc.io/paste/6XG51ItmP9dHtHT30-m7hA to fix this issue. But as you can see, something deeper is wrong.

Oh wow, that's some find, @LionNatsu!
Thanks a lot for testing the changes, @glaubitz!

So it looks like rustc/LLVM is generating a wrong address for the isDefinition parameter in llvm::LLVMRustDIBuilderCreateFunction(). Maybe the Rust bool is different from the C bool on this platform. It would make sense because isDefinition is the first parameter after a bool parameter. Most of the regular bindings in librustc_codegen_llvm/llvm/ffi.rs use Bool which is an alias for c_uint. We should do the same everywhere.
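As a rough illustration of the suggestion above, here is a sketch of passing booleans across an FFI boundary as a word-sized `Bool` (an alias for `c_uint`) instead of Rust's one-byte `bool`. The helper `to_ffi_bool` and the function `takes_flags` are hypothetical stand-ins written for this example; they are not the real bindings in librustc_codegen_llvm.

```rust
use std::os::raw::c_uint;

/// Word-sized boolean for FFI, mirroring the `Bool = c_uint` alias mentioned above.
type Bool = c_uint;
const TRUE: Bool = 1;
const FALSE: Bool = 0;

/// Converts a Rust `bool` into the word-sized FFI representation.
fn to_ffi_bool(b: bool) -> Bool {
    if b { TRUE } else { FALSE }
}

/// Hypothetical stand-in for a C/C++ function that takes two adjacent flags.
/// With `Bool`, each flag occupies a full `unsigned int` slot on both sides
/// of the call, so no sub-word extension is involved.
extern "C" fn takes_flags(is_local_to_unit: Bool, is_definition: Bool) -> c_uint {
    (is_local_to_unit << 1) | is_definition
}

fn main() {
    let packed = takes_flags(to_ffi_bool(false), to_ffi_bool(true));
    println!("packed flags: {}", packed);
}
```

On the C++ side the matching parameters would have to be declared as `unsigned` (or an equivalent typedef) for this to line up; that part is assumed here rather than shown.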

cc @nagisa, who's been known to use PPC at times.

@michaelwoerister Thanks for pinging me, I’m already subscribed to this thread :)

C’s _Bool/C++’s bool being different from Rust’s bool would be very, very bad, as we fairly recently said that Rust’s bool should be usable in an FFI context. I can’t really dig the promise out right now, will do that later today.

C++’s bool has a size of 1 byte and ought to be passed this way (LLVM-IR):

zeroext i1 @foo(i1 zeroext) #0

So that is not a problem. I checked whether the ABI matches for such a function (which is essentially an untypedef’d LLVMRustDIBuilderCreateFunction, with the last few arguments chopped off):

extern "C" void * foo(void*b, void*s, void*n, void*ln, void*f, unsigned n2, void*t, bool l, bool def, unsigned sl){
    return 0;
}

And it does:

; c++
define i8* @foo(i8*, i8*, i8*, i8*, i8*, i32, i8*, i1 zeroext, i1 zeroext, i32) #0

; Rust
declare i8* @foo(i8*, i8*, i8*, i8*, i8*, i32, i8*, i1 zeroext, i1 zeroext, i32) unnamed_addr #1

My suspicion here falls solely on LLVM generating bad code.

That being said, using Bool = c_uint instead of the bool would probably hide the problem.

FWIW, we reviewed the bool ABI in #46176, and in https://github.com/rust-lang/rust/pull/46176#issuecomment-358868675 I reported that powerpc darwin does have an unusual 4-byte bool. On Linux it's still just a byte though.

# <+96>: lbz r19,139(r1)
to
# <+96>: lbz r19,136(r1)

So the function was reading the word's least-significant byte (as this is big-endian), and you're patching it to read the most-significant byte instead. Does this imply that the caller did not properly extend the argument to 32-bit? AIUI that's supposed to happen here:

https://github.com/rust-lang/rust/blob/bf1e461173e3936e4014cc951dfbdd7d9ec9190b/src/librustc_target/abi/call/powerpc.rs#L41

That being said, using Bool = c_uint instead of the bool would probably hide the problem.

I would like to test this. Where exactly is this in the Rust code?

So the function was reading the word's least-significant byte (as this is big-endian), and you're patching it to read the most-significant byte instead. Does this imply that the caller did not properly extend the argument to 32-bit? AIUI that's supposed to happen here:

If you're extending a bool, it would be zero-extended, not sign-extended (and if you treat bool as a char rather than a bit, it doesn't matter which you do as the sign bit is always 0).

If you're extending a bool, it would be zero-extended, not sign-extended (and if you treat bool as a char rather than a bit, it doesn't matter which you do as the sign bit is always 0).

Yes, Rust should zero-extend bool -- see the i1 zeroext that @nagisa demonstrated.

My point was that if you don't extend at all, then the significant 0 or 1 of the bool would be written at the word's lowest address 136(r1), rather than the highest address 139(r1) where it should be on big-endian. I suppose if it somehow extended like little-endian, you would also have this problem.
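The byte-offset argument above can be checked with a small sketch: zero-extend `true` to a 32-bit word and see which byte of the word holds the 1 under each byte order. The function name `flag_byte_offset` is made up for this example, and the base address 136 is simply the stack offset quoted from the disassembly.

```rust
/// Offset (0..4) of the byte that holds the 0/1 value of a bool after it has
/// been zero-extended to a 32-bit word, for the given byte order.
fn flag_byte_offset(big_endian: bool) -> usize {
    let word: u32 = true as u32; // zero-extended `true` == 0x0000_0001
    let bytes = if big_endian {
        word.to_be_bytes()
    } else {
        word.to_le_bytes()
    };
    bytes
        .iter()
        .position(|&b| b != 0)
        .expect("word is non-zero")
}

fn main() {
    let slot_base = 136; // start of the 4-byte stack slot, relative to r1
    println!("big-endian:    {}(r1)", slot_base + flag_byte_offset(true));
    println!("little-endian: {}(r1)", slot_base + flag_byte_offset(false));
}
```

A properly extended bool on a big-endian target therefore lives at 139(r1), while a value written without extension (or extended as if little-endian) would sit at 136(r1).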

FWIW, while I don't use powerpc, I do build for big-endian powerpc64 and s390x on Fedora. I haven't seen such problems there, so I doubt this is a general big-endian problem, though we have had such in the past...

I think you misunderstood something...

My point was that if you don't extend at all, then the significant 0 or 1 of the bool would be written at the word's lowest address 136(r1), rather than the highest address 139(r1) where it should be on big-endian. I suppose if it somehow extended like little-endian, you would also have this problem.

If you read the PowerPC System V ABI spec:

Arguments not otherwise handled above are passed in the parameter words of the caller's stack frame. SIMPLE_ARGs, as defined above, are considered to have 4-byte size and alignment, with simple integer types shorter than 32 bits sign- or zero-extended (conceptually) to 32 bits. float, long long (where implemented), and double arguments are considered to have 8-byte size and alignment, with float arguments converted to double representation. Round starg up to a multiple of the alignment requirement of the argument and copy the argument byte-for-byte, beginning with its lowest addressed byte, into starg, ..., starg+size-1. Set starg to starg+size, then go to SCAN.

PowerPC has 8 registers, r3 to r10 (where endianness does not matter), for passing arguments, and then uses the stack for the rest. An i1 must be extended to 32 bits. This is defined by the PowerPC System V ABI, not by a high-level language such as C/C++.

let fn_metadata = unsafe {
    llvm::LLVMRustDIBuilderCreateFunction(
        DIB(cx),                // r3
        containing_scope,       // r4
        function_name.as_ptr(), // r5
        linkage_name.as_ptr(),  // r6
        file_metadata,          // r7
        loc.line as c_uint,     // r8
        function_type_metadata, // r9
        is_local_to_unit,       // r10 
        true,                   // r1+ 0x8
        scope_line as c_uint,   // r1+ 0xc    (r1 is the base address of the stack frame)
        flags,                  // r1+0x10
        cx.sess().opts.optimize != config::OptLevel::No, // r1+0x14
        llfn,                   // r1+0x18
        template_parameters,    // r1+0x1c
        ptr::null_mut())        // r1+0x20
};

So the true passed at r1+0x8, the first argument that lands on the stack, is what hits the issue.
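The register and stack annotations in the call above follow a simple rule, which can be sketched as below. This is a simplified model, not code from rustc or LLVM: it assumes every argument is a single 4-byte integer or pointer word (no floats, no 64-bit values), and the names `ArgLoc` and `arg_location` are hypothetical.

```rust
/// Where a word-sized argument lives at the call site (simplified model).
#[derive(Debug, PartialEq, Eq)]
enum ArgLoc {
    /// General-purpose register rN.
    Reg(u8),
    /// Byte offset from r1 in the caller's frame.
    Stack(u32),
}

/// Location of the argument at zero-based position `index`,
/// assuming every argument occupies exactly one 4-byte word.
fn arg_location(index: u32) -> ArgLoc {
    const FIRST_REG: u32 = 3; // r3
    const NUM_REGS: u32 = 8; // r3..=r10
    const PARAM_AREA_START: u32 = 8; // first parameter word at r1+0x8

    if index < NUM_REGS {
        ArgLoc::Reg((FIRST_REG + index) as u8)
    } else {
        ArgLoc::Stack(PARAM_AREA_START + 4 * (index - NUM_REGS))
    }
}
```

Under these assumptions, `arg_location(8)` is the `true` at r1+0x8 and `arg_location(14)` is the `ptr::null_mut()` at r1+0x20, matching the comments in the call.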

To simplify the issue, consider the following LLVM IR (see https://llvm.org/docs/LangRef.html; you can run llc extern_C.ll in a console to compile it to PowerPC assembly):

target datalayout = "E-m:e-p:32:32-i64:64-n32"
target triple = "powerpc-unknown-linux-gnu"

declare zeroext i1 @a_strange_function(i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext)

define i32 @main(i32, i8**) {
top:
  %2 = call zeroext i1 @a_strange_function(i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true)
  ret i32 0
}

llc produces:

li 3, 1
li 4, 1
li 5, 1
li 6, 1
stb 3, 12(1)
stb 3, 8(1)
li 3, 1
li 7, 1
li 8, 1
li 9, 1
li 10, 1
bl a_strange_function@PLT

This is wrong: stb 3, 8(1) stores a single byte on the stack, ignoring both zeroext and the 32-bit alignment that PowerPC requires for stack arguments.

Interestingly, rustc itself produces clean code:

extern "C" {
    pub fn a_strange_function(r3: bool, r4: bool, r5: bool, r6: bool, r7: bool, r8: bool, r9: bool, r10: bool, s8: bool, s12: bool) -> bool;
}
fn main() {
    unsafe {
        print!("{}\n", a_strange_function(true, true, true, true, true, true, true, true, true, true))
    }
}

Compile with rustc extern_C.rs -Csave-temps -Ccodegen-units=1 -Cno-integrated-as.
If you get extern_C.extern_C0_......no-opt.bc, you can run llvm-dis ....bc to see the LLVM IR.
The IR looks like:

declare zeroext i1 @a_strange_function(i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext) unnamed_addr #2

; in main():
  %5 = call zeroext i1 @a_strange_function(i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true, i1 zeroext true)

The .s file looks like:

li 3, 1
stw 3, 12(1)
stw 3, 8(1)
stw 3, 28(1)
lwz 4, 28(1)
lwz 5, 28(1)
lwz 6, 28(1)
lwz 7, 28(1)
lwz 8, 28(1)
lwz 9, 28(1)
lwz 10, 28(1)
bl a_strange_function@PLT

Look, these are correct!

Conclusion: there may be a cross-compilation issue and/or a misconfiguration of the LLVM module.

@nagisa Yes. I found Bool, True, False in ffi.rs; they were introduced 8 years ago, before Rust 0.1.

They were a type hack for FFI that forced these values to c_uint, which worked fine at the time. Eight years later, there are many plain bool, true, false uses in this code, and those triggered the issue.
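For readers unfamiliar with that hack, here is a minimal sketch of the pattern. Only the names Bool, True, False come from the comment above; the definitions and the helpers `to_ffi_bool` and `from_ffi_bool` are assumptions for illustration, not the actual contents of ffi.rs.

```rust
use std::os::raw::c_uint;

/// FFI boolean carried as a full C unsigned int, so the caller always
/// passes a whole word instead of relying on how `bool` is extended.
pub type Bool = c_uint;

#[allow(non_upper_case_globals)]
pub const True: Bool = 1;
#[allow(non_upper_case_globals)]
pub const False: Bool = 0;

/// Hypothetical helper: convert a Rust bool before crossing the FFI boundary.
pub fn to_ffi_bool(b: bool) -> Bool {
    if b {
        True
    } else {
        False
    }
}

/// Hypothetical helper: interpret a value returned from the C side.
pub fn from_ffi_bool(b: Bool) -> bool {
    b != False
}
```

Because the value is already word-sized, a declaration using this type does not depend on the backend extending an i1 for arguments passed on the stack.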

Now it is weird:

  1. Rustc emits correct, ABI-compatible LLVM bitcode, and the bundled LLVM generates correct assembly/machine code from it, i.e. correctly zero-extended and correctly aligned.
  2. Rustc itself (from rustup) was compiled to wrong machine code which is not ABI-compatible.

The ad-hoc binary hack I posted fixed this function call (on the callee side; you could do it on the caller side as well), but I am afraid some other bugs are still hidden.

The strangest thing is that llc, the official command-line LLVM compiler, also generated wrong code.

@cuviper

Does this imply that the caller did not properly extend the argument to 32-bit?

Yes sir. It didn't properly extend. We can patch from either callee side or caller side. I simply chose callee side. Incorrect + incorrect = correct :)

I think you misunderstood something...

I think we're in agreement actually, and I just didn't express myself well enough.

The caller _should_ be zero-extending that i1 to 32-bit. Doing so would write a true in memory like a u8 array [0, 0, 0, 1], since this is big-endian. If the callee was reading 139(r1), this checks out. Your patch changed the callee to 136(r1) to make it work, which implies that the caller wrote the bool value in the first byte only.
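The byte layout described here can be checked with a couple of lines of Rust. This is only an illustration of the arithmetic; the function names are hypothetical, and offset 3 of the slot corresponds to 139(r1) when the slot starts at 136(r1).

```rust
/// Bytes of a 4-byte stack slot, lowest address first, after a bool is
/// zero-extended to 32 bits and stored as a whole word on a big-endian target.
fn zero_extended_slot_be(value: bool) -> [u8; 4] {
    (value as u32).to_be_bytes()
}

/// The same store on a little-endian target, for comparison.
fn zero_extended_slot_le(value: bool) -> [u8; 4] {
    (value as u32).to_le_bytes()
}
```

On big-endian, `true` ends up in the last byte of the slot (offset 3), while a little-endian layout, or a lone byte store at the slot's base address, puts it in the first byte (offset 0).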

PowerPC has 8 registers, r3 to r10 (where endianness does not matter), for passing arguments, and then uses the stack for the rest.

Right, but endianness does come into play for those values passed on the stack.

This is wrong: stb 3, 8(1) stores a single byte on the stack, ignoring both zeroext and the 32-bit alignment that PowerPC requires for stack arguments.

Agreed! The caller is wrong here, and your LLVM-IR is a useful reproducer!

For comparison, if I modify the target lines for powerpc64, then it does generate std to give us 64-bit values on the stack as required. So for some reason, just 32-bit powerpc is not behaving as we expect.

Right, but endianness does come into play for those values passed on the stack.

Completely agree. That's why I listed the code from codegen and counted from r3 to r10 to the stack hell.

What a lovely coincidence: most of the librustc_llvm functions take more than 8 arguments.

Goodness... it turns out to be a much deeper issue.
It is not a bug in Rust but in LLVM, related to optimisation during instruction selection. I am trying to track it down now.

-O0 and -O1 select the correct instructions (stw, store a word) while -O2 and -O3 select incorrectly (stb to the little-endian location).

llc uses -O2 as its default optimisation level, so no wonder it generated the wrong assembly.

So FFI breaks if the two sides are compiled at different optimisation levels. Here is a C++ example.

Library

#include <iostream>
extern "C" void strange_function(
    bool r3, bool r4, bool r5, bool r6, bool r7, bool r8, bool r9, bool r10,
    bool s1, bool s2
) {
    std::cout << r3 << r4 << r5 << r6 << r7 << r8 << r9 << r10 << s1 << s2 << std::endl;
}

Main program

extern "C" void strange_function(
    bool r3, bool r4, bool r5, bool r6, bool r7, bool r8, bool r9, bool r10,
    bool s1, bool s2
);
int main() {
    strange_function(true, true, true, true, true, true, true, true, true, true);
}

[Screenshot: screenshot_20180821_133200]

I believe I've tracked it down in LLVM. Can you try with the following patch?

diff --git a/lib/Target/PowerPC/PPCISelLowering.cpp b/lib/Target/PowerPC/PPCISelLowering.cpp
index 037c4b5de9d1..a903f431ccee 100644
--- a/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -5463,10 +5463,11 @@ SDValue PPCTargetLowering::LowerCall_32SVR4(
       Arg = PtrOff;
     }

-    if (VA.isRegLoc()) {
-      if (Arg.getValueType() == MVT::i1)
-        Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Arg);
+    // Promote booleans to 32-bit values.
+    if (Arg.getValueType() == MVT::i1)
+      Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i32, Arg);

+    if (VA.isRegLoc()) {
       seenFloatArg |= VA.getLocVT().isFloatingPoint();
       // Put argument in a physical register.
       RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));

That patch only addresses the caller.

There still appears to be some problem with the callee, e.g.

define zeroext i1 @a_strange_function(i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext, i1 zeroext) {
  %result = and i1 %8, %9
  ret i1 %result
}

produces

        .globl  a_strange_function      # -- Begin function a_strange_function
        .p2align        2
        .type   a_strange_function,@function
a_strange_function:                     # @a_strange_function
.Lfunc_begin0:
        .cfi_startproc
# %bb.0:
        lbz 3, 12(1)
        lbz 4, 8(1)
        and 3, 4, 3
        clrlwi  3, 3, 31
        blr
.Lfunc_end0:
        .size   a_strange_function, .Lfunc_end0-.Lfunc_begin0
        .cfi_endproc
                                        # -- End function

... loading bytes from the wrong offset. I think this is consistent with what you found about mismatched optimization levels in either direction.

And BTW, we don't see that problem in the function definition in RustWrapper.cpp, because it's currently compiled with GCC, not Clang.

Please let me know when you open an issue in LLVM Bugzilla so that I can alert the PPC team.

@LionNatsu or @cuviper, would one of you be willing to report the bug with LLVM? You seem to have the best handle on the issue.

I'm filing it right now.

@michaelwoerister Filing it right now. I registered a new LLVM Bugzilla account a few hours ago.

@LionNatsu sorry to race you -- I just filed https://bugs.llvm.org/show_bug.cgi?id=38661

Oh @cuviper, it is okay. I'll add myself to the CC list of your issue.

These two issues can be marked as duplicates:

  • powerpc32: Expected no forward declarations! LLVM ERROR: Broken function found, compilation aborted! #41253
  • '-g' flag causes crash in PowerPC rustc for certain inputs. #31302

BTW, should we change the title of the issue a little bit?

By the way, there was another panic in the rand crate, caused by a wrong syscall number: https://github.com/rust-random/rand/pull/589 fixed it, and version 0.4.3 was released last week.

rustc uses rand for incremental builds and temporary filenames, so a Rust program on powerpc that has not been fixed (migrated to 0.5 or recompiled with rand 0.4.3) probably doesn't work. I have a patch here: https://pastebin.aosc.io/paste/lOjhMoUXRGJWSCOb0YDKvQ

find ~/.rustup ~/.cargo -type f | xargs python3 fix_rand_0.4_syscall_on_powerpc.py

@LionNatsu Were you able to test if that LLVM patch is enough to fix rustc overall? It's clear to me that it fixes the narrow IR test case, but it's always possible that there's more going on. I'm especially wondering since this bug report only appeared fairly recently in Rust, but my bisection of the LLVM codegen goes all the way back before LLVM 3.5!

I'm especially wondering since this bug report only appeared fairly recently in Rust,

Well, I guess the other bugs you just referenced are much older, so maybe this has indeed been a long-standing problem, and somehow it's just been good enough to limp along anyway...

@cuviper Testing the patch now. Will report back ASAP.

Were you able to test if that LLVM patch is enough to fix rustc overall?

Emmm.. rustc now compiles my Rust code without any error message. But I am not sure it is completely working as intended. The bug can silently change the logic... This time we found it only because it marked a function definition as a declaration, and then LLVM complained "Are you adding stuff in a function declaration?", aka "Expected no forward declarations!".

but it's always possible that there's more going on.

Yes.

BTW, I ran grep -lr 'extern "C"'| grep '\.rs' on rust 1.28.0: https://pastebin.aosc.io/paste/z6yoTdfChTQ8FhzcM-9z7w

I have tested @LionNatsu's patch from the LLVM review site and I can confirm that Rust works again for me \o/. Great job!

@glaubitz Thanks for opening the LLVM bug and creating a patch. I'm glad the combined effort could track this down.

@edelsohn All credit goes to @LionNatsu :)

Hmm, so while the LLVM bug is now fixed, rustc is running into this problem on powerpc:

thread 'main' panicked at 'unexpected getrandom error: Invalid argument (os error 22)', vendor/rand/src/os.rs:130:21

https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=powerpc&ver=1.28.0%2Bdfsg1-3&stamp=1536680035&raw=0

Even rustc 1.24 doesn't work anymore. I have the impression that something external is causing this.

@glaubitz Have you seen this message? I already fixed this recently in https://github.com/rust-random/rand/pull/589, released in rand 0.4.3.
All versions of rand before 0.4.3 were using the wrong system call number on PPC (32/64).

Aha! I haven't seen that. Thanks a lot for pointing me at that. You probably saved me lots of digging now ;).

You're welcome :)

All versions of rand before 0.4.3 were using the wrong system call number on PPC (32/64).

FWIW, their #[cfg(target_arch = "powerpc")] only applies to PPC32. PPC64 uses target_arch = "powerpc64", for both big and little endian, so they've never even tried to use getrandom. I just opened rust-random/rand#608 for this.
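To make the cfg point concrete, here is a small sketch of how such a gate behaves. It is not rand's code; the function `getrandom_path` and its return strings are made up for illustration. The only fact relied on is that `target_arch = "powerpc"` and `target_arch = "powerpc64"` are distinct values.

```rust
/// Compiled only for 32-bit PowerPC targets.
#[cfg(target_arch = "powerpc")]
fn getrandom_path() -> &'static str {
    "powerpc (32-bit) syscall path"
}

/// Compiled for every other target, including powerpc64 (big and little
/// endian), because their target_arch is "powerpc64", not "powerpc".
#[cfg(not(target_arch = "powerpc"))]
fn getrandom_path() -> &'static str {
    "fallback path"
}
```

A crate that wants both families has to name them explicitly, for example with `#[cfg(any(target_arch = "powerpc", target_arch = "powerpc64"))]`.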

Is it actually necessary to add other architectures like sparc64 there as well? I have just skimmed over the code, so I don't know whether the syscall definitions are necessary.

It's not necessary -- getrandom didn't even exist until kernel 3.17, and Rust supports older targets than that. It will fall back on /dev/urandom instead, keeping a static File open.
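A rough sketch of that fallback idea, for illustration only: this is not rand's implementation, the names `URANDOM` and `fill_from_urandom` are hypothetical, and it assumes a Unix-like system with /dev/urandom and a toolchain recent enough to have `std::sync::OnceLock`.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::sync::OnceLock;

/// The device file is opened once and then kept open for later calls.
static URANDOM: OnceLock<File> = OnceLock::new();

/// Fill `buf` with bytes read from /dev/urandom.
fn fill_from_urandom(buf: &mut [u8]) -> io::Result<()> {
    let file = match URANDOM.get() {
        Some(f) => f,
        None => {
            // Open outside the initializer so an I/O error can be returned.
            let f = File::open("/dev/urandom")?;
            URANDOM.get_or_init(|| f)
        }
    };
    // `Read` is implemented for `&File`, so a shared handle is enough.
    let mut reader: &File = file;
    reader.read_exact(buf)
}
```

If two threads race on the first call, both may open the file, but only one handle is kept and the other is simply dropped.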

Right, I completely forgot about the fact that getrandom is rather new. I'll send a few PRs to amend the other architectures then.

The LLVM fix is now committed. The next step for Rust is to backport it onto rust-lang/llvm branch rust-llvm-release-8-0-0-v1 in a pull request. When that's merged, you can bump the submodule here, like I did in #54136, and mark it to fix this issue.

I sent the PR just now. It may also fix #31302 and #41253, but I don't have permission to close them.

Nightly is still suffering from the rand issue:

thread 'main' panicked at 'unexpected getrandom error: Invalid argument (os error 22)', /home/glaubitz/.cargo/registry/src/github.com-1ecc6299db9ec823/rand-0.4.2/src/os.rs:130:21

I have to fix that manually first and then re-try.

@glaubitz, rand was fixed in 0.4.3, and the compiler has been updated to that since #53567. But you appear to have something of your own built with 0.4.2, since the path is under your ~/.cargo/. In that case, that program will have the wrong NR_getrandom regardless of the compiler.

@cuviper Yes, for some reason it took rand from my home directory instead of the vendored one, despite the fact I downloaded a full nightly tarball and cross-built that on x86_64. Will try that again later today.

There is configure --enable-vendor for that, or [build] vendor = true directly in config.toml. But even if you don't use the vendored sources, src/Cargo.lock should still direct you to rand 0.4.3.

Maybe it's actually the rustup shim getting in your way? If you also compiled that yourself, it looks like they're still locked on 0.4.2:
https://github.com/rust-lang-nursery/rustup.rs/blob/c3e4ce4c5e11a5557a080be9bf8f6ee9d2c87839/Cargo.lock#L1746

No, I've never used rustup. But I'll make sure --enable-vendor is set.

Turns out I'm an idiot. I put the cross-compiled rustc distribution under /root/ but for testing I picked the older, buggy one from /srv/debian/ -.-.

Testing now, looks good but let me wait for the build to finish.

I can confirm that on nightly, rustc works correctly on powerpc-unknown-linux-gnu again.

This can be resolved as fixed and closed. Thanks everyone!

You tested with #54266 applied? Normally we would let that PR merge close the bug automatically.

No, I tested the nightly and confirmed that it works.

Well, nightly doesn't have the fix yet, so I think you just got lucky that the bug didn't manifest. Meaning that the unwritten stack locations must have had "harmless" bytes in them already, this time.
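Why the bug can hide this way is easy to simulate. The sketch below models one 4-byte parameter slot on a big-endian target; it is an illustration under that assumption, not generated code, and all three function names are hypothetical.

```rust
/// One 4-byte parameter word in the caller's frame, lowest address first.
type Slot = [u8; 4];

/// Correct caller: zero-extend and store the full word (like `stw`).
fn store_word(slot: &mut Slot, value: bool) {
    *slot = (value as u32).to_be_bytes();
}

/// Buggy caller: store one byte at the lowest address (like `stb`),
/// leaving the other three bytes with whatever was there before.
fn store_byte_only(slot: &mut Slot, value: bool) {
    slot[0] = value as u8;
}

/// Callee expecting a zero-extended big-endian word: it loads the
/// least-significant byte (offset 3) and keeps only the low bit.
fn load_low_byte(slot: &Slot) -> bool {
    slot[3] & 1 != 0
}
```

With a zeroed slot, `store_byte_only` followed by `load_low_byte` turns `true` into `false`. If the stale byte at offset 3 happens to have its low bit set, the same sequence still reads `true`, which is the kind of luck described above.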

That surprises me. Without the fix, I could barely compile anything on powerpc. It bailed out very early trying to build the compiler itself. The nightly, on the other hand, works completely fine.

But I just looked, nightly doesn't seem to have it. What are the odds.
