Rust: Build hang on Linux

Created on 20 Sep 2020  Ā·  74Comments  Ā·  Source: rust-lang/rust

Problem
I am seeing the build process hang at the last few steps when attempting to build this particular repository on Linux using either the nightly or the beta toolchain. I've seen this happen for the repository both when building locally (Linux Mint) and in Github's CI (Ubuntu). This does not happen with the stable toolchain, and I have not seen it happen with any toolchain on Windows.

I expect to see the build process finish in approximately 20 minutes like it does with the stable toolchain, but I observe a hanging build. If there is a problem with the build process, I expect to see an indication of what it is, especially if I pass a verbose flag, but I observe no further output once it gets into the hanging state---the only output I see is from steps that have already finished, which provides no insight into why it hangs.

Steps

Switch to nightly toolchain:

$ rustup default nightly
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.48.0-nightly (bbc677480 2020-09-18)

Fetch the following repository:

$ git clone https://github.com/danieldeankon/hypermine.git
$ cd hypermine/
$ git lfs pull

Attempt to build the binaries:

$ cargo build --release -v

The last thing I see before it gets into a hanging state:

   Compiling client v0.1.0 (/home/daniel/git/testinghypermine/hypermine/client)
     Running `rustc --crate-name client --edition=2018 client/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="use-repo-assets"' -C metadata=c38d85b37e3e2e40 -C extra-filename=-c38d85b37e3e2e40 --out-dir /home/daniel/git/testinghypermine/hypermine/target/release/deps -L dependency=/home/daniel/git/testinghypermine/hypermine/target/release/deps --extern anyhow=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libanyhow-43ee26a5a4ae988b.rmeta --extern ash=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libash-1d91e46c2845798f.rmeta --extern ash_window=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libash_window-e581873b9287fab4.rmeta --extern common=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libcommon-bdb19128efefc960.rmeta --extern directories=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libdirectories-4a64043fdaf679b9.rmeta --extern downcast_rs=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libdowncast_rs-be493f8657cf1f42.rmeta --extern futures_util=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libfutures_util-79782c1686738829.rmeta --extern fxhash=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libfxhash-aa9a2aa3814ab312.rmeta --extern gltf=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libgltf-ae78e76833b667c0.rmeta --extern hdrhistogram=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libhdrhistogram-1c5607d77dc61a11.rmeta --extern hecs=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libhecs-7ceed138b109be7b.rmeta --extern lahar=/home/daniel/git/testinghypermine/hypermine/target/release/deps/liblahar-c3a4855d049ab353.rmeta --extern memoffset=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmemoffset-33ada65033678cfe.rmeta --extern metrics=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmetrics-79a79d3342c8701e.rmeta --extern metrics_core=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libmetrics_core-7e657b2c60f0cf49.rmeta --extern na=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libnalgebra-615d0479649eed30.rmeta --extern png=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libpng-089627b494fcd05d.rmeta --extern quinn=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libquinn-a3a9da2f9e31ef73.rmeta --extern rcgen=/home/daniel/git/testinghypermine/hypermine/target/release/deps/librcgen-be435a028b2dfce2.rmeta --extern rustls=/home/daniel/git/testinghypermine/hypermine/target/release/deps/librustls-825cbe6b34245b7a.rmeta --extern serde=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libserde-fbaa982a65688d4a.rmeta --extern server=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libserver-8fd1f97c8fb84524.rmeta --extern tokio=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtokio-aa180fba706594fa.rmeta --extern toml=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtoml-f7f6190f25a07c8e.rmeta --extern tracing=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libtracing-e0bf1ed6aae74c7d.rmeta --extern vk_shader_macros=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libvk_shader_macros-d2fb7777a36a4f79.rmeta --extern webpki=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwebpki-66dacb92629212d5.rmeta --extern whoami=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwhoami-05f81a39d2c15f6f.rmeta --extern winit=/home/daniel/git/testinghypermine/hypermine/target/release/deps/libwinit-0172bcc7807380da.rmeta -L native=/home/daniel/git/testinghypermine/hypermine/target/release/build/ring-86d47ac3f01cd916/out`
    Building [=====================================================> ] 255/257: client

Notes

I was able to reproduce this behaviour in Github CI here: https://github.com/danieldeankon/hypermine/pull/2/checks?check_run_id=1138809646

CI / test (ubuntu-latest, stable) finished in 17m 39s. CI / test (ubuntu-latest, beta) has been going for two hours and counting at time of writing, and similarly for CI / test (ubuntu-latest, nightly).

Output of cargo version:
cargo 1.48.0-nightly (8777a6b1e 2020-09-15)

My operating system is Linux Mint 19.2 Cinnamon.

E-hard E-needs-test I-hang P-medium T-compiler

Most helpful comment

While the options @tmandry lists would work, they're not necessary. As I already pointed out, the C++ symbols in libLLVM.so have different symbol versions (LLVM_xxx), so shared objects that were linked against the system libstdc++ (GLIBCXX_xxx) should never link to them. There's just an oddity with some of the symbols where the dynamic loader is trying to make them globally unique in the process despite the symbol versioning that's applied. Additionally, rustc_llvm doesn't actually link to the STL symbols exported by libLLVM.so.

I don't know why this changed between LLVM 10 and LLVM 11, but here you can clearly see the difference:

$ objdump -T stable-x86_64-unknown-linux-gnu/lib/libLLVM-10-rust-1.46.0-stable.so |grep _ZNSs4nposE
000000000077cfd8  w   DO .rodata    0000000000000008  LLVM_10     _ZNSs4nposE
                  ^
                weak symbol
$ objdump -T nightly-x86_64-unknown-linux-gnu/lib/libLLVM-11-rust-1.48.0-nightly.so | grep _ZNSs4nposE
00000000008214b8 u    DO .rodata    0000000000000008  LLVM_11     _ZNSs4nposE
                 ^
                STB_GNU_UNIQUE symbol

So, again, here are the options I propose that are much less ā€œinvasiveā€ in the sense that they don't affect linking flows or code generation, just what symbols are exported from the libraries we already build and how:

  1. Avoid generating STB_GNU_UNIQUE-bound symbols, in one of three ways:
    a. Find out what changed between LLVM 10 and LLVM 11 to see if there's a flag we can set to prevent this behavior.
    b. Compile LLVM with the -fno-gnu-unique compiler flag
    c. Run the tool from https://github.com/rust-lang/rust/issues/76980#issuecomment-704480880 on libLLVM.so after LLVM is built.
  2. Change what symbols are exported from libLLVM.so. We only need the LLVM C/C++ API, nothing else.

    • Presumably there are some CMake flags we can use for this

All 74 comments

Transferred the issue to rust-lang/rust. This is a rustc issue, and I don't think there is a lot that Cargo can do about it. rustc is doing work, it just never finishes.

I have bisected this to #73526. cc @cuviper, are there any known issues with the LLVM 11 upgrade causing rustc to go into any kind of infinite loops? When I attached with gdb, it looked like it was deep inside a proc macro (vk-shader-macros).

I'm not aware of any LLVM 11 issues like that.

@cuviper Did you remove I-hang and T-compiler labels intentionally or was it conflicting with the fact that I just added them?

@LeSeulArtichaut I only meant to change the regression label. I used the Android client, so I guess it batched that update, with clobbering. šŸ¤•

Adding them back now, and Iā€™ll mark this conversation as resolved to limit the noise

@rustbot ping llvm

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
[instructions] for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.

I couldn't reproduce this issue with my MacOS laptop (x86_64-apple-darwin). I tried compiling with stable (1.46.0), nightly-2020-09-18 and latest nightly (nightly-2020-09-23), resulting in a successful compilation in less than 15 minutes. Unless I missed something, I guess this probably means this is a Linux-only issue?
@ehuss Did you also use a Linux machine for your previous bisection?

It would be great if other people could take a look and eventually find an MCVE which would be very helpful to resolve this bug.
@rustbot ping cleanup

Hey Cleanup Crew ICE-breakers! This bug has been identified as a good
"Cleanup ICE-breaking candidate". In case it's useful, here are some
[instructions] for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @AminArria @camelid @chrissimpkins @contrun @DutchGhost @elshize @ethanboxx @h-michael @HallerPatrick @hdhoang @hellow554 @imtsuki @JamesPatrickGill @kanru @KarlK90 @LeSeulArtichaut @MAdrianMattocks @matheus-consoli @mental32 @nmccarty @Noah-Kennedy @pard68 @PeytonT @pierreN @Redblueflame @RobbieClarken @RobertoSnap @robjtede @SarthakSingh31 @senden9 @shekohex @sinato @smmalis37 @spastorino @Stupremee @turboladen @woshilapin @yerke

I can reproduce it on x86_64-unknown-linux-gnu with rustc 1.48.0-nightly (fb1dc34a8 2020-09-21).

I think I found a mvce. (The build is currently running for about ~10 minutes).

Cargo.toml

[dependencies]
vk-shader-macros = "0.2.6"

shader.vert

#version 450
void main()  {}

main.rs

const _VERT: &[u32] = vk_shader_macros::include_glsl!("shader.vert");

fn main() {}

I couldn't reproduce this issue with my MacOS laptop [...] and latest nightly (nightly-2020-09-23)

I can reproduce it on x86_64-unknown-linux-gnu with rustc 1.48.0-nightly (fb1dc34a8 2020-09-21).

2020-09-21 to 2020-09-23 includes the LLVM 11-rc3 update. @Stupremee can you try with the latest nightly?

Same result

I can confirm, this only happens on Linux. Windows and macOS do not seem to be affected. It doesn't seem to matter if it is --release or not. Tested with rustc 1.48.0-nightly (8b4085359 2020-09-23).

Thanks for making the smaller repro @Stupremee! I imagine it will be difficult to narrow it down more. shaderc-sys is a huge dependency.

Attaching with a debugger, the function to_string is returning an empty string, causing the code here to go into an infinite loop because it is unable to generate a unique string.

My C++ is fairly rusty, and I don't have any guesses what could cause that apparently straightforward code to misbehave.

Yep, thats what I found out as well. But it's still super weird that it works on stable, which I haven't figured out right now.

My current hypothesis is that LLVM 11 is performing some incorrect optimization of the SPIRV-Tools library, which would explain why it works on stable.

At least, I cannot find any reason that the code linked by @ehuss above would be allowed to have to_string return an empty string.

(Of course I still need to confirm this hypothesis.)

Another report of shaderc-sys corruption when used as a proc-macro: https://github.com/rust-lang/rust/issues/77532#issuecomment-703302256. Similarly, a seemingly simple ostringstream construction is returning a corrupt string.

I suspect this may be a C++ symbol clash with rustc's static libstdc++, since proc-macro libraries are loaded in the same process.

Maybe setting RTLD_DEEPBIND during dlopen might fix that, rust-analyzer had problems with proc macros until it set that flag

Yeah that fixes it

Long term, we might be better off loading proc macros in a helper process. I think the token stream is already marshalled enough that it could be sent inter-process.

Looks like that wasn't the fix, I was using the wrong LLVM

LD_DEBUG=bindings may help confirm whether the proc-macro is binding to symbols from our static libstdc++, and which symbols exactly. We can also compare that with a pre-LLVM11 build to see if this issue already existed, perhaps just with less problematic symbols. See man ld.so for more options -- might want to use a LD_DEBUG_OUTPUT file too.

I have confirmed this regressed in 7ce71c362be9a89e7897ac066aba6e3e6f747800, at least, which I think we had not done before. That matches our prior assumption of being triggered by an LLVM upgrade.

Currently experimenting with the debug options @cuviper suggested.

https://github.com/Mark-Simulacrum/rust-issue-76980 contains a minimal reproduction with just a cc crate dependency to compile the C++ file. On stable, the proc macro sees 2, on beta it sees 0. I'm still investigating the cause though.

I tried LD_DEBUG myself on your minimal reproducer, and there are a lot of bindings from /lib64/libstdc++.so.6 to .../lib/libLLVM-1x-rust-1.47.0-nightly.so, which seems bad. I don't see any _new_ bindings that way -- this was also happening before LLVM 11 -- but there's a sequence of internal bindings that seem to show that we're on a different track:

With e482c86b9 (just before LLVM 11 landed): debug.e482c86b9.477027.gz

binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo9_M_insertImEERSoT_' [GLIBCXX_3.4.9]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryC1ERSo' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNKSt5ctypeIcE13_M_widen_initEv' [GLIBCXX_3.4.11]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `memcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE13_M_insert_intImEES3_S3_RSt8ios_basecT_' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEE8_M_pbumpEPcS5_l' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `memcpy' [GLIBC_2.14]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryD1Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_replaceEmmPKcm' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_baseD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base17_M_call_callbacksENS_5eventE' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base20_M_dispose_callbacksEv' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt3_V214error_categoryD2Ev' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt14error_categoryD2Ev' [GLIBCXX_3.4.15]

With 7ce71c362 (with LLVM 11): debug.7ce71c362.477169.gz

binding file /lib64/libstdc++.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `strcmp' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo9_M_insertImEERSoT_' [GLIBCXX_3.4.9]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryC1ERSo' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt16__throw_bad_castv' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_allocate_exception' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_throw' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_get_globals' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/ld-linux-x86-64.so.2 [0]: normal symbol `__tls_get_addr' [GLIBC_2.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_init_primary_exception' [CXXABI_1.3.11]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt14get_unexpectedv' [GLIBCXX_3.4.20]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt13get_terminatev' [GLIBCXX_3.4.20]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_RaiseException' [GCC_3.0]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_Find_FDE' [GCC_3.0]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libc.so.6 [0]: normal symbol `dl_iterate_phdr' [GLIBC_2.2.5]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libc.so.6 [0]: normal symbol `strlen' [GLIBC_2.2.5]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libpthread.so.0 [0]: normal symbol `pthread_once'
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetLanguageSpecificData' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetRegionStart' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetIPInfo' [GCC_4.2.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNK10__cxxabiv117__class_type_info11__do_upcastEPKS0_PKvRNS0_15__upcast_resultE' [CXXABI_1.3]
binding file /lib64/libgcc_s.so.1 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_GetCFA' [GCC_3.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_SetGR' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_SetIP' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_begin_catch' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt9basic_iosIcSt11char_traitsIcEE11_M_setstateESt12_Ios_Iostate' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_end_catch' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_get_globals_fast' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libgcc_s.so.1 [0]: normal symbol `_Unwind_DeleteException' [GCC_3.0]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt9exceptionD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `__cxa_free_exception' [CXXABI_1.3]
binding file /lib64/libstdc++.so.6 [0] to /home/jistone/.rustup/toolchains/COMMIT/bin/rustc [0]: normal symbol `free' [GLIBC_2.2.5]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSo6sentryD1Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_assignERKS4_' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_baseD2Ev' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base17_M_call_callbacksENS_5eventE' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base20_M_dispose_callbacksEv' [GLIBCXX_3.4.6]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt3_V214error_categoryD2Ev' [GLIBCXX_3.4.21]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZNSt14error_categoryD2Ev' [GLIBCXX_3.4.15]

I don't know that we should read anything into that difference -- seems like it could be happenstance. In any case, I think we're on very shaky ground loading dynamic C++ proc macros into the rustc process with our static libstdc++.


This also looks a bit related: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42679

Thanks a lot for all these details in the debugging process, it's really enjoyable to read.

My small contribution, which confirms that things can be right if the symbols are actually the same instead of clashing:
With rust 1.47.0 (preview release) compiled with the system llvm (11 rc2), itself built with the system libc++ (11 rc2) which is also used as the default system c++ stdlib, the proc macro ends up using the same libc++ as rustc itself, and the example project outputs 2 as expected.

Jonathan Wakely pointed me to one of his Stack Overflow answers:
https://stackoverflow.com/questions/46746878/is-it-safe-to-link-c17-c14-and-c11-objects/49119902#49119902

Where you have problems is if you link together objects compiled with different versions of GCC and you have used unstable features from a new C++ standard before GCC's support for that standard is complete.

But we're using GCC 5.5 and building for C++14, which he tells me should be fine in that regard.

There's another caveat about std::string vs. std::__cxx11::string, but our build of LLVM does appear to be using the new ABI there, and my Fedora 32 system is also using the new ABI when building the reproducer

I note that the answer (at least as I interpret it) seems to be about linking together a bunch of code (compiled against and potentially inlining header definitions etc), but still linking to a single copy off libstdc++ -- maybe I'm misinterpreting though?

Yeah, that SO situation is not a perfect match.

When I step through the reproducer's ostream<< in libstdc++-10.2.1-1.fc32.x86_64, here's where it goes wrong:

src/foo.cpp

  1 #include <string>
  2 #include <sstream>
  3  
  4 extern "C" int custom_to_string(unsigned int id) {
  5         std::stringstream os;
  6         os << id;
  7         return os.str().length();
  8 }

libstdc++-v3/include/bits/ostream.tcc

 60            template<typename _CharT, typename _Traits>
 61              template<typename _ValueT>
 62                basic_ostream<_CharT, _Traits>&
 63                basic_ostream<_CharT, _Traits>::
 64                _M_insert(_ValueT __v)
 65                {
 66                  sentry __cerb(*this);
 67                  if (__cerb)
 68                    {
 69                      ios_base::iostate __err = ios_base::goodbit;
 70                      __try
 71                        {
>72                          const __num_put_type& __np = __check_facet(this->_M_num_put);
 73                          if (__np.put(*this, *this, this->fill(), __v).failed())
 74                            __err |= ios_base::badbit;
 75                        }
 76                      __catch(__cxxabiv1::__forced_unwind&)
 77                        {
 78                          this->_M_setstate(ios_base::badbit);
 79                          __throw_exception_again;
 80                        }
 81                      __catch(...)
 82                        { this->_M_setstate(ios_base::badbit); }
 83                      if (__err)
 84                        this->setstate(__err);
 85                    }
 86                  return *this;
 87                }

At line 72, this->_M_num_put is NULL. Then _check_facet calls __throw_bad_cast(), which is caught and line 78 sets badbit. Back in custom_to_string, I can indeed see that stream state:

(gdb) p os.rdstate()
$15 = std::_S_badbit

That's why the stringstream is empty.

I still don't know why we get that NULL, but there was a tell in what I posted earlier -- just the facet lines...

Before:

binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEERKT_RKSt6locale' [GLIBCXX_3.4]

After:

binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9use_facetISt5ctypeIcEERKT_RKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]
binding file /lib64/libstdc++.so.6 [0] to /lib64/libstdc++.so.6 [0]: normal symbol `_ZSt9has_facetISt7num_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale' [GLIBCXX_3.4]

I guess that may need stepping through the stream constructor, but that's a mess of locale code...

If I change all dynamic symbols in libLLVM-11-rust-1.48.0-nightly.so that are of binding STB_GNU_UNIQUE and I change them to binding GLOBAL, the problem goes away.

fn main() {
    let mut file = std::fs::File::open("libLLVM-11-rust-1.48.0-nightly.so").unwrap();
    let mut elf = elfkit::elf::Elf::from_reader(&mut file).unwrap();
    elf.load_all(&mut file).unwrap();

    let dynsym = elf.sections.iter_mut().find(|sec| sec.name == b".dynsym").unwrap();
    let syms = dynsym.content.as_symbols_mut().unwrap();
    for sym in syms {
        if sym.bind == elfkit::types::SymbolBind::STB_GNU_UNIQUE {
            sym.bind = elfkit::types::SymbolBind::GLOBAL
        }
    }

    let mut f = std::fs::File::create("test.so").unwrap();
    elf.to_writer(&mut f).unwrap();
}

I found this by observing (with LD_DEBUG) that symbols with version LLVM_11 are being bound to symbols with version GLIBCXX_3.4. This is not normal behavior. I'm guessing there are some special rules for STB_GNU_UNIQUE.

     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
     25504: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-11-rust-1.48.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
...

Should we be building the LLVM shared object with some flags to only export LLVM symbols, not other unrelated symbols?

I'm not sure, but we may need libstdc++ interop symbols between librustc_driver.so (with crate rustc_llvm's C++ wrappers) and libLLVM.so.

I do see some STL use in rustc_llvm, but that looks to be all inlined/linked statically based on objdump -TC librustc_driver-cc82839033d44d0f.so|grep LLVM_11

I tried reading the LLVM CMake files but I can't make much out of it. According to https://github.com/llvm/llvm-project/blob/master/llvm/tools/llvm-shlib/CMakeLists.txt#L1-L3 by default the shared object should only contain the C functions, but that's clearly not the case for us and I don't see how we're enabling any deviation from the default.

Alternatively, we might compile LLVM with -fno-gnu-unique.

FWIW, I can also reproduce this with my own local rust/llvm build configured with [llvm] static-libstdcpp = true -- so the versions of the static and dynamic libstdc++ are identical.

Hmm, but it also requires [llvm] link-shared = true. When I set that false, the reproducer behaves properly.

For this post I consulted with my friendly neighborhood LLVM expert, @petrhosek.

The libstdc++ symbols get exported from libLLVM.so because we don't compile with -fvisibility=hidden. If they didn't get exported the dynamic linker wouldn't override them. However, turning on this flag would likely break the rustllvm bridge, which calls into LLVM's private C++ ABI.

So it looks like our options are one of:

  1. Document that proc macros must not dynamically link the C++ standard library (or more generally, any dynamic libraries other than the proc macro runtime)
  2. Run proc macros in a separate process
  3. Link LLVM statically
  4. Build rustllvm into libLLVM.so somehow (@petrhosek said we could build a libLLVM.a and then link that into a rustllvm shared library, but this would likely require adding support for building a static library to LLVM's build system)

I've opened up https://github.com/rust-lang/rust/pull/77634 to switch off ThinLTO in our x86_64-unknown-linux-gnu CI, which I believe would mean that we end up with a statically linked LLVM. It's not something I want to merge into master, since it currently would break downloading LLVM from CI (I think), but it is something I would feel comfortable backporting to stable, presuming the performance hit is not too big. (It's also performance every other platform has, which is not entirely unreasonable).

This is essentially option 3 above.

I suspect that in the long run option 4 is the right thing for us -- it seems feasible to do, at least, and we could also replace rustllvm with a commit directly to LLVM since we maintain a fork anyway which expands the C ABI for us (I think this is true)?

we could also replace rustllvm with a commit directly to LLVM since we maintain a fork anyway which expands the C ABI for us (I think this is true)?

Sort of, but not really. I guess you can say that rustc_llvm (formerly rustllvm) is that expanded C ABI, because it wraps C++ calls in an extern "C" API for Rust FFI. But this stands alone so it can be used with external LLVM too. The only added APIs within our forked LLVM are MCSubtargetInfo::getCPUTable and getFeatureTable, which are still C++, and neither are essential. It would be leaving external LLVM users (distros) behind if we made rust-forked APIs mandatory within LLVM.

Okay, looking at results from #77634, it looks like that's a very viable fix for stable (and likely 1.48 beta).

Wall times https://perf.rust-lang.org/compare.html?start=9fdaeb393a16951f6fdef087193fef576e36aba6&end=6f967be1d4ff00423ed16df7cb77f4074a9d6b80&stat=wall-time are pretty good -- definitely not bad enough that we can't ship it.

While the options @tmandry lists would work, they're not necessary. As I already pointed out, the C++ symbols in libLLVM.so have different symbol versions (LLVM_xxx), so shared objects that were linked against the system libstdc++ (GLIBCXX_xxx) should never link to them. There's just an oddity with some of the symbols where the dynamic loader is trying to make them globally unique in the process despite the symbol versioning that's applied. Additionally, rustc_llvm doesn't actually link to the STL symbols exported by libLLVM.so.

I don't know why this changed between LLVM 10 and LLVM 11, but here you can clearly see the difference:

$ objdump -T stable-x86_64-unknown-linux-gnu/lib/libLLVM-10-rust-1.46.0-stable.so |grep _ZNSs4nposE
000000000077cfd8  w   DO .rodata    0000000000000008  LLVM_10     _ZNSs4nposE
                  ^
                weak symbol
$ objdump -T nightly-x86_64-unknown-linux-gnu/lib/libLLVM-11-rust-1.48.0-nightly.so | grep _ZNSs4nposE
00000000008214b8 u    DO .rodata    0000000000000008  LLVM_11     _ZNSs4nposE
                 ^
                STB_GNU_UNIQUE symbol

So, again, here are the options I propose that are much less ā€œinvasiveā€ in the sense that they don't affect linking flows or code generation, just what symbols are exported from the libraries we already build and how:

  1. Avoid generating STB_GNU_UNIQUE-bound symbols, in one of three ways:
    a. Find out what changed between LLVM 10 and LLVM 11 to see if there's a flag we can set to prevent this behavior.
    b. Compile LLVM with the -fno-gnu-unique compiler flag
    c. Run the tool from https://github.com/rust-lang/rust/issues/76980#issuecomment-704480880 on libLLVM.so after LLVM is built.
  2. Change what symbols are exported from libLLVM.so. We only need the LLVM C/C++ API, nothing else.

    • Presumably there are some CMake flags we can use for this

Started building a pre-release with ThinLTO disabled in https://github.com/rust-lang/rust/pull/77642, so if that works I think we're covered for 1.47.0. We still need to fix it for 1.48.0 onwards, but for them we have the time to think about and test a proper solution (@jethrogb's proposals are interesting).

The pre-release is out! You can download it following these instructions: https://blog.rust-lang.org/inside-rust/2020/10/07/1.47.0-prerelease-2.html

I don't know why this changed between LLVM 10 and LLVM 11, but here you can clearly see the difference:

It's interesting that libLLVM-10-rust-1.46.0-stable.so had weak instead of unique symbols, and I believe you that changing this might avoid the problem, but I don't think this is the whole story. The exact regression in the LLVM 11 upgrade #73526 goes from e482c86b9de32c6392cb83aa97d72e22425163f9 to 7ce71c362be9a89e7897ac066aba6e3e6f747800, which can be inspected similarly:

$ objdump -T e482c86b9de32c6392cb83aa97d72e22425163f9/lib/libLLVM-10-rust-1.47.0-nightly.so | grep _ZNSs4nposE
00000000007b5f58 u    DO .rodata        0000000000000008  LLVM_10     _ZNSs4nposE

$ objdump -T 7ce71c362be9a89e7897ac066aba6e3e6f747800/lib/libLLVM-11-rust-1.47.0-nightly.so | grep _ZNSs4nposE
000000000081bb28 u    DO .rodata        0000000000000008  LLVM_11     _ZNSs4nposE

i.e. both have u = unique for that symbol.

I compared the entire list of symbols in those two files, and apart from obvious LLVM API differences, there are a bunch of new STL symbols now included in the file, including stream stuff that may be relevant to the reproducer. Those are all weak though, and the only additions in unique symbols are mersenne_twister_engine data.

Via cargo bisect-rustc, I found that the flip from w to u happened with the dist upgrade in #74163 -- another of mine, yay! But I suppose that makes sense as a possible change given how GCC describes this:

On systems with recent GNU assembler and C library, the C++ compiler uses the "STB_GNU_UNIQUE" binding ...

Interesting, so it's a combination of the two. With nightly-2020-07-22 I do see some symbols being bound incorrectly:

       541: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-2020-07-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-10-rust-1.47.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
       541: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-2020-07-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-10-rust-1.47.0-nightly.so [0]: normal symbol `_ZNSt8numpunctIcE2idE' [GLIBCXX_3.4]
       541: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-2020-07-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-10-rust-1.47.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
       541: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-2020-07-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-10-rust-1.47.0-nightly.so [0]: normal symbol `_ZNSt7__cxx118numpunctIcE2idE' [GLIBCXX_3.4.21]
       541: binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /home/jethro/.rustup/toolchains/nightly-2020-07-22-x86_64-unknown-linux-gnu/bin/../lib/../lib/libLLVM-10-rust-1.47.0-nightly.so [0]: normal symbol `_ZNSt7collateIcE2idE' [GLIBCXX_3.4]
...

But I suppose we got lucky and none of these have been used by proc macros (so far).

Also, I believe on Linux currently, libLLVM is statically linked to the (build) system libstdc++. Symbols such as _S_empty_rep_storage and std::__cxx11 don't appear in LLVM's libcxx, but do appear in libLLVM's symbol table.

@cuviper regarding your earlier code investigation, _check_facet has something to do with locale handling. All the ::idunique symbols also have something to do with locale handling.

Also, I believe on Linux currently, libLLVM is statically linked to the (build) system libstdc++. Symbols such as _S_empty_rep_storage and std::__cxx11 don't appear in LLVM's libcxx, but do appear in libLLVM's symbol table.

In CI, it's statically linked to our own build of GCC 5.5 libstdc++, because the system GCC is too old to build LLVM.

But yes, we use libstdc++ and not libcxx, which is the default position for Linux. I think we would have a lot more ABI problems if we had a mix of static libcxx with a user's libstdc++ proc-macro. Similarly, if a user overrode the defaults to instead use libcxx in their proc-macro today, I don't expect it would go well...

I think we would have a lot more ABI problems if we had a mix of static libcxx with a user's libstdc++ proc-macro. Similarly, if a user overrode the defaults to instead use libcxx in their proc-macro today, I don't expect it would go well...

Why would there be any (more) problems? We desperately want to avoid any cross-contamination/cross-linking between LLVM's use of stdc++ and any plugin libraries, that's what this issue is about?

I expect that libstdc++ and libcxx would still overlap in _some_ symbols, while they of course differ in internal details. It seems bad enough when we're mixing libstdc++ versions, but wholly different libraries are a new ballgame.

(long technical analysis follows, for TLDR scroll down until the horizontal line)

In my /usr/lib/x86_64-linux-gnu/libstdc++.so.6, there's this code:

0000000000101300 <_ZSt9has_facetISt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEEEbRKSt6locale@@GLIBCXX_3.4>:
  101300:       53                      push   %rbx
  101301:       48 89 fb                mov    %rdi,%rbx
  101304:       48 8b 3d 6d a3 27 00    mov    0x27a36d(%rip),%rdi        # 37b678

The global variable at 0x37b678 is:

000000000037b678 R_X86_64_GLOB_DAT  _ZNSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE2idE@@GLIBCXX_3.4

We can see how the dynamic loader is filling in this relocation.

With the libLLVM-11-rust-1.48.0-nightly.so from a recent nightly the value at libstdc++.so.6 relative address 0x37b678 is 0x7f5d6abf4508. Memory map around that point:

0x7f5d66213000     0x7f5d67d55000  0x1b42000        0x0 libLLVM-11-rust-1.48.0-nightly.so
0x7f5d67d55000     0x7f5d6a8a9000  0x2b54000  0x1b41000 libLLVM-11-rust-1.48.0-nightly.so
0x7f5d6a8a9000     0x7f5d6abd9000   0x330000  0x4694000 libLLVM-11-rust-1.48.0-nightly.so
0x7f5d6abd9000     0x7f5d6abf2000    0x19000  0x49c3000 libLLVM-11-rust-1.48.0-nightly.so
0x7f5d6abf2000     0x7f5d6ac51000    0x5f000        0x0 

At this location we find:

00000000049e1508 u    DO .bss   0000000000000008  LLVM_11     _ZNSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE2idE

With the libLLVM-10-rust-1.47.0-nightly.so from nightly-2020-07-22 the value at libstdc++.so.6 relative address 0x37b678 is 0x7ffff2dbd2e0. Memory map around that point:

7fffee76f000-7fffeff1e000 r--p 00000000 103:05 6196764 libLLVM-10-rust-1.47.0-nightly.so
7fffeff1e000-7ffff286e000 r-xp 017af000 103:05 6196764 libLLVM-10-rust-1.47.0-nightly.so
7ffff286e000-7ffff2c23000 r--p 040ff000 103:05 6196764 libLLVM-10-rust-1.47.0-nightly.so
7ffff2c23000-7ffff2dbc000 rw-p 044b4000 103:05 6196764 libLLVM-10-rust-1.47.0-nightly.so
7ffff2dbc000-7ffff2e11000 rw-p 00000000 00:00 0 

At this location we find:

000000000464e2e0 u    DO .bss   0000000000000008  LLVM_10     _ZNSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE2idE

With the libLLVM-10-rust-1.46.0-stable.so the value at libstdc++.so.6 relative address 0x37b678 is 0x7fffe8af1308. Memory map around that point:

7fffe8770000-7fffe88e2000 r-xp 00000000 103:05 2097197 libstdc++.so.6.0.21
7fffe88e2000-7fffe8ae2000 ---p 00172000 103:05 2097197 libstdc++.so.6.0.21
7fffe8ae2000-7fffe8aec000 r--p 00172000 103:05 2097197 libstdc++.so.6.0.21
7fffe8aec000-7fffe8aee000 rw-p 0017c000 103:05 2097197 libstdc++.so.6.0.21
7fffe8aee000-7fffe8af2000 rw-p 00000000 00:00 0 

At this location we find:

0000000000381308 u    DO .bss   0000000000000008  GLIBCXX_3.4 _ZNSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE2idE

So you can see that since the build host was upgraded to a newer toolchain, this global variable has been bound incorrectly to a symbol of the same name but different version.

With my patched libLLVM-11-rust-1.48.0-nightly.so, the value at libstdc++.so.6 relative address 0x37b678 is 0x7ff189912308. Memory map around that point:

0x7ff189591000     0x7ff189703000   0x172000        0x0 libstdc++.so.6.0.21
0x7ff189703000     0x7ff189903000   0x200000   0x172000 libstdc++.so.6.0.21
0x7ff189903000     0x7ff18990d000     0xa000   0x172000 libstdc++.so.6.0.21
0x7ff18990d000     0x7ff18990f000     0x2000   0x17c000 libstdc++.so.6.0.21
0x7ff18990f000     0x7ff189913000     0x4000        0x0 

Which again is 0x381308 relative in libstdc++.so.6.0.21 (same as with libLLVM-10-rust-1.46.0-stable.so above).

What is stored at _ZNSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE2idE? It's supposed to be a unique index into an array of facets, generated at module initialization time. However, if the symbol is bound incorrectly, instead of assigning a fresh unique index the code will just reuse the previous index from the other instance of libstdc++. Since the indices are shared but the array is not, this results in bad things. This then all ties back to the __check_facet code @cuviper was looking at before.


All this is to say, things were certainly going wrong already with LLVM 10 after the build system upgrade, we just weren't observing it. Perhaps due to a different number of ::id symbols being shared between the two libstdc++ instances the relevant indices happened to line up.

I stand by my original point that all problems are caused by the incorrect sharing of the unique symbols that have the same name but different version. For regular global and weak symbols there are no problems (i.e. I've not observed the dynamic loader linking them incorrectly). There are no problems for those symbols because they're versioned, so there shouldn't be any actual overlap. It's totally fine to use multiple instances of libstdc++ in the same address space as long as the symbols don't clash (e.g. using different symbol versioning).

Another solution that hasn't been proposed yet is to load the proc macro library with dlmopen(LM_ID_NEWLM, ...). I haven't tested if this fixes the incorrect sharing of unique symbols.

@jethrogb @cuviper Would either of you be up for opening a PR against master in the next couple weeks that would fix this? It sounds like you've both got some context on this.

If not, I am going to open a PR similar to #78148 which disables ThinLTO on master as well since it's not a big regression in performance.

I'd be happy to help but first we need to narrow down the solution we'd want to go for.

I think the options are still the ones laid out in https://github.com/rust-lang/rust/issues/76980#issuecomment-704725529, right?

From those I think option (1) is probably the better one, since even just LLVM symbols would likely still conflict with other LLVMs that the proc macro might want to depend on (e.g., if it wants a more recent or older LLVM, perhaps for iOS bitcode or something like that). I would prefer an upstream solution, I don't think manually patching binaries generated by LLVM's build process is a good idea (seems quite brittle). I am unopposed to the -fno-gnu-unique flag presuming it is sufficiently compatible that it would work on older compilers and clang/gcc both; mostly if it works out on CI that's probably good enough, but since we eventually want ThinLTO with LLVM on macOS and Windows as well it seems like being future-proof with this would be good.

ping @rust-lang/wg-llvm : is there anyone able to assist with investigating what change between LLVM 10 and LLVM 11 actually caused this, as described in https://github.com/rust-lang/rust/issues/76980#issuecomment-704725529

@pnkfelix I think that's a red herring, see https://github.com/rust-lang/rust/issues/76980#issuecomment-705800997

@jethrogb Are there any potential downsides to passing -fno-gnu-unique when compiling LLVM? It seems like out of the options we have available that's the easiest and least invasive one, right?

I'm not aware of any downsides, but I also don't know what the purpose is of this unique type of symbols and why GCC added this special class. It is now my preferred option as well.

I'm hoping I can coerce x.py to use that flag sometime soon. Happy to mentor someone on that as well.

Do we need that flag in the build of libstdc++ as well? (Feasible since we do build our own toolchain for dist...)

How does the statically-linked version of libLLVM get built? I ran the dist-x86_64-linux builder but that results in a dynamically-linked libstdc++.

The GCC version in the build container is new enough to generate unique symbols but too old to have -fno-gnu-unique :sob:

PR: #78946

Reopening as E-needs-test, but dropping the regression label. See discussion on #78946 for some context there.

Not opened yet.

Was this page helpful?
0 / 5 - 0 ratings