Rust: 1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28 or older.

Created on 5 Jun 2019  路  20Comments  路  Source: rust-lang/rust

Originally reported as #60593, and appears to be caused by #54592, this is my reduction:

  • common.rs
pub static STATIC: u8 = 0;
pub fn get() -> *const u8 { &STATIC }
  • plugin.rs
extern crate common;
#[no_mangle]
pub fn run() {
    println!("plugin: {:p}", common::get as *const ());
    assert_eq!(common::get(), &common::STATIC as *const _);
}
  • driver.rs
extern crate common;
fn main() {
    println!("driver: {:p}", common::get as *const ());
    // You can comment this line to make it work.
    assert_eq!(common::get(), &common::STATIC as *const _);
    unsafe {
        extern "C" {
            fn dlopen(filename: *const u8, flag: i32) -> *mut u8;
            fn dlsym(handle: *mut u8, symbol: *const u8) -> *const u8;
        }
        std::mem::transmute::<*const u8, fn()>(dlsym(
            dlopen(b"./libplugin.so\0".as_ptr(), 1),
            b"run\0".as_ptr(),
        ))()
    }
}

How to run it:

RELEASE=nightly-2018-10-11
rustup toolchain add $RELEASE

echo 'System toolchain versions:'
ldd --version | head -n1 | sed 's/^ldd //'
ld.bfd --version | head -n1
ld.gold --version | head -n1
echo

rustc +$RELEASE common.rs --crate-type=lib
# You can change this from dylib to cdylib to make it work.
rustc +$RELEASE plugin.rs --crate-type=dylib -L.
# You can uncomment -Z plt=yes to make it work.
rustc +$RELEASE driver.rs -L. # -Z plt=yes
./driver

I was able to try this out with NixOS 18.03 and 18.09 releases, via:

NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.03.tar.gz \
    nix-shell -p stdenv --run ./test.sh
NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.09.tar.gz \
    nix-shell -p stdenv --run ./test.sh


Example of failure (using a NixOS 18.03 toolchain):

info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'

  nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)

System toolchain versions:
(GNU libc) 2.26
GNU ld (GNU Binutils) 2.28.1
GNU gold (GNU Binutils 2.28.1) 1.14

driver: 0x55d244a55820
plugin: 0x55d244a55820
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0x55d244a9e7ac`,
 right: `0x7f87a69b3f19`', plugin.rs:5:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Example of success (by switching to NixOS 18.09):

info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'

  nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)

System toolchain versions:
(GNU libc) 2.27
GNU ld (GNU Binutils) 2.30
GNU gold (GNU Binutils 2.30) 1.15

driver: 0x5649e7dbf4f0
plugin: 0x7fca6df40930

Other ways to get it to succeed:

  • use a Rust version before #54592 (e.g. RELEASE=nightly-2018-10-11)
  • pass -Z plt=yes when compiling driver

    • restores pre-#54592 behavior

  • change the crate type of plugin to cdylib (instead of dylib)

    • presumably common::get is private in that case?

    • not a general fix AFAIK, but I'm not sure how to repro with cdylib

  • don't call common::get in driver

    • common::get won't get looked up early, which means driver's copy won't override plugin's (within the context of affected toolchains)

  • set RTLD_DEEPBIND when calling dlopen

    • common::get from driver is ignored when looking up the use from plugin

cc @alexcrichton @rust-lang/compiler

O-linux P-high T-compiler regression-from-stable-to-stable

Most helpful comment

I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro,

I think 2.27 only added a way to configure the default:

* Add a configure option --enable-relro to decide whether -z relro should
  be enabled in ELF linker by default.  Default to yes for all Linux
  targets except FRV, HPPA, IA64 and MIPS.  Note that -z relro can increase
  disk and memory size.

But -z relro has existed much longer -- I see references way back in ChangeLog-2004.

All 20 comments

Looks like relro support was added in GNU binutils 2.28, so maybe it had some bugs?
There isn't any 2.29 binutils build on the official NixOS build servers so I can't easily test that version.

How confident are we that PLT entirely negates this issue other than just accidentally preventing it from triggering in this particular case?

To me it seems that exactly what happens with PLT here is prevention and the core issue still remains regardless of whether PLT is enabled or not.

Not opposed to flipping back the default for now, though.

I agree, I have no idea what the actual bug is yet, just what affects this specific reduction.
Note that in all cases that work, common::get is not deduplicated, as opposed to both common::get and common::STATIC being deduplicated.

What I suspect is happening with -Z plt=yes is bypassing a buggy linker feature.
But I'd have to bisect binutils to prove that. Which I might still do, I just hope it's easier than glibc.

@eddyb an easy way to (dis-)prove linker fault is to attempt linking with lld.

triage: we need to discuss this at today's meeting before I'll be comfortable assigning a priority to it.

discussed in triage meeting. P-high. Removing nomination tag. Assigning to @nagisa and myself.

Linking with lld works. Using clang instead of gcc as a ld driver also fails. This does indeed point all the fingers to ld. I鈥檒l try to close down on the ld version now.

This issue is no longer present with binutils 2.29.0.

@nagisa @alexcrichton Is there precedent for checking the version of the linker to work around bugs?
I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro, do we check for that somehow, and fallback, or are we asking for relro in a way that gets ignored?

At least 2.27 also fails so not relro related. Waiting for 2.26 to build right now.

2.26 and 2.23.1 also fail (although 2.23.1 only prints the addresses and exits with exit code 0 before calling into the plugin), so it is probably safe to say that this issue is present in all relevant versions of the GNU toolchain before 2.29.

@nagisa Oh, right, I forgot, the bug was fixed after 2.28 but I never tested earlier versions.

We have only one rudimentary check for support of a c compiler flag (-pie I think) but other than that we don't have any precedence for c compiler or linker version checking in rustc itself

I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro,

I think 2.27 only added a way to configure the default:

* Add a configure option --enable-relro to decide whether -z relro should
  be enabled in ELF linker by default.  Default to yes for all Linux
  targets except FRV, HPPA, IA64 and MIPS.  Note that -z relro can increase
  disk and memory size.

But -z relro has existed much longer -- I see references way back in ChangeLog-2004.

pre-triage stray thought: While I worry about trying to change compiler behavior based on a dynamically detected linker version, I would have fewer objections to use such detection just to drive the emission of a diagnostic warning. Where said diagnostic tells people to take one of the aforementioned steps to work around the bug.

nominating for discussion at T-compiler meeting.

This has been discussed briefly during the meeting last week, but no real conclusion was arrived at.

Some points made during aforementioned meeting that I thought were worth transcribing here:

  • nagisa: "the warning message based on just linker version would lead to false positives more often than not, but I don鈥檛 know of any other solution that鈥檚 not reenabling plt by default."
  • pnkfelix: " is there particular pattern of codegen we could associate the diagnostic with, to try to lower the false positive rate?"

    • nagisa: "Maybe? Ultimately to trigger that particular bug a dynamic library needs to be loaded at runtime (via e.g. dlopen)"

  • nagisa: "Most relevant case where this almost always happen are compiler plugins"

I haven't taken any time to look into this since I last commented here (about two months ago). Its tempting to downgrade this to P-medium with justification "this is a bug in ld that is fixed in a later version." But I don't want to do that without at least surveying the current popular GNU-based distributions.

I tried landing a lint to detect this case (PR #66839) but no one seems terribly enthused about that approach, at least not at this time. (Maybe it would have gotten more traction sooner after @eddyb found the bug.)

In that PR, we did get a survey of what version of ld is used in popular GNU distributions, taken from https://repology.org/project/binutils/versions

So, I will just close this issue as a bug in GNU ld that is fixed in later versions and thus not something we are going to workaround.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nikomatsakis picture nikomatsakis  路  259Comments

withoutboats picture withoutboats  路  211Comments

nikomatsakis picture nikomatsakis  路  268Comments

Mark-Simulacrum picture Mark-Simulacrum  路  681Comments

aturon picture aturon  路  417Comments