Originally reported as #60593, and appears to be caused by #54592, this is my reduction:
common.rspub static STATIC: u8 = 0;
pub fn get() -> *const u8 { &STATIC }
plugin.rsextern crate common;
#[no_mangle]
pub fn run() {
println!("plugin: {:p}", common::get as *const ());
assert_eq!(common::get(), &common::STATIC as *const _);
}
driver.rsextern crate common;
fn main() {
println!("driver: {:p}", common::get as *const ());
// You can comment this line to make it work.
assert_eq!(common::get(), &common::STATIC as *const _);
unsafe {
extern "C" {
fn dlopen(filename: *const u8, flag: i32) -> *mut u8;
fn dlsym(handle: *mut u8, symbol: *const u8) -> *const u8;
}
std::mem::transmute::<*const u8, fn()>(dlsym(
dlopen(b"./libplugin.so\0".as_ptr(), 1),
b"run\0".as_ptr(),
))()
}
}
How to run it:
RELEASE=nightly-2018-10-11
rustup toolchain add $RELEASE
echo 'System toolchain versions:'
ldd --version | head -n1 | sed 's/^ldd //'
ld.bfd --version | head -n1
ld.gold --version | head -n1
echo
rustc +$RELEASE common.rs --crate-type=lib
# You can change this from dylib to cdylib to make it work.
rustc +$RELEASE plugin.rs --crate-type=dylib -L.
# You can uncomment -Z plt=yes to make it work.
rustc +$RELEASE driver.rs -L. # -Z plt=yes
./driver
I was able to try this out with NixOS 18.03 and 18.09 releases, via:
NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.03.tar.gz \
nix-shell -p stdenv --run ./test.sh
NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.09.tar.gz \
nix-shell -p stdenv --run ./test.sh
Example of failure (using a NixOS 18.03 toolchain):
info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'
nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)
System toolchain versions:
(GNU libc) 2.26
GNU ld (GNU Binutils) 2.28.1
GNU gold (GNU Binutils 2.28.1) 1.14
driver: 0x55d244a55820
plugin: 0x55d244a55820
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `0x55d244a9e7ac`,
right: `0x7f87a69b3f19`', plugin.rs:5:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
Example of success (by switching to NixOS 18.09):
info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'
nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)
System toolchain versions:
(GNU libc) 2.27
GNU ld (GNU Binutils) 2.30
GNU gold (GNU Binutils 2.30) 1.15
driver: 0x5649e7dbf4f0
plugin: 0x7fca6df40930
Other ways to get it to succeed:
RELEASE=nightly-2018-10-11)-Z plt=yes when compiling driverplugin to cdylib (instead of dylib)common::get is private in that case?cdylibcommon::get in drivercommon::get won't get looked up early, which means driver's copy won't override plugin's (within the context of affected toolchains)RTLD_DEEPBIND when calling dlopencommon::get from driver is ignored when looking up the use from plugincc @alexcrichton @rust-lang/compiler
Looks like relro support was added in GNU binutils 2.28, so maybe it had some bugs?
There isn't any 2.29 binutils build on the official NixOS build servers so I can't easily test that version.
How confident are we that PLT entirely negates this issue other than just accidentally preventing it from triggering in this particular case?
To me it seems that exactly what happens with PLT here is prevention and the core issue still remains regardless of whether PLT is enabled or not.
Not opposed to flipping back the default for now, though.
I agree, I have no idea what the actual bug is yet, just what affects this specific reduction.
Note that in all cases that work, common::get is not deduplicated, as opposed to both common::get and common::STATIC being deduplicated.
What I suspect is happening with -Z plt=yes is bypassing a buggy linker feature.
But I'd have to bisect binutils to prove that. Which I might still do, I just hope it's easier than glibc.
@eddyb an easy way to (dis-)prove linker fault is to attempt linking with lld.
triage: we need to discuss this at today's meeting before I'll be comfortable assigning a priority to it.
discussed in triage meeting. P-high. Removing nomination tag. Assigning to @nagisa and myself.
Linking with lld works. Using clang instead of gcc as a ld driver also fails. This does indeed point all the fingers to ld. I鈥檒l try to close down on the ld version now.
This issue is no longer present with binutils 2.29.0.
https://sourceware.org/bugzilla/show_bug.cgi?id=20995 may be related.
@nagisa @alexcrichton Is there precedent for checking the version of the linker to work around bugs?
I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro, do we check for that somehow, and fallback, or are we asking for relro in a way that gets ignored?
At least 2.27 also fails so not relro related. Waiting for 2.26 to build right now.
2.26 and 2.23.1 also fail (although 2.23.1 only prints the addresses and exits with exit code 0 before calling into the plugin), so it is probably safe to say that this issue is present in all relevant versions of the GNU toolchain before 2.29.
@nagisa Oh, right, I forgot, the bug was fixed after 2.28 but I never tested earlier versions.
We have only one rudimentary check for support of a c compiler flag (-pie I think) but other than that we don't have any precedence for c compiler or linker version checking in rustc itself
I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports
relro,
I think 2.27 only added a way to configure the default:
* Add a configure option --enable-relro to decide whether -z relro should
be enabled in ELF linker by default. Default to yes for all Linux
targets except FRV, HPPA, IA64 and MIPS. Note that -z relro can increase
disk and memory size.
But -z relro has existed much longer -- I see references way back in ChangeLog-2004.
pre-triage stray thought: While I worry about trying to change compiler behavior based on a dynamically detected linker version, I would have fewer objections to use such detection just to drive the emission of a diagnostic warning. Where said diagnostic tells people to take one of the aforementioned steps to work around the bug.
nominating for discussion at T-compiler meeting.
This has been discussed briefly during the meeting last week, but no real conclusion was arrived at.
Some points made during aforementioned meeting that I thought were worth transcribing here:
I haven't taken any time to look into this since I last commented here (about two months ago). Its tempting to downgrade this to P-medium with justification "this is a bug in ld that is fixed in a later version." But I don't want to do that without at least surveying the current popular GNU-based distributions.
I tried landing a lint to detect this case (PR #66839) but no one seems terribly enthused about that approach, at least not at this time. (Maybe it would have gotten more traction sooner after @eddyb found the bug.)
In that PR, we did get a survey of what version of ld is used in popular GNU distributions, taken from https://repology.org/project/binutils/versions
So, I will just close this issue as a bug in GNU ld that is fixed in later versions and thus not something we are going to workaround.
Most helpful comment
I think 2.27 only added a way to configure the default:
But
-z relrohas existed much longer -- I see references way back inChangeLog-2004.