Describe the bug
llvmPackages_11 produces some executables that segmentation fault on use and other binaries that appear to work but compile executables that segmentation fault themselves. Can someone please test if they can reproduce this? The current build running on https://hydra.nixos.org/job/nixos/trunk-combined/tested appears to have this issue. Programs built with this will build successfully but only fail upon being executed. I have not tested if this is limited to llvmPackages_11 or if other versions are affected.
To Reproduce
$ git checkout master
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/tmp/nix-shell-22937-0/rc: line 1: 22984 Segmentation fault (core dumped) ld
$ git checkout upstream/nixos-unstable
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/p792j5f44l3f0xi7ai5jllwnxqwnka88-binutils-2.31.1/bin/ld: no input files
master is currently 662d6a9cb1aaeb4081d75485b218f80b7b8c8727
upstream/nixos-unstable is currently 24c9b05ac53e422f1af81a156f1fd58499eb27fb
Additional context
I have been trying to get Chromium built with LLD to run for the past two days wondering why my changes always cause it to segmentation fault despite the fact they should work.
Notify maintainers
@lovek323
@7c6f434c
@dtzWill
@primeos
@ggreif
@DieGoldeneEnte
Metadata
"x86_64-linux"
Linux 5.8.14, NixOS, 21.03pre246543.24c9b05ac53 (Okapi)
yes
yes
nix-env (Nix) 2.3.7
""
"nixos-21.03pre246543.24c9b05ac53"
/nix/var/nix/profiles/per-user/root/channels/nixos
git bisect yields:
30286ebcc178aec1fb3797ba3ec88cf75feb282b is the first bad commit
https://github.com/NixOS/nixpkgs/pull/95089
@edolstra
@ma27
Current guess: https://sourceware.org/bugzilla/show_bug.cgi?id=26534
https://en.wikipedia.org/wiki/FMA_instruction_set:
FMA4 is supported in AMD processors starting with the Bulldozer architecture. FMA4 was performed in hardware before FMA3 was. Support for FMA4 was removed since Zen 1.
What CPUs does https://hydra.nixos.org/ use? Does it automatically deploy to nixos-unstable? This might get past testing.
https://nixos.wiki/wiki/Nix_channels:
The nixos.org server has a cronjob for which nixos-channel-scripts are executed and poll for the newest jobset that satisfies the above two conditions and trigger a channel update.
I am going to risk being a little obnoxious to prevent nixos-unstable from being broken.
First of all, I think you've meant @edolstra
not @eelco
, please make sure you're using the right github handles for maintainers (those can be found in maintainers/maintainer-list.nix
.
What CPUs does https://hydra.nixos.org/ use? Does it automatically deploy to nixos-unstable? This might get past testing.
nixos-unstable
gets bumped as soon as a critical set of tests is successfully built on the CI infrastructure[1]. Unfortunately it's a known issue that we sometimes stumble upon problems related to the required instruction sets.
Current guess: https://sourceware.org/bugzilla/show_bug.cgi?id=26534
Will prepare a patch this weekend which fixes the package, thanks for the link!
I am going to risk being a little obnoxious to prevent nixos-unstable from being broken.
Sorry, I don't understand what you mean.
[1] Explanation can be found at https://howoldis.herokuapp.com/
https://github.com/NixOS/nixpkgs/blob/c06b4077745fe1afa3a7e3e3d4504e0bd35b414c/pkgs/development/libraries/glibc/common.nix#L237
https://github.com/NixOS/nixpkgs/blob/c06b4077745fe1afa3a7e3e3d4504e0bd35b414c/maintainers/maintainer-list.nix#L27
Sorry, handle != github is not obvious.
If I understand correctly, https://hydra.nixos.org/build/128667841 contains this issue and will cause nixos-unstable to be broken. I changed the title to hopefully get someone to cancel the build before it hits production and breaks many systems.
Can't reproduce (see output below, I get the same ld
). What path does your failing ld have (the command I gave doesn't really return what I wanted :/)?
Also note, I have a CPU with fma and no fma4:
$ grep fma /proc/cpuinfo | uniq
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
$ git status
HEAD detached at 662d6a9cb1a
$ nix-build . -A llvmPackages_11.bintools
these derivations will be built:
/nix/store/bv37icm9s803q7farwcr7l03wqm54dpw-llvm-binutils-11.0.0.drv
building '/nix/store/bv37icm9s803q7farwcr7l03wqm54dpw-llvm-binutils-11.0.0.drv'...
/nix/store/08flkad738j47qp9bz2gvvaj0j364nc8-llvm-binutils-11.0.0
$ result/bin/ld
ld: error: no input files
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/hiwz81i1g3fn3p0acjs042a4h5fri6dh-binutils-2.31.1/bin/ld: no input files
Metadata:
"x86_64-linux"
Linux 5.8.11, NixOS, 21.03pre244932.84d74ae9c9c (Okapi)
yes
yes
nix-env (Nix) 2.3.7
""
"nixos-21.03pre244932.84d74ae9c9c"
/nix/var/nix/profiles/per-user/root/channels/nixos
EDIT: the command I gave doesn't really return what I wanted :/
@DieGoldeneEnte well, it's not reproducible because your system is still on glibc 2.31.
$ git checkout 662d6a9cb1aaeb4081d75485b218f80b7b8c8727
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"'
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
My system is still on glibc 2.31:
nix-store --query --requisites /run/current-system | rg glibc
/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
/nix/store/fgn3sih5vi7543jcw389a7qqax8nwkhz-glibc-2.31-bin
/nix/store/1xpr86xg998h5acn4zqrx58xjdjqnnds-glibc-locales-2.31
/nix/store/4wy9j24psf9ny4di3anjs7yk2fvfb0gq-glibc-2.31-dev
/nix/store/62m4vwphy3d6ayrs7jb5lw0v6bx45p0r-glibc-iconv-2.31
/nix/store/7k3vlmpvzjqrc7lbmz1csbhg5d1rn4fw-glibc-2.31
/nix/store/0nn11qvv1y9agkhiaaw5fhy2dpakp7km-glibc-multi-2.31
/nix/store/5gw82lwj14y3dara9i10x0r6yy43v3xp-glibc-2.31-bin
/nix/store/7isv255s5h9r4gbny93cgsl5h1msk02s-glibc-2.31-dev
/nix/store/f0brxzf4wdgcl54d1calpxj7k2yfmib3-glibc-multi-2.31-dev
/nix/store/skn0nnan2jgkcr0kkpn6k3nc2zfrakmd-glibc-multi-2.31-bin
So we use the 2.31 upstream _branch_? It's a bit ugly patching but OK I guess; precedent+explanation: 51cf42ad0.
⬆️ PR instead of pushing directly, so that people have time to react. With current Hydra speed it will take very long anyway.
I just tested:
$ git checkout master
$ git pull upstream master
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/tmp/nix-shell-25687-0/rc: line 1: 25732 Segmentation fault (core dumped) ld
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | rg glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev
So maybe the FMA issue is not causing this?
Well, SIGILL should not be reported as "Segmentation fault".
I get the same output with the patch except for the segfault (also on an older Intel-based machine). Can you also try:
nix-shell --pure -A busybox --run ld
This reduces the amount of packages that need to be rebuild (and at least for me leads to executing the same binary):
$ git status
HEAD detached at upstream/master
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A busybox --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A busybox --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev
$
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev
Both llvmPackages_10 and llvmPackages on master segmentation fault as well.
@DieGoldeneEnte:
nixos-rebuild boot --upgrade
so we are on the same kernel version, Linux 5.8.14.hardware.cpu.amd.updateMicrocode
or hardware.cpu.intel.updateMicrocode
are true.git status
say nothing to commit, working tree clean
?CPU microarchitecture is Zen2.
I did the upgrade yesterday after I started trying to reproduce.
I omitted the rest of git status
, since I had some notes (pure text-files) in the nixpkgs-root, I moved them for now.
$ uname -r
5.8.14
$ nixos-option hardware.cpu.amd.updateMicrocode
Value:
true
Default:
false
Type:
"boolean"
Description:
''
Update the CPU microcode for AMD processors.
''
Declared by:
[ "/nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/hardware/cpu/amd-microcode.nix" ]
Defined by:
[ "/etc/nixos/machine/IDEFIX-NIX.nix" ]
$ git status
HEAD detached at upstream/master
nothing to commit, working tree clean
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev
I ran nix-store --verify --check-contents --repair
without issue and still segmentation fault. Can some other people please test and list their CPU microarchitecture?
Both llvmPackages_10 and llvmPackages on master segmentation fault as well.
I am pretty sure the problem is not with llvm, since ld
is not in clang-unwrapped, only in wrapped clang and llvm-binutils (and only llvm-binutils uses lld
, wrapped-clang uses ld
from binutils which doesn't depend on llvm).
That's why think this should be enough to test:
nix-shell --pure -A busybox --run ld
It executes the same binary (for me) as the llvmPackages_11.clang-unwrapped
one.
EDIT: llvm-version doesn't matter (getting the executable for versions 5 to 11 on master):
$ for v in {5..11} ; do nix-shell --pure -A llvmPackages_$v.clang-unwrapped --run 'realpath "$(command -v ld)"'; done
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
Since a build with glibc 2.32 got past Hydra I am now no longer able to upgrade my system:
$ sudo nixos-rebuild boot --upgrade
unpacking channels...
Segmentation fault
I can't even rebuild anymore!
$ sudo nixos-rebuild boot
Segmentation fault
$ nix-store --query --requisites /run/current-system | rg glibc
/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
/nix/store/fgn3sih5vi7543jcw389a7qqax8nwkhz-glibc-2.31-bin
/nix/store/7k3vlmpvzjqrc7lbmz1csbhg5d1rn4fw-glibc-2.31
/nix/store/1xpr86xg998h5acn4zqrx58xjdjqnnds-glibc-locales-2.31
/nix/store/4wy9j24psf9ny4di3anjs7yk2fvfb0gq-glibc-2.31-dev
/nix/store/62m4vwphy3d6ayrs7jb5lw0v6bx45p0r-glibc-iconv-2.31
/nix/store/0nn11qvv1y9agkhiaaw5fhy2dpakp7km-glibc-multi-2.31
/nix/store/5gw82lwj14y3dara9i10x0r6yy43v3xp-glibc-2.31-bin
/nix/store/7isv255s5h9r4gbny93cgsl5h1msk02s-glibc-2.31-dev
/nix/store/f0brxzf4wdgcl54d1calpxj7k2yfmib3-glibc-multi-2.31-dev
/nix/store/skn0nnan2jgkcr0kkpn6k3nc2zfrakmd-glibc-multi-2.31-bin
You should be able to rollback the channel with:
# nix-channel --rollback
You could maybe also try to start ld in gdb and look at the backtrace (don't know how much info you can get from it).
EDIT: Not the executable you get from realpath above just points you to the wrapper, you need the executable at the bottom (/nix/store/p792j5f44l3f0xi7ai5jllwnxqwnka88-binutils-2.31.1/bin/ld
for me).
I figured it out. Setting environment.memoryAllocator.provider = "scudo";
breaks with glibc 2.32. I did not think that it would matter since I was using --pure
.
Well... I don't think this preloading we have in environment.memoryAllocator
is done very well. Looking at ldd result/lib/linux/libclang_rt.scudo-x86_64.so
, there's even libstdc++, so I suspect you force-override all these libraries in the packages you run. Of course, in usual cases they will be the same but not always – Nix people are used to mixing versions sometimes. IMHO memory allocator overrides are meant to just replace a few particular functions, not whole libc, libstdc++, etc.
EDIT: still, I also don't know why runs in nix-shell --pure
could be affected.
This is a known issue I believe, see #93116. For now the easiest work around is when you run into it is mv /etc/ld-nix.so.preload /etc/ld-nix.so.preload.tmp
. I am not sure if it is an issue with preloading in environment.memoryAllocator
being done badly. Firefox will segfault and so will Chromium when used with any memory allocator other than what they bundle. (Unless you select build flags which disable their memory allocators they use.) I think dynamic loading is naturally fragile and you should expect weird things.
As for why nix-shell --pure
is affected, doing nix-shell --pure -p hello --run 'env | grep glibc'
shows it loads glibc which has the patch that looks for /etc/ld-nix.so.preload
even in nix-shell --pure
.
One could possible expand the patch to check IN_NIX_SHELL
and if it is set to pure
do not load /etc/ld-nix.so.preload
.
One could possible expand the patch to check IN_NIX_SHELL and if it is set to pure do not load /etc/ld-nix.so.preload.
possible but I'm not really convinced. It would add even more magic and would it be a significant improvement?
If "pure" shells preserve $HOME
, why not global preloads? If someone wants a hardened memory allocator _system-wide_, should their development environments be an exception? I don't know for sure. Perhaps nix-shell
is trying to serve too different purposes by the same approach 🤷🏽 and in that situation I'd personally prefer more simplicity.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/warning-unknown-setting-extra-sandbox-paths/9836/4
Oh god, just spent 4 hours wrestling with half the binaries on my system segfaulting because of this after an upgrade to 20.09 (and the hardened profile). Removing /etc/ld-nix.so.preload
fixed it, thanks so much @S-NA!
The really bad thing that happened however is that if I tried to boot back into a previous system version, init itself segfaulted! I don't understand how adding a boot entry for a new boot entry corrupted an existing boot entry but it did. This made it extremely hard to recover anything, does someone understand how this happened?
I suspect that during boot the init bash script read /etc/ld-nix.so.preload
that was leftover from the previous boot, even though the system I was booting was _not_ supposed to have that file. And that made bash
segfault. So this prevented me from booting any system at all, except the one that actually expected /etc/ld-nix.so.preload
to be there. But that one had plenty of other segfaults everywhere, because my userspace programs are from nixpkgs-unstable (currently glibc 2.32), whereas my system was on nixos-20.09 (currently glibc 2.31).
Adding a report: both bash
and tailscaled
(from pkg tailscale
) pulled from unstable segfault on a 20.09 hardened NixOS. Setting environment.memoryAllocator.provider = "libc"
(undoing the use of scudo
) fixes it. This occurs on an Intel system, a Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
DigitalOcean VM. CPUID reports support for fma
, but from googling that seems to be FMA3, with FMA4 not supported.
Just to let you know that environment.memoryAllocator.provider = "scudo"
is known to completely break NixOS when the glibc
version is bumped, due to /etc/ld-nix.so.preload
, as discussed in this old PR of mine trying to mitigate this problem: https://github.com/NixOS/nixpkgs/pull/96289
Why does this option even exist? A global LD_PRELOAD sounds like a good way to break your system...
There's a fat warning in the description at least 🤷🏽
Maybe it should be removed from the hardened profile in that case. As it stands, it's encouraging people to run an unsafe config, afaict?
@danderson well, depends on your definition of safety. Programs built just right work, and programs where applying a memory protection measure fails, fail to start — depending on whether you threat model weights lack of service or potentially exploitable software as a larger problem, it might be good or bad I guess?
What really bothers me about this option is that it broke NixOS's promise that rollbacks can always undo mistakes. In my case I ended up with every boot entry except one segfaulting during init. So I would really prefer this got removed from the hardened profile because I didn't even see any warning.
cc @emilazy @joachifm who are listed as maintainers in the hardened profile.
Do people actually use the hardened profile? I see it more as a guide on security options available rather than something that can be used practically. I will submit a pull request to remove Scudo from it, but it would be better just to make the environment.memoryAllocator.provider = "scudo";
option do nothing until it works.
Many of the measures in the hardened profile were never intended for normal deployments, especially not for desktop or development machines. If a measure improves security "for free" then it should be on by default, the hardened profile was intended for those things were there are real trade-offs. I had in mind that potential users would carefully consider each measure for their particular deployment scenario and tune the toggles accordingly. Simply including the profile wholesale without modification will break many useful things and trash performance, so you shouldn't use it unless that degree of feature and performance degradation is actually worth it (not the case for most users, I reckon).
I will submit a pull request to remove Scudo from it
Never mind, I fixed the segmentation faults; pull request incoming.
Most helpful comment
Many of the measures in the hardened profile were never intended for normal deployments, especially not for desktop or development machines. If a measure improves security "for free" then it should be on by default, the hardened profile was intended for those things were there are real trade-offs. I had in mind that potential users would carefully consider each measure for their particular deployment scenario and tune the toggles accordingly. Simply including the profile wholesale without modification will break many useful things and trash performance, so you shouldn't use it unless that degree of feature and performance degradation is actually worth it (not the case for most users, I reckon).