Nixpkgs: environment.memoryAllocator may cause crashes

Created on 17 Oct 2020 · 40Comments · Source: NixOS/nixpkgs

Describe the bug
llvmPackages_11 produces some executables that segmentation fault on use and other binaries that appear to work but compile executables that segmentation fault themselves. Can someone please test if they can reproduce this? The current build running on https://hydra.nixos.org/job/nixos/trunk-combined/tested appears to have this issue. Programs built with this will build successfully but only fail upon being executed. I have not tested if this is limited to llvmPackages_11 or if other versions are affected.

To Reproduce

$ git checkout master
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/tmp/nix-shell-22937-0/rc: line 1: 22984 Segmentation fault      (core dumped) ld

$ git checkout upstream/nixos-unstable
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/p792j5f44l3f0xi7ai5jllwnxqwnka88-binutils-2.31.1/bin/ld: no input files

master is currently 662d6a9cb1aaeb4081d75485b218f80b7b8c8727
upstream/nixos-unstable is currently 24c9b05ac53e422f1af81a156f1fd58499eb27fb

Additional context
I have been trying to get Chromium built with LLD to run for the past two days wondering why my changes always cause it to segmentation fault despite the fact they should work.

Notify maintainers
@lovek323
@7c6f434c
@dtzWill
@primeos
@ggreif
@DieGoldeneEnte

Metadata

system: "x86_64-linux"
host os: Linux 5.8.14, NixOS, 21.03pre246543.24c9b05ac53 (Okapi)
multi-user?: yes
sandbox: yes
version: nix-env (Nix) 2.3.7
channels(user): ""
channels(root): "nixos-21.03pre246543.24c9b05ac53"
nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

bug

Source

TredwellGit

Most helpful comment

Many of the measures in the hardened profile were never intended for normal deployments, especially not for desktop or development machines. If a measure improves security "for free" then it should be on by default, the hardened profile was intended for those things were there are real trade-offs. I had in mind that potential users would carefully consider each measure for their particular deployment scenario and tune the toggles accordingly. Simply including the profile wholesale without modification will break many useful things and trash performance, so you shouldn't use it unless that degree of feature and performance degradation is actually worth it (not the case for most users, I reckon).

joachifm on 17 Nov 2020

👍4

All 40 comments

git bisect yields:
30286ebcc178aec1fb3797ba3ec88cf75feb282b is the first bad commit
https://github.com/NixOS/nixpkgs/pull/95089
@edolstra
@ma27

TredwellGit on 17 Oct 2020

Current guess: https://sourceware.org/bugzilla/show_bug.cgi?id=26534

https://en.wikipedia.org/wiki/FMA_instruction_set:

FMA4 is supported in AMD processors starting with the Bulldozer architecture. FMA4 was performed in hardware before FMA3 was. Support for FMA4 was removed since Zen 1.

TredwellGit on 17 Oct 2020

What CPUs does https://hydra.nixos.org/ use? Does it automatically deploy to nixos-unstable? This might get past testing.

TredwellGit on 17 Oct 2020

https://nixos.wiki/wiki/Nix_channels:

The nixos.org server has a cronjob for which nixos-channel-scripts are executed and poll for the newest jobset that satisfies the above two conditions and trigger a channel update.

I am going to risk being a little obnoxious to prevent nixos-unstable from being broken.

TredwellGit on 17 Oct 2020

First of all, I think you've meant @edolstra not @eelco, please make sure you're using the right github handles for maintainers (those can be found in maintainers/maintainer-list.nix.

What CPUs does https://hydra.nixos.org/ use? Does it automatically deploy to nixos-unstable? This might get past testing.

nixos-unstable gets bumped as soon as a critical set of tests is successfully built on the CI infrastructure[1]. Unfortunately it's a known issue that we sometimes stumble upon problems related to the required instruction sets.

Current guess: https://sourceware.org/bugzilla/show_bug.cgi?id=26534

Will prepare a patch this weekend which fixes the package, thanks for the link!

I am going to risk being a little obnoxious to prevent nixos-unstable from being broken.

Sorry, I don't understand what you mean.

[1] Explanation can be found at https://howoldis.herokuapp.com/

Ma27 on 17 Oct 2020

https://github.com/NixOS/nixpkgs/blob/c06b4077745fe1afa3a7e3e3d4504e0bd35b414c/pkgs/development/libraries/glibc/common.nix#L237
https://github.com/NixOS/nixpkgs/blob/c06b4077745fe1afa3a7e3e3d4504e0bd35b414c/maintainers/maintainer-list.nix#L27
Sorry, handle != github is not obvious.

If I understand correctly, https://hydra.nixos.org/build/128667841 contains this issue and will cause nixos-unstable to be broken. I changed the title to hopefully get someone to cancel the build before it hits production and breaks many systems.

TredwellGit on 17 Oct 2020

Can't reproduce (see output below, I get the same ld). What path does your failing ld have (the command I gave doesn't really return what I wanted :/)?

Also note, I have a CPU with fma and no fma4:

$ grep fma /proc/cpuinfo | uniq 
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

$ git status
HEAD detached at 662d6a9cb1a
$ nix-build . -A llvmPackages_11.bintools
these derivations will be built:
  /nix/store/bv37icm9s803q7farwcr7l03wqm54dpw-llvm-binutils-11.0.0.drv
building '/nix/store/bv37icm9s803q7farwcr7l03wqm54dpw-llvm-binutils-11.0.0.drv'...
/nix/store/08flkad738j47qp9bz2gvvaj0j364nc8-llvm-binutils-11.0.0
$ result/bin/ld
ld: error: no input files
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/hiwz81i1g3fn3p0acjs042a4h5fri6dh-binutils-2.31.1/bin/ld: no input files

Metadata:

system: "x86_64-linux"
host os: Linux 5.8.11, NixOS, 21.03pre244932.84d74ae9c9c (Okapi)
multi-user?: yes
sandbox: yes
version: nix-env (Nix) 2.3.7
channels(holger): ""
channels(root): "nixos-21.03pre244932.84d74ae9c9c"
nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

EDIT: the command I gave doesn't really return what I wanted :/

DieGoldeneEnte on 17 Oct 2020

@DieGoldeneEnte well, it's not reproducible because your system is still on glibc 2.31.

Ma27 on 17 Oct 2020

$ git checkout 662d6a9cb1aaeb4081d75485b218f80b7b8c8727
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"'
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld

TredwellGit on 17 Oct 2020

My system is still on glibc 2.31:

nix-store --query --requisites /run/current-system | rg glibc
/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
/nix/store/fgn3sih5vi7543jcw389a7qqax8nwkhz-glibc-2.31-bin
/nix/store/1xpr86xg998h5acn4zqrx58xjdjqnnds-glibc-locales-2.31
/nix/store/4wy9j24psf9ny4di3anjs7yk2fvfb0gq-glibc-2.31-dev
/nix/store/62m4vwphy3d6ayrs7jb5lw0v6bx45p0r-glibc-iconv-2.31
/nix/store/7k3vlmpvzjqrc7lbmz1csbhg5d1rn4fw-glibc-2.31
/nix/store/0nn11qvv1y9agkhiaaw5fhy2dpakp7km-glibc-multi-2.31
/nix/store/5gw82lwj14y3dara9i10x0r6yy43v3xp-glibc-2.31-bin
/nix/store/7isv255s5h9r4gbny93cgsl5h1msk02s-glibc-2.31-dev
/nix/store/f0brxzf4wdgcl54d1calpxj7k2yfmib3-glibc-multi-2.31-dev
/nix/store/skn0nnan2jgkcr0kkpn6k3nc2zfrakmd-glibc-multi-2.31-bin

TredwellGit on 17 Oct 2020

So we use the 2.31 upstream _branch_? It's a bit ugly patching but OK I guess; precedent+explanation: 51cf42ad0.

vcunat on 17 Oct 2020

⬆️ PR instead of pushing directly, so that people have time to react. With current Hydra speed it will take very long anyway.

vcunat on 17 Oct 2020

I just tested:

$ git checkout master
$ git pull upstream master
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/tmp/nix-shell-25687-0/rc: line 1: 25732 Segmentation fault      (core dumped) ld
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | rg glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev

So maybe the FMA issue is not causing this?

TredwellGit on 17 Oct 2020

Well, SIGILL should not be reported as "Segmentation fault".

vcunat on 17 Oct 2020

I get the same output with the patch except for the segfault (also on an older Intel-based machine). Can you also try:

nix-shell --pure -A busybox --run ld

This reduces the amount of packages that need to be rebuild (and at least for me leads to executing the same binary):

$ git status
HEAD detached at upstream/master
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A busybox --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A busybox --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev
$ 
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev

DieGoldeneEnte on 17 Oct 2020

Both llvmPackages_10 and llvmPackages on master segmentation fault as well.

@DieGoldeneEnte:

Please nixos-rebuild boot --upgrade so we are on the same kernel version, Linux 5.8.14.
Please make sure hardware.cpu.amd.updateMicrocode or hardware.cpu.intel.updateMicrocode are true.
Just to be sure, does git status say nothing to commit, working tree clean?
What CPU microarchitecture are you using? I am using Zen+.

TredwellGit on 18 Oct 2020

CPU microarchitecture is Zen2.
I did the upgrade yesterday after I started trying to reproduce.
I omitted the rest of git status, since I had some notes (pure text-files) in the nixpkgs-root, I moved them for now.

$ uname -r
5.8.14
$ nixos-option hardware.cpu.amd.updateMicrocode
Value:
true

Default:
false

Type:
"boolean"

Description:
''
  Update the CPU microcode for AMD processors.
''

Declared by:
[ "/nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/hardware/cpu/amd-microcode.nix" ]

Defined by:
[ "/etc/nixos/machine/IDEFIX-NIX.nix" ]
$ git status
HEAD detached at upstream/master
nothing to commit, working tree clean
$ curl https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/100813.patch | git apply -
$ nix-shell --pure -A llvmPackages_11.clang-unwrapped --run ld
/nix/store/xd7753hij54b5qvlpahagkn3801fsbqk-binutils-2.31.1/bin/ld: no input files
$ nix-store --query --requisites "$(nix-shell --pure -A llvmPackages_11.clang-unwrapped --run 'realpath "$(command -v ld)"')" | grep glibc
/nix/store/n047qmjrrrpkdv6lxl5vqja1dmg29na4-glibc-2.32-10
/nix/store/6qyklfxyn9kq22q0fi48iy8lxc1rn1s5-glibc-2.32-10-bin
/nix/store/2q9r0mdwjqkqqn1imy3l65h73dzgqr8p-glibc-2.32-10-dev

DieGoldeneEnte on 18 Oct 2020

I ran nix-store --verify --check-contents --repair without issue and still segmentation fault. Can some other people please test and list their CPU microarchitecture?

TredwellGit on 18 Oct 2020

Both llvmPackages_10 and llvmPackages on master segmentation fault as well.

I am pretty sure the problem is not with llvm, since ld is not in clang-unwrapped, only in wrapped clang and llvm-binutils (and only llvm-binutils uses lld, wrapped-clang uses ld from binutils which doesn't depend on llvm).

That's why think this should be enough to test:

nix-shell --pure -A busybox --run ld

It executes the same binary (for me) as the llvmPackages_11.clang-unwrappedone.

EDIT: llvm-version doesn't matter (getting the executable for versions 5 to 11 on master):

$ for v in {5..11} ; do nix-shell --pure -A llvmPackages_$v.clang-unwrapped --run 'realpath "$(command -v ld)"'; done
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld
/nix/store/h770niqkf08ksgm0n6a271kc21w95b13-binutils-wrapper-2.31.1/bin/ld

DieGoldeneEnte on 18 Oct 2020

Since a build with glibc 2.32 got past Hydra I am now no longer able to upgrade my system:

$ sudo nixos-rebuild boot --upgrade
unpacking channels...
Segmentation fault

TredwellGit on 20 Oct 2020

I can't even rebuild anymore!

$ sudo nixos-rebuild boot
Segmentation fault

$ nix-store --query --requisites /run/current-system | rg glibc
/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
/nix/store/fgn3sih5vi7543jcw389a7qqax8nwkhz-glibc-2.31-bin
/nix/store/7k3vlmpvzjqrc7lbmz1csbhg5d1rn4fw-glibc-2.31
/nix/store/1xpr86xg998h5acn4zqrx58xjdjqnnds-glibc-locales-2.31
/nix/store/4wy9j24psf9ny4di3anjs7yk2fvfb0gq-glibc-2.31-dev
/nix/store/62m4vwphy3d6ayrs7jb5lw0v6bx45p0r-glibc-iconv-2.31
/nix/store/0nn11qvv1y9agkhiaaw5fhy2dpakp7km-glibc-multi-2.31
/nix/store/5gw82lwj14y3dara9i10x0r6yy43v3xp-glibc-2.31-bin
/nix/store/7isv255s5h9r4gbny93cgsl5h1msk02s-glibc-2.31-dev
/nix/store/f0brxzf4wdgcl54d1calpxj7k2yfmib3-glibc-multi-2.31-dev
/nix/store/skn0nnan2jgkcr0kkpn6k3nc2zfrakmd-glibc-multi-2.31-bin

TredwellGit on 20 Oct 2020

You should be able to rollback the channel with:

# nix-channel --rollback

You could maybe also try to start ld in gdb and look at the backtrace (don't know how much info you can get from it).

EDIT: Not the executable you get from realpath above just points you to the wrapper, you need the executable at the bottom (/nix/store/p792j5f44l3f0xi7ai5jllwnxqwnka88-binutils-2.31.1/bin/ld for me).

DieGoldeneEnte on 20 Oct 2020

I figured it out. Setting environment.memoryAllocator.provider = "scudo"; breaks with glibc 2.32. I did not think that it would matter since I was using --pure.

TredwellGit on 20 Oct 2020

Well... I don't think this preloading we have in environment.memoryAllocator is done very well. Looking at ldd result/lib/linux/libclang_rt.scudo-x86_64.so, there's even libstdc++, so I suspect you force-override all these libraries in the packages you run. Of course, in usual cases they will be the same but not always – Nix people are used to mixing versions sometimes. IMHO memory allocator overrides are meant to just replace a few particular functions, not whole libc, libstdc++, etc.

EDIT: still, I also don't know why runs in nix-shell --pure could be affected.

vcunat on 21 Oct 2020

This is a known issue I believe, see #93116. For now the easiest work around is when you run into it is mv /etc/ld-nix.so.preload /etc/ld-nix.so.preload.tmp. I am not sure if it is an issue with preloading in environment.memoryAllocator being done badly. Firefox will segfault and so will Chromium when used with any memory allocator other than what they bundle. (Unless you select build flags which disable their memory allocators they use.) I think dynamic loading is naturally fragile and you should expect weird things.

As for why nix-shell --pure is affected, doing nix-shell --pure -p hello --run 'env | grep glibc' shows it loads glibc which has the patch that looks for /etc/ld-nix.so.preload even in nix-shell --pure.

https://github.com/NixOS/nixpkgs/blob/b3aed163d551a7c8829132022531f7526fa8a2ac/pkgs/development/libraries/glibc/common.nix#L69

One could possible expand the patch to check IN_NIX_SHELL and if it is set to pure do not load /etc/ld-nix.so.preload.

S-NA on 22 Oct 2020

❤2

One could possible expand the patch to check IN_NIX_SHELL and if it is set to pure do not load /etc/ld-nix.so.preload.

possible but I'm not really convinced. It would add even more magic and would it be a significant improvement?

If "pure" shells preserve $HOME, why not global preloads? If someone wants a hardened memory allocator _system-wide_, should their development environments be an exception? I don't know for sure. Perhaps nix-shell is trying to serve too different purposes by the same approach 🤷🏽 and in that situation I'd personally prefer more simplicity.

vcunat on 23 Oct 2020

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/warning-unknown-setting-extra-sandbox-paths/9836/4

nixos-discourse on 6 Nov 2020

Oh god, just spent 4 hours wrestling with half the binaries on my system segfaulting because of this after an upgrade to 20.09 (and the hardened profile). Removing /etc/ld-nix.so.preload fixed it, thanks so much @S-NA!
The really bad thing that happened however is that if I tried to boot back into a previous system version, init itself segfaulted! I don't understand how adding a boot entry for a new boot entry corrupted an existing boot entry but it did. This made it extremely hard to recover anything, does someone understand how this happened?

Nadrieril on 8 Nov 2020

I suspect that during boot the init bash script read /etc/ld-nix.so.preload that was leftover from the previous boot, even though the system I was booting was _not_ supposed to have that file. And that made bash segfault. So this prevented me from booting any system at all, except the one that actually expected /etc/ld-nix.so.preload to be there. But that one had plenty of other segfaults everywhere, because my userspace programs are from nixpkgs-unstable (currently glibc 2.32), whereas my system was on nixos-20.09 (currently glibc 2.31).

Nadrieril on 8 Nov 2020

Adding a report: both bash and tailscaled (from pkg tailscale) pulled from unstable segfault on a 20.09 hardened NixOS. Setting environment.memoryAllocator.provider = "libc" (undoing the use of scudo) fixes it. This occurs on an Intel system, a Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz DigitalOcean VM. CPUID reports support for fma, but from googling that seems to be FMA3, with FMA4 not supported.

danderson on 16 Nov 2020

Just to let you know that environment.memoryAllocator.provider = "scudo" is known to completely break NixOS when the glibc version is bumped, due to /etc/ld-nix.so.preload, as discussed in this old PR of mine trying to mitigate this problem: https://github.com/NixOS/nixpkgs/pull/96289

ju1m on 16 Nov 2020

Why does this option even exist? A global LD_PRELOAD sounds like a good way to break your system...

edolstra on 16 Nov 2020

👍2 👀1

There's a fat warning in the description at least 🤷🏽

vcunat on 17 Nov 2020

Maybe it should be removed from the hardened profile in that case. As it stands, it's encouraging people to run an unsafe config, afaict?

danderson on 17 Nov 2020

👍1

@danderson well, depends on your definition of safety. Programs built just right work, and programs where applying a memory protection measure fails, fail to start — depending on whether you threat model weights lack of service or potentially exploitable software as a larger problem, it might be good or bad I guess?

7c6f434c on 17 Nov 2020

👍2

What really bothers me about this option is that it broke NixOS's promise that rollbacks can always undo mistakes. In my case I ended up with every boot entry except one segfaulting during init. So I would really prefer this got removed from the hardened profile because I didn't even see any warning.

Nadrieril on 17 Nov 2020

👍1

cc @emilazy @joachifm who are listed as maintainers in the hardened profile.

Ma27 on 17 Nov 2020

Do people actually use the hardened profile? I see it more as a guide on security options available rather than something that can be used practically. I will submit a pull request to remove Scudo from it, but it would be better just to make the environment.memoryAllocator.provider = "scudo"; option do nothing until it works.

TredwellGit on 17 Nov 2020

joachifm on 17 Nov 2020

👍4