Nixpkgs: NixOS stage-2 fails cryptically when invoked with /proc but without /run

Created on 17 Dec 2016 · 9 Comments · Source: NixOS/nixpkgs

Issue description

NixOS's early boot process mounts several filesystems, such as /proc, /sys and /run, as part of its operation. These mounts are typically done by stage-1-init.sh, but stage-2-init.sh has some code to perform them on its own if it is invoked without stage-1 having run first.

Unfortunately, that code only checks for the /proc mount, and doesn't correctly handle the case of being invoked with /proc mounted but other mounts missing.
This is particularly catastrophic in the case of /run, since stage-2-init.sh itself performs some modifications to that directory - namely setting /run/current-system - that are necessary for NixOS to boot correctly. If the /run tmpfs has not been mounted when it makes these modifications, that mount will instead be done by a later part of the boot process, and the stage-2-init.sh changes, including /run/current-system, will be shadowed by it.

This breaks things like kernel module loading, most invocations that rely on $PATH, and probably various other things I'm not thinking of right now; when I ran into it, it did manage to bring up an X session, but had broken both networking and my input devices, so tracking the issue down became highly nontrivial.

NixOS used to behave correctly here; I believe this was broken as a side-effect of 6efcfe03ae4ef426b77a6827243433b5296613a4.

Steps to reproduce

Boot NixOS from an initramfs that leaves /proc mounted but doesn't mount anything onto /run. Check the contents of /run once it's up.
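
The check in the last step could be scripted roughly as follows. This is only a sketch: the helper name checkRunDir is made up for illustration and is not part of NixOS. Going by the description above, an affected system would be expected to report /run as mounted (by the later part of the boot process) while /run/current-system is missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of NixOS): report whether a directory is a
# mountpoint and whether it contains a current-system entry.
checkRunDir() {
    dir="${1:-/run}"

    # mountpoint(1) from util-linux exits 0 only if something is mounted on $dir.
    if mountpoint -q "$dir"; then
        echo "$dir: mounted"
    else
        echo "$dir: NOT a mountpoint"
    fi

    # current-system is normally a symlink, so accept a symlink or any other entry.
    if [ -L "$dir/current-system" ] || [ -e "$dir/current-system" ]; then
        echo "$dir/current-system: present"
    else
        echo "$dir/current-system: MISSING"
    fi
}
```

Running checkRunDir with no argument inspects /run; passing another directory is only useful for trying the helper out.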

Technical details

System: NixOS 16.09pre-git (Flounder)
Nix version: nix-env (Nix) 1.11.4
Nixpkgs version: NixOS 16.09pre-git (it's rev 813e63e21190cbebbdb4865706dd827a91770714)

Source

sh01

All 9 comments

@abbradar : You know this code better than I do; do you have a specific opinion on what the best way to fix this would be?

sh01 on 17 Dec 2016

If stage1 fails to mount /run, stage2 now makes no attempt to retry it. From what I saw in line https://github.com/NixOS/nixpkgs/commit/6efcfe03ae4ef426b77a6827243433b5296613a4#diff-f9e70a348805ea9c8d0741890518e1d4L117 this is probably OK, but it could explain why this worked before.

Mic92 on 17 Dec 2016

Off the top of my head, it is the intended behavior. My idea was that if /proc is already mounted, something has already handled all the needed mounts -- so this saves us from checking every mount, in the hope of saving some time during boot. I didn't know of any cases where that isn't so on NixOS -- can you describe yours?

abbradar on 17 Dec 2016

I'm using a custom initramfs to boot NixOS. The reasons are a bit esoteric; long story short, I haven't figured out how to make the standard NixOS bootloader setup work well for my uses. This worked fine until the recent changes to stage2.

The performance impact of checking for that mountpoint should be absolutely negligible; if you're really concerned about performance here, it would make more sense to turn stage2 into something other than an interpreted shell script that forks a bunch of extra processes for subsidiary work.
I don't think the time it takes to execute is significant enough for optimization here to be particularly worthwhile, including the micro-optimization of only looking at one of the mountpoints.

What I'm doing is somewhat nonstandard, but not horrifically so, and this failure mode is really ugly and hard to debug; it took me hours to figure out where it was going wrong, and the same is likely to happen to other people if we don't change this.

sh01 on 17 Dec 2016

I see; thank you for the explanation! I'm not against checking each mount if there is a use case for that; however, I don't have time to tackle it right now. Do you want to do it yourself? You'll need to remove the /proc check in nixos/modules/system/stage-2-init.sh and change specialMount() so that each mount is wrapped with an if mountpoint check -- are you interested?
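
For illustration, the suggested change might look roughly like the sketch below. It is not the code from nixpkgs or from any pull request: the names needsMount and specialMountIfMissing are hypothetical, and the argument order (device, mount point, options, filesystem type) is an assumption about how specialMount() is called.

```shell
#!/bin/sh
# Sketch only; names and argument order are assumptions, not the nixpkgs code.

# Succeeds (exit status 0) when nothing is mounted on "$1" yet.
needsMount() {
    ! mountpoint -q "$1"
}

# Hypothetical variant of specialMount that checks each target individually
# instead of relying on a single /proc check for all mounts.
specialMountIfMissing() {
    device="$1"
    mountPoint="$2"
    options="$3"
    fsType="$4"

    if needsMount "$mountPoint"; then
        mkdir -m 0755 -p "$mountPoint"
        mount -n -t "$fsType" -o "$options" "$device" "$mountPoint"
    fi
}
```

With a helper like this, a call for /run would still mount the tmpfs when only /proc was set up by the initramfs, and would do nothing when /run is already mounted.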

abbradar on 17 Dec 2016

Sure, the change is simple enough; I'll write it up tomorrow or so and send a pull request.

sh01 on 17 Dec 2016

👍2

Suggested fix: #21370

sh01 on 23 Dec 2016

Fixed by #21370.

fpletz on 9 Jan 2017

That fix appears to have broken containers, and was rolled back for that reason, so this issue is present once more.
I'll look into coming up with a better fix. I expect it'll be a few days at minimum.

sh01 on 11 Jan 2017

❤1
