NixOS's early boot process does some mounting - such as to /proc, /sys/ and /run - as part of its operation. These mounts are typically done by stage-1-init.sh, but stage-2-init.sh has some code to perform them on its own if it is invoked without stage-1 having run first.
Unfortunately, that code only checks for the /proc mount, and doesn't correctly handle the case of being invoked with /proc mounted but other mounts missing.
This is particularly catastrophic in the case of /run, since stage-2-init.sh itself performs some modifications to that directory - namely setting /run/current-system - that are necessary for NixOS to boot correctly. If the /run tmpfs has not been mounted when it does these modifications, that mount will instead be done by a later part of the boot process, and the stage-2-init.sh changes, including /run/current-system, will be shadowed by it.
This breaks things like kernel module loading, most invocations that rely on $PATH, and probably various other things I'm not thinking of right now; when I ran into it, it did manage to bring up an X session, but had broken both networking and my input devices, so tracking the issue down became highly nontrivial.
NixOS used to behave correctly here; I believe this was broken as a side-effect of 6efcfe03ae4ef426b77a6827243433b5296613a4.
Boot NixOS from an initramfs that leaves /proc mounted but doesn't mount anything onto /run. Check the contents of /run once it's up.
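For the check in that last step, something like the following could be used. This is only a sketch: `check_run` is a hypothetical helper name, and it assumes the `mountpoint` utility (from util-linux or busybox) is available on the booted system.

```shell
# Hypothetical helper: report whether a directory is a mountpoint and
# whether the current-system entry is visible inside it.
check_run() {
    local runDir="${1:-/run}"

    if mountpoint -q "$runDir"; then
        echo "$runDir: mounted"
    else
        echo "$runDir: not a mountpoint"
    fi

    if [ -e "$runDir/current-system" ]; then
        echo "current-system: present"
    else
        echo "current-system: missing"
    fi
}

check_run /run
```

On an affected system I would expect /run to show up as mounted while current-system is reported missing, since the late tmpfs mount hides what stage 2 wrote.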
@abbradar : You know this code better than I do; do you have a specific opinion on what the best way to fix this would be?
If stage1 fails to mount /run, stage2 now makes no attempt to retry it. From what I saw at https://github.com/NixOS/nixpkgs/commit/6efcfe03ae4ef426b77a6827243433b5296613a4#diff-f9e70a348805ea9c8d0741890518e1d4L117 this is probably OK, but it could explain why it worked before.
Off the top of my head it is the intended behavior. My idea was that if /proc is already mounted, something already has handled all the needed mounts -- so this is to save us from checking every mount in hopes of saving some time during boot. I didn't know of any cases when it's not so on NixOS -- can you describe yours?
I'm using a custom initramfs to boot NixOS. The reasons are a bit esoteric; long story short, I haven't figured out how to make the standard NixOS bootloader setup work well for my uses. This worked fine until the recent changes to stage2.
The performance impact of checking for that mountpoint should be absolutely negligible; if you're really concerned about performance here, it would make more sense to turn stage2 into something other than an interpreted shell script that forks a bunch of extra processes for subsidiary work.
I don't think the time it takes to execute is significant enough for optimization here to be particularly worthwhile, including the micro-optimization of only looking at one of the mountpoints.
What I'm doing is somewhat nonstandard, but not horrifically so, and this failure mode is really ugly and hard to debug; it took me hours to figure out where it was going wrong, and the same is likely to happen to other people if we don't change this.
I see; thank you for the explanation! I'm not against checking each mount if there is a use case for that; however, I don't have time to tackle it right now. Do you want to do it yourself? You'll need to remove the /proc check in nixos/modules/system/stage-2-init.sh and change specialMount() so that each mount is wrapped with an `if mountpoint` check -- are you interested?
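Roughly, the per-mount check described above might look like the sketch below. It is not the actual patch: the argument order of `specialMount` and the use of `mountpoint -q` are assumptions made for illustration, and the real function in stage-2-init.sh may differ.

```shell
# Sketch only: argument order and variable names are assumed, not taken
# from nixpkgs.
specialMount() {
    local device="$1"
    local mountPoint="$2"
    local options="$3"
    local fsType="$4"

    # If an earlier stage (or a custom initramfs) already mounted this
    # target, leave it alone.
    if mountpoint -q "$mountPoint"; then
        return 0
    fi

    mkdir -m 0755 -p "$mountPoint"
    mount -n -t "$fsType" -o "$options" "$device" "$mountPoint"
}
```

With this shape, each special mount is decided on its own, so a system that arrives in stage 2 with /proc mounted but /run missing would still get its /run tmpfs before anything is written into it.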
Sure, the change is simple enough; I'll write it up tomorrow or so and send a pull request.
Suggested fix: #21370
Fixed by #21370.
That fix appears to have broken containers, and was rolled back for that reason, so this issue is present once more.
I'll look into coming up with a better fix. I expect it'll be a few days at minimum.