Nixpkgs: OverlayFS broken on NixOS Kernel 4.19 and 4.20

Created on 23 Jan 2019  ·  26Comments  ·  Source: NixOS/nixpkgs

Issue description

When using overlayfs with NixOS and Kernel 4.19 it is not possible to overwrite a file that already exists in the lower directory with new content. Instead if attempted the file appears empty. (See how to reproduce for details).

It seems like the revert of an upstream commit in 4.19 https://github.com/NixOS/nixpkgs/pull/52942 breaks something fundamentally in overlayfs.

EDIT: Reverting https://github.com/NixOS/nixpkgs/pull/52942 fixes the behavior. But I'm unable to run the tests as the VM kernel pancis :(

This also affects docker if used with one of the overlay drivers.

Steps to reproduce

Via the test in https://github.com/NixOS/nixpkgs/pull/54508

or

Manually:

mkdir -p /tmp/mnt/upper /tmp/mnt/lower /tmp/mnt/work /tmp/mnt/merged

# Create an existing file in the lower directory
echo 'Existing' > /tmp/mnt/lower/existing.txt

# Mount overlayfs
mount -t overlay overlay -o lowerdir=/tmp/mnt/lower,upperdir=/tmp/mnt/upper,workdir=/tmp/mnt/work /tmp/mnt/merged

# Prints "Existing" - OK
cat /tmp/mnt/merged/existing.txt

# Write some new content
"echo 'New' > /tmp/mnt/merged/existing.txt",

# Prints "" (empty) but it should be New - FAIL
cat /tmp/mnt/merged/existing.txt

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the
results.

 - system: `"x86_64-linux"`
 - host os: `Linux 4.19.16, NixOS, 19.03.git.5d42db2 (Koi)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.2`
 - channels(root): `"nixos-17.03pre95306.a24728f"`
 - channels(pascal): `"nixos-19.03pre167251.20343f0ab47"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
blocker nixos

Most helpful comment

Just to keep everyone outside of #nixos-dev up to date, we do have a fix for this. Here is the upstream post and patch: https://www.spinics.net/lists/linux-unionfs/msg06733.html

All 26 comments

/cc @aszlig and @samueldr as you probably know the most about this.

Hmm, this is a blocker for 19.03, can't have either side of the current situation while keeping the latest LTS kernel.

Thanks for the (wip) test, I bet it'll be useful to check for the same kind of regression.

cc @lheckemann to keep in mind.

@aszlig was the issue we were working around reported upstream? I guess it hasn't been fixed since the patch (a partial revert) still applies.

I'm trying to debug the issue but I'm confused right now.

I tried to run the tests without the patches in https://github.com/NixOS/nixpkgs/pull/52942 but I then get the error:

switch_root: can't execute '/nix/store/i9rxcghqk9glylabsdl5wmnwv7w853wj-nixos-system-machine-19.03.git.4fb8bc8/init': Operation not permitted

I'm not sure this is actually the issue https://github.com/NixOS/nixpkgs/pull/52942 is working around?

I was first suspecting it has something to do with the interaction between 9p and overlayfs but then I made a curious observation that when I run a VM produced via:

nix-build nixos/ -A vm -I nixpkgs=/root/nixpkgs

It runs without issue using the same overlayfs over 9p setup.

Any ideas what is different between running a test VM and my second directly built VM?

IIRC, yes, Operation not permitted on switching root is the issue the patch is used to work around.

Ok but I still don't understand why it works in the non test case VM.

Ok but I still don't understand why it works in the non test case VM.

It does for me, from your commit, reverted the patch. (Also added a workaround for the gdk_pixbuf mismatch between 18.09 and unstable #54278.)

  • QEMU_KERNEL_PARAMS='console=ttyS0' result/bin/run-DUFFMAN-vm
  • View → serial0
<<< NixOS Stage 1 >>>

loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
running udev...
kbd_mode: KDSKBMODE: Inappropriate ioctl for device
starting device mapper and LVM...
checking /dev/vda...
fsck (busybox 1.29.3)
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/vda
/dev/vda: recovering journal
/dev/vda: clean, 11/32768 files, 6353/131072 blocks
mounting /dev/vda on /...
mounting store on /nix/.ro-store...
mounting tmpfs on /nix/.rw-store...
mounting shared on /tmp/shared...
mounting xchg on /tmp/xchg...
mounting overlay filesystem on /nix/store...
switch_root: can't execute '/nix/store/9h3bsjav4cxlb6xn7a4zjwvb1r71sc3m-nixos-system-DUFFMAN-19.03.git.b2669a2/init': Operation not permitted
[    1.170907] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    1.170907] 
[    1.172524] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.19.17 #1-NixOS
[    1.173684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[    1.175602] Call Trace:
[    1.176171]  dump_stack+0x5c/0x7b
[    1.176901]  panic+0xe4/0x242
[    1.177541]  do_exit+0xaed/0xaf0
[    1.178209]  ? handle_mm_fault+0xdc/0x210
[    1.179004]  do_group_exit+0x3a/0xa0
[    1.180014]  __x64_sys_exit_group+0x14/0x20
[    1.180832]  do_syscall_64+0x4e/0x100
[    1.181589]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    1.182502] RIP: 0033:0x7f1357beb646
[    1.183222] Code: Bad RIP value.
[    1.183887] RSP: 002b:00007ffdda70f9d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[    1.185284] RAX: ffffffffffffffda RBX: 00007f1357ed45a0 RCX: 00007f1357beb646
[    1.186477] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
[    1.187727] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff80
[    1.188917] R10: 0000000000000073 R11: 0000000000000246 R12: 00007f1357ed45a0
[    1.190161] R13: 0000000000000001 R14: 00007f1357edd488 R15: 0000000000000000
[    1.191539] Kernel Offset: 0x4200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.193362] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    1.193362]  ]---

boot.kernelPackages = pkgs.linuxPackages_4_19; in my configuration.nix, maybe you're using an older LTS pinned by its version number?

For what it's worth, I can confirm the issue. Pretty much all my docker containers are broken after the recent channel update to 19.03pre166987.bc41317e243 a.k.a. nixos-unstable.

Hmm, looks like we're between a rock and a hard place, something changed in the kernel breaking our tests (not sure if it's userland that broke) and the revert to make the tests pass breaks overlayfs.

The simple solution, and what probably will happen for 19.03 unless something is figured out, is to revert back to the previous LTS as default, but this is not an ideal solution.

Should we revert #52942? Maybe there's a better way to fix the test failures.

The report has been posted to unionfs-linux and should show up in mailinglist archives soon, I'll post a link as soon as that's the case.

Should we revert #52942?

Yes, please. As much as it sucks, a situation where docker is completely broken on NixOS is not really tenable.

I think reverting #52942 and got back to 4.14 as the default would be the most reasonable thing to do until the issue is resolved.

Whatever we do, we should do it soon, because the current situation is really bad and IMHO we shouldn't have kept master in that broken state as long as we did already. Waiting another couple of days before we do something doesn't feel like a good idea.

I agree with the suggestion to revert #52942 and go back to 4.14. Then we can see about alternative solutions to the test failures with 4.19.

Hm, it seems my mail (Message-ID: <20190129194114.GA7785@dnyarri>) never reached the list for some reason, even though the MTA has accepted it:

DD42F980DD95A: to=<[email protected]>, relay=vger.kernel.org[209.132.180.67]:25, delay=687, delays=621/0.01/0.63/65, dsn=2.7.0, status=sent (250 2.7.0 nothing apparently wrong in the message. BF:<H 0>; S1728330AbfA2Tww)

(This was on Jan 29 20:52:53 CET)

@szmi: Is the linux-unionfs mailing list moderated?

edit: Just sent it again without attachments, let's hope this was the culprit.

There are a bunch of overlay fixes in 4.20.7, can you try the latest kernels?

Same fixes with 4.19 (which is desired), looking this evening.

machine# Probing EDD (edd=off to disable)... ok
machine# c[    0.000000] Linux version 4.19.20 (nixbld@localhost) (gcc version 7.4.0 (GCC)) #1-NixOS SMP Wed Feb 6 16:30:16 UTC 2019
machine# [    0.000000] Command line: loglevel=7 console=ttyS0 panic=1 boot.panic_on_fail init=/nix/store/xsimkh58w0k6jmn6jfxf5arsk9xkdhk0-nixos-system-machine
-19.03.git.47aad6e/init regInfo=/nix/store/a6vn3r6600p5z216iwbjh81vxkhbz3cw-closure-info/registration console=ttyS0
...
machine# mounting overlay filesystem on /nix/store...
machine# switch_root: can't execute '/nix/store/xsimkh58w0k6jmn6jfxf5arsk9xkdhk0-nixos-system-machine-19.03.git.47aad6e/init': Operation not permitted
machine# [    1.508713] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
machine# [    1.508713]
machine# [    1.510084] CPU: 0 PID: 1 Comm: switch_root Not tainted 4.19.20 #1-NixOS
machine# [    1.511062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
machine# [    1.512691] Call Trace:
machine# [    1.513078]  dump_stack+0x5c/0x7b
machine# [    1.513568]  panic+0xe4/0x242
machine# [    1.514020]  do_exit+0xb52/0xb60
machine# [    1.514499]  ? handle_mm_fault+0xdc/0x210
machine# [    1.515097]  do_group_exit+0x3a/0xa0
machine# [    1.515622]  __x64_sys_exit_group+0x14/0x20
machine# [    1.516248]  do_syscall_64+0x4e/0x100
machine# [    1.516782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
machine# [    1.517526] RIP: 0033:0x7f5827cd9646
machine# [    1.518048] Code: Bad RIP value.
machine# [    1.518549] RSP: 002b:00007ffc3718aa98 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
machine# [    1.519685] RAX: ffffffffffffffda RBX: 00007f5827fc25a0 RCX: 00007f5827cd9646
machine# [    1.520726] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
machine# [    1.521771] RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffff80
machine# [    1.522807] R10: 0000000000000073 R11: 0000000000000246 R12: 00007f5827fc25a0
machine# [    1.523845] R13: 0000000000000001 R14: 00007f5827fcb488 R15: 0000000000000000
machine# [    1.525021] Kernel Offset: 0xbe00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
machine# [    1.526585] Rebooting in 1 seconds..

:(

WTF

In case you didn't realise, this is the same initial issue which caused us to add a patch reverting part of the changes to make the tests pass. This is the subject of the thread. So it looks like there were other changes in passing, unrelated to our problem.

Is somebody able to create the setup as it is in during the startup of the nixos test VM in a simplified version. One that would allow to have access to the shell right befor the switch_root get's executed?

For a new nixos-install of 18.09 I’m seeing something similar, with the main difference that except after switch_root it says “No such file or directory”...is this the same issue?

@tbenst this _particular_ issue shouldn't affect 18.09, and what you are experiencing shouldn't be this issue considering (1) the faulty revert patch was reverted (2) the current issue affects the testing infra. (To be fair, your comment could be about the testing infra, it's unspecified if it's your system boot or a VM.)

Though, I'm not downplaying whatever issue you're facing! In fact, let's try and figure out what's going on. Ideally, open an issue with the following information (and mention me and this issue in the body).

  • Do previous generations boot fine?
  • This was after a fresh update of 18.09, right?
  • Which kernel is your system configured to use? (nix eval --raw '(import <nixos/nixos> {}).options.boot.kernelPackages.value.kernel')
  • What's the output of nix-info?

Thanks, @samueldr for the very kind comment! I figured out the issue thankfully--as soon as I enabled systemd-boot everything worked.

Just to keep everyone outside of #nixos-dev up to date, we do have a fix for this. Here is the upstream post and patch: https://www.spinics.net/lists/linux-unionfs/msg06733.html

Was this page helpful?
0 / 5 - 0 ratings

Related issues

edolstra picture edolstra  ·  3Comments

retrry picture retrry  ·  3Comments

yawnt picture yawnt  ·  3Comments

sid-kap picture sid-kap  ·  3Comments

grahamc picture grahamc  ·  3Comments