Nixpkgs: When /boot is full, system rebuilds fail

Created on 15 Mar 2017 · 48 Comments · Source: NixOS/nixpkgs

Issue description

When the /boot partition is entirely full (e.g. when old generations have not been removed for a long time), any nixos-rebuild command that tries to install a new kernel will fail.

Deleting old generations and garbage-collecting does not fix the issue, because garbage collection doesn't touch the /boot partition, and nixos-rebuild will only try to remove obsolete images after having placed the new initrd in /boot. Since it's full, the new image cannot be copied over:

cannot copy /nix/store/8i1ixqycplb4wc812wkxxf432424jxh5-initrd/initrd to /boot/kernels/8i1ixqycplb4wc812wkxxf432424jxh5-initrd-initrd.tmp
warning: error(s) occurred while switching to the new configuration

... which means the "remove old images" routine never occurs, and the user is stuck.
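One way to see this failure coming is to compare the bytes that the not-yet-copied files need against the free space on /boot before copying anything. This is a minimal sketch of a hypothetical pre-flight check, not part of nixos-rebuild. `shutil.disk_usage` is a real standard-library call, but FAT rounds every file up to whole clusters, so treat the result as an estimate.

```python
import os
import shutil


def bytes_needed(dest_dir, sources):
    """Sum the sizes of sources whose destination file does not exist yet.

    `sources` is a list of (source_path, destination_name) pairs.
    """
    total = 0
    for src, dest_name in sources:
        if not os.path.exists(os.path.join(dest_dir, dest_name)):
            total += os.path.getsize(src)
    return total


def fits_on(dest_dir, sources):
    """Return (fits, needed, free), all sizes in bytes."""
    needed = bytes_needed(dest_dir, sources)
    free = shutil.disk_usage(dest_dir).free
    return needed <= free, needed, free
```

A builder using a check like this could refuse to start copying and report how much space is missing, instead of dying halfway through with ENOSPC.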

I've worked around this by manually moving a very old image out of the /boot partition into /root, running nixos-rebuild boot, then moving the image back after cleanup and running nixos-rebuild boot again to make sure it wasn't a necessary image after all.

Steps to reproduce

  1. Have a /boot with no space left.
  2. Try to rebuild the system with a new kernel build.

Technical details

  • System: 16.09.1836.067e66a (Flounder)
  • Nix version: nix-env (Nix) 1.11.7
  • Nixpkgs version: 16.09.1836.067e66a

All 48 comments

Not sure why we can't just swap the order (first clean up old images, then place new ones), given that they have already been removed from the Nix store (so they won't boot anyway).

They may be present and they may be even alive through some other GC root, but I can't see a good reason either.
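The swap proposed above could look roughly like this sketch: it deletes the files that no remaining generation needs before copying any new ones. The function `sync_boot_dir` and its `wanted` mapping are hypothetical, not the real install-grub.pl or systemd-boot-builder.py code, and it assumes /boot holds a flat directory of plain files.

```python
import os
import shutil


def sync_boot_dir(boot_dir, wanted):
    """Make the flat directory boot_dir hold exactly the files in `wanted`.

    `wanted` maps destination file names to source paths in the store.
    Obsolete files are deleted *before* anything is copied, so the space
    freed by old generations is available to the new ones.
    """
    # Step 1: delete files that no remaining generation needs.
    removed = sorted(set(os.listdir(boot_dir)) - set(wanted))
    for name in removed:
        os.remove(os.path.join(boot_dir, name))

    # Step 2: copy what is missing, via a temporary name so that a failed
    # copy never leaves a truncated file under the final name.
    copied = []
    for name, src in sorted(wanted.items()):
        dest = os.path.join(boot_dir, name)
        if os.path.exists(dest):
            continue
        tmp = dest + ".tmp"
        try:
            shutil.copyfile(src, tmp)
            os.replace(tmp, dest)
        except OSError:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise
        copied.append(name)
    return removed, copied
```

The cost of this ordering is that a failed copy leaves /boot without the old images too, which matches the reasoning above that those images can no longer boot anyway.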

Also of note, to prevent future issues: there are options like boot.loader.grub.configurationLimit that limit how many generations actually get copied to /boot.
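As a simplified model of what such a limit does, assume only the newest N generations get a boot entry (and files in /boot), while older ones stay in the system profile. The function below is hypothetical and is not install-grub.pl's actual logic.

```python
def generations_to_copy(generations, limit):
    """Return the newest `limit` generation numbers, newest first.

    Older generations remain in the system profile; they just get no
    boot entry and no files in /boot.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return sorted(generations, reverse=True)[:limit]
```

A limit like this only rescues an already-full /boot if the surplus entries are pruned before the new files are copied.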

The OP's workaround doesn't solve the issue for someone with a gummiboot UEFI install. Running nixos-rebuild boot fails because the command simply fills up /boot all over again. :(

Still a problem.

Still a problem.

Do you see this with gummiboot or GRUB? I see that GRUB indeed has this problem but from the code gummiboot should be okay.

I’m using the systemd bootloader (I believe that is gummiboot).

I'm having this with GRUB. I removed an old kernel by hand (freeing about 20MB) and then nixos-rebuild boot worked properly.

@nagisa I've tested this with systemd-bootloader on my local machine -- it seems to correctly remove old entries. Can you repeat my experiment?

  1. Move one of kernels in /boot to say /tmp;
  2. dd if=/dev/zero of=/boot/EFI/nixos/foo.efi (to fill disk space with a bogus "kernel");
  3. nixos-rebuild switch.

After it finishes foo.efi should be deleted correctly and the moved kernel should appear again, without disk space errors.

@jpotier I'll try to prepare a patch (I'm not very familiar with Perl but it looks straightforward).

I've hopefully fixed GRUB issue -- please test #26165.

@abbradar just hit it without doing anything special. Just an --upgrade with change from linuxKernel4_10 to linuxKernel4_11.

building path(s) ‘/nix/store/aai4w304vnkqr8g7q1fb4gnmvxphd2qc-dbus-1’
building path(s) ‘/nix/store/83gqr3n9b6i1i7zz20l1vs17y5b01fqd-unit-polkit.service’
building path(s) ‘/nix/store/y0h3xjxfr3z3q6vgj7rd05swdx3iwx2l-unit-systemd-fsck-.service’
building path(s) ‘/nix/store/8qadphzyxbazqgi5xsv3lcjdx73sb1ws-unit-dbus.service’
building path(s) ‘/nix/store/dj5g33zpdynjj71m9ikhl91c88113hgv-system-units’
building path(s) ‘/nix/store/jjhri8hhvg9mdnjg65yar8gbkgd8dw3b-user-units’
building path(s) ‘/nix/store/vidj2plybp70blj82sfia8a1x6w5p35j-etc’
building path(s) ‘/nix/store/hynp02zx2h8r7ggqk1zf72gja1v5jf5b-nixos-system-shirobox-17.09pre108282.53835c93cb’
Traceback (most recent call last):
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 160, in <module>
    main()
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 147, in main
    write_entry(gen, machine_id)
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 52, in write_entry
    initrd = copy_from_profile(generation, "initrd")
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 47, in copy_from_profile
    copy_if_not_exists(store_file_path, "/boot%s" % (efi_file_path))
  File "/nix/store/17x3h3fb1vbdkbvp7fpaxas5rdxw8rcw-systemd-boot-builder.py", line 17, in copy_if_not_exists
    shutil.copyfile(source, dest)
  File "/nix/store/c6j3ky32czxaqy41i9xqm2qh1ys5kixv-python3-3.6.1/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/nix/store/c6j3ky32czxaqy41i9xqm2qh1ys5kixv-python3-3.6.1/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
warning: error(s) occurred while switching to the new configuration
shirobox :: /tmp
▪ df -h
Filesystem        Size  Used Avail Use% Mounted on
devtmpfs          804M     0  804M   0% /dev
tmpfs             7.9G     0  7.9G   0% /dev/shm
tmpfs             4.0G  5.1M  4.0G   1% /run
tmpfs             7.9G  360K  7.9G   1% /run/wrappers
rpool/root/nixos  234G   49G  185G  21% /
tmpfs             7.9G     0  7.9G   0% /sys/fs/cgroup
tmpfs             7.9G  496K  7.9G   1% /tmp
rpool/home        193G  8.1G  185G   5% /home
/dev/sdb1         100M  100M  2.0K 100% /boot
tmpfs             1.6G     0  1.6G   0% /run/user/1000

I tried running nix-env --delete-generations old && nix-collect-garbage -d, which usually helped with my /boot woes, but this time it didn't get rid of the files in /boot for some reason (it used to help before). Here are the files.

-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 16x2rs5xmk251q8wn504fxhl8fi541p7-linux-4.11.3-bzImage.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 4pimpvaqylk703069z5fld1ihfa8jr9p-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 9yivpgx2mjap0qr2xvdawia5gd7d1k9f-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 apzjnr4r3jxlgjhjq6p6wp3rjz419yz9-linux-4.10.12-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 b368pwmjwkkqcszmsa94x2frgqpgbx5s-linux-4.10.15-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 ckkdipm3l32z5kk7vdaxy84m62snwi7w-linux-4.10.12-bzImage.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 d8i3vpc7v5253ryz2c5ry7ghnbvb5pqq-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 h7qgnny008h1k9yplymqx0asrg3sx6kd-linux-4.10.13-bzImage.efi*
-rwxr-xr-x 1 root root 5.6M Jun  1 03:11 ja667wvnjri5wsgbl7227qwlgzsnhdvn-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 k8xy3rsvjkjfhqi17qfzlqzwafnl6jg9-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 kwbzcflgmd4jn6w4fprxhf20plbj3brn-initrd-initrd.efi*
-rwxr-xr-x 1 root root 7.6M Jun  1 03:11 wy7jm4dwr5hvav8qkiqadnp3hsj96ibj-initrd-initrd.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 zrifs96767ixklsr2w4ykp0fwdw2g21v-linux-4.10.13-bzImage.efi*
-rwxr-xr-x 1 root root 3.6M Jun  1 03:11 zxym3scw3mj0xpqk0glgk8ln18l55mzh-linux-4.10.10-bzImage.efi*

nix-env and nix-collect-garbage don't clean up /boot; you have to re-run the install-grub.pl script (via nixos-rebuild switch/boot), which will update the /boot folder.

@cleverca22 does that apply for the systemd-boot, though?

For systemd-boot, it's this script: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/loader/systemd-boot/systemd-boot-builder.py

it still runs the same way, via nixos-rebuild switch/boot

@nagisa That's strange, it fails _after_ removing old entries. What does sudo nix-env --list-generations -p /nix/var/nix/profiles/system report?

Regarding a need to run sudo nix-collect-garbage -d to clean up space -- this is expected, you'd need to indicate that you don't need old kernels by removing profiles that are associated with them.

Oh I see. My problem is that I tend to forget to clean up my old generations. Is there a way to remove all the generations except the one I’m currently booted into and the newest one as a part of nixos-rebuild?

@nagisa Not now but that'd be relatively trivial to fix -- please open an issue.
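A minimal sketch of the selection such an option might make: keep the booted generation and the newest one, and mark the rest for deletion. `generations_to_delete` is hypothetical, and how you determine the booted generation's number is left to the caller.

```python
def generations_to_delete(generations, booted):
    """Return generations that are neither the newest nor the booted one."""
    if not generations:
        return []
    keep = {max(generations), booted}
    return sorted(g for g in generations if g not in keep)
```

For generations 1 to 5 with generation 3 booted, this yields [1, 2, 4], which could then be passed to `sudo nix-env -p /nix/var/nix/profiles/system --delete-generations 1 2 4`, followed by nixos-rebuild boot so that /boot is pruned as well.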

I have 415M of -initrd and -bzImage files in /boot/kernels. It seems to have been growing steadily and is now just about out of space. I've deleted all old generations and run nixos-rebuild. How can I clean this out?

My config looks like this:

  boot.initrd.luks.devices = [{
    name   = "root";
    device = "/dev/nvme0n1p3";
    preLVM = true;
  }];

  boot.loader.grub.device = "/dev/nvme0n1";
  boot.loader.systemd-boot.enable = false;

  boot.cleanTmpDir = true;

I think it should have been cleared after nixos-rebuild if you have deleted your old generations. This is not the case, correct? Can you show the contents of your /nix/var/nix/profiles/?

@abbradar Ah nevermind, I wasn't running nix-collect-garbage as root, so I hadn't actually deleted old generations. (By the way, it's weird that --delete-older-than just silently fails if you try to use it as a non-root user.) Apologies, this is entirely unrelated to this issue.

By the way, it's weird that --delete-older-than just silently fails if you try to use it as a non-root user.

Could you create a separate issue for this? I think it might be desirable to have it at least print a warning (in case the user expected to be garbage-collecting the system environment, not just the user environment).
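The warning suggested here could look roughly like this. The function name and message are hypothetical; this is not how nix-collect-garbage is implemented, only an illustration of checking the effective user ID before acting.

```python
import os
import sys


def warn_if_not_root(euid=None, stream=None):
    """Print a warning and return False when not running as root."""
    if euid is None:
        euid = os.geteuid()
    if stream is None:
        stream = sys.stderr
    if euid != 0:
        print("warning: not running as root; only your own profiles will "
              "be cleaned, not the system profile", file=stream)
        return False
    return True
```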

I tend not to garbage-collect that often, so my EFI partition always filled up quickly. Since systemd-boot is merely an EFI boot menu and not a full-blown loader, I've switched to GRUB and configured it to store only its EFI program on the EFI partition and put everything else on the root partition (or wherever you'd like). This way kernels, initrds and GRUB files live on a large partition, and the EFI partition stores only the 122 KB GRUB EFI binary :smile:

So here's my setup.

In /etc/nixos/hardware-configuration.nix I've replaced fileSystems."/boot" with fileSystems."/boot/efi"

/etc/nixos/configuration.nix became this:

  #boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub = {
    efiSupport = true;
    #efiInstallAsRemovable = true; # in case canTouchEfiVariables doesn't work for your system
    device = "nodev";
  };

P.S.: nixos-generate-config already hints this approach, but only for BIOS mode systems currently (I believe it's for BIOS->EFI migration purpose): https://github.com/NixOS/nixpkgs/blob/6c8b819c99a85276f9d3ebdefdb039235321c646/nixos/modules/installer/tools/nixos-generate-config.pl#L536

I failed to garbage-collect as root often enough and ended up with more than a few kernels in /boot/kernels. It's easy to see which are the old ones. I moved them out to another partition, just in case. Something like this worked for me:

```
cd /boot/kernels
sudo mv lmnsg5sh081zdgr6rrwhhzdkyj0v7ibp-linux-4.9.25-bzImage /tmp/
sudo touch lmnsg5sh081zdgr6rrwhhzdkyj0v7ibp-linux-4.9.25-bzImage

# repeat the previous two lines for other old kernels as needed

sudo nixos-rebuild switch
```

If you skip the touch, nixos-rebuild will regenerate the missing files before the point where the new boot configuration file is generated, and it won't work: your partition will still be full. The zero-size files do the trick.
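The traceback elsewhere in this thread shows systemd-boot-builder.py calling shutil.copyfile from a function named copy_if_not_exists. The touch trick above targets GRUB's /boot/kernels, and this sketch assumes the GRUB installer likewise skips any destination that already exists, regardless of its size. It is a reconstruction under that assumption, not the verbatim source of either script.

```python
import os
import shutil


def copy_if_not_exists(source, dest):
    """Copy source to dest unless dest already exists.

    Only existence is checked, not size or content, so an empty
    placeholder created with `touch` counts as "already there".
    Returns True if a copy was made.
    """
    if os.path.exists(dest):
        return False
    shutil.copyfile(source, dest)
    return True
```

Once the rebuild has pruned the old entries, the empty placeholders are removed along with everything else that no generation references.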

As an aside, I tried to resize the /boot partition, but gparted failed to make the extra space visible to the FAT32 partition that /boot is on. Maybe a later version of gparted will make this approach work.

I found this mailing list thread: https://nixos.org/nix-dev/2016-September/021832.html

After I run this command,

/run/current-system/bin/switch-to-configuration boot

I got back my /boot space.

Hm ... running /run/current-system/bin/switch-to-configuration boot results in

OSError: [Errno 28] No space left on device

for me.

I just now ran into this when doing sudo nixos-rebuild switch --upgrade:

building Nix...
building the system configuration...
Traceback (most recent call last):
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 210, in <module>
    main()
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 197, in main
    write_entry(*gen, machine_id)
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 81, in write_entry
    kernel = copy_from_profile(profile, generation, "kernel")
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 57, in copy_from_profile
    copy_if_not_exists(store_file_path, "/boot%s" % (efi_file_path))
  File "/nix/store/3gd2nlig389d4rp94prqfjf2n33rlwia-systemd-boot-builder.py", line 21, in copy_if_not_exists
    shutil.copyfile(source, dest)
  File "/nix/store/wx2jazwszwyqpwqj4ghkwn19n1h1ncva-python3-3.6.3/lib/python3.6/shutil.py", line 122, in copyfile
    copyfileobj(fsrc, fdst)
  File "/nix/store/wx2jazwszwyqpwqj4ghkwn19n1h1ncva-python3-3.6.3/lib/python3.6/shutil.py", line 82, in copyfileobj
    fdst.write(buf)
OSError: [Errno 28] No space left on device
warning: error(s) occurred while switching to the new configuration

I have plenty of space on every partition, except for the /boot partition which is completely full.

Setting boot.loader.grub.configurationLimit to a lower number didn't help.

After deleting old generations using sudo nix-collect-garbage --delete-older-than 60d and then running sudo nixos-rebuild switch it now works.

Why not just switch to grub which can store kernels, initrds and other things on root partition instead of small efi partition? https://nixos.wiki/wiki/Bootloader#How_to_deal_with_full_.2Fboot_in_case_of_EFI

The challenge has always been synchronising multiple ESPs. I would want something redundant, because I have 2 drives, but I also want them to be synchronised, but only if the boot succeeded. I hacked up something using activation scripts and systemd script, but it's quite brittle.

@gnidorah I’m personally using efi-stub, which grub does not support.

@rubenmoor On my UEFI system, I had success with the command supplied by @rick68 only after I ran it with sudo. It cleared out 250MB+ of old files from /boot/EFI/nixos/.

Yes, system-changing commands need to be run with root privileges. EDIT: I'm afraid even the documentation doesn't say this explicitly.

I just ran into this whilst upgrading to 18.09 and I'm not sure how to fix it. Can I safely remove old kernels manually?

I again ran into this whilst upgrading from linuxPackages_4_18 to linuxPackages_4_19 and again stuck with a broken system. We should really tackle this, it's very bad UX.

If anybody runs into this:

run

nixos-rebuild boot

instead of

nixos-rebuild switch

and it will remove the old kernels

Can there be a configurationLimit option available to systemd-boot as well?

I've pushed 224a6562a4880195afa5c184e755b8ecaba41536 to master which adds boot.loader.systemd-boot.configurationLimit exactly as the existing one for grub.

Current status:

  • systemd-boot will cleanup entries first before copying new ones, so just setting the limit will do the right thing
  • grub still needs the workaround for deleting old entries if you run out of disk space

Possible TODOs:

  • [ ] catch exceptions thrown while copying and instruct the user to set the limit
  • [ ] get grub to first delete entries before copying new ones
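The first TODO could look roughly like this sketch: catch ENOSPC around the copy, remove the partial file, print advice, and re-raise. `copy_with_hint` and its `copy`/`stream` parameters are hypothetical (they exist so the behaviour can be tested without filling a disk); `boot.loader.systemd-boot.configurationLimit` is the option mentioned above.

```python
import errno
import os
import shutil
import sys


def copy_with_hint(source, dest, limit_option, copy=shutil.copyfile,
                   stream=None):
    """Copy a boot file; on ENOSPC, print recovery advice and re-raise."""
    if stream is None:
        stream = sys.stderr
    try:
        copy(source, dest)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Drop the partial file so it doesn't keep occupying /boot.
            if os.path.exists(dest):
                os.remove(dest)
            print(f"error: no space left on device while copying {source} "
                  f"to {dest}.\nDelete old generations or lower "
                  f"{limit_option}, then rebuild.", file=stream)
        raise
```

Other errors (permissions, missing source) propagate unchanged and without the hint, since lowering the limit would not help with them.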

I'm facing this issue on my Raspberry Pi 3. Basically, the /boot partition is so small (only 30 MB) that I can't even build a single generation. Any suggestions on how to fix this?

Same on my RPi3.

$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1  120M  120M     0 100% /boot

Most of the space is taken up by /boot/nixos:

$ du -sch /boot/nixos/*
4.1M    /boot/nixos/3n2b8fvv6xaqpdccfcgg2z7sp28sz4j8-initrd-initrd
2.9M    /boot/nixos/508hwfj07vqvla0l27g3x5ync5mschzv-linux-4.14.10-dtbs
26M     /boot/nixos/508hwfj07vqvla0l27g3x5ync5mschzv-linux-4.14.10-Image
4.1M    /boot/nixos/aibyylj1h4bim45zgw2s0gwcsvfadk34-initrd-initrd
6.6M    /boot/nixos/bwgjnapvj32i9x7g35mp86567awxf9lq-initrd-initrd
4.0M    /boot/nixos/j9gqa43mayrl7j9mmpjl0hyii4yb49mn-linux-4.19.42-dtbs.tmp.6140
29M     /boot/nixos/j9gqa43mayrl7j9mmpjl0hyii4yb49mn-linux-4.19.42-Image
2.5M    /boot/nixos/rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-dtbs
25M     /boot/nixos/rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-Image
4.4M    /boot/nixos/vcd0w3n8qqvgnb5ic950nq8z3mnbi86w-initrd-initrd
108M    total
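A listing like this is easier to read when grouped by store hash, since the kernel Image and dtbs of one build share a hash. The sketch below also flags names containing ".tmp", such as the dtbs.tmp.6140 entry above, which look like copies interrupted by ENOSPC. The function names are hypothetical; it reports apparent sizes, which will differ somewhat from du's on-disk numbers.

```python
import os
import re
from collections import defaultdict

# Names copied from the Nix store begin with a 32-character hash and "-".
HASH_PREFIX = re.compile(r"^([0-9a-z]{32})-")


def tree_size(path):
    """Apparent size in bytes of a file, or of all files under a directory."""
    if not os.path.isdir(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def usage_by_hash(boot_dir):
    """Group the entries of boot_dir by store hash and sum their sizes.

    Returns (sizes, leftovers): `sizes` maps hash -> bytes, and `leftovers`
    lists entries whose names contain ".tmp", which look like copies that
    were interrupted and never cleaned up.
    """
    sizes = defaultdict(int)
    leftovers = []
    for name in sorted(os.listdir(boot_dir)):
        if ".tmp" in name:
            leftovers.append(name)
        match = HASH_PREFIX.match(name)
        if match:
            sizes[match.group(1)] += tree_size(os.path.join(boot_dir, name))
    return dict(sizes), leftovers
```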

None of the suggested commands work for me, e.g.:

$ sudo nixos-rebuild boot
building Nix...
building the system configuration...
cat: write error: No space left on device
warning: error(s) occurred while switching to the new configuration

I resorted to deleting a couple of old items, freeing up some 30 MB:

$ cd /boot/nixos
$ sudo rm -rf rk340hsfa1mc9x6q7f3yr1m5lr73kfb1-linux-4.13.2-*

And now sudo nixos-rebuild switch works. :slightly_smiling_face:

For the Raspberry Pi folks chiming in: boot.loader.generic-extlinux-compatible.configurationLimit should probably be set lower (it looks like the counterpart of the grub setting people mention above).

You will likely need to manually delete some files in /boot/nixos/ before you can sudo nixos-rebuild switch -- perhaps look for names that suggest part of an older linux kernel.
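A small helper for spotting old kernels by name, assuming kernel files carry a `-linux-<version>-` fragment like the listings in this thread. It only sorts names. It does not know which kernel is running or which generations still exist, so check `uname -r` and your remaining generations before deleting anything.

```python
import re

# Matches e.g. "-linux-4.10.12-" in "...-linux-4.10.12-bzImage.efi".
KERNEL_VERSION = re.compile(r"-linux-(\d+(?:\.\d+)*)-")


def kernels_oldest_first(names):
    """Return (version_tuple, name) pairs for kernel files, oldest first."""
    found = []
    for name in names:
        match = KERNEL_VERSION.search(name)
        if match:
            version = tuple(int(part) for part in match.group(1).split("."))
            found.append((version, name))
    return sorted(found)
```

Comparing versions as integer tuples matters here: as plain strings, "4.10.10" would sort before "4.9.25".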

Unfortunately it looks like the total size of a single boot configuration is ~50M, so my 130MB boot partition probably won't do much good.

$ ls /boot/nixos
66i7fz4ssgh90pw352qm1wd6yig7k1z3-linux-5.6.12-dtbs  66i7fz4ssgh90pw352qm1wd6yig7k1z3-linux-5.6.12-Image  6cw641p81man2k0p4iavwbvwf8j5pzzd-initrd-linux-5.6.12-initrd
$ du -sh /boot/nixos/
53M /boot/nixos/
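The arithmetic behind "won't do much good" can be made explicit. This hypothetical helper gives a worst-case count, assuming configurations share no files (generations that reuse a kernel or initrd need less).

```python
def configurations_that_fit(partition_size, per_config_size, reserved=0):
    """Worst-case count of complete boot configurations that fit.

    All sizes must use the same unit (e.g. MiB). Assumes configurations
    share no files.
    """
    if per_config_size <= 0:
        raise ValueError("per_config_size must be positive")
    usable = partition_size - reserved
    return max(0, int(usable // per_config_size))
```

With roughly 53M per configuration on a 130MB partition this gives 2: the new configuration plus a single older one. On a 30 MB partition it gives 0, consistent with the earlier report of not fitting even one generation.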

On the other hand, it looks like the wiki has been updated to recommend against using a separate boot partition on NixOS >= 19.09:

# File systems configuration for using the installer's partition layout
  fileSystems = {
    # Prior to 19.09, the boot partition was hosted on the smaller first partition
    # Starting with 19.09, the /boot folder is on the main bigger partition.
    # The following is to be used only with older images.
    /*
    "/boot" = {
      device = "/dev/disk/by-label/NIXOS_BOOT";
      fsType = "vfat";
    };
    */
    "/" = {
      device = "/dev/disk/by-label/NIXOS_SD";
      fsType = "ext4";
    };
  };

Also see the section on the same page: Disable use of /boot partition

Is there a workaround?
I don't know which .efi files belong to deleted generations, so I don't even know what can be deleted manually.

nixos-rebuild boot doesn't work on my machine with a full /boot partition.

UPDATE: My fault. It seems that I needed to delete even more generations to get enough space.
After deleting enough system generations, back up /boot/EFI/nixos/ somewhere else, clear out all the .efi files inside /boot/EFI/nixos/, and run nixos-rebuild boot. It will generate the .efi files for all the remaining entries, including the newly built one.
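The backup step could be scripted roughly as follows. The function name is hypothetical. `backup_dir` must be on a different filesystem than /boot, otherwise no space is freed (`shutil.move` then copies and deletes). Until the rebuild succeeds, the boot entries point at missing files, so don't reboot in between.

```python
import os
import shutil


def move_efi_files(src_dir, backup_dir):
    """Move every *.efi file from src_dir into backup_dir; return the names."""
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        if name.endswith(".efi"):
            shutil.move(os.path.join(src_dir, name),
                        os.path.join(backup_dir, name))
            moved.append(name)
    return moved
```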

The EFI files shouldn't take much space, you would rather be looking for old kernel images and initrds.

You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have been deleted.

You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have NOT been deleted.

Fixed that for you.
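The hash comparison described above can be automated as a read-only sketch. It assumes the standard `system-N-link` profile naming and that each generation's `kernel` and `initrd` symlinks resolve to `/nix/store/<hash>-name/...`. It only lists candidates and deletes nothing. `boot_dir` would be /boot/kernels for GRUB or /boot/EFI/nixos for systemd-boot, as in this thread. The function names are hypothetical.

```python
import glob
import os
import re

STORE_NAME = re.compile(r"^([0-9a-z]{32})-")


def live_hashes(profiles_dir="/nix/var/nix/profiles"):
    """Store hashes of the kernels and initrds of all remaining generations."""
    hashes = set()
    for link in glob.glob(os.path.join(profiles_dir, "system-*-link")):
        for item in ("kernel", "initrd"):
            target = os.path.realpath(os.path.join(link, item))
            if not os.path.exists(target):
                continue
            # e.g. /nix/store/<hash>-linux-4.11.3/bzImage -> "<hash>-linux-4.11.3"
            match = STORE_NAME.match(os.path.basename(os.path.dirname(target)))
            if match:
                hashes.add(match.group(1))
    return hashes


def unreferenced_boot_files(boot_dir, hashes):
    """Files in boot_dir whose hash prefix belongs to no remaining generation."""
    stale = []
    for name in sorted(os.listdir(boot_dir)):
        match = STORE_NAME.match(name)
        if match and match.group(1) not in hashes:
            stale.append(name)
    return stale
```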

FWIW, I usually just go by kernel version numbers rather than trying to match up hashes. Every time I filled up /boot I had enough ancient kernels and initrds lying around that it was easy to find some I knew I wouldn't need any more.

The EFI files shouldn't take much space, you would rather be looking for old kernel images and initrds.

You can check ls -l /nix/var/nix/profiles/system-*/kernel for the hashes of files only in generations that have been deleted.

Thanks a lot! They are ...-bzImage.efi and ...-initrd.efi.

I ran into this today, but /boot was somehow not mounted at all.
