Nixpkgs: grub efi fails on latest nixos-unstable

Created on 19 May 2019 · 48 Comments · Source: NixOS/nixpkgs

Issue description

After upgrading, the bootloader breaks with:

error: symbol `grub_file_filters' not found
Entering rescue mode

Steps to reproduce

Technical details

  • system: "x86_64-linux"
  • host os: Linux 4.19.44, NixOS, 19.09.git.2439b30 (Loris)
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.2.2
bug regression blocker nixos

All 48 comments

Hi gnidorah,
I recently had the same issue.

A nixos-rebuild switch failed with some error about a full filesystem.
In my case the cause was a full efivars partition. This left me a broken boot like you have.

See https://github.com/NixOS/nixpkgs/issues/27821 for more info.

This is how I restored my system:

  • Put the NixOS installer on a usb stick
  • Boot into the NixOS installer
  • rm /sys/firmware/efi/efivars/dump-* (don't delete anything else in that directory)
  • Reboot again into the NixOS installer
  • Mount your partitions to the correct locations in /mnt/
  • nixos-enter
  • echo 'nameserver 8.8.8.8' >> /etc/resolv.conf to fix network access.
  • nixos-rebuild boot --install-bootloader (install-bootloader might be optional, no idea)
  • This command should complete successfully.
  • reboot

This is all I had to do. However, for now I also disabled boot.loader.efi.canTouchEfiVariables which was initially set by nixos-generate-config. I have no idea what this setting does when enabled. It might be related.

I hope this helps,
Tom
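For the cleanup step in the list above, here is a minimal sketch of a helper that removes only the dump-* entries and leaves everything else in the directory alone. The function name clean_efivar_dumps and the optional directory argument are hypothetical additions for illustration (the argument mainly lets you try it on a scratch directory first). It assumes the dump files can be deleted with a plain rm, which may not hold on every firmware.

```shell
#!/bin/sh
# Remove only crash-dump variables (dump-*) from an efivars directory.
# The directory defaults to the real efivarfs mount; pass another path to test.
clean_efivar_dumps() {
    dir="${1:-/sys/firmware/efi/efivars}"
    # If the glob matches nothing it stays literal, so skip non-existent paths.
    for f in "$dir"/dump-*; do
        [ -e "$f" ] || continue
        echo "removing $f"
        rm -- "$f"
    done
}
```

Run from the installer as root, it would be called as `clean_efivar_dumps` with no arguments, followed by the reboot described above.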

@TomSmeets Unfortunately that's not the case here. My config is the following:

  fileSystems."/boot/efi" =
    { device = "/dev/disk/by-uuid/58D0-2B0F";
      fsType = "vfat";
      options = [ "defaults,noauto" ];
    };
  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub = {
    efiSupport = true;
    device = "nodev";
  };

So the FAT partition only stores:

/boot/efi
└── EFI
    ├── BOOT
    │   └── BOOTX64.EFI
    └── grub
        └── grubx64.efi

3 directories, 2 files

Meanwhile the system partition (the /boot folder) stores everything else, so I never run into #23926.
It worked great for a year or so until I hit the issue above. The latest nixos-unstable that currently works for me is bc94dcf500286495e3c478a9f9322debc94c4304. Perhaps it's time for a bisect :disappointed:

Also a small hint: if you use a recent NixOS installation USB stick, you can boot from it, choose rEFInd in the menu, then choose the bootx64 entry. It will load directly into your NixOS installation, so there is no real need for nixos-enter tricks.

Done bisect. This is the commit that broke my layout:
https://github.com/NixOS/nixpkgs/commit/df4d0fab2f62fc1ce1d904dbbfd29e5c66da67bf
grub: 2.02 -> 2.04-rc1
cc @volth @NeQuissimus

Then the next step should be either bisecting grub or reviewing its changelog.
Probably some configuration options need to be adjusted to reflect the version update.

I have this error as well. I had to boot from a stick and rescue my system about a week ago.
I just tried today to update channels and the exact same bug popped up. Luckily I haven't shut down my machine yet. Did anybody find a fix?

Follow up:
I accidentally updated yet again and had to go through the annoying rescue process another time.
I finally just cut grub out altogether and now it boots fine.
I'm sad though, I had a nicely customized grub loader that I am going to dearly miss :(

If you run into this issue in the future follow these steps:

  1. Boot with a live USB
  2. Mount as if you were installing nixos:
# Change mounts to your actual `nixos` and `boot` partitions.
mount /dev/sda1 /mnt
mount /dev/sda2 /mnt/boot
vim /mnt/etc/nixos/configuration.nix
  3. Remove grub and fall back to the minimal loader.
    My new boot.loader looks like:
boot.loader = {
    systemd-boot.enable = true;
    efi.canTouchEfiVariables = true;
    efi.efiSysMountPoint = "/boot";
};
  4. nixos-install --root /mnt
  5. Go get a coffee.

@BadDecisionsAlex
There is no need for steps 2 and 4 if you're booting from an unstable live USB

Remove grub and fall back to the minimal loader

Sorry, but no. I want to keep the EFI partition as small as possible and I also want to run garbage collection as seldom as possible. For now I just reverted commit https://github.com/NixOS/nixpkgs/commit/df4d0fab2f62fc1ce1d904dbbfd29e5c66da67bf locally

TBH I don't understand why we are pulling release candidates for such critical components as boot loaders.

@gnidorah I completely agree that our repo should roll back.

If you have other notes about my rescue process please let me know. I was just hoping to leave breadcrumbs for anybody else who bumps into the issue; but I am by no means an expert here.

2.04 has been released. If the problem is reproducible on the released version, then the fix should be somewhere in the configs.

@volth Once it's there, I will test it using the following configuration: https://nixos.wiki/wiki/Bootloader#Keeping_kernels.2Finitrd_on_the_main_partition

Tried grub 2.04 locally and it didn't work either.

I'm leaving this issue open, but since there is now a solution for the out-of-space problem with systemd-boot (https://github.com/NixOS/nixpkgs/issues/23926#issuecomment-504687391), I've switched to that.

I use grub on EFI and can't repro this bug.

My boot.loader.efi.canTouchEfiVariables is set to false.

I just did a fresh install on unstable (on ZFS root).
@obadz setting boot.loader.efi.canTouchEfiVariables to false initially resulted in no boot devices.

But essentially what was on the wiki worked:

  boot.supportedFilesystems = [ "zfs" ];
  boot.tmpOnTmpfs = true;
  boot.loader.systemd-boot.enable = false;
  boot.loader.efi.canTouchEfiVariables = true;
  boot.loader.efi.efiSysMountPoint = "/boot/efi";
  boot.loader.grub.efiSupport = true;
  boot.loader.grub.device = "nodev";

partition scheme

DISK=/dev/disk/by-id/<my disk>
sgdisk --zap-all $DISK
sgdisk -n2:1M:+512M -t2:EF00 $DISK
sgdisk -n1:0:0 -t1:BF01 $DISK

mkfs.vfat $DISK-part2

zpool create ...

And boot with grub efi seems to work fine

I switched yesterday from 19.03 to the 19.09-release channel on my tablet and the boot now complains with the same message. This is one of those x64 tablets that has an i686 EFI and thus requires boot.loader.grub.forcei686.

In the meantime, I'm booting via 32-bit USB grub with

configfile (hd1,3)/grub/grub.cfg

I tried the rm /sys/firmware/efi/efivars/dump-*, same result.

What does grub 2.04 offer? Couldn't it just be rolled back to 2.02 in the release-19.09 branch?

I fixed my boot by just moving away all files in /boot and nixos-rebuild --install-bootloader switch.

I just had the same issue after switching to 19.09 and switching back to 19.03.

Timeline:

  • nix-channel --remove nixos
  • nix-channel --add https://nixos.org/channels/nixos-19.09 nixos
  • nixos-rebuild switch
  • reboot now
  • booted fine, but there was an unrelated problem, so decided to rollback
  • nixos-rebuild switch --rollback
  • reboot now
  • got the "error: symbol `grub_file_filters' not found" error

I don't use EFI.

Solved it by reinstalling the bootloader with a usb stick.

In order to avoid breaking boots on the 19.09 release, I'm in favour of rolling back to 2.02 on 19.09 before the release (but leaving 2.04 on master).

I'm unsure how to address this problem in general. A naive solution based on my incomplete understanding of the issue seems to me to have versioning for grub's module directories, e.g. have /boot/grub-2.02/x86_64-efi and /boot/grub-2.04/x86_64-efi distinct so that both versions of GRUB can work. Overall though, this seems like another case of bootloaders being hard to upgrade, a problem that @samueldr has been thinking about a fair bit iirc. Maybe you can say something about this?

Reverted to 2.02: 4eb9725, 862f05c

I can't reproduce this. Using EFI on latest master has no issues. I'm going to close this and if someone else can reproduce they can re-open and provide details for reproduction.

okay, this is happening to me right now after switching to unstable and running
sudo nixos-rebuild switch --upgrade

this is my complete configuration:

{ config, pkgs, ... }:

{
  imports =
    [
      ./hardware-configuration.nix
    ];
  boot.loader = {
    efi = {
      canTouchEfiVariables = true;
      efiSysMountPoint = "/boot";
    };
    grub = {
      devices = [ "nodev" ];
      enable = true;
      version = 2;
      useOSProber = true;
      efiSupport = true;
      efiInstallAsRemovable = false;
    };
  };
  networking.hostName = "xxxxx";
  networking.networkmanager.enable = true;
  networking.useDHCP = false;
  networking.interfaces.eno1.useDHCP = true;
  console.font = "Lat2-Terminus16";
  console.keyMap = "us";
  i18n.defaultLocale = "en_US.UTF-8";
  time.timeZone = "Europe/Berlin";
  nixpkgs.config.packageOverrides = pkgs: {
    nur = import (builtins.fetchTarball "https://github.com/nix-community/NUR/archive/master.tar.gz") {
      inherit pkgs;
    };
  };
  environment.systemPackages = with pkgs; [ wget vim curl git firefox python3 ntfs3g alacritty ark unzip ];
  programs.gnupg.agent = { enable = true; enableSSHSupport = true; };
  services.pcscd.enable = true;
  services.udev.packages = [
    pkgs.yubikey-personalization
    pkgs.libu2f-host
  ];
  services.openssh.enable = true;
  services.printing.enable = true;
  sound.enable = true;
  hardware.pulseaudio.enable = true;
  services.xserver.enable = true;
  services.xserver.layout = "us";
  services.xserver.xkbOptions = "eurosign:e";
  services.xserver.videoDrivers = [ "amdgpu" ];
  services.xserver.displayManager.sddm.enable = true;
  services.xserver.desktopManager.plasma5.enable = true;
  services.xserver.windowManager.i3.enable = true;
  users.users.xxxx = {
    isNormalUser = true;
    extraGroups = [ "wheel" "networkmanager" ];
    openssh.authorizedKeys.keys = [
      "ssh-rsa xxxxx"
    ];
  };
  system.stateVersion = "19.09";
}

i am currently booted into this system using the installer and refind.

Edit: switching back to stable solved it for me

@disassembler did you test on an installation that was using 2.02 before 2.04?

I've tested in a VM, starting with the following config:

{ config, pkgs, ... }: {
  imports = [ ./hardware-configuration.nix ];
  boot.loader.grub = {
    enable = true;
    efiSupport = true;
    device = "nodev";
  };
  boot.loader.efi.canTouchEfiVariables = true;
  system.stateVersion = "19.09";
}

  • Installing 19.09 (grub 2.02)
  • Booting said 19.09
  • Upgrading to 20.03beta (nix-channel --add https://nixos.org/channels/nixos-20.03 && nixos-rebuild boot --upgrade)
  • Rebooting
  • Going back to 19.09 with grub installed as removable (add efiInstallAsRemovable = true; and remove canTouchEfiVariables, rm -r /boot/* && nix-channel --rollback && nixos-rebuild boot)
  • Rebooting
  • Going to 20.03beta again (nixos-rebuild boot --upgrade)
  • Rebooting

tl;dr, I've been unable to reproduce this. @ZerataX do you still have this problem if you upgrade to the beta or unstable again?

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/go-no-go-meeting-nixos-20-03-markhor/6495/16

just tried again to upgrade to unstable and again get the grub_file_filters not found error

$ nixos-version
19.09.2370.e10c65cdb35 (Loris)
$ sudo nix-channel --add https://nixos.org/channels/nixos-unstable nixos
$ sudo nixos-rebuild switch --upgrade
$ reboot

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/go-no-go-meeting-nixos-20-03-markhor/6495/19

That was probably already discussed somewhere, but:

Is there any problem with rolling back to 2.02 again? That version works, right?

elementary OS actually still uses 2.02 (5.1.2 release from 2020-02-07), as does Fedora 31 from 2019-10-29, but even the last two Debian releases use 2.04. Still, having a broken bootloader is worse than shipping an older version than Debian.

We could also implement this logic

IF efiSupport = true; THEN package 2.02 ELSE 2.04

True, or it could be based on stateVersion.

I just upgraded 19.09 -> 20.03 this morning and ran into this issue. I attempted to reinstall the bootloader using the little script on https://nixos.wiki/wiki/Bootloader, but no luck.

Here's my relevant config:
https://gist.github.com/emptyflask/7aa04f800321c2483574f8985e26bea0

Still trying to restore my system...

I noticed that /boot/EFI/NixOS-boot/grubx64.efi hasn't been touched -- it still has a timestamp of Mar 15 2019. Shouldn't that be replaced?

@emptyflask in case it helps:

Workaround for booting with GRUB 2.04:

  • Boot the NixOS installation image (or something else that has/is rEFInd) via e.g. a USB stick
  • Select Boot Fallback boot loader from EFI
  • Now you should see the usual GRUB boot menu (with all NixOS generations, etc.)
  • After a successful boot: Reverting to GRUB 2.02 should permanently fix the problem(?)

At least that worked for me (but: this is from my memory and online screenshots, therefore I might have missed some steps). I think these steps were posted in a related issue, but I didn't find the link again :o

Regarding the GRUB regression (2.02 -> 2.04)

The issue is most likely hardware related. E.g. in my case Boot Fallback boot loader from EFI should boot the same EFI binary as without rEFInd (but I forgot to verify that last time). So this might actually be some weird issue during the transition to GRUB during the boot process (and therefore only affecting some devices).

That actually does help, I was able to boot into my normal system using rEFInd. Thanks!

So to summarize the issue above, there are 3 components:

  • the firmware (BIOS or UEFI)
  • the grub core image
  • grub modules

The issue boils down to your firmware and your OS disagreeing about where the grub core image is.

The firmware points to the grub core image. The core image loads modules. The interface between core and modules is not stable.

If your OS is installing core image and modules to location A, but your firmware is loading the core image from location B, then at some point the (old, non-updated) core image at location B is not able to load the (new, updated) modules at location A.

The problem has always been there (mismatched core and modules), but you hadn't noticed until the interface between the two broke.
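To check whether a stale core image is lying around, one option is to list every GRUB-looking EFI binary under the ESP together with its checksum. This is only a sketch: the function name list_grub_efi_binaries is made up, the default mount point /boot is an assumption, and the file-name patterns are guesses based on the paths mentioned in this thread.

```shell
#!/bin/sh
# List candidate GRUB core images under an ESP mount point with their SHA-256.
# Sorting by hash puts identical binaries next to each other.
list_grub_efi_binaries() {
    esp="${1:-/boot}"
    find "$esp" -type f \( \
            -iname 'grub*.efi' -o \
            -iname 'core.efi' -o \
            -iname 'bootx64.efi' -o \
            -iname 'bootia32.efi' \
        \) -exec sha256sum {} + | sort
}
```

A binary whose hash differs from the one the last nixos-rebuild installed is a candidate for the old core image that the firmware may still be loading.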

It does appear that there might be some duplication in /boot, maybe from attempting to use boot.loader.grub.efiInstallAsRemovable? I'm particularly wondering about /boot/EFI/NixOS-boot/grubx64.efi, nothing seems to touch it, and it's got a timestamp from a year ago.

(I've omitted a bunch of modules and Microsoft-related things)

.
├── background.png
├── converted-font.pf2
├── EFI
│   ├── Boot
│   │   └── bootx64.efi
│   ├── grub
│   │   └── grubx64.efi
│   ├── Microsoft
│   │   └── Boot
│   ├── nixos
│   │   ├── 09iivcwr1b2ijxxa4z7bcnpfjwq9cap7-initrd-linux-4.19.101-initrd.efi
│   │   ├── 502dhxra32pxv99zm63m2si006bdaijc-linux-4.19.109-bzImage.efi
│   │   ├── aqdfl1zp8d767nng5w3wh3wv54npjb8y-initrd-linux-5.4.33-initrd.efi
│   │   ├── dpp68ayvn1xmx9d8wld1fjp8ax6lz2k5-initrd-linux-4.19.109-initrd.efi
│   │   ├── h9w801h7y09a315xi7x4pskpn8y4i7xf-initrd-linux-4.19.113-initrd.efi
│   │   ├── ia4zbwrkcigbiil3vhhfwjji0gn7m9yr-linux-4.19.96-bzImage.efi
│   │   ├── jg9gfc84svh76nkj2am2jxq977bqd2k8-linux-4.19.113-bzImage.efi
│   │   ├── k7f7l104af1ny3sliwpxybzf6dy5060l-linux-4.19.101-bzImage.efi
│   │   ├── l3389n3q8cas7z5ybbwga251hwx9m6gv-initrd-linux-5.4.33-initrd.efi
│   │   ├── li3v6p9mqsspm0zgglzba2szab2zdmiv-linux-5.4.33-bzImage.efi
│   │   ├── w8yq408lfmszipyjxl9swbajjmsmkyza-initrd-linux-4.19.113-initrd.efi
│   │   └── xmknk1cjh8kgpl6j1n95a7b2fhmk7saz-initrd-linux-4.19.96-initrd.efi
│   └── NixOS-boot
│       └── grubx64.efi
├── grub
│   ├── fonts
│   ├── grub.cfg
│   ├── grubenv
│   ├── locale
│   ├── state
│   └── x86_64-efi
│       ├── core.efi
│       └── grub.efi
├── kernels
│   ├── aqdfl1zp8d767nng5w3wh3wv54npjb8y-initrd-linux-5.4.33-initrd
│   ├── jg9gfc84svh76nkj2am2jxq977bqd2k8-linux-4.19.113-bzImage
│   ├── l3389n3q8cas7z5ybbwga251hwx9m6gv-initrd-linux-5.4.33-initrd
│   ├── li3v6p9mqsspm0zgglzba2szab2zdmiv-linux-5.4.33-bzImage
│   └── w8yq408lfmszipyjxl9swbajjmsmkyza-initrd-linux-4.19.113-initrd
├── loader
│   ├── entries
│   │   └── nixos-generation-154.conf
│   └── loader.conf
└── System Volume Information

If you're not dual-booting, it should be safe to remove all of /boot (keep a backup in order to be able to reproduce the error again), then rerun nixos-rebuild boot. AFAIU, that should either (a) make everything work correctly, since the grub image will only be in one place or (b) break your boot (have your rEFInd USB stick at the ready!). The latter should only happen if either (i) both canTouchEfiVariables and efiInstallAsRemovable are set to false or (ii) your firmware is broken (disregards the boot order specified by the OS) or (iii) your firmware is not configured to use the fallback path (bootx64.efi). In any case, I'd consider removing the bad state as the right solution here.
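For the "keep a backup" part, a minimal sketch of copying the whole of /boot somewhere safe before removing it. The function name backup_boot and the default destination under /root are assumptions for illustration; pick any location that is not on the ESP itself.

```shell
#!/bin/sh
# Copy the contents of a boot directory (default /boot) into a backup
# directory and print where the backup ended up.
backup_boot() {
    src="${1:-/boot}"
    dest="${2:-/root/boot-backup-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory contents, including hidden files.
    cp -a "$src/." "$dest/" || return 1
    printf '%s\n' "$dest"
}
```

Only once the backup is verified would you clear /boot and rerun nixos-rebuild boot as described above.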

@lheckemann I removed all of /boot and then ran nixos-rebuild boot. I am no longer able to boot without the USB stick. If I boot with the USB stick, I get stuck at the grub prompt (GNU GRUB version 2.04 Minimal BASH-like line editing is supported. ...).
At this point I tried:
grub> set root=(hd1,gpt1)
grub> linux /efi/nixos/hash-linux-<number>-bzImage.efi .... root=LABEL=NIXOS_ISO (i previously set this label to my usb)
grub> initrd /efi/nixos/initrd-linux-<number> ....
grub> boot
I end up with this problem (https://github.com/NixOS/nixpkgs/issues/6265), like:

timed out waiting for device /dev/root, trying to mount anyway.
mounting /dev/root on /iso...
mount: mounting /dev/root on /mnt-root/iso failed: No such file or directory

An error occurred in stage 1 of the boot process, which must mount the root filesystem on `/mnt-root' and then start stage 2. Press one of the following keys:
  r) to reboot immediately
  *) to ignore the error and continue

If I continue, I end up in a kernel panic.
Any ideas?

Update1: Found an old USB stick with nixos-19.03 and GRUB 2.02. The installer seems to work, but rEFInd doesn't. This means I have to do a completely fresh install of NixOS and then upgrade it to 20.03, which is not the optimal option, but OK! I'll keep you updated.

Update2: This actually could be an option too: https://gist.github.com/chris-martin/4ead9b0acbd2e3ce084576ee06961000

@saggzz sorry about that! Do you have either or both of canTouchEfiVariables or efiInstallAsRemovable set?

As for reinstalling: you don't have to do a totally fresh install. You can boot the installation ISO, mount the filesystems, and run nixos-install. It will rebuild the system profile and set up the bootloader etc. without wiping anything. If you pass -I nixpkgs=channel:nixos-20.03 it should directly install NixOS 20.03, even from the 19.03 installer image. The one thing you may need to do after installation is make sure that the channel is set correctly (sudo nix-channel --add https://nixos.org/channels/nixos-20.03 nixos && sudo nix-channel --update) so that you don't accidentally downgrade later.
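The reinstall steps above could be sketched roughly as follows. This is a dry-run wrapper, not a tested recovery script: the function names run and reinstall_nixos, the DRY_RUN switch, and the choice of /mnt/boot as the ESP mount point are all assumptions (adjust the latter if your efiSysMountPoint is e.g. /boot/efi).

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# reinstall_nixos ROOT_DEVICE ESP_DEVICE [CHANNEL]
reinstall_nixos() {
    root_dev=$1
    esp_dev=$2
    channel=${3:-nixos-20.03}
    run mount "$root_dev" /mnt
    run mkdir -p /mnt/boot
    run mount "$esp_dev" /mnt/boot
    # nixos-install rebuilds the system profile and reinstalls the bootloader.
    run nixos-install -I "nixpkgs=channel:$channel"
}
```

Calling `reinstall_nixos /dev/sda2 /dev/sda1` prints the four commands for review; setting DRY_RUN=0 would actually run them from the installer as root. The device names are placeholders.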

EDIT: you can also get a shell in the initramfs to try and mount your root filesystem manually by passing boot.shell_on_fail on the kernel command line. Then you can try and enter the system by pressing f at the prompt and then using the following commands:

mount /dev/sda2 /mnt-root # substitute device name as appropriate
exec switch_root /mnt-root /nix/var/nix/profiles/system/init

@saggzz when I had problems with GRUB, I followed the steps here to re-install.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nixos-grub-doesnt-see-windows/6811/4

@lheckemann thank you for the hints. Neither canTouchEfiVariables nor efiInstallAsRemovable had been set. I tried to mount the filesystems and run nixos-install, but no filesystems were found. This brought me to the conclusion that something was totally messed up there. I did a fresh install of NixOS and used the -I nixpkgs=channel:nixos-20.03 option, and it worked!
@asymmetric thank you, i switched to systemd-boot for now!

So I finally have it booting normally now, and all I did, afaik, is set efiInstallAsRemovable to true
and canTouchEfiVariables to false instead of the inverse.
It seems I can dual boot perfectly now, and I'm on 20.03.

Oh I also changed the efi mount point around and I think that may have cleaned up the efi directory, though it's now back to where I started.

I do confirm that grub2 2.04 boots NixOS in the following scenario:

  • nixos-version: 19.09.1320.4ad6f1404a8

  • configuration:
    {
      boot.loader.systemd-boot.enable = false;
      boot.loader.efi.efiSysMountPoint = "/boot/efi";
      boot.loader.efi.canTouchEfiVariables = false;
      boot.loader.grub.efiInstallAsRemovable = true;
      boot.loader.grub.device = "nodev";
    }
  • zstd btrfs compression affected files: init, initrd
  • packageOverrides: grub2 from unstable

Notably, I was unable to boot with grub2 2.02 until adding the override. _zstd compression support appeared in grub2 2.04._

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/good-filesystem-for-the-nix-store/3566/10

Today I got this when upgrading to 20.03. I tried a couple of times following @lheckemann's steps:

  • boot the installation ISO
  • mount the filesystems
  • run nixos-install

with no success until I removed the contents of /boot (recreating it anew).

Does this kind of error also happen on the unstable channels, or is it solved there?

Glad to hear removing all of /boot helped. This isn't fixed anywhere because it's a bit difficult to detect. On EFI systems, where it's most likely to occur, we could check the BootCurrent EFI variable and the corresponding boot entry, then throw a warning if it doesn't match the current bootloader installation path. Someone™ would need to implement that though :)
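As a rough illustration of that check (not an implementation of anything in nixpkgs), the loader path of the entry named by BootCurrent can be pulled out of `efibootmgr -v` output. The function name current_boot_loader is made up, and the sketch assumes the usual output layout: a `BootCurrent: XXXX` line plus `BootXXXX* label ...File(\path)` lines.

```shell
#!/bin/sh
# Read `efibootmgr -v` style output on stdin and print the File(...) path
# of the boot entry that BootCurrent points to.
current_boot_loader() {
    awk '
        /^BootCurrent:/ { cur = $2 }
        # Remember every BootXXXX entry line, keyed by its four hex digits.
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            entries[substr($0, 5, 4)] = $0
        }
        END {
            line = entries[cur]
            if (match(line, /File\([^)]*\)/)) {
                # Strip the leading "File(" and the trailing ")".
                print substr(line, RSTART + 5, RLENGTH - 6)
            }
        }
    '
}
```

On a live system this would be used as `efibootmgr -v | current_boot_loader`, and the result compared by hand against the path the OS installs to, for example the \EFI\NixOS-boot\grubx64.efi file discussed earlier in this thread.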

Thank you again @lheckemann

In fact, after recreating it, I had to fix it manually (it created EFI/EFI/etc). Not sure if this is strictly related to the issue or if I just messed it up during my attempts :)
