Dietpi: x86_64 | Boot fails after grub upgrade

Created on 29 Jul 2020 · 23 Comments · Source: MichaIng/DietPi

ADMIN EDIT

A grub bootloader upgrade is available which can leave your VM or PC in an unbootable state. To fix this, grub has to be told where to install the new grub bootloader. By default the root partition is /dev/sda1 and the bootloader is loaded from the underlying virtual disk /dev/sda. The fix would be accordingly:

debconf-set-selections <<< 'grub-pc grub-pc/install_devices multiselect /dev/sda'
apt update
apt install --reinstall grub-pc

But if you manually created your image or customised the root/boot partition setup, then this might be different. Check lsblk to see your currently attached disks. If in doubt, create a snapshot before any dietpi-update/dietpi-software/apt-get call that includes the grub upgrade.
VMware is NOT affected, since the entry there is /dev/sda by default.
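
If the root partition does not sit on /dev/sda, the same preseed line can be generated for whichever disk holds it. A minimal sketch, assuming the helper name grub_preseed_line (hypothetical, not part of DietPi or grub) and that its argument is a whole-disk device path:

```sh
# Hypothetical helper: print the debconf preseed line for a given whole-disk device.
# It only builds the text; nothing is written to the debconf database here.
grub_preseed_line() {
    case $1 in
        /dev/*) printf 'grub-pc grub-pc/install_devices multiselect %s\n' "$1" ;;
        *) echo "expected a /dev/... path, got: $1" >&2; return 1 ;;
    esac
}
```

On a live system the parent disk of the root file system can usually be looked up with `lsblk -no PKNAME "$(findmnt -no SOURCE /)"`, which prints the kernel name without the /dev/ prefix. The result can then be applied with `grub_preseed_line "/dev/<name>" | debconf-set-selections`, followed by the grub-pc reinstall shown above.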

Background

The Debian installer stores the boot disk via its ID to the debconf database, which looks like this on VirtualBox:

/dev/disk/by-id/ata-VBOX_HARDDISK_VB9b1ac4e2-831bcefc

The problem is that this is a hardware identifier, not one that is stored inside the disk image (e.g. like the file system UUIDs). So when you load the appliance on (or flash the image to) another machine, this identifier will be different, as the underlying hardware (virtual or physical) has changed. Since no unique software identifier exists for a parent disk, /dev/sda is the best guess that works in all cases where the image is started off a regular (physical or virtual) IDE/SATA/USB drive.


Hi @MichaIng

Today I was running some software installations on my DietPi VM. As usual during dietpi-software installations, apt update && apt upgrade are called. At first I did not take note, as I was not expecting anything special. However, after the dietpi-software installations finished and my VM rebooted, I was not able to connect. Luckily it's quite easy to check what happened on a VM 😉 and my VM was hanging with the GRUB error message grub-calloc not found. I reset the VM and tried using dietpi-software again, resulting in the same issue. Therefore I ran apt update && apt upgrade manually. The following packages are listed as upgradable.

root@DietPiVM1:~# apt list --upgradable
Listing... Done
grub-common/stable 2.02+dfsg1-20+deb10u1 amd64 [upgradable from: 2.02+dfsg1-20]
grub-pc-bin/stable 2.02+dfsg1-20+deb10u1 amd64 [upgradable from: 2.02+dfsg1-20]
grub-pc/stable 2.02+dfsg1-20+deb10u1 amd64 [upgradable from: 2.02+dfsg1-20]
grub2-common/stable 2.02+dfsg1-20+deb10u1 amd64 [upgradable from: 2.02+dfsg1-20]

During installation I got an interactive screen for Package configuration:

(screenshot of the package configuration dialog)

I guess it's suppressed if apt is called from G_AGUG. Not sure if this is just an effect on my VM. But at least it is something I would like to share with you 😃

Labels: External Bug, Information, Solution available, x86_64

All 23 comments

Hi, many thanks for reporting. I noticed the update and will just try it on VirtualBox.

Actually the UUID should not change, since reconfiguring grub is then indeed not trivial and one cannot simply edit some config files (I was tinkering with this for quite a while). Let me see if I get the same on VirtualBox. Which software do you use?


Verified, dammit. Not sure how this happened, as a changed UUID should, as said, break boot completely in the first place. The grub configs are based on UUIDs throughout. And by default it now apparently skips installing grub completely, instead of flashing it to the drive of the root device, which is correct in 99.999% of cases and would still allow boot in all other cases. I'd call this an external bug. Will test on VMware.

Automated install:

grub-install: error: cannot find a GRUB drive for /dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f. Check your device.map.

But sadly this does not make the install exit with an error code, even though it is boot-critical 😞.

Well, I was able to finalise the GRUB update by selecting /dev/sda during apt upgrade. But yeah, that does not work if you run apt without the interactive screen.

Hmm, not sure if the UUID changed, but the "disk id" did:

2020-07-30 19:45:40 root@VM-Buster:~# l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root  9 Jul 30 19:39 ata-VBOX_HARDDISK_VB83654da6-23e3f789 -> ../../sda
lrwxrwxrwx 1 root root 10 Jul 30 19:39 ata-VBOX_HARDDISK_VB83654da6-23e3f789-part1 -> ../../sda1

The UUID btw is not stored in any VirtualBox config file. All drives have some UUID-style IDs there, but those are only for the frontend software, reasonably so, as otherwise every snapshot would lead to a new UUID. The final file system UUID is stored regularly in the disk data and hence should not have changed at any time.

Checking /boot/grub/grub.cfg: it contains the correct UUID in many places, but I don't see the disk identifier from the error message anywhere.

Found it:

2020-07-30 21:33:38 root@VM-Buster:~# debconf-get-selections | grep grub
grub-pc grub-pc/install_devices_failed_upgrade  boolean true
grub-pc grub-pc/install_devices_empty   boolean false
# Remove GRUB 2 from /boot/grub?
grub-pc grub-pc/postrm_purge_boot_grub  boolean false
grub-pc grub-pc/install_devices multiselect     /dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f
grub-pc grub-pc/kopt_extracted  boolean false
grub-pc grub2/kfreebsd_cmdline  string
grub-pc grub-pc/hidden_timeout  boolean false
grub-pc grub-pc/timeout string  0
grub-pc grub-pc/mixed_legacy_and_grub2  boolean true
grub-pc grub-pc/install_devices_disks_changed   multiselect     /dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f
grub-pc grub2/update_nvram      boolean true
grub-pc grub2/force_efi_extra_removable boolean false
grub-pc grub-pc/chainload_from_menu.lst boolean true
grub-pc grub2/linux_cmdline     string  net.ifnames=0
grub-pc grub2/kfreebsd_cmdline_default  string  quiet
grub-pc grub-pc/install_devices_failed  boolean false
grub-pc grub2/linux_cmdline_default     string  consoleblank=0 quiet

This selection must have been made during the Debian installer run, since AFAIK there has been no other grub update in the meantime. I wonder why this is not stored as a UUID but with this strange symlink as identifier. No idea how or in which situation this might change.

@Joulinar
Can you check whether those values match your VM? That should be the case if you downloaded the current image (January). If so, we could do a relatively simple fix by checking the UUID or the above identifier: if it matches ours, we can be sure that the user has not done any custom disk/partition setup and can safely fix the debconf entry. (debconf-get-selections requires the debconf-utils package.)
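
To compare just the relevant entry between systems, the install device can be pulled out of the debconf-get-selections output. A minimal sketch, assuming the function name grub_install_devices (hypothetical) and the whitespace-separated layout seen in the dumps in this thread:

```sh
# Print the value of grub-pc/install_devices from `debconf-get-selections` output on stdin.
# Assumes the whitespace-separated layout shown in this thread: owner, question, type, value.
grub_install_devices() {
    awk '$2 == "grub-pc/install_devices" {
        $1 = $2 = $3 = ""      # drop owner, question name and type
        sub(/^ +/, "")         # trim the separators left behind
        print
    }'
}
```

It would be used as `debconf-get-selections | grub_install_devices`. The exact match on the question name keeps similarly named entries such as grub-pc/install_devices_empty out of the result.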

VMware is not affected; there only grub-pc grub-pc/install_devices multiselect /dev/sda is set, which works fine. I'd prefer a UUID-based entry, however I have no idea how the Debian installer or package defaults choose that, since both images were created exactly the same way with exactly the same installer.

@MichaIng
this is how it looks for the VM1, where I did the grub update yesterday

before grub update

root@DietPiVM1:~# debconf-get-selections | grep grub
grub-pc grub-pc/install_devices_disks_changed   multiselect
grub-pc grub2/kfreebsd_cmdline_default  string  quiet
grub-pc grub2/update_nvram      boolean true
grub-pc grub-pc/install_devices_failed_upgrade  boolean true
grub-pc grub-pc/mixed_legacy_and_grub2  boolean true
grub-pc grub-pc/chainload_from_menu.lst boolean true
grub-pc grub-pc/install_devices_failed  boolean false
grub-pc grub2/force_efi_extra_removable boolean false
grub-pc grub2/linux_cmdline_default     string  quiet
grub-pc grub-pc/kopt_extracted  boolean false
# Remove GRUB 2 from /boot/grub?
grub-pc grub-pc/postrm_purge_boot_grub  boolean false
grub-pc grub-pc/install_devices multiselect     /dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f
grub-pc grub-pc/hidden_timeout  boolean false
grub-pc grub-pc/install_devices_empty   boolean false
grub-pc grub-pc/timeout string  5
grub-pc grub2/linux_cmdline     string
grub-pc grub2/kfreebsd_cmdline  string
root@DietPiVM1:~#

after grub update

root@DietPiVM1:/mnt/samba/VM# debconf-get-selections | grep grub
grub-pc grub-pc/timeout string  0
grub-pc grub-pc/install_devices multiselect     /dev/disk/by-id/ata-QEMU_HARDDISK_QM00005
grub-pc grub-pc/kopt_extracted  boolean false
grub-pc grub-pc/install_devices_disks_changed   multiselect     /dev/disk/by-id/ata-QEMU_HARDDISK_QM00005
grub-pc grub2/kfreebsd_cmdline  string
grub-pc grub-pc/chainload_from_menu.lst boolean true
grub-pc grub2/linux_cmdline_default     string  consoleblank=0 quiet
grub-pc grub-pc/hidden_timeout  boolean false
grub-pc grub-pc/install_devices_empty   boolean false
# Remove GRUB 2 from /boot/grub?
grub-pc grub-pc/postrm_purge_boot_grub  boolean false
grub-pc grub2/linux_cmdline     string  net.ifnames=0
grub-pc grub-pc/install_devices_failed_upgrade  boolean true
grub-pc grub2/kfreebsd_cmdline_default  string  quiet
grub-pc grub-pc/install_devices_failed  boolean false
grub-pc grub2/update_nvram      boolean true
grub-pc grub2/force_efi_extra_removable boolean false
grub-pc grub-pc/mixed_legacy_and_grub2  boolean true
root@DietPiVM1:/mnt/samba/VM#

I tried to set up a new VM as well, but it fails because the first run triggers apt upgrade immediately. As expected, there is no chance to hold back the grub update.

New image creation:
(screenshot of the Debian installer's grub device selection)
Here the device is selected during the Debian installer prompts.

After DietPi-PREP, this is still correct:

root@DietPi:~# ls -l /dev/disk/by-id/ata-VBOX_HARDDISK_VB258cc6ea-86257ac4
lrwxrwxrwx 1 root root 9 Jul 30 21:01 /dev/disk/by-id/ata-VBOX_HARDDISK_VB258cc6ea-86257ac4 -> ../../sda

It is stored as well in debconf that way:

root@DietPi:~# debconf-get-selections | grep grub
grub-pc grub2/kfreebsd_cmdline  string
grub-pc grub2/update_nvram      boolean true
grub-pc grub-pc/install_devices_disks_changed   multiselect
grub-pc grub-pc/install_devices_empty   boolean false
grub-pc grub-pc/kopt_extracted  boolean false
grub-pc grub2/linux_cmdline     string
grub-pc grub-pc/install_devices multiselect     /dev/disk/by-id/ata-VBOX_HARDDISK_VB258cc6ea-86257ac4
grub-pc grub-pc/timeout string  5
grub-pc grub2/force_efi_extra_removable boolean false
grub-pc grub-pc/install_devices_failed  boolean false
grub-pc grub2/linux_cmdline_default     string  quiet
grub-pc grub2/kfreebsd_cmdline_default  string  quiet
grub-pc grub-pc/mixed_legacy_and_grub2  boolean true
grub-pc grub-pc/install_devices_failed_upgrade  boolean true
# Remove GRUB 2 from /boot/grub?
grub-pc grub-pc/postrm_purge_boot_grub  boolean false
grub-pc grub-pc/hidden_timeout  boolean false
grub-pc grub-pc/chainload_from_menu.lst boolean true

Now that I see your ata-QEMU_HARDDISK_QM00005, it seems that this identifier is assigned by the emulator. So it is doomed to be wrong. Most likely it even changes when exporting and importing the appliance. It must instead be changed to a UUID-based identifier... uhm, nope, not possible, since it must point to the disk, not the partition or file system.

No chance, it must be /dev/sda like it is on VMware. It is unlikely that someone changes the root drive to /dev/sdb or such, so this is the best we can do. Damn that there is no unique identifier for a drive/block device without a file system. /dev/disk/by-path/ and /dev/disk/by-id/ both depend on the machine and port the drive/image is attached to:

debconf-set-selections <<< 'grub-pc grub-pc/install_devices multiselect /dev/sda'

Lol while packing the image, there was just another grub upgrade coming in. It is not even documented yet: https://packages.debian.org/buster/grub-pc

ii  grub-pc        2.02+dfsg1-20+deb10u2 amd64        GRand Unified Bootloader, version 2 (PC/BIOS version)

I would not be surprised if this has something to do with our issue, let's see 😄.

Related: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966575
But the maintainer is out of office 😄.

EDIT: Nope, fix is for: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966554

Fixed for new images: https://github.com/MichaIng/DietPi/commit/05a3eee0103f06d5389f5faa2cb1192b26237552
From what I see there is no 100% reliable way. This path might be wrong in case of LVM, or when flashed to a real or virtual (e)MMC device or such. The only thing that definitely stays correct is the UUID, but this does not exist for the parent device, only for the partition/filesystem. So /dev/sda is just the most common best guess we can use for all x64 systems. It is a pain that grub neither guesses better (although you might not want it to override another bootloader non-interactively either) nor fails the upgrade. Best would be to check for a proper debconf database entry first and, if called non-interactively (DEBIAN_FRONTEND=noninteractive) with an invalid entry, abort the install in the first place so that the system is not left in an unbootable state.
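
Such a check before a non-interactive upgrade could look roughly like this. It is only a sketch of the idea, not what grub or DietPi actually do: the function name grub_targets_exist is hypothetical, and it assumes that multiple install devices are stored comma-separated.

```sh
# Hypothetical pre-upgrade guard: succeed only if every configured grub target exists.
# $1 is the grub-pc/install_devices value; multiple devices are assumed to be comma-separated.
grub_targets_exist() {
    [ -n "$1" ] || { echo "no grub install device configured" >&2; return 1; }
    printf '%s\n' "$1" | tr ',' '\n' | while read -r dev; do
        # -e follows symlinks, so a stale /dev/disk/by-id link fails here
        [ -e "$dev" ] || { echo "missing grub target: $dev" >&2; exit 1; }
    done
}
```

A wrapper script could call it with the value stored in debconf and refuse to continue with DEBIAN_FRONTEND=noninteractive when it returns non-zero. The sketch uses -e so it can be tried on any path; -b would be the stricter test for a real block device.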

The question is if/what we do with existing systems. For now I'll create a MOTD to direct VirtualBox users here.

Let's see what the maintainer will come up with, once he is back from holidays 🤣

VMware

Indeed on VMware the Debian installer does not offer to select the grub target via ID but just via regular path /dev/sda:
(screenshot of the installer's grub device selection on VMware)

The drive does not even have an ID, the related by-id path does not exist:

root@DietPi:~# ls -al /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root  60 Jul 31 19:31 .
drwxr-xr-x 6 root root 120 Jul 31 19:31 ..
lrwxrwxrwx 1 root root   9 Jul 31 19:31 ata-VMware_Virtual_IDE_CDROM_Drive_10000000000000000001 -> ../../sr0

The one that exists is the CD/DVD drive the installer ISO was mounted to 😄.

So it seems that the Debian installer (or the grub package defaults) uses the hardware ID of the disk, if available, else the regular device path as fallback. This behaviour indeed induces this issue in every case of image creation/export, since the stored ID is invalid on the new machine.

But what we can do is update this entry as part of dietpi-firstboot, so that it also stays correct when the image is booted from a non-/dev/sd* disk.
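
How the parent disk could be derived from the root partition name on such systems is sketched below. This is a purely string-based heuristic for illustration, not the logic dietpi-firstboot uses; the function name parent_disk_guess is hypothetical, and on a running system `lsblk -no PKNAME <partition>` is the more reliable source.

```sh
# Hypothetical heuristic: guess the whole-disk device from a partition device name.
# NVMe and (e)MMC partitions carry a "p<N>" suffix, sd/vd/hd/xvd style names a bare number.
parent_disk_guess() {
    case $1 in
        /dev/nvme*n*p[0-9]*|/dev/mmcblk*p[0-9]*) printf '%s\n' "${1%p[0-9]*}" ;;
        /dev/sd*|/dev/vd*|/dev/hd*|/dev/xvd*) printf '%s\n' "${1%%[0-9]*}" ;;
        *) return 1 ;;   # LVM, device mapper etc. cannot be guessed from the name
    esac
}
```

For /dev/mmcblk0p1 this yields /dev/mmcblk0, for /dev/sda1 it yields /dev/sda. Anything it does not recognise makes it fail rather than return a wrong disk.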

I decided to fix the issue carefully for our users, at least for those who run the latest images. We can do it safely there, since we know the exact hardware ID that is stored for the disk and can replace this very specific string in the debconf database. This will prevent a large number of users from running into a non-bootable system. We cannot break anything, since this very specific hardware ID is 100% wrong anyway.
For VirtualBox, MOTD fix is live:

[[ -w '/var/cache/debconf/config.dat' ]] && grep -q '/dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f' /var/cache/debconf/config.dat && sed -i 's|/dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f|/dev/sda|' /var/cache/debconf/config.dat

For Native PC BIOS:

[[ -w '/var/cache/debconf/config.dat' ]] && grep -q '/dev/disk/by-id/ata-M4-CT128M4SSD2_000000001151090024A1' /var/cache/debconf/config.dat && sed -i 's|/dev/disk/by-id/ata-M4-CT128M4SSD2_000000001151090024A1|/dev/sda|' /var/cache/debconf/config.dat
  • Lol, the identifier reveals the SSD (Crucial M4 128 GB) on which this image was built 🤣.

Directly altering the file is MUCH faster than executing the debconf commands, and it is the only way to check for the current entry without installing debconf-utils first 😉.
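
The two one-liners above follow the same pattern, which can be written once with the stale ID as a parameter. A minimal sketch, assuming the function name replace_grub_device (hypothetical), GNU sed, and device paths without regex-special characters or "|" in them; the backup copy is an addition, not part of the original fix:

```sh
# Hypothetical helper: swap one exact device path for another in a debconf config file.
# $1 = file, $2 = stale path, $3 = replacement. Returns non-zero if nothing was changed.
replace_grub_device() {
    [ -w "$1" ] || return 1
    grep -qF -- "$2" "$1" || return 1      # only touch files that contain the stale ID
    cp -p -- "$1" "$1.bak" || return 1     # keep a copy in case the guess is wrong
    sed -i "s|$2|$3|" "$1"
}
```

For the VirtualBox image this would be called as `replace_grub_device /var/cache/debconf/config.dat /dev/disk/by-id/ata-VBOX_HARDDISK_VBeda89797-79ca820f /dev/sda`. A second call is a no-op, because the stale ID is no longer present.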

Is this fix needed for VMware images? Do I need to run this prior to a dietpi-update or apt upgrade command?

[[ -w '/var/cache/debconf/config.dat' ]] && grep -q '/dev/disk/by-id/ata-M4-CT128M4SSD2_000000001151090024A1' /var/cache/debconf/config.dat && sed -i 's|/dev/disk/by-id/ata-M4-CT128M4SSD2_000000001151090024A1|/dev/sda|' /var/cache/debconf/config.dat

I am using v6.31.2 and I have pending apt updates, but I am concerned about blowing up my system XD Thanks.

I updated all our x86_64 images, although VMware was not affected, since it does not have any hardware device IDs for the virtual drives.

For existing systems, it should be safe to run apt update && apt upgrade manually. If needed, grub will pop up an interactive dialog to select the boot partition. Anyway, as you are on VMware, you can create a snapshot before running the update. If things go wrong, you are able to restore.

Anyway, as you are on VMware, you can create a snapshot before running the update. If things go wrong, you are able to restore.

For VirtualBox that is definitely true. I create a snapshot whenever I need to shut down anyway, or update the existing one. But how do you do that on VMware (Workstation Player)? I never found a snapshot feature there, which is one of the reasons I stick with VirtualBox. Of course one can copy the vmdk (+vmx), but that doubles disk space and is still more manual keyboard action compared to a built-in snapshot on VirtualBox.

Ahh, dammit. You are right. It's a Pro feature and not possible on Player. I thought it was possible. My VMs are hosted on a Synology box. I should have verified my answer first. Sorry for that. Anyway, copying the files is then the safest way.

Perfectly fine, I was just curious whether I'd overlooked something. It would have been a reason to reconsider switching to VMware Player 🙂.

I can confirm that the issue affects VMs on ESXi 6.7.0 U3.
An update of DietPi from v6.30.0 to v6.31.2 fails at the GRUB configuration.

You could run apt update && apt upgrade manually before running dietpi-update. If needed, grub will pop up an interactive dialog to select the boot partition.

All done our side, but I keep this issue open and attach to next milestone as I want to track the solution on Debian side: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966575

Still no update on the bug report, but more users are reporting their stories of running into this. We'll need to keep this open and communicated until a new version of grub handles changed drive IDs more gracefully. We set the mostly working /dev/sda via MOTD previously and now on the v6.32 update, but that does not work for any x86 system on NVMe, eMMC or another non-IDE/SATA/USB drive, or on multi-OS platforms where DietPi (grub) is stored on e.g. /dev/sdb. I'm still wondering why this was never an issue before, or was it?

Finally, the bug has been solved 😃.

The install is simply skipped, but that is perfectly fine compared to an unbootable system.
