ZFS: zpool.cache not updated when adding a pool

Created on 29 Mar 2019 · 14 comments · Source: openzfs/zfs

System information


Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 18.04
Linux Kernel | 4.18.0-16-lowlatency #17~18.04.1-Ubuntu SMP PREEMPT Tue Feb 12 16:37:17 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Architecture | x86_64
ZFS Version | 0.7.9-3ubuntu6
SPL Version | 0.7.9-3ubuntu2

Adding a new pool with a filesystem (sdb/newfs in the steps below) breaks the boot-time import: the pool is not imported and its datasets are not mounted.

STEPS TO RECREATE

  1. Install ZFS root per this: https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS
  2. Bring up new system.
  3. Add a second drive /dev/sdb and create a pool on it with "zpool create -f sdb sdb"
  4. Create a filesystem on pool sdb with "zfs create sdb/newfs"
  5. Set a mountpoint with "zfs set mountpoint=/path/to/mountpoint/ sdb/newfs"
  6. Reboot.
  7. The system neither sees the pool nor mounts sdb/newfs on boot.

Attempts at remediation:

  1. Tried setting mountpoint=legacy and updating fstab; that fails, forcing an unclean boot into maintenance mode that requires running "zpool import -a" to continue.
  2. Tried updating /etc/zfs/zpool.cache by running "zpool set cachefile=/etc/zfs/zpool.cache sdb"
  3. Tried adding /etc/modprobe.d/zfs.conf containing "zfs_autoimport_disable=0"
  4. Tried adding a second systemd service per section 4.10 of https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS
  5. Tried running update-initramfs -u -k all after the pool had been imported to no avail.
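A quick way to verify that attempt 2 actually landed the pool in the cache (before rebooting) is to grep the strings of the binary cache file. The sketch below demonstrates the check against a mock file so it can be run anywhere; the pool name newpool is a stand-in:

```shell
# After "zpool set cachefile=/etc/zfs/zpool.cache newpool", the pool name
# should appear as a printable string inside the binary cache file:
#   strings /etc/zfs/zpool.cache | grep -x newpool
# The same check, demonstrated against a mock cache file:
MOCK=$(mktemp)
printf 'newpool\0version\0\001\002' > "$MOCK"   # stand-in for zpool.cache
FOUND=$(strings "$MOCK" | grep -cx 'newpool')   # 1 when the pool name is present
rm -f "$MOCK"
echo "$FOUND"
```

If the pool name is missing from the cache after a "zpool set cachefile=..." run, the cache file itself is being clobbered somewhere later in the boot sequence.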

Nothing shows up in syslog indicating an error.

Labels: Defect, Question

Most helpful comment

I've hit the same error following the Ubuntu 18.04 Root on ZFS how-to, albeit in automated fashion, using this automated script.

After examining the boot sequence and the module/zfs/spa_config.c source code, I've found that the in-core state is always pushed to disk; there is no disk-to-in-core sync ever made.
As a result, after loading the initramfs and mounting the root filesystem, any pool import, even with cachefile=none, syncs the current in-core state to disk, zapping any pools stored in /etc/zfs/zpool.cache before the reboot.
In our particular case for Ubuntu, it happens when zfs-import-bpool.service runs the /sbin/zpool import -N -o cachefile=none bpool command. You can see it for yourself; just add these two lines to zfs-import-bpool.service:

ExecStartPre=/bin/sh -c 'ls -al /etc/zfs/'
ExecStartPost=/bin/sh -c 'ls -al /etc/zfs/'

Next, import any user pool you want, reboot, and check the service output via systemctl status -l zfs-import-bpool. You will see the cache content zeroed after the bpool cachefile import. Immediately after the bpool import, the updated in-core state is synced to disk, but because both rpool and bpool are imported with cachefile=none, nothing gets written out, as coded in the spa_write_cachefile function (look here for the code).

A temporary remediation for this bug would be a simple update to zfs-import-bpool.service, like this:

ExecStartPre=/bin/sh -c '[ -f /etc/zfs/zpool.cache ] && mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache || true'
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
ExecStartPost=/bin/sh -c '[ -f /etc/zfs/preboot_zpool.cache ] && mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache || true'

With this approach, if zpool.cache is present, it survives the bpool import and is taken into account by zfs-import-cache.service, which essentially first loads the cached config in-core and then re-syncs it back to disk with updated txg ids.
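The ExecStartPre/ExecStartPost pair above is just a stash-and-restore around the import. The same logic, demonstrated against a throwaway directory instead of /etc/zfs so it can be run safely anywhere (the directory and contents are mock values):

```shell
ZDIR=$(mktemp -d)                     # stand-in for /etc/zfs
printf 'pre-reboot pools' > "$ZDIR/zpool.cache"

# ExecStartPre equivalent: stash the cache aside before the bpool import.
[ -f "$ZDIR/zpool.cache" ] && mv "$ZDIR/zpool.cache" "$ZDIR/preboot_zpool.cache" || true

# ...here the real unit runs "zpool import -N -o cachefile=none bpool",
# which (per the analysis above) would otherwise truncate the cache...
printf '' > "$ZDIR/zpool.cache"       # simulate the cache being zapped

# ExecStartPost equivalent: restore the stashed pre-reboot cache.
[ -f "$ZDIR/preboot_zpool.cache" ] && mv "$ZDIR/preboot_zpool.cache" "$ZDIR/zpool.cache" || true

RESULT=$(cat "$ZDIR/zpool.cache")     # original contents survive
rm -rf "$ZDIR"
echo "$RESULT"
```

The `|| true` on each line matters in a unit file: without it, a missing cache file would make the ExecStartPre/Post step fail and abort the service.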

As for a permanent solution and pull request, I'm a bit lost here, because there is an open ticket for retiring the cache.
IMHO the best approach would be to give the initrd scripts the ability to do the sync on their own.

So we could either add a new command to the zpool utility to resync the SPA from disk, should the boot sequence ever need it, and add the appropriate command to the zfs initrd scripts; or extend the spa_write_cachefile function to check on write whether the root filesystem has just been mounted, and if so, first resync into core any pools from the cachefile residing on rpool, provided those pools are not stale.

All 14 comments

Followed the same guide https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS, and see the same problem on two systems running Ubuntu 18.04. Spent hours trying to figure out the problem, with no success. Pools are not imported after a reboot.

/dev/sdX naming is not persistent across reboots and is not recommended for production pools. It may be causing conflicts preventing pool import. You can manually import using unique by-id names.

zpool import poolname -d /dev/disk/by-id/

/dev/sdX naming is not persistent across reboots and is not recommended for production pools. It may be causing conflicts preventing pool import. You can manually import using unique by-id names.

zpool import poolname -d /dev/disk/by-id/

Tried that several times; same behavior. Also tried removing the cache file: it was recreated and filled by the import command, but the pool is still not imported after a reboot.

Also,
I followed the same guide https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS on Ubuntu 19.04 (even if not recommended ...). The base system is OK; bpool / rpool are correctly imported.
Any pool created afterwards is importable, but not automatically imported at boot.

I also tried changing settings in /etc/default/zfs to bypass /etc/zfs/zpool.cache, but it had no effect.

FWIW, I'm seeing similar behavior on a Debian 10 system. I was moving from one pool to another (some history in https://github.com/zfsonlinux/zfs/issues/9107), and after getting everything in order I've realised the old pool isn't being imported.

I'm still using a cache file, and zfs-import-cache.service says:

Aug 03 23:19:48 boxoob systemd[1]: Starting Import ZFS pools by cache file...
Aug 03 23:19:48 boxoob zpool[4331]: no pools available to import
Aug 03 23:19:48 boxoob systemd[1]: Started Import ZFS pools by cache file.

upon starting. My /etc/zfs/zpool.cache has the boot pool (which is imported explicitly earlier), the root pool (which is imported properly by the initramfs), and the old pool. Or at least, I think that last pool is present in the cache file, judging by the presence of the disks hosting it in the output of strings /etc/zfs/zpool.cache. As far as I can tell, the corresponding file in the initramfs is the same.
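To confirm that the initramfs copy really matches the on-disk cache, a byte compare is more reliable than eyeballing strings output. On Ubuntu/Debian the extraction step would be something like unmkinitramfs (the exact paths are assumptions and vary by distro); the compare itself is shown against mock files so it runs anywhere:

```shell
# Real-world version (paths are assumptions, distro-dependent):
#   unmkinitramfs /boot/initrd.img-$(uname -r) /tmp/ird
#   cmp /etc/zfs/zpool.cache /tmp/ird/main/etc/zfs/zpool.cache
# The compare step, demonstrated on mock cache files:
A=$(mktemp); B=$(mktemp)
printf 'cache-bytes' > "$A"
printf 'cache-bytes' > "$B"
if cmp -s "$A" "$B"; then SAME=yes; else SAME=no; fi
rm -f "$A" "$B"
echo "$SAME"
```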

I'm going to get rid of the cache file in favour of explicit import via names in /etc/default/zfs; we'll see how that goes.

Also,
I followed the same guide https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS on Ubuntu 19.04 (even if not recommended ...). The base system is OK; bpool / rpool are correctly imported.
Any pool created afterwards is importable, but not automatically imported at boot.

I also tried changing settings in /etc/default/zfs to bypass /etc/zfs/zpool.cache, but it had no effect.

Same for me; I resolved it by creating a service to import the pool manually:

[Unit]
Description=Import data pool
Before=zfs-import-scan.service
Before=zfs-import-cache.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -d /dev/disk/by-id data

[Install]
WantedBy=zfs-import.target

I want to say I'm running into this same issue as well (using the same ZFS-on-root guide). I also solved it by creating a new systemd service to import the pool, but I'm not sure why that's necessary. I believe it's happening because my /etc/zfs/zpool.cache file is recreated on boot. If I run strings /etc/zfs/zpool.cache immediately after boot I get (with the hostname redacted):

rpool
version
name
rpool
state
pool_guid
errata
hostname
*******************
com.delphix:has_per_vdev_zaps
vdev_children
vdev_tree
type
root
guid
children
type
disk
guid
path
/dev/disk/by-id/nvme-PC401_NVMe_SK_hynix_1TB_EJ86N550010106HEB-part4
whole_disk
metaslab_array
metaslab_shift
ashift
asize
is_log
create_txg
com.delphix:vdev_zap_leaf
com.delphix:vdev_zap_top
features_for_read
com.delphix:hole_birth
com.delphix:embedded_data

After I import my other pool (named array), it becomes:

array
version
name
array
state
pool_guid
errata
hostname
*******************
com.delphix:has_per_vdev_zaps
vdev_children
vdev_tree
type
root
guid
children
type
mirror
guid
metaslab_array
metaslab_shift
ashift
asize
is_log
create_txg
com.delphix:vdev_zap_top
children
type
disk
guid
path
/dev/disk/by-id/wwn-0x5000cca252ccbc54-part1
whole_disk
create_txg
com.delphix:vdev_zap_leaf
type
disk
guid
path
/dev/disk/by-id/wwn-0x5000cca252ccbe62-part1
whole_disk
create_txg
com.delphix:vdev_zap_leaf
features_for_read
com.delphix:hole_birth
com.delphix:embedded_data
rpool
version
name
rpool
state
pool_guid
errata
hostname
*******************
com.delphix:has_per_vdev_zaps
vdev_children
vdev_tree
type
root
guid
children
type
disk
guid
path
/dev/disk/by-id/nvme-PC401_NVMe_SK_hynix_1TB_EJ86N550010106HEB-part4
whole_disk
metaslab_array
metaslab_shift
ashift
asize
is_log
create_txg
com.delphix:vdev_zap_leaf
com.delphix:vdev_zap_top
features_for_read
com.delphix:hole_birth
com.delphix:embedded_data

However, after I restart, but before I manually import my array pool, the cache file is back to what it was.

I can't figure out why, but I wonder whether importing the bpool is causing the cache file to be cleared. The service that does this, from the wiki, is:

    # vi /etc/systemd/system/zfs-import-bpool.service
    [Unit]
    DefaultDependencies=no
    Before=zfs-import-scan.service
    Before=zfs-import-cache.service

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/sbin/zpool import -N -o cachefile=none bpool

    [Install]
    WantedBy=zfs-import.target

    # systemctl enable zfs-import-bpool.service

I have another install that followed this guide before the bpool step was included, and I have had no such problems.

I don't know how to edit the zfs-import-bpool.service file to debug this without breaking my system, so I can't confirm.

Same issue as described: it seems that when the rpool is imported, it overwrites the zpool.cache file and removes any other pools from the cache.
My workaround was to comment out ConditionPathExists=!/etc/zfs/zpool.cache in /lib/systemd/system/zfs-import-scan.service and enable that service:

$ sudo vim /lib/systemd/system/zfs-import-scan.service
[Unit]
Description=Import ZFS pools by device scanning
Documentation=man:zpool(8)
DefaultDependencies=no
Requires=systemd-udev-settle.service
Requires=zfs-load-module.service
After=systemd-udev-settle.service
After=zfs-load-module.service
After=cryptsetup.target
Before=dracut-mount.service
Before=zfs-import.target
#ConditionPathExists=!/etc/zfs/zpool.cache

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -aN -o cachefile=none

[Install]
WantedBy=zfs-import.target

$ sudo systemctl enable zfs-import-scan

I have the same problem on 18.04.

Same problem on Debian Buster 10.2

followed this guide:
https://github.com/zfsonlinux/zfs/wiki/Debian-Buster-Root-on-ZFS

zpool.cache does not survive a reboot.

Same issue as described: it seems that when the rpool is imported, it overwrites the zpool.cache file and removes any other pools from the cache.
My workaround was to comment out ConditionPathExists=!/etc/zfs/zpool.cache in /lib/systemd/system/zfs-import-scan.service and enable that service:

$ sudo vim /lib/systemd/system/zfs-import-scan.service
[Unit]
Description=Import ZFS pools by device scanning
Documentation=man:zpool(8)
DefaultDependencies=no
Requires=systemd-udev-settle.service
Requires=zfs-load-module.service
After=systemd-udev-settle.service
After=zfs-load-module.service
After=cryptsetup.target
Before=dracut-mount.service
Before=zfs-import.target
#ConditionPathExists=!/etc/zfs/zpool.cache

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -aN -o cachefile=none

[Install]
WantedBy=zfs-import.target

$ sudo systemctl enable zfs-import-scan

Same issue here on 19.10. Your workaround kind of works: my additional pools are imported, but the datasets do not appear to be mounted correctly. zfs list shows the relevant datasets, but they are missing from df -h and not viewable.

I have to run zfs mount -a to get them to actually mount after every boot.

Can I add zfs mount -a somewhere into that service so it runs automatically?
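One option, as a sketch only (I haven't verified the ordering implications), is an ExecStartPost line in the modified zfs-import-scan.service, so the mount runs right after the scan import succeeds:

```
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -aN -o cachefile=none
ExecStartPost=/sbin/zfs mount -a
```

Note that zfs-mount.service normally runs zfs mount -a after zfs-import.target; if that unit is enabled and the datasets still aren't mounted, the ExecStartPost line is a workaround rather than a root-cause fix.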

I've hit the same error following the Ubuntu 18.04 Root on ZFS how-to, albeit in automated fashion, using this automated script.

After examining the boot sequence and the module/zfs/spa_config.c source code, I've found that the in-core state is always pushed to disk; there is no disk-to-in-core sync ever made.
As a result, after loading the initramfs and mounting the root filesystem, any pool import, even with cachefile=none, syncs the current in-core state to disk, zapping any pools stored in /etc/zfs/zpool.cache before the reboot.
In our particular case for Ubuntu, it happens when zfs-import-bpool.service runs the /sbin/zpool import -N -o cachefile=none bpool command. You can see it for yourself; just add these two lines to zfs-import-bpool.service:

ExecStartPre=/bin/sh -c 'ls -al /etc/zfs/'
ExecStartPost=/bin/sh -c 'ls -al /etc/zfs/'

Next, import any user pool you want, reboot, and check the service output via systemctl status -l zfs-import-bpool. You will see the cache content zeroed after the bpool cachefile import. Immediately after the bpool import, the updated in-core state is synced to disk, but because both rpool and bpool are imported with cachefile=none, nothing gets written out, as coded in the spa_write_cachefile function (look here for the code).

A temporary remediation for this bug would be a simple update to zfs-import-bpool.service, like this:

ExecStartPre=/bin/sh -c '[ -f /etc/zfs/zpool.cache ] && mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache || true'
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
ExecStartPost=/bin/sh -c '[ -f /etc/zfs/preboot_zpool.cache ] && mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache || true'

With this approach, if zpool.cache is present, it survives the bpool import and is taken into account by zfs-import-cache.service, which essentially first loads the cached config in-core and then re-syncs it back to disk with updated txg ids.

As for a permanent solution and pull request, I'm a bit lost here, because there is an open ticket for retiring the cache.
IMHO the best approach would be to give the initrd scripts the ability to do the sync on their own.

So we could either add a new command to the zpool utility to resync the SPA from disk, should the boot sequence ever need it, and add the appropriate command to the zfs initrd scripts; or extend the spa_write_cachefile function to check on write whether the root filesystem has just been mounted, and if so, first resync into core any pools from the cachefile residing on rpool, provided those pools are not stale.

I can confirm the failure when following https://github.com/openzfs/zfs/wiki/Debian-Buster-Root-on-ZFS. Two pools are not imported at boot, even after poking the cache as described at https://github.com/openzfs/zfs/wiki/FAQ#generating-a-new-etczfszpoolcache-file

Note that the version pinning recommended in the first link results in several packages being held back along with the ZFS-related packages. Unfortunately, bad experiences trying to upgrade Debian and having boot fail have me sticking with those recommendations at this time.

I have applied the changes indicated by @andrey42 to zfs-import-bpool.service and it resolves the _operational_ issue. _(I'm not convinced that the root cause shouldn't be addressed.)_

I did check the OpenZFS Debian Buster Root on ZFS page prior to the creation of the system in question. Updating that page may prove helpful to others.

$ sudo apt show $(dpkg --get-selections | fgrep zfs | cut -f 1) | egrep -A1 '^Package:'

Package: libzfs2linux
Version: 0.8.3-1~bpo10+1
--
Package: zfs-dkms
Version: 0.8.3-1~bpo10+1
--
Package: zfs-initramfs
Version: 0.8.3-1~bpo10+1
--
Package: zfs-zed
Version: 0.8.3-1~bpo10+1
--
Package: zfsutils-linux
Version: 0.8.3-1~bpo10+1
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

I experienced a very similar (maybe identical) issue. I never observed anything being written to the cache file.

The particular disks I was using were cannibalized from an old HP ProLiant server with a Smart Array controller. Maybe this had something to do with it? I was able to create a pool with the drives as vdevs just fine, but re-importing them (even using /dev/disk/by-id) did not work in my case. I also tried recreating the GPT partition table on the drives using gparted, with no luck.

I was able to fix the problem by wiping the first few gigabytes of each drive, then recreating everything:

sudo dd if=/dev/zero of=/dev/disk/by-id/ata-ST2000LM015-2E8174_ZDZ5M9JY bs=4M status=progress
6329204736 bytes (6.3 GB, 5.9 GiB) copied, 19 s, 333 MB/s^C
1519+0 records in
1519+0 records out
6371147776 bytes (6.4 GB, 5.9 GiB) copied, 54.7358 s, 116 MB/s
sudo dd if=/dev/zero of=/dev/disk/by-id/ata-ST2000LM015-2E8174_ZDZ5PWG1 bs=4M status=progress
6274678784 bytes (6.3 GB, 5.8 GiB) copied, 19 s, 330 MB/s^C
1506+0 records in
1506+0 records out
6316621824 bytes (6.3 GB, 5.9 GiB) copied, 53.353 s, 118 MB/s

After this I am able to import and export many times with no issues. In addition, my cache file finally contains data. My theory is that the Smart Array controller may have put something on the disks that was causing issues.
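For what it's worth, a lighter-weight alternative to zeroing gigabytes is clearing just the ZFS vdev labels: per the on-disk format, each vdev carries four 256 KiB labels, two at the front and two at the end of the device, and zpool labelclear -f <device> wipes them in one step. A sketch of the equivalent offsets, computed for a hypothetical 2 TB (2,000,398,934,016-byte) disk:

```shell
# Hypothetical disk size; substitute the real size from "blockdev --getsize64".
DISK_SIZE=2000398934016
LABEL_SIZE=$((256 * 1024))                 # each ZFS vdev label is 256 KiB
L0=0                                       # label 0: start of device
L1=$LABEL_SIZE                             # label 1: immediately after label 0
L2=$((DISK_SIZE - 2 * LABEL_SIZE))         # label 2: 512 KiB before the end
L3=$((DISK_SIZE - LABEL_SIZE))             # label 3: 256 KiB before the end
echo "$L1 $L2 $L3"
# The real clearing command (destructive; device path from the post above):
#   zpool labelclear -f /dev/disk/by-id/ata-ST2000LM015-2E8174_ZDZ5M9JY
```

If labelclear alone doesn't help, something other than leftover ZFS labels (e.g. controller metadata) is probably interfering, and a fuller wipe like the dd above may still be needed.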
