ZFS: Failed to start "Import ZFS pools by cache file"

Created on 13 Oct 2015  ·  46 Comments  ·  Source: openzfs/zfs

This happened while booting up:
[screenshot: boot error, 2015-10-13 05:51:04]

This shows up after I type the command "systemctl status zfs-import-cache.service":
[screenshot: service status output, 2015-10-13 05:52:38]

Sometimes my zpool imports at boot; sometimes it doesn't, and I have to reboot for it to work rather than running zpool import manually every time.

root@fxception:~# lsb_release -da
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 8.2 (jessie)
Release: 8.2
Codename: jessie
root@fxception:~# hostnamectl
Static hostname: fxception
Icon name: computer-desktop
Chassis: desktop
Operating System: Debian GNU/Linux 8 (jessie)
Kernel: Linux 3.16.0-4-amd64
Architecture: x86-64
root@fxception:~#

All 46 comments

This shows up after I type the command "systemctl status zfs-import-cache.service"

What does the same using "zfs-import-scan.service" tell you?

This shows up when typing the command "systemctl status zfs-share.service"

Unfortunately, the current sharesmb code isn't "intelligent" enough to realize that a share is already shared, so it tries to share something that Samba has already shared.

This usually happens when the machine isn't shut down correctly (as in, "unshare -a" isn't run), leaving share files for Samba behind. So the next time Samba runs, the files are already there, Samba shares them, and "share -a" doesn't realize this and gets an error from Samba because it's trying to share something that's already shared...
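If you end up in that state, a possible manual cleanup (a sketch; the usershare mechanism is an assumption here, so check what "net usershare list" reports first):

zfs unshare -a                     # drop ZFS's view of what is shared
net usershare list                 # see what Samba still has registered
net usershare delete <sharename>   # remove any stale leftovers
systemctl restart smbd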

root@fxception:/home/percy/sickbeard# systemctl status zfs-import-scan.service
● zfs-import-scan.service - Import ZFS pools by device scanning
Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; static)
Active: inactive (dead)
start condition failed at Tue 2015-10-13 06:17:11 PDT; 16h ago
ConditionPathExists=!/etc/zfs/zpool.cache was not met

Ok, bummer. Guess that would have been too easy :(.

Ok, so the scan service can't run because there's a cache file (perfectly ok!) and the cache service can't run because it can't find the Storage pool, _PROBABLY_ because the devices couldn't be found…

To verify, run this: zdb -C | grep path:.

If you only see /dev/... links there, you might want to import the pool using /dev/disk/by-id.

Just make sure the pool isn't already imported: zpool list Storage. If it is, just export it: zpool export Storage.

Then import it using the by-id dir: zpool import -d /dev/disk/by-id -N Storage.

That should update the cache file with the new links and you should be good to go.
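Putting those steps together (a sketch; "Storage" is this thread's pool name, substitute your own):

zdb -C | grep path:                          # which device paths does the pool record?
zpool list Storage                           # is it currently imported?
zpool export Storage                         # if so, export it first
zpool import -d /dev/disk/by-id -N Storage   # re-import using the persistent by-id links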

root@fxception:/home/percy/sickbeard# zdb -C | grep path:
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH2LK1B-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH284RB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK1334PCGANTSS-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCG0YHYB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH287GB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH2896B-part1'

root@fxception:/home/percy/sickbeard# zpool list Storage
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
Storage 21.8T 17.2T 4.53T - 4% 79% 1.00x ONLINE -

I already imported the pool originally with /dev/disk/by-id. Sometimes the pool loads on boot, sometimes it doesn't, but when it does load... it works fine. If that makes any sense. As well as the errors you see on boot in those screenshots.

Dang! You're not making this easy! :)

The error is: cannot import 'Storage': no such pool or dataset. _WHY_ that is, I don't know. I had two ideas as to why, but they both turned out to be wrong… Let's hope someone else has some ideas.

You could always try to move the cache file out of the way and then scan would run instead. If that also fails, then I'm finally out of ideas...

You also need to set the cachefile property to none to avoid recreating it at next import...
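In practice that would be something like (a sketch; "Storage" assumed as the pool name):

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
zpool set cachefile=none Storage

With the cache file out of the way, the ConditionPathExists=!/etc/zfs/zpool.cache condition shown above is met, so zfs-import-scan.service will run at the next boot instead.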

Could it possibly be that I don't have the right version of ZFS for my distribution?

I did follow this guide on the website http://zfsonlinux.org/debian.html. I am thinking about just backing up my data and reinstalling Debian and ZFS, unless someone that has already fixed this problem can explain how it was resolved.

Extremely doubtful. If you can import it in the shell _and_ it works at boot most (some?) of the time, AND it works after a reboot, then it can't be a wrong version or a version mismatch between module and userland tools… If it was, it would fail _all_ the time.

How often does/doesn't it work at boot? Every other time, every now and then or 'sometimes'?

Every other time, I would say.

Ok, that's important information. Not that it helps yet, but let me think about it and maybe I can figure out what's going on…

When it _doesn't_ work, does it still work every time to import it manually in the shell? Does it work to run (_start_) the service again (from the shell)? Don't know if you have to _stop_ it first, before trying to _start_ it, but...

Well, when it doesn't work, I just reboot to get it fixed; I don't manually import it in the shell.

Not sure what you're trying to say here: "Does it work to run (start) the service again (from the shell)? Don't know if you have to stop it first, before trying to start it, but..."

As in:

systemctl stop zfs-import-cache.service
systemctl start zfs-import-cache.service

Next time it doesn't work, try running these two and see if that works. If they work, and the next time it doesn't work at boot, try:

zpool import -c /etc/zfs/zpool.cache -aN

If all this works (as in _manually from the shell_, even though it failed at boot), we might be closer to the problem...

What we're trying to figure out is if it's something in the boot process that is causing this, or if it's something with ZoL.

If it works manually in the shell, even though it didn't at boot, then it's something in the boot process. If it doesn't work manually, then it is (probably/likely) something with ZoL.

On my system I also occasionally have this boot error

 cannot import '...': no such pool or dataset

where ... is one of my pools. My distribution (ArchLinux) is not using a cachefile; instead it does this: https://aur.archlinux.org/cgit/aur.git/tree/zfs-utils.initcpio.hook?h=zfs-utils-git (FWIW, I am not happy with this approach and will switch to using a cachefile soon). I "fix" the error with a warm reboot of the machine; it normally works well on the 2nd boot.

Not sure if it helps.

@Bronek So you only get that problem when cold booting? @fxception Is this the same for you?

@FransUrbo No, typing reboot in shell will cause the problem to occur.

@fxception Ok, thanx. That's what we call a "warm" reboot.

But my primary idea is still that there's something that stops the import from recognizing the devices, probably because the devices aren't "there" (yet). The only way to find that out is to run those tests I mentioned earlier.

I will do that for you tomorrow night after work, I'm about to head to bed.

@FransUrbo my memory is a little hazy on this and I cannot say that it never happens on a warm boot. It occasionally might, but definitely not as frequently as on cold boots. I remember that configuring the SAS controller to wait a few seconds for each HDD made these errors appear much less frequently than they used to.

@Bronek That's ok. Even if it's "mostly" on cold boots, that's still an argument for my theory.

@FransUrbo

Okay, so... I just rebooted the machine twice with no errors on boot, and zpool status showed the pool as online. I rebooted again after that and got the error, with zpool status showing "no pools available". Then I proceeded to do this as mentioned:

[screenshot: shell session running the suggested commands, 2015-10-14 11:06:48]

Do you want me to reboot again to see if I get that error? And if I do run "zpool import -c /etc/zfs/zpool.cache -aN" ??

Do you want me to reboot again to see if I get that error? And if I do run "zpool import -c /etc/zfs/zpool.cache -aN" ??

Nah, that's ok. Running "zpool import ..." is basically what you've already done...

So it DOES seem like it's something in your/the boot process. SOMEWHAT good news.

You need to figure out why your device nodes aren't available when the import runs.

Is it started before udevd for example?
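A couple of ways to check that ordering (a sketch using standard systemd tooling, nothing ZoL-specific):

systemctl cat zfs-import-cache.service                   # inspect its After=/Requires= lines
systemd-analyze critical-chain zfs-import-cache.service  # see what the unit actually waited on

If the unit isn't ordered after udev has settled (e.g. systemd-udev-settle.service), the import can race the creation of the device nodes.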

I have NO idea... I'm not very Linux savvy, still learning. I know how to check logs, but I don't know what to look for.

Hmm... My zpool disappeared a while after doing those commands; I just noticed I couldn't access it. But when typing zpool status, it shows as online.

That's probably because you didn't mount the filesystems. Try systemctl start zfs-mount.service and systemctl start zfs-share.service.

It still shows the pool as online, but I cannot access it?

What do you mean "cannot access it"?

As in... when I go to my shared folders on my PC, none of my storage is showing. The Samba share folders are there, but nothing is in them.

I think we're reaching the end of the usability of this issue tracker. We're now into support, which we don't do here. Please take it to the mailing list or the IRC channel (reference this issue if you need).

So, regarding ZFS not loading at boot, is this something that can be fixed? Or is that what I should be asking in IRC?

Depends on what's causing it. But technically, it's (probably) not a ZoL problem. It will be difficult to help you get to the root cause of this here; better you get support on IRC/the mailing list. IF they/someone still think this is an issue, please post the conclusion here.

@fxception Was there any resolution to this?

I fixed it, but I noticed something strange; not sure if it was supposed to happen or not.

So what I did was a fresh install of Debian jessie 8.2, installed ZFS, and then rebooted. After rebooting... I noticed my pool was already imported, without having to run "zpool import -d /dev/disk/by-id Storage". Was that supposed to happen? Other than that, I didn't get any errors, and haven't gotten any since.

After the fresh install, did you go through the "Install SPL and ZFS" part again? If not, that's NOT supposed to happen!! :)

I followed the install guide on zfsonlinux.org's website debian section.

wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_6_all.deb
dpkg -i zfsonlinux_6_all.deb
apt-get update
apt-get install debian-zfs

Ah, ok. Then all is good. Off-the-shelf (so to speak) works just fine.

Don't know why you had problems earlier, but since it's fixed now (although in a very unorthodox way :), we can close this.

So the automatic importing without me doing so is normal? If so, how did it happen?

Yes, the packages set up everything you need automatically. They will detect whether you're using systemd or init and initialize the startup accordingly (btw, the Debian GNU/Linux packages are the only ones that do this :).

The import part will automatically import any zpool it finds, unless you specifically tell it NOT to import a pool/pools.
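For reference, a sketch of how that exclusion looks, assuming the /etc/default/zfs mechanism in these packages (an assumption; check the comments in that file on your system):

# /etc/default/zfs - pools never to import automatically (hypothetical excerpt)
ZFS_POOL_EXCEPTIONS="Storage"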

So this is "expected behaviour". Some have complained that my scripts are a little TOO smart, and I guess that's correct in 1% (or thereabouts) of the cases :).

Perfect. If I come across any issues, I will head to IRC, and if they cannot be resolved, I will post an issue. Much appreciated @FransUrbo for your help!

Thank you!

No worries. Could you please close this then? It's a shame that we never managed to figure out exactly what went wrong, but if it happens again (to you or someone else), we might have some hint here.

@Bronek you said you had a similar issue, did you ever get yours fixed?

@FransUrbo no, I just restart the computer and it "fixes itself" on a warm boot. Did not try to look deeper yet.

@Bronek Ok. If you ever get the time or interest in trying to find out, let us know :).

@fxception In the meantime, could you please close this issue?

I had a similar issue which was solved after removing /etc/zfs/zpool.cache.
The problem was that I destroyed the zpool and recreated it under a different name, which was somehow not reflected in the cache file, so the import tried to use the old pool's name.
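In that situation, something like the following should clear it up (a sketch; <newpool> is a placeholder for the recreated pool's name):

zpool export <newpool>                      # if it is currently imported
rm /etc/zfs/zpool.cache                     # drop the stale cache that holds the old name
zpool import -d /dev/disk/by-id <newpool>   # re-import; by default this recreates the cache file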

In case someone looks at this issue: I've found a robust fix for my machine, which is to enforce a synchronous SCSI scan via this:

# cat /etc/modprobe.d/zfs.conf
# Enforce synchronous scsi scan, to prevent zfs driver loading before disks are available
options scsi_mod scan=sync
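Note that if the zfs module is loaded from the initramfs, the option has to end up in there too; on Debian-style systems that means regenerating it (an assumption about the setup, not something verified in this thread):

update-initramfs -u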

@klingtnet You're a hero. This resolved my issue after having to restart my server several times in a row, not always cleanly.

I've just made a script to actually change the method of importing the pool. Use it at your own risk.

Please change the POOLNAME before running it (ncdata).

Feel free to use it and improve it: https://github.com/nextcloud/vm/blob/master/static/change-to-zfs-mount-generator.sh

Thought it might come in handy for some people here.
