ZFS: Failed to start "Import ZFS pools by cache file"

Created on 13 Oct 2015  ·  46 Comments  ·  Source: openzfs/zfs

This happened while booting up:
[screenshot: boot error, 2015-10-13 05:51:04]

This shows up after I type the command "systemctl status zfs-import-cache.service":
[screenshot: service status output, 2015-10-13 05:52:38]

Sometimes my zpool imports at boot; sometimes it doesn't, and I have to reboot for it to work rather than running zpool import manually every time.

root@fxception:~# lsb_release -da
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 8.2 (jessie)
Release: 8.2
Codename: jessie
root@fxception:~# hostnamectl
Static hostname: fxception
Icon name: computer-desktop
Chassis: desktop
Operating System: Debian GNU/Linux 8 (jessie)
Kernel: Linux 3.16.0-4-amd64
Architecture: x86-64
root@fxception:~#

All 46 comments

This shows up after I type the command "systemctl status zfs-import-cache.service"

What does the same using "zfs-import-scan.service" tell you?

This shows up when typing the command "systemctl status zfs-share.service"

Unfortunately, the current sharesmb code isn't "intelligent" enough to realize that a share is already shared, so it tries to share something that Samba has already shared.

This usually happens when the machine isn't shut down correctly (as in, "unshare -a" isn't run), leaving share files for Samba behind. So the next time Samba runs, the files are already there, Samba shares them, and "share -a" doesn't realize this and gets an error from Samba because it's trying to share something that's already shared...
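If you end up in that state, a possible manual cleanup (a sketch; the usershare mechanism is an assumption here, so check what "net usershare list" reports first):

zfs unshare -a                     # drop ZFS's view of what is shared
net usershare list                 # see what Samba still has registered
net usershare delete <sharename>   # remove any stale leftovers
systemctl restart smbd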

root@fxception:/home/percy/sickbeard# systemctl status zfs-import-scan.service
● zfs-import-scan.service - Import ZFS pools by device scanning
Loaded: loaded (/lib/systemd/system/zfs-import-scan.service; static)
Active: inactive (dead)
start condition failed at Tue 2015-10-13 06:17:11 PDT; 16h ago
ConditionPathExists=!/etc/zfs/zpool.cache was not met

Ok, bummer. Guess that would have been too easy :(.

Ok, so the scan service can't run because there's a cache file (perfectly ok!) and the cache service can't run because it can't find the Storage pool, _PROBABLY_ because the devices couldn't be found…

To verify, run this: zdb -C | grep path:.

If you only see /dev/... links there, you might want to import the pool using /dev/disk/by-id.

Just make sure the pool isn't already imported: zpool list Storage. If it is, just export it: zpool export Storage.

Then import it using the by-id dir: zpool import -d /dev/disk/by-id -N Storage.

That should update the cache file with the new links and you should be good to go.
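Putting those steps together (a sketch; "Storage" is this thread's pool name, substitute your own):

zdb -C | grep path:                          # which device paths does the pool record?
zpool list Storage                           # is it currently imported?
zpool export Storage                         # if so, export it first
zpool import -d /dev/disk/by-id -N Storage   # re-import using the persistent by-id links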

root@fxception:/home/percy/sickbeard# zdb -C | grep path:
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH2LK1B-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH284RB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK1334PCGANTSS-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCG0YHYB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH287GB-part1'
path: '/dev/disk/by-id/ata-HGST_HDN724040ALE640_PK2334PCH2896B-part1'

root@fxception:/home/percy/sickbeard# zpool list Storage
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
Storage 21.8T 17.2T 4.53T - 4% 79% 1.00x ONLINE -

I already imported the pool originally with /dev/disk/by-id. Sometimes the pool loads on boot, sometimes it doesn't, but when it does load... it works fine. If that makes any sense. As well as the errors you see on boot in those screenshots.

Dang! You're not making this easy! :)

The error is: cannot import 'Storage': no such pool or dataset. _WHY_ that is, I don't know. I had two ideas as to why, but they both turned out to be wrong… Let's hope someone else has some ideas.

You could always try to move the cache file out of the way and then scan would run instead. If that also fails, then I'm finally out of ideas...

You also need to set the cachefile property to none to avoid recreating it at next import...
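In practice that would be something like (a sketch; "Storage" assumed as the pool name):

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
zpool set cachefile=none Storage

With the cache file out of the way, the ConditionPathExists=!/etc/zfs/zpool.cache condition shown above is met, so zfs-import-scan.service will run at the next boot instead.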

Could it possibly be that I don't have the right version of ZFS for my distribution?

I did follow this guide on the website http://zfsonlinux.org/debian.html. I am thinking about just backing up my data and reinstalling Debian and ZFS, unless someone that has already fixed this problem can explain how it was resolved.

Extremely doubtful. If you can import it in the shell _and_ it works at boot most (some?) of the time, AND it works after a reboot, then it can't be a wrong version or a version mismatch between module and userland tools… If it was, it would fail _all_ the time.

How often does/doesn't it work at boot? Every other time, every now and then or 'sometimes'?

Every other time, I would say.

Ok, that's important information. Not that it helps yet, but let me think about it and maybe I can figure out what's going on…

When it _doesn't_ work, does it still work every time to import it manually in the shell? Does it work to run (_start_) the service again (from the shell)? Don't know if you have to _stop_ it first, before trying to _start_ it, but...

Well, when it doesn't work, I just reboot to get it fixed; I don't manually import it in the shell.

Not sure what you're trying to say here: "Does it work to run (start) the service again (from the shell)? Don't know if you have to stop it first, before trying to start it, but..."

As in:

systemctl stop zfs-import-cache.service
systemctl start zfs-import-cache.service

Next time it doesn't work, try running these two and see if that works. If they work, and the next time it doesn't work at boot, try:

zpool import -c /etc/zfs/zpool.cache -aN

If all this works (as in _manually from the shell_, even though it failed at boot), we might be closer to the problem...

What we're trying to figure out is if it's something in the boot process that is causing this, or if it's something with ZoL.

If it works manually in the shell, even though it didn't at boot, then it's something in the boot process. If it doesn't work manually, then it is (probably/likely) something with ZoL.

On my system I also occasionally have this boot error

 cannot import '...': no such pool or dataset

where ... is one of my pools. My distribution (ArchLinux) is not using a cachefile; instead it does this: https://aur.archlinux.org/cgit/aur.git/tree/zfs-utils.initcpio.hook?h=zfs-utils-git (FWIW, I am not happy with this approach and will switch to using a cachefile soon). I "fix" the error with a warm reboot of the machine; it normally works well on the 2nd boot.

Not sure if it helps.

@Bronek So you only get that problem when cold booting? @fxception Is this the same for you?

@FransUrbo No, typing reboot in shell will cause the problem to occur.

@fxception Ok, thanx. That's what we call a "warm" reboot.

But my primary idea is still that there's something that stops the import from recognizing the devices, probably because the devices aren't "there" (yet). The only way to find that out is to run those tests I mentioned earlier.

I will do that for you tomorrow night after work, I'm about to head to bed.

@FransUrbo my memory is a little hazy on this and I cannot say that it never happens on a warm boot. It occasionally might, but definitely not as frequently as on cold boots. I remember that configuring the SAS controller to wait a few seconds for each HDD made these errors appear much less frequently than they used to.

@Bronek That's ok. Even if it's "mostly" on cold boots, that's still an argument for my theory.

@FransUrbo

Okay, so... I just rebooted the machine twice with no errors on boot, and zpool status showed the pool as online. I rebooted again after that and got the error, with zpool status showing "no pools available". Then I proceeded to do this as mentioned:

[screenshot: shell session running the suggested commands, 2015-10-14 11:06:48]

Do you want me to reboot again to see if I get that error? And if I do run "zpool import -c /etc/zfs/zpool.cache -aN" ??

Do you want me to reboot again to see if I get that error? And if I do run "zpool import -c /etc/zfs/zpool.cache -aN" ??

Nah, that's ok. Running "zpool import ..." is basically what you've already done...

So it DOES seem like it's something in your/the boot process. SOMEWHAT good news.

You need to figure out why your device nodes aren't available when the import runs.

Is it started before udevd for example?
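A couple of ways to check that ordering (a sketch using standard systemd tooling, nothing ZoL-specific):

systemctl cat zfs-import-cache.service                   # inspect its After=/Requires= lines
systemd-analyze critical-chain zfs-import-cache.service  # see what the unit actually waited on

If the unit isn't ordered after udev has settled (e.g. systemd-udev-settle.service), the import can race the creation of the device nodes.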

I have NO idea... I'm not very Linux savvy, still learning. I know how to check logs, but I don't know what to look for.

Hmm... My zpool disappeared a while after doing those commands; I just noticed I couldn't access it. But when typing zpool status, it shows as online.

That's probably because you didn't mount the filesystems. Try systemctl start zfs-mount.service and systemctl start zfs-share.service.

It still shows the pool as online, but I cannot access it?

What do you mean "cannot access it"?

As in... when I go to my shared folders on my PC, none of my storage is showing. The Samba share folders are there, but nothing is in them.

I think we're reaching the end of the usability of this issue tracker. We're now into support, which we don't do here. Please take it to the mailing list or the IRC channel (reference this issue if you need).

So, regarding ZFS not loading at boot, is this something that can be fixed? Or is that what I should be asking in IRC?

Depends on what's causing it. But technically, it's (probably) not a ZoL problem. It will be difficult to help you get to the root cause of this here; better you get support on IRC/the mailing list. IF they/someone still think this is an issue, please post the conclusion here.

@fxception Was there any resolution to this?

I fixed it, but I noticed something strange; not sure if it was supposed to happen or not.

So what I did was a fresh install of Debian jessie 8.2, installed ZFS, and then rebooted. After rebooting... I noticed my pool was already imported, without having to run "zpool import -d /dev/disk/by-id Storage". Was that supposed to happen? Other than that, I didn't get any errors, and haven't gotten any since.

After the fresh install, did you go through the "Install SPL and ZFS" part again? If not, that's NOT supposed to happen!! :)

I followed the install guide on zfsonlinux.org's website debian section.

wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_6_all.deb
dpkg -i zfsonlinux_6_all.deb
apt-get update
apt-get install debian-zfs

Ah, ok. Then all is good. Off-the-shelf (so to speak) works just fine.

Don't know why you had problems earlier, but since it's fixed now (although in a very unorthodox way :), we can close this.

So the automatic importing without me doing so is normal? If so, how did it happen?

Yes, the packages set up everything you need automatically. They will detect whether you're using systemd or init and initialize the startup accordingly (btw, the Debian GNU/Linux packages are the only ones that do this :).

The import part will automatically import any zpool it finds, unless you specifically tell it NOT to import a pool/pools.
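For reference, a sketch of how that exclusion looks, assuming the /etc/default/zfs mechanism in these packages (an assumption; check the comments in that file on your system):

# /etc/default/zfs - pools never to import automatically (hypothetical excerpt)
ZFS_POOL_EXCEPTIONS="Storage"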

So this is "expected behaviour". Some have complained that my scripts are a little TOO smart, and I guess that's correct in 1% (or thereabouts) of the cases :).

Perfect. If I come across any issues, I will head to IRC, and if they cannot be resolved, I will post an issue. Much appreciated @FransUrbo for your help!

Thank you!

No worries. Could you please close this then? It's a shame that we never managed to figure out exactly what went wrong, but if it happens again (to you or someone else), we might have some hint here.

@Bronek you said you had a similar issue, did you ever get yours fixed?

@FransUrbo no, I just restart the computer and it "fixes itself" on a warm boot. Did not try to look deeper yet.

@Bronek Ok. If you ever get the time or interest in trying to find out, let us know :).

@fxception In the meantime, could you please close this issue?

I had a similar issue which was solved after removing /etc/zfs/zpool.cache.
The problem was that I destroyed the zpool and recreated it under a different name, which was somehow not reflected in the cache file, so the import tried to use the old pool's name.
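In that situation, something like the following should clear it up (a sketch; <newpool> is a placeholder for the recreated pool's name):

zpool export <newpool>                      # if it is currently imported
rm /etc/zfs/zpool.cache                     # drop the stale cache that holds the old name
zpool import -d /dev/disk/by-id <newpool>   # re-import; by default this recreates the cache file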

In case someone looks at this issue: I've found a robust fix for my machine, which is to enforce a synchronous SCSI scan via this:

# cat /etc/modprobe.d/zfs.conf
# Enforce synchronous scsi scan, to prevent zfs driver loading before disks are available
options scsi_mod scan=sync
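Note that if the zfs module is loaded from the initramfs, the option has to end up in there too; on Debian-style systems that means regenerating it (an assumption about the setup, not something verified in this thread):

update-initramfs -u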

@klingtnet You're a hero. This resolved my issue after having to restart my server several times in a row, not always cleanly.

I've just made a script to actually change the method of importing the pool. Use it at your own risk.

Please change the POOLNAME before running it (ncdata).

Feel free to use it and improve it: https://github.com/nextcloud/vm/blob/master/static/change-to-zfs-mount-generator.sh

Thought it might come in handy for some people here.
