Test setup:
Fedora 19, ZFS 0.6.2 release, 6 disks in RAIDZ2 with an SSD as L2ARC device.
This setup is expected to incur a 50% overhead in pool allocation (4 data disks, 2 parity).
Allocated pool space before experiment:
8996065210368
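For reference, the allocation figures in this report can be read as raw bytes directly from the pool. The flags below assume a zpool build that supports scripted (-H) and parsable (-p) output:

# Allocated pool space in bytes (assumes -H/-p are available in this zpool version)
zpool list -Hp -o allocated tank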
Create a 10G zvol:
zfs create -V 10G -o compression=off tank/backup/test
Fully write the volume once:
dd_rescue -Z 0 /dev/zvol/tank/backup/test
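For anyone reproducing this without dd_rescue, a plain dd zero-fill of the zvol should exercise the same write path (block size and flags are an arbitrary choice, not taken from the original test):

# Zero-fill the zvol until it runs out of space, then flush
dd if=/dev/zero of=/dev/zvol/tank/backup/test bs=1M oflag=direct
sync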
Allocated pool space after write:
9028783779840
As can be seen, the difference is 32718569472 bytes (~30GB), double the expected amount of ~15GB (10GB of data plus 50% parity overhead).
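A quick sanity check of those figures with plain shell arithmetic (values copied from the report above):

BEFORE=8996065210368
AFTER=9028783779840
echo $(( AFTER - BEFORE ))                   # 32718569472 bytes actually allocated (~30GB)
echo $(( 10 * 1024 * 1024 * 1024 * 3 / 2 ))  # 16106127360 bytes expected: 10GiB of data plus 50% parity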
"zfs get all" output of the zvol after writing:
NAME              PROPERTY              VALUE                  SOURCE
tank/backup/test  type                  volume                 -
tank/backup/test  creation              Tue Oct 22 23:12 2013  -
tank/backup/test  used                  20.3G                  -
tank/backup/test  available             4.88T                  -
tank/backup/test  referenced            20.3G                  -
tank/backup/test  compressratio         1.00x                  -
tank/backup/test  reservation           none                   default
tank/backup/test  volsize               10G                    local
tank/backup/test  volblocksize          8K                     -
tank/backup/test  checksum              on                     default
tank/backup/test  compression           off                    local
tank/backup/test  readonly              off                    default
tank/backup/test  copies                1                      default
tank/backup/test  refreservation        10.3G                  local
tank/backup/test  primarycache          all                    default
tank/backup/test  secondarycache        metadata               inherited from tank/backup
tank/backup/test  usedbysnapshots       0                      -
tank/backup/test  usedbydataset         20.3G                  -
tank/backup/test  usedbychildren        0                      -
tank/backup/test  usedbyrefreservation  0                      -
tank/backup/test  logbias               latency                default
tank/backup/test  dedup                 off                    default
tank/backup/test  mlslabel              none                   default
tank/backup/test  sync                  standard               default
tank/backup/test  refcompressratio      1.00x                  -
tank/backup/test  written               20.3G                  -
tank/backup/test  snapdev               hidden                 default
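The relevant lines are used and referenced at 20.3G for a 10G volume with volblocksize=8K. The same properties can be pulled as exact byte counts with parsable output, assuming the zfs build in use supports -p:

zfs get -p used,referenced,volsize,volblocksize tank/backup/test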
As a second test, a 10GB file was created on a ZFS filesystem on the same pool.
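The report does not show how the file was created; a zero-filled file of that size could be produced with something along these lines (the file path is hypothetical, and compression is assumed to be off on that filesystem so the zeros are actually stored):

dd if=/dev/zero of=/tank/backup/testfile bs=1M count=10240
sync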
Allocated pool space before file creation:
8996065517568
Allocated pool space after file creation:
9012204183552
The difference is 16138665984 bytes (~15GB), the expected amount.
Looks like the same issue as #548. Your normal file will likely have used the default 128k record size, as opposed to the default 8k block size for zvols.
Sounds reasonable. I don't expect high throughput from the zvol, so using a larger block size might be an option, even if it lowers the IOPS rate.
And yes, the normal ZFS filesystems use 128k blocks.
So here's the thing. Each 8k volume block gets its own parity. That's how RAID-Z works. If you have 4k AF disks (zdb shows "ashift: 12"), then each 8k block consists of 2 disks' worth of data and 2 disks' worth of parity (two 4k data sectors plus two 4k parity sectors). You're expecting 4 disks of data for every 2 disks of parity all the way across the pool; that isn't how RAID-Z allocates small blocks.
Switch to a block size of 16k or larger and you should be in better shape for space usage, at the expense of a worse read-modify-write cycle when smaller writes hit the volume. With a filesystem's default 128k record size you're already covered there.
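A rough back-of-the-envelope check of that per-block accounting (plain shell; assumes 4k sectors, data striped across the 4 data disks with 2 parity sectors per stripe, and ignores RAID-Z allocation padding):

for bs in 8 16 128; do
  data_sectors=$(( bs / 4 ))              # 4k sectors of data per block
  stripes=$(( (data_sectors + 3) / 4 ))   # rows of up to 4 data sectors each
  parity_sectors=$(( 2 * stripes ))       # RAIDZ2: 2 parity sectors per row
  echo "${bs}K block: ${data_sectors} data + ${parity_sectors} parity sectors = $(( (data_sectors + parity_sectors) * 4 ))K on disk"
done

This lines up with the ~2x space usage observed for the 8k zvol and the ~1.5x usage observed for the 128k-recordsize file.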
Thanks for the quick explanation, @dweeezil and @DeHackEd. Since this is just a matter of documentation, I've marked it as such and will close the issue.
I just wanted to confirm that setting volblocksize=16k on the zvol made the problem disappear. Thanks to @dweeezil and @DeHackEd for the help.
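For reference, volblocksize can only be set when the zvol is created, so the fix amounts to recreating the volume with the larger block size, e.g. (dataset name and size taken from the original test):

zfs create -V 10G -o compression=off -o volblocksize=16k tank/backup/test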