Type | Version/Name
--- | ---
Distribution Name | Gentoo
Distribution Version | ~amd64
Linux Kernel | 5.0.9-gentoo
Architecture | x86_64
ZFS Version | 0.8.0-rc5
SPL Version | 0.8.0-rc5
Creating 100k empty directories on a non-encrypted dataset uses ~20M. Copying that tree to an encrypted dataset uses ~205M. Reversing the order (creating the tree on the encrypted dataset first) gives the same result. Both datasets are in the same pool.
```
$ cat spacetest.sh
#!/bin/sh
set -o nounset
set -o errexit
set -x
if [ "$(whoami)" != root ]
then
echo -e "\nPlease run this script as root! Exiting.\n" >&2
exit 1
fi
losetup -l | grep "/dev/loop0" && { echo "/dev/loop0 is in use" ; exit 1 ; }
timestamp=`date +%s`
mkdir "${timestamp}"
cd "${timestamp}" || exit 1
test_pool_file="test_pool_${timestamp}"
dd if=/dev/zero of="${test_pool_file}" bs=1G count=1 || exit 1
dd if=/dev/urandom of=temp_zfs_key bs=32 count=1 || exit 1
key_path=`readlink -f temp_zfs_key`
losetup /dev/loop0 "${test_pool_file}" /dev/loop0 || exit 1
zpool create -O atime=off -O compression=lz4 -O mountpoint=none "${test_pool_file}" /dev/loop0 || exit 1
zfs create -o mountpoint=/"${test_pool_file}"/spacetest "${test_pool_file}"/spacetest || exit 1
zfs create -o encryption=on -o keyformat=raw -o keylocation=file://"${key_path}" -o mountpoint=/"${test_pool_file}"/spacetest_enc "${test_pool_file}"/spacetest_enc || exit 1
df -h | grep "${test_pool_file}" || exit 1
cd /"${test_pool_file}"/spacetest || exit 1
/bin/ls -alh || exit 1
# empty dirs 20M -> 205M
# > ~400000 -> "No space left on device"
python3 -c "import os; import time; import uuid; target=str(time.time()); os.makedirs(target); os.chdir(target); [os.makedirs(uuid.uuid4().hex) for _ in range(100000)]" || exit 1
# empty files 20M -> 106M
#python3 -c "import os; import time; import uuid; target=str(time.time()); os.makedirs(target); os.chdir(target); [os.mknod(uuid.uuid4().hex) for _ in range(100000)]" || exit 1
# broken symlinks 25M -> 101M
#python3 -c "import os; import time; import uuid; target=str(time.time()); os.makedirs(target); os.chdir(target); [os.symlink(uuid.uuid4().hex, uuid.uuid4().hex) for _ in range(100000)]" || exit 1
df -h | grep "${test_pool_file}" || exit 1
/bin/ls -alh || exit 1
cp -ar * /"${test_pool_file}"/spacetest_enc/
df -h | grep "${test_pool_file}"
zfs get all | grep "${test_pool_file}"
```
```
$ ./spacetest.sh
++ whoami
+ '[' root '!=' root ']'
+ losetup -l
+ grep /dev/loop0
++ date +%s
+ timestamp=1558402378
+ mkdir 1558402378
+ cd 1558402378
+ test_pool_file=test_pool_1558402378
+ dd if=/dev/zero of=test_pool_1558402378 bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.01976 s, 1.1 GB/s
+ dd if=/dev/urandom of=temp_zfs_key bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 9.4631e-05 s, 338 kB/s
++ readlink -f temp_zfs_key
+ key_path=/home/cfg/filesystem/zfs/1558402378/temp_zfs_key
+ losetup /dev/loop0 test_pool_1558402378 /dev/loop0
+ zpool create -O atime=off -O compression=lz4 -O mountpoint=none test_pool_1558402378 /dev/loop0
+ zfs create -o mountpoint=/test_pool_1558402378/spacetest test_pool_1558402378/spacetest
+ zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///home/cfg/filesystem/zfs/1558402378/temp_zfs_key -o mountpoint=/test_pool_1558402378/spacetest_enc test_pool_1558402378/spacetest_enc
+ df -h
+ grep test_pool_1558402378
test_pool_1558402378/spacetest 832M 128K 831M 1% /test_pool_1558402378/spacetest
test_pool_1558402378/spacetest_enc 832M 128K 831M 1% /test_pool_1558402378/spacetest_enc
+ cd /test_pool_1558402378/spacetest
+ /bin/ls -alh
total 4.5K
drwxr-xr-x 2 root root 2 May 20 18:33 .
drwxr-xr-x 4 root root 4.0K May 20 18:33 ..
+ python3 -c 'import os; import time; import uuid; target=str(time.time()); os.makedirs(target); os.chdir(target); [os.makedirs(uuid.uuid4().hex) for _ in range(100000)]'
+ df -h
+ grep test_pool_1558402378
test_pool_1558402378/spacetest 832M 20M 813M 3% /test_pool_1558402378/spacetest
test_pool_1558402378/spacetest_enc 813M 128K 813M 1% /test_pool_1558402378/spacetest_enc
+ /bin/ls -alh
total 16M
drwxr-xr-x 3 root root 3 May 20 18:33 .
drwxr-xr-x 4 root root 4.0K May 20 18:33 ..
drwxr-xr-x 100002 root root 98K May 20 18:33 1558402385.0805695
+ cp -ar 1558402385.0805695 /test_pool_1558402378/spacetest_enc/
+ df -h
+ grep test_pool_1558402378
test_pool_1558402378/spacetest 627M 20M 607M 4% /test_pool_1558402378/spacetest
test_pool_1558402378/spacetest_enc 812M 205M 607M 26% /test_pool_1558402378/spacetest_enc
+ grep test_pool_1558402378
+ zfs get all
test_pool_1558402378 type filesystem -
test_pool_1558402378 creation Mon May 20 18:33 2019 -
test_pool_1558402378 used 225M -
test_pool_1558402378 available 607M -
test_pool_1558402378 referenced 24K -
test_pool_1558402378 compressratio 1.00x -
test_pool_1558402378 mounted no -
test_pool_1558402378 quota none default
test_pool_1558402378 reservation none default
test_pool_1558402378 recordsize 128K default
test_pool_1558402378 mountpoint none local
test_pool_1558402378 sharenfs off default
test_pool_1558402378 checksum on default
test_pool_1558402378 compression lz4 local
test_pool_1558402378 atime off local
test_pool_1558402378 devices on default
test_pool_1558402378 exec on default
test_pool_1558402378 setuid on default
test_pool_1558402378 readonly off default
test_pool_1558402378 zoned off default
test_pool_1558402378 snapdir hidden default
test_pool_1558402378 aclinherit restricted default
test_pool_1558402378 createtxg 1 -
test_pool_1558402378 canmount on default
test_pool_1558402378 xattr on default
test_pool_1558402378 copies 1 default
test_pool_1558402378 version 5 -
test_pool_1558402378 utf8only off -
test_pool_1558402378 normalization none -
test_pool_1558402378 casesensitivity sensitive -
test_pool_1558402378 vscan off default
test_pool_1558402378 nbmand off default
test_pool_1558402378 sharesmb off default
test_pool_1558402378 refquota none default
test_pool_1558402378 refreservation none default
test_pool_1558402378 guid 11573579168963844048 -
test_pool_1558402378 primarycache all default
test_pool_1558402378 secondarycache all default
test_pool_1558402378 usedbysnapshots 0B -
test_pool_1558402378 usedbydataset 24K -
test_pool_1558402378 usedbychildren 225M -
test_pool_1558402378 usedbyrefreservation 0B -
test_pool_1558402378 logbias latency default
test_pool_1558402378 objsetid 54 -
test_pool_1558402378 dedup off default
test_pool_1558402378 mlslabel none default
test_pool_1558402378 sync standard default
test_pool_1558402378 dnodesize legacy default
test_pool_1558402378 refcompressratio 1.00x -
test_pool_1558402378 written 24K -
test_pool_1558402378 logicalused 112M -
test_pool_1558402378 logicalreferenced 12K -
test_pool_1558402378 volmode default default
test_pool_1558402378 filesystem_limit none default
test_pool_1558402378 snapshot_limit none default
test_pool_1558402378 filesystem_count none default
test_pool_1558402378 snapshot_count none default
test_pool_1558402378 snapdev hidden default
test_pool_1558402378 acltype off default
test_pool_1558402378 context none default
test_pool_1558402378 fscontext none default
test_pool_1558402378 defcontext none default
test_pool_1558402378 rootcontext none default
test_pool_1558402378 relatime off default
test_pool_1558402378 redundant_metadata all default
test_pool_1558402378 overlay off default
test_pool_1558402378 encryption off default
test_pool_1558402378 keylocation none default
test_pool_1558402378 keyformat none default
test_pool_1558402378 pbkdf2iters 0 default
test_pool_1558402378 special_small_blocks 0 default
test_pool_1558402378/spacetest type filesystem -
test_pool_1558402378/spacetest creation Mon May 20 18:33 2019 -
test_pool_1558402378/spacetest used 19.6M -
test_pool_1558402378/spacetest available 607M -
test_pool_1558402378/spacetest referenced 19.6M -
test_pool_1558402378/spacetest compressratio 1.00x -
test_pool_1558402378/spacetest mounted yes -
test_pool_1558402378/spacetest quota none default
test_pool_1558402378/spacetest reservation none default
test_pool_1558402378/spacetest recordsize 128K default
test_pool_1558402378/spacetest mountpoint /test_pool_1558402378/spacetest local
test_pool_1558402378/spacetest sharenfs off default
test_pool_1558402378/spacetest checksum on default
test_pool_1558402378/spacetest compression lz4 inherited from test_pool_1558402378
test_pool_1558402378/spacetest atime off inherited from test_pool_1558402378
test_pool_1558402378/spacetest devices on default
test_pool_1558402378/spacetest exec on default
test_pool_1558402378/spacetest setuid on default
test_pool_1558402378/spacetest readonly off default
test_pool_1558402378/spacetest zoned off default
test_pool_1558402378/spacetest snapdir hidden default
test_pool_1558402378/spacetest aclinherit restricted default
test_pool_1558402378/spacetest createtxg 6 -
test_pool_1558402378/spacetest canmount on default
test_pool_1558402378/spacetest xattr on default
test_pool_1558402378/spacetest copies 1 default
test_pool_1558402378/spacetest version 5 -
test_pool_1558402378/spacetest utf8only off -
test_pool_1558402378/spacetest normalization none -
test_pool_1558402378/spacetest casesensitivity sensitive -
test_pool_1558402378/spacetest vscan off default
test_pool_1558402378/spacetest nbmand off default
test_pool_1558402378/spacetest sharesmb off default
test_pool_1558402378/spacetest refquota none default
test_pool_1558402378/spacetest refreservation none default
test_pool_1558402378/spacetest guid 14103251210182056828 -
test_pool_1558402378/spacetest primarycache all default
test_pool_1558402378/spacetest secondarycache all default
test_pool_1558402378/spacetest usedbysnapshots 0B -
test_pool_1558402378/spacetest usedbydataset 19.6M -
test_pool_1558402378/spacetest usedbychildren 0B -
test_pool_1558402378/spacetest usedbyrefreservation 0B -
test_pool_1558402378/spacetest logbias latency default
test_pool_1558402378/spacetest objsetid 259 -
test_pool_1558402378/spacetest dedup off default
test_pool_1558402378/spacetest mlslabel none default
test_pool_1558402378/spacetest sync standard default
test_pool_1558402378/spacetest dnodesize legacy default
test_pool_1558402378/spacetest refcompressratio 1.00x -
test_pool_1558402378/spacetest written 19.6M -
test_pool_1558402378/spacetest logicalused 9.79M -
test_pool_1558402378/spacetest logicalreferenced 9.79M -
test_pool_1558402378/spacetest volmode default default
test_pool_1558402378/spacetest filesystem_limit none default
test_pool_1558402378/spacetest snapshot_limit none default
test_pool_1558402378/spacetest filesystem_count none default
test_pool_1558402378/spacetest snapshot_count none default
test_pool_1558402378/spacetest snapdev hidden default
test_pool_1558402378/spacetest acltype off default
test_pool_1558402378/spacetest context none default
test_pool_1558402378/spacetest fscontext none default
test_pool_1558402378/spacetest defcontext none default
test_pool_1558402378/spacetest rootcontext none default
test_pool_1558402378/spacetest relatime off default
test_pool_1558402378/spacetest redundant_metadata all default
test_pool_1558402378/spacetest overlay off default
test_pool_1558402378/spacetest encryption off default
test_pool_1558402378/spacetest keylocation none default
test_pool_1558402378/spacetest keyformat none default
test_pool_1558402378/spacetest pbkdf2iters 0 default
test_pool_1558402378/spacetest special_small_blocks 0 default
test_pool_1558402378/spacetest_enc type filesystem -
test_pool_1558402378/spacetest_enc creation Mon May 20 18:33 2019 -
test_pool_1558402378/spacetest_enc used 204M -
test_pool_1558402378/spacetest_enc available 607M -
test_pool_1558402378/spacetest_enc referenced 204M -
test_pool_1558402378/spacetest_enc compressratio 1.00x -
test_pool_1558402378/spacetest_enc mounted yes -
test_pool_1558402378/spacetest_enc quota none default
test_pool_1558402378/spacetest_enc reservation none default
test_pool_1558402378/spacetest_enc recordsize 128K default
test_pool_1558402378/spacetest_enc mountpoint /test_pool_1558402378/spacetest_enc local
test_pool_1558402378/spacetest_enc sharenfs off default
test_pool_1558402378/spacetest_enc checksum on default
test_pool_1558402378/spacetest_enc compression lz4 inherited from test_pool_1558402378
test_pool_1558402378/spacetest_enc atime off inherited from test_pool_1558402378
test_pool_1558402378/spacetest_enc devices on default
test_pool_1558402378/spacetest_enc exec on default
test_pool_1558402378/spacetest_enc setuid on default
test_pool_1558402378/spacetest_enc readonly off default
test_pool_1558402378/spacetest_enc zoned off default
test_pool_1558402378/spacetest_enc snapdir hidden default
test_pool_1558402378/spacetest_enc aclinherit restricted default
test_pool_1558402378/spacetest_enc createtxg 8 -
test_pool_1558402378/spacetest_enc canmount on default
test_pool_1558402378/spacetest_enc xattr on default
test_pool_1558402378/spacetest_enc copies 1 default
test_pool_1558402378/spacetest_enc version 5 -
test_pool_1558402378/spacetest_enc utf8only off -
test_pool_1558402378/spacetest_enc normalization none -
test_pool_1558402378/spacetest_enc casesensitivity sensitive -
test_pool_1558402378/spacetest_enc vscan off default
test_pool_1558402378/spacetest_enc nbmand off default
test_pool_1558402378/spacetest_enc sharesmb off default
test_pool_1558402378/spacetest_enc refquota none default
test_pool_1558402378/spacetest_enc refreservation none default
test_pool_1558402378/spacetest_enc guid 2447044588944125459 -
test_pool_1558402378/spacetest_enc primarycache all default
test_pool_1558402378/spacetest_enc secondarycache all default
test_pool_1558402378/spacetest_enc usedbysnapshots 0B -
test_pool_1558402378/spacetest_enc usedbydataset 204M -
test_pool_1558402378/spacetest_enc usedbychildren 0B -
test_pool_1558402378/spacetest_enc usedbyrefreservation 0B -
test_pool_1558402378/spacetest_enc logbias latency default
test_pool_1558402378/spacetest_enc objsetid 134 -
test_pool_1558402378/spacetest_enc dedup off default
test_pool_1558402378/spacetest_enc mlslabel none default
test_pool_1558402378/spacetest_enc sync standard default
test_pool_1558402378/spacetest_enc dnodesize legacy default
test_pool_1558402378/spacetest_enc refcompressratio 1.00x -
test_pool_1558402378/spacetest_enc written 204M -
test_pool_1558402378/spacetest_enc logicalused 102M -
test_pool_1558402378/spacetest_enc logicalreferenced 102M -
test_pool_1558402378/spacetest_enc volmode default default
test_pool_1558402378/spacetest_enc filesystem_limit none default
test_pool_1558402378/spacetest_enc snapshot_limit none default
test_pool_1558402378/spacetest_enc filesystem_count none default
test_pool_1558402378/spacetest_enc snapshot_count none default
test_pool_1558402378/spacetest_enc snapdev hidden default
test_pool_1558402378/spacetest_enc acltype off default
test_pool_1558402378/spacetest_enc context none default
test_pool_1558402378/spacetest_enc fscontext none default
test_pool_1558402378/spacetest_enc defcontext none default
test_pool_1558402378/spacetest_enc rootcontext none default
test_pool_1558402378/spacetest_enc relatime off default
test_pool_1558402378/spacetest_enc redundant_metadata all default
test_pool_1558402378/spacetest_enc overlay off default
test_pool_1558402378/spacetest_enc encryption aes-256-ccm -
test_pool_1558402378/spacetest_enc keylocation file:///home/cfg/filesystem/zfs/1558402378/temp_zfs_key local
test_pool_1558402378/spacetest_enc keyformat raw -
test_pool_1558402378/spacetest_enc pbkdf2iters 0 default
test_pool_1558402378/spacetest_enc encryptionroot test_pool_1558402378/spacetest_enc -
test_pool_1558402378/spacetest_enc keystatus available -
test_pool_1558402378/spacetest_enc special_small_blocks 0 default
```
With a 10G test pool instead of 1G, the result changes to 20M -> 150M.
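The figures above come from `df -h` and the human-readable `zfs get all` listing, both of which round heavily. To compare runs (for example the 1G and 10G pools) more precisely, a minimal Python sketch can read exact byte counts with `zfs get -H -p`. It assumes the `zfs` CLI is on `PATH` and that the caller may query the pool; `parse_zfs_get`, `dataset_usage` and `report` are hypothetical helper names, not part of any ZFS tooling.

```python
import subprocess
from typing import Dict


def parse_zfs_get(text: str) -> Dict[str, Dict[str, int]]:
    """Parse `zfs get -H -p -o name,property,value` output.

    Scripted mode (-H) prints one tab-separated line per dataset/property pair,
    and -p prints exact byte counts instead of rounded values like "19.6M".
    """
    result: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, prop, value = line.split("\t")
        result.setdefault(name, {})[prop] = int(value)
    return result


def dataset_usage(pool: str) -> Dict[str, Dict[str, int]]:
    """Query used/logicalused for `pool` and all of its children (needs the zfs CLI)."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-p", "-r", "-o", "name,property,value",
         "used,logicalused", pool],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_zfs_get(out)


def report(pool: str) -> None:
    """Print exact used/logicalused bytes and their ratio for each dataset."""
    for name, props in dataset_usage(pool).items():
        used, logical = props["used"], props["logicalused"]
        ratio = used / logical if logical else float("nan")
        print(f"{name}: used={used} logicalused={logical} used/logical={ratio:.2f}")
```

For example, calling `report("test_pool_1558402378")` as root after the script finishes would list the pool and both child datasets. Parsing is kept separate from the subprocess call so it can be checked without a live pool.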
@jakeogh the size discrepancy you're observing for an encrypted dataset, while not intuitive, is as expected. There are two major factors at play here that result in the increased size.
Normally in ZFS all of a filesystem's metadata is compressed when stored on disk (even when `compression=off`). This can drastically reduce the space requirements, particularly when storing zero-length files. In your non-encrypted case, all of the dnodes (aka inodes) can be compressed and stored in only 3.32M of space (see the dsize column below). The top-level directory takes 16M, and each empty directory is stored inside its dnode and requires no additional space. This is why it only takes about 20M total, which is quite small for 100,000 directories.
```
Dataset test_pool_1558722045/spacetest [ZPL], ID 144, cr_txg 6, 19.3M, 100007 objects
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         0    6   128K    16K  3.32M     512  48.8M   99.98  DMU dnode
         2    3   128K    16K  16.0M     512  16.0M  100.00  ZFS directory
       100    1   128K    512      0     512    512  100.00  ZFS directory
```
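As a quick check of the unencrypted breakdown, the sketch below adds up the zdb figures, assuming zdb's `M` means MiB. The values are copied by hand from the listing; the per-directory cost of 0 reflects the `dsize` of 0 for the empty directory object.

```python
MiB = 1024 ** 2
n_dirs = 100_000

# Figures copied from the unencrypted zdb listing (assumed binary units).
dnode_lsize = 48.8 * MiB   # logical size of the 'DMU dnode' object (lsize)
dnode_dsize = 3.32 * MiB   # on-disk size after metadata compression (dsize)
top_dir = 16.0 * MiB       # object 2: the top-level directory
per_empty_dir = 0          # empty directories are embedded in their dnodes (dsize 0)

total = dnode_dsize + top_dir + n_dirs * per_empty_dir
print(f"dnode compression: {dnode_lsize / dnode_dsize:.1f}x")
print(f"total: {total / MiB:.2f} MiB, about {total / n_dirs:.0f} bytes per directory")
```

This gives roughly 14.7x compression on the dnode object and about 19.3 MiB in total, in line with the 19.3M zdb reports for the dataset.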
However, when encryption is enabled, it's no longer possible to compress the entire 'DMU dnode' object on disk. This is because we need to be careful not to encrypt portions of the dnode; this allows the pool to be scrubbed without the encryption keys loaded. Stored uncompressed, the object now takes 98M in your example, and adding the same 16M for the top-level directory brings us to 114M.
```
Dataset test_pool_1558722045/spacetest_enc [ZPL], ID 151, cr_txg 8, 212M, 100007 objects
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         0    6   128K    16K  98.1M     512  48.8M   99.98  DMU dnode
         2    3   128K    16K  16.0M     512  16.0M  100.00  ZFS directory
```
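The same arithmetic for the encrypted dataset reproduces the 114M figure. The remark that a dsize/lsize ratio near 2 fits uncompressed blocks written as two copies is an inference from the numbers (metadata being stored twice by default is stated for directories in the reply), not something zdb reports directly.

```python
MiB = 1024 ** 2

# Figures copied from the encrypted zdb listing (assumed binary units).
dnode_lsize = 48.8 * MiB   # logical size, the same as in the unencrypted dataset
dnode_dsize = 98.1 * MiB   # on-disk size, now that the object is not compressed
top_dir = 16.0 * MiB       # object 2: the top-level directory, unchanged

# A dsize/lsize ratio near 2 is consistent with uncompressed dnode blocks
# written as two copies (inference, see lead-in).
ratio = dnode_dsize / dnode_lsize
subtotal = dnode_dsize + top_dir
print(f"dsize/lsize: {ratio:.2f}")
print(f"dnode object + top-level directory: {subtotal / MiB:.1f} MiB")
```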
The second factor I mentioned is what accounts for the additional ~100M of used space. Encrypted filesystems cannot use ZFS's embedded data feature. That means these tiny empty directories cannot be stored inside the dnode as above. Each must instead be stored in its own 512-byte block, and since directories are metadata, two copies are stored by default, requiring 1K per directory.
```
    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       100    1   128K    512     1K     512    512  100.00  ZFS directory
```
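Adding the per-directory blocks to the 114M subtotal gets close to the dataset's total. This sketch assumes exactly 100,000 directory objects, each with one 512-byte block stored twice, as described in the reply.

```python
MiB = 1024 ** 2
n_dirs = 100_000
block_size = 512      # dblk of each empty directory object in the listing
metadata_copies = 2   # directories are metadata, so two copies by default

dir_blocks = n_dirs * block_size * metadata_copies
subtotal = (98.1 + 16.0) * MiB   # dnode object + top-level directory (from zdb)
total = subtotal + dir_blocks
print(f"directory blocks: {dir_blocks / MiB:.1f} MiB")
print(f"estimated total: {total / MiB:.1f} MiB")
```

The estimate comes to about 211.8 MiB, close to the 212M in the zdb header for `spacetest_enc`.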
This is close to the worst-case scenario in terms of space usage for the encryption feature. It requires approximately 2K of space for every empty directory created, instead of about 200 bytes in the unencrypted case (roughly 10x).
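The per-directory figures follow from dividing each dataset's total by its object count. The sketch uses the totals and the 100,007-object count from the two zdb headers, again assuming `M` means MiB.

```python
MiB = 1024 ** 2
objects = 100_007   # object count from both zdb headers

plain_total = 19.3 * MiB       # spacetest (unencrypted)
encrypted_total = 212 * MiB    # spacetest_enc

per_plain = plain_total / objects
per_encrypted = encrypted_total / objects
print(f"unencrypted: ~{per_plain:.0f} B per object")
print(f"encrypted:   ~{per_encrypted:.0f} B per object")
print(f"ratio: {per_encrypted / per_plain:.1f}x")
```

This works out to roughly 200 bytes versus 2.2K per object, a ratio of about 11x.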
On the surface that looks pretty bad, but it's worth taking a step back and comparing the usage to, say, ext4. Assuming you rely on the default ext4 behavior, your block size will likely be 4K, and since each directory needs at least one block, 100,000 directories require ~400M. In the best case, you've manually specified 1K blocks when formatting the filesystem, and it will take only 100M.
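The ext4 figures are a simple lower bound: one data block per empty directory. This sketch assumes the `inline_data` feature is off (so even an empty directory holding only "." and ".." gets a full block) and ignores the inode tables that mkfs preallocates; `ext4_dir_bytes` is a hypothetical helper name.

```python
MiB = 1024 ** 2


def ext4_dir_bytes(n_dirs: int, block_size: int) -> int:
    """Lower bound on data blocks used by n_dirs empty ext4 directories.

    Assumes inline_data is off, so each directory occupies one full block;
    preallocated inode table space is not counted.
    """
    return n_dirs * block_size


for bs in (4096, 1024):
    print(f"{bs}-byte blocks: ~{ext4_dir_bytes(100_000, bs) / MiB:.0f} MiB")
```

That gives about 391 MiB with 4K blocks and about 98 MiB with 1K blocks, which the reply rounds to ~400M and 100M.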
Thank you @behlendorf for the detailed explanation. Indeed I get ~400M for the same test on ext4.