Type | Version/Name
--- | ---
Distribution Name | Arch Linux
Distribution Version | N/A
Linux Kernel | 5.2.9-arch1-1-ARCH
Architecture | x86_64
ZFS Version | git commit f09fda507 (2019-08-16)
SPL Version | ditto
It's my understanding that with the integration of #8965, SIMD support should work again. Indeed, I see avx2 as the fastest implementation in e.g. `fletcher_4_bench`, indicating that ZFS is able to use the FPU:
```
$ cat /proc/spl/kstat/zfs/fletcher_4_bench
5 0 0x01 -1 0 2720293623 156491491640954
implementation   native         byteswap
scalar           7029091918     5551234944
superscalar      9433541338     7039884753
superscalar4     8806767231     7058469765
sse2             15540142458    9209733432
ssse3            15699311619    14793781457
avx2             26057814841    23605675674
fastest          avx2           avx2
```
But reading a file off of an encrypted filesystem peaks at 500 MB/s and saturates all CPU cores, clearly indicating that AES-NI isn't used. Otherwise I'd expect well above 1 GB/s (NVMe drive, Intel i7-8750H) at a much lower CPU load. I can't tell whether the SIMD optimizations are really used for checksum calculations, but it seems to me that reading from an unencrypted filesystem produces more CPU load than before (pre-5.0), when SIMD support was working; not sure though.
How would I debug this problem? Is there anything I have to tweak to get SIMD accelerated encryption back again?
I already asked on zfs-discuss but got no enlightening input.
Thanks
Attila
PS
Please refrain from starting any "evil Linux devs" discussion, all has been said in this regard already.
Read a large file off of an encrypted filesystem and monitor throughput and CPU load.
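One way to do that (a sketch; the file path is a placeholder):

```sh
# Stream a large file from the encrypted dataset and report throughput as it goes:
dd if=/encrypted-ds/largefile of=/dev/null bs=1M status=progress
# In a second terminal, watch CPU load with e.g. top or htop.
```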
N/A
To test, I'd suggest finding (or making) a large encrypted file, bringing the machine to idle, and doing `cat $BIG_ENCRYPTED_FILE > /dev/null` while running `perf top` in another window. Within perf, find the `aes_encrypt_intel` function near the top if you can; that should be the accelerated AES routine, and you can even annotate it to verify from the disassembly.
If you can't find that function in the output, or it's named something different, then yeah, you're probably not using AES-NI.
Avoiding the ARC for this test is up to you.
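If you do want to keep the ARC out of the picture, one possibility (a sketch, not from the thread; `$pool/foo` and the file path are placeholders) is to cache metadata only, so repeated reads still exercise the decryption path:

```sh
# Keep file data out of the ARC so reads hit the crypto code every time:
zfs set primarycache=metadata $pool/foo
# Terminal 1: watch the hot kernel symbols
perf top
# Terminal 2: stream the file
cat /foo/largefile > /dev/null
# Restore the default caching behaviour afterwards:
zfs inherit primarycache $pool/foo
```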
Thanks, I'm seeing `aes_generic_encrypt()` in the perf output, so AES-NI is definitely not used. Any idea what the problem might be?
Fletcher 4 is indeed SIMD optimized; `cat /unencrypted-ds/largefile >/dev/null` gives `fletcher_4_avx2_native()` in the perf output.
To add to this, `cat /proc/crypto` gives, among others:

```
name         : gcm(aes)
driver       : generic-gcm-aesni
module       : aesni_intel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 1
ivsize       : 12
maxauthsize  : 16
geniv        : <none>
```
Since I use `encryption=aes-256-gcm`, I'd expect AES-NI to work.
The ZFS module is DKMS-compiled; do I need any special library for configure to pick up AES-NI support?
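As a basic sanity check (my own sketch, not from the thread), the CPU should advertise the relevant instruction set extensions:

```sh
# The flags line should include "aes" (AES-NI) and "pclmulqdq" (used by GCM):
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -Ex 'aes|pclmulqdq|avx2'
```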
@AttilaFueloep if you're running f09fda5 then no other changes should be needed. You can check which optimized versions are available by checking the contents of the following files:

* /sys/module/zfs/parameters/zfs_vdev_raidz_impl
* /sys/module/zcommon/parameters/zfs_fletcher_4_impl
* /sys/module/icp/parameters/icp_aes_impl
* /sys/module/icp/parameters/icp_gcm_impl

Micro-benchmarks indicating which version was determined to be the fastest are also available for Fletcher 4 and RAIDZ:

```
cat /proc/spl/kstat/zfs/fletcher_4_bench
cat /proc/spl/kstat/zfs/vdev_raidz_bench
```

Encryption is a slightly different story; no micro-benchmarks are run for AES or GCM. When an optimized version is available, it is assumed to perform better than the generic code and is preferentially used. There is one caveat: the accelerated version won't be used when first decrypting the wrapping key as part of `zfs load-key`. However, it will be used by the IO pipeline after this initial setup, so you may still see `aes_generic_encrypt()` appear in the `perf top` output. You should primarily see `aes_encrypt_amd64` or `aes_aesni_encrypt` for the accelerated versions.
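For instance, inspecting and (if needed) pinning the implementation looks roughly like this (a sketch; the bracketed entry is the one currently selected, and writing a value requires root):

```sh
# Show available implementations; [fastest] means the autodetected best one is used:
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl
# Explicitly select an implementation instead of relying on "fastest":
echo aesni > /sys/module/icp/parameters/icp_aes_impl
echo pclmulqdq > /sys/module/icp/parameters/icp_gcm_impl
```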
If that's not the case, we'll certainly want to dig deeper.
Note: `/proc/crypto` provides statistics for the kernel's own implementations; ZFS provides its own.
@behlendorf First of all, thank you for your detailed explanations. I've no idea how I managed to miss the fact that ZoL uses the illumos crypto code; usually I do know that. The `icp` module was the bit I missed.
It took me a while to sort this out, but I have a reproducer now.
On a freshly booted system with a mostly idle desktop, run the following (`$pool` has `mountpoint=none`; not sure if this matters):
```
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq
# zfs create -o encryption=aes-256-gcm -o keyformat=passphrase $pool/foo
Enter passphrase:
Re-enter passphrase:
# zfs set mountpoint=/foo $pool/foo
# dd if=/dev/urandom of=/foo/bar bs=1M count=$((1024*32))   # my arc size is 20G
# cat /foo/bar >/dev/null
^C
```
While `cat` is running, do a `perf top` in another terminal:
```
Overhead  Shared Object  Symbol
  37.91%  [icp]          [k] aes_generic_encrypt
  30.44%  [icp]          [k] gcm_pclmulqdq_mul
   6.81%  [icp]          [k] aes_xor_block
   5.31%  [icp]          [k] aes_encrypt_block
   3.21%  [icp]          [k] gcm_mul_pclmulqdq
   2.39%  [kernel]       [k] __x86_indirect_thunk_rax
   2.36%  [icp]          [k] gcm_decrypt_final
```
`aes_aesni_encrypt()` doesn't show up; `aes_generic_encrypt()` stays on top all the time.
If you do an unmount/mount cycle of `/foo`, the implementation used changes from `generic` to `aesni`:
```
# zfs unmount $pool/foo
# zfs mount $pool/foo
# cat /foo/bar >/dev/null
```
```
Overhead  Shared Object  Symbol
  31.35%  [icp]          [k] gcm_pclmulqdq_mul
  29.12%  [icp]          [k] aes_aesni_encrypt
   6.78%  [icp]          [k] aes_xor_block
   5.62%  [icp]          [k] aes_encrypt_intel
   4.21%  [icp]          [k] aes_encrypt_block
   3.33%  [icp]          [k] gcm_mul_pclmulqdq
   3.25%  [kernel]       [k] preempt_count_add
   2.87%  [kernel]       [k] preempt_count_sub
   2.55%  [kernel]       [k] __x86_indirect_thunk_rax
   2.21%  [icp]          [k] gcm_decrypt_final
```
GCM seems to pick up the expected implementation.
@AttilaFueloep I was able to reproduce this locally and understand the issue. I'll see about putting together a patch.
I've had encrypted ZFS slow down _very_ significantly, and I think I have the same issue:
```
$ cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
cycle [fastest] original scalar sse2 ssse3 avx2

$ cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
[fastest] scalar superscalar superscalar4 sse2 ssse3 avx2

$ cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni

$ cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

$ cat /proc/spl/kstat/zfs/fletcher_4_bench
5 0 0x01 -1 0 3279387634 342555257913
implementation   native         byteswap
scalar           7035679076     2998336463
superscalar      8965934583     6416057333
superscalar4     8414687198     6858908607
sse2             14925732425    8871957784
ssse3            14383096130    14117282701
avx2             24660391165    22193686536
fastest          avx2           avx2

$ cat /proc/spl/kstat/zfs/vdev_raidz_bench
18 0 0x01 -1 0 4599695826 342555694538
implementation   gen_p        gen_pq      gen_pqr     rec_p        rec_q       rec_r       rec_pq      rec_pr      rec_qr      rec_pqr
original         520363481    306862956   123794189   1338949457   301620445   45436110    116458955   26789302    26037614    18109890
scalar           1912009164   469128628   198106814   1759170367   529138724   397300883   243088353   203142018   133218276   114297165
sse2             3088572787   1313742388  688781527   3002038366   992581132   835532305   545011486   523304898   315135892   150529417
ssse3            3093255318   1300630695  676483229   3063879250   1628912263  1308276558  995395737   886620336   646762789   491185059
avx2             5792809931   2198491448  1191114729  5569530285   3160246027  2545778233  1794212965  1572169415  1191238109  910534839
fastest          avx2         avx2        avx2        avx2         avx2        avx2        avx2        avx2        avx2        avx2
```
@lovesegfault The issue here is that newly created datasets do not pick up the fastest AES implementation until the dataset is remounted or the machine is rebooted.
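Until a fix lands, the remount observation above doubles as a workaround (a sketch; `$pool/foo` is a placeholder for the affected dataset):

```sh
# Remount so the dataset's crypto context is rebuilt with the aesni implementation:
zfs unmount $pool/foo
zfs mount $pool/foo
```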
What implementation do you see getting used if you follow DeHackEd's suggestion or the first part of my reproducer? What bandwidth are you observing, and on what hardware? I'm seeing 500 MB/s with all cores at 100% regardless of the AES implementation used. It does seem that the GCM calculations are the limiting factor. Currently I'm taking a stab at speeding things up; let's see how this goes.
Thanks for commenting, I should have a patch ready for testing by the end of the week.
Take your time, it's easy to work around.
@AttilaFueloep I use ZFS as my root drive, so I can't just unmount/remount; this issue basically means my whole system is always slow. It's annoying, but at least everything continues to work :)
@lovesegfault

> I use ZFS as my root drive

I do as well, still I'm seeing the SIMD versions getting used. It's hard to tell more without knowing any details.
@behlendorf I mentioned this in the PR, but I figured this is a better place: it seems to me that somehow the SIMD algos are still not being picked.
https://github.com/zfsonlinux/zfs/pull/9296#issuecomment-530047324