ZFS: QAT compression not working when recordsize is larger than 128k

Created on 22 Jan 2020 · 10 comments · Source: openzfs/zfs

System information


Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 18.04.3 LTS
Linux Kernel | 4.15.0-74-generic
Architecture | x86_64
ZFS Version | 0.8.2-3ubuntu5
SPL Version | 0.8.2-3ubuntu5
QAT Hardware | 1.7
QAT Version | 4.6.0

Describe the problem you're observing

If the recordsize is larger than 128k, the QAT device is not used for gzip compression.

recordsize = 128k

~# cat /proc/spl/kstat/zfs/qat
19 1 0x01 17 4624 91518383292 161655968258688
name type data
comp_requests 4 1083920
comp_total_in_bytes 4 142071562240
comp_total_out_bytes 4 41897448645
decomp_requests 4 0
decomp_total_in_bytes 4 0
decomp_total_out_bytes 4 0
dc_fails 4 0
encrypt_requests 4 0
encrypt_total_in_bytes 4 0
encrypt_total_out_bytes 4 0
decrypt_requests 4 0
decrypt_total_in_bytes 4 0
decrypt_total_out_bytes 4 0
crypt_fails 4 0
cksum_requests 4 1299402
cksum_total_in_bytes 4 134997487616
cksum_fails 4 0

recordsize = 1M

~# cat /proc/spl/kstat/zfs/qat
19 1 0x01 17 4624 91518383292 162445364062320
name type data
comp_requests 4 0
comp_total_in_bytes 4 0
comp_total_out_bytes 4 0
decomp_requests 4 0
decomp_total_in_bytes 4 0
decomp_total_out_bytes 4 0
dc_fails 4 0
encrypt_requests 4 0
encrypt_total_in_bytes 4 0
encrypt_total_out_bytes 4 0
decrypt_requests 4 0
decrypt_total_in_bytes 4 0
decrypt_total_out_bytes 4 0
crypt_fails 4 0
cksum_requests 4 1299402
cksum_total_in_bytes 4 134997487616
cksum_fails 4 0

Describe how to reproduce the problem

Set the dataset recordsize to anything higher than 128k and check the QAT stats.
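
A minimal reproduction sketch (tank/test is a placeholder dataset, and any compressible write workload will do, since gzip compression must actually run for QAT to be invoked):

~# zfs set compression=gzip tank/test
~# zfs set recordsize=1M tank/test
~# cp -a /usr/share/doc/. /tank/test/     # write some compressible data
~# cat /proc/spl/kstat/zfs/qat            # comp_requests stays at 0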

Labels: Documentation, Performance

All 10 comments

This is a known limitation of the QAT implementation, since the hardware requires physically contiguous memory. Attempting to allocate buffers this large would be slow and would negatively impact overall performance, so efficient use of the QAT limits the maximum record size to 128k. That said, I don't see this documented anywhere outside the source code; we should update the man page accordingly.

@richardelling if you get a chance, would you mind updating the qat module options on the wiki? They're no longer TBD; as of 0.8 there are three tunables: zfs_qat_checksum_disable, zfs_qat_compress_disable, and zfs_qat_encrypt_disable. Thanks!
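
For reference, these tunables should be readable and, assuming the usual 0644 permissions, toggleable at runtime through the standard module-parameter path:

~# cat /sys/module/zfs/parameters/zfs_qat_compress_disable
0
~# echo 1 > /sys/module/zfs/parameters/zfs_qat_compress_disable    # bypass QAT for compression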

This is very nice to know, because with zstd (yeah I know, "here he is again with his zstd comment" :P ) in mind, it changes the playing field quite a bit.

Even though ZSTD would give slightly higher compression than gzip (and QAT) at the same record size, QAT being limited to 128K records would mean a significant compression-ratio gap wherever larger record sizes could be used. That's an important consideration for storage system designers.

The module parameters for the qat module options have been updated on the wiki.
Please verify the docs, thanks!

Ach, would have been nice to know before buying the QAT hardware :)
A 128k recordsize offers little to no compression at all on my dataset.

@Ornias1993 I've been monitoring ZSTD integration with great interest because it should perform well for my use case (DPX image sequences and big recordsizes, if anybody wonders).

@JulietDeltaGolf

It's slightly off-topic, but I should work on adding some 128K GZIP vs 1M ZSTD ratio benchmark numbers to the ZSTD PR, to make clear what the expected compression-ratio differences between QAT and ZSTD are.
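
Such a benchmark could look roughly like this (a sketch only: tank is a placeholder pool, /testdata a placeholder corpus, and compression=zstd assumes a build with the ZSTD PR applied):

~# zfs create -o compression=gzip -o recordsize=128k tank/gzip128k
~# zfs create -o compression=zstd -o recordsize=1M tank/zstd1m
~# cp -a /testdata/. /tank/gzip128k/ && cp -a /testdata/. /tank/zstd1m/
~# zfs get compressratio tank/gzip128k tank/zstd1m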

Use it at your own risk, but you can change QAT_MAX_BUF_SIZE to 1MB in include/sys/qat.h. QAT will work fine with it as long as there is enough free RAM to allocate contiguous memory (as Brian says). I've been running it that way for almost a year and the performance is good. I limit the ARC to 192GB (out of 256GB total RAM).

Again, YMMV, so do it at your own risk. If you run out of RAM you may get I/O errors during reads (I think this is being fixed?). The data are fine on disk, so don't panic. See issue #9784.
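
Concretely, the change is a one-line edit to the define quoted at the end of this thread (use at your own risk, per the comment above; the 1MB value simply matches the largest recordsize in use):

/* include/sys/qat.h */
#define QAT_MIN_BUF_SIZE    (4*1024)
#define QAT_MAX_BUF_SIZE    (1024*1024)    /* was (128*1024) */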

@luki-mbi Thanks for the tip. I'll give it a try!

Is the memory taken from ARC space, or is it additional memory on top of the ARC? Just to know whether I should limit the ARC a little more than it is today (800GiB out of 1TiB).

Also, wouldn't it make sense to have it as a module parameter instead of a static define then?
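
No such tunable exists in ZFS today; a hypothetical sketch of what one could look like (zfs_qat_max_buf_size is an invented name, and the pre-allocated intermediate buffers mentioned in the next comment would still need to be resized whenever it changes):

#include <linux/module.h>

/* Hypothetical tunable, not part of the ZFS source tree. */
static unsigned int zfs_qat_max_buf_size = 128 * 1024;
module_param(zfs_qat_max_buf_size, uint, 0644);
MODULE_PARM_DESC(zfs_qat_max_buf_size,
	"Maximum buffer size (bytes) offloaded to the QAT hardware");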

The memory is allocated via kmalloc() from normal kernel RAM (through the QAT_PHYS_CONTIG_ALLOC macro, which calls qat_mem_alloc_contig()), so it is separate from the ARC. For the allocation to succeed there must be enough free RAM available to the kernel.
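
Roughly, the allocation path looks like this (paraphrased from the ZFS QAT glue code; exact names and signatures may differ between versions, and Cpa32U comes from the Intel QAT API headers):

/* Paraphrased; see module/os/linux/zfs/qat.c in recent trees. */
static inline int
qat_mem_alloc_contig(void **pp_mem_addr, Cpa32U size_bytes)
{
	/* kmalloc() returns physically contiguous memory. */
	*pp_mem_addr = kmalloc(size_bytes, GFP_KERNEL);
	if (*pp_mem_addr == NULL)
		return (-1);
	return (0);
}

#define QAT_PHYS_CONTIG_ALLOC(pp_mem_addr, size_bytes) \
	qat_mem_alloc_contig((void *)(pp_mem_addr), (size_bytes))

Because kmalloc() must find physically contiguous pages, large allocations become progressively harder to satisfy as memory fragments, which is why enough free (not merely reclaimable) RAM matters here.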

@JulietDeltaGolf you can modify QAT_MAX_BUF_SIZE in https://github.com/openzfs/zfs/blob/master/include/sys/qat.h to support the larger blocksize. It is not a hardware limitation; there are two reasons we set the value to 128KB:

  • To avoid memory waste: some intermediate buffers are pre-allocated based on QAT_MAX_BUF_SIZE.
  • To provide the optimal performance.

But that doesn't mean QAT can't work with a larger buffer size.

/*
 * The minimal and maximal buffer size which are not restricted
 * in the QAT hardware, but with the input buffer size between 4KB
 * and 128KB the hardware can provide the optimal performance.
 */
#define QAT_MIN_BUF_SIZE    (4*1024)
#define QAT_MAX_BUF_SIZE    (128*1024)