Zfs: ZFS native encryption, GCM file size limitations, questions on best cipher encryption mode

Created on 2 Oct 2017 · 8 comments · Source: openzfs/zfs

Just stumbled upon the following:

https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340

Note that GCM is bounded to encrypting about 68 GB ($2^{39} - 256$ in bits) of data for a single IV. The amount of invocations is $2^{32}$ but you should be advised to stay well away from those limits. Note that repeating the IV for two separate encryption invocations is a catastrophic event for GCM.

Is that something to worry about?

Also there seem to be a few shortcomings compared to e.g. HMAC:

https://crypto.stackexchange.com/questions/10775/practical-disadvantages-of-gcm-mode-encryption/10808#10808

The authentication part of GCM (GHASH) is weaker than HMAC, GHASH provides maximum 128-bit authentication tag, where as HMAC allows lot longer tags (HMAC-SHA-256 would allow 256-bit authentication tag). In addition, forgery of GHASH tags in some cases is easier than HMAC

I remember Linux (or cryptsetup) offering XTS mode; the following entry provides some interesting background info on it:

https://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/

So wouldn't XTS be better suited?

All 8 comments

CC: @tcaputi

I hope the questions aren't too trivial (I have no knowledge of crypto theory, etc.); I just want to make sure ZFS gets the greatest, latest and best-suited security :)

Note that GCM is bounded to encrypting about 68 GB ($2^{39} - 256$ in bits) of data for a single IV. The amount of invocations is $2^{32}$ but you should be advised to stay well away from those limits. Note that repeating the IV for two separate encryption invocations is a catastrophic event for GCM.

We use a new IV for each block of data (by default, blocks are 128 KiB each). In addition, we generate a new encryption key for every 40000000 (IIRC) blocks we encrypt and every time you import the pool.
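As a rough sanity check of the numbers above, the following sketch compares the quoted per-IV GCM limit with a 128 KiB block, and estimates how likely two random IVs are to collide within one rekey interval. The 96-bit IV length is an assumption (it is the usual GCM IV size and is not stated in this thread), and the rekey interval is the "IIRC" figure quoted above, so treat the output as an order-of-magnitude illustration only.

```python
# Per-IV plaintext limit for GCM as quoted in the question: 2^39 - 256 bits.
GCM_MAX_BITS_PER_IV = 2**39 - 256
GCM_MAX_BYTES_PER_IV = GCM_MAX_BITS_PER_IV // 8

BLOCK_BYTES = 128 * 1024      # default block size mentioned in the reply
BLOCKS_PER_KEY = 40_000_000   # rekey interval as quoted ("IIRC"), not verified
IV_BITS = 96                  # ASSUMPTION: typical GCM IV length


def headroom_per_iv(block_bytes: int = BLOCK_BYTES) -> float:
    """How many times one block fits under the per-IV GCM limit."""
    return GCM_MAX_BYTES_PER_IV / block_bytes


def iv_collision_bound(blocks: int = BLOCKS_PER_KEY, iv_bits: int = IV_BITS) -> float:
    """Birthday upper bound on any two random IVs colliding under one key."""
    pairs = blocks * (blocks - 1) // 2
    return pairs / 2**iv_bits


if __name__ == "__main__":
    print(f"per-IV limit: {GCM_MAX_BYTES_PER_IV} bytes")
    print(f"headroom for one block: {headroom_per_iv():.0f}x")
    print(f"IV collision bound per key: {iv_collision_bound():.2e}")
```

Under these assumptions a single block is several hundred thousand times smaller than the per-IV limit, which is why the 68 GB bound does not come into play when every block gets its own IV.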

The authentication part of GCM (GHASH) is weaker than HMAC, GHASH provides maximum 128-bit authentication tag, where as HMAC allows lot longer tags (HMAC-SHA-256 would allow 256-bit authentication tag). In addition, forgery of GHASH tags in some cases is easier than HMAC

We do not currently support 256-bit authentication tags. That said, most commonly used applications today (including TLS) only use 128-bit MACs anyway. At the moment this is pretty widely accepted as the industry standard. The attacks you mentioned are very new (as far as I'm aware anyway) and only apply to applications which use a truncated MAC (less than 128 bits). We use the full 128 bits, so this is not an issue here. This could represent a weakness in the future, but for right now nobody is aware of any security problems with GCM using 128-bit MACs. This is also the reason we chose CCM as the default mode, simply as a precaution and despite the better performance that GCM offers.

The encryption implementation is designed to allow for newer and better cipher suites as existing ones are broken, which is simply the nature of the cyber security industry.

So wouldn't XTS be better suited?

XTS is not really a good choice for this particular application. XTS is meant for full-disk encryption which has different requirements than the streaming ciphers we use here. The biggest limitation of cipher suites like XTS is that they cannot use extra disk space to store other encryption parameters. This means that they cannot support randomly generated IVs (which protect against recognizable patterns in your data) or MACs (which ensure that the data you just decrypted was actually something you encrypted and not just garbage that was written to confuse you). We wanted our implementation to protect against both of these things so we opted not to support it.

Let me know if you have any other questions.

@tcaputi Works for me. Thanks for the quick reply and the to-the-point, illuminating answers :)

@tcaputi Why not use OCB instead of GCM? AFAIK it is free to use and it is superior. It would be nice to have support for different ciphers too, for example Serpent.

@inf3rno

Why not use OCB instead of GCM? AFAIK it is free to use and it is superior.

The first big reason is that OCB is technically patented. Even if the creator has stated that he will allow open source projects to use it mostly for free, there is still the legal hassle of actually getting that license. From what I have read, this is actually the primary reason why OCB is not more commonly used.

Second, ZFS's encryption implementation is based on The Illumos Crypto Port (ICP) which is basically just a port of the Illumos kernel's crypto library. The Illumos kernel doesn't support OCB as far as I am aware and I (not being a real cryptographer) do not feel comfortable adding the implementation myself.

Lastly (and probably most importantly) the ZFS encryption implementation relies in part on GCM and CCM's support for additional authenticated data (AAD), which OCB does not support.

It would be nice to have support for different ciphers too, for example Serpent.

The implementation itself is very flexible and it should be very easy for developers to add support for new block ciphers, modes and secure checksums. The only requirements are that an algorithm must be implemented in the ICP first and that new algorithms must support the same features as the current ones. This will be important in the future, as encryption algorithms are constantly being broken and newer, better ones are being created. At the moment, I am spending most of my time hardening the implementation first, to make sure that the ZFS-connected parts work as advertised.

The first big reason is that OCB is technically patented. Even if the creator has stated that he will allow open source projects to use it mostly for free, there is still the legal hassle of actually getting that license. From what I have read, this is actually the primary reason why OCB is not more commonly used.

Maybe I am undereducated in the topic, but wouldn't it be just a few mail exchanges with the author and some paperwork?

I (not being a real cryptographer) do not feel comfortable adding the implementation myself.

I can understand that.

Lastly (and probably most importantly) the ZFS encryption implementation relies in part on GCM and CCM's support for additional authenticated data (AAD), which OCB does not support.

Can you elaborate on this please?

I am spending most of my time hardening the implementation first, to make sure that the ZFS-connected parts work as advertised.

Thank you for your work! :-)

Oh, actually I am here just to ask how integrity checking works with natively encrypted ZFS. Will it use the GCM authentication tag, or is it the same as in regular ZFS, probably CRC? Are there performance issues because of this?

Maybe I am undereducated in the topic, but wouldn't it be just a few mail exchanges with the author and some paperwork?

The problem there (as far as I'm aware) is that these emails might have to involve lawyers, which is usually a high barrier to entry. We could probably look into it, but since we need this code in the ICP, it would probably make sense to have the people at the Illumos project look into getting a license.

Can you elaborate on this please?

Support for AAD is the ability for an encryption mode to generate something called a message authentication code (MAC) that protects both the encrypted data and additional unencrypted data. The MAC protects data from being modified by an unauthorized user without the authorized user noticing. In ZFS, some data needs to be kept unencrypted so that the filesystem structure is always parseable. This allows us to do things like scrub pools and perform raw sends even when the encryption keys are not loaded on the system. We use AAD to allow us to use a single MAC to protect certain kinds of blocks that contain both encrypted and unencrypted data.

OCB only generates a MAC for the encrypted data, which would mean that we would need to use twice as much space to store another MAC to protect the unencrypted data. We don't currently have room for this in the on-disk format without making a few really tough trade-offs.
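The idea of one tag covering both the unencrypted and the encrypted part of a block can be sketched as follows. This is only an illustration: it uses HMAC-SHA-256 truncated to 128 bits as a stand-in for the tag that GCM or CCM would compute, treats the ciphertext as opaque bytes, and the names `tag_block` and `verify_block` are hypothetical, not ZFS or ICP functions.

```python
import hashlib
import hmac
import struct

TAG_BYTES = 16  # 128-bit tag, the MAC size discussed in this thread


def tag_block(mac_key: bytes, aad: bytes, ciphertext: bytes) -> bytes:
    """Compute one tag over the unencrypted part (aad) and the ciphertext.

    Stand-in only: real GCM/CCM derive the tag differently.
    """
    # Prefix the AAD length so the aad/ciphertext boundary cannot be shifted.
    message = struct.pack(">Q", len(aad)) + aad + ciphertext
    return hmac.new(mac_key, message, hashlib.sha256).digest()[:TAG_BYTES]


def verify_block(mac_key: bytes, aad: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Return True only if neither aad nor ciphertext was modified."""
    return hmac.compare_digest(tag_block(mac_key, aad, ciphertext), tag)


if __name__ == "__main__":
    key = bytes(range(32))                  # demo key, not a real secret
    header = b"plaintext block metadata"    # stays readable without the key
    payload = b"\x8f\x01\x22opaque-ciphertext"
    tag = tag_block(key, header, payload)
    print(verify_block(key, header, payload, tag))
    print(verify_block(key, header + b"!", payload, tag))
```

With a single tag like this, tampering with either the readable header or the encrypted payload is detected, and only one 128-bit value has to be stored per block.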

Oh, actually I am here just to ask how integrity checking works with natively encrypted ZFS. Will it use the GCM authentication tag, or is it the same as in regular ZFS, probably CRC? Are there performance issues because of this?

We use the GCM / CCM tag to provide integrity for most of the data, along with the normal ZFS checksums (none of which are actually CRC). Some metadata is protected with SHA512-HMAC instead. Encryption does include some CPU overhead (usually on the order of 10% for really high IO workloads), but in all the testing I have seen so far, this has never been a bottleneck. We have accelerated assembly code which is used for x86_64 CPUs and further accelerated code for Intel chips supporting the AES-NI instruction set (which is almost all of them these days).

Glad to hear that! Then it is a much better solution than the CRC on plaintext that btrfs uses.
