Originally described by @keithc-ca in #9429 comment.
NativeChaCha20Cipher.EngineAEADDec appears to capture the entire input in a ByteArrayOutputStream: if the input is large (think gigabytes) this is not reasonable.
To elaborate:
cipher.engineUpdate or cipher.engineFinal with a ByteBuffer as input, bufferCrypt will be called (as shown here and here).bufferCrypt calls one of the engineUpdate and engineDoFinal functions in NativeChaCha20Cipher.java.doUpdate call. In case of AEAD decryption, the doUpdate copies the input to a buffercipher.engineUpdate continuously with large input sizes (gigabytes) will cause a crash.looked into the code.
doUpdate returns 0 or a 0 byte array. doFinal returns cumulated input size or a byte array with encrypted output. ByteArrayOutputStream buffer when invoke doUpdate, and encrypt all inputs until calling doFinal. doFinal, which means no mater what container, the same issue is reproduced when input is large. ideas about to solve it
ByteArrayOutputStream. If the buffer overflows, doFinal is invoked no matter doFinal is called or not. The idea should be helpful for a cumulated large input, such as calling doUpdate multiple times with small inputs. However, it may have no help about inputing a large size of data at one time.
May i have any suggestions, thanks.
limit the size of ByteArrayOutputStream. If the buffer overflows, doFinal is invoked no matter doFinal is called or not
In case the buffer overflows, you would need to call doUpdate and not doFinal. This is because doFinal will finalize the operation so you can't call doUpdate afterwards. So for example, consider a case when the internal buffer size is 10KB and the user is trying to encrypt 1MB of data (divided into 1KB chunks). The buffer will overflow after few calls for doUpdate. When the buffer overflows, you should not call doFinal because if you do, the cipher will be finalized and it can't encrypt the rest of the data.
As we discussed with @ashbm5 today, a better approach for now is to encrypt the data as soon as the doUpdate receives them. No need to buffer them.
However, it does nothing about inputing a large size of data at one time.
Yes that is still a valid concern.
We're coming up on the v0.21.0 code split on June 7. Will this be ready and merged before then?
@DanHeidinga @keithc-ca We actually have the change ready for review. However, the implementation problem pointed out by Keith - that the input data is stored in a buffer and not processed immediately, which can cause out of memory problems, is also shared with OpenJDK java (non-OpenSSL) implementation.
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java#L1339
This behavior is expected in a ChaCha20 unit test so we will also need to modify a ChaCha20. Strictly speaking, it should be fine to modify since the crypto API allows for both types of behaviors (immediate return or storing in a buffer and returning at some other point in time). However, since this is a change in external API behavior, it may break something unexpected at some future point.
This behavior is expected in a ChaCha20 unit test
So both OpenJDK's implementation and our implementation have the same limitation of needing to buffer the input until it's complete? Sounds like something we shouldn't be changing unless there's a compelling reason to, unless I've missed something
Or we should make the change in OpenJDK and contribute it first.
This is an enhancement. Since it's not delivered before the 0.21 branch, deferring to the next release.
Moving to the backlog.
Most helpful comment
In case the buffer overflows, you would need to call
doUpdateand notdoFinal. This is becausedoFinalwill finalize the operation so you can't calldoUpdateafterwards. So for example, consider a case when the internal buffer size is 10KB and the user is trying to encrypt 1MB of data (divided into 1KB chunks). The buffer will overflow after few calls fordoUpdate. When the buffer overflows, you should not calldoFinalbecause if you do, the cipher will be finalized and it can't encrypt the rest of the data.As we discussed with @ashbm5 today, a better approach for now is to encrypt the data as soon as the
doUpdatereceives them. No need to buffer them.Yes that is still a valid concern.