Brotli is a generic-purpose lossless compression algorithm that compresses data
using a combination of a modern variant of the LZ77 algorithm, Huffman coding
and 2nd order context modeling, with a compression ratio comparable to the best
currently available general-purpose compression methods. It is similar in speed
to deflate but offers more dense compression.
The specification of the Brotli Compressed Data Format is defined in RFC 7932.
Brotli encoding is supported by most web browsers, major web servers, and some CDNs (Content Delivery Networks).
The API surface area for BrotliStream is identical to that of DeflateStream but with added bufferSize
constructors.
```C#
public partial class BrotliStream : System.IO.Stream
{
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel);
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel, bool leaveOpen);
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionLevel compressionLevel, bool leaveOpen, int bufferSize);
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode);
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, bool leaveOpen);
public BrotliStream(System.IO.Stream stream, System.IO.Compression.CompressionMode mode, bool leaveOpen, int bufferSize);
public System.IO.Stream BaseStream { get; }
public override bool CanRead { get; }
public override bool CanSeek { get; }
public override bool CanWrite { get; }
public override long Length { get; }
public override long Position { get; set; }
protected override void Dispose(bool disposing);
public override void Flush();
public override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback asyncCallback, object asyncState);
public override int EndRead(IAsyncResult asyncResult);
public override int Read(byte[] array, int offset, int count);
public override System.Threading.Tasks.Task<int> ReadAsync(byte[] array, int offset, int count, System.Threading.CancellationToken cancellationToken);
public override long Seek(long offset, System.IO.SeekOrigin origin);
public override void SetLength(long value);
public override IAsyncResult BeginWrite(byte[] array, int offset, int count, AsyncCallback asyncCallback, object asyncState);
public override void EndWrite(IAsyncResult asyncResult);
public override void Write(byte[] array, int offset, int count);
public override System.Threading.Tasks.Task WriteAsync(byte[] array, int offset, int count, System.Threading.CancellationToken cancellationToken);
}
```
### Example Usage
The BrotliStream behavior is the same as that of DeflateStream or GZipStream to allow easily converting DeflateStream/GZipStream code to use BrotliStream.
```C#
public static Stream Compress_Stream(Stream inputStream)
{
var outputStream = new MemoryStream();
var compressor = new BrotliStream(outputStream, CompressionMode.Compress, true);
inputStream.CopyTo(compressor);
compressor.Dispose();
return outputStream;
}
public static Stream Decompress_Stream(Stream inputStream)
{
var outputStream = new MemoryStream();
var decompressor = new BrotliStream(inputStream, CompressionMode.Decompress, true);
decompressor.CopyTo(outputStream);
decompressor.Dispose();
return outputStream;
}
```
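As a further illustration of converting existing DeflateStream/GZipStream code, here is a minimal round-trip sketch (the helper name is hypothetical; it uses only members shown above):
```C#
public static byte[] RoundTripBrotli(byte[] data)
{
    // Compress: identical shape to the equivalent GZipStream/DeflateStream code.
    byte[] compressed;
    using (var output = new MemoryStream())
    {
        using (var compressor = new BrotliStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            compressor.Write(data, 0, data.Length);
        }
        compressed = output.ToArray();
    }

    // Decompress back to the original bytes.
    using (var input = new MemoryStream(compressed))
    using (var decompressor = new BrotliStream(input, CompressionMode.Decompress))
    using (var result = new MemoryStream())
    {
        decompressor.CopyTo(result);
        return result.ToArray();
    }
}
```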
The goal of the streamless implementation is to provide a non-allocating, performant Brotli implementation free from Streams. It contains simple Compress/Decompress operations that return an enum indicating the success of the operation as well as static CompressFully/DecompressFully operations that allow single-pass compression/decompression without the need for a BrotliEncoder/BrotliDecoder instance.
```C#
public struct BrotliDecoder : System.IDisposable
{
public System.Buffers.OperationStatus Decompress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { throw null; }
public static bool DecompressFully(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { throw null; }
public void Dispose() { }
}
public struct BrotliEncoder : System.IDisposable
{
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { throw null; }
public static bool CompressData(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten) { throw null; }
public static bool CompressData(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesWritten, int quality, int window) { throw null; }
public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten) { throw null; }
public void Dispose() { }
public static int GetMaximumCompressedSize(int inputSize) { throw null; }
public void SetQuality(int quality) { }
public void SetWindow(int window) { }
}
```
### Design Questions
Should we allow setting the Quality/Window via `Set_` functions or make them constructor parameters? They must be set before encoding either way.
#### BrotliEncoder SetQuality/SetWindow vs constructor overloads:
```C#
public struct BrotliEncoder : System.IDisposable
{
...
public void SetQuality(int quality) { }
public void SetWindow(int window) { }
}
public struct BrotliEncoder : System.IDisposable
{
public BrotliEncoder() {}
public BrotliEncoder(int quality, int window) {}
...
}
```
Should there be an option for intermediate flushes or only for finalize? The main use case of an intermediate Flush is if you want to get more of the outputted bytes but aren't yet done supplying input to the compressor.
```C#
// Allow Intermediate Flushes
public partial struct BrotliEncoder : System.IDisposable
{
...
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten);
public System.Buffers.OperationStatus Flush(System.Span<byte> destination, out int bytesWritten);
public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten);
...
}
// Disallow Intermediate Flushes
public partial struct BrotliEncoder : System.IDisposable
{
...
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten);
public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten);
...
}
```
#### Allow input to Flush/Finalize?
I prefer the simpler Flush/Finalize that don't take input, but the underlying call allows input if we decide that's more usable. If we go that route then we could potentially just condense the API down to one function.
```C#
// Do not allow input to Finalize/Flush
public partial struct BrotliEncoder : System.IDisposable
{
...
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
public System.Buffers.OperationStatus CompressFinal(System.Span<byte> destination, out int bytesWritten) { bytesWritten = default(int); throw null; }
...
}
// Allow input to Finalize/Flush
public partial struct BrotliEncoder : System.IDisposable
{
...
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
public System.Buffers.OperationStatus CompressFinal(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
...
}
// Allow finalization in the Compress method.
public partial struct BrotliEncoder : System.IDisposable
{
...
public System.Buffers.OperationStatus Compress(System.ReadOnlySpan<byte> source, System.Span<byte> destination, out int bytesConsumed, out int bytesWritten, bool isFinished = false) { bytesConsumed = default(int); bytesWritten = default(int); throw null; }
...
}
```
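To make the condensed option concrete, here is a sketch of a caller feeding input in chunks and marking the last one (the helper name is hypothetical; it assumes the `isFinished` overload above and a destination large enough for each step):
```C#
// Hypothetical caller of the condensed single-method shape: the last chunk is
// marked with isFinished instead of calling a separate CompressFinal/Flush.
public static int CompressChunks(ref BrotliEncoder encoder, ReadOnlySpan<byte> input,
                                 Span<byte> destination, int chunkSize)
{
    int totalWritten = 0;
    while (!input.IsEmpty)
    {
        int take = Math.Min(chunkSize, input.Length);
        bool isLast = take == input.Length;

        // (OperationStatus handling omitted for brevity.)
        encoder.Compress(input.Slice(0, take), destination.Slice(totalWritten),
                         out int consumed, out int written, isFinished: isLast);

        input = input.Slice(consumed);
        totalWritten += written;
    }
    return totalWritten;
}
```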
```C#
// Static single-pass compress/decompress
BrotliEncoder.TryCompress(...) vs BrotliEncoder.TryCompressData(...) vs BrotliEncoder.CompressFully(...) vs BrotliEncoder.CompressSingle vs BrotliEncoder.CompressData
// Iterative compress/decompress
BrotliEncoderInstance.Compress vs BrotliEncoderInstance.CompressSegment
```
### Example Usage
```C#
public interface IOutput
{
Span<byte> Buffer { get; }
void Commit(int bytes);
void Resize(int minimumSize);
}
// This code is very naive, but it does illustrate a pipe scenario
public static void Compress_WithState(ReadOnlyMemory<byte>[] inputs, IOutput output)
{
var encoder = new BrotliEncoder();
for(int i=0; i<inputs.Length; i++)
{
var input = inputs[i];
while (!input.IsEmpty)
{
var buffer = output.Buffer;
encoder.Compress(input.Span, buffer, out int bytesConsumed, out int written);
output.Commit(written);
input = input.Slice(bytesConsumed);
}
}
encoder.Flush(output, out int bytesWritten, isFinished: true);
encoder.Dispose();
}
public static void Decompress_WithState(ReadOnlyMemory<byte>[] inputs, IOutput output)
{
var decoder = new BrotliDecoder();
for(int i=0; i<inputs.Length; i++)
{
var input = inputs[i];
while (!decoder.IsFinished() && !input.IsEmpty)
{
var buffer = output.Buffer;
decoder.Decompress(input.Span, buffer, out int bytesConsumed, out int written);
output.Commit(written);
input = input.Slice(bytesConsumed);
}
}
decoder.Dispose();
}
public static void Compress_WithoutState(ReadOnlySpan<byte> input, Span<byte> output)
{
BrotliEncoder.CompressFully(input, output, out int bytesWritten);
}
public static void Decompress_WithoutState(ReadOnlySpan<byte> input, Span<byte> output)
{
BrotliDecoder.DecompressFully(input, output, out int bytesWritten);
}
```
The implementation will be based around the C code provided by Google, which will be inserted into our existing native compression libraries (clrcompression on Windows and System.IO.Compression.Native on Unix). In CoreFX we'll have a managed wrapper that P/Invokes into the native Brotli implementation and provides the above API around it, same as we do for zlib. See https://github.com/dotnet/corefxlab/issues/1673 for a discussion on the pros and cons of a fully managed implementation and my justification for using the native approach (at least for now). Performance testing to come later, with the implementation PR.
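For illustration only, here is a rough sketch of the kind of P/Invoke declaration the managed wrapper could sit on top of, assuming the native library exports the standard Brotli C API one-shot entry point (the library name below is hypothetical, and the actual exports/marshaling in clrcompression / System.IO.Compression.Native may differ):
```C#
using System;
using System.Runtime.InteropServices;

internal static class BrotliNative
{
    // Hypothetical library name; in practice the code ships inside
    // clrcompression (Windows) / System.IO.Compression.Native (Unix).
    private const string LibraryName = "brotli";

    // Standard Brotli C API one-shot compressor:
    // BROTLI_BOOL BrotliEncoderCompress(int quality, int lgwin, BrotliEncoderMode mode,
    //     size_t input_size, const uint8_t* input_buffer,
    //     size_t* encoded_size, uint8_t* encoded_buffer);
    [DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
    internal static extern unsafe int BrotliEncoderCompress(
        int quality, int lgwin, int mode,
        UIntPtr inputSize, byte* inputBuffer,
        ref UIntPtr encodedSize, byte* encodedBuffer);
}
```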
This proposal is an evolution of the CoreFXLab implementation of Brotli.
This is a component of https://github.com/dotnet/corefx/issues/24826
PTAL: @joshfree @KrzysztofCwalina @GrabYourPitchforks @ViktorHofer @stephentoub @terrajobst @ahsonkhan @JeremyKuhne
I don't think the non-streaming samples are illustrating what we are after. They will never go through more than one iteration of the loop. The right sample would be something like the following:
```C#
public interface IOutput
{
Span<byte> Buffer { get; }
void Commit(int bytes);
void Resize(int minimumSize);
}
// This code is very naive, but it does illustrate a pipe scenario
public static void PipeCompress(ReadOnlyMemory<byte>[] inputs, IOutput output)
{
BrotliEncoder encoder;
for(int i=0; i<inputs.Length; i++) {
var input = inputs[i];
while (!input.IsEmpty)
{
var buffer = output.Buffer;
encoder.Compress(input, buffer, out int bytesConsumed, out int written);
output.Commit(written);
input = input.Slice(bytesConsumed);
}
}
encoder.Flush(output, out int bytesWritten, isFinished: true);
}
```
The BrotliResult enum is very similar (if not identical) to the OperationStatus enum in the System.Buffers namespace that we use for our Base64 Encoder and Decoder. Can we use OperationStatus as the returned enum in Brotli as well?
https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/OperationStatus.cs
We should make sure we agree on the enum as part of this API review (leftover from previous API review: https://github.com/dotnet/corefx/issues/22412).
cc @karelz
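For reference, here is a sketch of how an OperationStatus-returning Decompress tends to be consumed (the helper name is hypothetical; it reuses the IOutput sink from the examples above and the instance Decompress signature from the proposal):
```C#
public static void DecompressAll(ReadOnlySpan<byte> source, IOutput output)
{
    var decoder = new BrotliDecoder();
    try
    {
        while (true)
        {
            var status = decoder.Decompress(source, output.Buffer,
                out int bytesConsumed, out int bytesWritten);
            output.Commit(bytesWritten);
            source = source.Slice(bytesConsumed);

            switch (status)
            {
                case OperationStatus.Done:
                    return;                                      // fully decoded
                case OperationStatus.DestinationTooSmall:
                    output.Resize(Math.Max(256, output.Buffer.Length * 2));
                    break;                                       // grow the sink and retry
                case OperationStatus.NeedMoreData:
                    return;                                      // caller supplies the next segment
                case OperationStatus.InvalidData:
                    throw new InvalidDataException("Malformed Brotli data.");
            }
        }
    }
    finally
    {
        decoder.Dispose();
    }
}
```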
@ahsonkhan where was the OperationStatus approved as API? I asked the question on dotnet/runtime#22845 but didn't get answer :)
The BrotliResult enum is very similar (if not identical) to the OperationStatus enum in the System.Buffers namespace that we use for our Base64 Encoder and Decoder. Can we use OperationStatus as the returned enum in Brotli as well?
Oh, yes, I didn't see that OperationStatus was in CoreFX already. It is interchangeable with BrotliStatus. I'll update the proposal.
I don't think the non-streaming samples are illustrating what we are after. They will never go through more than one iteration of the loop. The right sample would be something like the following:
Updated the spec with the more accurate use-case examples.
I'm not convinced of the value of a struct-based encoder / decoder as opposed to a class-based encoder / decoder. At its core, due to the interop, you have a native handle under the covers. These handles need to have definite lifetimes, which means that somebody needs to make sure that handles are closed properly and that they're no longer used after they're closed. It's _very_ difficult to make these guarantees with structs, and I'm afraid that we'll lead developers into a pit of failure if we go this direction. Indeed, this is why SafeHandle and related types exist. I'd personally advocate switching the type to _class_, sacrificing one or two heap allocations per compressor / decompressor instance to avoid these reliability problems.
@GrabYourPitchforks I tend to agree. I'd prefer to see the internal handle represented by a SafeHandle and the type garbage-collectable to ensure proper disposal of the handle in all cases, but that would come at the cost of no longer having the non-allocating benefit of the API. It's definitely the more power-user-centric of the two approaches and in novice hands will often end in the underlying native types not being cleaned up.
That said, I do see the value in being allocation free and that's why I made it a struct. I tried to lessen the potential for erroneous code by making it an IDisposable struct, though it's still on the developer to actually do the disposal themselves.
I'd be interested to hear other people's opinions on this. Historically, using structs in this manner has been a big no-no and is warned against in almost every article I could find, so whether the minor perf gain is worth the cost is questionable. I'll run some tests and see how much of a hit we take if we make it into a class instead.
Even if there is a measurable cost to switching to _class_ (and that would honestly surprise me), callers can work around it by pooling encoder / decoder instances. There's nothing that requires the type be a _struct_ in order to get high performance.
As an aside, I'm seeing that with this and other proposals we're trying to invent concepts analogous to C++'s deterministic destruction and non-copyable types. If this is something we think might be valuable to C# developers, we should propose these features directly, and then we can create types that are properly implemented on top of these features.
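A minimal sketch of the pooling idea mentioned above, assuming a hypothetical class-based BrotliEncoder (a real pool would also need to reset or re-create the native state between uses; none of these names are part of the proposal):
```C#
using System.Collections.Concurrent;

// Illustrative only: rent an encoder instead of allocating one per operation.
public sealed class BrotliEncoderPool
{
    private readonly ConcurrentBag<BrotliEncoder> _encoders = new ConcurrentBag<BrotliEncoder>();

    public BrotliEncoder Rent()
        => _encoders.TryTake(out BrotliEncoder encoder) ? encoder : new BrotliEncoder();

    // Callers return the instance when done; Dispose would only run when the pool is trimmed.
    public void Return(BrotliEncoder encoder) => _encoders.Add(encoder);
}
```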
It doesn't appear to have a significant impact that I could see in my testing which isn't too surprising. I'll update the proposal to be class based, unless someone feels strongly that it should be a struct. @KrzysztofCwalina ?
where was the OperationStatus approved as API? I asked the question on dotnet/runtime#22845 but didn't get answer :)
It was implicitly approved as a placeholder to unblock the Base64 APIs and the final review was deferred, as far as I can recall, until we have more APIs (like the text encoders APIs, or in this case, compression APIs) to get a better understanding of the use cases. That is why I brought it up here since now we have more scenarios to confirm the usefulness/correctness of the enum.
We discussed the struct/class and non-allocating/allocating concern in person and the ending compromise is to leave it as an IDisposable struct, but use a SafeHandle for the state pointer.
Does anyone else have any more feedback before I mark this as ready for review?
cc: @terrajobst
Regarding OperationStatus, something to consider is that we already have an enum with a similar name in System.Net.NetworkInformation - OperationalStatus.
@ahsonkhan, good point. Maybe we should call it OperationResult
FYI: First part of the API review discussion on 2017/12/19 was recorded - see https://youtu.be/ZrT0uOsqQlI?t=6541 (5 min duration)
We reviewed the API and made some minor tweaks:
```C#
public partial class BrotliStream : Stream
{
public BrotliStream(Stream stream, CompressionLevel compressionLevel);
public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen);
public BrotliStream(Stream stream, CompressionMode mode);
public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen);
public Stream BaseStream { get; }
// We don't think we need those for now. Not having aligns the type with DeflateStream/GzipStream
// public BrotliStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen, int bufferSize);
// public BrotliStream(Stream stream, CompressionMode mode, bool leaveOpen, int bufferSize);
// Overrides omitted for clarity
}
// We discussed making these classes and deriving from CriticalFinalizerObject (CFO) ourselves instead of wrapping a SafeHandle
// but we decided against it due to complexity and only minor savings (like when these are boxed)
public struct BrotliDecoder : IDisposable
{
public OperationStatus Decompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesConsumed, out int bytesWritten);
public void Dispose();
public static bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
}
public struct BrotliEncoder : IDisposable
{
public BrotliEncoder(int quality, int window);
public OperationStatus Compress(ReadOnlySpan<byte> source,
Span<byte> destination,
out int bytesConsumed,
out int bytesWritten,
bool isFinalBlock);
public OperationStatus Flush(Span<byte> destination, out int bytesWritten);
public void Dispose();
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static bool TryCompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int quality, int window);
public static int GetMaxCompressedLength(int length);
}
```
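A short sketch of the approved instance API in use (hypothetical helper name; it assumes the destination is at least GetMaxCompressedLength(source.Length) bytes, otherwise the caller would have to grow it when OperationStatus.DestinationTooSmall comes back):
```C#
public static int CompressFinalBlock(ReadOnlySpan<byte> source, Span<byte> destination)
{
    var encoder = new BrotliEncoder(quality: 11, window: 24);
    try
    {
        // A single call marked as the final block; a sufficiently large destination
        // lets the encoder finish in one step.
        OperationStatus status = encoder.Compress(source, destination,
            out int bytesConsumed, out int bytesWritten, isFinalBlock: true);

        if (status != OperationStatus.Done)
            throw new InvalidOperationException($"Unexpected status: {status}");

        return bytesWritten;
    }
    finally
    {
        encoder.Dispose();
    }
}
```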
FYI: Second part of API review discussion on 2017/12/19 was recorded - see https://www.youtube.com/watch?v=IIKqOagRdWA (1h 43min duration) (video is mostly frozen)
Which version of System.IO.Compression has BrotliStream? I ask because I could not find it. @ianhays
@guylando BrotliStream is in its own assembly (System.IO.Compression.Brotli.dll) that is included in framework versions of 2.1 preview1 forward.
For example, runtime.linux-x64.Microsoft.NETCore.App.2.2.0-preview1-26508-01.nupkg has Brotli.
@ianhays what is the minimal NuGet package to reference to get it?
because I don't reference Microsoft.NETCore.App (which from what I understand is a bundle of many packages) and instead I reference specific packages in my projects to save importing unnecessary stuff
which from what I understand is a bundle of many packages
Essentially, yeah. A few versions back it was a bundling of packages or a meta-package with a large dependency list, but now it's a primary shipping vehicle that contains most of the framework. We don't ship Brotli as a standalone package because it's included in the framework package and we don't ship it downlevel either.
cc: @weshaggard
@ianhays Do I understand correctly that it's a best practice to include Microsoft.NETCore.App instead of separate packages? Our projects (3 years old) are from before the existence of Microsoft.NETCore.App, so when upgrading in the past I left them as is and never moved to Microsoft.NETCore.App because it sounded better to leave them as is.
Are you saying that if I move from separate packages to Microsoft.NETCore.App I will not see any negative impact?
And a follow-up question in case we should move to Microsoft.NETCore.App: is there documentation somewhere of the packages included in Microsoft.NETCore.App?
@guylando you will be on a supported & tested combination of packages, which makes it actually less likely for you to hit issues.
And does Microsoft.AspNetCore.App include it also, or only Microsoft.NETCore.App?
thanks!
Are you saying that if I move from separate packages to Microsoft.NETCore.App I will not see any negative impact?
If you are targeting .NET Core then yes that is the correct thing to do. You don't actually need to reference the package directly in your application, the reference you get is controlled by the TargetFramework you are using. You will need to target netcoreapp2.1 to get these new APIs.
And does Microsoft.AspNetCore.App include it also, or only Microsoft.NETCore.App?
Microsoft.AspNetCore.App depends on Microsoft.NETCore.App, so it will be part of the closure.
And is it possible to get Brotli if I target net461?
And is it possible to get Brotli if I target net461?
No, we don't ship it as a library package; it is only available on .NET Core 2.1+.
thanks for the answers!
@weshaggard, @ianhays so Brotli is not netstandard? And never will be? It is only on .NET Core? If I would like to include it in a library that targets netstandard, .NET Core, and full framework, that won't be possible? Could you explain the decision behind that? The corefx-lab version works with netstandard; why doesn't the final (preview) release?
Because of that decision I'm not able to use Brotli in ASP.NET Core targeting full framework.
@weshaggard, @ianhays so Brotli is not netstandard?
That is correct. It is only supported on .NET Core today.
That isn't to say it will never be part of .NET Standard or built as a netstandard library, but there are various technical issues that come up when you ship an OOB (out-of-band) library like this when it exists inbox for a platform like .NET Core or .NET Framework, which is why we don't currently support that scenario. If using this Brotli library on .NET Framework is a scenario you would like to see supported, I suggest filing an issue for it and the owners of the library will take it under consideration.
@weshaggard Thank you for the answer. I'm a bit disappointed because the corefxlab version targets netstandard1.1.
@msmolka I understand your frustration but there are reasons for our choices. Keep in mind that corefxlab is for prototypes and experimenting, so they do whatever is quickest to get the job done, and then once we decide to actually ship a library it moves to corefx and gets designed with all the different contexts in mind. In this case we decided it is best for Brotli to ship as part of the platform so that we can take advantage of it in our lower networking layer, which is also part of the platform. Given that we made it part of the platform, we decided not to ship it as an OOB library because of issues that occur with OOBs (see https://github.com/dotnet/corefx/issues/17522 for a hint of some similar troubles we had with System.Net.Http).
Please do file a separate issue if you'd like to see Brotli supported on .NET Framework and we will consider supporting it.
Hello,
What's wrong with this code:
```C#
public static byte[] CompressBrotli(this byte[] data)
{
var destination = new Span<byte>();
var source = new ReadOnlySpan<byte>(data,0, data.Length);
var result = BrotliEncoder.TryCompress(source, destination, out int bw, 11, 24);
return destination.ToArray();
}
```
The return value is false and destination is empty.
Thanks
@ricardobeckssa The `destination` span isn't big enough (it has `Length == 0`). You need to make sure the destination span has enough space for the compressed content.
@khellang how can I know the required space before compressing? What size should I use?
thanks for your help!
`data.Length` would probably cover it :wink:
@khellang thanks, I did:
```C#
public static byte[] CompressBrotli(this byte[] data)
{
var _b = new byte[data.Length].AsSpan();
var source = new ReadOnlySpan<byte>(data,0, data.Length);
BrotliEncoder.TryCompress(source, _b, out int bw,11,24);
return _b.Slice(0,bw).ToArray();
}
```
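One caveat on sizing: already-compressed or otherwise incompressible input can produce Brotli output slightly larger than the source, in which case a data.Length-sized destination makes TryCompress return false. The GetMaxCompressedLength helper from the approved API gives a guaranteed bound; an illustrative variant of the helper above:
```C#
public static byte[] CompressBrotliSafe(this byte[] data)
{
    // Worst-case output size for the given input length.
    var destination = new byte[BrotliEncoder.GetMaxCompressedLength(data.Length)];
    BrotliEncoder.TryCompress(data, destination, out int bytesWritten, 11, 24);
    return destination.AsSpan(0, bytesWritten).ToArray();
}
```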
FYI, you can abbreviate the following:
```C#
var source = new ReadOnlySpan<byte>(data, 0, data.Length);
```
To the following, relying on the implicit cast:
```C#
ReadOnlySpan<byte> source = data; // or data.AsSpan();
```