We are currently implementing Huffman decoding by looping over a table with a list of encoded values for each length. (See Huffman.cs)
This doesn't seem particularly efficient.
Instead, we should implement Huffman decoding by doing table lookups, using keys of 8 bits or something along those lines.
Note that Huffman encoding from servers seems to be pretty common, so this may have real-world perf impact. (Or maybe not... as always, measuring is the best way to know.)
What were your thoughts regarding how this might work? I took a look at the code, but it wasn't immediately apparent that the current implementation could be improved much. Here are my observations:
The only thing I saw in the code that I thought could be improved was the repeated calculation of codeMax even though that value is essentially a constant for each entry in the decoding table. Expanding the table to be a 3-tuple that included codeMax would avoid that, but the cost there is likely very insignificant.
Mind you, I'm new to the code so I could be way off base here. Looking to see if you could clarify this a bit.
@geoffkizer thoughts?
IMHO one of the best implementations of Huffman en/decoder is available here: https://github.com/Cyan4973/FiniteStateEntropy
It's written in C, but it should be one of the best examples of a Huffman implementation. It ships in a library together with a Finite State Entropy en/decoder built on Asymmetric Numeral Systems (Huffman is a special case of ANS). ANS is a separate story; I plan to submit an API proposal to include zstd compression in .NET Core and will write more then.
The other hash table option would be to generate a massive "rainbow table" like structure where all combinations of bits beyond the significant ones up to your key length are also hashed.
It doesn't need to be a single, massive hash table. It can be a multi-level table. Take the first 8 bits (or 10, or whatever) and use it as the key to the first-level table. This results in either a decoded value and a bit length, or a reference to a second-level table -- if so, take the next 8 bits and index into the second table, etc.
Choosing the best key size for each table involves a tradeoff -- smaller key sizes mean smaller tables but more lookups, and vice-versa. You can tune this as appropriate. With 8 bits, your main table is only 512 bytes (256 entries, 1 byte for decoded value, 1 byte for either bit length or next table id), and the most common values (including all alphanumerics) will be decoded in a single table lookup.
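To make this concrete, here's a minimal sketch of the multi-level lookup; the entry layout and names are hypothetical, not the actual corefx types:
```C#
// Sketch: each entry either decodes a symbol outright or names a deeper table.
struct TableEntry
{
    public byte BitLength;     // > 0: a decoded symbol; 0: descend to NextTable
    public byte DecodedValue;  // valid when BitLength > 0
    public byte NextTable;     // index of the next-level table when BitLength == 0
}

static int Decode(uint data, TableEntry[][] tables, out int decodedBits)
{
    int tableId = 0;
    for (int consumed = 0; consumed <= 24; consumed += 8)
    {
        uint key = (data >> (24 - consumed)) & 0xFF;  // next 8 bits as the key
        TableEntry entry = tables[tableId][key];
        if (entry.BitLength > 0)
        {
            decodedBits = entry.BitLength;  // total length, including earlier levels
            return entry.DecodedValue;
        }
        tableId = entry.NextTable;  // these 8 bits only selected a subtable
    }
    decodedBits = 0;  // no match within 32 bits: invalid encoding
    return -1;
}
```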
I explored that approach briefly as well, but it just seemed like we were back to looping again. A mock up to see if there is any real perf differential would be the way to go here. I might be able to take a crack at that next week.
There's no looping if the value is encoded in 8 bits or less. (Or whatever key size you choose.) And this covers the most common values, including all alphanumerics and common punctuation.
Agreed, encoded values up to the chosen key size would be found immediately. Couple more questions:
Is there a compiled list of common request headers you guys have that can be used for perf testing? If not, I can put together my own.
Not really.
I think a collection initializer is fine. If necessary, it's easy to switch this to a Lazy<T> later.
The codeMax optimization that @jbhensley suggested has an immediate ~10% benefit in micro-benchmark testing:
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------- |---------:|---------:|---------:|-------:|---------:|
New | 86.32 ns | 1.864 ns | 1.915 ns | 0.89 | 0.05 |
Core | 97.49 ns | 1.950 ns | 5.069 ns | 1.00 | 0.00 |
Here's the core change, note the manually-computed additional deltaLength member:
```C#
private static readonly (int codeLength, int deltaLength, int[] codes)[] s_decodingTable = new[]
{
(05, 0, new[] { 48, 49, 50, 97, 99, 101, 105, 111, 115, 116 }), // deltaLength must be 0 for related optimizations to work
(06, 1, new[] { 32, 37, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57, 61, 65, 95, 98, 100, 102, 103, 104, 108, 109, 110, 112, 114, 117 }),
(07, 1, new[] { 58, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 106, 107, 113, 118, 119, 120, 121, 122 }),
(08, 1, new[] { 38, 42, 44, 59, 88, 90 }),
(10, 2, new[] { 33, 34, 40, 41, 63 }),
(11, 1, new[] { 39, 43, 124 }),
(12, 1, new[] { 35, 62 }),
(13, 1, new[] { 0, 36, 64, 91, 93, 126 }),
(14, 1, new[] { 94, 125 }),
(15, 1, new[] { 60, 96, 123 }),
(19, 4, new[] { 92, 195, 208 }),
(20, 1, new[] { 128, 130, 131, 162, 184, 194, 224, 226 }),
(21, 1, new[] { 153, 161, 167, 172, 176, 177, 179, 209, 216, 217, 227, 229, 230 }),
(22, 1, new[] { 129, 132, 133, 134, 136, 146, 154, 156, 160, 163, 164, 169, 170, 173, 178, 181, 185, 186, 187, 189, 190, 196, 198, 228, 232, 233 }),
(23, 1, new[] { 1, 135, 137, 138, 139, 140, 141, 143, 147, 149, 150, 151, 152, 155, 157, 158, 165, 166, 168, 174, 175, 180, 182, 183, 188, 191, 197, 231, 239 }),
(24, 1, new[] { 9, 142, 144, 145, 148, 159, 171, 206, 215, 225, 236, 237 }),
(25, 1, new[] { 199, 207, 234, 235 }),
(26, 1, new[] { 192, 193, 200, 201, 202, 205, 210, 213, 218, 219, 238, 240, 242, 243, 255 }),
(27, 1, new[] { 203, 204, 211, 212, 214, 221, 222, 223, 241, 244, 245, 246, 247, 248, 250, 251, 252, 253, 254 }),
(28, 1, new[] { 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 220, 249 }),
(30, 2, new[] { 10, 13, 22, 256 })
};
```
@grant-d What code are you using to test this?
Code optimization, without changing the algorithm, gets us to ~23% improvement in the micro-bench.
(This needs real-world testing to get a measure of any true improvement)
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
---------- |---------:|----------:|----------:|-------:|---------:|----------:|
Optimized | 71.61 ns | 0.8184 ns | 0.7656 ns | 0.77 | 0.02 | 0 B |
Legacy | 93.17 ns | 1.7961 ns | 2.3977 ns | 1.00 | 0.00 | 0 B |
For example, changed this:
```C#
for (var i = 0; i < s_decodingTable.Length && s_decodingTable[i].codeLength <= validBits; i++)
{
(var codeLength, var codes) = s_decodingTable[i];
if (i > 0)
{
codeMax <<= codeLength - s_decodingTable[i - 1].codeLength;
}
codeMax += codes.Length;
```
To this:
```C#
for (var i = 0; i < s_decodingTable.Length; i++)
{
// deltaLength is precomputed codeMax delta
var (codeLength, deltaLength, codes) = s_decodingTable[i];
// Move check out of for-loop to leverage cached value for codeLength
if (codeLength > validBits)
break;
// Mitigate the if (i > 0) branch by ensuring s_decodingTable[0].deltaLength==0
codeMax = (codeMax << deltaLength) + codes.Length;
```
@geoffkizer see HuffmanBench.cs and related units in this PR
Another example.
This:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
Can be optimized as follows, since if the first src.Length branch is false, then there's no way the 2nd & 3rd branches can be true:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
if (i + 1 < src.Length)
{
next |= (uint)(src[i + 1] << 16 + lastDecodedBits);
if (i + 2 < src.Length)
{
next |= (uint)(src[i + 2] << 8 + lastDecodedBits);
if (i + 3 < src.Length)
next |= (uint)(src[i + 3] << lastDecodedBits);
}
}
```
I have code for using the multi-level table and am currently working on cleaning it up a bit and testing. Decoding becomes simple:
```C#
public static int Decode(uint data, int validBits, out int decodedBits)
{
int byteNumber = 3; // grab the most significant byte
uint virtualDictionaryMask = 0; // used to key into different "virtual" dictionaries
DecodingTableEntry entry;
do
{
// extract the working byte
uint workingByte = data >> (byteNumber * 8) & 0xFF;
// apply virtual dictionary bitmask
workingByte |= virtualDictionaryMask;
// key into the dictionary
if (!s_decodingDictionary.TryGetValue(workingByte, out entry))
break; // we should either get an entry or bitmask for the next virtual dictionary. if we get neither then
// the bit pattern is not consistent with any entry we have
// if we get a length back then we have found the decoded value
if (entry.BitLength > 0)
{
if (entry.BitLength > validBits)
break; // we only found a value by incorporating bits beyond the valid remaining length of the data stream
decodedBits = entry.BitLength;
return (int)entry.DecodedValue;
}
// otherwise, we have found a mask that lets us key into the next virtual dictionary
else
{
virtualDictionaryMask = entry.DecodedValue;
byteNumber--;
}
} while (entry.BitLength == 0);
// no luck. signal to caller that we could not decode
decodedBits = 0;
return -1;
}
```
Building the table is a bit ugly and will likely need to be done with a `Lazy<T>` that calls a method to construct it, since the total size is 3840 entries. We could still use a collection initializer and surround it with a `#region` I suppose. Either way, I'm a little concerned about the memory footprint.
Since there would be 15 tables in total, I took the approach of using bitmasks on the key value of a single table to create several "virtual" ones within it. The "root" table has no mask, so you initially key in with the value you want to decode. If you get a bit length back then you have the decoded value. Otherwise, you have a bitmask to apply to your next byte when keying in again. The bitmask represents the next "virtual" table.
You have more total entries (3840) than necessary.
The fourth-level tables only need 6 bits to index them (max encoded length is 30 bits). That by itself cuts the total number of entries in half, if my math is correct.
Additionally, at least some intermediate tables don't need all 8 bits either. For example, all encodings that start with 11111110 are exactly 10 bits in length, so the second level table here can be exactly 4 entries instead of the 256 you'd need if you indexed with a full 8 bits.
Dealing with all this will complicate the code and isn't necessary to get decent perf results.
True, the storage could be optimized. Right now, however, the perf results are not looking very promising. Oddly enough, my test with a pre-computed codeMax is worse, which I cannot explain.
Test set consists of 792 lines of actual HTTP request header data. The file is read and Huffman encoded into bytes as part of the setup, and then each decoding method gets its own crack at it, while verifying that correct values are being decoded. I kind of expected something more dramatic than this.
| Method | Mean | Error | StdDev | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
|----------------------------------------- |--------------:|--------------:|--------------:|---------:|---------:|--------:|--------:|----------:|
| BuildHuffmanDictionary | 112,665.66 ns | 2,241.0968 ns | 2,096.3232 ns | 6,667.28 | 119.88 | 31.8604 | 10.6201 | 201624 B |
| ProcessHeadersWithLoop | 16.90 ns | 0.0089 ns | 0.0074 ns | 1.00 | 0.00 | - | - | 0 B |
| ProcessHeadersWithLoopAndComputedCodeMax | 17.39 ns | 0.3766 ns | 0.5637 ns | 1.03 | 0.03 | - | - | 0 B |
| ProcessHeadersWithDictionary | 16.88 ns | 0.3689 ns | 0.3451 ns | 1.00 | 0.02 | - | - | 0 B |
For reference, my version of the pre-computed codeMax treats it as a complete constant:
```C#
public static readonly (int codeLength, int codeMax, int[] codes)[] s_decodingTable2 = new[]
{
(5, 10, new[] { 48, 49, 50, 97, 99, 101, 105, 111, 115, 116 }),
(6, 46, new[] { 32, 37, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57, 61, 65, 95, 98, 100, 102, 103, 104, 108, 109, 110, 112, 114, 117 }),
(7, 124, new[] { 58, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 106, 107, 113, 118, 119, 120, 121, 122 }),
(8, 254, new[] { 38, 42, 44, 59, 88, 90 }),
(10, 1021, new[] { 33, 34, 40, 41, 63 }),
(11, 2045, new[] { 39, 43, 124 }),
(12, 4092, new[] { 35, 62 }),
(13, 8190, new[] { 0, 36, 64, 91, 93, 126 }),
(14, 16382, new[] { 94, 125 }),
(15, 32767, new[] { 60, 96, 123 }),
(19, 524275, new[] { 92, 195, 208 }),
(20, 1048558, new[] { 128, 130, 131, 162, 184, 194, 224, 226 }),
(21, 2097129, new[] { 153, 161, 167, 172, 176, 177, 179, 209, 216, 217, 227, 229, 230 }),
(22, 4194284, new[] { 129, 132, 133, 134, 136, 146, 154, 156, 160, 163, 164, 169, 170, 173, 178, 181, 185, 186, 187, 189, 190, 196, 198, 228, 232, 233 }),
(23, 8388597, new[] { 1, 135, 137, 138, 139, 140, 141, 143, 147, 149, 150, 151, 152, 155, 157, 158, 165, 166, 168, 174, 175, 180, 182, 183, 188, 191, 197, 231, 239 }),
(24, 16777206, new[] { 9, 142, 144, 145, 148, 159, 171, 206, 215, 225, 236, 237 }),
(25, 33554416, new[] { 199, 207, 234, 235 }),
(26, 67108847, new[] { 192, 193, 200, 201, 202, 205, 210, 213, 218, 219, 238, 240, 242, 243, 255 }),
(27, 134217713, new[] { 203, 204, 211, 212, 214, 221, 222, 223, 241, 244, 245, 246, 247, 248, 250, 251, 252, 253, 254 }),
(28, 268435455, new[] { 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 220, 249 }),
(30, 1073741824, new[] { 10, 13, 22, 256 })
};
```
And the loop decoder:
```C#
public static int DecodeWithLoopAndCodeMax(uint data, int validBits, out int decodedBits)
{
for (int i = 0; i < s_decodingTable2.Length && s_decodingTable2[i].codeLength <= validBits; i++)
{
(int codeLength, int codeMax, int[] codes) = s_decodingTable2[i];
int mask = int.MinValue >> (codeLength - 1);
long masked = (data & mask) >> (32 - codeLength);
if (masked < codeMax)
{
decodedBits = codeLength;
return codes[codes.Length - (codeMax - masked)];
}
}
decodedBits = 0;
return -1;
}
```
Can you link to the full code?
For one thing, you really should run more iterations. All your measured times are < 20ns, which is super small. I'd run enough iterations to have times above 1s, if possible.
I'll get the test project up and link, but that will probably be tomorrow.
Good idea @jbhensley, amongst other tricks that codeMax calc got it ~35% faster than the original.
I am also interested in your test cases, mine are trivial.
Does ProcessHeadersWithLoop in your tests reflect the original, unchanged, code?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
---------------------- |---------:|----------:|---------:|-------:|---------:|----------:|
OptimizedCodeAndTable | 49.29 ns | 0.9742 ns | 1.628 ns | 0.65 | 0.02 | 0 B |
OriginalCode | 75.45 ns | 1.1256 ns | 1.053 ns | 1.00 | 0.00 | 0 B |
I'm new to corefx, so I'm struggling a little with the benchmarking. I'll take any pointers you have in regards to that or the code in general.
Also, I trimmed duplicate line items out of my headers.txt file so it's a bit smaller now.
@jbhensley I have created a fork of your project, added my code to it, then cleaned it all up a bit. I have changed the layout of your benchmark, since the enum approach did not allow you to optimize the outer loop. So each algorithm is just a duplicated Huffman<foo>.cs file in its entirety now. I have also added your headers.txt data to all the benchmarks.
Here's the latest results (see the README for details on algorithms).
You are right that the IDictionary approach is slower, I think it's probably due to the overhead of hashing and branching. Perhaps a hard-coded switch for a small subset of high-frequency values might work?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |-----------:|---------:|---------:|-------:|---------:|----------:|
DictOpt | 1,281.3 ns | 24.27 ns | 22.70 ns | 1.41 | 0.03 | 0 B |
Dict | 1,311.6 ns | 24.41 ns | 22.83 ns | 1.45 | 0.03 | 0 B |
OrigOpt | 822.8 ns | 16.08 ns | 15.79 ns | 0.91 | 0.02 | 0 B |
Orig | 907.3 ns | 11.78 ns | 10.44 ns | 1.00 | 0.00 | 0 B |
Thanks. I'll take a look at what you've done with this.
My benchmarking probably wasn't as valid, but I was pretty much coming up with the same result. The looping implementation (either original or with codeMax pre-calculated) is able to return the top 10 most frequently used values on the first pass through the loop with only a few bit shift operations and numeric calcs/comparisons. The second pass gets you 26 more. I'm betting that hashing and keying isn't able to keep up with how quickly the loop is able to return the first several levels of most frequently used values. Maybe dictionary is faster for the less frequently used where the loop has to make more iterations, but it's not able to gain enough on those to make up for what it loses on the others.
Thanks for posting the code.
Don't use a dictionary for the table lookup. Just use an array. It's much cheaper.
I think that might be the ticket. I'll give it a shot tomorrow. Should simplify the code a bit as well.
Also -- In case you're not already doing this, you should do a warmup run before the real run, in the same process. Otherwise you may be including things like JIT costs in the run.
And I still think you should do many more iterations. The times above are in the 1ms range. Do 1000x more iterations and get it in the 1s range.
Or maybe you are already doing lots of iterations? I'm not familiar with the benchmark framework here, so not quite sure how it's executing the code.
Looks like the framework does all of the warmup for you. I tried giving it a large iteration count and it appears to be running each benchmark method more times, but the average runtime of any given iteration is still small. I could probably bring the numbers up if I added iterations in the method itself with a for loop instead of telling the framework to run it more.
Unrelated thought:
The fact that the per- symbol Decode method is a separate method isn't ideal. It's only called one place and it's probably large enough that the JIT is not inlining it. MethodImplOptions.AggressiveInlining might help here. Eventually we should probably just fold this into the main Decode routine.
Re this code that @grant-d mentioned above:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
I think we can improve this. We are re-reading all 4 bytes from the source buffer each time. We shouldn't need to do this. We should just shift the current value (in next) by the number of decoded bits, and then read byte by byte from the source buffer into next until we've got at least 30 bits.
So initially, we would read the first 4 bytes into next, and set validBits = 32. Then at the end of each loop iteration, we shift next by decodedBits, subtract decodedBits from validBits, and read from the source buffer until validBits is >= 30.
This gets tricky because the smallest encoded value is 5 bits, so it's possible that decodedBits is 5 and validBits is 32, meaning validBits becomes 27 and we need to read another byte to get it over 30. Unfortunately this means we need more than 32 bits. Changing next from a uint to a ulong is a quick fix for this, and shouldn't actually matter in practice on a 64-bit machine.
In a table-based lookup scheme, we don't actually need validBits to be >= 30; we just need it to be >= whatever the next table lookup size is (commonly 8). So that would simplify this as well.
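A rough sketch of that refill loop, assuming the ulong fix just mentioned; Decode here stands in for the per-symbol Decode(uint, int, out int), and Emit is a hypothetical output step:
```C#
static void DecodeAll(byte[] src)
{
    ulong next = 0;      // bits are kept left-aligned in the top of the 64
    int validBits = 0;
    int i = 0;
    while (true)
    {
        // top up: read whole bytes until we hold at least 30 valid bits
        while (validBits < 30 && i < src.Length)
        {
            next |= (ulong)src[i++] << (56 - validBits);
            validBits += 8;
        }
        if (validBits == 0)
            break;  // input fully consumed
        int decoded = Decode((uint)(next >> 32), validBits, out int decodedBits);
        if (decoded < 0)
            break;  // real code must distinguish trailing padding from bad input
        Emit(decoded);
        next <<= decodedBits;      // drop the consumed bits
        validBits -= decodedBits;
    }
}
```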
Actually, looking at this code, I don't quite understand how it works in certain cases.
Imagine that lastDecodedBits is 7 after the last loop iteration. Then we are only actually going to read (32-7)=25 bits from the source buffer. If the next encoded value has more than 25 bits, it seems like this logic is broken. Am I missing something here?
This points out that we need much better tests for this code :)
We actually don't currently have any tests. At a minimum we should steal the ASP.NET tests. And probably add a bunch of new cases as well. This is all pretty tricky logic and we need thorough tests to ensure we're doing the right thing here.
We'll need these tests before we can check in any perf improvements here.
On my fork linked above I already included the ASP/Kestrel units. I tried inlining the second method but it made little difference, I will have another look. The buffered read is worth trying too
It might not matter much. Or it might already be inlined. A profile would tell for sure.
On my PR linked above I already included the ASP/Kestrel units.
I saw that. We need to add these to corefx.
There is an issue with lastDecodedBits as I described above.
Fixed in dotnet/corefx#32043. I also added a bunch of tests in that PR. Clearly the existing ones weren't sufficient...
@grant-d and I have been messing about with this, and have arrived at a completely different algorithm that should be around O(N) (where N = bits). It's probably at the limit of optimization unless a vectorized algorithm is used - which is likely overkill.
| Method | Mean | Error | StdDev | Scaled | Allocated |
|-------- |---------:|---------:|---------:|-------:|----------:|
| Jump | 606.3 ns | 2.460 ns | 1.920 ns | 0.73 | 0 B |
| OrigOpt | 755.7 ns | 6.260 ns | 4.887 ns | 0.91 | 0 B |
| Orig | 833.2 ns | 2.639 ns | 2.203 ns | 1.00 | 0 B |
The idea is to make a binary tree, conceptually this:
```c#
class Node {
public Node Zero;
public Node One;
public byte? ZeroByte;
public byte? OneByte;
}
```
You traverse the tree, feeding it one bit at a time until either ZeroByte or OneByte contain a value - at which point you emit the byte and reset to the root of the tree. The actual implementation compresses the tree into an array (as binary trees can be represented in flat arrays) of ushort tuples of length two (zero, one). If the MSB of the ushort is set it signifies a terminal/byte (terminate, emit and return to root); otherwise, it indicates the next index in the array to visit.
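Here's a minimal sketch of that walk over the flattened tree; the two-dimensional jump array is illustrative of the idea rather than the exact packed layout:
```C#
// jump[state, bit] is either the next state index, or (0x8000 | symbol)
// when the MSB marks a terminal.
static int DecodeBits(byte[] src, byte[] dst, ushort[,] jump)
{
    int state = 0, written = 0;
    for (int i = 0; i < src.Length; i++)
    {
        for (int bit = 7; bit >= 0; bit--)  // feed one bit at a time, MSB first
        {
            ushort next = jump[state, (src[i] >> bit) & 1];
            if ((next & 0x8000) != 0)       // terminal: emit and reset to root
            {
                dst[written++] = (byte)next;
                state = 0;
            }
            else
            {
                state = next;
            }
        }
    }
    // real code must additionally validate padding (<= 7 bits, all ones) and EOS
    return written;
}
```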
Interesting. I still suspect the table lookup approach will be faster, though. It's similar to your algorithm, except that you basically feed it a byte at a time instead of a single bit.
I included the dictionary/lookup table in the benchmark; it (the Dict rows below) takes longer than the original:
| Method | Mean | Error | StdDev | Median | Scaled | ScaledSD | Allocated |
|-------- |-----------:|---------:|---------:|-----------:|-------:|---------:|----------:|
| Jump | 759.1 ns | 15.12 ns | 36.81 ns | 744.6 ns | 0.70 | 0.03 | 0 B |
| DictOpt | 1,439.8 ns | 28.75 ns | 42.15 ns | 1,423.1 ns | 1.32 | 0.04 | 0 B |
| Dict | 1,426.7 ns | 18.98 ns | 15.85 ns | 1,425.6 ns | 1.31 | 0.02 | 0 B |
| OrigOpt | 921.9 ns | 18.28 ns | 41.26 ns | 908.7 ns | 0.85 | 0.04 | 0 B |
| Orig | 1,087.5 ns | 18.37 ns | 15.34 ns | 1,083.4 ns | 1.00 | 0.00 | 0 B |
The problem is that the dictionary is hit (including negative tests) for every bit in a symbol after the 4th (5th?) bit. The DFA/jump mechanism avoids this - every test in the tree is a positive test unless an invalid/terminal symbol is encountered.
@jbhensley unless you can find a faster algorithm, feel free to take the jump table and get your first netfx PR.
I just added the array implementation to the test project. In particular this commit.
Benchmark puts it very close to jump, but slightly behind.
| Method | Mean | Error | StdDev | Median | Scaled | ScaledSD | Allocated |
|-------- |-----------:|-----------:|-----------:|-----------:|-------:|---------:|----------:|
| Jump | 617.8 ns | 9.2048 ns | 8.1598 ns | 613.9 ns | 0.65 | 0.03 | 0 B |
| OrigOpt | 796.7 ns | 5.5610 ns | 4.9297 ns | 798.0 ns | 0.84 | 0.03 | 0 B |
| DictOpt | 1,174.0 ns | 1.4705 ns | 1.3035 ns | 1,173.4 ns | 1.24 | 0.05 | 0 B |
| Dict | 1,213.7 ns | 24.2705 ns | 43.7647 ns | 1,207.8 ns | 1.28 | 0.07 | 0 B |
| Orig | 950.6 ns | 18.8892 ns | 37.7237 ns | 925.2 ns | 1.00 | 0.00 | 0 B |
| Array | 650.4 ns | 0.5128 ns | 0.4546 ns | 650.5 ns | 0.69 | 0.03 | 0 B |
I haven't touched the overload public static int Decode(byte[] src, int offset, int count, byte[] dst) yet. Looking at that now. The array method processes one byte at a time, so pushing 4 bytes together with
```C#
uint next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
and then splitting them back up again seems like extra work. I'll see if the array implementation can be optimized by refactoring the calling overload.
@jbhensley sent you a PR with microoptimizations, Array is now _way_ faster - nicely done!
I'm not sure if the decode method with ints is used at all (it is public, so I assumed it was) - the jump mechanism ignores it completely for the byte array overload. Maybe leave it as is and look at the byte array overload independently.
Nice job. With some optimizations on Array I get the following results:
Method | Mean | Error | StdDev | Scaled | Allocated |
-------- |---------:|----------:|----------:|-------:|----------:|
Jump | 605.8 ns | 4.494 ns | 3.752 ns | 0.68 | 0 B |
OrigOpt | 739.6 ns | 10.508 ns | 9.830 ns | 0.84 | 0 B |
Orig | 884.4 ns | 12.059 ns | 10.070 ns | 1.00 | 0 B |
Array | 562.3 ns | 4.022 ns | 3.565 ns | 0.64 | 0 B |
Nice. I'll take a look at your changes.
Correction, even better :-)
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Jump | 643.3 ns | 12.645 ns | 12.986 ns | 0.72 | 0.02 | 0 B |
OrigOpt | 739.5 ns | 15.968 ns | 15.683 ns | 0.83 | 0.02 | 0 B |
Orig | 889.0 ns | 8.995 ns | 8.414 ns | 1.00 | 0.00 | 0 B |
Array | 498.6 ns | 3.865 ns | 3.615 ns | 0.56 | 0.01 | 0 B |
My changes are in my fork. Please pull my latest (if you want), then I can start working on yours directly.
Nice work. Great to see the results here!
Re this code:
```C#
int value = s_decodingArray[arrayIndex, workingByte];
// if the value is positive then we have a pointer into the encoding table
if (value > -1)
{
var entry = s_encodingTable[value];
if (entry.bitLength > validBits)
break; // we only found a value by incorporating bits beyond the valid remaining length of the data stream
decodedBits = s_encodingTable[value].bitLength;
return value; // the index is also the value
}
```
You don't need to use s_encodingTable here at all. Just put the bitLength into the decoding table and you can just do one lookup on it.
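For example (hypothetical packing; the unified Decode shown later in the thread does essentially this):
```C#
// One lookup: low byte carries the decoded value, second byte the bit length.
int packed = s_decodingArray[arrayIndex, workingByte];
int bitLength = packed >> 8;  // length in the second LSB
int value = packed & 0xFF;    // value in the LSB
```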
Good point. I'll make that change.
I still want to take a crack at integrating public static int Decode(byte[] src, int offset, int count, byte[] dst) and public static int Decode(uint data, int validBits, out int decodedBits) or at least rewiring the former to take advantage of how the latter is now processing.
Re the construction of next on every loop iteration: I agree, there's probably a win here. See my comment above for suggestions: https://github.com/dotnet/corefx/issues/31751#issuecomment-417535191
@grant-d I think I bungled the merge from your fork since I had pulled @jcdickinson's changes earlier and that created conflicts with yours. Accidentally pushed to you while trying to resolve. I assume you can just roll that back. Sorry.
@grant-d Nevermind, I got it worked out now. Sorry for the confusion.
@jbhensley I fixed a bug in my changes to Array - it wasn't passing all units.
Also a couple more optimizations on your code.
Fixes pushed to my fork.
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Jump | 619.8 ns | 12.096 ns | 19.185 ns | 0.71 | 0.02 | 0 B |
OrigOpt | 736.1 ns | 10.946 ns | 9.141 ns | 0.84 | 0.01 | 0 B |
Orig | 876.1 ns | 3.694 ns | 3.275 ns | 1.00 | 0.00 | 0 B |
Array | 472.2 ns | 4.820 ns | 4.025 ns | 0.54 | 0.00 | 0 B |
@geoffkizer @jbhensley
You don't need to use s_encodingTable here at all. Just put the bitLength into the decoding table and you can just do one lookup on it.
I tried that, it somehow ended up being slower.
@jbhensley I have submitted a PR to your repo.
Perf is now 0.47, and all units pass.
Awesome. I'm working on refactoring both Decode overloads into one, but will probably have to pick it back up after the holiday weekend.
I tried that, it somehow ended up being slower.
I'm surprised by that. Can you post the code?
I'm enjoying following this but I'm curious @geoffkizer whether you expect time spent in this code to be material to real scenarios?
@geoffkizer In addition to your comments regarding this section:
```C#
while (i < count)
{
uint next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
There's another issue to address. If the caller passes in a count less than the source buffer size, this code may read from bytes that it's not technically supposed to. If the caller passes in some arbitrarily large buffer with a count of 2, I assume we should honor that and not load from bytes i + 2 and i + 3.
@danmosemsft I can't speak for @geoffkizer, but our efforts appear to be turning up flaws in the current implementation logic, aside from the perf aspect. See my immediately preceding comment
@danmosemsft There's some reason to believe it could matter, but I really don't know if it does matter in real scenarios. Our HTTP2 implementation is not ready for real perf analysis.
@jbhensley
If the caller passes in a count less than the source buffer size, this code may read from bytes that it's not technically supposed to.
Yeah, good catch! It seems like it would be better to use Span here. That would both simplify the code a bit and enforce the bounds.
Ok, I have implemented a unified Decode that integrates both overloads. Check it out in the test project here.
I believe that I have eliminated all of the issues we were discussing. My initial objective was creating something that worked correctly and was easy (as possible) to understand. After that, I tried to eliminate a few variables and clean up. Also tried to include enough comments to help the next guy looking at it because this can be a bit difficult to follow.
The decoding array is being built with a Lazy<T> right now, but that can be easily changed to a collection initializer.
Also, I added a test that loops 1 million times. Each pass, it generates an array of bytes of random length up to 255. It populates the bytes with random values, encodes them, decodes them and verifies that you get back what you put in. Figured that might help us turn up oddball combinations/edge cases.
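Roughly like this, as a sketch with a fixed seed; HuffmanEncode is a hypothetical test helper and the asserts are xUnit:
```C#
// Round-trip fuzz test: random input -> encode -> decode -> compare.
var rng = new Random(12345);  // fixed seed keeps failures reproducible
for (int iteration = 0; iteration < 1_000_000; iteration++)
{
    var input = new byte[rng.Next(1, 256)];  // random length up to 255
    rng.NextBytes(input);
    byte[] encoded = HuffmanEncode(input);   // hypothetical helper
    var output = new byte[input.Length];
    int decodedLength = Huffman.Decode(encoded, 0, encoded.Length, output);
    Assert.Equal(input.Length, decodedLength);
    Assert.Equal(input, output);
}
```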
Tests still needed to ensure it handles invalid inputs correctly.
@geoffkizer Your thoughts on the rewrite and how far we should go trying to micro-optimize this? In the trade-off between optimization and readability/maintainability, I tend to err on the side of maintainability unless there's a proven reason to do otherwise. Also, is there any concern about the amount of memory we're using for the decoding array? Maybe put it up as a formal PR to enlist more feedback?
@jbhensley can you maybe make that a ReadOnlySpan<byte> instead - it's more intentional.
Also, maybe use short[,] instead of int[,] - gets us down to ~7.5k runtime overhead (still a concern)
Can you remove the Lazy property (s_decodingArray) entirely and evaluate it just outside the loop in both method bodies instead.
```C#
var decodingArray = s_decodingArrayLoader.Value; // <-- Cache Lazy.Value
var srcSpan = new ReadOnlySpan<byte>(src, offset, count); // <--- Note RoS
...
while (sourceIndex < srcSpan.Length)
{
...
var decodedValue = decodingArray[arrayIndex, workingByte]; // <--- Note cached Lazy.Value
```
The latter changes get us down to 0.36 from 0.43, which is close to a 3x improvement!
Now:
Method | Mean | Error | StdDev | Scaled | Allocated |
------- |---------:|---------:|---------:|-------:|----------:|
Orig | 904.3 ns | 7.200 ns | 6.012 ns | 1.00 | 0 B |
Array | 322.0 ns | 2.072 ns | 1.837 ns | 0.36 | 0 B |
Was:
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Orig | 880.9 ns | 12.820 ns | 11.365 ns | 1.00 | 0.00 | 0 B |
Array | 378.2 ns | 2.610 ns | 2.441 ns | 0.43 | 0.01 | 0 B |
Awesome work @grant-d. Big gains for simple changes. I'll incorporate this and update.
No problem. Another small change.
```C#
for (int i = 0; i < 4; i++)
{
uint workingByte = data >> (8 * (3 - i)) & 0xFF; // 24, 16, 8, 0
```
can be expressed more succinctly (& without the mult) as
```C#
for (int i = 24; i >= 0; i -= 8) // 24, 16, 8, 0 (i must be int: a byte can never go negative, so the loop would not terminate)
{
uint workingByte = data >> i & 0xFF;
```
Similar change on the other overload's loop.
Your thoughts on the rewrite and how far we should go trying to micro-optimize this? In the trade-off between optimization and readability/maintainability, I tend to err on the side of maintainability unless there's a proven reason to do otherwise.
Readability and maintainability are critical.
As mentioned above, we don't really know how much of an impact Huffman decoding perf has in real world scenarios. As such, micro-optimization doesn't make sense right now. The focus should be on changes that make the code obviously better (for perf or correctness) without introducing additional complexity.
We can always revisit and micro-optimize later, if we get data that shows it's worth it.
@geoffkizer what about the 7.5k static runtime table size? It seems like a lot for a deep internal component.
As an aside, it's debatable whether maintainability is affected by unrolling here. This code:
```C#
for (var i = 0; i < 4; i++)
{
<stuff>
}
```
becomes this:
```C#
for (var i = 0; i < 4; i++)
{
Refactored(i, ...);
}
[AggressiveInline]
private static Refactored(int i, …) => <stuff>;
```
becomes this:
```C#
Refactored(0, ...);
Refactored(1, ...);
Refactored(2, ...);
Refactored(3, ...);
[AggressiveInline]
private static Refactored(int i, …) => <stuff>;
```
That being said, I do agree that it's premature to do this right now.
Just to be clear, the Decode method as it currently stands is below in full. There is only one now, although a separate overload still exists in the test project.
I also share the concern regarding memory consumption. We can cut that in half by changing the decoding table to a byte array (currently is a short array), but that requires a separate lookup from the encoding table to get the bit length since byte is not big enough to carry both the decoded value and bit length in one.
Other than that, it feels like we are at the end of this. Anything else or should we offer up a PR?
```C#
public static int Decode(byte[] src, int offset, int count, byte[] dst)
{
// input validation
if (src == null || count == 0)
return 0;
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (offset >= src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if (count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
if (dst == null)
throw new ArgumentNullException(nameof(dst));
// let's narrow things down to just the part of the source buffer that we've been asked to decode
var sourceSpan = new ReadOnlySpan<byte>(src, offset, count);
int sourceIndex = 0;
int destinationIndex = 0;
int bitOffset = 0; // symbol bit patterns do not necessarily align to byte boundaries. need to keep track of our offset within a byte
var decodingArray = s_decodingArrayLoader.Value;
while (sourceIndex < sourceSpan.Length)
{
int arrayIndex = 0; // index into the decoding array
int decodedBits = 0;
// decoding a symbol will take a maximum of four bytes (longest symbol is represented by a 30 bit pattern)
for (int i = 0; i <= 24; i += 8)
{
// grab the next byte and push it over to make room for bits we have to borrow
byte workingByte = (byte)(sourceSpan[sourceIndex] << bitOffset);
// borrow some bits if we need to
if (bitOffset > 0)
workingByte |= (byte)((sourceIndex < sourceSpan.Length - 1 ? sourceSpan[sourceIndex + 1] : 0) >> (8 - bitOffset));
// key into array
int decodedValue = decodingArray[arrayIndex, workingByte];
// negative values are a pointer to the next decoding array
if (decodedValue < 0)
{
// move to the next source byte and prepare to key in again
sourceIndex++;
arrayIndex = -decodedValue;
}
else
// we have decoded a character
{
decodedBits = decodedValue >> 8; // length is in the second LSB
decodedValue &= 0xFF; // value is in the LSB
// Destination is too small.
if (destinationIndex == dst.Length)
throw new HuffmanDecodingException();
// store the decoded value and increment destinationIndex
dst[destinationIndex++] = (byte)decodedValue;
// if we decoded more bits than we are borrowing then advance to next byte
if (decodedBits - i >= 8 - bitOffset)
sourceIndex++;
// calculate and re-align bitOffset
bitOffset += decodedBits;
bitOffset &= 7; // same as modulo 8
break;
}
}
// we checked four bytes and did not come up with a valid bit pattern
if (decodedBits == 0)
throw new HuffmanDecodingException();
// check for padding in the last byte before we loop back around and try to decode again
if (sourceIndex == sourceSpan.Length - 1)
{
// see if all of the remaining bits are padding
int paddingMask = (0x1 << (8 - bitOffset)) - 1;
if ((sourceSpan[sourceIndex] & paddingMask) == paddingMask)
{
// if our bitOffset is zero then all of the bits in this byte are set
// "A padding strictly longer than 7 bits MUST be treated as a decoding error."
// https://httpwg.org/specs/rfc7541.html#rfc.section.5.2
if (bitOffset == 0)
throw new HuffmanDecodingException();
// otherwise (padding less than 8), just break out of the main loop because we are done
break;
}
}
}
return destinationIndex; // destinationIndex is incremented each time a value is decoded, so it also represents total decoded byte count
}
```
Thanks for checking. Only one small change - could you expand the 2nd line here into a full nested if instead? It's hard to parse, which makes maintaining and understanding it hard, especially since only the true case does anything useful.
```C#
if (bitOffset > 0)
workingByte |= (byte)((sourceIndex < sourceSpan.Length - 1 ? sourceSpan[sourceIndex + 1] : 0) >> (8 - bitOffset));
```
I don't see the link to the PR (?) but I noticed this code above (maybe not yours)
```c#
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (offset >= src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if (count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
```
The compiler and JIT do not coalesce these exceptions. It is less code if you write eg
```c#
if (offset < 0 || offset >= src.Length || count < 0 || count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
```
I see more of this pattern.
Also, many places we cast to uint to combine comparisons, e.g., change if (offset < 0 || offset >= src.Length || count < 0 || count > src.Length - offset) to if ((uint)offset >= (uint)src.Length || (uint)count > (uint)(src.Length - offset))
Since src.Length can't be negative, you can eliminate two of the compares using this pattern
```C#
if ((uint)offset >= (uint)src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if ((uint)count > (uint)(src.Length - offset))
throw new ArgumentOutOfRangeException(nameof(count));
```
Thanks @danmosemsft and @saucecontrol, The PR is up here.
@AndyAyersMS is there already an issue for the JIT to do the "uint cast" trick above for free? It seems like the pattern would be fairly consistent (and so hopefully easy to recognize) - like offset < 0 || offset >= src.Length
what about the 7.5k static runtime table size
This isn't a problem if it's needed, but it would be good to understand why the table is so large and whether it actually needs to be that large.
dotnet/coreclr#16363
To step back here for a minute:
There have been a lot of issues discussed above. I think these fall into the following categories:
(1) Correctness. This is the most important. I have a PR out for one correctness issue here: dotnet/corefx#32043. This PR includes a bunch of unit tests as well. Any additional changes will need to merge with this PR and pass the new tests.
Beyond that, there's at least one other correctness issue I'm aware of -- reading past count. This needs to get fixed as well. My suggestion here is to change Decode so it takes a ReadOnlySpan<byte>, which effectively makes it impossible to read beyond the end of the span.
Are there other correctness issues that were identified that I missed?
(2) Incremental perf improvements. There was some discussion of small changes that make the existing code better and would apply regardless of the actual decoding algorithm. I'm not sure what happened with these? If there are any changes that make sense here, we should make them before making big algorithmic changes. This ensures we have a reasonable baseline for comparing the different algorithmic approaches.
(3) Algorithmic changes. I think we've got some good data about the different approaches, but I don't know quite where all of this ended up. We need to understand the different choices here and the tradeoffs involved (e.g. added memory usage, table construction costs, etc).
Also, it would be good to get the benchmarking code added to the repo. This will be useful in the future to validate we are not regressing and to evaluate any future changes here.
Additionally:
The tests I added in the PR linked above are reasonably comprehensive, but they are missing some invalid cases. In particular, we need to generate all cases of a truncated encoding and ensure that the decoder correctly rejects these cases.
(2) - I optimized the original code; here are the results (scaled 0.84). Do you want me to send a separate PR for that?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
CodeOpt | 736.1 ns | 10.946 ns | 9.141 ns | 0.84 | 0.01 | 0 B |
Orig | 876.1 ns | 3.694 ns | 3.275 ns | 1.00 | 0.00 | 0 B |
I agree that correctness can be fixed aside from any algorithmic change. Unit tests from dotnet/corefx#32043 have been incorporated and are passing. I'll be updating the PR shortly. There was one correction to the code that the tests turned up for invalid inputs.
For perf, it looks like @grant-d has managed to get the original code to scale at 0.84 as per above. The last test with the array method is at 0.36 as per here.
I'm not sure how we attempt to answer whether the trade-off of memory for speed is worth it in this scenario.
Before we take any PR that changes the algorithm here, we need to do a few things:
(1) Ensure we have comprehensive tests. My PR has a bunch of tests, and I think they are reasonably comprehensive for valid cases. But we need more comprehensive testing of invalid cases. This includes (a) incomplete final bit sequences and (b) EOS appearing anywhere in the encoding. I think that covers all the possible invalid sequences.
(2) Submit a PR for the benchmark code. We should review the code and ensure it's measuring useful things, and we should get it in the repo so that we can use it to validate future changes here.
I can work on that. Hopefully have something before Monday.
One other thought on the algorithmic approach here:
The "jump" algorithm that @jcdickinson posted above could be modified to process more than a single bit at a time. In particular, 4 bits seems like a reasonable number to process in a single operation. This would require 16 different transitions per starting state (versus 2 for a single bit), and since there are 256 starting states, this is probably reasonable in terms of table size.
Since the minimum encoding size is 5 bits, there can be at most one decoded value emitted for every 4 bits walked, so for each transition we need to know (a) what the final tree state is after walking the 4 bits, and (b) whether a symbol was emitted or not.
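Each transition entry might look something like this (a hypothetical layout; since the shortest code is 5 bits, at most one symbol can complete per 4-bit step):
```C#
// Precomputed per-(state, nibble) transition for a 4-bits-at-a-time DFA.
struct NibbleTransition
{
    public ushort NextState;  // tree state after consuming the 4 bits
    public bool Emitted;      // did a symbol complete inside these 4 bits?
    public byte Symbol;       // the completed symbol, when Emitted is true
}

// The decode loop then takes one table hit per nibble instead of per bit.
static int Step(int state, int nibble, NibbleTransition[,] table, byte[] dst, ref int written)
{
    NibbleTransition t = table[state, nibble];
    if (t.Emitted)
        dst[written++] = t.Symbol;
    return t.NextState;
}
```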
Maybe @jcdickinson and/or @grant-d can work on further optimization of the "jump" method.
I submitted PR dotnet/corefx#32190, which adds a test for emitting EOS in the middle of an otherwise valid sequence.
As for dotnet/corefx#32124, I feel like we should just close this one out as it's not clear at the moment which direction this is going to go and the open PR is probably just tying up other resources.
@jbhensley @geoffkizer I'm happy to attempt improvements on "jump" (really a DFA) to work over nibbles or bytes if there is interest in optimizing this further.
@jcdickinson I think that would be interesting to try, yeah. I would guess that nibbles are the sweet spot; doing bytes seems like it would result in a really large table.
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"Lidl","Harbour Rd, Portishead, Bristol BS20 7DE, UK","shopping","+44 800 977 7766","http://www.lidl.co.uk/","","51.4858439","-2.7633696","","ChIJEeX3EIntcUgR4EsSxlQvIw8"
"National Trust - Tyntesfield","Wraxall, Tyntesfield BS48 1NX, United Kingdom","generic","+44 344 800 4966","http://www.nationaltrust.org.uk/tyntesfield/","","51.44182619999999","-2.718278599999999","","ChIJYTkuzCDzcUgRsnkpeIdf8oI"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"The Posset Cup","3, Mustard Way, Portishead BS20 7QZ, UK","bar","+44 1275 848008","https://www.jdwetherspoon.com/pubs/all-pubs/england/somerset/the-posset-cup-portishead","","51.48571630000001","-2.7624962","","ChIJpehLHYntcUgRE-w5YuDMwmA"
"Toys R Us","Centaurus Rd, Patchway, Bristol BS34 5TU, UK","shopping","+44 117 959 1430","http://www.toysrus.co.uk/?utm_source=google&utm_medium=local&utm_campaign=gmb%20ToysRUsBristolCribbsCauseway3606","","51.528624","-2.596471","","ChIJkVnr2IGRcUgRxhA5pLK7lec"
"Asda Patchway Supercentre","Highwood Ln, Patchway, Bristol BS34 5TL, UK","shopping","+44 117 317 2400","http://storelocator.asda.com/store/patchway-supercentre","","51.5306145","-2.597049600000001","","ChIJL3f-H56RcUgR6l06m6xXMzc"
"Toys R Us","Centaurus Rd, Patchway, Bristol BS34 5TU, UK","shopping","+44 117 959 1430","http://www.toysrus.co.uk/?utm_source=google&utm_medium=local&utm_campaign=gmb%20ToysRUsBristolCribbsCauseway3606","","51.528624","-2.596471","","ChIJkVnr2IGRcUgRxhA5pLK7lec"
"Smyths Toys Superstores","5, A, Cribbs Causeway Retail Park, Lysander Rd, Patchway, Bristol BS34 5TX, UK","shopping","+44 117 959 0055","http://www.smythstoys.com/","","51.52624339999999","-2.600264","","ChIJ8QRHo4CRcUgR5c96odklLJE"
"B&M Home Store","3 Centaurus Rd, Patchway, Bristol BS34 5TS, UK","shopping","+44 330 838 9530","https://www.bmstores.co.uk/stores/bristol-patchway-530","","51.5272919","-2.5973424","","ChIJDVQq7YGRcUgRLa0ly8_ABp0"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"Little Harp","Elton Rd, Clevedon BS21 7RH, UK","bar","+44 1275 343739","https://www.oldenglishinns.co.uk/our-locations/the-little-harp-clevedon","","51.43759530000001","-2.864774","","ChIJlbwvRfXwcUgR2CkZghuzo30"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"The Bath Priory","Weston Rd, Bath BA1 2XT, United Kingdom","lodging","+44 1225 331922","http://www.thebathpriory.co.uk/","","51.3900945","-2.382454000000001","","ChIJxSy0uEiBcUgRBWDUuWOJVsE"
"The Halcyon Hotel Apartments, Henry Street","8 Henry St, Bath BA1 1JT, UK","lodging","+44 1225 585100","http://www.thehalcyon.com/","","51.3799113","-2.3576342","","ChIJ05aKlhGBcUgRcn9OzuqwEFQ"
"Sub13 - new gin collection bar","4 2ee, Edgar Buildings, George St, Bath BA1 2DU, UK","bar","+44 1225 466667","http://www.sub13.net","","51.3849994","-2.3624013","","ChIJq6q2oRSBcUgRs6kN6clmP80"
"Colonna & Small's","6 Chapel Row, Bath BA1 1HN, United Kingdom","cafe","+44 7766 808067","http://www.colonnaandsmalls.co.uk/","","51.3827777","-2.3644163","","ChIJMUKdQBSBcUgRvz8IiqG_Fm8"
"The Bertinet Bakery","1 New Bond St Pl, Bath BA1 1BH, United Kingdom","restaurant","+44 1225 445531","http://www.bertinet.com/","","51.3829721","-2.360359700000001","","ChIJG_nQlxOBcUgRMCvWK8NtvwU"
"The Royal Crescent Hotel","16 Royal Crescent, Bath BA1 2LS, United Kingdom","lodging","+44 1225 823333","https://www.royalcrescent.co.uk/?utm_source=google&utm_medium=local&utm_campaign=hotel","","51.387348","-2.368125999999999","","ChIJZUo6omqBcUgRBxusGzy3GoI"
"Thermae Bath Spa","The Hetling Pump Room, Hot Bath St, Bath BA1 1SJ, United Kingdom","generic","+44 1225 331234","http://www.thermaebathspa.com/","","51.3803643","-2.3615092","","ChIJ7b7VUBGBcUgRI-5CyCMdPsM"
"Kirsty Martin","Martcombe Road, Bristol, England, BS20, United Kingdom","generic","+44 7453 655952","","","51.474918","-2.70991","Contacts",""
"Kirsty Martin","Martcombe Road, Bristol, England, BS20, United Kingdom","generic","+44 7453 655952","","","51.474918","-2.70991","Contacts",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead Working Mens Club","Slade Road, Portishead, BS20 6BW, United Kingdom","generic","","","","51.484703248025","-2.7693352316805","Facebook",""
"Royal Victoria Park","Marlborough Ln, Bath BA1 2NQ, UK","generic","+44 1225 394041","http://visitbath.co.uk/things-to-do/royal-victoria-park-p25701","","51.3857067","-2.3708195","Facebook","ChIJNbSuIRWBcUgRDWNdo793S88"
"Swindon Oasis","Swindon, United Kingdom","generic","","","","51.48171056","-2.52573798","Facebook",""
"Oldbury Court Estate","Oldbury Ct Rd, Fishponds, Bristol BS16 2JH, UK","generic","+44 117 922 2000","https://www.bristol.gov.uk/museums-parks-sports-culture/oldbury-court-estate","","51.48785659999999","-2.529845399999999","Facebook","ChIJb9eUOiCQcUgRrq22DD_ICk8"
"Riverside Holiday Village","Bridgwater Rd, Bleadon, Weston-super-Mare BS24 0AN, UK","camping","+44 1934 812342","http://www.westcountryparks.co.uk/riverside-holiday-village/","","51.3069719","-2.9592683","Facebook","ChIJbRYKqW0CckgRixFozu7irpM"
"Blaise Castle","Bristol BS10 7QT, UK","generic","+44 117 903 9818","http://www.friendsofblaise.co.uk/","","51.5025659","-2.637383899999999","Facebook","ChIJfdUWIGaScUgRpZyQr13zOTk"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"High Down Junior School","Down Rd, Portishead, Bristol BS20 6DY, UK","school","+44 1275 843969","http://www.highdownjunior.co.uk/","","51.4801769","-2.7872175","Facebook","ChIJHSDTGQfycUgR1FNO3N3KgWc"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48686459999999","-2.7653641","","ChIJS215oIntcUgRPZbO7CuAaec"
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"The Fleece","12 St Thomas St, Bristol BS1 6JJ, UK","generic","+44 117 945 0996","http://www.thefleece.co.uk/","","51.45230189999999","-2.589315399999999","Facebook","ChIJ_TD91XuOcUgRjMrlhE2_y5o"
"The Bristol Hot Air Baloon Fiesta","Bristol, BS41 9, United Kingdom","generic","","","","51.441254277308","-2.6433306349142","Facebook",""
"Weston-super-Mare","Weston-super-Mare, United Kingdom","generic","","","","51.3458","-2.96778","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Brean Leisure Park","Coast Rd, Brean, Burnham-on-Sea TA8 2QY, UK","generic","+44 1278 751595","http://www.brean.com/","","51.2851298","-3.009675","Facebook","ChIJo-wata8BckgRnLpSPWqfLiY"
"Brean Leisure Park","Coast Rd, Brean, Burnham-on-Sea TA8 2QY, UK","generic","+44 1278 751595","http://www.brean.com/","","51.2851298","-3.009675","Facebook","ChIJo-wata8BckgRnLpSPWqfLiY"
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Redcliffe Bay Seafront","249 Down Road Bristol England BS20 8HY United Kingdom","generic","","","","51.476458095","-2.8004557133333","Facebook",""
"Weston Big Wood","25 Spring Rise Bristol England BS20 6RE United Kingdom","generic","","","","51.475398524608","-2.7761074471516","Facebook",""
"Welcome Family Holiday Park","Dawlish Warren, Dawlish, South Devon EX7 0PH, United Kingdom","camping","+44 345 165 6265","https://www.welcomefamily.co.uk/","","50.6001833","-3.4476097","Facebook","ChIJ73pY0UQKbUgRNpirpCc6-tQ"
"Welcome Family Holiday Park","Dawlish Warren, Dawlish, South Devon EX7 0PH, United Kingdom","camping","+44 345 165 6265","https://www.welcomefamily.co.uk/","","50.6001833","-3.4476097","Facebook","ChIJ73pY0UQKbUgRNpirpCc6-tQ"
"Love Hate ","9 Brendon Road Bristol England BS20 6DJ United Kingdom","home","","","","51.482314952313","-2.78711266877407","Https://goo.gl/photos/pMYTnCir#What#Https://assets.contentstack.io#Youporn#Likes#Meee#Porn hub#Yup#Restaurant#🤑🚘#07538 552769#36 Hours In...Bath#07 38 552769#Bus stop#Checkins#Friend#Freeman#Contacts#36 Hours In...Bath#C3#Facebook#Busstopd#Hotel#Dmap",""
"Lower Down","Portishead, Bristol BS20 6EB, UK","bowling","","","","51.48093799999999","-2.784538","Dan","ChIJ_xac6AfycUgRKmgffCqQHME"
"358999/07/718825/8","thanks","generic","07358552769","","","undefined","undefined","🤑🚘",""
"See","12-23 Gaunts Cl, Portishead, Bristol BS20 8BL, UK","casino","07502220198",".com","Gay","51.4779400858475","-2.79685295939177","Paije",""
I doubt that makes sense. The HPACK Huffman encoding is a static encoding that's specific to HPACK.
I really enjoyed the conversation and cooperation between you devs to optimize those little things. Just here to say thanks. 😶
Thank you Open source. Thank you Microsoft
@mafshin any interest in ongoing involvement? We have plenty of new API, fixes, and perf work up for grabs. We would be glad to have you.
(Note for those marked both up-for-grabs and api-needs-work the task is to make an API proposal for review. If api-approved the task is to implement the API.)
Triage: The idea is described above: https://github.com/dotnet/corefx/issues/31751#issuecomment-419618551
Hi @geoffkizer, I was hooked by this and gave it a try. I built a decoding table and ran benchmarks for several lookup-bit sizes. I managed to bring the tables (static, pregenerated in source code, as a one-dimensional array of ushort) down to the sizes below:
|bytes|lookup bits|
|-|-|
|1008|2|
|1472|3|
|1728|4|
|3904|5|
|5248|6|
|7936|7|
|7680|8|
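For scale: a single k-bit level of ushort entries takes only (1 << k) * 2 bytes on its own, so the totals above must include the extra levels needed for codes longer than k bits. A rough sanity check, under my own assumption that the tables are built from full 2^k-entry levels:

```C#
// My assumption: the one-dimensional ushort[] is built from full 2^k-entry
// levels, one root level plus overflow levels for codes longer than k bits.
static int LevelBytes(int lookupBits) => (1 << lookupBits) * sizeof(ushort);

// LevelBytes(8) == 512, so the 7680-byte total above would correspond to
// 7680 / 512 == 15 such levels.
```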
I have also tried to optimize the current algorithm, as I quite like its elegance and simplicity.
I created a micro benchmark with BenchmarkDotNet, with the following results:
| Method | Mean | Error | StdDev | Op/s | Ratio |
|----------------- |------------:|----------:|----------:|-------------:|------:|
| Current | 1,822.29 ns | 13.807 ns | 12.240 ns | 548,760.2 | 1.00 |
| CurrentOptimized | 1,050.57 ns | 5.903 ns | 5.522 ns | 951,867.1 | 0.58 |
| LookupBits2 | 1,674.67 ns | 13.353 ns | 12.491 ns | 597,132.4 | 0.92 |
| LookupBits3 | 1,213.27 ns | 19.844 ns | 18.562 ns | 824,219.2 | 0.66 |
| LookupBits4 | 1,126.86 ns | 15.168 ns | 13.446 ns | 887,419.7 | 0.62 |
| LookupBits5 | 922.54 ns | 8.767 ns | 8.201 ns | 1,083,960.3 | 0.51 |
| LookupBits6 | 762.84 ns | 7.241 ns | 6.419 ns | 1,310,899.4 | 0.42 |
| LookupBits7 | 642.59 ns | 6.567 ns | 5.821 ns | 1,556,201.7 | 0.35 |
| LookupBits8 | 641.35 ns | 4.952 ns | 4.632 ns | 1,559,205.3 | 0.35 |
The benchmark decodes the following strings, which I hoped would be close to common usage:
The benchmark shows that there is a tradeoff between the size of the static decoding table and decoding speed.
It also shows that the optimized version of the current algorithm can compete with the lookup tables.
I do not feel I have enough context to decide which way to proceed, as I am not aware of how costly an additional 8 KB static table is.
From my point of view I would go with CurrentOptimized, for the following reasons:
Would you recommend proceeding with a PR for CurrentOptimized?
This commit has the draft version of the proposed changes. Still to do are the benchmark and one-time 'Monte Carlo' tests comparing the results of the existing and new implementations.
@karelz Can you please assign me to this issue?
It would be good to see perf results that are not in the ~1us range; this is very short and the results are not necessarily representative in general. I'd suggest adding more input to your test.
Assuming that the test is representative, we would want to have this perf test in the tree in order to measure future changes/improvements.
I can't comment on any of the "lookup bits" variations without seeing the code.
If you have an improvement on the existing approach, then we would certainly consider taking that. It would be good to understand which changes are responsible for the improvements, and to what extent.
I was using BenchmarkDotNet for those benchmarks. The tool computes the number of invocations of the benchmark method needed to reach sufficient precision. The default behavior, which I used, is described here:
> `--iterationTime` - desired time of execution of an iteration. Used by Pilot stage to estimate the number of invocations per iteration. 500ms by default.
I also tried running the body of the benchmark method 1e6 times, with statistically identical results.
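For reference, the shape of such a BenchmarkDotNet micro benchmark is roughly the following; `HuffmanCurrent` and `HuffmanLut8` are hypothetical names standing in for the two implementations, and the input bytes are a placeholder:

```C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class HuffmanDecodingBenchmark
{
    // Placeholder input; the real benchmark decoded several Huffman-encoded
    // header strings intended to resemble common traffic.
    private static readonly byte[] s_encoded = { /* Huffman-encoded bytes */ };
    private readonly byte[] _dst = new byte[4096];

    [Benchmark(Baseline = true)]
    public int Current() => HuffmanCurrent.Decode(s_encoded, _dst);   // hypothetical

    [Benchmark]
    public int LookupBits8() => HuffmanLut8.Decode(s_encoded, _dst);  // hypothetical
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<HuffmanDecodingBenchmark>();
}
```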
I plan to create and submit a PR for the benchmark into https://github.com/dotnet/aspnetcore/blob/master/src/Servers/Kestrel/perf/Kestrel.Performance/HPackDecoderBenchmark.cs as it already seems to have another HPack-related perf benchmark.
Below are links to 'draft' commits where you can see the implementations of the different algorithms.
Current optimized
https://github.com/rokonec/runtime/commit/2c350184c2ace484f2350a8428d3d64426d5a3b2
The main speed improvement comes from replacing the lines at
https://github.com/rokonec/runtime/blob/50ac454d8d8a1915188b2a4bb3fff3b81bf6c0cf/src/libraries/Common/tests/System/Net/Http/HuffmanDecoder.cs#L52-L56 with a simple load and shift.
Instead of shifting and rotating 5 bytes for every Huffman code, I preload data into a UInt64 variable and extract the value with a simple shift based on the count of bits left in that buffer:
```C#
// We need exactly 32 bits of code; if fewer than 32 bits are left in the
// source, the missing bits are filled with zeroes from the LSB up.
uint data = (uint)(bitsInAcc > 32 ? acc >> (bitsInAcc - 32) : acc << (32 - bitsInAcc));
```
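Fleshed out a little, the technique looks something like the sketch below; the names and the refill loop are my own illustration of the approach described above, not the commit's actual code:

```C#
// Sketch of the accumulator technique (illustrative names). bitsInAcc is
// decremented elsewhere as decoded codes are consumed.
static uint Peek32Bits(ReadOnlySpan<byte> src, ref int i, ref ulong acc, ref int bitsInAcc)
{
    // Refill: top up the 64-bit accumulator one byte at a time while it fits.
    while (bitsInAcc <= 56 && i < src.Length)
    {
        acc = (acc << 8) | src[i++];
        bitsInAcc += 8;
    }

    // Extract exactly 32 bits; when fewer than 32 bits remain in the source,
    // the missing low-order bits are left as zeroes.
    return (uint)(bitsInAcc > 32 ? acc >> (bitsInAcc - 32) : acc << (32 - bitsInAcc));
}
```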
Decoding by lookup table
https://github.com/rokonec/runtime/commit/52229bbc3f948b6ae384e313e77c2120ce67ec69
The static lookup table is omitted for clarity.
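To make the shape of the lookup concrete, here is a sketch of decoding one symbol against a multi-level 8-bit table. The `BitReader` helper and the ushort entry packing (high bit set = pointer to a next-level table; otherwise code length in bits 8..14 and the symbol in the low byte) are assumptions for illustration, not necessarily the commit's layout:

```C#
// Sketch of decoding one symbol against a multi-level 8-bit table.
// BitReader (with Peek8/Advance) and the entry packing are assumptions
// for illustration, not necessarily the linked commit's layout.
static byte DecodeSymbol(ushort[] lut, ref BitReader reader)
{
    const ushort NextTableFlag = 0x8000; // high bit: entry is a table pointer

    int table = 0; // the first-level 256-entry table lives at offset 0
    while (true)
    {
        int index = reader.Peek8();            // next 8 bits, zero-padded
        ushort entry = lut[table * 256 + index];

        if ((entry & NextTableFlag) != 0)
        {
            reader.Advance(8);                 // a full 8 bits were consumed
            table = entry & 0x7FFF;            // descend to the next level
            continue;
        }

        // Leaf: bits 8..14 hold the code length, the low byte the symbol.
        reader.Advance((entry >> 8) & 0x7F);
        return (byte)entry;
    }
}
```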
We are generally not shy about going the extra mile for perf. I don't think your LUT code is concerning.
I am personally fine with an 8K LUT, but we've recently put a lot of work into reducing assembly size and I would understand pushback. @stephentoub, any thoughts here?
If it is a problem, can the LUT be generated quickly at runtime? If yes, then I say let's go with the LookupBits8 version.
It would be useful to know how frequently Huffman-coded headers are used and how much they impact perf of a request. @JamesNK do you have any data here for gRPC?
All the gRPC spec headers are small, so I doubt anyone is Huffman-encoding them. And they're common between requests, so gRPC headers will be compressed with HPACK.
The LUT could be generated quickly at runtime, in about O(n log n) where n = 257, with a body of only about 10 fast lines.
It should be possible to do it with just one allocation: the destination LUT ushort[].
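For illustration, building the first level at runtime amounts to filling, for each symbol whose code fits in 8 bits, all 2^(8 - len) entries that share the code as a prefix. A sketch under the same assumed entry format as above (`s_codes` and `s_codeLengths` are hypothetical names for the static HPACK Huffman code table from RFC 7541, Appendix B):

```C#
// Sketch: build the first 256-entry level at runtime, using the same assumed
// entry format as above. s_codes and s_codeLengths are hypothetical names for
// the static HPACK Huffman code table (RFC 7541, Appendix B); next-level
// tables for codes longer than 8 bits are omitted.
static ushort[] BuildFirstLevel()
{
    var table = new ushort[256];
    for (int symbol = 0; symbol < 256; symbol++)
    {
        int len = s_codeLengths[symbol];
        if (len > 8)
            continue; // handled by deeper levels (not shown)

        // Left-align the code within 8 bits; every combination of the
        // remaining low bits decodes to the same symbol.
        int prefix = s_codes[symbol] << (8 - len);
        for (int fill = 0; fill < (1 << (8 - len)); fill++)
            table[prefix + fill] = (ushort)((len << 8) | symbol);
    }
    return table;
}
```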
> The LUT could be generated quickly at runtime, in about O(n log n) where n = 257, with a body of only about 10 fast lines. It should be possible to do it with just one allocation: the destination LUT ushort[].
Let's just do this then.
Related: Any plans for Huffman encoding? With an optimized implementation, there is probably a header-size threshold above which Huffman-encoding large headers makes sense.
I created pull request https://github.com/dotnet/runtime/pull/43603 about 2 hours ago with the 8-bit LUT.
@JamesNK I would like to benchmark those large headers; can I have some please?
> Related: Any plans for Huffman encoding? With an optimized implementation, there is probably a header-size threshold above which Huffman-encoding large headers makes sense.
No immediate plans, but it's something to consider.
I thought we had an issue for this, but apparently not...
> I thought we had an issue for this, but apparently not...
It was closed; IIRC we weren't happy with the CPU vs. bandwidth tradeoff.