We are currently implementing Huffman decoding by looping over a table with a list of encoded values for each length. (See Huffman.cs)
This doesn't seem particularly efficient.
Instead, we should implement Huffman decoding by doing table lookups, using keys of 8 bits or something along those lines.
Note that Huffman encoding from servers seems to be pretty common, so this may have real-world perf impact. (Or maybe not... as always, measuring is the best way to know.)
What were your thoughts regarding how this might work? I took a look at the code, but it wasn't immediately apparent that the current implementation could be improved much. Here are my observations:
The only thing I saw in the code that I thought could be improved was the repeated calculation of codeMax even though that value is essentially a constant for each entry in the decoding table. Expanding the table to be a 3-tuple that included codeMax would avoid that, but the cost there is likely very insignificant.
Mind you, I'm new to the code so I could be way off base here. Looking to see if you could clarify this a bit.
@geoffkizer thoughts?
IMHO one of the best implementations of Huffman en/decoder is available here: https://github.com/Cyan4973/FiniteStateEntropy
It's written in C, but it should be one of the best examples of a Huffman implementation. It ships in a library together with a Finite State Entropy en/decoder built on Asymmetric Numeral Systems (Huffman is a special case of ANS). ANS is a separate story; I plan to submit an API proposal to include zstd compression in .NET Core and will write more then.
The other hash table option would be to generate a massive "rainbow table" like structure where all combinations of bits beyond the significant ones up to your key length are also hashed.
It doesn't need to be a single, massive hash table. It can be a multi-level table. Take the first 8 bits (or 10, or whatever) and use it as the key to the first-level table. This results in either a decoded value and a bit length, or a reference to a second-level table -- if so, take the next 8 bits and index into the second table, etc.
Choosing the best key size for each table involves a tradeoff -- smaller key sizes mean smaller tables but more lookups, and vice-versa. You can tune this as appropriate. With 8 bits, your main table is only 512 bytes (256 entries, 1 byte for decoded value, 1 byte for either bit length or next table id), and the most common values (including all alphanumerics) will be decoded in a single table lookup.
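To make this concrete, here's a minimal sketch of the multi-level lookup; the entry layout and names are hypothetical, not the actual corefx types:
```C#
// Sketch: each entry either decodes a symbol outright or names a deeper table.
struct TableEntry
{
    public byte BitLength;     // > 0: a decoded symbol; 0: descend to NextTable
    public byte DecodedValue;  // valid when BitLength > 0
    public byte NextTable;     // index of the next-level table when BitLength == 0
}

static int Decode(uint data, TableEntry[][] tables, out int decodedBits)
{
    int tableId = 0;
    for (int consumed = 0; consumed <= 24; consumed += 8)
    {
        uint key = (data >> (24 - consumed)) & 0xFF;  // next 8 bits as the key
        TableEntry entry = tables[tableId][key];
        if (entry.BitLength > 0)
        {
            decodedBits = entry.BitLength;  // total length, including earlier levels
            return entry.DecodedValue;
        }
        tableId = entry.NextTable;  // these 8 bits only selected a subtable
    }
    decodedBits = 0;  // no match within 32 bits: invalid encoding
    return -1;
}
```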
I explored that approach briefly as well, but it just seemed like we were back to looping again. A mock up to see if there is any real perf differential would be the way to go here. I might be able to take a crack at that next week.
There's no looping if the value is encoded in 8 bits or less. (Or whatever key size you choose.) And this covers the most common values, including all alphanumerics and common punctuation.
Agreed, encoded values up to the chosen key size would be found immediately. Couple more questions:
Is there a compiled list of common request headers you guys have that can be used for perf testing? If not, I can put together my own.
Not really.
I think a collection initializer is fine. If necessary, it's easy to switch this to a Lazy<T> later.
The codeMax optimization that @jbhensley suggested has an immediate ~10% benefit in micro-benchmark testing:
Method | Mean | Error | StdDev | Scaled | ScaledSD |
------- |---------:|---------:|---------:|-------:|---------:|
New | 86.32 ns | 1.864 ns | 1.915 ns | 0.89 | 0.05 |
Core | 97.49 ns | 1.950 ns | 5.069 ns | 1.00 | 0.00 |
Here's the core change, note the manually-computed additional deltaLength member:
```C#
private static readonly (int codeLength, int deltaLength, int[] codes)[] s_decodingTable = new[]
{
(05, 0, new[] { 48, 49, 50, 97, 99, 101, 105, 111, 115, 116 }), // deltaLength must be 0 for related optimizations to work
(06, 1, new[] { 32, 37, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57, 61, 65, 95, 98, 100, 102, 103, 104, 108, 109, 110, 112, 114, 117 }),
(07, 1, new[] { 58, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 106, 107, 113, 118, 119, 120, 121, 122 }),
(08, 1, new[] { 38, 42, 44, 59, 88, 90 }),
(10, 2, new[] { 33, 34, 40, 41, 63 }),
(11, 1, new[] { 39, 43, 124 }),
(12, 1, new[] { 35, 62 }),
(13, 1, new[] { 0, 36, 64, 91, 93, 126 }),
(14, 1, new[] { 94, 125 }),
(15, 1, new[] { 60, 96, 123 }),
(19, 4, new[] { 92, 195, 208 }),
(20, 1, new[] { 128, 130, 131, 162, 184, 194, 224, 226 }),
(21, 1, new[] { 153, 161, 167, 172, 176, 177, 179, 209, 216, 217, 227, 229, 230 }),
(22, 1, new[] { 129, 132, 133, 134, 136, 146, 154, 156, 160, 163, 164, 169, 170, 173, 178, 181, 185, 186, 187, 189, 190, 196, 198, 228, 232, 233 }),
(23, 1, new[] { 1, 135, 137, 138, 139, 140, 141, 143, 147, 149, 150, 151, 152, 155, 157, 158, 165, 166, 168, 174, 175, 180, 182, 183, 188, 191, 197, 231, 239 }),
(24, 1, new[] { 9, 142, 144, 145, 148, 159, 171, 206, 215, 225, 236, 237 }),
(25, 1, new[] { 199, 207, 234, 235 }),
(26, 1, new[] { 192, 193, 200, 201, 202, 205, 210, 213, 218, 219, 238, 240, 242, 243, 255 }),
(27, 1, new[] { 203, 204, 211, 212, 214, 221, 222, 223, 241, 244, 245, 246, 247, 248, 250, 251, 252, 253, 254 }),
(28, 1, new[] { 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 220, 249 }),
(30, 2, new[] { 10, 13, 22, 256 })
};
```
@grant-d What code are you using to test this?
Code optimization, without changing the algorithm, gets us to ~23% improvement in the micro-bench.
(This needs real-world testing to get a measure of any true improvement)
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
---------- |---------:|----------:|----------:|-------:|---------:|----------:|
Optimized | 71.61 ns | 0.8184 ns | 0.7656 ns | 0.77 | 0.02 | 0 B |
Legacy | 93.17 ns | 1.7961 ns | 2.3977 ns | 1.00 | 0.00 | 0 B |
For example, changed this:
```C#
for (var i = 0; i < s_decodingTable.Length && s_decodingTable[i].codeLength <= validBits; i++)
{
(var codeLength, var codes) = s_decodingTable[i];
if (i > 0)
{
codeMax <<= codeLength - s_decodingTable[i - 1].codeLength;
}
codeMax += codes.Length;
```
To this:
```C#
for (var i = 0; i < s_decodingTable.Length; i++)
{
// deltaLength is precomputed codeMax delta
var (codeLength, deltaLength, codes) = s_decodingTable[i];
// Move check out of for-loop to leverage cached value for codeLength
if (codeLength > validBits)
break;
// Mitigate the if (i > 0) branch by ensuring s_decodingTable[0].deltaLength==0
codeMax = (codeMax << deltaLength) + codes.Length;
```
@geoffkizer see HuffmanBench.cs and related units in this PR
Another example.
This:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
Can be optimized as follows, since if the first src.Length branch is false, then there's no way the 2nd & 3rd branches can be true:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
if (i + 1 < src.Length)
{
next |= (uint)(src[i + 1] << 16 + lastDecodedBits);
if (i + 2 < src.Length)
{
next |= (uint)(src[i + 2] << 8 + lastDecodedBits);
if (i + 3 < src.Length)
next |= (uint)(src[i + 3] << lastDecodedBits);
}
}
```
I have code for using the multi-level table and am currently working on cleaning it up a bit and testing. Decoding becomes simple:
```C#
public static int Decode(uint data, int validBits, out int decodedBits)
{
int byteNumber = 3; // grab the most significant byte
uint virtualDictionaryMask = 0; // used to key into different "virtual" dictionaries
DecodingTableEntry entry;
do
{
// extract the working byte
uint workingByte = data >> (byteNumber * 8) & 0xFF;
// apply virtual dictionary bitmask
workingByte |= virtualDictionaryMask;
// key into the dictionary
if (!s_decodingDictionary.TryGetValue(workingByte, out entry))
break; // we should either get an entry or bitmask for the next virtual dictionary. if we get neither then
// the bit pattern is not consistent with any entry we have
// if we get a length back then we have found the decoded value
if (entry.BitLength > 0)
{
if (entry.BitLength > validBits)
break; // we only found a value by incorporating bits beyond the valid remaining length of the data stream
decodedBits = entry.BitLength;
return (int)entry.DecodedValue;
}
// otherwise, we have found a mask that lets us key into the next virtual dictionary
else
{
virtualDictionaryMask = entry.DecodedValue;
byteNumber--;
}
} while (entry.BitLength == 0);
// no luck. signal to caller that we could not decode
decodedBits = 0;
return -1;
}
```
Building the table is a bit ugly and will likely need to be done with a `Lazy<T>` that calls a method to construct it, since the total size is 3840 entries. We could still use a collection initializer and surround it with a `#region` I suppose. Either way, I'm a little concerned about the memory footprint.
Since there would be 15 tables in total, I took the approach of using bitmasks on the key value of a single table to create several "virtual" ones within it. The "root" table has no mask, so you initially key in with the value you want to decode. If you get a bit length back then you have the decoded value. Otherwise, you have a bitmask to apply to your next byte when keying in again. The bitmask represents the next "virtual" table.
You have more total entries (3840) than necessary.
The fourth-level tables only need 6 bits to index them (max encoded length is 30 bits). That by itself cuts the total number of entries in half, if my math is correct.
Additionally, at least some intermediate tables don't need all 8 bits either. For example, all encodings that start with 11111110 are exactly 10 bits in length, so the second level table here can be exactly 4 entries instead of the 256 you'd need if you indexed with a full 8 bits.
Dealing with all this will complicate the code and isn't necessary to get decent perf results.
True, the storage could be optimized. Right now, however, the perf results are not looking very promising. Oddly enough, my test with a pre-computed codeMax is worse, which I cannot explain.
Test set consists of 792 lines of actual HTTP request header data. The file is read and Huffman encoded into bytes as part of the setup, and then each decoding method gets its own crack at it, while verifying that correct values are being decoded. I kind of expected something more dramatic than this.
| Method | Mean | Error | StdDev | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
|----------------------------------------- |--------------:|--------------:|--------------:|---------:|---------:|--------:|--------:|----------:|
| BuildHuffmanDictionary | 112,665.66 ns | 2,241.0968 ns | 2,096.3232 ns | 6,667.28 | 119.88 | 31.8604 | 10.6201 | 201624 B |
| ProcessHeadersWithLoop | 16.90 ns | 0.0089 ns | 0.0074 ns | 1.00 | 0.00 | - | - | 0 B |
| ProcessHeadersWithLoopAndComputedCodeMax | 17.39 ns | 0.3766 ns | 0.5637 ns | 1.03 | 0.03 | - | - | 0 B |
| ProcessHeadersWithDictionary | 16.88 ns | 0.3689 ns | 0.3451 ns | 1.00 | 0.02 | - | - | 0 B |
For reference, my version of the pre-computed codeMax treats it as a complete constant:
```C#
public static readonly (int codeLength, int codeMax, int[] codes)[] s_decodingTable2 = new[]
{
(5, 10, new[] { 48, 49, 50, 97, 99, 101, 105, 111, 115, 116 }),
(6, 46, new[] { 32, 37, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57, 61, 65, 95, 98, 100, 102, 103, 104, 108, 109, 110, 112, 114, 117 }),
(7, 124, new[] { 58, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 106, 107, 113, 118, 119, 120, 121, 122 }),
(8, 254, new[] { 38, 42, 44, 59, 88, 90 }),
(10, 1021, new[] { 33, 34, 40, 41, 63 }),
(11, 2045, new[] { 39, 43, 124 }),
(12, 4092, new[] { 35, 62 }),
(13, 8190, new[] { 0, 36, 64, 91, 93, 126 }),
(14, 16382, new[] { 94, 125 }),
(15, 32767, new[] { 60, 96, 123 }),
(19, 524275, new[] { 92, 195, 208 }),
(20, 1048558, new[] { 128, 130, 131, 162, 184, 194, 224, 226 }),
(21, 2097129, new[] { 153, 161, 167, 172, 176, 177, 179, 209, 216, 217, 227, 229, 230 }),
(22, 4194284, new[] { 129, 132, 133, 134, 136, 146, 154, 156, 160, 163, 164, 169, 170, 173, 178, 181, 185, 186, 187, 189, 190, 196, 198, 228, 232, 233 }),
(23, 8388597, new[] { 1, 135, 137, 138, 139, 140, 141, 143, 147, 149, 150, 151, 152, 155, 157, 158, 165, 166, 168, 174, 175, 180, 182, 183, 188, 191, 197, 231, 239 }),
(24, 16777206, new[] { 9, 142, 144, 145, 148, 159, 171, 206, 215, 225, 236, 237 }),
(25, 33554416, new[] { 199, 207, 234, 235 }),
(26, 67108847, new[] { 192, 193, 200, 201, 202, 205, 210, 213, 218, 219, 238, 240, 242, 243, 255 }),
(27, 134217713, new[] { 203, 204, 211, 212, 214, 221, 222, 223, 241, 244, 245, 246, 247, 248, 250, 251, 252, 253, 254 }),
(28, 268435455, new[] { 2, 3, 4, 5, 6, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 127, 220, 249 }),
(30, 1073741824, new[] { 10, 13, 22, 256 })
};
```
And the loop decoder:
```C#
public static int DecodeWithLoopAndCodeMax(uint data, int validBits, out int decodedBits)
{
for (int i = 0; i < s_decodingTable2.Length && s_decodingTable2[i].codeLength <= validBits; i++)
{
(int codeLength, int codeMax, int[] codes) = s_decodingTable2[i];
int mask = int.MinValue >> (codeLength - 1);
long masked = (data & mask) >> (32 - codeLength);
if (masked < codeMax)
{
decodedBits = codeLength;
return codes[codes.Length - (codeMax - masked)];
}
}
decodedBits = 0;
return -1;
}
```
Can you link to the full code?
For one thing, you really should run more iterations. All your measured times are < 20ns, which is super small. I'd run enough iterations to have times above 1s, if possible.
I'll get the test project up and link, but that will probably be tomorrow.
Good idea @jbhensley, amongst other tricks that codeMax calc got it ~35% faster than the original.
I am also interested in your test cases, mine are trivial.
Does ProcessHeadersWithLoop in your tests reflect the original, unchanged, code?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
---------------------- |---------:|----------:|---------:|-------:|---------:|----------:|
OptimizedCodeAndTable | 49.29 ns | 0.9742 ns | 1.628 ns | 0.65 | 0.02 | 0 B |
OriginalCode | 75.45 ns | 1.1256 ns | 1.053 ns | 1.00 | 0.00 | 0 B |
I'm new to corefx, so I'm struggling a little with the benchmarking. I'll take any pointers you have in regards to that or the code in general.
Also, I trimmed duplicate line items out of my headers.txt file so it's a bit smaller now.
@jbhensley I have created a fork of your project, added my code to it, then cleaned it all up a bit. I have changed the layout of your benchmark, since the enum approach did not allow you to optimize the outer loop. So each algorithm is just a duplicated Huffman<foo>.cs file in its entirety now. I have also added your headers.txt data to all the benchmarks.
Here's the latest results (see the README for details on algorithms).
You are right that the IDictionary approach is slower, I think it's probably due to the overhead of hashing and branching. Perhaps a hard-coded switch for a small subset of high-frequency values might work?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |-----------:|---------:|---------:|-------:|---------:|----------:|
DictOpt | 1,281.3 ns | 24.27 ns | 22.70 ns | 1.41 | 0.03 | 0 B |
Dict | 1,311.6 ns | 24.41 ns | 22.83 ns | 1.45 | 0.03 | 0 B |
OrigOpt | 822.8 ns | 16.08 ns | 15.79 ns | 0.91 | 0.02 | 0 B |
Orig | 907.3 ns | 11.78 ns | 10.44 ns | 1.00 | 0.00 | 0 B |
Thanks. I'll take a look at what you've done with this.
My benchmarking probably wasn't as valid, but I was pretty much coming up with the same result. The looping implementation (either original or with codeMax pre-calculated) is able to return the top 10 most frequently used values on the first pass through the loop with only a few bit shift operations and numeric calcs/comparisons. The second pass gets you 26 more. I'm betting that hashing and keying isn't able to keep up with how quickly the loop is able to return the first several levels of most frequently used values. Maybe dictionary is faster for the less frequently used where the loop has to make more iterations, but it's not able to gain enough on those to make up for what it loses on the others.
Thanks for posting the code.
Don't use a dictionary for the table lookup. Just use an array. It's much cheaper.
I think that might be the ticket. I'll give it a shot tomorrow. Should simplify the code a bit as well.
Also -- In case you're not already doing this, you should do a warmup run before the real run, in the same process. Otherwise you may be including things like JIT costs in the run.
And I still think you should do many more iterations. The times above are in the 1ms range. Do 1000x more iterations and get it in the 1s range.
Or maybe you are already doing lots of iterations? I'm not familiar with the benchmark framework here, so not quite sure how it's executing the code.
Looks like the framework does all of the warmup for you. I tried giving it a large iteration count and it appears to be running each benchmark method more times, but the average runtime of any given iteration is still small. I could probably bring the numbers up if I added iterations in the method itself with a for loop instead of telling the framework to run it more.
Unrelated thought:
The fact that the per- symbol Decode method is a separate method isn't ideal. It's only called one place and it's probably large enough that the JIT is not inlining it. MethodImplOptions.AggressiveInlining might help here. Eventually we should probably just fold this into the main Decode routine.
Re this code that @grant-d mentioned above:
```C#
var next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
I think we can improve this. We are re-reading all 4 bytes from the source buffer each time. We shouldn't need to do this. We should just shift the current value (in next) by the number of decoded bits, and then read byte by byte from the source buffer into next until we've got at least 30 bits.
So initially, we would read the first 4 bytes into next, and set validBits = 32. Then at the end of each loop iteration, we shift next by decodedBits, subtract decodedBits from validBits, and read from the source buffer until validBits is >= 30.
This gets tricky because the smallest encoded value is 5 bits, so it's possible that decodedBits is 5 and validBits is 32, meaning validBits becomes 27 and we need to read another byte to get it over 30. Unfortunately this means we need more than 32 bits. Changing next from a uint to a ulong is a quick fix for this, and shouldn't actually matter in practice on a 64-bit machine.
In a table-based lookup scheme, we don't actually need validBits to be >= 30; we just need it to be >= whatever the next table lookup size is (commonly 8). So that would simplify this as well.
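A rough sketch of that refill loop, assuming the ulong fix just mentioned; Decode here stands in for the per-symbol Decode(uint, int, out int), and Emit is a hypothetical output step:
```C#
static void DecodeAll(byte[] src)
{
    ulong next = 0;      // bits are kept left-aligned in the top of the 64
    int validBits = 0;
    int i = 0;
    while (true)
    {
        // top up: read whole bytes until we hold at least 30 valid bits
        while (validBits < 30 && i < src.Length)
        {
            next |= (ulong)src[i++] << (56 - validBits);
            validBits += 8;
        }
        if (validBits == 0)
            break;  // input fully consumed
        int decoded = Decode((uint)(next >> 32), validBits, out int decodedBits);
        if (decoded < 0)
            break;  // real code must distinguish trailing padding from bad input
        Emit(decoded);
        next <<= decodedBits;      // drop the consumed bits
        validBits -= decodedBits;
    }
}
```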
Actually, looking at this code, I don't quite understand how it works in certain cases.
Imagine that lastDecodedBits is 7 after the last loop iteration. Then we are only actually going to read (32-7)=25 bits from the source buffer. If the next encoded value has more than 25 bits, it seems like this logic is broken. Am I missing something here?
This points out that we need much better tests for this code :)
We actually don't currently have any tests. At a minimum we should steal the ASP.NET tests. And probably add a bunch of new cases as well. This is all pretty tricky logic and we need thorough tests to ensure we're doing the right thing here.
We'll need these tests before we can check in any perf improvements here.
On my fork linked above I already included the ASP/Kestrel units. I tried inlining the second method but it made little difference, I will have another look. The buffered read is worth trying too
It might not matter much. Or it might already be inlined. A profile would tell for sure.
On my PR linked above I already included the ASP/Kestrel units.
I saw that. We need to add these to corefx.
There is an issue with lastDecodedBits as I described above.
Fixed in dotnet/corefx#32043. I also added a bunch of tests in that PR. Clearly the existing ones weren't sufficient...
@grant-d and I have been messing about with this, and have arrived at a completely different algorithm that should be around O(N) (where N = bits). It's probably at the limit of optimization unless a vectorized algorithm is used - which is likely overkill.
| Method | Mean | Error | StdDev | Scaled | Allocated |
|-------- |---------:|---------:|---------:|-------:|----------:|
| Jump | 606.3 ns | 2.460 ns | 1.920 ns | 0.73 | 0 B |
| OrigOpt | 755.7 ns | 6.260 ns | 4.887 ns | 0.91 | 0 B |
| Orig | 833.2 ns | 2.639 ns | 2.203 ns | 1.00 | 0 B |
The idea is to make a binary tree, conceptually this:
```c#
class Node {
public Node Zero;
public Node One;
public byte? ZeroByte;
public byte? OneByte;
}
```
You traverse the tree, feeding it one bit at a time until either ZeroByte or OneByte contain a value - at which point you emit the byte and reset to the root of the tree. The actual implementation compresses the tree into an array (as binary trees can be represented in flat arrays) of ushort tuples of length two (zero, one). If the MSB of the ushort is set it signifies a terminal/byte (terminate, emit and return to root); otherwise, it indicates the next index in the array to visit.
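Here's a minimal sketch of that walk over the flattened tree; the two-dimensional jump array is illustrative of the idea rather than the exact packed layout:
```C#
// jump[state, bit] is either the next state index, or (0x8000 | symbol)
// when the MSB marks a terminal.
static int DecodeBits(byte[] src, byte[] dst, ushort[,] jump)
{
    int state = 0, written = 0;
    for (int i = 0; i < src.Length; i++)
    {
        for (int bit = 7; bit >= 0; bit--)  // feed one bit at a time, MSB first
        {
            ushort next = jump[state, (src[i] >> bit) & 1];
            if ((next & 0x8000) != 0)       // terminal: emit and reset to root
            {
                dst[written++] = (byte)next;
                state = 0;
            }
            else
            {
                state = next;
            }
        }
    }
    // real code must additionally validate padding (<= 7 bits, all ones) and EOS
    return written;
}
```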
Interesting. I still suspect the table lookup approach will be faster, though. It's similar to your algorithm, except that you basically feed it a byte at a time instead of a single bit.
I included the dictionary/lookup table in the benchmark; it (the Dict rows below) takes longer than the original:
| Method | Mean | Error | StdDev | Median | Scaled | ScaledSD | Allocated |
|-------- |-----------:|---------:|---------:|-----------:|-------:|---------:|----------:|
| Jump | 759.1 ns | 15.12 ns | 36.81 ns | 744.6 ns | 0.70 | 0.03 | 0 B |
| DictOpt | 1,439.8 ns | 28.75 ns | 42.15 ns | 1,423.1 ns | 1.32 | 0.04 | 0 B |
| Dict | 1,426.7 ns | 18.98 ns | 15.85 ns | 1,425.6 ns | 1.31 | 0.02 | 0 B |
| OrigOpt | 921.9 ns | 18.28 ns | 41.26 ns | 908.7 ns | 0.85 | 0.04 | 0 B |
| Orig | 1,087.5 ns | 18.37 ns | 15.34 ns | 1,083.4 ns | 1.00 | 0.00 | 0 B |
The problem is that the dictionary is hit (including negative tests) for every bit in a symbol after the 4th (5th?) bit. The DFA/jump mechanism avoids this - every test in the tree is a positive test unless an invalid/terminal symbol is encountered.
@jbhensley unless you can find a faster algorithm, feel free to take the jump table and get your first netfx PR.
I just added the array implementation to the test project. In particular this commit.
Benchmark puts it very close to jump, but slightly behind.
| Method | Mean | Error | StdDev | Median | Scaled | ScaledSD | Allocated |
|-------- |-----------:|-----------:|-----------:|-----------:|-------:|---------:|----------:|
| Jump | 617.8 ns | 9.2048 ns | 8.1598 ns | 613.9 ns | 0.65 | 0.03 | 0 B |
| OrigOpt | 796.7 ns | 5.5610 ns | 4.9297 ns | 798.0 ns | 0.84 | 0.03 | 0 B |
| DictOpt | 1,174.0 ns | 1.4705 ns | 1.3035 ns | 1,173.4 ns | 1.24 | 0.05 | 0 B |
| Dict | 1,213.7 ns | 24.2705 ns | 43.7647 ns | 1,207.8 ns | 1.28 | 0.07 | 0 B |
| Orig | 950.6 ns | 18.8892 ns | 37.7237 ns | 925.2 ns | 1.00 | 0.00 | 0 B |
| Array | 650.4 ns | 0.5128 ns | 0.4546 ns | 650.5 ns | 0.69 | 0.03 | 0 B |
I haven't touched the overload public static int Decode(byte[] src, int offset, int count, byte[] dst) yet. Looking at that now. The array method processes one byte at a time, so pushing 4 bytes together with
```C#
uint next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
and then splitting them back up again seems like extra work. I'll see if the array implementation can be optimized by refactoring the calling overload.
@jbhensley sent you a PR with microoptimizations, Array is now _way_ faster - nicely done!
I'm not sure if the decode method with ints is used at all (it is public, so I assumed it was) - the jump mechanism ignores it completely for the byte array overload. Maybe leave it as is and look at the byte array overload independently.
Nice job. With some optimizations on Array I get the following results:
Method | Mean | Error | StdDev | Scaled | Allocated |
-------- |---------:|----------:|----------:|-------:|----------:|
Jump | 605.8 ns | 4.494 ns | 3.752 ns | 0.68 | 0 B |
OrigOpt | 739.6 ns | 10.508 ns | 9.830 ns | 0.84 | 0 B |
Orig | 884.4 ns | 12.059 ns | 10.070 ns | 1.00 | 0 B |
Array | 562.3 ns | 4.022 ns | 3.565 ns | 0.64 | 0 B |
Nice. I'll take a look at your changes.
Correction, even better :-)
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Jump | 643.3 ns | 12.645 ns | 12.986 ns | 0.72 | 0.02 | 0 B |
OrigOpt | 739.5 ns | 15.968 ns | 15.683 ns | 0.83 | 0.02 | 0 B |
Orig | 889.0 ns | 8.995 ns | 8.414 ns | 1.00 | 0.00 | 0 B |
Array | 498.6 ns | 3.865 ns | 3.615 ns | 0.56 | 0.01 | 0 B |
My changes are in my fork. Please pull my latest (if you want), then I can start working on yours directly.
Nice work. Great to see the results here!
Re this code:
```C#
int value = s_decodingArray[arrayIndex, workingByte];
// if the value is positive then we have a pointer into the encoding table
if (value > -1)
{
var entry = s_encodingTable[value];
if (entry.bitLength > validBits)
break; // we only found a value by incorporating bits beyond the valid remaining length of the data stream
decodedBits = s_encodingTable[value].bitLength;
return value; // the index is also the value
}
```
You don't need to use s_encodingTable here at all. Just put the bitLength into the decoding table and you can just do one lookup on it.
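For example (hypothetical packing; the unified Decode shown later in the thread does essentially this):
```C#
// One lookup: low byte carries the decoded value, second byte the bit length.
int packed = s_decodingArray[arrayIndex, workingByte];
int bitLength = packed >> 8;  // length in the second LSB
int value = packed & 0xFF;    // value in the LSB
```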
Good point. I'll make that change.
I still want to take a crack at integrating public static int Decode(byte[] src, int offset, int count, byte[] dst) and public static int Decode(uint data, int validBits, out int decodedBits) or at least rewiring the former to take advantage of how the latter is now processing.
Re the construction of next on every loop iteration: I agree, there's probably a win here. See my comment above for suggestions: https://github.com/dotnet/corefx/issues/31751#issuecomment-417535191
@grant-d I think I bungled the merge from your fork since I had pulled @jcdickinson's changes earlier and that created conflicts with yours. Accidentally pushed to you while trying to resolve. I assume you can just roll that back. Sorry.
@grant-d Nevermind, I got it worked out now. Sorry for the confusion.
@jbhensley I fixed a bug in my changes to Array - it wasn't passing all units.
Also a couple more optimizations on your code.
Fixes pushed to my fork.
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Jump | 619.8 ns | 12.096 ns | 19.185 ns | 0.71 | 0.02 | 0 B |
OrigOpt | 736.1 ns | 10.946 ns | 9.141 ns | 0.84 | 0.01 | 0 B |
Orig | 876.1 ns | 3.694 ns | 3.275 ns | 1.00 | 0.00 | 0 B |
Array | 472.2 ns | 4.820 ns | 4.025 ns | 0.54 | 0.00 | 0 B |
@geoffkizer @jbhensley
You don't need to use s_encodingTable here at all. Just put the bitLength into the decoding table and you can just do one lookup on it.
I tried that, it somehow ended up being slower.
@jbhensley I have submitted a PR to your repo.
Perf is now 0.47, and all units pass.
Awesome. I'm working on refactoring both Decode overloads into one, but will probably have to pick it back up after the holiday weekend.
I tried that, it somehow ended up being slower.
I'm surprised by that. Can you post the code?
I'm enjoying following this but I'm curious @geoffkizer whether you expect time spent in this code to be material to real scenarios?
@geoffkizer In addition to your comments regarding this section:
```C#
while (i < count)
{
uint next = (uint)(src[i] << 24 + lastDecodedBits);
next |= (i + 1 < src.Length ? (uint)(src[i + 1] << 16 + lastDecodedBits) : 0);
next |= (i + 2 < src.Length ? (uint)(src[i + 2] << 8 + lastDecodedBits) : 0);
next |= (i + 3 < src.Length ? (uint)(src[i + 3] << lastDecodedBits) : 0);
```
There's another issue to address. If the caller passes in a count less than the source buffer size, this code may read from bytes that it's not technically supposed to. If the caller passes in some arbitrarily large buffer with a count of 2, I assume we should honor that and not load from bytes i + 2 and i + 3.
@danmosemsft I can't speak for @geoffkizer, but our efforts appear to be turning up flaws in the current implementation logic, aside from the perf aspect. See my immediately preceding comment
@danmosemsft There's some reason to believe it could matter, but I really don't know if it does matter in real scenarios. Our HTTP2 implementation is not ready for real perf analysis.
@jbhensley
If the caller passes in a count less than the source buffer size, this code may read from bytes that it's not technically supposed to.
Yeah, good catch! It seems like it would be better to use Span here. That would both simplify the code a bit and enforce the bounds.
Ok, I have implemented a unified Decode that integrates both overloads. Check it out in the test project here.
I believe that I have eliminated all of the issues we were discussing. My initial objective was creating something that worked correctly and was easy (as possible) to understand. After that, I tried to eliminate a few variables and clean up. Also tried to include enough comments to help the next guy looking at it because this can be a bit difficult to follow.
The decoding array is being built with a Lazy<T> right now, but that can be easily changed to a collection initializer.
Also, I added a test that loops 1 million times. Each pass, it generates an array of bytes of random length up to 255. It populates the bytes with random values, encodes them, decodes them and verifies that you get back what you put in. Figured that might help us turn up oddball combinations/edge cases.
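Roughly like this, as a sketch with a fixed seed; HuffmanEncode is a hypothetical test helper and the asserts are xUnit:
```C#
// Round-trip fuzz test: random input -> encode -> decode -> compare.
var rng = new Random(12345);  // fixed seed keeps failures reproducible
for (int iteration = 0; iteration < 1_000_000; iteration++)
{
    var input = new byte[rng.Next(1, 256)];  // random length up to 255
    rng.NextBytes(input);
    byte[] encoded = HuffmanEncode(input);   // hypothetical helper
    var output = new byte[input.Length];
    int decodedLength = Huffman.Decode(encoded, 0, encoded.Length, output);
    Assert.Equal(input.Length, decodedLength);
    Assert.Equal(input, output);
}
```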
Tests still needed to ensure it handles invalid inputs correctly.
@geoffkizer Your thoughts on the rewrite and how far we should go trying to micro-optimize this? In the trade-off between optimization and readability/maintainability, I tend to err on the side of maintainability unless there's a proven reason to do otherwise. Also, is there any concern about the amount of memory we're using for the decoding array? Maybe put it up as a formal PR to enlist more feedback?
@jbhensley can you maybe make that a ReadOnlySpan<byte> instead - it's more intentional.
Also, maybe use short[,] instead of int[,] - gets us down to ~7.5k runtime overhead (still a concern)
Can you remove the Lazy property (s_decodingArray) entirely and evaluate it just outside the loop in both method bodies instead.
```C#
var decodingArray = s_decodingArrayLoader.Value; // <-- Cache Lazy.Value
var srcSpan = new ReadOnlySpan<byte>(src, offset, count); // <--- Note RoS
...
while (sourceIndex < srcSpan.Length)
{
...
var decodedValue = decodingArray[arrayIndex, workingByte]; // <--- Note cached Lazy.Value
```
The latter changes get us down to 0.36 from 0.43, which is close to a 3x improvement!
Now:
Method | Mean | Error | StdDev | Scaled | Allocated |
------- |---------:|---------:|---------:|-------:|----------:|
Orig | 904.3 ns | 7.200 ns | 6.012 ns | 1.00 | 0 B |
Array | 322.0 ns | 2.072 ns | 1.837 ns | 0.36 | 0 B |
Was:
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
Orig | 880.9 ns | 12.820 ns | 11.365 ns | 1.00 | 0.00 | 0 B |
Array | 378.2 ns | 2.610 ns | 2.441 ns | 0.43 | 0.01 | 0 B |
Awesome work @grant-d. Big gains for simple changes. I'll incorporate this and update.
No problem. Another small change.
```C#
for (int i = 0; i < 4; i++)
{
uint workingByte = data >> (8 * (3 - i)) & 0xFF; // 24, 16, 8, 0
```
can be expressed more succinctly (& without the mult) as
```C#
for (int i = 24; i >= 0; i -= 8) // 24, 16, 8, 0 (i must be int: a byte can never go negative, so the loop would not terminate)
{
uint workingByte = data >> i & 0xFF;
```
Similar change on the other overload's loop.
Your thoughts on the rewrite and how far we should go trying to micro-optimize this? In the trade-off between optimization and readability/maintainability, I tend to err on the side of maintainability unless there's a proven reason to do otherwise.
Readability and maintainability are critical.
As mentioned above, we don't really know how much of an impact Huffman decoding perf has in real world scenarios. As such, micro-optimization doesn't make sense right now. The focus should be on changes that make the code obviously better (for perf or correctness) without introducing additional complexity.
We can always revisit and micro-optimize later, if we get data that shows it's worth it.
@geoffkizer what about the 7.5k static runtime table size? It seems like a lot for a deep internal component.
As an aside, it's debatable whether maintainability is affected by unrolling here. This code:
```C#
for (var i = 0; i < 4; i++)
{
<stuff>
}
```
becomes this:
```C#
for (var i = 0; i < 4; i++)
{
Refactored(i, ...);
}
[AggressiveInline]
private static Refactored(int i, …) => <stuff>;
```
becomes this:
```C#
Refactored(0, ...);
Refactored(1, ...);
Refactored(2, ...);
Refactored(3, ...);
[AggressiveInline]
private static Refactored(int i, …) => <stuff>;
```
That being said, I do agree that it's premature to do this right now.
Just to be clear, the Decode method as it currently stands is below in full. There is only one now, although a separate overload still exists in the test project.
I also share the concern regarding memory consumption. We can cut that in half by changing the decoding table to a byte array (currently is a short array), but that requires a separate lookup from the encoding table to get the bit length since byte is not big enough to carry both the decoded value and bit length in one.
Other than that, it feels like we are at the end of this. Anything else or should we offer up a PR?
```C#
public static int Decode(byte[] src, int offset, int count, byte[] dst)
{
// input validation
if (src == null || count == 0)
return 0;
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (offset >= src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if (count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
if (dst == null)
throw new ArgumentNullException(nameof(dst));
// let's narrow things down to just the part of the source buffer that we've been asked to decode
var sourceSpan = new ReadOnlySpan<byte>(src, offset, count);
int sourceIndex = 0;
int destinationIndex = 0;
int bitOffset = 0; // symbol bit patterns do not necessarily align to byte boundaries. need to keep track of our offset within a byte
var decodingArray = s_decodingArrayLoader.Value;
while (sourceIndex < sourceSpan.Length)
{
int arrayIndex = 0; // index into the decoding array
int decodedBits = 0;
// decoding a symbol will take a maximum of four bytes (longest symbol is represented by a 30 bit pattern)
for (int i = 0; i <= 24; i += 8)
{
// grab the next byte and push it over to make room for bits we have to borrow
byte workingByte = (byte)(sourceSpan[sourceIndex] << bitOffset);
// borrow some bits if we need to
if (bitOffset > 0)
workingByte |= (byte)((sourceIndex < sourceSpan.Length - 1 ? sourceSpan[sourceIndex + 1] : 0) >> (8 - bitOffset));
// key into array
int decodedValue = decodingArray[arrayIndex, workingByte];
// negative values are a pointer to the next decoding array
if (decodedValue < 0)
{
// move to the next source byte and prepare to key in again
sourceIndex++;
arrayIndex = -decodedValue;
}
else
// we have decoded a character
{
decodedBits = decodedValue >> 8; // length is in the second LSB
decodedValue &= 0xFF; // value is in the LSB
// Destination is too small.
if (destinationIndex == dst.Length)
throw new HuffmanDecodingException();
// store the decoded value and increment destinationIndex
dst[destinationIndex++] = (byte)decodedValue;
// if we decoded more bits than we are borrowing then advance to next byte
if (decodedBits - i >= 8 - bitOffset)
sourceIndex++;
// calculate and re-align bitOffset
bitOffset += decodedBits;
bitOffset &= 7; // same as modulo 8
break;
}
}
// we checked four bytes and did not come up with a valid bit pattern
if (decodedBits == 0)
throw new HuffmanDecodingException();
// check for padding in the last byte before we loop back around and try to decode again
if (sourceIndex == sourceSpan.Length - 1)
{
// see if all of the remaining bits are padding
int paddingMask = (0x1 << (8 - bitOffset)) - 1;
if ((sourceSpan[sourceIndex] & paddingMask) == paddingMask)
{
// if our bitOffset is zero then all of the bits in this byte are set
// "A padding strictly longer than 7 bits MUST be treated as a decoding error."
// https://httpwg.org/specs/rfc7541.html#rfc.section.5.2
if (bitOffset == 0)
throw new HuffmanDecodingException();
// otherwise (padding less than 8), just break out of the main loop because we are done
break;
}
}
}
return destinationIndex; // destinationIndex is incremented each time a value is decoded, so it also represents total decoded byte count
}
```
Thanks for checking. Only one small change - could you expand the 2nd line here into a full nested if instead? It's hard to parse, which makes maintaining and understanding it hard, especially since only the true case does anything useful.
```C#
if (bitOffset > 0)
workingByte |= (byte)((sourceIndex < sourceSpan.Length - 1 ? sourceSpan[sourceIndex + 1] : 0) >> (8 - bitOffset));
```
I don't see the link to the PR (?) but I noticed this code above (maybe not yours)
```c#
if (offset < 0)
throw new ArgumentOutOfRangeException(nameof(offset));
if (offset >= src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if (count < 0)
throw new ArgumentOutOfRangeException(nameof(count));
if (count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
```
The compiler and JIT do not coalesce these exceptions. It is less code if you write eg
```c#
if (offset < 0 || offset >= src.Length || count < 0 || count > src.Length - offset)
throw new ArgumentOutOfRangeException(nameof(count));
```
I see more of this pattern.
Also, many places we cast to uint to combine comparisons, e.g., change if (offset < 0 || offset >= src.Length || count < 0 || count > src.Length - offset) to if ((uint)offset >= (uint)src.Length || (uint)count > (uint)(src.Length - offset))
Since src.Length can't be negative, you can eliminate two of the compares using this pattern
```C#
if ((uint)offset >= (uint)src.Length)
throw new ArgumentOutOfRangeException(nameof(offset));
if ((uint)count > (uint)(src.Length - offset))
throw new ArgumentOutOfRangeException(nameof(count));
```
Thanks @danmosemsft and @saucecontrol, The PR is up here.
@AndyAyersMS is there already an issue for the JIT to do the "uint cast" trick above for free? It seems like the pattern would be fairly consistent (and so hopefully easy to recognize) - like offset < 0 || offset >= src.Length
what about the 7.5k static runtime table size
This isn't a problem if it's needed, but it would be good to understand why the table is so large and whether it actually needs to be that large.
dotnet/coreclr#16363
To step back here for a minute:
There have been a lot of issues discussed above. I think these fall into the following categories:
(1) Correctness. This is the most important. I have a PR out for one correctness issue here: dotnet/corefx#32043. This PR includes a bunch of unit tests as well. Any additional changes will need to merge with this PR and pass the new tests.
Beyond that, there's at least one other correctness issue I'm aware of -- reading past count. This needs to get fixed as well. My suggestion here is to change Decode so it takes a ReadOnlySpan<byte>, which effectively makes it impossible to read beyond the end of the span.
Are there other correctness issues that were identified that I missed?
(2) Incremental perf improvements. There was some discussion of small changes that make the existing code better and would apply regardless of the actual decoding algorithm. I'm not sure what happened with these? If there are any changes that make sense here, we should make them before making big algorithmic changes. This ensures we have a reasonable baseline for comparing the different algorithmic approaches.
(3) Algorithmic changes. I think we've got some good data about the different approaches, but I don't know quite where all of this ended up. We need to understand the different choices here and the tradeoffs involved (e.g. added memory usage, table construction costs, etc).
Also, it would be good to get the benchmarking code added to the repo. This will be useful in the future to validate we are not regressing and to evaluate any future changes here.
Additionally:
The tests I added in the PR linked above are reasonably comprehensive, but they are missing some invalid cases. In particular, we need to generate all cases of a truncated encoding and ensure that the decoder correctly rejects these cases.
(2) - I optimized the original code; here are the results (scaled 0.84). Do you want me to send a separate PR for that?
Method | Mean | Error | StdDev | Scaled | ScaledSD | Allocated |
-------- |---------:|----------:|----------:|-------:|---------:|----------:|
CodeOpt | 736.1 ns | 10.946 ns | 9.141 ns | 0.84 | 0.01 | 0 B |
Orig | 876.1 ns | 3.694 ns | 3.275 ns | 1.00 | 0.00 | 0 B |
I agree that correctness can be fixed aside from any algorithmic change. Unit tests from dotnet/corefx#32043 have been incorporated and are passing. I'll be updating the PR shortly. There was one correction to the code that the tests turned up for invalid inputs.
For perf, it looks like @grant-d has managed to get the original code to scale at 0.84 as per above. The last test with the array method is at 0.36 as per here.
I'm not sure how we attempt to answer whether the trade-off of memory for speed is worth it in this scenario.
Before we take any PR that changes the algorithm here, we need to do a few things:
(1) Ensure we have comprehensive tests. My PR has a bunch of tests, and I think they are reasonably comprehensive for valid cases. But we need more comprehensive testing of invalid cases. This includes (a) incomplete final bit sequences and (b) EOS appearing anywhere in the encoding. I think that covers all the possible invalid sequences.
(2) Submit a PR for the benchmark code. We should review the code and ensure it's measuring useful things, and we should get it in the repo so that we can use it to validate future changes here.
I can work on that. Hopefully have something before Monday.
One other thought on the algorithmic approach here:
The "jump" algorithm that @jcdickinson posted above could be modified to process more than a single bit at a time. In particular, 4 bits seems like a reasonable number to process in a single operation. This would require 16 different transitions per starting state (versus 2 for a single bit), and since there are 256 starting states, this is probably reasonable in terms of table size.
Since the minimum encoding size is 5 bits, there can be at most one decoded value emitted for every 4 bits walked, so for each transition we need to know (a) what the final tree state is after walking the 4 bits, and (b) whether a symbol was emitted or not.
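Each transition entry might look something like this (a hypothetical layout; since the shortest code is 5 bits, at most one symbol can complete per 4-bit step):
```C#
// Precomputed per-(state, nibble) transition for a 4-bits-at-a-time DFA.
struct NibbleTransition
{
    public ushort NextState;  // tree state after consuming the 4 bits
    public bool Emitted;      // did a symbol complete inside these 4 bits?
    public byte Symbol;       // the completed symbol, when Emitted is true
}

// The decode loop then takes one table hit per nibble instead of per bit.
static int Step(int state, int nibble, NibbleTransition[,] table, byte[] dst, ref int written)
{
    NibbleTransition t = table[state, nibble];
    if (t.Emitted)
        dst[written++] = t.Symbol;
    return t.NextState;
}
```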
Maybe @jcdickinson and/or @grant-d can work on further optimization of the "jump" method.
I submitted PR dotnet/corefx#32190, which adds a test for emitting EOS in the middle of an otherwise valid sequence.
As for dotnet/corefx#32124, I feel like we should just close this one out as it's not clear at the moment which direction this is going to go and the open PR is probably just tying up other resources.
@jbhensley @geoffkizer I'm happy to attempt improvements on "jump" (really a DFA) to work over nibbles or bytes if there is interest in optimizing this further.
@jcdickinson I think that would be interesting to try, yeah. I would guess that nibbles are the sweet spot; doing bytes seems like it would result in a really large table.
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"Lidl","Harbour Rd, Portishead, Bristol BS20 7DE, UK","shopping","+44 800 977 7766","http://www.lidl.co.uk/","","51.4858439","-2.7633696","","ChIJEeX3EIntcUgR4EsSxlQvIw8"
"National Trust - Tyntesfield","Wraxall, Tyntesfield BS48 1NX, United Kingdom","generic","+44 344 800 4966","http://www.nationaltrust.org.uk/tyntesfield/","","51.44182619999999","-2.718278599999999","","ChIJYTkuzCDzcUgRsnkpeIdf8oI"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"The Posset Cup","3, Mustard Way, Portishead BS20 7QZ, UK","bar","+44 1275 848008","https://www.jdwetherspoon.com/pubs/all-pubs/england/somerset/the-posset-cup-portishead","","51.48571630000001","-2.7624962","","ChIJpehLHYntcUgRE-w5YuDMwmA"
"Toys R Us","Centaurus Rd, Patchway, Bristol BS34 5TU, UK","shopping","+44 117 959 1430","http://www.toysrus.co.uk/?utm_source=google&utm_medium=local&utm_campaign=gmb%20ToysRUsBristolCribbsCauseway3606","","51.528624","-2.596471","","ChIJkVnr2IGRcUgRxhA5pLK7lec"
"Asda Patchway Supercentre","Highwood Ln, Patchway, Bristol BS34 5TL, UK","shopping","+44 117 317 2400","http://storelocator.asda.com/store/patchway-supercentre","","51.5306145","-2.597049600000001","","ChIJL3f-H56RcUgR6l06m6xXMzc"
"Toys R Us","Centaurus Rd, Patchway, Bristol BS34 5TU, UK","shopping","+44 117 959 1430","http://www.toysrus.co.uk/?utm_source=google&utm_medium=local&utm_campaign=gmb%20ToysRUsBristolCribbsCauseway3606","","51.528624","-2.596471","","ChIJkVnr2IGRcUgRxhA5pLK7lec"
"Smyths Toys Superstores","5, A, Cribbs Causeway Retail Park, Lysander Rd, Patchway, Bristol BS34 5TX, UK","shopping","+44 117 959 0055","http://www.smythstoys.com/","","51.52624339999999","-2.600264","","ChIJ8QRHo4CRcUgR5c96odklLJE"
"B&M Home Store","3 Centaurus Rd, Patchway, Bristol BS34 5TS, UK","shopping","+44 330 838 9530","https://www.bmstores.co.uk/stores/bristol-patchway-530","","51.5272919","-2.5973424","","ChIJDVQq7YGRcUgRLa0ly8_ABp0"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"Little Harp","Elton Rd, Clevedon BS21 7RH, UK","bar","+44 1275 343739","https://www.oldenglishinns.co.uk/our-locations/the-little-harp-clevedon","","51.43759530000001","-2.864774","","ChIJlbwvRfXwcUgR2CkZghuzo30"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48688259999999","-2.7653411","","ChIJS215oIntcUgRPZbO7CuAaec"
"The Bath Priory","Weston Rd, Bath BA1 2XT, United Kingdom","lodging","+44 1225 331922","http://www.thebathpriory.co.uk/","","51.3900945","-2.382454000000001","","ChIJxSy0uEiBcUgRBWDUuWOJVsE"
"The Halcyon Hotel Apartments, Henry Street","8 Henry St, Bath BA1 1JT, UK","lodging","+44 1225 585100","http://www.thehalcyon.com/","","51.3799113","-2.3576342","","ChIJ05aKlhGBcUgRcn9OzuqwEFQ"
"Sub13 - new gin collection bar","4 2ee, Edgar Buildings, George St, Bath BA1 2DU, UK","bar","+44 1225 466667","http://www.sub13.net","","51.3849994","-2.3624013","","ChIJq6q2oRSBcUgRs6kN6clmP80"
"Colonna & Small's","6 Chapel Row, Bath BA1 1HN, United Kingdom","cafe","+44 7766 808067","http://www.colonnaandsmalls.co.uk/","","51.3827777","-2.3644163","","ChIJMUKdQBSBcUgRvz8IiqG_Fm8"
"The Bertinet Bakery","1 New Bond St Pl, Bath BA1 1BH, United Kingdom","restaurant","+44 1225 445531","http://www.bertinet.com/","","51.3829721","-2.360359700000001","","ChIJG_nQlxOBcUgRMCvWK8NtvwU"
"The Royal Crescent Hotel","16 Royal Crescent, Bath BA1 2LS, United Kingdom","lodging","+44 1225 823333","https://www.royalcrescent.co.uk/?utm_source=google&utm_medium=local&utm_campaign=hotel","","51.387348","-2.368125999999999","","ChIJZUo6omqBcUgRBxusGzy3GoI"
"Thermae Bath Spa","The Hetling Pump Room, Hot Bath St, Bath BA1 1SJ, United Kingdom","generic","+44 1225 331234","http://www.thermaebathspa.com/","","51.3803643","-2.3615092","","ChIJ7b7VUBGBcUgRI-5CyCMdPsM"
"Kirsty Martin","Martcombe Road, Bristol, England, BS20, United Kingdom","generic","+44 7453 655952","","","51.474918","-2.70991","Contacts",""
"Kirsty Martin","Martcombe Road, Bristol, England, BS20, United Kingdom","generic","+44 7453 655952","","","51.474918","-2.70991","Contacts",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead Working Mens Club","Slade Road, Portishead, BS20 6BW, United Kingdom","generic","","","","51.484703248025","-2.7693352316805","Facebook",""
"Royal Victoria Park","Marlborough Ln, Bath BA1 2NQ, UK","generic","+44 1225 394041","http://visitbath.co.uk/things-to-do/royal-victoria-park-p25701","","51.3857067","-2.3708195","Facebook","ChIJNbSuIRWBcUgRDWNdo793S88"
"Swindon Oasis","Swindon, United Kingdom","generic","","","","51.48171056","-2.52573798","Facebook",""
"Oldbury Court Estate","Oldbury Ct Rd, Fishponds, Bristol BS16 2JH, UK","generic","+44 117 922 2000","https://www.bristol.gov.uk/museums-parks-sports-culture/oldbury-court-estate","","51.48785659999999","-2.529845399999999","Facebook","ChIJb9eUOiCQcUgRrq22DD_ICk8"
"Riverside Holiday Village","Bridgwater Rd, Bleadon, Weston-super-Mare BS24 0AN, UK","camping","+44 1934 812342","http://www.westcountryparks.co.uk/riverside-holiday-village/","","51.3069719","-2.9592683","Facebook","ChIJbRYKqW0CckgRixFozu7irpM"
"Blaise Castle","Bristol BS10 7QT, UK","generic","+44 117 903 9818","http://www.friendsofblaise.co.uk/","","51.5025659","-2.637383899999999","Facebook","ChIJfdUWIGaScUgRpZyQr13zOTk"
"McDonald's","4, Portishead Retail Park, Wyndham Way, Portishead, Bristol, Somerset BS20 7BY, United Kingdom","restaurant","+44 1275 844792","http://www.mcdonalds.co.uk/","","51.4841309","-2.765955","","ChIJHaSp8ontcUgRM0wmUgMp6l0"
"High Down Junior School","Down Rd, Portishead, Bristol BS20 6DY, UK","school","+44 1275 843969","http://www.highdownjunior.co.uk/","","51.4801769","-2.7872175","Facebook","ChIJHSDTGQfycUgR1FNO3N3KgWc"
"Parish Wharf Leisure Centre","Harbour Rd, Portishead, Bristol BS20 7DD, UK","generic","+44 1275 848494","http://www.placesforpeopleleisure.org/centres/parish-wharf-leisure-centre/","","51.48686459999999","-2.7653641","","ChIJS215oIntcUgRPZbO7CuAaec"
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"The Fleece","12 St Thomas St, Bristol BS1 6JJ, UK","generic","+44 117 945 0996","http://www.thefleece.co.uk/","","51.45230189999999","-2.589315399999999","Facebook","ChIJ_TD91XuOcUgRjMrlhE2_y5o"
"The Bristol Hot Air Baloon Fiesta","Bristol, BS41 9, United Kingdom","generic","","","","51.441254277308","-2.6433306349142","Facebook",""
"Weston-super-Mare","Weston-super-Mare, United Kingdom","generic","","","","51.3458","-2.96778","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Brean Leisure Park","Coast Rd, Brean, Burnham-on-Sea TA8 2QY, UK","generic","+44 1278 751595","http://www.brean.com/","","51.2851298","-3.009675","Facebook","ChIJo-wata8BckgRnLpSPWqfLiY"
"Brean Leisure Park","Coast Rd, Brean, Burnham-on-Sea TA8 2QY, UK","generic","+44 1278 751595","http://www.brean.com/","","51.2851298","-3.009675","Facebook","ChIJo-wata8BckgRnLpSPWqfLiY"
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Portishead, North Somerset","Portishead, United Kingdom","generic","","","","51.4844","-2.77028","Facebook",""
"Redcliffe Bay Seafront","249 Down Road Bristol England BS20 8HY United Kingdom","generic","","","","51.476458095","-2.8004557133333","Facebook",""
"Weston Big Wood","25 Spring Rise Bristol England BS20 6RE United Kingdom","generic","","","","51.475398524608","-2.7761074471516","Facebook",""
"Welcome Family Holiday Park","Dawlish Warren, Dawlish, South Devon EX7 0PH, United Kingdom","camping","+44 345 165 6265","https://www.welcomefamily.co.uk/","","50.6001833","-3.4476097","Facebook","ChIJ73pY0UQKbUgRNpirpCc6-tQ"
"Welcome Family Holiday Park","Dawlish Warren, Dawlish, South Devon EX7 0PH, United Kingdom","camping","+44 345 165 6265","https://www.welcomefamily.co.uk/","","50.6001833","-3.4476097","Facebook","ChIJ73pY0UQKbUgRNpirpCc6-tQ"
"Love Hate ","9 Brendon Road Bristol England BS20 6DJ United Kingdom","home","","","","51.482314952313","-2.78711266877407","Https://goo.gl/photos/pMYTnCir#What#Https://assets.contentstack.io#Youporn#Likes#Meee#Porn hub#Yup#Restaurant#🤑🚘#07538 552769#36 Hours In...Bath#07 38 552769#Bus stop#Checkins#Friend#Freeman#Contacts#36 Hours In...Bath#C3#Facebook#Busstopd#Hotel#Dmap",""
"Lower Down","Portishead, Bristol BS20 6EB, UK","bowling","","","","51.48093799999999","-2.784538","Dan","ChIJ_xac6AfycUgRKmgffCqQHME"
"358999/07/718825/8","thanks","generic","07358552769","","","undefined","undefined","🤑🚘",""
"See","12-23 Gaunts Cl, Portishead, Bristol BS20 8BL, UK","casino","07502220198",".com","Gay","51.4779400858475","-2.79685295939177","Paije",""
I doubt that makes sense. The HPACK Huffman encoding is a static encoding that's specific to HPACK.
I really enjoyed the conversation and cooperation between you devs to optimize those little things. Just here to say thanks. 😶
Thank you Open source. Thank you Microsoft
@mafshin any interest in ongoing involvement? We have plenty of new API, fixes, and perf work up for grabs. We would be glad to have you.
(Note for those marked both up-for-grabs and api-needs-work the task is to make an API proposal for review. If api-approved the task is to implement the API.)
Triage: The idea is described above: https://github.com/dotnet/corefx/issues/31751#issuecomment-419618551
Hi @geoffkizer, I was hooked by this and gave it a try. I built a decoding table and ran benchmarks for several lookup-bit sizes. I managed to bring the tables (static, pregenerated in source code, as a one-dimensional array of ushort) down to the sizes below:
|bytes|lookup bits|
|-|-|
|1008|2|
|1472|3|
|1728|4|
|3904|5|
|5248|6|
|7936|7|
|7680|8|
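For scale: a single k-bit level of ushort entries takes only (1 << k) * 2 bytes on its own, so the totals above must include the extra levels needed for codes longer than k bits. A rough sanity check, under my own assumption that the tables are built from full 2^k-entry levels:

```C#
// My assumption: the one-dimensional ushort[] is built from full 2^k-entry
// levels, one root level plus overflow levels for codes longer than k bits.
static int LevelBytes(int lookupBits) => (1 << lookupBits) * sizeof(ushort);

// LevelBytes(8) == 512, so the 7680-byte total above would correspond to
// 7680 / 512 == 15 such levels.
```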
I have also tried to optimize the current algorithm, as I quite like its elegance and simplicity.
I created a micro benchmark with BenchmarkDotNet, with the following results:
| Method | Mean | Error | StdDev | Op/s | Ratio |
|----------------- |------------:|----------:|----------:|-------------:|------:|
| Current | 1,822.29 ns | 13.807 ns | 12.240 ns | 548,760.2 | 1.00 |
| CurrentOptimized | 1,050.57 ns | 5.903 ns | 5.522 ns | 951,867.1 | 0.58 |
| LookupBits2 | 1,674.67 ns | 13.353 ns | 12.491 ns | 597,132.4 | 0.92 |
| LookupBits3 | 1,213.27 ns | 19.844 ns | 18.562 ns | 824,219.2 | 0.66 |
| LookupBits4 | 1,126.86 ns | 15.168 ns | 13.446 ns | 887,419.7 | 0.62 |
| LookupBits5 | 922.54 ns | 8.767 ns | 8.201 ns | 1,083,960.3 | 0.51 |
| LookupBits6 | 762.84 ns | 7.241 ns | 6.419 ns | 1,310,899.4 | 0.42 |
| LookupBits7 | 642.59 ns | 6.567 ns | 5.821 ns | 1,556,201.7 | 0.35 |
| LookupBits8 | 641.35 ns | 4.952 ns | 4.632 ns | 1,559,205.3 | 0.35 |
The benchmark decodes the following strings, which I hoped would be close to common usage:
The benchmark shows that there is a tradeoff between the size of the static decoding table and decoding speed.
It also shows that the optimized version of the current algorithm can compete with the lookup tables.
I do not feel I have enough context to decide which way to proceed, as I am not aware of how costly an additional 8 KB static table is.
From my point of view I would go with CurrentOptimized, for the following reasons:
Would you recommend proceeding with a PR for CurrentOptimized?
This commit has the draft version of the proposed changes. Still to do are the benchmark and one-time 'Monte Carlo' tests comparing the results of the existing and new implementations.
@karelz Can you please assign me to this issue?
It would be good to see perf results that are not in the ~1us range; this is very short and the results are not necessarily representative in general. I'd suggest adding more input to your test.
Assuming that the test is representative, we would want to have this perf test in the tree in order to measure future changes/improvements.
I can't comment on any of the "lookup bits" variations without seeing the code.
If you have an improvement on the existing approach, then we would certainly consider taking that. It would be good to understand which changes are responsible for the improvements, and to what extent.
I was using BenchmarkDotNet for those benchmarks. The tool computes the number of invocations of the benchmark method needed to reach sufficient precision. The default behavior, which I used, is described here:
> `--iterationTime` - desired time of execution of an iteration. Used by Pilot stage to estimate the number of invocations per iteration. 500ms by default.
I also tried running the body of the benchmark method 1e6 times, with statistically identical results.
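For reference, the shape of such a BenchmarkDotNet micro benchmark is roughly the following; `HuffmanCurrent` and `HuffmanLut8` are hypothetical names standing in for the two implementations, and the input bytes are a placeholder:

```C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class HuffmanDecodingBenchmark
{
    // Placeholder input; the real benchmark decoded several Huffman-encoded
    // header strings intended to resemble common traffic.
    private static readonly byte[] s_encoded = { /* Huffman-encoded bytes */ };
    private readonly byte[] _dst = new byte[4096];

    [Benchmark(Baseline = true)]
    public int Current() => HuffmanCurrent.Decode(s_encoded, _dst);   // hypothetical

    [Benchmark]
    public int LookupBits8() => HuffmanLut8.Decode(s_encoded, _dst);  // hypothetical
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<HuffmanDecodingBenchmark>();
}
```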
I plan to create and submit a PR for the benchmark into https://github.com/dotnet/aspnetcore/blob/master/src/Servers/Kestrel/perf/Kestrel.Performance/HPackDecoderBenchmark.cs as it already seems to have another HPack-related perf benchmark.
Below are links to 'draft' commits where you can see the implementations of the different algorithms.
Current optimized
https://github.com/rokonec/runtime/commit/2c350184c2ace484f2350a8428d3d64426d5a3b2
The main speed improvement comes from replacing the lines at
https://github.com/rokonec/runtime/blob/50ac454d8d8a1915188b2a4bb3fff3b81bf6c0cf/src/libraries/Common/tests/System/Net/Http/HuffmanDecoder.cs#L52-L56 with a simple load and shift.
Instead of shifting and rotating 5 bytes for every Huffman code, I preload data into a UInt64 variable and extract the value with a simple shift based on the count of bits left in that buffer:
```C#
// We need exactly 32 bits of code; if fewer than 32 bits are left in the
// source, the missing bits are filled with zeroes from the LSB up.
uint data = (uint)(bitsInAcc > 32 ? acc >> (bitsInAcc - 32) : acc << (32 - bitsInAcc));
```
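Fleshed out a little, the technique looks something like the sketch below; the names and the refill loop are my own illustration of the approach described above, not the commit's actual code:

```C#
// Sketch of the accumulator technique (illustrative names). bitsInAcc is
// decremented elsewhere as decoded codes are consumed.
static uint Peek32Bits(ReadOnlySpan<byte> src, ref int i, ref ulong acc, ref int bitsInAcc)
{
    // Refill: top up the 64-bit accumulator one byte at a time while it fits.
    while (bitsInAcc <= 56 && i < src.Length)
    {
        acc = (acc << 8) | src[i++];
        bitsInAcc += 8;
    }

    // Extract exactly 32 bits; when fewer than 32 bits remain in the source,
    // the missing low-order bits are left as zeroes.
    return (uint)(bitsInAcc > 32 ? acc >> (bitsInAcc - 32) : acc << (32 - bitsInAcc));
}
```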
Decoding by lookup table
https://github.com/rokonec/runtime/commit/52229bbc3f948b6ae384e313e77c2120ce67ec69
The static lookup table is omitted for clarity.
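To make the shape of the lookup concrete, here is a sketch of decoding one symbol against a multi-level 8-bit table. The `BitReader` helper and the ushort entry packing (high bit set = pointer to a next-level table; otherwise code length in bits 8..14 and the symbol in the low byte) are assumptions for illustration, not necessarily the commit's layout:

```C#
// Sketch of decoding one symbol against a multi-level 8-bit table.
// BitReader (with Peek8/Advance) and the entry packing are assumptions
// for illustration, not necessarily the linked commit's layout.
static byte DecodeSymbol(ushort[] lut, ref BitReader reader)
{
    const ushort NextTableFlag = 0x8000; // high bit: entry is a table pointer

    int table = 0; // the first-level 256-entry table lives at offset 0
    while (true)
    {
        int index = reader.Peek8();            // next 8 bits, zero-padded
        ushort entry = lut[table * 256 + index];

        if ((entry & NextTableFlag) != 0)
        {
            reader.Advance(8);                 // a full 8 bits were consumed
            table = entry & 0x7FFF;            // descend to the next level
            continue;
        }

        // Leaf: bits 8..14 hold the code length, the low byte the symbol.
        reader.Advance((entry >> 8) & 0x7F);
        return (byte)entry;
    }
}
```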
We are generally not shy about going the extra mile for perf. I don't think your LUT code is concerning.
I am personally fine with an 8K LUT, but we've recently put a lot of work into reducing assembly size and I would understand pushback. @stephentoub, any thoughts here?
If it is a problem, can the LUT be generated quickly at runtime? If yes, then I say let's go with the LookupBits8 version.
It would be useful to know how frequently Huffman-coded headers are used and how much they impact perf of a request. @JamesNK do you have any data here for gRPC?
All the gRPC spec headers are small, so I doubt anyone is Huffman-encoding them. And they're common between requests, so gRPC headers will be compressed with HPACK.
The LUT could be generated quickly at runtime, in about O(n log n) where n = 257, with a body of only about 10 fast lines.
It should be possible to do it with just one allocation: the destination LUT ushort[].
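For illustration, building the first level at runtime amounts to filling, for each symbol whose code fits in 8 bits, all 2^(8 - len) entries that share the code as a prefix. A sketch under the same assumed entry format as above (`s_codes` and `s_codeLengths` are hypothetical names for the static HPACK Huffman code table from RFC 7541, Appendix B):

```C#
// Sketch: build the first 256-entry level at runtime, using the same assumed
// entry format as above. s_codes and s_codeLengths are hypothetical names for
// the static HPACK Huffman code table (RFC 7541, Appendix B); next-level
// tables for codes longer than 8 bits are omitted.
static ushort[] BuildFirstLevel()
{
    var table = new ushort[256];
    for (int symbol = 0; symbol < 256; symbol++)
    {
        int len = s_codeLengths[symbol];
        if (len > 8)
            continue; // handled by deeper levels (not shown)

        // Left-align the code within 8 bits; every combination of the
        // remaining low bits decodes to the same symbol.
        int prefix = s_codes[symbol] << (8 - len);
        for (int fill = 0; fill < (1 << (8 - len)); fill++)
            table[prefix + fill] = (ushort)((len << 8) | symbol);
    }
    return table;
}
```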
> The LUT could be generated quickly at runtime, in about O(n log n) where n = 257, with a body of only about 10 fast lines. It should be possible to do it with just one allocation: the destination LUT ushort[].
Let's just do this then.
Related: Any plans for Huffman encoding? With an optimized implementation, there is probably a header-size threshold above which Huffman-encoding large headers makes sense.
I created pull request https://github.com/dotnet/runtime/pull/43603 about 2 hours ago with the 8-bit LUT.
@JamesNK I would like to benchmark those large headers; can I have some please?
> Related: Any plans for Huffman encoding? With an optimized implementation, there is probably a header-size threshold above which Huffman-encoding large headers makes sense.
No immediate plans, but it's something to consider.
I thought we had an issue for this, but apparently not...
> I thought we had an issue for this, but apparently not...
It was closed; IIRC we weren't happy with the CPU vs. bandwidth tradeoff.