name old time/op new time/op delta
CodeEncoder-4 7.43ms ± 0% 5.35ms ± 1% -28.01% (p=0.002 n=6+6)
CodeDecoder-4 30.8ms ± 1% 27.3ms ± 0% -11.42% (p=0.004 n=6+5)
name old speed new speed delta
CodeEncoder-4 261MB/s ± 0% 363MB/s ± 1% +38.91% (p=0.002 n=6+6)
CodeDecoder-4 62.9MB/s ± 1% 70.9MB/s ± 1% +12.71% (p=0.002 n=6+6)
Notice, however, that decoding is about five times slower than encoding. While it makes sense for decoding to be more work, a 5x difference suggests there are bottlenecks.
Here are the top 10 encoding functions by CPU, as reported by go tool pprof on the profile from go test -run=- -bench=CodeEncoder -benchtime=5s -cpuprofile=cpu.out:
flat flat% sum% cum cum%
3.93s 10.84% 10.84% 36.07s 99.50% encoding/json.structEncoder.encode
3.45s 9.52% 20.36% 3.92s 10.81% strconv.formatBits
3.39s 9.35% 29.71% 4.80s 13.24% strconv.(*extFloat).ShortestDecimal
2.50s 6.90% 36.61% 2.50s 6.90% runtime.memmove
2.41s 6.65% 43.26% 3.18s 8.77% encoding/json.(*encodeState).string
2.11s 5.82% 49.08% 2.11s 5.82% strconv.fmtF
1.78s 4.91% 53.99% 3.40s 9.38% bytes.(*Buffer).WriteString
1.36s 3.75% 57.74% 2s 5.52% bytes.(*Buffer).WriteByte
1.34s 3.70% 61.43% 1.34s 3.70% bytes.(*Buffer).tryGrowByReslice (inline)
1.32s 3.64% 65.08% 2.31s 6.37% bytes.(*Buffer).Write
The encoder has been optimized to the point where bytes.Buffer, runtime.memmove and strconv dominate the top 10; I can't see any low-hanging fruit left there.
And here are the top 10 decoding functions, as reported by go tool pprof on the profile from go test -run=- -bench=CodeDecoder -benchtime=5s -cpuprofile=cpu.out:
flat flat% sum% cum cum%
3190ms 10.74% 10.74% 3240ms 10.91% encoding/json.stateInString
3150ms 10.60% 21.34% 8260ms 27.80% encoding/json.(*Decoder).readValue
2990ms 10.06% 31.40% 7660ms 25.78% encoding/json.(*decodeState).scanWhile
1990ms 6.70% 38.10% 21120ms 71.09% encoding/json.(*decodeState).object
1730ms 5.82% 43.92% 1860ms 6.26% encoding/json.stateEndValue
1550ms 5.22% 49.14% 2110ms 7.10% encoding/json.state1
1350ms 4.54% 53.69% 1350ms 4.54% encoding/json.unquoteBytes
1020ms 3.43% 57.12% 1080ms 3.64% encoding/json.stateBeginValue
920ms 3.10% 60.22% 3120ms 10.50% encoding/json.indirect
720ms 2.42% 62.64% 740ms 2.49% encoding/json.stateBeginString
The big difference here is that all 10 functions are in the json package itself. In particular, the scanner's state* functions take up half of the list, including number one. And numbers two and three, readValue and scanWhile, are also part of scanning tokens.
There's another issue about speeding up the decoder, #16212. It's about doing reflect work ahead of time, like the encoder does. However, out of that top 10, only object and indirect involve any reflect work, so I doubt that a refactor for #16212 would bring substantial gains while the scanner takes vastly more time than reflection.
So I propose that we refactor or even redesign the scanner to make it faster. The main bottleneck at the moment is the step function, which must be called for every decoded byte. I can imagine that it's slow for a few main reasons:
- step is a function value, which can be nil - one nil check per byte
- the state* functions are very small - one call per byte

The godoc for the step field reads:
// The step is a func to be called to execute the next transition.
// Also tried using an integer constant and a single func
// with a switch, but using the func directly was 10% faster
// on a 64-bit Mac Mini, and it's nicer to read.
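For context, here's a simplified sketch of the step-function design. The names mirror scanner.go, but the bodies are illustrative, not the real implementation:

```go
package main

import "fmt"

const (
	scanContinue = iota // uninteresting byte
	scanError           // hit a syntax error
)

type scanner struct {
	// step is called once for every input byte and returns a scan code.
	// It's a function value that can be nil, hence the per-byte nil
	// check mentioned above.
	step func(*scanner, byte) int
}

// stateInString is the state after reading an opening '"'.
func stateInString(s *scanner, c byte) int {
	switch {
	case c == '"':
		s.step = stateEndString
	case c == '\\':
		s.step = stateInStringEsc
	case c < 0x20:
		return scanError // control characters must be escaped
	}
	return scanContinue
}

// stateInStringEsc is the state after a backslash inside a string.
// (The real scanner validates the escape character; elided here.)
func stateInStringEsc(s *scanner, c byte) int {
	s.step = stateInString
	return scanContinue
}

// stateEndString is a stand-in for the state after the closing quote.
func stateEndString(s *scanner, c byte) int { return scanContinue }

func main() {
	s := &scanner{step: stateInString}
	for _, c := range []byte(`hello \" world"`) {
		if s.step(s, c) == scanError { // one indirect call per byte
			fmt.Println("syntax error")
			return
		}
	}
	fmt.Println("ok")
}
```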
I tried using a function switch, and even a function jump table, but neither gave a noticeable speed-up. In fact, both made CodeDecoder 1-5% slower. I presume the manual jump table via a [...]func(*scanner, byte) int array is slow because we still can't get rid of the nil checks. I'd hope that #5496 would help here.
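For reference, here's a sketch of what such a jump table could look like (hypothetical code, not the actual experiment; the state constants are made up):

```go
package main

const (
	scanContinue = iota
	scanError
)

const (
	stInString = iota // made-up state constants, for illustration only
	stInStringEsc
	numStates
)

// The scanner stores an integer state and steps through an array of
// functions indexed by that state, instead of holding a function value.
type scanner struct{ state int }

var stepTable = [numStates]func(*scanner, byte) int{
	stInString:    stateInString,
	stInStringEsc: stateInStringEsc,
}

func stateInString(s *scanner, c byte) int {
	switch {
	case c == '\\':
		s.state = stInStringEsc
	case c < 0x20:
		return scanError
	} // closing-quote transition elided
	return scanContinue
}

func stateInStringEsc(s *scanner, c byte) int {
	s.state = stInString
	return scanContinue
}

func scan(data []byte) int {
	s := new(scanner)
	for _, c := range data {
		// Even with the table, each byte still pays for an indirect
		// call through a possibly-nil function value.
		if stepTable[s.state](s, c) == scanError {
			return scanError
		}
	}
	return scanContinue
}

func main() { _ = scan([]byte(`abc"`)) }
```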
I have a few ideas in mind to try to make the work per byte a bit faster, but I'm limited by the "one unknown step call per byte" design, as I tried to explain above. I understand that the current design makes the JSON scanner simpler to reason about, but I wonder if there's anything we can do to make it faster while keeping it reasonably simple.
This issue is to track small speed-ups in the scanner, but also to discuss larger refactors to it. I'm milestoning for 1.13, since at least the small speed-ups can be merged for that release.
/cc @rsc @dsnet @bradfitz @josharian @kevinburke
Change https://golang.org/cl/150919 mentions this issue: encoding/json: use uint offsets in the decoder
Here's a potential idea. Quoted strings are very common in JSON, so we could make the scanner tokenise those with specialised code. Only a few states are involved in strings, and they're hot paths - for example, stateInString and stateBeginString both appear in the top 10 above. I think this would give a very noticeable speed-up, since we could avoid nil checks and inline much of the work.
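To make this concrete, such a fast path could look roughly like this (a hypothetical sketch, not code from any CL):

```go
// scanString consumes a quoted string starting at data[i], where
// data[i] == '"', and returns the index just past the closing quote, or
// -1 if the string is malformed or unterminated. Unlike the step
// functions, it advances through many bytes per call with no function
// values involved, so the common case stays in one tight loop.
func scanString(data []byte, i int) int {
	i++ // skip the opening quote
	for i < len(data) {
		switch c := data[i]; {
		case c == '"':
			return i + 1
		case c == '\\':
			i += 2 // skip the escape; real code must validate it
		case c < 0x20:
			return -1 // control characters must be escaped
		default:
			i++
		}
	}
	return -1 // unterminated string
}
```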
Of course, the disadvantage is that we break with the "one step call == one byte advanced" rule. I think we need to break that rule, if we hope to make the scanner significantly faster.
I had a realisation yesterday: the decoder scans the input bytes twice, in both Decoder.Decode and Unmarshal. This makes the scanner even more of a bottleneck, since all of its per-byte work is done twice.
On the bright side, this means we can introduce cheap speed-ups without redesigning the scanner. The tokenization happens during the second scan, at which point there can't be any syntax errors. https://go-review.googlesource.com/c/go/+/151042 uses this for a shortcut, since we can iterate over a string literal's bytes in a much faster way. Forty lines of code give a ~7% speed-up, which I think is worth it.
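As a rough illustration of the kind of shortcut this enables (a hypothetical simplification, not CL 151042 itself): since the first scan has already validated the input, the second pass only needs to find the closing quote.

```go
// rescanString advances past the string literal starting at data[off],
// where data[off] == '"'. A previous scan already validated the input,
// so a closing quote is guaranteed to exist before the end of data and
// we don't need to re-run the full state machine.
func rescanString(data []byte, off int) int {
	off++ // opening quote
	for {
		switch data[off] {
		case '"':
			return off + 1
		case '\\':
			off += 2 // escapes were validated by the first scan
		default:
			off++
		}
	}
}
```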
Change https://golang.org/cl/151042 mentions this issue: encoding/json: speed up string tokenization
The new decoder will need to be fuzzed.
Agreed. I don't think anyone has been fuzzing the encoding/json package lately. I can take a look at that during the 1.13 cycle, once I'm done with this batch of decoding optimizations.
Change https://golang.org/cl/151157 mentions this issue: encoding/json: avoid work when unquoting strings
Here are the final numbers after the three CLs above:
name old time/op new time/op delta
CodeDecoder-4 31.1ms ± 0% 26.3ms ± 0% -15.58% (p=0.002 n=6+6)
name old speed new speed delta
CodeDecoder-4 62.3MB/s ± 0% 73.8MB/s ± 0% +18.45% (p=0.002 n=6+6)
I'll leave it here during the freeze - looking forward to feedback and reviews once the 1.13 tree opens. So far, I haven't redesigned or restructured the scanner, I've simply introduced shortcuts in the decoder.
If these kinds of optimizations are welcome, I might look into more of them for 1.13 once this first batch is reviewed.
Change https://golang.org/cl/170939 mentions this issue: encoding/json: remove a bounds check in readValue
Change https://golang.org/cl/172918 mentions this issue: encoding/json: index names for the struct decoder
Change https://golang.org/cl/190659 mentions this issue: encoding/json: avoid work when unquoting strings, take 2