Go: encoding/json: performance slower than expected

Created on 11 Jun 2013 · 36 comments · Source: golang/go

by reid.write:

STR:
1. clone the git repository here: git@github.com:tarasglek/jsonbench.git
2. Generate some sample JSON data using the instructions in the README
3. Run the go json benchmark which is in gojson/src/jsonbench/json.go

What is the expected output?
I expected to see performance roughly in line with Java (using Jackson for json
parsing). On my test machine, the Java benchmark results in the following:
Processing 436525928 bytes took 5491 ms (75.82 MB/s)

What do you see instead?
Significantly slower performance, using the same input file:
Duration: 27.497043818s, 15.14 MB/s

Which compiler are you using (5g, 6g, 8g, gccgo)?
Not sure

Which operating system are you using?
Linux dell-ubuntu 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 x86_64
x86_64 x86_64 GNU/Linux

Which version are you using?  (run 'go version')
go version go1.1 linux/amd64
Labels: FrozenDueToAge, Performance

All 36 comments

Comment 1:

If you want us to investigate further, please attach a data file and a program.
I tried looking in your github repo but I couldn't figure out how to generate a data
file (enable telemetry means nothing to me) and it's probably good for us to have the
same data file anyway. 
I did look at the sources. Your Java program is just tokenizing the JSON, not building a
data structure. Go is actually building a data structure for you. So you are comparing
apples and oranges.
A more equal comparison would be to have Go unmarshal into
var x struct{}
dec.Decode(&x)
which will parse the JSON but throw away all the data. 
In practice, you should be parsing into a real struct anyway. I think you'll find that
case is much nicer in Go than in Java, and probably competitive in speed.

_Status changed to WaitingForReply._

Comment 2 by reid.write:

Thanks for the feedback!
You're right, it's much faster with the unmarshaling approach you recommended:
Duration: 10.034308451s, 47.98 MB/s
I will update the Java test to be more apples to apples.  The Java version is basically
only doing JSON Validation, so I'll modify it to parse the actual document structure.
In the meantime, I've added a new make target `make sample-data` that will generate a
~500MB test file, as well as a few more make targets to simplify running the various
tests.
Again, Go's performance when actually parsing the objects is:
Duration: 35.149537439s, 14.27 MB/s
For reference, here is the performance for a few of the other languages/runtimes on my
machine:
Javascript (spidermonkey):
39.550669327341836MB/s 525822000bytes in 12.679s
Python (simplejson):
37 MB/s 525831000bytes in 13seconds
C++ (rapidjson):
163 MB/s 525831000 bytes in 3 seconds

Comment 3:

@reid - is there anything left to do on this ticket?

Comment 4 by reid.write:

I've made changes to the Java benchmark to make for a more apples-to-apples comparison. 
Go is still significantly slower than most of the other versions, and it should be
fairly straightforward to actually run the tests now (make sample-data; make java; make
python; make go)
If someone is able to look into improving json parsing performance, that would be much
appreciated!

Comment 5:

If you don't publish your benchmark data and methodology (code), then your complaints
aren't actionable.
Anyone working on this needs to be making the same comparison. Saying "make it faster"
is not helpful.

Comment 6:

It's right there in the first line, isn't it?
https://github.com/tarasglek/jsonbench

Comment 7 by reid.write:

The motivation for filing this issue is that we are logging terabytes of Firefox
performance data in (compressed) JSON format and we'd really like to be able to process
& analyze this data set using a modern language like Go instead of Java / C++.
Here are the latest steps to reproduce:
1. clone (or update) the git repository here: git clone
git@github.com:tarasglek/jsonbench.git
2. Generate some sample JSON data: make sample-data
3. Run the go json benchmark: make go
Optionally, you can run some of the other language benchmarks using "make python", "make
java", etc.
Here are the results I get on my test machine:
make go: 14.36 MB/s in 34.932531708s
make python_simplejson: 39 MB/s 525831000bytes in 12seconds
make java: 48.82 MB/s 525822000 bytes in 10.27 s
Note: the byte count is different for Java because it does not count end-of-line characters.

Comment 8:

You are measuring something that is likely not relevant to the final
program, namely unmarshaling into a generic data type
(map[string]interface{}).
Almost any Go program doing JSON for data structure processing unmarshals
into a struct. Even if the struct lists every field in the input, that form
will be more compact in memory and therefore faster to create.
Russ

Comment 9 by runner.mei:

I have the same feeling. I found the problem in json/stream.go, as follows:
func (enc *Encoder) Encode(v interface{}) error {
    if enc.err != nil {
        return enc.err
    }
    enc.e.Reset()
    err := enc.e.marshal(v)
    if err != nil {
        return err
    }

    // Terminate each value with a newline.
    // This makes the output look a little nicer
    // when debugging, and some kind of space
    // is required if the encoded value was a number,
    // so that the reader knows there aren't more
    // digits coming.
    enc.e.WriteByte('\n')

    // ---------------------------- here ----------------------------------
    if _, err = enc.w.Write(enc.e.Bytes()); err != nil {
        enc.err = err
    }
    return err
}
In general, "w" is already a buffer, so the Write call marked above performs an unnecessary memory copy.

Comment 10:

_Labels changed: added go1.3maybe._

Comment 11:

_Labels changed: added release-none, removed go1.3maybe._

Comment 12:

_Labels changed: added repo-main._

is there any work planned on this?

Nobody is working on it.

FWIW, I've resurrected the package in question, added structs so it's not just parsing into a struct{}, and added benchmarks around it that follow the format of the benchmarks in the standard library. Those are here: https://github.com/kevinburke/jsonbench/blob/master/go/bench_test.go
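Benchmarks in the standard-library format follow the `testing.B` pattern; a self-contained sketch of that shape is below (the payload is invented here, not the repo's sample data, and `testing.Benchmark` is used only so the example runs outside `go test`):

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// sample stands in for one record of benchmark input.
var sample = []byte(`{"name":"x","value":1}`)

// benchmarkUnmarshal decodes the payload b.N times, reporting
// throughput via SetBytes, in the style of the stdlib json benchmarks.
func benchmarkUnmarshal(b *testing.B) {
	b.SetBytes(int64(len(sample)))
	for i := 0; i < b.N; i++ {
		var v struct {
			Name  string `json:"name"`
			Value int    `json:"value"`
		}
		if err := json.Unmarshal(sample, &v); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function directly and
	// returns its result (iterations, ns/op, and so on).
	res := testing.Benchmark(benchmarkUnmarshal)
	fmt.Println(res)
}
```
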

CL https://golang.org/cl/24466 mentions this issue.

I have started working on fixes on the benchmark itself.
@tarasglek I'm waiting for review and merge before submitting more benchmark fixes.

@dolmen @tarasglek, it looks like @kevinburke fixed the Go benchmark over at https://github.com/kevinburke/jsonbench/blob/master/go/bench_test.go. Would it be possible to merge that fix if it's good, and close this issue, so people don't think Go JSON parsing is really slow?

@kevinburke do you recall if your new bench_test.go was comparable to Java, Python, and Node versions in terms of speed?

JSON parsing has gotten faster, but I don't know its current performance vs. Node or Java or Python. In addition to the CL linked above and overall improvements to the speed of runtime, some other improvements have been merged to improve the performance in the fast path in some cases.

It looks like the benchmark at tarasglek/jsonbench now has a Go script, so you should be able to generate comparisons for your specific case.

I would also point out there are tools for generating custom serializers and deserializers which should allow you to greatly exceed the performance of the generic JSON library for any specific object.


As an example, I added a fastjson benchmark to https://github.com/kevinburke/jsonbench/blob/master/go/bench_test.go and got the following results on Go tip:

$ GOMAXPROCS=1 go test -bench=Unmarshal
goos: linux
goarch: amd64
pkg: github.com/kevinburke/jsonbench/go
BenchmarkUnmarshal               500       2715217 ns/op      35.23 MB/s      462379 B/op       7256 allocs/op
BenchmarkUnmarshalFastjson     10000        205919 ns/op     464.57 MB/s         132 B/op          0 allocs/op
BenchmarkUnmarshalReuse          500       2808449 ns/op      34.06 MB/s      435406 B/op       7220 allocs/op

The code for the added benchmark:

func BenchmarkUnmarshalFastjson(b *testing.B) {
        if codeJSON == nil {
                b.StopTimer()
                codeInit()
                b.StartTimer()
        }
        b.ReportAllocs()
        var p fastjson.Parser
        for i := 0; i < b.N; i++ {
                if _, err := p.ParseBytes(codeJSON); err != nil {
                        b.Fatal("Unmarshal:", err)
                }
        }
        b.SetBytes(int64(len(codeJSON)))
}

The primary benefit I see from having a "faster" library is that it will reduce the number of third-party JSON parsers.

These parsers prioritize speed over simplicity, safety, compatibility, and security. It would be nice not to encounter the same 10-25 packages with the same micro-benchmark leaflets every so often. Deleting them from a codebase requires a lot of patience and is often time-consuming even with special-purpose tooling for the task.

Change https://golang.org/cl/122459 mentions this issue: encoding/json: tweak the size of the safe tables

Change https://golang.org/cl/122460 mentions this issue: encoding/json: encode struct field names ahead of time

Change https://golang.org/cl/122462 mentions this issue: encoding/json: remove alloc when encoding short byte slices

Change https://golang.org/cl/122468 mentions this issue: encoding/json: various minor speed-ups

Change https://golang.org/cl/122467 mentions this issue: encoding/json: defer error context work until necessary

Change https://golang.org/cl/125418 mentions this issue: encoding/json: prepare struct field names as []byte

Change https://golang.org/cl/125416 mentions this issue: encoding/json: simplify the structEncoder type

Change https://golang.org/cl/125417 mentions this issue: encoding/json: inline fieldByIndex

Change https://golang.org/cl/131401 mentions this issue: encoding/json: remove a branch in the structEncoder loop

Change https://golang.org/cl/131400 mentions this issue: encoding/json: avoid some more pointer receivers

I think the encoder is reasonably optimized - there is no more low-hanging fruit that I can see. It should get considerable improvements once the compiler gets better at certain things like inlining and removing nil/bounds checks.

The decoder has plenty of room to improve, though. It seems to me like the single biggest win would be #16212. The encoder already prepares the reflect work upfront and caches it globally by type.

I wonder if there's a point to leaving such a broad issue open. Though I do think that issues to improve decoding performance, such as the one mentioned above, would be useful. Perhaps even one to track the overall decoding slowness. But this issue seems too broad to be useful at this point.

@mvdan, can you post a benchmark summary here of Go 1.11 vs Go tip after your various changes?

I'm fine closing this and opening more specific issues. Just cross-reference them to this.

Of course. I don't have 1.11 installed yet (I know), but barely any performance work happened in the package during the 1.11 cycle, so these 1.10 vs tip numbers should still be useful.

name           old time/op    new time/op    delta
CodeEncoder-4    7.43ms ± 0%    5.35ms ± 1%  -28.01%  (p=0.002 n=6+6)
CodeDecoder-4    30.8ms ± 1%    27.3ms ± 0%  -11.42%  (p=0.004 n=6+5)

name           old speed      new speed      delta
CodeEncoder-4   261MB/s ± 0%   363MB/s ± 1%  +38.91%  (p=0.002 n=6+6)
CodeDecoder-4  62.9MB/s ± 1%  70.9MB/s ± 1%  +12.71%  (p=0.002 n=6+6)

name           old alloc/op   new alloc/op   delta
CodeEncoder-4    91.9kB ± 0%    62.1kB ± 0%  -32.38%  (p=0.002 n=6+6)
CodeDecoder-4    2.74MB ± 0%    2.74MB ± 0%   -0.04%  (p=0.010 n=6+4)

name           old allocs/op  new allocs/op  delta
CodeEncoder-4      0.00           0.00          ~     (all equal)
CodeDecoder-4     90.3k ± 0%     77.5k ± 0%  -14.09%  (p=0.002 n=6+6)

Also note that lots of other changes happened to improve performance in earlier releases, such as @kevinburke's table lookups for the encoder. So the speedup should be much higher if we compared tip with an even older Go release.

Closing as discussed above - discussion will continue on other json performance issues, after being cross-referenced to this one.
