Go: runtime: performance degradation for versions > 1.4.x

Created on 2 Feb 2016 · 19 comments · Source: golang/go

This issue is a follow-up to a thread that started on the mailing list here:
https://groups.google.com/forum/#!topic/golang-dev/sfwfQTu7eM4

We recently reverted back to Go 1.4.3 for the most recent release of InfluxDB after testing against 1.5.x and collecting feedback from both our own benchmarks and the experiences of InfluxDB users. There were significant performance hits with both 1.5.x and 1.6beta1, across a variety of dimensions. For example, with write performance, we saw a 33% reduction in overall throughput, coupled with a 26% increase in average response time. We expected that the GC improvements would lead to a reduction in response times, but that hasn't proven to be the case. It's far from exhaustive, but here are the findings from a medium-scale write test:

| version | points/sec | response time |
| --- | --- | --- |
| Go 1.4.3 | 398,747 | 33.97ms |
| Go 1.5.3 | 300,750 | 43.12ms |
| Go 1.6rc1 | 315,545 | 39.44ms |

As requested in the mailing list thread, we ran the test with GODEBUG=gctrace=1 and recorded the output alongside our log messages for context. It doesn't look like the GC-specific lines should be too hard to grep out if our other log lines are too noisy.

We also ran with cpuprofile and memprofile on, and collected the output. We've included results for these tests on all three versions of Go listed above, and provided the compiled binaries as well. You can find them in this Google Drive folder:
https://drive.google.com/open?id=0B3YrTka8Ai_McUNCak1YSW9ZM28
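
(The profiles above were presumably collected through the standard runtime/pprof machinery; for anyone wanting to reproduce a comparable capture in a standalone program, a minimal sketch follows. The file names and the doWork placeholder are assumptions, not InfluxDB code.)

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// CPU profile: sample for the duration of the run.
	cpu, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer cpu.Close()
	if err := pprof.StartCPUProfile(cpu); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	doWork() // placeholder for the workload being measured

	// Heap profile: snapshot live allocations at the end of the run.
	heap, err := os.Create("mem.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer heap.Close()
	runtime.GC() // force a collection so the heap statistics are current
	if err := pprof.WriteHeapProfile(heap); err != nil {
		log.Fatal(err)
	}
}

func doWork() {}
```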

I hope this is enough information to assist in debugging the underlying issue. Definitely let me know if there are any other tests I can run to help. We're looking forward to getting to _better_ performance than Go 1.4.3!



All 19 comments

gctrace reports that you have 8 HW threads (GOMAXPROCS=8) and that the GC is constraining the heap to half a gigabyte. Allowing the GC to grow the heap to fill available RAM may solve this problem. Perhaps setting GOGC=1600 and rerunning these numbers will help.
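
(For reference, GOGC can also be set programmatically via runtime/debug; the environment variable is the simpler route for experiments like this. A minimal sketch using the 1600 value suggested above:)

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=1600: the collector will
	// not trigger a new cycle until the heap has grown by 1600% over the
	// live data retained after the previous GC, trading memory for fewer
	// collections.
	old := debug.SetGCPercent(1600)
	fmt.Println("previous GOGC value:", old)
}
```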


@RLH I'll try that now with 1.6rc1 and report back. Thanks!

@RLH So, some preliminary results. Interesting.

Using GOGC=1600, I got the following:

| version | points/sec | response time |
| --- | --- | --- |
| Go 1.4.3 | 496,271 | 24.67ms |
| Go 1.6rc1 | 452,760 | 17.08ms |

So, the response time for both came down, with a greater improvement on 1.6rc1. It's now in line with what I'd expect for the new GC, but we're still seeing better performance with 1.4.3 on overall throughput. Does the GOGC parameter behave the same way on 1.4.x as it does on later versions?

Yes, GOGC behaves the same.


Does GODEBUG=gcstoptheworld=1 help?

The unrelated discussion concerning tsm1.Cache needs to be moved to its own issue.

On Tue, Feb 2, 2016 at 1:20 PM, INADA Naoki [email protected]
wrote:

Sorry for being slightly off-topic.

It seems tsm1.Cache is GC-unfriendly: there are many pointers in the cache. Each value type has a time time.Time field, and time.Time contains a *Location pointer, so changing time time.Time to unixnano int64 may reduce GC cost.

I wrote a quick benchmark:

    func BenchmarkCacheFloatEntries(b *testing.B) {
        for i := 0; i < b.N; i++ {
            cache := NewCache(10000)
            for j := 0; j < 10000; j++ {
                v := NewValue(time.Unix(1, 0), float64(j))
                cache.Write("test", []Value{v})
            }
        }
    }

before:

    $ go test -run=xxx -bench=BenchmarkCache
    PASS
    BenchmarkCacheFloatEntries-4         300       4019495 ns/op

after:

    $ go test -run=xxx -bench=BenchmarkCache
    PASS
    BenchmarkCacheFloatEntries-4         500       3720449 ns/op

patch: https://gist.github.com/methane/59c8777907b10dc983a1
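
(To illustrate the kind of change being suggested, here is a minimal sketch; the struct and field names are placeholders, not the actual tsm1 code. Replacing the time.Time field with an int64 nanosecond timestamp removes a pointer from every cached value, which the GC would otherwise have to follow.)

```go
package main

import "time"

// Before: each value carries a time.Time, which embeds a *Location
// pointer, so every cached value is something the GC must scan.
type floatValueBefore struct {
	time  time.Time
	value float64
}

// After: storing the timestamp as Unix nanoseconds makes the struct
// pointer-free, so large slices of values are cheap for the GC to scan.
type floatValueAfter struct {
	unixnano int64
	value    float64
}

func main() {
	now := time.Now()
	before := floatValueBefore{time: now, value: 1.0}
	after := floatValueAfter{unixnano: now.UnixNano(), value: 1.0}
	_, _ = before, after
}
```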


It looks like our GC rate in 1.6 is 1.5x what it was in 1.4. Based on your logs, go-1.4.3.txt does 762 GCs in 8.73 minutes = 87 GCs/minute and go-1.6.txt does 856 GCs in 6.35 minutes = 135 GCs/minute. I haven't analyzed the heap sizes yet, but that suggests we're keeping the heap somewhat smaller than 1.4 did, which forces the GC to do more work. It would also be good to compare the scan rates (bytes/sec). Comparing the heap-size CDFs and the scan rates should give us a better sense of whether this is a raw GC throughput problem or a heap trigger problem.
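
(For anyone reproducing this comparison, a rough sketch of counting GC cycles in a gctrace log. The file name and run duration are assumptions, and the prefix check assumes the Go 1.5+ trace format, where each cycle line starts with "gc N @"; application log lines interleaved with the trace are skipped.)

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	// Hypothetical inputs: a log captured with GODEBUG=gctrace=1 and the
	// wall-clock length of the run in minutes.
	const logFile = "gctrace.log"
	const runMinutes = 8.73

	f, err := os.Open(logFile)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	cycles := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "gc ") {
			cycles++
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d GCs in %.2f minutes = %.0f GCs/minute\n",
		cycles, runMinutes, float64(cycles)/runMinutes)
}
```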

We have a similar issue that is preventing us from moving to Go 1.5 on our production services. I have some GODEBUG gctrace output comparing Go 1.4.3 and 1.5.2 under various values of GOGC. Would these be of interest for this issue?

As of today our focus is 1.6 and we won't be changing 1.5. Do you have 1.6 numbers we can compare to your 1.4 and 1.5 numbers? Such numbers would be valuable.


I will work on getting the equivalent 1.6 numbers.

Here are details of a similar degradation we are experiencing that is keeping us on Go 1.4.3. For context, the service I took these measurements from is one instance in a cluster of real-time advertising bidders. It receives bid requests via HTTP and performs two key operations that I timed. The first, "augment", makes a single HTTP call to a Couchbase cluster (using https://github.com/couchbase/go-couchbase) which returns about 5kB of protobuf-encoded data. The second, "evaluate", is a pure Go section of the code that makes no network calls and is heavily optimised to avoid garbage generation. Responses from our service are expected to complete within 50ms.

I ran each test for 10 minutes with gctrace=1 and various values of GOGC. Our instrumentation takes samples every 10 seconds. The service is running on an Amazon c4.xlarge instance with 4 CPUs and 7.5GB RAM. At the time of the Go 1.5 and 1.4.3 tests the service was receiving about 1100 requests per second; when I ran the 1.6 tests the rate was about 1400 per second. We rate this service on this instance type at about 2000 requests per second max (under Go 1.4.3).

| time | Go version | GOGC |
| --- | --- | --- |
| 9:42 | 1.5.2 | 100 |
| 9:52 | 1.5.2 | 200 |
| 10:02 | 1.5.2 | 400 |
| 10:12 | 1.5.2 | 800 |
| 10:22 | 1.4.3 | default |
| 13:56 | 1.6 | 100 |
| 14:06 | 1.6 | 200 |
| 14:16 | 1.6 | 400 |
| 14:26 | 1.6 | 800 |

Charts of timings of the 99.9th percentile (orange) and maximum values (blue) for the augment operation. (Go 1.4.3 is the last section of the top graph, starting at 10:22.)

![augment, Go 1.5](https://cloud.githubusercontent.com/assets/18375/13148113/cebd0972-d653-11e5-8d08-2f97091d7483.png)
![augment, Go 1.6](https://cloud.githubusercontent.com/assets/18375/13148120/d3e3457e-d653-11e5-9002-5d0cd18f5959.png)

| time | 99.9% augment | max augment |
| --- | --- | --- |
| 9:42 | 17.8ms | 216.3ms |
| 9:52 | 12.0ms | 204.9ms |
| 10:02 | 7.9ms | 225.2ms |
| 10:12 | 6.3ms | 134.0ms |
| 10:22 | 5.6ms | 236.0ms |
| 13:56 | 35.8ms | 237.2ms |
| 14:06 | 8.0ms | 226.9ms |
| 14:16 | 5.7ms | 240.6ms |
| 14:26 | 6.5ms | 472.6ms |

And the same for the evaluate operation:

![evaluate, Go 1.5](https://cloud.githubusercontent.com/assets/18375/13148157/f85e0182-d653-11e5-91f0-b393e9976b3b.png)
![evaluate, Go 1.6](https://cloud.githubusercontent.com/assets/18375/13148162/fd38aab8-d653-11e5-969f-af39f9dd1b28.png)

| time | 99.9% evaluate | max evaluate |
| --- | --- | --- |
| 9:42 | 1.5ms | 27.5ms |
| 9:52 | 1.2ms | 8.8ms |
| 10:02 | 1.1ms | 15.7ms |
| 10:12 | 1.1ms | 13.5ms |
| 10:22 | 1.2ms | 141.3ms |
| 13:56 | 1.2ms | 256.0ms |
| 14:06 | 1.1ms | 221.1ms |
| 14:16 | 1.2ms | 234.0ms |
| 14:26 | 1.2ms | 233.3ms |

Overall I would characterise the results as follows: Go 1.5 and 1.6 produce slower timings than 1.4.3, but are more consistent. The 99.9th centiles were larger, but there were fewer instances where the maximum time exceeded our deadlines.

Attached are the gctrace outputs from each of the tests.

go1.4-gogcdefault.txt: https://github.com/golang/go/files/136435/go1.4-gogcdefault.txt
go1.5-gogc100.txt: https://github.com/golang/go/files/136432/go1.5-gogc100.txt
go1.5-gogc200.txt: https://github.com/golang/go/files/136434/go1.5-gogc200.txt
go1.5-gogc400.txt: https://github.com/golang/go/files/136433/go1.5-gogc400.txt
go1.5-gogc800.txt: https://github.com/golang/go/files/136436/go1.5-gogc800.txt
go1.6-gogc100.txt: https://github.com/golang/go/files/136437/go1.6-gogc100.txt
go1.6-gogc200.txt: https://github.com/golang/go/files/136438/go1.6-gogc200.txt
go1.6-gogc400.txt: https://github.com/golang/go/files/136439/go1.6-gogc400.txt
go1.6-gogc800.txt: https://github.com/golang/go/files/136440/go1.6-gogc800.txt

Thanks Ian,

Looking at the raw 1.6 gctrace numbers in go1.6-gogc100, go1.6-gogc200, and go1.6-gogc400, the GC took up to 3% of the CPU depending on the heap size, and the largest stop-the-world pause was less than 3ms. In fact, of the 300 or so GC cycles over the three 10-minute runs, only one showed a stop-the-world pause greater than 2ms, and only a handful were greater than 1ms.

The 10-minute go1.6-gogc800 run had an interesting single anomaly. Only 21 GC cycles happened over the entire run, and all of them except one completed in less than 300ms including the concurrent phases. While the total CPU used by the GC was much less than 1%, there was a single instance (gc 13, 208 seconds into the run) of a 241ms stop-the-world pause. I can't account for this single very large pause.

I'm concluding that the latency problems are not related to the GC.

The only vaguely similar numbers I have seen were when the OS / service provider noticed that the application wasn't fully utilizing its allocated CPU resources and took them away after an asynchronous preemption.

It looks like GOMAXPROCS is set to 4. Was anything else running on the machine that might have reduced the number of HW threads available in some non-deterministic fashion? Of particular interest would be an interruption of that kind lasting around 200ms.

  • Rick


There are other support processes on that instance (e.g. a consul agent) but they have very low CPU utilization and memory usage. GOMAXPROCS was 4 for that service (in 1.5 builds we don't set it at all, so it's left at the Go default).

I saw the interrupt/spike in the GC-time instrumentation that we also capture. I don't know what caused it for sure, but it could have been a refresh of an in-memory data cache that we use as part of our bid prediction logic. I can re-run the test with those settings if you think it is worth it.

Given the analysis you gave, I agree that it's probably not the time spent in GC that is causing the latency. However, there is lower latency when there are fewer GC cycles. Do you think it could be thread contention? Would repeating the tests with different GOMAXPROCS values give any insight?

If a CPU is being taken away, setting GOMAXPROCS to 3 will give the process a free CPU for the other non-Go work. It is worth a try, but it doesn't explain why 1.4 doesn't have this problem.
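
(For completeness, a minimal sketch of making that change programmatically rather than through the GOMAXPROCS environment variable; the value 3 mirrors the suggestion above.)

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Equivalent to starting the process with GOMAXPROCS=3: limit the
	// number of OS threads executing Go code simultaneously, leaving one
	// hardware thread free for other processes on the box.
	prev := runtime.GOMAXPROCS(3)
	fmt.Println("previous GOMAXPROCS:", prev)
}
```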


@iand, CPU profiles may be enlightening. It's not exactly the right tool, but if this is being caused by something in the runtime or the standard libraries, it's possible it would show up as out of place in a profile.

I retested the Go 1.5.2 and 1.6 versions of our service with varying GOMAXPROCS values (3, 6 and 8). GOGC was left at the default. I didn't see any appreciable change in latency. If I squinted hard I might say that the network operation "augment" saw a slight improvement at higher GOMAXPROCS and the CPU operation "evaluate" saw a slight worsening. I can post images of our metrics, but it hardly seems worth the noise.

I took multiple CPU profiles of the service with default settings for GOGC and GOMAXPROCS under Go 1.4.3, 1.5.2 and 1.6. I can make those available privately along with the binaries if you think they will help. Email me at [email protected] and I will share a link to them. I have had a superficial look at them and nothing leaps out as obviously different.

CL https://golang.org/cl/21503 mentions this issue.

I just wanted to revisit this issue and let everyone know that as of v0.13.0 (released today), InfluxDB is now being built with Go 1.6.2.

We did a lot of work over the last couple of months to reduce unnecessary allocations, but the GC performance in Go has also been steadily improving. Based on our own internal testing (influxdata/influxdb#6511) and user feedback (influxdata/influxdb#5217), we're consistently seeing better performance across the board when using Go 1.6.2. This is also without tuning the GOGC parameter, which further improves performance.

I'm going to close this issue out because I don't think it represents an actionable item for the Go team any longer. For anyone else who still sees better performance on Go 1.4.3, I'd recommend working to identify and reduce unnecessary allocations - you'll be happier for it!
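
(As a generic illustration of how to surface allocation counts when hunting for unnecessary allocations; this is not the actual InfluxDB change, and buildKey is a hypothetical function under test. Go's testing package can report allocations per operation:)

```go
package mypkg

import (
	"strconv"
	"testing"
)

// buildKey is a hypothetical function under test; a real investigation
// would target the hot paths identified by CPU and heap profiling.
func buildKey(measurement string, id int) string {
	return measurement + "#" + strconv.Itoa(id)
}

func BenchmarkBuildKey(b *testing.B) {
	b.ReportAllocs() // prints allocs/op and B/op alongside ns/op
	for i := 0; i < b.N; i++ {
		_ = buildKey("cpu", i)
	}
}
```

Running this with go test -bench=BuildKey across Go versions (or before and after a change) makes it easy to see whether allocs/op actually went down.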

Finally, we're seeing even better performance with tip, so we'll be looking forward to Go 1.7 later this year. Thanks to the entire Go team for all of the help and responsiveness along the way!

@toddboom, that's great! Thanks for updating the issue.
