Go: net/http: HTTP/1 server got 0.5µs slower in Go 1.8

Created on 6 Feb 2017 · 35 comments · Source: golang/go

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

Go 1.7.5, 1.8 rc3 and git

What operating system and processor architecture are you using (go env)?

Arch Linux 64bit

What did you do?

go run https://gist.github.com/OneOfOne/4d7e13977886ddab825870bc3422a901
switch terminals and run wrk -c 20 -d 30s http://localhost:8081
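
For context (the gist isn't inlined here), the server under test is presumably a plain hello-world handler along these lines; this is an assumption about the gist's contents, not its exact code:

package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    // Plain hello-world handler; wrk hammers this over keep-alive connections.
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, "Hello, World!")
    })
    // The wrk command above targets http://localhost:8081.
    log.Fatal(http.ListenAndServe(":8081", nil))
}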

What did you expect to see?

Same throughput as 1.7 or faster.

What did you see instead?

go tip fd37b8ccf2

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   195.30us  470.12us  16.30ms   95.11%
    Req/Sec    85.62k     6.00k   95.21k    80.83%
  5110713 requests in 30.01s, 721.35MB read
Requests/sec: 170322.67
Transfer/sec:     24.04MB

go 1.8rc3 758a7281ab

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.49us  451.74us  15.14ms   95.02%
    Req/Sec    85.91k     6.37k   97.60k    83.50%
  5130079 requests in 30.01s, 724.08MB read
Requests/sec: 170941.23
Transfer/sec:     24.13MB

go 1.7.5

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   210.16us  528.53us  14.78ms   94.13%
    Req/Sec    94.34k     4.31k  103.56k    73.83%
  5631803 requests in 30.01s, 794.89MB read
Requests/sec: 187673.03
Transfer/sec:     26.49MB

Labels: FrozenDueToAge, NeedsInvestigation, Performance, help wanted


All 35 comments

Too late for Go 1.8, but we can look into performance during Go 1.9.

Anybody want to investigate the difference? What's a Go 1.7 vs Go 1.8 CPU profile say?

Updated play with pprof: https://play.golang.org/p/GZ4zQOg1Wf

I ran go tool pprof http://localhost:6060/debug/pprof/profile, switched terms and ran wrk -c 20 -d 30s http://localhost:6060/.
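
For reference, wiring up that endpoint only takes the net/http/pprof side-effect import; a fragment of what the updated playground code presumably does (an assumption, the actual code may differ):

import _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux

// With the server listening on :6060 via DefaultServeMux, the
// /debug/pprof/profile endpoint used above becomes available.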

go tip

(pprof) top
36600ms of 80000ms total (45.75%)
Dropped 297 nodes (cum <= 400ms)
Showing top 10 nodes out of 135 (cum >= 970ms)
      flat  flat%   sum%        cum   cum%
   26360ms 32.95% 32.95%    27340ms 34.17%  syscall.Syscall
    2280ms  2.85% 35.80%     5740ms  7.17%  runtime.mallocgc
    1310ms  1.64% 37.44%     1310ms  1.64%  runtime.heapBitsSetType
    1260ms  1.57% 39.01%     1260ms  1.57%  runtime._ExternalCode
    1030ms  1.29% 40.30%     7280ms  9.10%  net/http.(*chunkWriter).writeHeader
     970ms  1.21% 41.51%      970ms  1.21%  runtime.epollwait
     900ms  1.12% 42.64%      920ms  1.15%  runtime.mapiternext
     880ms  1.10% 43.74%      880ms  1.10%  runtime.usleep
     820ms  1.03% 44.76%     1490ms  1.86%  runtime.deferreturn
     790ms  0.99% 45.75%      970ms  1.21%  runtime.mapaccess1_faststr
(pprof) top -cum
27.89s of 80s total (34.86%)
Dropped 297 nodes (cum <= 0.40s)
Showing top 10 nodes out of 135 (cum >= 23.44s)
      flat  flat%   sum%        cum   cum%
     0.01s 0.013% 0.013%     73.46s 91.83%  runtime.goexit
     0.55s  0.69%   0.7%     69.55s 86.94%  net/http.(*conn).serve
     0.30s  0.38%  1.07%     35.91s 44.89%  net/http.(*response).finishRequest
     0.15s  0.19%  1.26%     32.10s 40.12%  bufio.(*Writer).Flush
    26.36s 32.95% 34.21%     27.34s 34.17%  syscall.Syscall
     0.10s  0.12% 34.34%     24.56s 30.70%  net/http.checkConnErrorWriter.Write
         0     0% 34.34%     24.44s 30.55%  net.(*conn).Write
     0.23s  0.29% 34.62%     24.44s 30.55%  net.(*netFD).Write
     0.06s 0.075% 34.70%     23.50s 29.38%  syscall.Write
     0.13s  0.16% 34.86%     23.44s 29.30%  syscall.write

go 1.7.5

(pprof) top
40520ms of 82240ms total (49.27%)
Dropped 281 nodes (cum <= 411.20ms)
Showing top 10 nodes out of 128 (cum >= 860ms)
      flat  flat%   sum%        cum   cum%
   29480ms 35.85% 35.85%    30920ms 37.60%  syscall.Syscall
    2550ms  3.10% 38.95%     5710ms  6.94%  runtime.mallocgc
    1560ms  1.90% 40.84%     1590ms  1.93%  runtime.heapBitsSetType
    1220ms  1.48% 42.33%     1220ms  1.48%  runtime.epollwait
    1050ms  1.28% 43.60%     2750ms  3.34%  runtime.mapassign1
    1050ms  1.28% 44.88%     1080ms  1.31%  runtime.mapiternext
    1000ms  1.22% 46.10%     7890ms  9.59%  net/http.(*chunkWriter).writeHeader
     880ms  1.07% 47.17%     2910ms  3.54%  net/http.DetectContentType
     870ms  1.06% 48.22%     1010ms  1.23%  runtime.mapaccess1_faststr
     860ms  1.05% 49.27%      860ms  1.05%  runtime.futex
(pprof) top -cum
31.67s of 82.24s total (38.51%)
Dropped 281 nodes (cum <= 0.41s)
Showing top 10 nodes out of 128 (cum >= 27.69s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     75.77s 92.13%  runtime.goexit
     0.44s  0.54%  0.54%     74.26s 90.30%  net/http.(*conn).serve
     0.27s  0.33%  0.86%     37.08s 45.09%  net/http.(*response).finishRequest
     0.18s  0.22%  1.08%     36.44s 44.31%  bufio.(*Writer).Flush
     0.25s   0.3%  1.39%     36.26s 44.09%  bufio.(*Writer).flush
    29.48s 35.85% 37.23%     30.92s 37.60%  syscall.Syscall
     0.12s  0.15% 37.38%     27.99s 34.03%  net/http.checkConnErrorWriter.Write
     0.69s  0.84% 38.22%     27.85s 33.86%  net/http.(*conn).readRequest
     0.08s 0.097% 38.31%     27.77s 33.77%  net.(*conn).Write
     0.16s  0.19% 38.51%     27.69s 33.67%  net.(*netFD).Write

Let me know if you want me to do anything specific.

Ouch, I'm seeing the same thing: about a 20% slowdown on my webserver benchmark with 1.8rc3, and a slightly bigger binary too.

I realize it's late in the release cycle, but a 20% regression on HTTP is a huge regression for a lot of people.

I'm willing to help debug if you tell me what's needed @bradfitz.

I'm willing to help debug if you tell me what's needed @bradfitz.

Step 1 is debugging why it got slower. If you find any clues, let me know.

@OneOfOne how does this work for non-hello world applications? Do you see any issues there? Maybe 20% for Hello World apps is acceptable until 1.9?

@bradfitz I think the new Shutdown feature is causing the lower performance for these kinds of 'hello world' apps/tests.

select {
case <-srv.getDoneChan():
    return ErrServerClosed
// ....

Look here.

At 1.7 it was just a for loop.

Can somebody confirm my assumption?

Thanks,
kataras

I am not able to reproduce the OP's results. I'm on a Mac, and I'm using slightly older versions of Go.

$ wrk -c 20 -d 30s http://localhost:8081  # go 1.8 RC2
Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.87ms    6.99ms 151.45ms   99.38%
    Req/Sec    25.22k     2.34k   33.28k    66.94%
  1510655 requests in 30.10s, 213.22MB read
Requests/sec:  50188.34
Transfer/sec:      7.08MB

$ wrk -c 20 -d 30s http://localhost:8081  # go 1.7.4
Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   840.73us    6.85ms 151.46ms   99.41%
    Req/Sec    26.05k     3.67k   33.43k    60.96%
  1560770 requests in 30.10s, 220.29MB read
Requests/sec:  51853.50
Transfer/sec:      7.32MB

There's a slight difference but it's smaller than 20%, almost negligible here.

@kataras, do you have evidence in favor of that theory?

You can confirm it yourself: delete those lines and measure again.

I would be surprised, though.

@kataras That looks like it is probably it. In addition to the select, there is a mutex acquire and an unlock done in a defer (deferred calls recently got sped up, but they are still a bit slower than a direct unlock):

func (s *Server) getDoneChan() <-chan struct{} {
    s.mu.Lock()
    defer s.mu.Unlock() // deferred unlock: slightly costlier than calling s.mu.Unlock() directly
    return s.getDoneChanLocked()
}

Since the accept loop is such a hot path, it may be worth moving the shutdown select into a separate goroutine and using sync/atomic to signal shutdown in the accept loop.
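
A standalone sketch of that suggestion (illustrative only: acceptLoop, done, and handle are hypothetical names, and this is not the actual net/http code):

package main

import (
    "errors"
    "log"
    "net"
    "sync/atomic"
)

var errServerClosed = errors.New("server closed") // stand-in for http.ErrServerClosed

// acceptLoop sketches the idea: one goroutine watches the done channel and
// flips an atomic flag, so the hot accept path pays a single atomic load per
// iteration instead of a mutex-guarded channel read.
func acceptLoop(l net.Listener, done <-chan struct{}, handle func(net.Conn)) error {
    var inShutdown int32
    go func() {
        <-done
        atomic.StoreInt32(&inShutdown, 1)
    }()
    for {
        conn, err := l.Accept()
        if atomic.LoadInt32(&inShutdown) != 0 {
            if err == nil {
                conn.Close()
            }
            return errServerClosed
        }
        if err != nil {
            return err
        }
        go handle(conn)
    }
}

func main() {
    ln, err := net.Listen("tcp", ":8081")
    if err != nil {
        log.Fatal(err)
    }
    done := make(chan struct{}) // a Shutdown path would close(done) to stop accepting
    log.Println(acceptLoop(ln, done, func(c net.Conn) { c.Close() }))
}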

EDIT:
I don't think it's just this; I just tried tip without the select altogether and it adds about 7µs (~5%).

I understand it's too late for 1.8, but why not fix this in a future 1.8.1 release?

Has anyone else even been able to reproduce these results?

-11.2% on Debian 8, amd64 (1.7.5 and rc3)

go 1.7.5

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   457.65us    1.26ms  46.74ms   93.23%
    Req/Sec    65.08k     7.84k   99.65k    73.83%
  3891443 requests in 30.07s, 549.25MB read
Requests/sec: 129397.13
Transfer/sec:     18.26MB

go 1.8rc3

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   350.93us    0.92ms  39.50ms   94.72%
    Req/Sec    57.70k     5.34k   77.31k    74.50%
  3447358 requests in 30.03s, 486.57MB read
Requests/sec: 114800.24
Transfer/sec:     16.20MB

@kataras The implementation is correct, which means the author was careful. It's not that big of a deal; bugs happen in software.

That said, I was wrong anyway: I looked again just now and realized the select only happens when an error occurs during accept. What is odd is how consistently it slowed my program down when I removed it from the accept loop. Maybe it disqualifies that block from some optimization passes or inlining? There may be a problem somewhere causing a slowdown, but it's somewhere else and needs a different setup than mine.

@bradfitz I briefly bisected this, and it seems commit faf882d1d427e8c8a9a1be00d8ddcab81d1e848e is causing at least some of it. I'm seeing a performance hit of roughly 10% on this benchmark with this commit compared to the commit preceding it.

@codido, thanks for the bisect! Yes, and that's the commit I would suspect too. Good to see confirmation. That was one of the biggest architectural changes in the net/http.Server in quite some time.

I never did any benchmarking (or optimizations) after that change.

We can look into it for Go 1.9.

Hi @bradfitz

Just a quick question from an outsider, so please bear with me if I missed something obvious.
Why would it be too late to fix this issue? Isn't this the whole reason for release candidates? To find final major issues before releasing the next major version?

If not, please educate me.

Also, thanks for working on this project, I know a lot of people who love this language and the community around it. 🎉

Why are people responding and voting as if this is a major issue? This change seems to have a worst-case performance impact of ~40µs per request. That sounds very low to me; is there a real-world scenario where this would matter?

(edit: I was wrong, it's ~0.5µs per request, so even lower)

I think there need to be more benchmarks around this to see the effect on things other than hello world. Hello-world handlers put all of the benchmark pressure on the internals and so magnify the effect of slowdowns. In a real-world application there's a boatload more code running per request, which in theory makes the effect of changes like this far smaller: it becomes a lower percentage of each request, so the impact is reduced.
Just my 2 cents.
(I'm going to look at precisely this later on if I have time)
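
One way to measure beyond hello-world wrk runs is an in-process benchmark against a more realistic handler; a minimal sketch (BenchmarkHello is a hypothetical test, placed in a _test.go file and run with go test -bench Hello under each toolchain):

package main

import (
    "io"
    "io/ioutil"
    "net/http"
    "net/http/httptest"
    "testing"
)

// BenchmarkHello drives a handler through a real server round-trip; swapping
// in a production handler shows how much of each request the net/http
// internals actually account for.
func BenchmarkHello(b *testing.B) {
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        io.WriteString(w, "Hello, World!")
    }))
    defer srv.Close()

    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        resp, err := http.Get(srv.URL)
        if err != nil {
            b.Fatal(err)
        }
        io.Copy(ioutil.Discard, resp.Body)
        resp.Body.Close()
    }
}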

@reimertz I remember reading about the release cycle here: https://github.com/golang/go/wiki/Go-Release-Cycle

Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix.

One of the criteria for issuing a release candidate is that Google be using that version of the code for new production builds by default: if we at Google are not willing to run it for production use, we shouldn't be asking others to.

@kmlx thanks for the link! I only went here: https://golang.org/doc/contribute.html and searched for 'release' and couldn't find anything. My bad.

Also, wow! If Google will run this version on their production servers and is okay with a worst-case performance impact of ~40µs per request (quoting @tinco), I guess the rest of the world can be as well. 🙂

This is why we ask people to test beta releases. So that if they believe they've found an issue, they can complain early.

go1.8beta1 was released on 1 Dec 2016, more than 2 months ago:

https://groups.google.com/d/topic/golang-nuts/QYuo0fai6YE/discussion

Quoting the announcement:

It is important that we find bugs before issuing a release candidate.
The release candidate is planned for the first week of January.
Your help in testing this beta is invaluable.

Sorry, I looked at the numbers wrong. From @OneOfOne's test, go tip made 5110713 requests in 30 seconds, that's 5.87µs per request. Go 1.7.5 did 5631803 requests in 30 seconds, 5.33µs per request. Compared to each other, that's roughly an 11% performance decrease. But from an absolute perspective, it's a performance hit of just half a microsecond per request. I can't even imagine an HTTP service where this would be relevant.
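
Spelling out that back-of-the-envelope arithmetic (a trivial sketch, using the wrk totals quoted above):

package main

import "fmt"

func main() {
    // Per-request wall time = run duration / total requests.
    tip := 30.0 / 5110713 * 1e6 // ≈ 5.87µs per request on tip
    old := 30.0 / 5631803 * 1e6 // ≈ 5.33µs per request on Go 1.7.5
    fmt.Printf("tip=%.2fµs 1.7.5=%.2fµs delta=%.2fµs (+%.1f%% per request)\n",
        tip, old, tip-old, (tip-old)/old*100)
}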

@tinco I agree, it's a very minor regression when put into perspective. However, it's still very important to figure out why it regressed. Otherwise you end up with a situation like #6853, where Go 1.9 comes around with another 11% decrease.

That being said, I'm not sure why this couldn't be fixed with a patch release (vs a minor release).

@tinco, how many cores does that computer have? Multiply 0.5µs by the number of cores.

how many cores does that computer have? Multiply 0.5µs by the number of cores.

What does the number of cores have to do with this? Each request is only handled by a single core.

@minux the 5 million requests are handled in 30 seconds by N cores, so the real time spent per request on each core is 30s / 5M × N. N is likely rather small, probably less than 10, so it's not really relevant.

Minor releases are for critical, unavoidable bugs. "Your program has a 1% chance of randomly crashing after a day of execution" is the kind of bug we fix in a point release, with the simplest, safest, most trivial fix possible. Minor releases are not for fixing "your program runs 0.5µs slower to execute an operation that probably takes far longer than that overall anyway".

I guess you're referring to patch releases (1.8.x) vs minor (1.x.0).
IMO the whole point of patch releases is to address regressions without changing features or any behavior. While this one doesn't look like a big one, I see no reason not to fix it in a 1.8.x patch release.

In the Go project, we refer to 1.x as a major release and 1.x.y as a minor (or sometimes point) release: Go 1.8 is a major release; Go 1.8.2 is a minor or point release. It doesn't make sense for us to call Go 1.8 a minor release.

The Go project release policy is documented at https://golang.org/doc/devel/release.html#policy:

Each major Go release obsoletes and ends support for the previous one. For example, if Go 1.5 has been released, then it is the current release and Go 1.4 and earlier are no longer supported. We fix critical problems in the current release as needed by issuing minor revisions (for example, Go 1.5.1, Go 1.5.2, and so on).

Go 1.5 is backwards compatible with Go 1.4 with some caveats: your code has to not depend on undocumented behavior (for example, the ordering of equal elements chosen by sort.Sort), your code has to use keyed struct literals so it doesn't break if new struct fields are added, and so on.

To the fullest extent possible, Go 1.5.1 is backwards compatible with Go 1.5 without any such caveats: we aim for updating from Go 1.5 to Go 1.5.1 to be a guaranteed-safe operation, a non-event, as you say "without changing features or any behavior".

Accepting that making mistakes is an inevitable part of writing software and that any time you make a change there's a risk it will introduce a bug that isn't caught by testing before the release, the best way we know to reduce that risk is to disallow non-critical changes in point releases.

Fixing a 0.5us slowdown that only appears in microbenchmarks is a non-critical change.

@tinco if we dismiss this "so-called minor" performance issue, and ignore it every time it happens, we'll ultimately end up with a slower Go. IMO no one wants that to happen.

@rashidul0405 This issue is still open; no one is ignoring it. I was explaining why it doesn't merit a rushed fix into Go 1.8 or Go 1.8.1.

I would gladly take a 1% slower Go instead of a 1% crashier Go.

Let's keep discussion on mailing lists and Twitter etc.

Please only comment here if you're working on this issue.

Since no one appears to want to work on this, I'm going to close it. We can't chase down every possible performance detail, and this one is getting old.
