Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
Go 1.7.5, 1.8 rc3 and git
What operating system and processor architecture are you using (go env)?
Arch Linux 64bit
What did you do?
go run https://gist.github.com/OneOfOne/4d7e13977886ddab825870bc3422a901
Switch terminals and run wrk -c 20 -d 30s http://localhost:8081
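For reference, the gist is essentially a hello-world net/http server. A minimal sketch along those lines (not the gist's exact code):

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world"))
	})
	// Listen on the port that wrk is pointed at.
	log.Fatal(http.ListenAndServe(":8081", nil))
}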
What did you expect to see?
Same throughput as 1.7 or faster.
What did you see instead?
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 195.30us 470.12us 16.30ms 95.11%
Req/Sec 85.62k 6.00k 95.21k 80.83%
5110713 requests in 30.01s, 721.35MB read
Requests/sec: 170322.67
Transfer/sec: 24.04MB
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 192.49us 451.74us 15.14ms 95.02%
Req/Sec 85.91k 6.37k 97.60k 83.50%
5130079 requests in 30.01s, 724.08MB read
Requests/sec: 170941.23
Transfer/sec: 24.13MB
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 210.16us 528.53us 14.78ms 94.13%
Req/Sec 94.34k 4.31k 103.56k 73.83%
5631803 requests in 30.01s, 794.89MB read
Requests/sec: 187673.03
Transfer/sec: 26.49MB
Too late for Go 1.8, but we can look into performance during Go 1.9.
Anybody want to investigate the difference? What does a Go 1.7 vs Go 1.8 CPU profile say?
Updated play with pprof: https://play.golang.org/p/GZ4zQOg1Wf
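For anyone reproducing without the play link: the setup is roughly the same hello-world handler plus the net/http/pprof import, so the profile endpoints and the benchmark handler share one server (a sketch, assuming the play link does something like this):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world"))
	})
	// Serve both the benchmark handler and the pprof endpoints on :6060.
	log.Fatal(http.ListenAndServe(":6060", nil))
}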
I ran go tool pprof http://localhost:6060/debug/pprof/profile, switched terminals, and ran wrk -c 20 -d 30s http://localhost:6060/.
(pprof) top
36600ms of 80000ms total (45.75%)
Dropped 297 nodes (cum <= 400ms)
Showing top 10 nodes out of 135 (cum >= 970ms)
flat flat% sum% cum cum%
26360ms 32.95% 32.95% 27340ms 34.17% syscall.Syscall
2280ms 2.85% 35.80% 5740ms 7.17% runtime.mallocgc
1310ms 1.64% 37.44% 1310ms 1.64% runtime.heapBitsSetType
1260ms 1.57% 39.01% 1260ms 1.57% runtime._ExternalCode
1030ms 1.29% 40.30% 7280ms 9.10% net/http.(*chunkWriter).writeHeader
970ms 1.21% 41.51% 970ms 1.21% runtime.epollwait
900ms 1.12% 42.64% 920ms 1.15% runtime.mapiternext
880ms 1.10% 43.74% 880ms 1.10% runtime.usleep
820ms 1.03% 44.76% 1490ms 1.86% runtime.deferreturn
790ms 0.99% 45.75% 970ms 1.21% runtime.mapaccess1_faststr
(pprof) top -cum
27.89s of 80s total (34.86%)
Dropped 297 nodes (cum <= 0.40s)
Showing top 10 nodes out of 135 (cum >= 23.44s)
flat flat% sum% cum cum%
0.01s 0.013% 0.013% 73.46s 91.83% runtime.goexit
0.55s 0.69% 0.7% 69.55s 86.94% net/http.(*conn).serve
0.30s 0.38% 1.07% 35.91s 44.89% net/http.(*response).finishRequest
0.15s 0.19% 1.26% 32.10s 40.12% bufio.(*Writer).Flush
26.36s 32.95% 34.21% 27.34s 34.17% syscall.Syscall
0.10s 0.12% 34.34% 24.56s 30.70% net/http.checkConnErrorWriter.Write
0 0% 34.34% 24.44s 30.55% net.(*conn).Write
0.23s 0.29% 34.62% 24.44s 30.55% net.(*netFD).Write
0.06s 0.075% 34.70% 23.50s 29.38% syscall.Write
0.13s 0.16% 34.86% 23.44s 29.30% syscall.write
(pprof) top
40520ms of 82240ms total (49.27%)
Dropped 281 nodes (cum <= 411.20ms)
Showing top 10 nodes out of 128 (cum >= 860ms)
flat flat% sum% cum cum%
29480ms 35.85% 35.85% 30920ms 37.60% syscall.Syscall
2550ms 3.10% 38.95% 5710ms 6.94% runtime.mallocgc
1560ms 1.90% 40.84% 1590ms 1.93% runtime.heapBitsSetType
1220ms 1.48% 42.33% 1220ms 1.48% runtime.epollwait
1050ms 1.28% 43.60% 2750ms 3.34% runtime.mapassign1
1050ms 1.28% 44.88% 1080ms 1.31% runtime.mapiternext
1000ms 1.22% 46.10% 7890ms 9.59% net/http.(*chunkWriter).writeHeader
880ms 1.07% 47.17% 2910ms 3.54% net/http.DetectContentType
870ms 1.06% 48.22% 1010ms 1.23% runtime.mapaccess1_faststr
860ms 1.05% 49.27% 860ms 1.05% runtime.futex
(pprof) top -cum
31.67s of 82.24s total (38.51%)
Dropped 281 nodes (cum <= 0.41s)
Showing top 10 nodes out of 128 (cum >= 27.69s)
flat flat% sum% cum cum%
0 0% 0% 75.77s 92.13% runtime.goexit
0.44s 0.54% 0.54% 74.26s 90.30% net/http.(*conn).serve
0.27s 0.33% 0.86% 37.08s 45.09% net/http.(*response).finishRequest
0.18s 0.22% 1.08% 36.44s 44.31% bufio.(*Writer).Flush
0.25s 0.3% 1.39% 36.26s 44.09% bufio.(*Writer).flush
29.48s 35.85% 37.23% 30.92s 37.60% syscall.Syscall
0.12s 0.15% 37.38% 27.99s 34.03% net/http.checkConnErrorWriter.Write
0.69s 0.84% 38.22% 27.85s 33.86% net/http.(*conn).readRequest
0.08s 0.097% 38.31% 27.77s 33.77% net.(*conn).Write
0.16s 0.19% 38.51% 27.69s 33.67% net.(*netFD).Write
Let me know if you want me to do anything specific.
Ouch, I'm seeing the same thing: about a 20% slowdown on my webserver benchmark with 1.8rc3, and a slightly bigger binary size too.
I realize it's late in the release cycle, but a 20% regression on HTTP is a huge regression for a lot of people.
I'm willing to help debug if you tell me what's needed @bradfitz.
Step 1 is debugging why it got slower. If you find any clues, let me know.
@OneOfOne how does this work for non-hello world applications? Do you see any issues there? Maybe 20% for Hello World apps is acceptable until 1.9?
@bradfitz I think the Shutdown feature is causing lower performance for these kinds of 'hello world' apps/tests.
select {
case <-srv.getDoneChan():
	return ErrServerClosed
default:
}
// ...
Look here. In 1.7 it was just a for loop.
Can somebody confirm my assumption?
Thanks,
kataras
I am not able to reproduce the OP's results. I'm on a Mac, and I'm using _slightly_ older versions of Go.
$ wrk -c 20 -d 30s http://localhost:8081 # go 1.8 RC2
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 0.87ms 6.99ms 151.45ms 99.38%
Req/Sec 25.22k 2.34k 33.28k 66.94%
1510655 requests in 30.10s, 213.22MB read
Requests/sec: 50188.34
Transfer/sec: 7.08MB
$ wrk -c 20 -d 30s http://localhost:8081 # go 1.7.4
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 840.73us 6.85ms 151.46ms 99.41%
Req/Sec 26.05k 3.67k 33.43k 60.96%
1560770 requests in 30.10s, 220.29MB read
Requests/sec: 51853.50
Transfer/sec: 7.32MB
There's a slight difference but it's smaller than 20%, almost negligible here.
@kataras, do you have evidence in favor of that theory?
You can confirm it yourself: delete those lines and measure again.
I would be surprised, though.
@kataras That looks like it is probably it. In addition to the select, you have a mutex acquire and an unlock that is done in a defer (which I know recently got sped up, but is still a bit slower than a direct unlock):
func (s *Server) getDoneChan() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.getDoneChanLocked()
}
Since the accept loop is such a hot path, it may be worth moving the shutdown select into a separate goroutine and using sync/atomic to signal shutdown in the accept loop.
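Something like this is what I have in mind; it's only a sketch with made-up names (server, inShutdown, handleConn), not a patch against the real net/http code:

package sketch

import (
	"net"
	"net/http"
	"sync/atomic"
)

type server struct {
	done       chan struct{} // closed when Shutdown is called
	inShutdown int32         // mirrored atomically for the accept loop
}

func (s *server) Serve(l net.Listener) error {
	// Move the shutdown select out of the accept loop: one goroutine waits on
	// the done channel and flips an atomic flag.
	go func() {
		<-s.done
		atomic.StoreInt32(&s.inShutdown, 1)
	}()
	for {
		if atomic.LoadInt32(&s.inShutdown) != 0 {
			return http.ErrServerClosed
		}
		conn, err := l.Accept()
		if err != nil {
			// Shutdown would also close the listener, so a blocked Accept
			// returns an error and the flag gets re-checked here.
			if atomic.LoadInt32(&s.inShutdown) != 0 {
				return http.ErrServerClosed
			}
			return err
		}
		go s.handleConn(conn)
	}
}

func (s *server) handleConn(c net.Conn) {
	// Placeholder for the real per-connection work.
	c.Close()
}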
EDIT:
I don't think it's just this; I just tried tip without the select altogether and it adds about 7us (~5%).
I understand it's too late for 1.8, but why not fix this in a future 1.8.1 release?
Has anyone else even been able to reproduce these results?
-11.2% on Debian 8, amd64 (1.7.5 and rc3)
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 457.65us 1.26ms 46.74ms 93.23%
Req/Sec 65.08k 7.84k 99.65k 73.83%
3891443 requests in 30.07s, 549.25MB read
Requests/sec: 129397.13
Transfer/sec: 18.26MB
Running 30s test @ http://localhost:8081
2 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 350.93us 0.92ms 39.50ms 94.72%
Req/Sec 57.70k 5.34k 77.31k 74.50%
3447358 requests in 30.03s, 486.57MB read
Requests/sec: 114800.24
Transfer/sec: 16.20MB
@kataras The implementation is correct, which means the author was careful. It's not that big of a deal; bugs happen in software.
That said, I was wrong anyway. I looked again just now and realized the select only happens when an error occurs during accept. What is odd is how my program consistently slowed down when I removed it from the accept loop; maybe it disqualifies that block from some optimization passes or inlining? There may be a problem somewhere causing a slowdown, but it's somewhere else and needs a different setup than mine.
@bradfitz Briefly bisected this, and it seems commit faf882d1d427e8c8a9a1be00d8ddcab81d1e848e is causing at least some of it. I'm seeing a performance hit of roughly 10% on this benchmark with that commit compared to the commit preceding it.
@codido, thanks for the bisect! Yes, and that's the commit I would suspect too. Good to see confirmation. That was one of the biggest architectural changes in the net/http.Server in quite some time.
I never did any benchmarking (or optimizations) after that change.
We can look into it for Go 1.9.
Hi @bradfitz
Just a quick question from an outsider, so please bear with me if I missed something obvious.
Why would it be too late to fix this issue? Isn't this the whole reason for release candidates? To find final major issues before releasing the next major version?
If not, please educate me.
Also, thanks for working on this project, I know a lot of people who love this language and the community around it. 🎉
Why are people responding and voting as if this is a major issue? This change seems to have a worst-case performance impact of ~40us per request. That sounds very low to me; is there a real-world scenario where this would matter?
(edit: I was wrong, it's ~0.5us per request, so even lower)
I think there might need to be more benchmarks around this to see the effect on things other than hello world. Hello-world-style handlers put a lot of benchmark pressure on the internals and as such can magnify the effect of slowdowns. In a real-world application there's a boat-load more code that runs per request, which in theory makes the effect of things like this far smaller; it becomes a smaller percentage per request, so the impact is reduced (rough sketch of what I mean below).
Just my 2 cents.
(I'm going to look at precisely this later on if I have time)
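Roughly the kind of comparison I mean (a sketch, not a rigorous benchmark): a trivial handler next to one that does some real per-request work, so the fixed per-request overhead becomes a smaller fraction of the total:

package bench

import (
	"crypto/sha256"
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"testing"
)

// benchHandler serves h over a real HTTP server and measures whole requests.
func benchHandler(b *testing.B, h http.Handler) {
	srv := httptest.NewServer(h)
	defer srv.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			b.Fatal(err)
		}
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
	}
}

func BenchmarkHelloWorld(b *testing.B) {
	benchHandler(b, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world"))
	}))
}

func BenchmarkBusierHandler(b *testing.B) {
	benchHandler(b, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simulate a handler that does some per-request work of its own.
		sum := sha256.Sum256([]byte(r.URL.Path))
		for i := 0; i < 100; i++ {
			sum = sha256.Sum256(sum[:])
		}
		w.Write(sum[:])
	}))
}

Running both under 1.7.5 and 1.8rc3 with go test -bench . should show whether the gap shrinks once the handler does more work.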
@reimertz I remember reading about the release cycle here: https://github.com/golang/go/wiki/Go-Release-Cycle
Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix.
One of the criteria for issuing a release candidate is that Google be using that version of the code for new production builds by default: if we at Google are not willing to run it for production use, we shouldn't be asking others to.
@kmlx thanks for the link! I only went here: https://golang.org/doc/contribute.html and searched for 'release' and couldn't find anything. My bad.
Also, wow! If Google will run this version on their production servers and are okay with a worst case performance impact of ~40us per request (quoting @tinco), I guess the rest of the world can as well. 🙂
This is why we ask people to test beta releases. So that if they believe they've found an issue, they can complain early.
go1.8beta1 was released on 1, Dec 2016, more than 2 months ago:
https://groups.google.com/d/topic/golang-nuts/QYuo0fai6YE/discussion
Quoting the announcement:
It is important that we find bugs before issuing a release candidate.
The release candidate is planned for the first week of January.
Your help in testing this beta is invaluable.
Sorry, I looked at the numbers wrong. So from @OneOfOne's test, go tip made 5110713 requests in 30 seconds, that's 5.87us per request. Go 1.7.5 did 5631803 requests in 30 seconds, 5.33us per request. So when you compare those to each other, that's like an 11% performance decrease. But if you look at it from an absolute perspective, that's a performance hit of just half a microsecond per request. I can't even imagine an HTTP service where this would be relevant.
@tinco I agree, it's a very minor regression when put into perspective. However, it's still very important to figure out why it regressed, unless you want a situation like #6853, where Go 1.9 comes around with another 11% decrease.
That being said, I'm not sure why this couldn't be fixed with a patch release (vs. a minor release).
@tinco, how many cores does that computer have? Multiply 0.5us by the number of cores.
What does the number of cores have to do with this? Each request is only handled by a single core.
@minux the 5 million requests are handled in 30 seconds by N cores, so the real time spent per request on each core is 30 / 5M * N. N is likely rather small, probably less than 10, so it's not really relevant.
Minor releases are for critical, unavoidable bugs. "Your program has a 1% chance of randomly crashing after a day of execution" is the kind of bug we fix in a point release, with the simplest, safest, most trivial fix possible. Minor releases are not for fixing "your program runs 0.5us slower to execute an operation that probably takes far longer than that overall anyway".
I guess you're referring to patch releases (1.8.x) vs minor (1.x.0).
IMO the whole point of patch releases was to address regressions without changing features or any behavior. While this one doesn't look like a big one, I see no reason not to fix it in a 1.8.patch release.
In the Go project, we refer to 1.x as a major release and 1.x.y as a minor (or sometimes point) release: Go 1.8 is a major release; Go 1.8.2 is a minor or point release. It doesn't make sense for us to call Go 1.8 a minor release.
The Go project release policy is documented at https://golang.org/doc/devel/release.html#policy:
Each major Go release obsoletes and ends support for the previous one. For example, if Go 1.5 has been released, then it is the current release and Go 1.4 and earlier are no longer supported. We fix critical problems in the current release as needed by issuing minor revisions (for example, Go 1.5.1, Go 1.5.2, and so on).
Go 1.5 is backwards compatible with Go 1.4 with some caveats: your code has to not depend on undocumented behavior (for example, the ordering of equal elements chosen by sort.Sort), your code has to use keyed struct literals so it doesn't break if new struct fields are added, and so on.
To the fullest extent possible, Go 1.5.1 is backwards compatible with Go 1.5 without any such caveats: we aim for updating from Go 1.5 to Go 1.5.1 to be a guaranteed-safe operation, a non-event, as you say "without changing features or any behavior".
Accepting that making mistakes is an inevitable part of writing software and that any time you make a change there's a risk it will introduce a bug that isn't caught by testing before the release, the best way we know to reduce that risk is to disallow non-critical changes in point releases.
Fixing a 0.5us slowdown that only appears in microbenchmarks is a non-critical change.
@tinco if we don't take this "so-called minor" performance issue seriously, and we ignore it every time, we'll ultimately end up with "a slower go". IMO no one wants this to happen.
@rashidul0405 This issue is still open; no one is ignoring it. I was explaining why it doesn't merit a rushed fix into Go 1.8 or Go 1.8.1.
I would gladly take a 1% slower Go instead of a 1% crashier Go.
Let's keep discussion on mailing lists and Twitter etc.
Please only comment here if you're working on this issue.
Since no one appears to want to work on this, I'm going to close it. We can't chase down every possible performance detail, and this one is getting old.