Go: cmd/go: slow "native" performance with Mac OS X 10.14.1 and 10.12.6

Created on 12 Nov 2018  路  16Comments  路  Source: golang/go

Following a debug session with @fatih

What version of Go are you using (go version)?

go version go1.11.2 darwin/amd64

# and

go version go1.11.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

GOARCH="amd64"
GOBIN="/Users/fatih/go/bin"
GOCACHE="/Users/fatih/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/fatih/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.2/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/k_/87syx3r50m93m72hvqj2qqlw0000gn/T/go-build130848285=/tmp/go-build -gno-record-gcc-switches -fno-common"

and

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build416231211=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The symptom that @fatih and I were investigating was godef performing ~2x more slowly on his fast machine than on my slower machine.

@fatih's machine specs for reference:

 Model Name:    iMac Pro
  Model Identifier:    iMacPro1,1
  Processor Name:    Intel Xeon W
  Processor Speed:    3 GHz
  Number of Processors:    1
  Total Number of Cores:    10
  L2 Cache (per Core):    1 MB
  L3 Cache:    13.8 MB
  Memory:    64 GB

OS X 10.14.1 (18B75)

We think we have a smaller reproduction that demonstrates what appears to be an OS X 10.14.n issue that may or may not be related to Go. But go list appears to be a good way to demonstrate the problem and hopefully therefore a good place to start further investigation.

The following was run in a Terminal on @fatih's setup, and then within a Docker container on the same machine. Run-times of the go list commands were compared:

/bin/bash
(
set -eux
export GOPATH=$(mktemp -d)
command cd $(mktemp -d)
git clone https://github.com/digitalocean/csi-digitalocean
command cd csi-digitalocean/
git checkout b9ed6c2ea9ad85c5e4956bdfe418940bb5aa883a
rm go.sum
go mod tidy
command cd driver
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
time go list -export -deps -json -e . > /dev/null 2>&1
) 2>&1 | tee output.txt

The "native" terminal run shows "average" speeds of ~300ms:

+ go list -export -deps -json -e .
real    0m1.524s
user    0m7.224s
sys 0m2.391s
+ go list -export -deps -json -e .
real    0m0.297s
user    0m0.420s
sys 0m0.898s
+ go list -export -deps -json -e .
real    0m0.300s
user    0m0.403s
sys 0m0.919s
+ go list -export -deps -json -e .
real    0m0.308s
user    0m0.402s
sys 0m0.968s

Whereas the Docker run shows "average" speeds of ~180ms:

+ go list -export -deps -json -e .

real    0m4.837s
user    0m12.070s
sys     0m5.030s
+ go list -export -deps -json -e .

real    0m0.186s
user    0m0.180s
sys     0m0.110s
+ go list -export -deps -json -e .

real    0m0.165s
user    0m0.160s
sys     0m0.120s
+ go list -export -deps -json -e .

real    0m0.187s
user    0m0.190s
sys     0m0.100s

What did you expect to see?

Similar run-times in each, possibly even faster times in the "native" OS X environment.

What did you see instead?

The "native" run-times are almost 70% longer.

cc @bcmills (and FYI @ianthehat for any go/packages side-effects)

GoCommand NeedsInvestigation OS-Darwin Performance

Most helpful comment

I don't have an answer to why macOS 10.12 and 10.14 performance would be different, but I think I know why Docker and macOS are different.

As far as I understand, Docker on macOS actually runs a Linux kernel and userspace under a hypervisor. That means overhead for I/O and system calls is more similar to native Linux overhead than it is to macOS overhead. "go list" does a lot of stat and read system calls on small files, so the system call overhead matters a lot.

Here are the results from a quick benchmark I ran. The Read and Write tests are reading and writing slices of various sizes to a file (the file is almost certainly in the kernel file cache, so disk speed doesn't matter). The Import test is measuring the time to call go/build.Context.Import on "fmt".

macOS native:

BenchmarkStat-8               300000          4470 ns/op
BenchmarkRead/1K-8             50000         27366 ns/op
BenchmarkRead/50K-8            50000         30980 ns/op
BenchmarkRead/1024K-8          10000        104866 ns/op
BenchmarkWrite/1K-8            10000        151379 ns/op
BenchmarkWrite/50K-8            5000        311861 ns/op
BenchmarkWrite/1024K-8          1000       2020063 ns/op
BenchmarkImport-8               2000        919822 ns/op

macOS Docker

BenchmarkStat-4              1000000          1222 ns/op
BenchmarkRead/1K-4            300000          4101 ns/op
BenchmarkRead/50K-4           200000          7405 ns/op
BenchmarkRead/1024K-4          20000         92996 ns/op
BenchmarkWrite/1K-4            10000        181860 ns/op
BenchmarkWrite/50K-4           10000        341137 ns/op
BenchmarkWrite/1024K-4          1000       2404169 ns/op
BenchmarkImport-4               2000        580594 ns/op

Notably, the Stat and Read/1K tests are much, much slower on native macOS. Windows is quite a bit slower than this.

All 16 comments

Here's what I got from running the above script on a 13' macOS (seirra 10.12.6)

Without Docker

+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m10.234s
user    0m8.298s
sys    0m1.270s
+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m1.276s
user    0m0.381s
sys    0m0.234s
+ godef -o 19358 -f controller.go
/var/folders/n4/35bgdnkd6vlgqrdzbyl0x1ycmhcndx/T/tmp.DVAXjerX/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m1.295s
user    0m0.399s
sys    0m0.240s

With Docker

+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m7.215s
user    0m10.540s
sys    0m2.270s
+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m0.304s
user    0m0.250s
sys    0m0.110s
+ godef -o 19358 -f controller.go
/tmp/tmp.U5kJ73H8kO/pkg/mod/github.com/container-storage-interface/[email protected]/lib/go/csi/v0/csi.pb.go:1233:1

real    0m0.282s
user    0m0.230s
sys    0m0.110s

Sorry, forgot to cc @rsc in the original description

Will a dtruss/strace log help here ? Just wondering if there is any OS level variable here.

cc @jayconrod

The same problem on my windows 10:

What version of Go are you using (go version)?
go version go1.11 windows/amd64

What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\shuye\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=G:\GO
set GOPROXY=
set GORACE=
set GOROOT=D:\Go
set GOTMPDIR=
set GOTOOLDIR=D:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=G:\GIT.HUB\marketstore\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\shuye\AppData\Local\Temp\go-build930085358=/tmp/go-build -gno-record-gcc-switches

I don't have an answer to why macOS 10.12 and 10.14 performance would be different, but I think I know why Docker and macOS are different.

As far as I understand, Docker on macOS actually runs a Linux kernel and userspace under a hypervisor. That means overhead for I/O and system calls is more similar to native Linux overhead than it is to macOS overhead. "go list" does a lot of stat and read system calls on small files, so the system call overhead matters a lot.

Here are the results from a quick benchmark I ran. The Read and Write tests are reading and writing slices of various sizes to a file (the file is almost certainly in the kernel file cache, so disk speed doesn't matter). The Import test is measuring the time to call go/build.Context.Import on "fmt".

macOS native:

BenchmarkStat-8               300000          4470 ns/op
BenchmarkRead/1K-8             50000         27366 ns/op
BenchmarkRead/50K-8            50000         30980 ns/op
BenchmarkRead/1024K-8          10000        104866 ns/op
BenchmarkWrite/1K-8            10000        151379 ns/op
BenchmarkWrite/50K-8            5000        311861 ns/op
BenchmarkWrite/1024K-8          1000       2020063 ns/op
BenchmarkImport-8               2000        919822 ns/op

macOS Docker

BenchmarkStat-4              1000000          1222 ns/op
BenchmarkRead/1K-4            300000          4101 ns/op
BenchmarkRead/50K-4           200000          7405 ns/op
BenchmarkRead/1024K-4          20000         92996 ns/op
BenchmarkWrite/1K-4            10000        181860 ns/op
BenchmarkWrite/50K-4           10000        341137 ns/op
BenchmarkWrite/1024K-4          1000       2404169 ns/op
BenchmarkImport-4               2000        580594 ns/op

Notably, the Stat and Read/1K tests are much, much slower on native macOS. Windows is quite a bit slower than this.

We have a project with a mono repo with 40k LOC, 700k LOC dependencies and 30 binaries, code generation before build and so forth. Running time make without any changes and a warm cache takes 3.3 sec on a modern MacBook Pro and only 2 sec on a Fedora VM running on the same machine using Parallels. It seems other people have similar experiences with VMware/Ubuntu and macOS.

all:
    go build -o build/ ./gen/... # build code generators
    go generate ./...
    go test ./...
    go build -o build/$(GOOS)_$(GOARCH)/ ./cmd/...

The build cache contains only this one project. When making a simple change to the project build times change to 5.3 sec vs. 2.8 sec. Building a single binary the numbers are 1.35 sec vs 0.85 sec.

We are using modules, everything is downloaded, no CGO and we are not cross-compiling.

Using a RAM disk on macOS for GOCACHE and the output directory did not improve the build speed.

I'm testing this on a quite beefy machine but this is more noticeable on smaller laptops of my colleagues. I'll double check but the build times were more in the range of 5-10 seconds. This can be quite noticeable and I'm curious how we can debug this a bit further.

Update 1:

The project and dependencies are about 1700 Go files at this point.

Could this possibly be due to #26550 (thus a dup of #30497 or #27415) ?
(Note that change occurred for 10.12, so it can't explain the OP's problem, but it may explain what some other posters are seeing.)

The speed difference was already noticeable with go1.13 since I've started playing with a Fedora laptop a while ago. Not sure about go1.12.

@randall77 you mean that i/o syscalls are slower on Darwin than on Linux?

@magiconair, yes, syscalls will be somewhat slower because they go through libc now. But I was particularly thinking of fsync which is much more expensive on Darwin. The regular fsync is kinda broken, so we have to use ioctl with F_FULLFSYNC.

BTW, that #26650, not #26550, sorry.

Are we sure that what we are seeing here is not just Docker for Mac virtualization doing some very aggressive in-memory caching?

We are not just comparing Darwin to Linux; we are comparing Darwin to Hyperkit+Linux.

In the numbers shared by @myitcv in the issue description, native macOS is actually faster on the first cold run:

+ go list -export -deps -json -e .
real    0m1.524s
user    0m7.224s
sys 0m2.391s

versus this for Docker:

+ go list -export -deps -json -e .
real    0m4.837s
user    0m12.070s
sys     0m5.030s

Update: I assumed everybody was using Docker for Mac, but maybe you use another solution to run Docker on your Mac?

@randall77 is there some indication that this situation will improve in the future? Is Apple addressing the fsync issue? I'm now at 15sec build times on Darwin vs 4.5 sec in a Fedora VM running on the same super-beefy hardware.

@randall77 is there some indication that this situation will improve in the future?

I don't know of anyone working on it, or anyone who has plans in this area. So, no.

Is Apple addressing the fsync issue?

I have no idea.

I'm now at 15sec build times on Darwin vs 4.5 sec in a Fedora VM running on the same super-beefy hardware.

I think if we want to make progress here we need to understand why it is so much slower. For build times, I don't think a slow fsync would be the cause. Building Go code is mostly reading; it's probably stat or read call. But that's a guess; data would help.

Building Go code is mostly reading; it's probably stat or read call. But that's a guess; data would help.

Do you have a quick tip on how to collect this data? Otherwise, I'll try and ask the internet.

Run with strace?
Run under a profiler?
Both might be kinda tricky, as the build system runs multiple processes.

Was this page helpful?
0 / 5 - 0 ratings