We have a lot of packages in the std lib that use specialized instructions
that are not always present. Most of the time, the package detects
the required features itself. This works, but leads to code
duplication. And as Bryan Chan mentioned on https://golang.org/cl/22201,
even if the processor provides a way to detect certain optional features,
it's still better to use AT_HWCAP from the Linux auxv because that also
takes kernel support into account.
Only the runtime can access auxv, so it makes sense for the runtime
to query the processor capabilities and provide that to the packages.
I propose that we add an internal package internal/cpu that exposes
capability flags for the current processor so that each std package
can query it directly instead of having a runtime detection routine
that duplicates the work.
Another benefit is that some processors, like ARM, don't provide
a way to do runtime capability detection, so we have to rely on the
kernel to provide this information. Different kernels provide different
mechanisms for this (sysctl for BSD and auxv for Linux), so providing
a package that abstracts those OS-dependent features away is also
beneficial.
We might promote the package to runtime/cpu if deemed fit, but that's
out of the scope for this proposal.
The package could be modeled after the Linux AT_HWCAP bits,
and it will be processor dependent.
I'm happy to work on adding this (particularly the s390x bit).
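As a rough illustration of the shape such a package could take, here is a minimal sketch, assuming linux/arm64 and hypothetical flag names (the HWCAP bit positions follow the kernel's arm64 conventions; nothing here is part of the proposal text itself):

    package cpu

    // Assumed AT_HWCAP bit positions for linux/arm64.
    const (
        hwcapAES  = 1 << 3
        hwcapSHA2 = 1 << 6
    )

    // Capability flags that std packages would read; names are illustrative only.
    var (
        HasAES  bool
        HasSHA2 bool
    )

    // doinit would be handed the AT_HWCAP value the runtime read from auxv.
    func doinit(hwcap uint) {
        HasAES = hwcap&hwcapAES != 0
        HasSHA2 = hwcap&hwcapSHA2 != 0
    }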
I'm interested in what thoughts people have for the API. The three options that occur to me are:
cpu.Has("ssse3", "popcnt")cpu.Has(cpu.SSSE3, cpu.POPCNT))cpu.SSSE3 && cpu.POPCNT)I'm not sure if we want to prefix with the CPU type to avoid naming conflicts. If so then perhaps s/SSSE3/AMD64.SSSE3/.
It might also be nice if the function (or bools) could be inlined and constant folded. In that case I suspect we'd need to limit the number of features to be checked to one per call. This could be useful in scenarios where a feature is optional on say i386/ppc64, but mandatory on newer versions of the architecture such as amd64/ppc64le.
Another possibility is that we do this in the runtime/internal/sys package. Something like sys.Feature("ssse3") perhaps? There are variables in sys that might be useful to an application as well and could be made part of a public cpu package were we to ever go down that route.
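To illustrate the constant-folding point, a hedged sketch (file names, build constraints and identifiers are hypothetical): a feature that is mandatory on one GOARCH can be a constant there, so guarded branches fold away at compile time, while it stays a runtime-initialized variable where it is optional.

    // sse2_amd64.go (hypothetical): SSE2 is part of the amd64 baseline.
    package cpu

    const HasSSE2 = true // "if cpu.HasSSE2 { ... }" keeps only the fast path

    // sse2_386.go (hypothetical): SSE2 is optional on 386.
    package cpu

    var HasSSE2 = cpuidHasSSE2() // cpuidHasSSE2 is an assumed detection helper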
BTW @minux when you say:
it will be processor dependent
Do you mean you want the API to be processor dependent?
I mean the constants will be cpu dependent. There is no constant folding
though; this package is to check cpu capabilities at runtime, so I think an
API that looks like this is fine:
cpu.Has(feats ...cpu.Feature)
E.g.:
cpu.Has(cpu.FloatingPoint, cpu.AES, ...)
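A minimal sketch of how that variadic form could be implemented, assuming a Feature type whose values index into a bit mask filled in during package initialization (all identifiers are illustrative, not part of the proposal):

    package cpu

    type Feature uint

    const (
        FloatingPoint Feature = iota
        AES
        POPCNT
    )

    // featureMask is populated once at startup from HWCAP/CPUID, one bit per Feature.
    var featureMask uint64

    // Has reports whether every requested feature is available on this CPU.
    func Has(feats ...Feature) bool {
        for _, f := range feats {
            if featureMask&(1<<f) == 0 {
                return false
            }
        }
        return true
    }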
Yeah, we need to figure out a way to name and categorize the CPU features.
Should each arch be in a separate package (e.g. internal/cpu/arm)? Or
should we prefix each feature with the arch name? Do we consolidate common
features across arches?
I think prefixing them with the arch name would be good. That seems to be the convention used for many other constants in golang (opcodes, relocations, etc.). That way there would be no confusion in case some cpu features are close but don't mean exactly the same thing on different architectures.
One idea is to use the same naming conventions that we have in glibc, for consistency (i.e. glibc/sysdeps/powerpc/dl-procinfo.c:_dl_powerpc_cap_flags). This way, the names match the ones shown when LD_SHOW_AUXV=1:
AT_HWCAP: vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_HWCAP2: tar isel ebb dscr htm arch_2_07
Querying for the capabilities should take into account whether an arch has a HWCAP2 or not, because there is no way to know (a priori) if a capability bit is in HWCAP or HWCAP2. My suggestion here is to use a concatenated 64-bit HWCAP+HWCAP2 mask, so we can easily map the bits to the capabilities. That's what we did in glibc/gcc to implement __builtin_cpu_is() / __builtin_cpu_supports() for Power.
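A rough sketch of the concatenated-mask idea, with hypothetical names and an arbitrary choice of putting HWCAP in the upper 32 bits and HWCAP2 in the lower 32 bits:

    package cpu

    // hwcapMask holds AT_HWCAP in the upper half and AT_HWCAP2 in the lower half.
    var hwcapMask uint64

    func setHWCap(hwcap, hwcap2 uint32) {
        hwcapMask = uint64(hwcap)<<32 | uint64(hwcap2)
    }

    // hasFeature reports whether the given bit of the combined mask is set.
    func hasFeature(bit uint64) bool {
        return hwcapMask&bit != 0
    }

    const ppcFeatureHasVSX = 0x00000080 // PPC_FEATURE_HAS_VSX from the Linux headers

Usage, once the runtime has called setHWCap, would then be hasFeature(uint64(ppcFeatureHasVSX) << 32).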
@minux are you already working on this? Thanks.
No, I'm not. Please go ahead if you want to. Thanks.
@mundaym I think that starting with a runtime/internal/sys approach is a good idea for now. My idea was to create something that would work similarly to __builtin_cpu_supports() in gcc. What do you think?
I had a play with this last weekend. I ended up getting a bit lost in circular dependencies... You might want to just add (perhaps hackily) what you need directly into the runtime package for now and let it get cleaned up later.
Along those lines it would be easy to add the variables runtime.hwcap (already there for arm) and runtime.hwcap2 and then grab them using assembly in other packages when necessary. Ideally they'd both have the type uint32 and be defined and set in os_linux.go (rather than in an arch-dependent file).
The constants representing features could go in the runtime/internal/sys package but that means they can only be accessed in the runtime package. If you instead put them in a global package like internal/cpu then the runtime package can't get at them. That could be a way to get this proposal started though.
@mundaym OK. We'll need these checks later, when we start to add the new ISA 3.0 (POWER9) instructions and write runtime optimizations using those.
For now, I was thinking about something trivial, like: https://github.com/ceseo/go/commit/ca09310b2e7deafbfbbd34b1532273a00eb69b2b
then evolve from that.
CL https://golang.org/cl/31149 mentions this issue.
CL https://golang.org/cl/32330 mentions this issue.
For an internal package, it seems fine to experiment through the usual code review process. No approval needed here. There's runtime/internal/sys, for example. If some of that gets promoted to plain internal/sys or some other name, that seems OK.
@minux @laboger @ceseo
Did somebody plan to work on this in the near future or did already work on implementing an internal cpu package?
If not, I would like to start on a CL for a small internal/cpu package and then have separate CLs to clean up the std lib uses of cpu feature detection as listed in #19739.
I would just start by providing:
cpu.GOARCH.FLAG bools
and populate them on package init. That seems most similar to the current uses of the runtime flags I have seen, and it also decouples the std lib packages' usage from the runtime.
FLAG would be named after the names Linux uses but in upper case.
e.g. for AMD64, Linux uses: mmx, sse, sse2, sse3, sse4_1, sse4_2, popcnt, aes, avx2, bmi1, bmi2, erms, ...
which results in: cpu.AMD64.AVX2 and cpu.AMD64.SSE4_1
In further iterations we can add HWCAP bit vectors and, if needed, more complex query functions, or change the init such that the runtime queries the information first and we copy from there.
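A rough sketch of that shape for amd64; the struct layout and the cpuid helper are assumptions, while the CPUID leaf/bit positions are the standard ones for these features:

    package cpu

    // AMD64 holds the detected feature flags, named after the Linux flag names.
    var AMD64 struct {
        AVX2   bool
        SSE4_1 bool
        POPCNT bool
    }

    // cpuid is assumed to be implemented in assembly (not shown).
    func cpuid(eaxArg, ecxArg uint32) (eax, ebx, ecx, edx uint32)

    func init() {
        _, ebx7, _, _ := cpuid(7, 0)
        _, _, ecx1, _ := cpuid(1, 0)
        AMD64.AVX2 = ebx7&(1<<5) != 0    // CPUID.(EAX=7,ECX=0):EBX bit 5
        AMD64.SSE4_1 = ecx1&(1<<19) != 0 // CPUID.(EAX=1):ECX bit 19
        AMD64.POPCNT = ecx1&(1<<23) != 0 // CPUID.(EAX=1):ECX bit 23
    }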
@martisch no, it's not in my list. I already have a functional workaround for ppc64x. Feel free to start working on your CL.
I can help later by adding any code to make it work for ppc64x.
@martisch, sounds reasonable, if LOUD.
After exploring the implementation details and replacing the uses of feature flags in the std lib, I changed the approach for the first version to:
cpu.Has_avx2, cpu.Has_sse4_2 ...
This allows us to use the bools directly in assembler and Go code.
It also has the advantage that we don't need extra code versions for amd64, 386 and amd64p32, as they can all use the same code to query the internal/cpu package since the naming doesn't differ. Downsides are that we cannot put padding before and after the variables as easily as when they are collected in an arch-specific struct,
and that the naming is less LOUD.
Different architectures can implement additional flags. When we find a name that conflicts between architectures they can share it, as I guess for the foreseeable future Go will only ever run on one architecture at a time within a single Go runtime.
I plan to have the first version ready for review next week, after I have finalized the initialization code and tested it on a few different CPUs.
These are somewhat non-idiomatic. Maybe HasAVX2, HasSSE42, etc?
True. It seems I was reading too many C/C++-style cpuid libraries.
I will change the naming according to your suggestion. Fortunately
we don't seem to need HasSSSE3 currently :)
CL https://golang.org/cl/41950 mentions this issue.
CL https://golang.org/cl/41476 mentions this issue.
The current proposal seems to be heading in a direction where cpu.HasAVX2 wouldn't be defined for any platform but x86. While I understand the rationale, I'm sure this would eventually cause some small pain to programmers for little benefit. I suggest designing the package in a way that if GOARCH == "amd64" && cpu.HasAVX2 works and can be compiled on all platforms.
I don't see any problem with cpu.HasAVX2 being defined on non-x86 platforms. It isn't useful, but it won't hurt anyone. It's likely that cpu.HasAVX2 is only referenced from _amd64.go files, so HasAVX2 won't even make it into the binary.
But I think the proposal has names like cpu.amd64.HasAVX2, which is clearer, provides a better godoc experience, and avoids potential name conflicts between architectures.
@randall77
Assuming amd64 is a struct.
Can we reference cpu.amd64.HasAVX2 in go assembler without hard coding a numeric offset into the struct?
To answer myself: it seems we can use something like cpu·Amd64+Amd64_HasAVX2(SB)
if we define a type for the struct. Will test.
@martisch: There are macros defined in assembly which contain struct offsets. So I think something like:
MOVQ cpu.amd64+amd64_HasAVX2(SB), AX
should work. I'm not sure if those offsets are defined across packages.
A package per architecture might work better.
thanks randall. Will test it.
Not sure if we want a package/struct per architecture.
E.g. having one X86 package/struct would allow us to write cpu.X86.HasSSSE3 instead of cpu.Amd64.HasSSSE3 || cpu.Amd64p32.HasSSSE3 || cpu.386.HasSSSE3, and the detection code is identical. On the other hand, amd64, amd64p32 and 386 are more familiar since they are used for GOARCH.
With a struct type x86 it works within the package when adding #include "go_asm.h",
e.g. for CMPB internal∕cpu·X86+x86_HasSSE41(SB), $1.
It indeed does not seem to work across package boundaries.
A possible workaround is to declare a package level variable
var useSSE41 = cpu.X86.HasSSE41
and then use the useSSE41 variable in assembler.
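As a concrete illustration of that workaround (the symbol names are only examples):

    // In a Go file of the package that needs the check:
    import "internal/cpu"

    var useSSE41 = cpu.X86.HasSSE41 // copied at init time so assembly can read it

    // In the corresponding .s file of the same package:
    //   CMPB ·useSSE41(SB), $1
    //   JNE  generic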
Basic support for feature flag detection has been added and the std lib usage of cpuid for x86 has been unified.
The runtime x86 cpuid detection has been consolidated a bit too and is separate from internal/cpu. There are some more cleanups I would like to add in Go 1.10, e.g. padding of runtime cpu flags and moving the rest of cpuflags_amd64.go into rt0.go.
I also plan to work on extending internal/cpu in Go 1.10 by allowing cpu features to be masked in internal/cpu so we can test different code paths in the std lib (#12805).
I have a question: what's the "correct" way of implementing this for architectures that do not have a cpuid instruction? For example, on ppc64x, we currently rely on HWCAP/HWCAP2 bits for that.
@ceseo, whatever the hardware provides, I guess. If needed (if the instruction is expensive) we can call it once on process startup and cache the results in a global variable.
It can be more challenging if we have to test instructions & catch unimplemented faults, or ask the OS. We can fight that hurdle if it becomes necessary.
@randall77 Power doesn't have anything in the hardware for identifying capabilities/ISA level. That's why we use the HWCAP/HWCAP2 bits exposed by the kernel. In glibc, for instance, I had to modify the TLS ABI and write the HWCAP/HWCAP2 bits inside the TCB, so that __builtin_cpu_is/__builtin_cpu_supports in gcc can read the information from somewhere.
In Go, we currently initialize a struct in runtime/os_linux_ppc64x.go and read from there in runtime to identify capabilities. However, that's heavily tied to the initialization procedures in runtime/os_linux.go.
Do you have any suggestions for adapting what we currently have so that it can reside inside the new internal/cpu package?
For internal/ packages you can just use linkname to expose otherwise unexported parts of its API.
For instance, you could have the runtime call into internal/cpu and pass it the auxv array so it can initialize itself. Initialization order is tricky but that's always the case with early initialization like this.
I would favor keeping as much isolation between the runtime and internal/cpu code as possible.
However, I see that this cannot be achieved 100% for e.g. ppc64.
Since the runtime and internal/cpu Go versions will stay in sync, I think we can expose (not officially export) the HWCAP/HWCAP2 bits the runtime uses and then read those from an internal/cpu (assembler) function.
https://github.com/golang/go/blob/master/src/cmd/compile/internal/gc/builtin/runtime.go
This would keep any internal/cpu code out of the runtime, and if we ever need to read that information elsewhere, e.g. for something similar to the popcnt intrinsic for x86, we already have that info readable.
There might be downsides to that approach vs. using linkname that I am missing. If we want to stay with Go code, linkname seems the way to go.
You don't need assembly, you just need to define in internal/cpu:
var hwcap uint32
and then in runtime:
//go:linkname cpu_hwcap internal/cpu.hwcap
var cpu_hwcap uint32
and then when hwcap is processed in the runtime today, just set cpu_hwcap too. Then internal/cpu can read the hwcap variable.
Alternately, if runtime can import internal/cpu (I don't particularly see why not), then internal/cpu can export a SetHwcap function and runtime can just call it, no linkname magic required.
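Put together, the linkname route could look roughly like this for linux/ppc64x; the file placement, names, and the choice of the PPC_FEATURE2_ARCH_3_00 bit are assumptions for illustration only:

    // internal/cpu (e.g. cpu_ppc64x.go):
    package cpu

    var hwcap, hwcap2 uint // set by the runtime before package initializers run

    const ppcFeature2Arch3_00 = 0x00800000 // PPC_FEATURE2_ARCH_3_00 (ISA 3.0 / POWER9)

    var PPC64 struct {
        IsPOWER9 bool
    }

    func init() {
        PPC64.IsPOWER9 = hwcap2&ppcFeature2Arch3_00 != 0
    }

    // runtime (e.g. os_linux_ppc64x.go):
    //   //go:linkname cpu_hwcap2 internal/cpu.hwcap2
    //   var cpu_hwcap2 uint
    //   ...then set cpu_hwcap2 when AT_HWCAP2 is seen while parsing auxv.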
I think that generally importing internal/cpu in the runtime won't work currently, as it will result in an import cycle.
I don't see any imports in internal/cpu at all. It should be fine for low-level runtime to import it. I would wait for Go 1.10 though.
I checked with
go list -f '{{range .Deps}} {{.}} {{end}}' $PACKAGENAME
similar to what cmd/dist/mkdeps.bash is using
which even for an empty package in internal/ gives the dependencies:
runtime runtime/internal/atomic runtime/internal/sys unsafe
and for an empty package in runtime/internal:
runtime/internal/sys
and ./make.bash and cmd/dist/mkdeps.bash give "can't load package: import cycle not allowed" errors for internal/cpu.
Could be it is only the tooling that needs to be adjusted for the import to work without cycle errors.
The go tool knows that every package outside runtime, other than unsafe, depends on runtime. See Package.load in cmd/go/internal/load/pkg.go. If we want the runtime package to import internal/cpu that code will have to be adjusted.
@ianlancetaylor I see. So, do you think using linknames is a more reasonable approach in this case?
@ceseo I think it would be fine to adjust the go tool to permit the internal/cpu package to not depend on the runtime package, assuming of course that internal/cpu never needs to import runtime. It's an internal package, so while doing this would permit core Go developers to make a horrible error it should never affect any users of Go.
What's the correct way of re-generating cmd/dist/deps.go? Just call mkdeps.bash directly?
Yes.
Change https://golang.org/cl/53830 mentions this issue: runtime, internal/cpu: CPU capabilities detection for ppc64x
Since many interested parties are already in this thread and it is still open, I'll reply here (if wanted, I can create a new proposal).
I started prototyping disabling cpu features via internal/cpu, e.g. for testing: https://golang.org/cl/91737
The idea would be that, for now, features can only be disabled from the beginning via an environment variable and not enabled or disabled during runtime, so packages can cache combined feature variables and initialize lookup tables for some special implementations on init.
We already have GODEBUG and GOGC, so I would propose GOCPU to disable cpu features, so that whatever Go package implements cpu feature detection is the only consumer of GOCPU and there are no overlaps with GODEBUG.
An example to run a test with AVX and SSE41 on amd64 disabled could look like:
GOCPU=avx=0,sse41=0 go test ...
I would propose lower case feature names as it seems more readable and is in line with GODEBUG options.
The special key "all" can be used to set features to the minimal set of features required by the current go implementation. So GOCPU=all=0 can be used going forward to run and test go programs with minimal cpu feature requirements. This way some builders with all=0 could be
set up to detect breakages related to some basic implementations of some algorithms.
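A rough sketch of how such a GOCPU value could be parsed during initialization; processOptions and the disable callback are hypothetical, and a real implementation inside internal/cpu would probably avoid the strings dependency:

    import "strings"

    // processOptions disables every feature listed as name=0 in the GOCPU value,
    // e.g. "avx=0,sse41=0" or "all=0".
    func processOptions(gocpu string, disable func(name string)) {
        for _, opt := range strings.Split(gocpu, ",") {
            kv := strings.SplitN(opt, "=", 2)
            if len(kv) == 2 && kv[1] == "0" {
                disable(kv[0]) // "all" resets to the minimal required feature set
            }
        }
    }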
Another feature of internal/cpu that I think would be useful, also for testing internal/cpu itself, is to support a GODEBUG variable, e.g. cpudetail=1, that prints the detected and disabled features. It could also live in GOCPU, but it seems cleaner to me to use GOCPU only for cpu features.
Functions in the runtime and e.g. the support_popcnt checks emitted by the compiler won't be covered currently, but unifying/merging the runtime and internal/cpu detection into one overall facility in Go would solve that. (Happy to work on this too for Go 1.11.)
I don't understand the goal. Is it just for regression testing? Would it be documented and exposed to the users?
It would be documented and exposed like GODEBUG.
One use case is testing (#12805), another I see is benchmarking different implementations that require different cpu capabilities against each other without the need for code changes. It could also be used to force running the same code paths on two different machines with different cpu capabilities for better debugging or reproducibility of errors.
Change https://golang.org/cl/91737 mentions this issue: internal/cpu: use GOCPU environment variable to disable cpu features
Closing this issue as there is internal/cpu and x/sys/cpu now.