We have a lot of packages in the std lib that use specialized instructions
that are not always present. Most of the time, the package detects
the required features itself. This works, but leads to code
duplication. And as Bryan Chan mentioned on https://golang.org/cl/22201,
even if the processor provides a way to detect certain optional features,
it's still better to use AT_HWCAP from the Linux auxv because that also
takes kernel support into account.
Only the runtime can access auxv, so it makes sense for the runtime
to query the processor capabilities and provide that to the packages.
I propose that we add an internal package internal/cpu that exposes
capability flags for the current processor so that each std package
can query it directly instead of having a runtime detection routine
that duplicates the work.
Another benefit is that some processors, like ARM, don't provide
a way to do runtime capability detection, so we have to rely on the
kernel to provide this information. Different kernels provide different
mechanisms for this (sysctl for BSD and auxv for Linux), so providing
a package that abstracts those OS-dependent features away is also
beneficial.
We might promote the package to runtime/cpu if deemed fit, but that's
out of the scope for this proposal.
The package could be modeled after the Linux AT_HWCAP bits,
and it will be processor dependent.
I'm happy to work on adding this (particularly the s390x bit).
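As a rough illustration of the shape such a package could take, here is a minimal sketch, assuming linux/arm64 and hypothetical flag names (the HWCAP bit positions follow the kernel's arm64 conventions; nothing here is part of the proposal text itself):

    package cpu

    // Assumed AT_HWCAP bit positions for linux/arm64.
    const (
        hwcapAES  = 1 << 3
        hwcapSHA2 = 1 << 6
    )

    // Capability flags that std packages would read; names are illustrative only.
    var (
        HasAES  bool
        HasSHA2 bool
    )

    // doinit would be handed the AT_HWCAP value the runtime read from auxv.
    func doinit(hwcap uint) {
        HasAES = hwcap&hwcapAES != 0
        HasSHA2 = hwcap&hwcapSHA2 != 0
    }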
I'm interested in what thoughts people have for the API. The three options that occur to me are:
cpu.Has("ssse3", "popcnt")cpu.Has(cpu.SSSE3, cpu.POPCNT))cpu.SSSE3 && cpu.POPCNT)I'm not sure if we want to prefix with the CPU type to avoid naming conflicts. If so then perhaps s/SSSE3/AMD64.SSSE3/.
It might also be nice if the function (or bools) could be inlined and constant folded. In that case I suspect we'd need to limit the number of features to be checked to one per call. This could be useful in scenarios where a feature is optional on say i386/ppc64, but mandatory on newer versions of the architecture such as amd64/ppc64le.
Another possibility is that we do this in the runtime/internal/sys package. Something like sys.Feature("ssse3") perhaps? There are variables in sys that might be useful to an application as well and could be made part of a public cpu package were we to ever go down that route.
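To illustrate the constant-folding point, a hedged sketch (file names, build constraints and identifiers are hypothetical): a feature that is mandatory on one GOARCH can be a constant there, so guarded branches fold away at compile time, while it stays a runtime-initialized variable where it is optional.

    // sse2_amd64.go (hypothetical): SSE2 is part of the amd64 baseline.
    package cpu

    const HasSSE2 = true // "if cpu.HasSSE2 { ... }" keeps only the fast path

    // sse2_386.go (hypothetical): SSE2 is optional on 386.
    package cpu

    var HasSSE2 = cpuidHasSSE2() // cpuidHasSSE2 is an assumed detection helper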
BTW @minux when you say:
it will be processor dependent
Do you mean you want the API to be processor dependent?
I mean the constants will be cpu dependent. There is no constant folding
though; this package is to check cpu capabilities at runtime, so I think an
API that looks like this is fine:
cpu.Has(feats ...cpu.Feature)
E.g.:
cpu.Has(cpu.FloatingPoint, cpu.AES, ...)
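A minimal sketch of how that variadic form could be implemented, assuming a Feature type whose values index into a bit mask filled in during package initialization (all identifiers are illustrative, not part of the proposal):

    package cpu

    type Feature uint

    const (
        FloatingPoint Feature = iota
        AES
        POPCNT
    )

    // featureMask is populated once at startup from HWCAP/CPUID, one bit per Feature.
    var featureMask uint64

    // Has reports whether every requested feature is available on this CPU.
    func Has(feats ...Feature) bool {
        for _, f := range feats {
            if featureMask&(1<<f) == 0 {
                return false
            }
        }
        return true
    }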
Yeah, we need to figure out a way to name and categorize the CPU features.
Should each arch be in a separate package (e.g. internal/cpu/arm)? Or
should we prefix each feature with the arch name? Do we consolidate common
features across arches?
I think prefixing them with the arch name would be good. That seems to be the convention used for many other constants in golang (opcodes, relocations, etc.). That way there would be no confusion in case some cpu features are close but don't mean exactly the same thing on different architectures.
One idea is to use the same naming conventions that we have in glibc, for consistency (i.e. glibc/sysdeps/powerpc/dl-procinfo.c:_dl_powerpc_cap_flags). This way, the names match the ones shown when LD_SHOW_AUXV=1:
AT_HWCAP: vsx arch_2_06 dfp ic_snoop smt mmu fpu altivec ppc64 ppc32
AT_HWCAP2: tar isel ebb dscr htm arch_2_07
Querying for the capabilities should take into account whether an arch has a HWCAP2 or not, because there is no way to know (a priori) if a capability bit is in HWCAP or HWCAP2. My suggestion here is to use a concatenated 64-bit HWCAP+HWCAP2 mask, so we can easily map the bits to the capabilities. That's what we did in glibc/gcc to implement __builtin_cpu_is() / __builtin_cpu_supports() for Power.
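A rough sketch of the concatenated-mask idea, with hypothetical names and an arbitrary choice of putting HWCAP in the upper 32 bits and HWCAP2 in the lower 32 bits:

    package cpu

    // hwcapMask holds AT_HWCAP in the upper half and AT_HWCAP2 in the lower half.
    var hwcapMask uint64

    func setHWCap(hwcap, hwcap2 uint32) {
        hwcapMask = uint64(hwcap)<<32 | uint64(hwcap2)
    }

    // hasFeature reports whether the given bit of the combined mask is set.
    func hasFeature(bit uint64) bool {
        return hwcapMask&bit != 0
    }

    const ppcFeatureHasVSX = 0x00000080 // PPC_FEATURE_HAS_VSX from the Linux headers

Usage, once the runtime has called setHWCap, would then be hasFeature(uint64(ppcFeatureHasVSX) << 32).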
@minux are you already working on this? Thanks.
No, I'm not. Please go ahead if you want to. Thanks.
@mundaym I think that starting with a runtime/internal/sys approach is a good idea for now. My idea was to create something that would work similarly to __builtin_cpu_supports() in gcc. What do you think?
I had a play with this last weekend. I ended up getting a bit lost in circular dependencies... You might want to just add (perhaps hackily) what you need directly into the runtime package for now and let it get cleaned up later.
Along those lines it would be easy to add the variables runtime.hwcap (already there for arm) and runtime.hwcap2 and then grab them using assembly in other packages when necessary. Ideally they'd both have the type uint32 and be defined and set in os_linux.go (rather than in an arch-dependent file).
The constants representing features could go in the runtime/internal/sys package but that means they can only be accessed in the runtime package. If you instead put them in a global package like internal/cpu then the runtime package can't get at them. That could be a way to get this proposal started though.
@mundaym OK. We'll need these checks later, when we start to add the new ISA 3.0 (POWER9) instructions and write runtime optimizations using those.
For now, I was thinking about something trivial, like: https://github.com/ceseo/go/commit/ca09310b2e7deafbfbbd34b1532273a00eb69b2b
then evolve from that.
CL https://golang.org/cl/31149 mentions this issue.
CL https://golang.org/cl/32330 mentions this issue.
For an internal package, it seems fine to experiment through the usual code review process. No approval needed here. There's runtime/internal/sys, for example. If some of that gets promoted to plain internal/sys or some other name, that seems OK.
@minux @laboger @ceseo
Did somebody plan to work on this in the near future or did already work on implementing an internal cpu package?
If not, I would like to start on a CL for a small internal/cpu package and then have separate CLs to clean up the std lib uses of cpu feature detection as listed in #19739.
I would just start by providing:
cpu.GOARCH.FLAG bools
and populate them on package init. That seems most similar to the current uses of the runtime flags I have seen, and it also decouples the std lib packages' usage from the runtime.
FLAG would be named after the names Linux uses but in upper case.
e.g. for AMD64, Linux uses: mmx, sse, sse2, sse3, sse4_1, sse4_2, popcnt, aes, avx2, bmi1, bmi2, erms, ...
which results in: cpu.AMD64.AVX2 and cpu.AMD64.SSE4_1
In further iterations we can add HWCAP bit vectors and, if needed, more complex query functions, or change the init such that the runtime queries the information first and we copy from there.
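A rough sketch of that shape for amd64; the struct layout and the cpuid helper are assumptions, while the CPUID leaf/bit positions are the standard ones for these features:

    package cpu

    // AMD64 holds the detected feature flags, named after the Linux flag names.
    var AMD64 struct {
        AVX2   bool
        SSE4_1 bool
        POPCNT bool
    }

    // cpuid is assumed to be implemented in assembly (not shown).
    func cpuid(eaxArg, ecxArg uint32) (eax, ebx, ecx, edx uint32)

    func init() {
        _, ebx7, _, _ := cpuid(7, 0)
        _, _, ecx1, _ := cpuid(1, 0)
        AMD64.AVX2 = ebx7&(1<<5) != 0    // CPUID.(EAX=7,ECX=0):EBX bit 5
        AMD64.SSE4_1 = ecx1&(1<<19) != 0 // CPUID.(EAX=1):ECX bit 19
        AMD64.POPCNT = ecx1&(1<<23) != 0 // CPUID.(EAX=1):ECX bit 23
    }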
@martisch no, it's not in my list. I already have a functional workaround for ppc64x. Feel free to start working on your CL.
I can help later by adding any code to make it work for ppc64x.
@martisch, sounds reasonable, if LOUD.
After exploring the implementation details and replacing the uses of feature flags in the std lib, I changed the approach for the first version to:
cpu.Has_avx2, cpu.Has_sse4_2 ...
This allows us to use the bools directly in assembler and Go code.
It also has the advantage that we don't need extra code versions for amd64, 386 and amd64p32, as they can all use the same code to query the internal/cpu package since the naming doesn't differ. Downsides are that we cannot put padding before and after the variables as easily as when they are collected in an arch-specific struct,
and that the naming is less LOUD.
Different architectures can implement additional flags. When we find a name that conflicts between architectures they can share it, as I guess for the foreseeable future Go will only ever run on one architecture at a time within a single Go runtime.
I plan to have the first version ready for review next week, after I have finalized the initialization code and tested it on a few different CPUs.
These are somewhat non-idiomatic. Maybe HasAVX2, HasSSE42, etc?
True. It seems I was reading too many C/C++-style cpuid libraries.
I will change the naming according to your suggestion. Fortunately
we don't seem to need HasSSSE3 currently :)
CL https://golang.org/cl/41950 mentions this issue.
CL https://golang.org/cl/41476 mentions this issue.
The current proposal seems to be heading in a direction where cpu.HasAVX2 wouldn't be defined for any platform but x86. While I understand the rationale, I'm sure this would eventually cause some small pain to programmers for little benefit. I suggest designing the package in a way that if GOARCH == "amd64" && cpu.HasAVX2 works and can be compiled on all platforms.
I don't see any problem with cpu.HasAVX2 being defined on non-x86 platforms. It isn't useful, but it won't hurt anyone. It's likely that cpu.HasAVX2 is only referenced from _amd64.go files, so HasAVX2 won't even make it into the binary.
But I think the proposal has names like cpu.amd64.HasAVX2, which is clearer, provides a better godoc experience, and avoids potential name conflicts between architectures.
@randall77
Assuming amd64 is a struct.
Can we reference cpu.amd64.HasAVX2 in go assembler without hard coding a numeric offset into the struct?
To answer myself: it seems we can use something like cpu·Amd64+Amd64_HasAVX2(SB)
if we define a type for the struct. Will test.
@martisch: There are macros defined in assembly which contain struct offsets. So I think something like:
MOVQ cpu.amd64+amd64_HasAVX2(SB), AX
should work. I'm not sure if those offsets are defined across packages.
A package per architecture might work better.
thanks randall. Will test it.
Not sure if we want a package/struct per architecture.
E.g. having one X86 package/struct would allow us to write cpu.X86.HasSSSE3 instead of cpu.Amd64.HasSSSE3 || cpu.Amd64p32.HasSSSE3 || cpu.386.HasSSSE3, and the detection code is identical. On the other hand, amd64, amd64p32 and 386 are more familiar since they are used for GOARCH.
With a struct type x86 it works within the package when adding #include "go_asm.h",
e.g. for CMPB internal∕cpu·X86+x86_HasSSE41(SB), $1.
It indeed does not seem to work across package boundaries.
A possible workaround is to declare a package level variable
var useSSE41 = cpu.X86.HasSSE41
and then use the useSSE41 variable in assembler.
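As a concrete illustration of that workaround (the symbol names are only examples):

    // In a Go file of the package that needs the check:
    import "internal/cpu"

    var useSSE41 = cpu.X86.HasSSE41 // copied at init time so assembly can read it

    // In the corresponding .s file of the same package:
    //   CMPB ·useSSE41(SB), $1
    //   JNE  generic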
Basic support for feature flag detection has been added and the std lib usage of cpuid for x86 has been unified.
The runtime x86 cpuid detection has been consolidated a bit too and is separate from internal/cpu. There are some more cleanups I would like to add in Go 1.10, e.g. padding of runtime cpu flags and moving the rest of cpuflags_amd64.go into rt0.go.
I also plan to work on extending internal/cpu in Go 1.10 by allowing cpu features to be masked in internal/cpu so we can test different code paths in the std lib (#12805).
I have a question: what's the "correct" way of implementing this for architectures that do not have a cpuid instruction? For example, on ppc64x, we currently rely on HWCAP/HWCAP2 bits for that.
@ceseo, whatever the hardware provides, I guess. If needed (if the instruction is expensive) we can call it once on process startup and cache the results in a global variable.
It can be more challenging if we have to test instructions & catch unimplemented faults, or ask the OS. We can fight that hurdle if it becomes necessary.
@randall77 Power doesn't have anything in the hardware for identifying capabilities/ISA level. That's why we use the HWCAP/HWCAP2 bits exposed by the kernel. In glibc, for instance, I had to modify the TLS ABI and write the HWCAP/HWCAP2 bits inside the TCB, so that __builtin_cpu_is/__builtin_cpu_supports in gcc can read the information from somewhere.
In Go, we currently initialize a struct in runtime/os_linux_ppc64x.go and read from there in runtime to identify capabilities. However, that's heavily tied to the initialization procedures in runtime/os_linux.go.
Do you have any suggestions for adapting what we currently have so that it can reside inside the new internal/cpu package?
For internal/ packages you can just use linkname to expose otherwise unexported parts of its API.
For instance, you could have the runtime call into internal/cpu and pass it the auxv array so it can initialize itself. Initialization order is tricky but that's always the case with early initialization like this.
I would favor keeping as much isolation between the runtime and internal/cpu code as possible.
However, I see that this cannot be achieved 100% for e.g. ppc64.
Since the runtime and internal/cpu Go versions will stay in sync, I think we can expose (not officially export) the HWCAP/HWCAP2 bits the runtime uses and then read those from an internal/cpu (assembler) function.
https://github.com/golang/go/blob/master/src/cmd/compile/internal/gc/builtin/runtime.go
This would keep any internal/cpu code out of the runtime, and if we ever need to read that information elsewhere, e.g. for something similar to the popcnt intrinsic for x86, we already have that info readable.
There might be downsides to that approach vs. using linkname that I am missing. If we want to stay with Go code, linkname seems the way to go.
You don't need assembly, you just need to define in internal/cpu:
var hwcap uint32
and then in runtime:
//go:linkname cpu_hwcap internal/cpu.hwcap
var cpu_hwcap uint32
and then when hwcap is processed in the runtime today, just set cpu_hwcap too. Then internal/cpu can read the hwcap variable.
Alternately, if runtime can import internal/cpu (I don't particularly see why not), then internal/cpu can export a SetHwcap function and runtime can just call it, no linkname magic required.
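Put together, the linkname route could look roughly like this for linux/ppc64x; the file placement, names, and the choice of the PPC_FEATURE2_ARCH_3_00 bit are assumptions for illustration only:

    // internal/cpu (e.g. cpu_ppc64x.go):
    package cpu

    var hwcap, hwcap2 uint // set by the runtime before package initializers run

    const ppcFeature2Arch3_00 = 0x00800000 // PPC_FEATURE2_ARCH_3_00 (ISA 3.0 / POWER9)

    var PPC64 struct {
        IsPOWER9 bool
    }

    func init() {
        PPC64.IsPOWER9 = hwcap2&ppcFeature2Arch3_00 != 0
    }

    // runtime (e.g. os_linux_ppc64x.go):
    //   //go:linkname cpu_hwcap2 internal/cpu.hwcap2
    //   var cpu_hwcap2 uint
    //   ...then set cpu_hwcap2 when AT_HWCAP2 is seen while parsing auxv.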
I think that generally importing internal/cpu in the runtime won't work currently, as it will result in an import cycle.
I don't see any imports in internal/cpu at all. It should be fine for low-level runtime to import it. I would wait for Go 1.10 though.
I checked with
go list -f '{{range .Deps}} {{.}} {{end}}' $PACKAGENAME
similar to what cmd/dist/mkdeps.bash is using
which even for an empty package in internal/ gives the dependencies:
runtime runtime/internal/atomic runtime/internal/sys unsafe
and for an empty package in runtime/internal:
runtime/internal/sys
and ./make.bash and cmd/dist/mkdeps.bash give "can't load package: import cycle not allowed" errors for internal/cpu.
Could be it is only the tooling that needs to be adjusted for the import to work without cycle errors.
The go tool knows that every package outside runtime, other than unsafe, depends on runtime. See Package.load in cmd/go/internal/load/pkg.go. If we want the runtime package to import internal/cpu that code will have to be adjusted.
@ianlancetaylor I see. So, do you think using linknames is a more reasonable approach in this case?
@ceseo I think it would be fine to adjust the go tool to permit the internal/cpu package to not depend on the runtime package, assuming of course that internal/cpu never needs to import runtime. It's an internal package, so while doing this would permit core Go developers to make a horrible error it should never affect any users of Go.
What's the correct way of re-generating cmd/dist/deps.go? Just call mkdeps.bash directly?
Yes.
Change https://golang.org/cl/53830 mentions this issue: runtime, internal/cpu: CPU capabilities detection for ppc64x
Since many interested parties are already in this thread and it is still open, I'll reply here (if wanted, I can create a new proposal).
I started prototyping disabling cpu features via internal/cpu, e.g. for testing: https://golang.org/cl/91737
The idea would be that, for now, features can only be disabled from the beginning via an environment variable and not enabled or disabled during runtime, so packages can cache combined feature variables and initialize lookup tables for some special implementations on init.
We already have GODEBUG and GOGC, so I would propose GOCPU to disable cpu features, so that whatever Go package implements cpu feature detection is the only consumer of GOCPU and there are no overlaps with GODEBUG.
An example to run a test with AVX and SSE41 on amd64 disabled could look like:
GOCPU=avx=0,sse41=0 go test ...
I would propose lower case feature names as it seems more readable and is in line with GODEBUG options.
The special key "all" can be used to set features to the minimal set of features required by the current go implementation. So GOCPU=all=0 can be used going forward to run and test go programs with minimal cpu feature requirements. This way some builders with all=0 could be
set up to detect breakages related to some basic implementations of some algorithms.
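A rough sketch of how such a GOCPU value could be parsed during initialization; processOptions and the disable callback are hypothetical, and a real implementation inside internal/cpu would probably avoid the strings dependency:

    import "strings"

    // processOptions disables every feature listed as name=0 in the GOCPU value,
    // e.g. "avx=0,sse41=0" or "all=0".
    func processOptions(gocpu string, disable func(name string)) {
        for _, opt := range strings.Split(gocpu, ",") {
            kv := strings.SplitN(opt, "=", 2)
            if len(kv) == 2 && kv[1] == "0" {
                disable(kv[0]) // "all" resets to the minimal required feature set
            }
        }
    }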
Another feature of internal/cpu that I think would be useful, also for testing internal/cpu itself, is to support a GODEBUG variable, e.g. cpudetail=1, that prints the detected and disabled features. It could also live in GOCPU, but it seems cleaner to me to use GOCPU only for cpu features.
Functions in the runtime and e.g. the support_popcnt checks emitted by the compiler won't be covered currently, but unifying/merging the runtime and internal/cpu detection into one overall facility in Go would solve that. (Happy to work on this too for Go 1.11.)
I don't understand the goal. Is it just for regression testing? Would it be documented and exposed to the users?
It would be documented and exposed like GODEBUG.
One use case is testing (#12805), another I see is benchmarking different implementations that require different cpu capabilities against each other without the need for code changes. It could also be used to force running the same code paths on two different machines with different cpu capabilities for better debugging or reproducibility of errors.
Change https://golang.org/cl/91737 mentions this issue: internal/cpu: use GOCPU environment variable to disable cpu features
Closing this issue as there is internal/cpu and x/sys/cpu now.