The x/sys/cpu package has an unexported const cacheLineSize, and I thought it could be useful if it were exported. My use case is the same as the package's own: adding padding to structs to prevent false sharing.
/cc @bradfitz @ianlancetaylor
I think it'd be fine to export cacheLineSize in x/sys/cpu.
If these are exported, I suggest adding a warning in the documentation that they are best-effort guesses based on GOARCH and might not be the CPU's actual cache line size. They shouldn't be used anywhere their value matters for the program's correctness, to avoid issues like:
http://www.mono-project.com/news/2016/09/12/arm64-icache/
C++17 introduced std::hardware_{constructive,destructive}_interference_size; it might be worth considering a similar approach if we are exposing this API:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html
Like @martisch, I think it's problematic to export cacheLineSize as a constant. It's not the same on different processors with the same GOARCH value.
I would be OK with exporting it as a variable if we can arrange to set the variable correctly.
@ianlancetaylor How about, instead of a const CacheLineSize, there's a type Pad [cacheLineSize]byte, so it's clear the intended usage is only to add padding to structs?
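Something like the following, perhaps (just a sketch; Pad and its 64-byte size stand in for whatever the cpu package would define per arch):

```go
package main

// Pad is the hypothetical padding type proposed above; 64 stands in for the
// arch-specific cache line size the cpu package would choose.
type Pad [64]byte

// Each shard sits on its own cache line, so goroutines incrementing
// different shards do not false-share.
type shard struct {
	count uint64
	_     Pad
}

func main() {
	var shards [4]shard
	shards[0].count++
	shards[1].count++
}
```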
An array type doesn't sound very flexible, as it doesn't permit adjusting the padding to be only what is required to avoid cache contention.
A constant like MaxCacheLineSize would be OK with me.
Regarding the var CacheLineSize, Intel's cpuid package has this implemented for x86, so maybe it would be useful as a reference.
Change https://golang.org/cl/111775 mentions this issue: cpu: export cache line size
I'm uncomfortable with the name MaxCacheLineSize. It could, for example, refer to the L2 cache line size if that is larger than the L1 cache line size.
Since the only use case identified so far is to avoid false sharing, I think we should use a name like DestructiveInterferenceSize, FalseSharingSize, SharedInterferenceSize, or similar. This would also allow us to increase it if there are architectural quirks, such as cache line prefetching, that make it desirable to pad (and maybe align, in Go 2) data structures to more than the cache line size.
Calling it CacheLineSize or MaxCacheLineSize does seem to promise more than we need and maybe more than we can deliver.
Would it be enough to introduce:
type CacheLinePad [64]byte
appropriately sized for each system, and leave it at that? At least then it's clear what it's for, and people would be discouraged from using it for other things.
Still doesn't seem particularly clean, but maybe this is just not something we can do cleanly.
In https://github.com/golang/go/issues/25203#issuecomment-386065278 @ianlancetaylor mentioned that an array type might be a bit too inflexible. I also see @mundaym's and @rsc's point about the name being ambiguous and/or promising something we can't deliver.
I think renaming the const to one of DestructiveInterferenceSize, FalseSharingSize, or SharedInterferenceSize might make less of a promise while still allowing a certain kind of flexibility. What do you think?
Since I made the proposal, I thought I'd give my opinion.
I don't really care what you call it, but I do see the benefit of a const over a type, since a const allows fine-tuning the padding, e.g.:
type foo struct {
	_       [cpu.CacheLineSize]byte      // pad before the hot fields
	x, y, z int                          // 3 * 8 = 24 bytes on 64-bit platforms
	_       [cpu.CacheLineSize - 24]byte // pad out the rest of the cache line
}
I'm not too sure what the use case of a var would be, but since @ianlancetaylor brought it up, maybe he knows of one?
A const is fine with me as long as it's clear that it represents the maximum cache line size of a particular architecture. And, of course, it may change over time. I was thinking of a var because within a single architecture different processors have different cache line sizes. But since the main use seems to be for struct padding, a var is not helpful.
An array type is more awkward than a const in some cases but is perhaps more clear for the intended use. The array type can still be used in the same way as the const by calling len.
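For example, a sketch assuming the array form type CacheLinePad [64]byte (the names and sizes are placeholders): len of a pad value is a constant expression, so it can be used to fine-tune padding much like the const.

```go
package main

import "fmt"

// CacheLinePad stands in for the proposed array type; 64 is a placeholder.
type CacheLinePad [64]byte

// counters pads after the hot field so the next counter in an array lands on
// a different cache line; the pad size is tuned with len, as with the const.
type counters struct {
	hot int64
	_   [len(CacheLinePad{}) - 8]byte // 8 = size of int64
}

func main() {
	const padSize = len(CacheLinePad{}) // len of an array value is a constant expression
	var c counters
	c.hot++
	fmt.Println(padSize, c.hot)
}
```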
I feel like there could be some benefit in naming the const similarly to C++17 (less confusion over what it represents), though that said, DestructiveInterferenceSize is way too long.
As far as I can tell, DestructiveInterferenceSize and ConstructiveInterferenceSize are usually the same value. So would everyone be OK with just InterferenceSize?
If there is a situation where DestructiveInterferenceSize would be different from ConstructiveInterferenceSize, then we should probably have both of them?
If there is a situation where DestructiveInterferenceSize would be different from ConstructiveInterferenceSize, then we should probably have both of them?
To avoid destructive interference we want to make sure that the data is in two separate cache lines. To ensure constructive interference we want data to be in the same cache line. That means that for DestructiveInterferenceSize we would want the maximum cache line size for the architecture, and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.
That said, I'm not sure how you could safely/portably use ConstructiveInterferenceSize in a Go program, but maybe someone else can think of something.
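To make the distinction concrete, here is a sketch using hypothetical DestructiveInterferenceSize and ConstructiveInterferenceSize constants (the values are plausible amd64 numbers, not an existing API):

```go
package main

// Hypothetical constants, not an existing API: the maximum and minimum cache
// line sizes for the target architecture.
const (
	DestructiveInterferenceSize  = 128 // keep independently written data at least this far apart
	ConstructiveInterferenceSize = 64  // keep data accessed together within this many bytes
)

// Destructive use: counters written by different goroutines get separate cache lines.
type paddedCounter struct {
	n uint64
	_ [DestructiveInterferenceSize - 8]byte
}

// Constructive use: fields that are always accessed together should fit
// within ConstructiveInterferenceSize bytes.
type together struct {
	head, tail uint32
}

func main() {
	var c paddedCounter
	c.n++
	var t together
	t.head, t.tail = 1, 2
}
```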
That means that for DestructiveInterferenceSize we would want the maximum cache line size for the architecture, and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.
Ahh, now that makes more sense.
In the sample code from the site you linked, ConstructiveInterferenceSize usage looks like:
static_assert(sizeof(together) <= hardware_constructive_interference_size);
So I was thinking maybe just put this check into a test?
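Something along these lines, perhaps (a sketch: constructiveInterferenceSize is a stand-in constant and together is a placeholder struct):

```go
package cpu_test

import (
	"testing"
	"unsafe"
)

// Stand-ins: the constant would come from the cpu package if it were exported,
// and together is whatever struct is meant to share a cache line.
const constructiveInterferenceSize = 64

type together struct {
	head, tail uint32
}

// TestTogetherSharesCacheLine is a Go analogue of the C++ static_assert:
// fail if the struct has grown past the size that can share one cache line.
func TestTogetherSharesCacheLine(t *testing.T) {
	if size := unsafe.Sizeof(together{}); size > constructiveInterferenceSize {
		t.Fatalf("together is %d bytes; want <= %d to fit in one cache line",
			size, constructiveInterferenceSize)
	}
}
```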
What is the min and max cache line size?
Check ConstructiveInterferenceSize and DestructiveInterferenceSize.
Why is it called that?
To _prevent_ confusion.
There is very little chance that this will prevent confusion outside of a specific set of users who are already familiar with this hideous name. It doesn't even make much sense in an engineering context unless you hunt down the specific proposal in the C++ realm and extrapolate its meaning from that document. I have never heard this term in the context given here, unlike CacheLineSize.
It's also confusing because of the semantics of constructive and destructive -> min, max.
Sorry for leaving the bike without hair, but I think the name is bad.
@as since you downvoted every comment with a proposed name, even MaxCacheLineSize, what would you suggest as a good name?
Sorry, just noticed you upvoted type CacheLinePad [64]byte. In that case, where would the min cache line size fit? Or do you not think it's needed?
The C++ proposal:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html
Destructive interference size: a number that’s _suitable_ as an offset between two objects to _likely_ avoid false-sharing due to different runtime access patterns from different threads.
Constructive interference size: a number that’s _suitable_ as a _limit_ on two objects’ _combined_ memory footprint size and base alignment to _likely_ promote true-sharing between them.
A quote from the author of the C++ proposal that introduced these terms. The author wrote (this was from a Stack Overflow comment):
I authored the proposal, great answer! To clarify one point you made: "I would almost always expect these values to be the same. I believe the only reason they are declared separately is for completeness.". Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort). On constexpr: it's required to constexpr for the value to be usable in determining struct layout. Runtime values are useful in other circumstances. – JF Bastien
It appears that the two values should be the same, and were only included for _completeness_. I don't know exactly what completeness means in this context. A few articles suggest that false sharing on L2 is not possible, whereas at least one family of benchmarks suggests performance gains occur when data are padded to a multiple of the L1 cache line size (implying that some prefetching is done to L2 in pairs, so that multiple is 2 and the value used is 128 instead of 64).
The question I have is: should Go be clever like C++ and create effect-based values like Constructive/Destructive, or should it stick to naming the values for what they represent? To me there is an additional layer of incomprehensible _magic_ with the effect-based names, which is the primary reason I would prefer not to use them (the second is that they are too long).
Regarding Min/Max cache line size, I don't know what the right answer is, because it is not intuitive what Min/Max means either, whereas "cache line size" is a phrase almost always associated with the size of the L1 cache lines/cache blocks. To speculatively use min/max without the possibility of more than two qualifiers seems to make a lot of assumptions that I don't think are clearly stated.
@as the statement:
It appears that the two values should be the same, and were only included for completeness
would appear to be contradicted by the author's answer:
Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort).
So in situations where an ISA has shipped with different cache line sizes (like the arm64 case linked in @martisch's reply) the values may be different. Since gc doesn't expose a way to specify the target arch, the constructive and destructive interference sizes will therefore always be different on arm64. The same is presumably true for the new wasm backend.
I am also starting to think that the type CacheLinePad [N]byte suggestion might be best, however. It is easy to understand while not making any promises about being the cache line size (so, for example, it could presumably be double the maximum L1 cache line size if that is desirable for a given arch).
It seems that using the cache line size as padding to avoid false sharing is not sufficient on newer Intel architectures, and on the amd64 arch we should consider 2x cache line size for struct padding, since the spatial prefetcher pulls in the adjacent cache line too:
Intel® 64 and IA-32 Architectures Optimization Reference Manual
2.3.5.4 Data Prefetching
Spatial Prefetcher: This prefetcher strives to complete every cache line fetched to the L2 cache with the pair line that completes it to a 128-byte aligned chunk.
This was discussed and benchmarked in https://groups.google.com/forum/#!topic/mechanical-sympathy/i3-M2uCYTJE.
This also seems to be a case where the destructive and constructive interference sizes can diverge.
Using CacheLinePad [N]byte or another name sounds like the better option to cover the original use case that should be solved. It also avoids pinning the meaning to another programming language/standard, as using e.g. ConstructiveInterferenceSize would.
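For illustration, padding on amd64 would then look something like this (a sketch; falseSharingRange is a made-up name and the 128 comes from the spatial-prefetcher behaviour quoted above):

```go
package main

// falseSharingRange is a hypothetical per-arch value: two 64-byte cache lines
// on recent Intel CPUs because the spatial prefetcher pulls in the pair line.
const falseSharingRange = 128

// Flags written from different cores stay at least falseSharingRange apart.
type paddedFlag struct {
	flag uint32
	_    [falseSharingRange - 4]byte
}

func main() {
	var f paddedFlag
	f.flag = 1
}
```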
For the points given above, I too would agree that type CacheLinePad [N]byte makes more sense here.
So should we talk about a const or func to test for true sharing, since we're already on the topic? Or should that go to a new proposal?
@martisch
on the amd64 arch we should consider 2x cache line size for struct padding since the spatial prefetcher pulls in the adjacent cache line too
[…]
This also seems to be a case where the destructive and constructive interference sizes can diverge.
How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.
How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.
I was thinking about it in the context of the amd64 platform as a whole, as I assumed we likely won't do runtime detection of these sizes for use in padding structs if we use the current const approach. On amd64 we have some CPUs doing spatial prefetching and some not. So Constructive seems to be OK at 64 bytes (does anybody know of a widely used amd64-compatible CPU with less than 64-byte cache lines?), but for some newer amd64-compatible CPUs there is some destructive interference within 128 bytes (though it might not be as bad as within the same cache line).
So I wrote a false/true-sharing benchmark using amd64 numbers (64 and 128). Instead of guessing, if anyone has access to those CPUs, just run the benchmark and let the numbers speak for themselves.
https://play.golang.org/p/hp_MXQWEJvB (edit: removed useless middle-align tests)
I have an old Intel, so constructive and destructive sizes are the same.
Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
BenchmarkFalseSharing/NoPad-12 5000 270183 ns/op
BenchmarkFalseSharing/Pad64Start-12 10000 165659 ns/op
BenchmarkFalseSharing/Pad64StartEnd-12 10000 165744 ns/op
BenchmarkFalseSharing/Pad128Start-12 10000 164528 ns/op
BenchmarkFalseSharing/Pad128StartEnd-12 10000 167642 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-12 10000 231060 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-12 5000 241121 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-12 5000 247388 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-12 10000 169730 ns/op
BenchmarkTrueSharing/<64-12 2000 602211 ns/op
BenchmarkTrueSharing/>64-12 500 2680956 ns/op
BenchmarkTrueSharing/>128-12 500 3555779 ns/op
I don't see much difference here between 64 and 128 bytes.
Middle has some difference, but alignment is weird for those examples so I'm not sure we can conclude anything from them.
If the cache line were 64 bytes, then I can understand why middle 128 would be better than 64.
64 - 12 = 52 bytes
So for the pad-64 middle case, the first 64 bytes would contain field x, and so it's the same as no padding. I guess I should remove the middle benchmarks.
It's weird that on my i7 the middle variants perform the same at 64 vs 128 (cpuid reports a 64-byte cache line).
BenchmarkFalseSharing/NoPad-4 10000 194714 ns/op
BenchmarkFalseSharing/Pad64Start-4 10000 168577 ns/op
BenchmarkFalseSharing/Pad64StartEnd-4 10000 172899 ns/op
BenchmarkFalseSharing/Pad128Start-4 10000 170881 ns/op
BenchmarkFalseSharing/Pad128StartEnd-4 10000 174534 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-4 10000 156478 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-4 10000 155024 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-4 10000 148957 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-4 10000 151861 ns/op
BenchmarkTrueSharing/<64-4 2000 690212 ns/op
BenchmarkTrueSharing/>64-4 500 2205934 ns/op
BenchmarkTrueSharing/>128-4 500 2675777 ns/op
OK, does anyone object to CacheLinePad (as in https://github.com/golang/go/issues/25203#issuecomment-387189236), with a size yet to be determined?
@rsc nope, I think pretty much everyone who commented here thinks that makes the most sense for the padding.
Revising my earlier comment, instead of:
type CacheLinePad [64]byte
we should make this an opaque struct, so that the only interesting property of the type is that it has the given size:
type CacheLinePad struct { _ [64]byte }
Then code cannot depend on it being an array somehow. But otherwise, proposal accepted.
Now that I think about it, using unsafe.Sizeof() is probably better/clearer than len(), and other advanced usage of the padding size would probably end up using unsafe anyway.
So I 100% agree the struct type is the best so far.
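For example, the earlier const-based snippet would translate to something like this with the opaque struct type (a sketch; the 24 assumes x, y, and z are 8 bytes each, and that the pad is at least 24 bytes on the target arch):

```go
package main

import (
	"unsafe"

	"golang.org/x/sys/cpu"
)

type foo struct {
	_       cpu.CacheLinePad // keep x, y, z off the previous cache line
	x, y, z int64            // 3 * 8 = 24 bytes
	_       [unsafe.Sizeof(cpu.CacheLinePad{}) - 24]byte // pad out the rest of the line
}

func main() {
	var f foo
	f.x, f.y, f.z = 1, 2, 3
}
```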
Change https://golang.org/cl/116276 mentions this issue: runtime,internal/cpu: replace sys.CacheLineSize for padding by cpu.CacheLinePad
I made the above CL to see how CacheLinePad could be used when added to internal/cpu (the package x/sys/cpu was forked from) and applied to the runtime package, which already imports internal/cpu. Most uses of runtime/internal/sys.CacheLineSize (all except two) are for padding. This would also remove the redundant per-arch definition of CacheLineSize in runtime/internal/sys and internal/cpu.
https://golang.org/cl/111775 (the CL for x/sys/cpu) is now also updated to provide a CacheLinePad type as suggested in https://github.com/golang/go/issues/25203#issuecomment-394492749 and as implemented for internal/cpu by @martisch in https://golang.org/cl/116276.
Change https://golang.org/cl/126601 mentions this issue: runtime: replace sys.CacheLineSize for padding by cpu.CacheLinePad