The x/sys/cpu package has an unexported const cacheLineSize, and I thought it could be useful if it were exported. My use case is the same as the package's own: adding padding to structs to prevent false sharing.
/cc @bradfitz @ianlancetaylor
I think it'd be fine to export cacheLineSize in x/sys/cpu.
If these are exported, I suggest adding a warning in the documentation that they are best-effort guesses based on GOARCH and might not be the CPU's actual cache line size. They shouldn't be used anywhere their value matters for the program's correctness, to avoid issues like:
http://www.mono-project.com/news/2016/09/12/arm64-icache/
C++17 introduced std::hardware_{constructive,destructive}_interference_size; it might be worth considering a similar approach if we are exposing this API:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html
Like @martisch, I think it's problematic to export cacheLineSize as a constant. It's not the same on different processors with the same GOARCH value.
I would be OK with exporting it as a variable if we can arrange to set the variable correctly.
@ianlancetaylor How about, instead of a const CacheLineSize, there's a type Pad [cacheLineSize]byte, so it's clear the intended usage is only to add padding to structs?
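Something like the following, perhaps (just a sketch; Pad and its 64-byte size stand in for whatever the cpu package would define per arch):

```go
package main

// Pad is the hypothetical padding type proposed above; 64 stands in for the
// arch-specific cache line size the cpu package would choose.
type Pad [64]byte

// Each shard sits on its own cache line, so goroutines incrementing
// different shards do not false-share.
type shard struct {
	count uint64
	_     Pad
}

func main() {
	var shards [4]shard
	shards[0].count++
	shards[1].count++
}
```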
An array type doesn't sound very flexible, as it doesn't permit adjusting the padding to be only what is required to avoid cache contention.
A constant like MaxCacheLineSize would be OK with me.
Regarding the var CacheLineSize, Intel's cpuid package has this implemented for x86, so maybe it would be useful as a reference.
Change https://golang.org/cl/111775 mentions this issue: cpu: export cache line size
I'm uncomfortable with the name MaxCacheLineSize. It could, for example, refer to the L2 cache line size if that is larger than the L1 cache line size.
Since the only use case identified so far is to avoid false sharing, I think we should use a name like DestructiveInterferenceSize, FalseSharingSize, SharedInterferenceSize, or similar. This would also allow us to increase it if there are architectural quirks, such as cache line prefetching, that make it desirable to pad (and maybe align, in Go 2) data structures to more than the cache line size.
Calling it CacheLineSize or MaxCacheLineSize does seem to promise more than we need and maybe more than we can deliver.
Would it be enough to introduce:
type CacheLinePad [64]byte
appropriately sized for each system, and leave it at that? At least then it's clear what it's for, and people would be discouraged from using it for other things.
Still doesn't seem particularly clean, but maybe this is just not something we can do cleanly.
In https://github.com/golang/go/issues/25203#issuecomment-386065278 @ianlancetaylor mentioned that an array type might be a bit too inflexible. I also see @mundaym's and @rsc's point about the name being ambiguous and/or promising something we can't deliver.
I think renaming the const to one of DestructiveInterferenceSize, FalseSharingSize, or SharedInterferenceSize might make less of a promise while still allowing a certain kind of flexibility. What do you think?
Since I made the proposal, I thought I'd give my opinion.
I don't really care what you call it, but I do see the benefit of a const over a type, since a const allows fine-tuning the padding, e.g.:
type foo struct {
	_       [cpu.CacheLineSize]byte      // pad before the hot fields
	x, y, z int                          // 3 * 8 = 24 bytes on 64-bit platforms
	_       [cpu.CacheLineSize - 24]byte // pad out the rest of the cache line
}
I'm not too sure what the use case of a var would be, but since @ianlancetaylor brought it up, maybe he knows of one?
A const is fine with me as long as it's clear that it represents the maximum cache line size of a particular architecture. And, of course, it may change over time. I was thinking of a var because within a single architecture different processors have different cache line sizes. But since the main use seems to be for struct padding, a var is not helpful.
An array type is more awkward than a const in some cases but is perhaps more clear for the intended use. The array type can still be used in the same way as the const by calling len.
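For example, a sketch assuming the array form type CacheLinePad [64]byte (the names and sizes are placeholders): len of a pad value is a constant expression, so it can be used to fine-tune padding much like the const.

```go
package main

import "fmt"

// CacheLinePad stands in for the proposed array type; 64 is a placeholder.
type CacheLinePad [64]byte

// counters pads after the hot field so the next counter in an array lands on
// a different cache line; the pad size is tuned with len, as with the const.
type counters struct {
	hot int64
	_   [len(CacheLinePad{}) - 8]byte // 8 = size of int64
}

func main() {
	const padSize = len(CacheLinePad{}) // len of an array value is a constant expression
	var c counters
	c.hot++
	fmt.Println(padSize, c.hot)
}
```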
I feel like there could be some benefit in naming the const similarly to C++17 (less confusion over what it represents), though that said, DestructiveInterferenceSize is way too long.
As far as I can tell, DestructiveInterferenceSize and ConstructiveInterferenceSize are usually the same value. So would everyone be OK with just InterferenceSize?
If there is a situation where DestructiveInterferenceSize would be different from ConstructiveInterferenceSize, then we should probably have both of them?
If there is a situation where DestructiveInterferenceSize would be different from ConstructiveInterferenceSize, then we should probably have both of them?
To avoid destructive interference we want to make sure that the data is in two separate cache lines. To ensure constructive interference we want data to be in the same cache line. That means that for DestructiveInterferenceSize we would want the maximum cache line size for the architecture, and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.
That said, I'm not sure how you could safely/portably use ConstructiveInterferenceSize in a Go program, but maybe someone else can think of something.
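To make the distinction concrete, here is a sketch using hypothetical DestructiveInterferenceSize and ConstructiveInterferenceSize constants (the values are plausible amd64 numbers, not an existing API):

```go
package main

// Hypothetical constants, not an existing API: the maximum and minimum cache
// line sizes for the target architecture.
const (
	DestructiveInterferenceSize  = 128 // keep independently written data at least this far apart
	ConstructiveInterferenceSize = 64  // keep data accessed together within this many bytes
)

// Destructive use: counters written by different goroutines get separate cache lines.
type paddedCounter struct {
	n uint64
	_ [DestructiveInterferenceSize - 8]byte
}

// Constructive use: fields that are always accessed together should fit
// within ConstructiveInterferenceSize bytes.
type together struct {
	head, tail uint32
}

func main() {
	var c paddedCounter
	c.n++
	var t together
	t.head, t.tail = 1, 2
}
```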
That means that for DestructiveInterferenceSize we would want the maximum cache line size for the architecture, and for ConstructiveInterferenceSize we would want the minimum cache line size for the architecture. So yes, I think they can be different.
Ahh, now that makes more sense.
In the sample code from the site you linked, ConstructiveInterferenceSize usage looks like:
static_assert(sizeof(together) <= hardware_constructive_interference_size);
So I was thinking maybe just put this check into a test?
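Something along these lines, perhaps (a sketch: constructiveInterferenceSize is a stand-in constant and together is a placeholder struct):

```go
package cpu_test

import (
	"testing"
	"unsafe"
)

// Stand-ins: the constant would come from the cpu package if it were exported,
// and together is whatever struct is meant to share a cache line.
const constructiveInterferenceSize = 64

type together struct {
	head, tail uint32
}

// TestTogetherSharesCacheLine is a Go analogue of the C++ static_assert:
// fail if the struct has grown past the size that can share one cache line.
func TestTogetherSharesCacheLine(t *testing.T) {
	if size := unsafe.Sizeof(together{}); size > constructiveInterferenceSize {
		t.Fatalf("together is %d bytes; want <= %d to fit in one cache line",
			size, constructiveInterferenceSize)
	}
}
```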
What is the min and max cache line size?
Check ConstructiveInterferenceSize and DestructiveInterferenceSize.
Why is it called that?
To _prevent_ confusion.
There is very little chance that this will prevent confusion outside of a specific set of users who are already familiar with this hideous name. It doesn't even make much sense in an engineering context unless you hunt down the specific proposal in the C++ realm and extrapolate its meaning from that document. I have never heard this term in the context given here, unlike CacheLineSize.
It's also confusing because of the semantics of constructive and destructive -> min, max.
Sorry for leaving the bike without hair, but I think the name is bad.
@as since you downvoted every comment with a proposed name, even MaxCacheLineSize, what would you suggest as a good name?
Sorry, just noticed you upvoted type CacheLinePad [64]byte. In that case, where would the min cache line size fit? Or do you not think it's needed?
The C++ proposal:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html
Destructive interference size: a number that’s _suitable_ as an offset between two objects to _likely_ avoid false-sharing due to different runtime access patterns from different threads.
Constructive interference size: a number that’s _suitable_ as a _limit_ on two objects’ _combined_ memory footprint size and base alignment to _likely_ promote true-sharing between them.
A quote from the author of the C++ proposal that introduced these terms. The author wrote (this was from a Stack Overflow comment):
I authored the proposal, great answer! To clarify one point you made: "I would almost always expect these values to be the same. I believe the only reason they are declared separately is for completeness.". Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort). On constexpr: it's required to constexpr for the value to be usable in determining struct layout. Runtime values are useful in other circumstances. – JF Bastien
It appears that the two values should be the same, and were only included for _completeness_. I don't know exactly what completeness means in this context. A few articles suggest that false sharing on L2 is not possible, whereas at least one family of benchmarks suggests performance gains occur when data are padded to a multiple of the L1 cache line size (implying that some prefetching is done to L2 in pairs, so that multiple is 2 and the value used is 128 instead of 64).
The question I have is: should Go be clever like C++ and create effect-based values like Constructive/Destructive, or should it stick to naming the values for what they represent? To me there is an additional layer of incomprehensible _magic_ with the effect-based names, which is the primary reason I would prefer not to use them (the second is that they are too long).
Regarding Min/Max cache line size, I don't know what the right answer is, because it is not intuitive what Min/Max means either, whereas "cache line size" is a phrase almost always associated with the size of the L1 cache lines/cache blocks. To speculatively use min/max without the possibility of more than two qualifiers seems to make a lot of assumptions that I don't think are clearly stated.
@as the statement:
It appears that the two values should be the same, and were only included for completeness
would appear to be contradicted by the author's answer:
Yes, they should always be the same unless: 1) the ISA has shipped with different cacheline sizes, and no target arch is given; 2) you're targeting a virtual ISA such as WebAssembly for which the actual ISA is unknown (then you get best-effort).
So in situations where an ISA has shipped with different cache line sizes (like the arm64 case linked in @martisch's reply) the values may be different. Since gc doesn't expose a way to specify the target arch, the constructive and destructive interference sizes will therefore always be different on arm64. The same is presumably true for the new wasm backend.
I am also starting to think that the type CacheLinePad [N]byte suggestion might be best, however. It is easy to understand while not making any promises about being the cache line size (so, for example, it could presumably be double the maximum L1 cache line size if that is desirable for a given arch).
It seems that using the cache line size as padding to avoid false sharing is not sufficient on newer Intel architectures, and on the amd64 arch we should consider 2x cache line size for struct padding, since the spatial prefetcher pulls in the adjacent cache line too:
Intel® 64 and IA-32 Architectures Optimization Reference Manual
2.3.5.4 Data Prefetching
Spatial Prefetcher: This prefetcher strives to complete every cache line fetched to the L2 cache with the pair line that completes it to a 128-byte aligned chunk.
This was discussed and benchmarked in https://groups.google.com/forum/#!topic/mechanical-sympathy/i3-M2uCYTJE.
This also seems to be a case where the destructive and constructive interference sizes can diverge.
Using CacheLinePad [N]byte or another name sounds like the better option to cover the original use case that should be solved. It also avoids pinning the meaning to another programming language/standard, as using e.g. ConstructiveInterferenceSize would.
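For illustration, padding on amd64 would then look something like this (a sketch; falseSharingRange is a made-up name and the 128 comes from the spatial-prefetcher behaviour quoted above):

```go
package main

// falseSharingRange is a hypothetical per-arch value: two 64-byte cache lines
// on recent Intel CPUs because the spatial prefetcher pulls in the pair line.
const falseSharingRange = 128

// Flags written from different cores stay at least falseSharingRange apart.
type paddedFlag struct {
	flag uint32
	_    [falseSharingRange - 4]byte
}

func main() {
	var f paddedFlag
	f.flag = 1
}
```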
For the points given above, I too would agree that type CacheLinePad [N]byte makes more sense here.
So should we talk about a const or func to test for true sharing, since we're already on the topic? Or should that go to a new proposal?
@martisch
on the amd64 arch we should consider 2x cache line size for struct padding since the spatial prefetcher pulls in the adjacent cache line too
[…]
This also seems to be a case where the destructive and constructive interference sizes can diverge.
How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.
How so? The prefetch itself should cause the two cache lines to interfere constructively, but the need to re-mark the cache lines as dirty on write (and the extra bus traffic for the cache invalidations) should cause them to interfere destructively: it seems like the destructive and constructive interference sizes are both two cache lines.
I was thinking about it in the context of the amd64 platform as a whole, as I assumed we likely won't do runtime detection of these sizes for use in padding structs if we use the current const approach. On amd64 we have some CPUs doing spatial prefetching and some not. So Constructive seems to be OK at 64 bytes (does anybody know of a widely used amd64-compatible CPU with less than 64-byte cache lines?), but for some newer amd64-compatible CPUs there is some destructive interference within 128 bytes (though it might not be as bad as within the same cache line).
So I wrote a false/true-sharing benchmark using amd64 numbers (64 and 128). Instead of guessing, if anyone has access to those CPUs, just run the benchmark and let the numbers speak for themselves.
https://play.golang.org/p/hp_MXQWEJvB (edit: removed useless middle-align tests)
I have an old Intel, so constructive and destructive sizes are the same.
Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
BenchmarkFalseSharing/NoPad-12 5000 270183 ns/op
BenchmarkFalseSharing/Pad64Start-12 10000 165659 ns/op
BenchmarkFalseSharing/Pad64StartEnd-12 10000 165744 ns/op
BenchmarkFalseSharing/Pad128Start-12 10000 164528 ns/op
BenchmarkFalseSharing/Pad128StartEnd-12 10000 167642 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-12 10000 231060 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-12 5000 241121 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-12 5000 247388 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-12 10000 169730 ns/op
BenchmarkTrueSharing/<64-12 2000 602211 ns/op
BenchmarkTrueSharing/>64-12 500 2680956 ns/op
BenchmarkTrueSharing/>128-12 500 3555779 ns/op
I don't see much difference here between 64 and 128 bytes.
Middle has some difference, but alignment is weird for those examples so I'm not sure we can conclude anything from them.
If the cache line were 64 bytes, then I can understand why middle 128 would be better than 64.
64 - 12 = 52 bytes
So for the pad-64 middle case, the first 64 bytes would contain field x, and so it's the same as no padding. I guess I should remove the middle benchmarks.
It's weird that on my i7 the middle variants perform the same at 64 vs 128 (cpuid reports a 64-byte cache line).
BenchmarkFalseSharing/NoPad-4 10000 194714 ns/op
BenchmarkFalseSharing/Pad64Start-4 10000 168577 ns/op
BenchmarkFalseSharing/Pad64StartEnd-4 10000 172899 ns/op
BenchmarkFalseSharing/Pad128Start-4 10000 170881 ns/op
BenchmarkFalseSharing/Pad128StartEnd-4 10000 174534 ns/op
BenchmarkFalseSharing/Pad64StartEndAligned-4 10000 156478 ns/op
BenchmarkFalseSharing/Pad128StartEndAligned-4 10000 155024 ns/op
BenchmarkFalseSharing/Pad64StartEndAlignedMiddle-4 10000 148957 ns/op
BenchmarkFalseSharing/Pad128StartEndAlignedMiddle-4 10000 151861 ns/op
BenchmarkTrueSharing/<64-4 2000 690212 ns/op
BenchmarkTrueSharing/>64-4 500 2205934 ns/op
BenchmarkTrueSharing/>128-4 500 2675777 ns/op
OK, does anyone object to CacheLinePad (as in https://github.com/golang/go/issues/25203#issuecomment-387189236), with a size yet to be determined?
@rsc nope, I think pretty much everyone who commented here thinks that makes the most sense for the padding.
Revising my earlier comment, instead of:
type CacheLinePad [64]byte
we should make this an opaque struct, so that the only interesting property of the type is that it has the given size:
type CacheLinePad struct { _ [64]byte }
Then code cannot depend on it being an array somehow. But otherwise, proposal accepted.
Now that I think about it, using unsafe.Sizeof() is probably better/clearer than len(), and other advanced usage of the padding size would probably end up using unsafe anyway.
So I 100% agree the struct type is the best so far.
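For example, the earlier const-based snippet would translate to something like this with the opaque struct type (a sketch; the 24 assumes x, y, and z are 8 bytes each, and that the pad is at least 24 bytes on the target arch):

```go
package main

import (
	"unsafe"

	"golang.org/x/sys/cpu"
)

type foo struct {
	_       cpu.CacheLinePad // keep x, y, z off the previous cache line
	x, y, z int64            // 3 * 8 = 24 bytes
	_       [unsafe.Sizeof(cpu.CacheLinePad{}) - 24]byte // pad out the rest of the line
}

func main() {
	var f foo
	f.x, f.y, f.z = 1, 2, 3
}
```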
Change https://golang.org/cl/116276 mentions this issue: runtime,internal/cpu: replace sys.CacheLineSize for padding by cpu.CacheLinePad
I made the above CL to see how CacheLinePad could be used when added to internal/cpu (the package x/sys/cpu was forked from) and applied to the runtime package, which already imports internal/cpu. Most uses of runtime/internal/sys.CacheLineSize (all except two) are for padding. This would also remove the redundant per-arch definition of CacheLineSize in runtime/internal/sys and internal/cpu.
https://golang.org/cl/111775 (the CL for x/sys/cpu) is now also updated to provide a CacheLinePad type as suggested in https://github.com/golang/go/issues/25203#issuecomment-394492749 and as implemented for internal/cpu by @martisch in https://golang.org/cl/116276.
Change https://golang.org/cl/126601 mentions this issue: runtime: replace sys.CacheLineSize for padding by cpu.CacheLinePad