Go: runtime: sloppy struct field arrangement could be re-arranged for more compact structs and save RAM

Created on 6 Nov 2020  路  9Comments  路  Source: golang/go

Coming here from tip ecc3f5112ebaf23c4b1ac4c5eedfa406d82ecc9a, with one of the static analysis tools, we have developed at Orijtech, Inc., we've found a bunch of fields in the Go standard library that could consume less space
for example, given
https://github.com/golang/go/blob/ecc3f5112ebaf23c4b1ac4c5eedfa406d82ecc9a/src/runtime/trace.go#L760-L769

the size on 64-bit machines is 40 bytes, but it could really be 32 bytes by re-arranging it to

type traceStack struct {
    stk  [0]uintptr
    link traceStackPtr
    hash uintptr
    n    int
    id   uint32
}

and here is the exhibit https://play.golang.org/p/bG4u6hqXZvv or inlined below

package main

import "unsafe"

type traceStack1 struct {
    link traceStackPtr
    hash uintptr
    id   uint32
    n    int
    stk  [0]uintptr // real type [n]uintptr
}

type traceStackPtr uintptr

type traceStack2 struct {
    stk  [0]uintptr
    link traceStackPtr
    hash uintptr
    n    int
    id   uint32
}

func main() {
    println(unsafe.Sizeof(traceStack1{}))
    println(unsafe.Sizeof(traceStack2{}))
}

which when run shows

40
32

which is a 20% reduction in RAM that'll be used in traceStack

There are about 8 reported optimal changes like

/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mfixalloc.go:27:15: struct has size 72 (size class 80), could be 64 (size class 64), rearrange to struct{size uintptr; first func(arg unsafe.Pointer, p unsafe.Pointer); arg unsafe.Pointer; list *mlink; chunk uintptr; inuse uintptr; stat *sysMemStat; nchunk uint32; zero bool} for optimal size (20.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgc.go:1009:10: struct has size 400 (size class 416), could be 376 (size class 384), rearrange to struct{wbufSpans struct{lock mutex; free mSpanList; busy mSpanList}; assistQueue struct{lock mutex; q gQueue}; sweepWaiters struct{lock mutex; list gList}; heap2 uint64; heap1 uint64; bytesMarked uint64; heap0 uint64; pauseStart int64; pauseNS int64; tstart int64; tEnd int64; nFlushCacheRoots int; empty lfstack; nBSSRoots int; nSpanRoots int; nStackRoots int; tMarkTerm int64; tMark int64; bgMarkReady note; full lfstack; mode gcMode; tSweepTerm int64; totaltime int64; initialHeapLive uint64; nDataRoots int; heapGoal uint64; cycles uint32; stwprocs int32; maxprocs int32; _ uint32; markDoneSema uint32; startSema uint32; nwait uint32; nproc uint32; markrootJobs uint32; markrootNext uint32; bgMarkDone uint32; pad0 cpu.CacheLinePad; userForced bool} for optimal size (7.69% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgcscavenge.go:161:14: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; g *g; timer *timer; sysmonWake uint32; parked bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:1977:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; newm muintptr; wake note; haveTemplateThread uint32; waiting bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:5159:17: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{schedwhen int64; syscallwhen int64; schedtick uint32; syscalltick uint32} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime2.go:753:10: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{runnable gQueue; n int32; user bool} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/trace.go:761:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{stk [0]uintptr; link traceStackPtr; hash uintptr; n int; id uint32} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime_test.go:265:10: struct has size 16 (size class 16), could be 8 (size class 8), rearrange to struct{z struct{}; n int64} for optimal size (50.00% savings)

We plan on releasing the tool as open source code with a permissive license, and it is already an x/tools/go/analysis/pass that can be plugged into IDEs or into cmd/vet if needed later on.

Cheers to @cuonglm our core developer at Orijtech, whom we tasked with writing it!

EDIT: now accounts for size classes!

NeedsInvestigation

Most helpful comment

How does this interact with size classes? I鈥檓 sure there is a 32byte size class, but if 40 bytes is rounded up to 48 the saving could be even larger.

All 9 comments

How does this interact with size classes? I鈥檓 sure there is a 32byte size class, but if 40 bytes is rounded up to 48 the saving could be even larger.

Great question @davecheney, and thank you!
@josharian also independently raised this to me offline. We don't yet have that baked in and
hadn't thought about, but I've filed an issue internal issue after @josharian's same question :)

Thanks @davecheney @josharian, we now account for size classes and also show the percentage savings too #GEICOLizardSavings for example now

/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mfixalloc.go:27:15: struct has size 72 (size class 80), could be 64 (size class 64), rearrange to struct{size uintptr; first func(arg unsafe.Pointer, p unsafe.Pointer); arg unsafe.Pointer; list *mlink; chunk uintptr; inuse uintptr; stat *sysMemStat; nchunk uint32; zero bool} for optimal size (20.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgc.go:1009:10: struct has size 400 (size class 416), could be 376 (size class 384), rearrange to struct{wbufSpans struct{lock mutex; free mSpanList; busy mSpanList}; assistQueue struct{lock mutex; q gQueue}; sweepWaiters struct{lock mutex; list gList}; heap2 uint64; heap1 uint64; bytesMarked uint64; heap0 uint64; pauseStart int64; pauseNS int64; tstart int64; tEnd int64; nFlushCacheRoots int; empty lfstack; nBSSRoots int; nSpanRoots int; nStackRoots int; tMarkTerm int64; tMark int64; bgMarkReady note; full lfstack; mode gcMode; tSweepTerm int64; totaltime int64; initialHeapLive uint64; nDataRoots int; heapGoal uint64; cycles uint32; stwprocs int32; maxprocs int32; _ uint32; markDoneSema uint32; startSema uint32; nwait uint32; nproc uint32; markrootJobs uint32; markrootNext uint32; bgMarkDone uint32; pad0 cpu.CacheLinePad; userForced bool} for optimal size (7.69% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgcscavenge.go:161:14: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; g *g; timer *timer; sysmonWake uint32; parked bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:1977:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; newm muintptr; wake note; haveTemplateThread uint32; waiting bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:5159:17: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{schedwhen int64; syscallwhen int64; schedtick uint32; syscalltick uint32} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime2.go:753:10: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{runnable gQueue; n int32; user bool} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/trace.go:761:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{stk [0]uintptr; link traceStackPtr; hash uintptr; n int; id uint32} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime_test.go:265:10: struct has size 16 (size class 16), could be 8 (size class 8), rearrange to struct{z struct{}; n int64} for optimal size (50.00% savings)

Note that there are other reasons why a different struct layout then the most packed may be wanted:

  • alignment of some fields for more performant access might be better not packing everything together
  • fields that are accessed together should be in the same cache line
  • frequently accessed fields at the beginning so the rest of the struct stays cold in cache
  • adding a frequently accessed field (e.g. len) of a map at the beginning so the instruction constructing the pointer to the field is short which affects binary size
  • ordering pointers at the beginning of the struct so the garbage collector only needs to scan the beginning of the struct (this is likely not incompatible with packing but should be taken into account)
  • ordered so that a specific field is 64bit aligned on 32bit and 64bit architectures

If the [0]uintptr field is moved from the end of the struct, it would need to be statically sized rather than dynamically sized.

/cc @aclements @randall77 @mknyszek @prattmic

In general, we've already analyzed this tradeoff for a lot of the popular runtime structures (e.g. mspan), though I'm sure there are some we missed. Patches are welcome of course to fix those up; I don't think we recently ran an automated tool against the source.

Broadly-speaking, I do think a lot of the big low-hanging fruit has already been found, just based on my own familiarity with the runtime and my discussions with other people who work on the runtime. For instance, we already track the sizes of some of these structures explicitly in tests to avoid accidental growth (e.g. the g struct). @martisch gave a very good list of reasons that we explicitly _don't_ do this. For a real-world example of one of these, the first few fields g struct I think might not be optimally ordered, but these fields are accessed quite often so we'd like to keep them together, in the same cache line ideally.

As @BenLubar notes in this particular case we can't actually move the [0]uinptr because it's one of those borrowed-from-C variable-sized-struct tricks. The size of this type isn't actually 40 bytes. A smaller "header" on this type is still probably useful, though I'm not even sure if this type gets heap allocated. Someone would need to take a closer look, but feel free to send a CL and we can figure it out.

Rearranging fields for struct packing is like writing code in assembly - there may be a performance win but it also makes code harder to maintain, and not all struct can be reordered. The cost here is only worth paying if there is a benefit. So just like an assembly function needs to be a CPU time hot spot, careful struct packing needs to be a memory hotspot - a struct there are many of in a typical program, so that it is a significant source of memory usage.

Looking at the ones you identified:

  • src/runtime/mfixalloc.go:27 (type fixalloc) - Only 5 of these in the entire runtime.
  • src/runtime/mgc.go:1009 (var work) - Only 1 of these (it's a var not a type).
  • src/runtime/mgcscavenge.go:161 (var scavenge) - Only 1.
  • src/runtime/proc.go:1977 (var newmHandoff) - Only 1.
  • src/runtime/proc.go:5159 (type sysmontick) - One per p, so size classes don't matter, and the p is big anyway and there are relatively few.
  • src/runtime/runtime2.go:753 (struct inside type schedt) - Only 1 schedt in program, and this is a struct inside a struct, so size class doesn't directly matter.
  • src/runtime/trace.go:761 (type traceStack) - As noted, this is not allocated using the usual allocator and can't be rearranged.
  • src/runtime/runtime_test.go:265 (type T2 inside TestTrailingZero) - This is in a test.

I would strongly suggest trying to combine your tool with memory profiling results, so that it can focus on the cases that matter most.

Given that none of these would be a significant savings, closing this issue.

Was this page helpful?
0 / 5 - 0 ratings