The compiler currently compiles large structs conservatively. If a struct type has more than 4 fields (or meets a few other conditions), we treat that type as unSSAable. All operations on variables of that type go to the stack, as if their address were taken.
This is suboptimal in various ways. For example:
	type T struct {
		a, b, c, d int
	}

	func f(x *T) {
		t := T{}
		*x = t
	}

	type U struct {
		a, b, c, d, e int
	}

	func g(x *U) {
		u := U{}
		*x = u
	}
f is compiled optimally, to:
XORPS X0, X0
MOVQ "".x+8(SP), AX
MOVUPS X0, (AX)
MOVUPS X0, 16(AX)
RET
g is quite a bit worse:
SUBQ $48, SP
MOVQ BP, 40(SP)
LEAQ 40(SP), BP
MOVQ $0, "".u(SP)
XORPS X0, X0
MOVUPS X0, "".u+8(SP)
MOVUPS X0, "".u+24(SP)
MOVQ "".u(SP), AX
MOVQ "".x+56(SP), CX
MOVQ AX, (CX)
LEAQ 8(CX), DI
LEAQ "".u+8(SP), SI
DUFFCOPY $868
MOVQ 40(SP), BP
ADDQ $48, SP
RET
The code for g zeroes a temporary variable on the stack, then copies it to the destination, instead of zeroing the destination directly as f does.
We should process large structs through SSA as well. This will require a fair amount of work in the SSA backend to introduce struct builders, selectors of arbitrary width, stack allocation of large types, maybe heap allocation if they are really huge, etc.
Arrays of size > 1 are in a similar state, but they are somewhat harder because non-constant indexes add an additional complication.
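To make the array complication concrete, here is a small sketch contrasting the two cases. The function names `constIndex` and `dynIndex` are made up for illustration, and the comments describe the general difficulty, not what any particular compiler version emits.

```go
package main

import "fmt"

// constIndex only touches a[1]. The element is known at compile time,
// so it could in principle be handled like a struct field.
func constIndex() int {
	var a [4]int
	a[1] = 7
	return a[1]
}

// dynIndex picks the element at run time. Which element is read or
// written is not known at compile time, which is the extra complication.
func dynIndex(i int) int {
	var a [4]int
	a[i&3] = 7 // i&3 keeps the index in range 0..3
	return a[i&3]
}

func main() {
	fmt.Println(constIndex(), dynIndex(5)) // 7 7
}
```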
Just to ask an obvious question, is the problem that we want to handle all n, or just that n<=4 is too small? If the latter, and a reasonable cap is say 16, it might be easier to just extend the current approach, maybe using a bit of codegen magic. Probably we should Do It Right, but worth asking...
My intent is to handle all n.
Looks like this should also help https://github.com/golang/go/issues/20859
Change https://golang.org/cl/106495 mentions this issue: cmd/compile: add some generic composite type optimizations
Change https://golang.org/cl/206937 mentions this issue: cmd/compile: optimize big structs