All local variables in Zig require a stack, and all globals are allocated space in the binary. The optimiser inlines these into registers wherever it can, but this is never guaranteed, so this memory must always be available, even when it isn't needed. This leads to situations such as startcode requiring an extra global allocation for the initial value of the stack pointer, and 81 lines later, a five-armed switch to extract it from the appropriate register, which will break immediately on a new architecture. This could very easily be done in asm without touching memory, and with more logical organisation. I say we should be able to do the same in Zig.
I propose: Flat Variables. These variables have type @Flat(X), where X is a type that fits in a register:
var x: @Flat(i32) = 3;
const y: @Flat(u64) = 4_300_000_000; // on 64-bit architectures
var z: @Flat(*SomeType) = &some_instance;
Values of these types are never allocated binary or stack space except to facilitate procedure calls. They are stored in registers for the lifetime of the scope that declared them -- declaring more of them than there are registers is a compile error. This allows them to be used even in callconv(.Naked) contexts:
// This example is of course horribly contrived, but I hope it communicates
// what @Flats will enable us to do in more contexts.
fn _op(x: i32, y: i32) callconv(.Naked) i32 {
    const sum: @Flat(i32) = x + y; // Casts between @Flat and non-@Flat values always allowed
    // POSSIBLE: implicitly cast all declarations to @Flats inside callconv(.Naked)?
    var prod: @Flat(_) = x * y;
    var acc: @Flat(_) = 0;
    while (prod != 0) : (prod -= 1) {
        acc += sum;
        if (prod == 3) break;
    }
    return acc;
}
Type-level flat declaration, as in the above examples, allows the compiler to choose whichever register it likes for the value, and even move it between registers. Reserving a specific register is done with @regReserve, and reading a non-reserved register is done with @regRead:
var z = @regReserve(u32, "={r12}", 17); // type, register, init value
var sp = @regReserve([*]u8, "={esp}", stack_base);
var starting_stack_ptr = @regRead([*]u8, "={rsp}");
Reserving the same register twice in a scope, or globally and anywhere else in the program, is a compile error. Calling a function that reserves the same register as one of your own flats is allowed, as the compiler is allowed to push flats to the stack for procedure calls.
So now, a simple bare metal OS entry point looks like this (some detail omitted):
var stack: [stack_height]usize align(builtin.os.stack_align) = undefined;
const sp_reg = switch (builtin.arch) {
    .x86_64 => "={rsp}",
    .aarch64, .riscv64 => "={sp}",
    // etc.
};
fn _start() callconv(.Naked) noreturn {
    const stack_base: @Flat([*]usize) = &stack + stack.len;
    _ = @regReserve([*]usize, sp_reg, stack_base);
    kmain();
    while (true) {}
}
And std.start._start looks like this:
fn _start() callconv(.Naked) noreturn {
    // ...
    comptime const sp_reg = switch (builtin.arch) {
        // etc.
    };
    const starting_stack_ptr: @Flat([*]usize) = @regRead(_, sp_reg);
    // posixCallMainAndExit requires a trivial modification
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{starting_stack_ptr});
}
Let's take stock:
To me it's a no-brainer -- we _need_ to implement this.
Bonus
When importing linker symbols, C interprets them as addresses of variables, even though that doesn't make sense -- you have to take their address every time if you want correct behaviour, which is tedious and error-prone (just forget one &...). We could do better by declaring them as flat:
extern const _sdata: @Flat(*usize);
And then they'd be interpreted as the value of the symbol itself, which is much more sensible and prevents a footgun.
(Note: this obviously does not make sense for every external symbol, for instance those which are actually variables or functions. Also, this does not require a register per declaration -- when the value is a comptime-known const, it can be inlined wherever it appears, and take up no space.)
The optimiser inlines these into registers wherever it can, but this is never guaranteed, so this memory must always be available, even when it isn't needed.
the only use-case for this in practice sounds like the interesting situation where:
You are right that the Zig compiler isn't perfect. Why don't we first work on getting the compiler as fast as possible, and see if this is still relevant then?
I don't think performance impact should be considered here.
If the syntax change (and adding builtins) is good, it should be implemented; otherwise, it shouldn't.
Performance of the compiler can wait a bit longer, but as I understand it 0.7 is intended to be the last major breaking revision to the compiler, so anything like this needs to either be in it or discarded.
Personally, unless there's a compelling reason _not_ to, I'm in support of this.
No binary space is allocated for single-use values
Identical logic is only expressed once
Related concepts are grouped right next to each other
Platform dependence is limited to a single place, and can easily be extended
We have the full power of Zig, even in traditionally asm-only domains
Sounds good to me!
All local variables in Zig require a stack, and all globals are allocated space in the binary.
Citation needed. The LLVM optimizer is able to avoid spilling variables to the stack; that's one of its main goals. Globals are always emitted in the binary because of their linkage, or because you want to emit debug info for them.
The optimiser inlines these into registers wherever it can, but this is never guaranteed, so this memory must always be available, even when it isn't needed.
Micro-optimization.
This leads to situations such as startcode requiring an extra global allocation for the initial value of the stack pointer
The global is not needed; you could just fetch the sp value and pass it as an argument to posixCallMainAndExit.
and 81 lines later, a five-armed switch to extract it from the appropriate register, which will break immediately on a new architecture.
That's because some architectures don't alias sp to the stack pointer; otherwise you could just use a single inline asm line.
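For readers who haven't seen the switch being discussed, here is a minimal sketch of reading the stack pointer with inline asm on a few architectures. The function name readStackPointer is hypothetical, only three architectures are covered, and builtin.cpu.arch is the spelling used by newer Zig releases (the code in this thread uses builtin.arch). The asm strings are my assumption of the usual per-architecture idioms, not a copy of std.start.

```zig
const builtin = @import("builtin");

/// Hypothetical helper: returns the stack pointer as seen inside this
/// function (not the value the process was started with).
fn readStackPointer() usize {
    return switch (builtin.cpu.arch) {
        // x86_64 lets the output constraint name the register directly,
        // so the asm template itself can be empty.
        .x86_64 => asm volatile (""
            : [ret] "={rsp}" (-> usize)
        ),
        // On these targets the stack pointer is copied into a
        // compiler-chosen general-purpose register instead.
        .aarch64 => asm volatile ("mov %[ret], sp"
            : [ret] "=r" (-> usize)
        ),
        .riscv64 => asm volatile ("mv %[ret], sp"
            : [ret] "=r" (-> usize)
        ),
        // Every new architecture needs its own arm, which is the
        // maintenance cost the proposal complains about.
        else => @compileError("readStackPointer: unsupported architecture"),
    };
}
```

Each arm is a single asm expression, but the constraint strings have to be literals, which is why the arms can't easily be collapsed into one line parameterised by a register name.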
This could very easily be done in asm without touching memory, and with more logical organisation.
Micro-optimization again.
I say we should be able to do the same in Zig.
You may want to use asm directly if you really care about a single memory spill in a cold code path such as the crt0 routines.
I propose: Flat Variables.
That name means something else.
Citation needed. The LLVM optimizer is able to avoid spilling variables to the stack; that's one of its main goals. Globals are always emitted in the binary because of their linkage, or because you want to emit debug info for them.
Micro-optimization.
As I said, performance shouldn't really be taken into account.
To me, the real question is simple: should there be an equivalent to C's register? Personally, I think that a raw register keyword would be useless (that's the kind of basic optimization I don't have a problem with the compiler performing), but being able to specify "keep this in a specific register" without resorting to inline asm is a good idea.
Being able to specify "keep this in a specific register" without resorting to inline asm is a good idea.
My personal use case: JIT. I'm writing a number of JITs in Zig, one is an emulator, one speeds up a game I'm working on by generating the main loop on the fly to avoid having to check state consistently, etc. A finer level of control over register allocation would be invaluable for dispatcher functions (functions called from within JITed code; depending on the ABI they use, it can be necessary to push/pop registers beforehand to avoid a clash, whereas if you can reserve a register and the compiler knows not to use it it's not necessary), though I can definitely see it causing problems for the exact same reason (if the compiler cannot generate the correct ABI as a needed register is reserved)...
@LemonBoy
The global is not needed; you could just fetch the sp value and pass it as an argument to posixCallMainAndExit.
Well then why doesn't _start() do that? Because that would require a local variable, and you can't have those in naked contexts.
I guess I should have made this clear: the main use case I see for this is in naked contexts. We _could_ just write asm for those cases, but then we give up all of Zig's organisational capabilities. Further, I'd argue that they don't offer much over asm if doing anything useful in them requires inline asm anyway.
Because that would require a local variable, and you can't have those in naked contexts.
I don't think that's correct. I'm fairly certain local vars are usable in naked contexts, and I can't think of any reason they wouldn't be.
Because they require a stack, which naked functions can't assume they have. This came up in Andrew's kernel dev stream: calculating the starting stack top inline wasn't possible, because Zig creates a local variable to store the result, and naked contexts don't allow that. I can't find anything to suggest this restriction has been lifted, especially considering that the current startcode uses the same technique.
I guess I should have made this clear: the main use case I see for this is in naked contexts
Having anything but a single asm block there is bound to blow up in unexpected ways (what if the LLVM backend lowers an operation into a libcall? boom!). Given the huge amount of micromanaging that's needed, I can't see how some inline asm isn't a better solution there.
Well then why doesn't _start() do that?
My bad, I hadn't noticed that it was marked as naked.
Because that would require a local variable, and you can't have those in naked contexts.
The real problem is that you need the original SP value to find the arg{c,v} and envp and allocating a local may change it.
Because they require a stack, which naked functions can't assume they have.
I think this is highly contextual. Most naked functions can use local variables, though LemonBoy is completely correct that "having anything but a single asm block there is bound to blow up in unexpected ways". Speaking from experience, anything more complex than that will fail, especially if written as normal Zig.
That might be an issue worth solving though; a way to guarantee that, for instance, the backend can't lower an op into a libcall for the specific function, or to explicitly disable the stack and reserve registers.
Given the problems that this would cause, and the fact that #5211 completely subsumes its sensible use cases, I see no reason to keep this open.