The relevant structs:
You can see here that each instruction has a `src` field holding the byte offset into the corresponding source file. This issue is about optimizing resource usage (both CPU and memory) by removing this field from the struct and relying on other ways to correlate instructions with source locations.
The main idea is to reduce the heap allocation size of each instruction. That lowers memory requirements and, in turn, CPU usage, because allocating memory is slow and a larger memory footprint causes more CPU cache misses.
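As a rough illustration of why dropping `src` matters, the sketch below compares two simplified instruction layouts. `InstWithSrc`, `InstWithoutSrc`, `Tag` and `bytesSaved` are hypothetical names made up for this example and are not the real stage2 instruction struct. Because `src` is a `usize`, removing it should save one machine word per instruction, and that saving is multiplied by every instruction the compiler allocates.

```zig
const std = @import("std");

// Hypothetical, simplified instruction layouts; not the actual stage2 struct.
const Tag = enum(u8) { add, sub, ret, alloc };

const InstWithSrc = struct {
    tag: Tag,
    ty: usize, // stand-in for a Type handle
    src: usize, // byte offset into the source file
};

const InstWithoutSrc = struct {
    tag: Tag,
    ty: usize, // stand-in for a Type handle
};

/// Bytes saved across `count` heap-allocated instructions (hypothetical helper).
fn bytesSaved(count: usize) usize {
    return count * (@sizeOf(InstWithSrc) - @sizeOf(InstWithoutSrc));
}

test "dropping src saves one word per instruction" {
    try std.testing.expectEqual(@sizeOf(usize), @sizeOf(InstWithSrc) - @sizeOf(InstWithoutSrc));
    try std.testing.expectEqual(@as(usize, 1000 * @sizeOf(usize)), bytesSaved(1000));
}
```

Whether this translates into a measurable speedup is exactly what benchmarks would need to confirm.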
It would be nice to set up some stage2 performance measurements in ziglang/gotta-go-fast so that we can see the effects of such changes over time.
You can see the beginnings of this effort in this branch: stage2-rework-src.
Once this is implemented, it paves the way for the next optimization: making the IR instruction type an extern union, like `Type` and `Value` already are, and repurposing the low, unmapped memory addresses as tags for instructions that have no operands. Here is the relevant code from `Type`:
```zig
/// This union takes advantage of the fact that the first page of memory
/// is unmapped, giving us 4096 possible enum tags that have no payload.
pub const Type = extern union {
    /// If the tag value is less than Tag.no_payload_count, then no pointer
    /// dereference is needed.
    tag_if_small_enough: usize,
    ptr_otherwise: *Payload,
    // ...
};
```
This will save memory for the many instructions that have no operands. Currently that set is:
```zig
.alloc,
.retvoid,
.unreach,
.breakpoint,
```
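To make the proposal concrete, here is a minimal sketch of an instruction handle following the same pattern as the `Type` excerpt. Everything in it (`Inst`, its `Tag` values, `Payload`, `initTag`, `initPayload`) is hypothetical and simplified. It assumes a target where the first page of memory is unmapped, and it uses Zig 0.11-era builtins (`@intFromEnum`, `@enumFromInt`). The operand-free instructions listed above fit entirely in one `usize`; any other instruction is a pointer to a payload whose first field is its tag.

```zig
const std = @import("std");

/// Hypothetical IR instruction handle modeled on the Type/Value extern union.
pub const Inst = extern union {
    tag_if_small_enough: usize,
    ptr_otherwise: *Payload,

    pub const Tag = enum(usize) {
        // Operand-free instructions: stored inline, no heap allocation.
        alloc,
        retvoid,
        unreach,
        breakpoint,
        // From here on, the tag requires a payload.
        add,
        call,

        pub const last_no_payload_tag = Tag.breakpoint;
        pub const no_payload_count = @intFromEnum(last_no_payload_tag) + 1;
    };

    /// Every payload starts with its tag so `tag()` can read it back.
    pub const Payload = struct {
        tag: Tag,
    };

    pub fn initTag(small_tag: Tag) Inst {
        std.debug.assert(@intFromEnum(small_tag) < Tag.no_payload_count);
        return .{ .tag_if_small_enough = @intFromEnum(small_tag) };
    }

    pub fn initPayload(payload: *Payload) Inst {
        std.debug.assert(@intFromEnum(payload.tag) >= Tag.no_payload_count);
        return .{ .ptr_otherwise = payload };
    }

    pub fn tag(self: Inst) Tag {
        // A valid pointer is never below 4096 (first page unmapped), so a
        // small value here must be an inline tag rather than an address.
        if (self.tag_if_small_enough < Tag.no_payload_count) {
            return @enumFromInt(self.tag_if_small_enough);
        }
        return self.ptr_otherwise.tag;
    }
};

test "operand-free tags are stored inline" {
    const inst = Inst.initTag(.retvoid);
    try std.testing.expectEqual(Inst.Tag.retvoid, inst.tag());
    try std.testing.expectEqual(@sizeOf(usize), @sizeOf(Inst));
}

test "other tags go through a pointer" {
    var payload = Inst.Payload{ .tag = .add };
    const inst = Inst.initPayload(&payload);
    try std.testing.expectEqual(Inst.Tag.add, inst.tag());
}
```

The design trades one extra comparison in `tag()` for never allocating operand-free instructions at all, the same check against `Tag.no_payload_count` that the `Type` comment describes.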
Added to this would be one instruction for every `Value` that has its own tag, i.e. the payload-free section of `Value.Tag`:
```zig
// The first section of this enum are tags that require no payload.
u8_type,
i8_type,
u16_type,
i16_type,
u32_type,
i32_type,
u64_type,
i64_type,
usize_type,
isize_type,
c_short_type,
c_ushort_type,
c_int_type,
c_uint_type,
c_long_type,
c_ulong_type,
c_longlong_type,
c_ulonglong_type,
c_longdouble_type,
f16_type,
f32_type,
f64_type,
f128_type,
c_void_type,
bool_type,
void_type,
type_type,
anyerror_type,
comptime_int_type,
comptime_float_type,
noreturn_type,
null_type,
undefined_type,
fn_noreturn_no_args_type,
fn_void_no_args_type,
fn_naked_noreturn_no_args_type,
fn_ccc_void_no_args_type,
single_const_pointer_to_comptime_int_type,
const_slice_u8_type,
enum_literal_type,
anyframe_type,
undef,
zero,
one,
void_value,
unreachable_value,
empty_array,
null_value,
bool_true,
bool_false, // See last_no_payload_tag below.
// After this, the tag requires a payload.
```
For each of these values there would be a corresponding IR instruction tag meaning "a constant with this value". This needs to be measured, but again the idea is that fewer heap bytes allocated should mean less CPU time spent.
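The constant-instruction idea can be sketched the same way. In this hypothetical example, `ValueTag` stands in for a small subset of `Value`'s payload-free tags (plus `int_u64`, which needs a payload), `InstTag` provides one payload-free constant instruction per such value plus a general `constant` tag, and `ConstantInst` is a made-up heap layout used only to count bytes. None of these names are the compiler's actual definitions.

```zig
const std = @import("std");

/// Hypothetical subset of Value tags; only `int_u64` needs a payload.
const ValueTag = enum { zero, one, void_value, bool_true, bool_false, int_u64 };

/// Hypothetical instruction tags: one payload-free constant per payload-free
/// value tag, plus a general `constant` that carries a heap payload.
const InstTag = enum { const_zero, const_one, const_void, const_true, const_false, constant };

/// Maps a value tag to a payload-free constant instruction tag, or returns
/// null when the value needs a payload (and thus a heap-allocated instruction).
fn constTagFor(value_tag: ValueTag) ?InstTag {
    return switch (value_tag) {
        .zero => .const_zero,
        .one => .const_one,
        .void_value => .const_void,
        .bool_true => .const_true,
        .bool_false => .const_false,
        .int_u64 => null,
    };
}

/// Hypothetical heap layout of a general constant instruction.
const ConstantInst = struct {
    tag: InstTag,
    ty: usize, // stand-in for a Type handle
    val: usize, // stand-in for a Value handle
};

/// Heap bytes needed to represent a sequence of constants.
fn heapBytesFor(values: []const ValueTag) usize {
    var total: usize = 0;
    for (values) |v| {
        if (constTagFor(v) == null) total += @sizeOf(ConstantInst);
    }
    return total;
}

test "well-known constants cost no heap bytes" {
    const vals = [_]ValueTag{ .zero, .bool_true, .int_u64, .void_value };
    try std.testing.expectEqual(@as(usize, @sizeOf(ConstantInst)), heapBytesFor(&vals));
}
```

Only the constant whose value needs a payload costs a heap allocation here; the actual savings depend on how often such well-known constants appear in real code, which is again something to measure.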
I think it makes more sense to have the benchmarks up before working on
any optimizations to the compiler itself, since it's realistically
impossible to know what the effect will be when we're talking about
things like cache misses unless we measure.