Repro code:
const std = @import("std");
const linux = std.os.linux;

pub fn exit() void {
    _ = linux.syscall1(.exit, 0);
}

pub fn main() void {
    exit();
}
Running zig build-exe main.zig -target arm-linux-none -O ReleaseSmall for ARM produces an executable that is 707KB in size. However, running zig build-exe main.zig -target i386-linux-none -O ReleaseSmall for i386 produces an executable that is just 61KB.
Appending --strip "fixes" the issue as the executables become 5KB and 3KB.
Should we maybe --strip by default when doing small release builds (-OReleaseSmall)?
Stripping small release builds by default would totally make sense.
OK sounds good to make release builds stripped by default for -OReleaseSmall. We need to change the CLI now though, because we need to be able to override the default in both directions. Here's my proposal:
-fstrip - always strip debug symbols, even for -ODebug, -OReleaseFast, and -OReleaseSafe
-fno-strip - never strip debug symbols, even for -OReleaseSmall
If you look at the help menu you can see this is a well established pattern for other options.
Sounds like a good plan!
None of you investigated or explained the difference in size; stripping everything away only sweeps the problem under the rug.
The culprit of the massive difference in size is... a single line of code.
Running Bloaty McBloatface on the two binaries produced by the code in the ticket shows:
$ ./bloaty foo-arm
FILE SIZE VM SIZE
-------------- --------------
30.0% 206Ki 0.0% 0 .debug_loc
25.2% 173Ki 0.0% 0 .debug_info
10.6% 73.0Ki 0.0% 0 .debug_line
9.1% 63.0Ki 0.0% 0 .debug_pubnames
7.9% 54.5Ki 0.0% 0 .debug_str
6.5% 44.6Ki 94.5% 44.5Ki .text
6.2% 42.8Ki 0.0% 0 .debug_ranges
2.0% 14.1Ki 0.0% 0 .debug_pubtypes
1.0% 7.09Ki 0.0% 0 .debug_frame
0.4% 2.85Ki 0.0% 0 .debug_abbrev
0.3% 2.40Ki 0.0% 0 .symtab
0.3% 2.23Ki 0.0% 0 .strtab
0.3% 1.74Ki 3.6% 1.70Ki .rodata
0.0% 0 0.8% 376 .bss
0.0% 276 0.5% 236 .ARM.exidx
0.0% 276 0.6% 276 [LOAD #1 [R]]
0.0% 253 0.0% 0 .shstrtab
0.0% 120 0.0% 0 [ELF Headers]
0.0% 93 0.0% 0 .ARM.attributes
0.0% 62 0.0% 0 .comment
0.0% 48 0.0% 12 [2 Others]
100.0% 689Ki 100.0% 47.1Ki TOTAL
$ ./bloaty foo-i386
FILE SIZE VM SIZE
-------------- --------------
28.3% 17.0Ki 0.0% 0 .debug_info
27.5% 16.5Ki 0.0% 0 .debug_pubnames
26.9% 16.2Ki 0.0% 0 .debug_str
6.1% 3.69Ki 0.0% 0 .debug_pubtypes
2.9% 1.75Ki 0.0% 0 .debug_line
2.3% 1.37Ki 0.0% 0 .debug_loc
1.4% 849 0.0% 0 .debug_abbrev
1.2% 752 51.3% 711 .text
0.8% 464 0.0% 0 .debug_ranges
0.6% 360 0.0% 0 .symtab
0.0% 0 22.2% 308 .bss
0.5% 306 0.0% 0 .strtab
0.4% 240 0.0% 0 .debug_frame
0.3% 214 0.0% 0 .shstrtab
0.3% 212 15.3% 212 [LOAD #1 [R]]
0.3% 196 11.2% 156 .rodata
0.1% 80 0.0% 0 [ELF Headers]
0.1% 60 0.0% 0 .comment
100.0% 60.2Ki 100.0% 1.35Ki TOTAL
Now forget the debug sections: they are the biggest ones, but their size depends on an important factor, the code size.
As you can see, the .text section is way bigger on ARM than on i386. This cannot be ascribed to a few compiler_rt builtins, as one may naively think, but to something else coming from the stdlib.
This innocent-looking call to panic is enough to bloat the binary this much.
I'm preparing a patch that makes the size drop down to ~1.5K.
I actually was aware of the situation. I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.
I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables. It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.
I also stand by the plan to enable strip by default for ReleaseSmall. Choosing to use abort instead of panic here is functionally the same thing as enabling strip but only for this one panic call. Why not go ahead and do it for the entire compilation?
IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it. The comment in #6735 suggests this would be ugly/non-trivial?
> I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.
> IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it.
That code path was completely untested. Calling @panic there is definitely not ok (nor is triggering a panic before the initial stuff is initialized) as the panic handler uses TLS to handle panic-in-panic in a thread-safe fashion.
> I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables.
Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.
> It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.
It's only a matter of checking if any object has a PT_TLS segment.
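As a rough illustration of that check, the sketch below scans the program headers of an already linked image for a PT_TLS entry. It is not how the Zig linker does it: it assumes a 32-bit little-endian ELF (as in the ARM and i386 builds above), and the names readLe and hasTlsSegment are invented for this example. Relocatable objects carry no program headers, so for those one would look for sections flagged SHF_TLS instead.

```zig
const std = @import("std");

// Program header type of the TLS template segment (from the ELF spec).
const PT_TLS: u32 = 7;

/// Hypothetical helper: reads a little-endian unsigned integer at `offset`,
/// or returns null if the read would go out of bounds.
fn readLe(comptime T: type, bytes: []const u8, offset: usize) ?T {
    const size = @sizeOf(T);
    if (offset > bytes.len or bytes.len - offset < size) return null;
    var value: T = 0;
    var i: usize = size;
    while (i > 0) {
        i -= 1;
        value = (value << 8) | bytes[offset + i];
    }
    return value;
}

/// Hypothetical helper: true if a 32-bit little-endian ELF image
/// contains at least one PT_TLS program header.
pub fn hasTlsSegment(image: []const u8) bool {
    if (image.len < 52) return false; // shorter than an ELF32 header
    if (!std.mem.eql(u8, image[0..4], "\x7fELF")) return false;
    // Assumption: only ELFCLASS32 (1) and ELFDATA2LSB (1) are handled.
    if (image[4] != 1 or image[5] != 1) return false;

    const phoff = readLe(u32, image, 28) orelse return false; // e_phoff
    const phentsize = readLe(u16, image, 42) orelse return false; // e_phentsize
    const phnum = readLe(u16, image, 44) orelse return false; // e_phnum

    var i: usize = 0;
    while (i < phnum) : (i += 1) {
        const off = @as(usize, phoff) + i * @as(usize, phentsize);
        // p_type is the first field of an ELF32 program header.
        const p_type = readLe(u32, image, off) orelse return false;
        if (p_type == PT_TLS) return true;
    }
    return false;
}
```

A caller would read the file into memory and pass the bytes to hasTlsSegment; a false result only means no TLS template was found in that one image.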
> I also stand by the plan to enable strip by default for ReleaseSmall.
Release and Small refer to the generated code, no?
ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by them, you can always use the split-DWARF/split-dSYM/split pdb functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.
> The comment in #6735 suggests this would be ugly/non-trivial?
I can try to make the check a bit nicer, I think I have an idea that will work.
> Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.
Example of the problem there:
Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.
> ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by them, you can always use the split-DWARF/split-dSYM/split pdb functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.
Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.
An important thing to note here is that whether or not debug symbols are stripped is, in addition to affecting the generated code, also a comptime flag that std.debug uses to avoid including DWARF/PDB stack trace printing code.
https://github.com/ziglang/zig/blob/0c355bef9e424fbf06085f12fc28979d73b3d6af/lib/std/debug.zig#L108-L111
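To illustrate the comptime flag mentioned above, here is a minimal sketch. It assumes a Zig version where the build options are reachable through @import("builtin") and the field is still named strip_debug_info; the helper name describeBuild is made up for this example and is not part of std.debug.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical helper. The condition is known at compile time, so only
// the taken branch is analyzed and ends up in the binary.
fn describeBuild() []const u8 {
    if (builtin.strip_debug_info) {
        return "stripped: no debug info available for stack traces";
    } else {
        return "unstripped: debug info available";
    }
}

pub fn main() void {
    std.debug.print("{s}\n", .{describeBuild()});
}
```

Building this once with the strip option and once without should print different messages, which is the same mechanism that would let std.debug leave out the DWARF/PDB parsing code in stripped builds.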
> I can try to make the check a bit nicer, I think I have an idea that will work.
I opened #6765
> Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.
But here you can't elide the TLS initialization, as the OS expects errno to be thread-local and you don't have control over libc's _start. Not a good example IMO.
> Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.
ReleaseSmall reads as "optimize for size", akin to gcc's -Os. I don't think that stripping the debug info too is a good default, as it's orthogonal to the code generation process.
My last two cents on the matter, I don't really care.