Zig: strip debug info from ReleaseSmall builds by default

Created on 14 Oct 2020 · 10 Comments · Source: ziglang/zig

Repro code:

const std = @import("std");
const linux = std.os.linux;

pub fn exit() void {
    _ = linux.syscall1(.exit, 0);
}

pub fn main() void {
    exit();
}

Running zig build-exe main.zig -target arm-linux-none -O ReleaseSmall for arm produces an executable that is 707KB in size. However, running zig build-exe main.zig -target i386-linux-none -O ReleaseSmall for i386 produces an executable that is just 61KB.

Appending --strip "fixes" the issue as the executables become 5KB and 3KB, respectively.

Labels: accepted, breaking, proposal

All 10 comments

Should we maybe --strip by default when doing small release builds (-OReleaseSmall)?

Stripping small release builds by default would totally make sense.

OK sounds good to make release builds stripped by default for -OReleaseSmall. We need to change the CLI now though, because we need to be able to override the default in both directions. Here's my proposal:

  • -fstrip - always strip debug symbols, even for -ODebug, -OReleaseFast, and -OReleaseSafe
  • -fno-strip - never strip debug symbols, even for -OReleaseSmall

If you look at the help menu you can see this is a well established pattern for other options.
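
The override rule in this proposal can be read as a tri-state option: an explicit flag always wins, and an unset flag falls back to the default for the build mode. The following is only a minimal sketch of that reading, not the compiler's actual code; OptimizeMode and resolveStrip are hypothetical names made up for illustration.

```zig
const std = @import("std");

// Hypothetical stand-in for the compiler's build modes (illustration only).
const OptimizeMode = enum { Debug, ReleaseSafe, ReleaseFast, ReleaseSmall };

// Hypothetical helper: `override` is null when neither flag was passed,
// true for -fstrip and false for -fno-strip.
fn resolveStrip(mode: OptimizeMode, override: ?bool) bool {
    // An explicit flag always wins; otherwise only ReleaseSmall strips.
    return override orelse (mode == .ReleaseSmall);
}

pub fn main() void {
    // No flag: ReleaseSmall strips, the other modes keep debug info.
    std.debug.assert(resolveStrip(.ReleaseSmall, null));
    std.debug.assert(!resolveStrip(.Debug, null));
    // -fno-strip keeps debug info even for ReleaseSmall.
    std.debug.assert(!resolveStrip(.ReleaseSmall, false));
    // -fstrip strips even a Debug build.
    std.debug.assert(resolveStrip(.Debug, true));
}
```

Modelling the flag as an optional bool is what makes overriding "in both directions" possible: a plain bool could not distinguish "not specified" from "explicitly off".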

Sounds like a good plan!

None of you investigated or explained the difference in size; stripping away everything is only sweeping the problem under the rug.

The culprit of the massive difference in size is... a single line of code.

Running Bloaty McBloatface on the two binaries produced by the code in the ticket shows:

$ ./bloaty foo-arm                                                            
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  30.0%   206Ki   0.0%       0    .debug_loc
  25.2%   173Ki   0.0%       0    .debug_info
  10.6%  73.0Ki   0.0%       0    .debug_line
   9.1%  63.0Ki   0.0%       0    .debug_pubnames
   7.9%  54.5Ki   0.0%       0    .debug_str
   6.5%  44.6Ki  94.5%  44.5Ki    .text
   6.2%  42.8Ki   0.0%       0    .debug_ranges
   2.0%  14.1Ki   0.0%       0    .debug_pubtypes
   1.0%  7.09Ki   0.0%       0    .debug_frame
   0.4%  2.85Ki   0.0%       0    .debug_abbrev
   0.3%  2.40Ki   0.0%       0    .symtab
   0.3%  2.23Ki   0.0%       0    .strtab
   0.3%  1.74Ki   3.6%  1.70Ki    .rodata
   0.0%       0   0.8%     376    .bss
   0.0%     276   0.5%     236    .ARM.exidx
   0.0%     276   0.6%     276    [LOAD #1 [R]]
   0.0%     253   0.0%       0    .shstrtab
   0.0%     120   0.0%       0    [ELF Headers]
   0.0%      93   0.0%       0    .ARM.attributes
   0.0%      62   0.0%       0    .comment
   0.0%      48   0.0%      12    [2 Others]
 100.0%   689Ki 100.0%  47.1Ki    TOTAL
$ ./bloaty foo-i386                                                           
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  28.3%  17.0Ki   0.0%       0    .debug_info
  27.5%  16.5Ki   0.0%       0    .debug_pubnames
  26.9%  16.2Ki   0.0%       0    .debug_str
   6.1%  3.69Ki   0.0%       0    .debug_pubtypes
   2.9%  1.75Ki   0.0%       0    .debug_line
   2.3%  1.37Ki   0.0%       0    .debug_loc
   1.4%     849   0.0%       0    .debug_abbrev
   1.2%     752  51.3%     711    .text
   0.8%     464   0.0%       0    .debug_ranges
   0.6%     360   0.0%       0    .symtab
   0.0%       0  22.2%     308    .bss
   0.5%     306   0.0%       0    .strtab
   0.4%     240   0.0%       0    .debug_frame
   0.3%     214   0.0%       0    .shstrtab
   0.3%     212  15.3%     212    [LOAD #1 [R]]
   0.3%     196  11.2%     156    .rodata
   0.1%      80   0.0%       0    [ELF Headers]
   0.1%      60   0.0%       0    .comment
 100.0%  60.2Ki 100.0%  1.35Ki    TOTAL

Now forget the debug sections: they are the biggest ones, but their size depends on an important factor, the code size.
As you can see, the .text section is way bigger on ARM than on i386 (44.6Ki vs. 752 bytes). This cannot be ascribed to a few compiler_rt builtins, as one may naively think, but to something else coming from the stdlib.

This innocent-looking call to panic is enough to bloat the binary this much.

I'm preparing a patch that makes the size drop down to ~1.5K.

I actually was aware of the situation. I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.

I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables. It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.

I also stand by the plan to enable strip by default for ReleaseSmall. Choosing to use abort instead of panic here is functionally the same thing as enabling strip but only for this one panic call. Why not go ahead and do it for the entire compilation?

IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it. The comment in #6735 suggests this would be ugly/non-trivial?

> I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.

> IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it.

That code path was completely untested. Calling @panic there is definitely not ok (nor is triggering a panic before the initial stuff is initialized) as the panic handler uses TLS to handle panic-in-panic in a thread-safe fashion.

> I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables.

Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.

> It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.

It's only a matter of checking if any object has a PT_TLS segment.
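
As a rough illustration of such a check, the sketch below walks the program header table of an ELF image held in memory and looks for an entry of type PT_TLS (value 7 in the ELF specification). It is a minimal sketch under stated assumptions, not code from the Zig linker: hasTlsSegment and readLe are hypothetical helpers, only little-endian images are handled, and offsets are not hardened against malicious input. Relocatable .o files normally have no program headers, so a real linker would presumably have to look at section flags for those; that part is not covered here.

```zig
const std = @import("std");

const PT_TLS: usize = 7; // program header type of the TLS template segment

// Read a little-endian unsigned integer of `len` bytes starting at `offset`.
fn readLe(bytes: []const u8, offset: usize, len: usize) usize {
    var value: usize = 0;
    var i: usize = len;
    while (i > 0) {
        i -= 1;
        value = (value << 8) | bytes[offset + i];
    }
    return value;
}

// Hypothetical helper: true if a little-endian ELF image has a PT_TLS program header.
fn hasTlsSegment(image: []const u8) bool {
    if (image.len < 52) return false; // smaller than an ELF32 header
    if (!std.mem.eql(u8, image[0..4], "\x7fELF")) return false;
    if (image[5] != 1) return false; // EI_DATA: little-endian only in this sketch
    const is64 = image[4] == 2; // EI_CLASS: 1 = 32-bit, 2 = 64-bit
    if (is64 and image.len < 64) return false; // smaller than an ELF64 header

    // Field offsets differ between the ELF32 and ELF64 headers.
    const phoff = if (is64) readLe(image, 0x20, 8) else readLe(image, 0x1C, 4);
    const phentsize = if (is64) readLe(image, 0x36, 2) else readLe(image, 0x2A, 2);
    const phnum = if (is64) readLe(image, 0x38, 2) else readLe(image, 0x2C, 2);

    var i: usize = 0;
    while (i < phnum) : (i += 1) {
        const entry = phoff + i * phentsize;
        if (entry + 4 > image.len) return false; // truncated header table
        // p_type is the first 4-byte field of both Elf32_Phdr and Elf64_Phdr.
        if (readLe(image, entry, 4) == PT_TLS) return true;
    }
    return false;
}

pub fn main() void {
    // Synthetic ELF32 image: a 52-byte header followed by one 32-byte program header.
    var image = [_]u8{0} ** 84;
    image[0] = 0x7f;
    image[1] = 'E';
    image[2] = 'L';
    image[3] = 'F';
    image[4] = 1; // ELFCLASS32
    image[5] = 1; // ELFDATA2LSB
    image[0x1C] = 52; // e_phoff: table starts right after the header
    image[0x2A] = 32; // e_phentsize
    image[0x2C] = 1; // e_phnum
    image[52] = 1; // p_type = PT_LOAD
    std.debug.assert(!hasTlsSegment(&image));
    image[52] = 7; // p_type = PT_TLS
    std.debug.assert(hasTlsSegment(&image));
}
```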

> I also stand by the plan to enable strip by default for ReleaseSmall.

Release and Small refer to the generated code, no?
ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by debug info, you can always use the split-DWARF/split-dSYM/split-PDB functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.

> The comment in #6735 suggests this would be ugly/non-trivial?

I can try to make the check a bit nicer, I think I have an idea that will work.

> Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.

Example of the problem there:

https://github.com/ziglang/zig/blob/0c355bef9e424fbf06085f12fc28979d73b3d6af/lib/std/c/dragonfly.zig#L8-L11

Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.

> ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by debug info, you can always use the split-DWARF/split-dSYM/split-PDB functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.

Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.

An important thing to note here is that whether or not debug symbols are stripped is, in addition to affecting the generated code, also a comptime flag that std.debug uses to avoid including DWARF/PDB stack trace printing code.

https://github.com/ziglang/zig/blob/0c355bef9e424fbf06085f12fc28979d73b3d6af/lib/std/debug.zig#L108-L111
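
To illustrate why a comptime-known strip flag affects code size, here is a minimal sketch of the pattern. It is not the std.debug code linked above: strip_debug_info is a local constant standing in for the compiler-provided flag, and printSymbolizedTrace is a hypothetical placeholder for the DWARF/PDB-reading code.

```zig
const std = @import("std");

// Assumption: a local stand-in for the compiler-provided strip flag, so the
// sketch does not depend on any particular Zig version's builtin API.
const strip_debug_info = true;

// Hypothetical placeholder for the DWARF/PDB-reading stack trace printer.
fn printSymbolizedTrace() void {
    std.debug.print("(symbolized stack trace would go here)\n", .{});
}

fn dumpTrace() void {
    // The condition is comptime-known, so only the taken branch is analyzed.
    // With the flag set to true, nothing references printSymbolizedTrace and
    // it should not end up in the binary.
    if (strip_debug_info) {
        std.debug.print("Unable to dump stack trace: debug info stripped\n", .{});
    } else {
        printSymbolizedTrace();
    }
}

pub fn main() void {
    dumpTrace();
}
```

Flipping the constant to false makes the else branch the live one, which is the situation -fno-strip would presumably restore under the proposal.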

> I can try to make the check a bit nicer, I think I have an idea that will work.

I opened #6765

> Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.

But here you can't elide the TLS initialization, as the OS expects errno to be thread-local and you don't have control over libc's _start. Not a good example IMO.

> Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.

ReleaseSmall reads as "optimize for size", akin to gcc's -Os. I don't think that stripping the debug info too is a good default, as it's orthogonal to the code generation process.

My last two cents on the matter, I don't really care.
