Zig: strip debug info from ReleaseSmall builds by default

Created on 14 Oct 2020 · 10 Comments · Source: ziglang/zig

Repro code:

const std = @import("std");
const linux = std.os.linux;

pub fn exit() void {
    _ = linux.syscall1(.exit, 0);
}

pub fn main() void {
    exit();
}

Running zig build-exe main.zig -target arm-linux-none -O ReleaseSmall for arm produces an executable that is 707KB in size. However, running the same command for i386 (presumably with -target i386-linux-none rather than the arm target) produces an executable that is just 61KB.

Appending --strip "fixes" the issue, as the executables shrink to 5KB and 3KB respectively.

Labels: accepted, breaking, proposal

atipls

Most helpful comment

OK sounds good to make release builds stripped by default for -OReleaseSmall. We need to change the CLI now though, because we need to be able to override the default in both directions. Here's my proposal:

-fstrip - always strip debug symbols, even for -ODebug, -OReleaseFast, and -OReleaseSafe
-fno-strip - never strip debug symbols, even for -OReleaseSmall

If you look at the help menu you can see this is a well established pattern for other options.
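
The override rule proposed above can be sketched in a few lines of Zig. This is only an illustration of the intended defaults, not the compiler's actual implementation: `Mode`, `resolveStrip`, and the `override` parameter are hypothetical names, with `override` standing in for "-fstrip passed" (true), "-fno-strip passed" (false), or "neither flag passed" (null).

```zig
const std = @import("std");

// Hypothetical stand-in for the compiler's optimization modes.
const Mode = enum { Debug, ReleaseSafe, ReleaseFast, ReleaseSmall };

// `override` models the CLI: true for -fstrip, false for -fno-strip,
// null when neither flag was passed.
fn resolveStrip(mode: Mode, override: ?bool) bool {
    // An explicit flag always wins; otherwise only ReleaseSmall strips.
    return override orelse (mode == .ReleaseSmall);
}

pub fn main() void {
    std.debug.print("ReleaseSmall, no flag:    {}\n", .{resolveStrip(.ReleaseSmall, null)});
    std.debug.print("ReleaseSmall, -fno-strip: {}\n", .{resolveStrip(.ReleaseSmall, false)});
    std.debug.print("Debug, no flag:           {}\n", .{resolveStrip(.Debug, null)});
    std.debug.print("Debug, -fstrip:           {}\n", .{resolveStrip(.Debug, true)});
}
```

Modeling the flag as an optional keeps "not specified" distinct from "explicitly disabled", which is what allows the default to be overridden in both directions.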

andrewrk on 17 Oct 2020

All 10 comments

Should we maybe --strip by default when doing small release builds (-OReleaseSmall)?

FireFox317 on 14 Oct 2020

Stripping small release builds by default would totally make sense.

jedisct1 on 14 Oct 2020

Sounds like a good plan!

jedisct1 on 18 Oct 2020

None of you investigated or explained the difference in size; stripping everything away only sweeps the problem under the rug.

The culprit of the massive difference in size is... a single line of code.

Running Bloaty McBloatface on the two binaries produced by the code in the ticket shows:

$ ./bloaty foo-arm                                                            
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  30.0%   206Ki   0.0%       0    .debug_loc
  25.2%   173Ki   0.0%       0    .debug_info
  10.6%  73.0Ki   0.0%       0    .debug_line
   9.1%  63.0Ki   0.0%       0    .debug_pubnames
   7.9%  54.5Ki   0.0%       0    .debug_str
   6.5%  44.6Ki  94.5%  44.5Ki    .text
   6.2%  42.8Ki   0.0%       0    .debug_ranges
   2.0%  14.1Ki   0.0%       0    .debug_pubtypes
   1.0%  7.09Ki   0.0%       0    .debug_frame
   0.4%  2.85Ki   0.0%       0    .debug_abbrev
   0.3%  2.40Ki   0.0%       0    .symtab
   0.3%  2.23Ki   0.0%       0    .strtab
   0.3%  1.74Ki   3.6%  1.70Ki    .rodata
   0.0%       0   0.8%     376    .bss
   0.0%     276   0.5%     236    .ARM.exidx
   0.0%     276   0.6%     276    [LOAD #1 [R]]
   0.0%     253   0.0%       0    .shstrtab
   0.0%     120   0.0%       0    [ELF Headers]
   0.0%      93   0.0%       0    .ARM.attributes
   0.0%      62   0.0%       0    .comment
   0.0%      48   0.0%      12    [2 Others]
 100.0%   689Ki 100.0%  47.1Ki    TOTAL
$ ./bloaty foo-i386                                                           
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  28.3%  17.0Ki   0.0%       0    .debug_info
  27.5%  16.5Ki   0.0%       0    .debug_pubnames
  26.9%  16.2Ki   0.0%       0    .debug_str
   6.1%  3.69Ki   0.0%       0    .debug_pubtypes
   2.9%  1.75Ki   0.0%       0    .debug_line
   2.3%  1.37Ki   0.0%       0    .debug_loc
   1.4%     849   0.0%       0    .debug_abbrev
   1.2%     752  51.3%     711    .text
   0.8%     464   0.0%       0    .debug_ranges
   0.6%     360   0.0%       0    .symtab
   0.0%       0  22.2%     308    .bss
   0.5%     306   0.0%       0    .strtab
   0.4%     240   0.0%       0    .debug_frame
   0.3%     214   0.0%       0    .shstrtab
   0.3%     212  15.3%     212    [LOAD #1 [R]]
   0.3%     196  11.2%     156    .rodata
   0.1%      80   0.0%       0    [ELF Headers]
   0.1%      60   0.0%       0    .comment
 100.0%  60.2Ki 100.0%  1.35Ki    TOTAL

Now forget the debug sections. They are the biggest ones, but their size depends on an important factor: the code size.
As you can see, the .text section is way bigger on ARM than on i386. This cannot be ascribed to a few compiler_rt builtins, as one may naively think, but to something else coming from the stdlib.

This innocent-looking call to panic is enough to bloat the binary this much.

I'm preparing a patch that makes the size drop down to ~1.5K.

LemonBoy on 18 Oct 2020

I actually was aware of the situation. I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.

I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables. It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.

I also stand by the plan to enable strip by default for ReleaseSmall. Choosing to use abort instead of panic here is functionally the same thing as enabling strip but only for this one panic call. Why not go ahead and do it for the entire compilation?

andrewrk on 19 Oct 2020

IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it. The comment in #6735 suggests this would be ugly/non-trivial?

daurnimator on 19 Oct 2020

> I personally think the panic function is the right thing to do there but I would support your decision to use abort() instead.

> IMO a much better fix would be eliding the HWCAP check when the CPU is known to support it.

That code path was completely untested. Calling @panic there is definitely not ok (nor is triggering a panic before the initial stuff is initialized) as the panic handler uses TLS to handle panic-in-panic in a thread-safe fashion.

> I also intend to look into eliminating the TLS initialization code entirely when zig can prove there will be no threadlocal variables.

Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.

> It's not a trivial problem but I think we can get there once the linker code is sophisticated enough to report whether there are any threadlocals in static libs and objects, and maybe an explicit CLI flag in case zig cannot infer correctly.

It's only a matter of checking if any object has a PT_TLS segment.
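
As a rough illustration of such a check, the sketch below scans the program headers of an ELF image for a PT_TLS entry. It rests on several assumptions: `hasTlsSegment` and `readLe` are hypothetical helpers, only little-endian ELF32/ELF64 is handled, and because it inspects program headers it applies to linked images (relocatable objects usually carry none and would need a section-level check instead). The header offsets and the PT_TLS value of 7 come from the ELF specification. `main` feeds it a hand-built ELF64 header rather than a real file.

```zig
const std = @import("std");

// Program header type of the TLS template segment, per the ELF specification.
const PT_TLS: usize = 7;

// Reads `len` little-endian bytes starting at `offset` into a usize.
fn readLe(bytes: []const u8, offset: usize, len: usize) usize {
    var value: usize = 0;
    var i: usize = len;
    while (i > 0) {
        i -= 1;
        value = (value << 8) | bytes[offset + i];
    }
    return value;
}

// Hypothetical helper: true if a little-endian ELF image has a PT_TLS program header.
fn hasTlsSegment(image: []const u8) bool {
    if (image.len < 52) return false; // smaller than an ELF32 header
    if (!std.mem.eql(u8, image[0..4], "\x7fELF")) return false;
    if (image[5] != 1) return false; // EI_DATA: only little-endian is handled here
    const is64 = image[4] == 2; // EI_CLASS: 1 = ELF32, 2 = ELF64
    if (is64 and image.len < 64) return false;

    const phoff = if (is64) readLe(image, 32, 8) else readLe(image, 28, 4);
    const phentsize = if (is64) readLe(image, 54, 2) else readLe(image, 42, 2);
    const phnum = if (is64) readLe(image, 56, 2) else readLe(image, 44, 2);

    var i: usize = 0;
    while (i < phnum) : (i += 1) {
        const entry = phoff + i * phentsize;
        if (entry + 4 > image.len) return false; // truncated program header table
        // p_type is the first 4-byte field in both Elf32_Phdr and Elf64_Phdr.
        if (readLe(image, entry, 4) == PT_TLS) return true;
    }
    return false;
}

pub fn main() void {
    // Hand-built ELF64 header followed by a single program header.
    var image = [_]u8{0} ** 120;
    image[0] = 0x7f;
    image[1] = 'E';
    image[2] = 'L';
    image[3] = 'F';
    image[4] = 2; // ELF64
    image[5] = 1; // little-endian
    image[32] = 64; // e_phoff: table starts right after the header
    image[54] = 56; // e_phentsize
    image[56] = 1; // e_phnum
    image[64] = 7; // p_type = PT_TLS
    std.debug.print("has PT_TLS: {}\n", .{hasTlsSegment(image[0..])});
}
```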

> I also stand by the plan to enable strip by default for ReleaseSmall.

Release and Small refer to the generated code, no?
ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by them, you can always use the split-DWARF/split-dSYM/split pdb functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.

> The comment in #6735 suggests this would be ugly/non-trivial?

I can try to make the check a bit nicer, I think I have an idea that will work.

LemonBoy on 19 Oct 2020

> Just use --single-threaded then, the TLS is used by the panic handler and one day will be used by the stack protector and other thread-local state that we're now ignoring.

Example of the problem there:

https://github.com/ziglang/zig/blob/0c355bef9e424fbf06085f12fc28979d73b3d6af/lib/std/c/dragonfly.zig#L8-L11

Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.

> ReleaseSmaller is only a --strip away and, if you're really concerned about binary bloat caused by them, you can always use the split-DWARF/split-dSYM/split pdb functionality and get the best of both worlds. Stripping by default is IMO the wrong choice.

Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.

An important thing to note here is that whether or not debug symbols are stripped is, in addition to affecting the generated code, also a comptime flag that std.debug uses to avoid including DWARF/PDB stack trace printing code.

https://github.com/ziglang/zig/blob/0c355bef9e424fbf06085f12fc28979d73b3d6af/lib/std/debug.zig#L108-L111
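
A minimal sketch of how such a comptime flag can gate code, assuming the builtin package exposes `strip_debug_info` as the linked std.debug lines suggest. `traceSupport` is a hypothetical helper and not part of std.debug; because the condition is known at compile time, only the selected branch ends up in the binary.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical helper, not part of std.debug.
fn traceSupport() []const u8 {
    // strip_debug_info is comptime-known, so only one of the two strings is kept.
    return if (builtin.strip_debug_info)
        "debug info stripped: stack traces cannot be symbolized"
    else
        "debug info present: stack traces can be symbolized";
}

pub fn main() void {
    std.debug.print("{s}\n", .{traceSupport()});
}
```

The same pattern is why stripping affects more than the debug sections: code that exists only to read DWARF or PDB data can be left out entirely when the flag is set.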

> I can try to make the check a bit nicer, I think I have an idea that will work.

I opened #6765

andrewrk on 22 Oct 2020

> Even with --single-threaded, an extern threadlocal variable has ABI implications, and, as you can see, for this particular OS you can't even make syscalls without it.

But here you can't elide the TLS initialization, as the OS expects errno to be thread-local and you don't have control over libc's _start. Not a good example IMO.

> Likewise, ReleaseSmallPlusDebug with this proposal would only be a -fno-strip away. It is a question of defaults. I think it is correct to assume that, by default, ReleaseSmall is concerned with binary bloat, especially considering Debug is a different optimization mode.

ReleaseSmall reads as "optimize for size", akin to gcc's -Os. I don't think that stripping the debug info too is a good default, as it's orthogonal to the code generation process.

My last two cents on the matter, I don't really care.

LemonBoy on 22 Oct 2020
