It appears we preserve CodeSourceMap objects in obfuscated mode. They are used for building string representations of stack traces (one frame might be decoded into multiple functions) as well as for storing pc offset -> token position mappings.
This information is not necessary for the AOT runtime for correctness, only for stack traces and exception messages. We could therefore consider moving this information out of AOT snapshot, similarly to how we have an external obfuscation map.
These CodeSourceMap's sometimes take up 15% of the isolate snapshot (upwards of half a MB).
Would this map not be part of the APK, we would provide some tool for folks to run in order to get sane exception messages?
Would this map not be part of the APK,
No.
we would provide some tool for folks to run in order to get sane exception messages?
Precisely.
With --obfuscate mode the stack trace is already now not readable. To make it readable one needs the obfuscation map, which is generated at AOT compile-time (via --save-obfuscation-map=<file>) but not included in the APK.
What would change is that the stack trace would become shorter (i.e. we wouldn't expand frames on the stack to the actual inlined functions) and have special markers as line numbers (probably token positions). Some exception messages (in addition to stack traces) would have special markers in them.
Just like with the obfuscation map we already produce, we would produce more metadata which can expand stack frames, map token positions to source file + line numbers and special tokens in exceptions to the actual names referenced.
A tool would read in the already exiting obfuscation map and the other metadata and can produce the original stack + exception message.
When --dwarf_stack_traces=true, this information in encoded in DWARF form and the CodeSourceMaps are dropped.
When --dwarf_stack_traces=true, this information in encoded in DWARF form and the CodeSourceMaps are dropped.
That is a bug then: This information is used in AOT runtime for generation of exception messages. It probably crashes the VM if the code source maps are missing:
From runtime/vm/runtime_entry.cc:
DEFINE_RUNTIME_ENTRY(NullError, 0) {
...
const uword pc_offset = caller_frame->pc() - code.PayloadStart();
...
const CodeSourceMap& map =
CodeSourceMap::Handle(zone, code.code_source_map());
ASSERT(!map.IsNull());
CodeSourceMapReader reader(map, Array::null_array(),
Function::null_function());
const intptr_t name_index = reader.GetNullCheckNameIndexAt(pc_offset);
RELEASE_ASSERT(name_index >= 0);
const ObjectPool& pool = ObjectPool::Handle(zone, code.GetObjectPool());
const String& member_name =
String::CheckedHandle(zone, pool.ObjectAt(name_index));
NullErrorHelper(zone, member_name);
}
Even if we fix this particular code to use DWARF instead, we still are in a situation where the runtime needs the information. What I'm proposing is we a) move the information out of AOT snapshot and b) make the runtime work without it c) provide a tool to decode runtime stacks (and possibly exception messages) using the metadata on the side
That's a bug in the NullError runtime entry then, as the DWARF mechanism is older. The general case exception machinery already knows how to handle the absence of the CodeSourceMap and produce stack traces that can be symbolized, either manually with addr2line/llvm-symbolizer or more conveniently with ndk-stack.
Fixed NullError runtime entry to work in case of missing CodeSourceMap.
Assigning to @sstrickl to investigate the ways we could split this information out of the snapshot. This might really benefit some internal use cases @mkustermann reports around 10-20% in CodeSourceMap in AOT snapshots.
@sstrickl Heads up: Ryan has a change out, namely cl/81323 to directly generate ELF from gen_snapshot. This should allow us to get rid of the CodeSourceMap objects, since that can be encoded in the ELF file.
Once we have the ELF file, we can split out the source map information to reduce the ELF size (see discussion further above) and used on the side for decoding stack traces.
@mkustermann @mraleph @alexmarkov When use --dwarf_stack_traces to build snapshot, the CodeSourceMaps are dropped, but the exception information missed the stack traces, just like this:
flutter: โโโก EXCEPTION CAUGHT BY WIDGETS LIBRARY โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
flutter: The following NoSuchMethodError was thrown building CupertinoAlertDemo:
flutter: The method 'debugDescribeChildren' was called on null.
flutter: Receiver: null
flutter: Tried calling: debugDescribeChildren()
flutter:
flutter: When the exception was thrown, this was the stack:
flutter:
flutter: โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
So, what's the reason and how to get the whole stack traces?
using the metadata on the side
When --dwarf_stack_traces=true, this information in encoded in DWARF form and the CodeSourceMaps are dropped.
That is a bug then: This information is used in AOT runtime for generation of exception messages. It probably crashes the VM if the code source maps are missing:
From runtime/vm/runtime_entry.cc:
DEFINE_RUNTIME_ENTRY(NullError, 0) { ... const uword pc_offset = caller_frame->pc() - code.PayloadStart(); ... const CodeSourceMap& map = CodeSourceMap::Handle(zone, code.code_source_map()); ASSERT(!map.IsNull()); CodeSourceMapReader reader(map, Array::null_array(), Function::null_function()); const intptr_t name_index = reader.GetNullCheckNameIndexAt(pc_offset); RELEASE_ASSERT(name_index >= 0); const ObjectPool& pool = ObjectPool::Handle(zone, code.GetObjectPool()); const String& member_name = String::CheckedHandle(zone, pool.ObjectAt(name_index)); NullErrorHelper(zone, member_name); }Even if we fix this particular code to use DWARF instead, we still are in a situation where the runtime needs the information. What I'm proposing is we a) move the information out of AOT snapshot and b) make the runtime work without it c) provide a tool to decode runtime stacks (and possibly exception messages) using the metadata on the side
Is there any progress?
So, what's the reason and how to get the whole stack traces?
The VM would need to print a list of return addresses / PCs from the stack. Those PCs probably need to be relative to the .text segment of the .so/.dylib file.
That should be sufficient for an external tool to read those PC offsets and use the DWARF information (which is not available on device to safe space) to decode the stack trace.
Is there any progress?
@sstrickl Has been occupied with another task, but we will come back to this in the very near future.
To give an update on this:
A cl/121857 is in-progress for making our AOT compiler output a second ELF file containing the necessary dwarf information for decoding stack traces. We still need to ensure this will be sufficient for all use-cases and ensure tooling is available to decode stack traces. We will keep the issue updated.
As a side note, flutter users can already get this size decrease by passing --extra-gen-snapshot-options=--dwarf-stack-traces to flutter tools, e.g.:
f/examples/flutter_gallery % ../../bin/flutter build aot --target-platform=android-arm64
f/examples/flutter_gallery % /bin/ls build/aot/app.so
11318192 build/aot/app.so
f/examples/flutter_gallery % ../../bin/flutter build aot --target-platform=android-arm64 --extra-gen-snapshot-options=--dwarf-stack-traces
f/examples/flutter_gallery % ls build/aot/app.so
10433456 build/aot/app.so // <== around -8%
Would a mode with no stack traces at all give an even bigger improvement? (I can imagine some developers being happy with such a trade-off.)
Would a mode with no stack traces at all give an even bigger improvement? (I can imagine some developers being happy with such a trade-off.)
Probably not:
When code source maps are not included in the AOT snapshot all the VM knows are return addresses on the stack, which it can convert to relative DSO addresses and print.
=> So it doesn't keep any additional data around we could remove.
Those DSO-relative addresses can be used for offline decoding (via DWARF).
(But there is of course other size improvement possibilities.)
@rmacnak-google any updates?
The ability to remove CodeSourceMaps from snapshots and make the information available in DWARF format has existed since March 2017 (https://codereview.chromium.org/2723213002).
As @rmacnak-google points out the support has been there for a long time.
We are just working on some tools to make it more convenient to use
addr2line / atos, which is not always available)strip / objcopy, which is not necessarily always available)and small fixes (like making inlined frames work decode correctly)
Our hope is that with the additional tooling available, we can drop the CodeSourceMaps by default - so all flutter users automatically will get smaller AOT apps in flutter-release mode.
It might be better for us to not make stripping CodeSourceMaps the default in flutter-release. It will make framework developers jobs a bit harder. The people most likely to care about the binary size are the ones that would be technically savvy enough to "symbolicate" their stacktraces.
I was playing with nim this weekend and they have 2 levers to control this --stackTrace:on and --lineTrace:on. One gives you control of having procedure names, the other lines of code. Maybe there is an option that leaves CodeSourceMaps in the binary but thins them out like nim does? That might be a better default?
@gaaclarke users in general tend to care about the binary size and I would argue that users expect default build settings to provide smallest possible side. I would not be surprised that people take Flutter "Hello, World!" with default settings and then make conclusions based on the size they see.
In any case @sstrickl is going to work on a more formal proposal which we will circulate for discussion among Flutter team.
@gaaclarke We get a _lot_ of complaints about our binary size, so I think it's safe to say that a substantial portion of our developer base does care, enough that we should make making the binary size smaller for release the default even at the cost of harder-to-symbolicate stack traces. (We already make C++ stack traces hard to symbolicate for this reason.)
@mraleph awesome. Please keep us posted!
User: Hey Flutter, my customers are getting crashes when the app starts up.
Flutter: Can we see your logs?
User: Here you go.
Flutter: Looks like there is an IO exception getting thrown, can we see your Dart debugger symbol archive for the build you submitted to the store so we can symbolicate the stacktrace?
User: What's that?
Flutter: It's that thing that's get generated when you compile your Flutter app in release mode.
User: Oh, I didn't save that.
Flutter: Hey do you care about binary size?
Users: Oh, yeah!
Flutter: Hey do you care about making it easy to debug customer/Flutter issues?
Users: Ohhhh, yeah!
Stop me if I'm wrong. I'm just sayin: thinned out stack trace info might be a better default with the option to go whole hog.
I mean, we need to base this on data. Right now, based on our surveys and such, as far as I can tell the scenes are actually:
User: Hey Flutter, my customers are getting crashes when the app starts up.
Flutter: Figure out the steps to reproduce and the devices affected, submit a bug with a reduced test case.
User: Will do.
Flutter: Hey do you care about making it easy to debug customer/Flutter issues?
Users: Can you make the binary smaller?
Okay, have finished doing my cleanups and tool creation and also have an initial version of documentation on what DWARF stack traces are and how they can work for you. PTAL, @Hixie and others interested.
I'm very interested in getting this wired up in the tool. But whether or not it's on by default, we'll need the symbolication library before we can proceed. What are the current plans for publishing it?
@jonahwilliams We have had discussions on what to name the package and agreed upon package:vm_stack_traces now. The cl/132281 is ready to land.
It will land today or tomorrow and we'll publish and ping the https://github.com/flutter/flutter/issues/48726 once it's on pub
All the infrastructure should be landed, the package:native_stack_traces is uploaded to pub, so I'll close the issue.
Please re- open if there's something missing.