We need to figure out what parts of mono.wasm and mono.asm.js we are using and remove everything else.
The original approach is described below.
I was thinking about giving this a try, but I'm not sure what "After exercising all reachable code paths" entails. I also couldn't find an annotateWast function referenced anywhere, but I think I might be able to get decent results out of Wasabi.
Wasabi (the name stands for WebAssembly analysis using binary instrumentation) is a framework for dynamic analysis of WebAssembly programs, developed in the Software Lab at TU Darmstadt.
@javiercn was there a particular set of actions you had in mind, or just executing a large portion of the existing demos?
@danroth27 @stevesandersonms
@drewdelano Thanks for your interest.
We don't yet have a final plan for how to know what to strip out from mono.wasm. The most realistic approach is probably to find ways of understanding what's in mono.wasm and making case-by-case decisions about what could be removed. For example, if large parts of it deal with low-level APIs for TCP listeners, we could remove them, because web apps can't listen for direct incoming TCP connections.
The approach mentioned by @javiercn above is really a description of a proof-of-concept form of optimization I did a while ago. It's not a totally realistic approach, since in reality you can't enumerate every possible code path. If you have any clever ideas about how this could be made to work reliably, that would definitely be interesting.
If you have time to investigate things in this area, it would be great to find out what we can about how the payload size of mono.wasm breaks down into different bits of functionality so we could reason about what might be viable to remove. If you find that tools like Wasabi provide good insights, or even any info at all, it would be great to know about that!
@SteveSandersonMS I'll take a look at it this weekend.
The two approaches I see are:
Instrument mono.wasm and set the pre-call hook to batch and then report which functions are called. Then merge this change into master and wait for better data. The downside is that then there needs to be some sort of server collection point and it seems a little sketchy from a privacy point of view.
Same basic concept as above, but with a local collection point per site. The idea being the debug build would always have the full mono.wasm, but release build mode would trim mono.wasm to only the functions that were hit while in debug mode. The downside is that if you miss something in local testing it would break in release mode and if you stopped using something you'd need some way to clear the local cache of used functions or it would always include some functions you weren't using.
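The batching idea shared by both approaches could look roughly like the sketch below. It assumes the instrumented module calls a hypothetical `onCall(funcIndex)` hook on every function entry, and that `report` is whatever collection point you choose (a server endpoint in the first approach, local storage in the second). None of these names come from Blazor or Mono.

```javascript
// Minimal sketch of a batching call recorder. All names are hypothetical.
function createCallRecorder(report, batchSize) {
  const seen = new Set(); // every function index observed so far
  let pending = [];       // newly observed indices not yet reported

  // Hand the pending batch to the collection point and reset it.
  function flush() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    report(batch);
  }

  // Intended to be invoked by the instrumented wasm on each function entry.
  function onCall(funcIndex) {
    if (seen.has(funcIndex)) return; // only the first hit matters
    seen.add(funcIndex);
    pending.push(funcIndex);
    if (pending.length >= batchSize) flush();
  }

  // Sorted list of every index hit so far, e.g. to drive trimming.
  function hits() {
    return Array.from(seen).sort((a, b) => a - b);
  }

  return { onCall, flush, hits };
}
```

Recording only the first hit per function keeps the hook cheap and the reports small, which matters because the hook runs on every call.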
That's great to hear. I'll be interested in what you find!
As for making optimisations that could be built into the product, we'd probably need to be able to reason about why a particular thing is legit to strip out (as in, why we believe it's never going to be used in anyone's app, e.g., because it's some code specific to running on Win32 or macOS which isn't applicable on WebAssembly). Observing that something happens not to be used during a particular test pass is interesting and can guide us towards knowing what to remove, but on its own wouldn't be enough of a guarantee that it's safe to remove in all cases.
Wasabi didn't work, sadly. It just results in an "Aw, Snap!" page in Chrome.
I moved on from there to a more manual approach:
I converted mono.wasm to text so I could edit it, then added this function:
(func (;instr;) (type 1) (param i32)
i32.const 134211724
i32.const 0xdeadbeef ;; [134211724] = 0xdeadbeef
i32.store
i32.const 134211728
get_local 0
i32.add
i32.const -1
i32.store8 ;; [134211728 + funcNum] = 0xff
)
This is a very hacky function that always writes a breadcrumb (0xdeadbeef) and then 0xff at the offset just past the breadcrumb plus the function number. If a function WAS called then there should be 0xff at its offset; otherwise, it should be 0. There's probably a much better way to do this, but I was having difficulty understanding the WASM memory model.
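To make the layout concrete, here is a small sketch that reproduces it on an ordinary byte array instead of real wasm memory. The helper names and the tiny buffer are assumptions for illustration. The only facts taken from the function above are the marker value, the 4-byte gap, and the 0xff flag.

```javascript
// Simulates the breadcrumb layout on a small buffer (not real wasm memory).
const BREADCRUMB = 0xdeadbeef;

// Mirrors the instr function: store the marker at `base`, then set the
// byte at base + 4 + funcNum to 0xff.
function markCalled(memory, base, funcNum) {
  const view = new DataView(memory.buffer, memory.byteOffset, memory.byteLength);
  view.setUint32(base, BREADCRUMB, true); // true = little-endian, as in wasm
  memory[base + 4 + funcNum] = 0xff;
}

// A function counts as called when its flag byte is 0xff.
function wasCalled(memory, base, funcNum) {
  return memory[base + 4 + funcNum] === 0xff;
}
```

Because wasm memory is little-endian, the marker is laid out as the bytes ef, be, ad, de, so the 0xde byte is the last of the four and the flags begin on the byte right after it.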
If you get an error like this:
expected 2 arguments, got 1 (func (;instr;) (type 1) (param i32)
it means the types have shifted at the top of the wat file, and you'll need to update the type. Scroll to the top of the file and find the one that takes a single i32 parameter and returns nothing, like this:
(type (;2;) (func (param i32)))
Then update the (type 1) in the function definition ((func (;instr;) (type 1) (param i32)) to have the same type as found at the top of the wat file e.g.:
(func (;instr;) (type 2) (param i32)
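Finding that type by hand can be tedious in a large file. A minimal sketch that looks it up with a regular expression, assuming the wat text uses the same single-line `(type (;N;) (func (param i32)))` formatting shown above; `findVoidI32TypeIndex` is a made-up helper name.

```javascript
// Returns the index of the first type that takes one i32 and returns nothing,
// or -1 if no such type declaration is found.
function findVoidI32TypeIndex(watText) {
  // The three closing parens directly after "i32" rule out signatures that
  // have more parameters or a (result ...) clause.
  const pattern = /\(type \(;(\d+);\) \(func \(param i32\)\)\)/;
  const match = pattern.exec(watText);
  return match ? Number(match[1]) : -1;
}
```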
var lines = File.ReadAllLines(@"PATH_TO_PROJECT\FlightFinder\FlightFinder.Client\bin\Debug\netstandard2.0\dist\_framework\wasm\mono.txt").ToList();
for(var i = 0; i < lines.Count;i++) {
var line = lines[i];
if (line.TrimStart(' ').StartsWith("(func (;") && !line.Contains(";instr;")) {
var funcNumStr = line.Split(';')[1];
var funcNum = int.Parse(funcNumStr);
var targetLine = i + 1;
while(lines[targetLine].Contains("(local ")) {
targetLine++;
}
lines.InsertRange(targetLine, new[] {
"i32.const "+funcNumStr,
"call 5425" //********* ...I said it would come up later
});
}
}
File.WriteAllLines(@"PATH_TO_PROJECT\FlightFinder\FlightFinder.Client\bin\Debug\netstandard2.0\dist\_framework\wasm\mono.txt", lines);
This adds two instructions to each function: a const of the function offset + a call to our instrumentation function
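The same pass can be sketched in JavaScript. This version is an illustration rather than a replacement for the script above: it works on an array of lines in memory and takes the instrumentation function's index as a parameter (`instrIndex`, a name introduced here) instead of hardcoding it.

```javascript
// Inserts "i32.const <funcNum>" and "call <instrIndex>" at the top of every
// function body, after any (local ...) declarations.
function instrumentWat(lines, instrIndex) {
  const out = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    out.push(line);

    const isFunc =
      line.trimStart().startsWith("(func (;") && !line.includes(";instr;");
    if (!isFunc) continue;

    // The function number sits between the first two semicolons: (func (;N;)
    const funcNum = line.split(";")[1];

    // Locals must stay ahead of any instruction, so copy them through first.
    while (i + 1 < lines.length && lines[i + 1].includes("(local ")) {
      out.push(lines[++i]);
    }
    out.push("i32.const " + funcNum, "call " + instrIndex);
  }
  return out;
}
```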
In mono.js:
function dumpUsage() {
var arr = new Uint8Array(Module.wasmMemory.buffer);
for(var i = 0; i < arr.length; i++) {
if(arr[i] == 0xde && arr[i - 1] == 0xad && arr[i - 2] == 0xbe && arr[i - 3] == 0xef) {
console.log("found deadbeef at: " + i);
break;
}
}
console.log("done looking...");
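var whereDeadbeefWasFound = 134211727; // index of the breadcrumb's last byte (0xde)
arr = arr.subarray(whereDeadbeefWasFound + 1, whereDeadbeefWasFound + 1 + 5424); // +1 because instr writes the flags starting at 134211728, the byte right after the breadcrumb; 5424 is the number of functions to potentially cover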
var funcsCovered = arr.filter(function(e) { return e == 255 }).length;
var badData = arr.filter(function(e) { return e != 255 && e != 0 }).length;
console.log("funcs covered: " + funcsCovered);
console.log("percent covered: " + funcsCovered / arr.length * 100);
console.log("bad data: " + badData);
}
This is also pretty hacky. I basically scan through the entire memory space available to the Blazor app (around 128 MB worth) until I find the deadbeef breadcrumb and write its position out. Once I had that, I updated mono.js again to set whereDeadbeefWasFound to the number the function printed on the first run. This could be refactored to make it much better, but it was good enough to get some raw metrics out of and surprisingly fast (less than a second execution time for dumpUsage on my 4-year-old laptop).
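The two-step process (scan, then paste the offset back in) could be folded into one pass. A minimal sketch, assuming the marker was stored at a 4-byte-aligned address and that the first match is the real breadcrumb and not a chance occurrence of the same bytes; both function names are hypothetical.

```javascript
// Returns the offset of the first flag byte (the byte right after the
// 4-byte marker), or -1 if the marker is not present.
function findFlagsStart(memoryBytes) {
  const view = new DataView(
    memoryBytes.buffer, memoryBytes.byteOffset, memoryBytes.byteLength);
  // Assumes an aligned marker, so stepping by 4 is enough.
  for (let offset = 0; offset + 4 <= view.byteLength; offset += 4) {
    if (view.getUint32(offset, true) === 0xdeadbeef) return offset + 4;
  }
  return -1;
}

// Counts covered functions (0xff) and unexpected values (neither 0 nor 0xff).
function summarizeCoverage(memoryBytes, flagsStart, funcCount) {
  const flags = memoryBytes.subarray(flagsStart, flagsStart + funcCount);
  let covered = 0;
  let bad = 0;
  for (const flag of flags) {
    if (flag === 0xff) covered++;
    else if (flag !== 0) bad++;
  }
  return { covered, bad, percent: (covered / funcCount) * 100 };
}
```

Reading the marker as one little-endian 32-bit value avoids having to reason about which of the four bytes the scan index points at.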
All of this yields some pretty high level stats, but I plan to drill into them more next weekend:

Now I just need a way to figure out which WASM func corresponds to which mono method.
@SteveSandersonMS any idea if there's anything like symbol information for the mono binaries?
That's really interesting. Thanks for the update, @drewdelano!
any idea if there's anything like symbol information for the mono binaries?
@kumpera would be the best person to ask.
The closest to symbol information is the strings custom section in the debug build of mono.wasm.
There's mono.wasm.map that gives you source location as well.
mono.wasm.map sounds like a good place to start. Thanks @kumpera
For anyone else trying to perform similar steps, I'm following a lot of the steps contained in the "How to upgrade mono" docs: https://github.com/aspnet/Blazor/blob/master/src/mono/HowToUpgradeMono.md
I had to add -s ERROR_ON_UNDEFINED_SYMBOLS=0 to mono.csproj's <CommonEmccArgs> node to get things building
Then I ran across two other interesting switches you can add to the CommonEmccArgs from emcc --help:
* "-g4": Preserve LLVM debug information. This is the highest
level of debuggability. If "-g" was used when compiling the
C/C++ sources, this shows line number debug comments, and
generates source maps.
Note:
* This debugging level may make compilation at
optimization level -O1 and above significantly slower,
because JavaScript optimization will be limited to one
core (the default in "-O0").
* Source maps allow you to view and debug the *C/C++
source code* in your browser's debugger! This works in
Firefox, Chrome and Safari.
"--emit-symbol-map"
Save a map file between the minified global names and the original
function names. This allows you, for example, to reconstruct
meaningful stack traces.
Note: This is only relevant when *minifying* global names, which
happens in "-O2" and above, and when no "-g" option was specified
to prevent minification.
With --emit-symbol-map, the build produces a symbols file that looks like this:
0:Math_pow
1:Math_exp
2:Math_log
3:abort
4:enlargeMemory
5:getTotalMemory
6:abortOnCannotGrowMemory
7:___buildEnvironment
8:___clock_gettime
9:___lock
...
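A symbol map in this `index:name` form is easy to load into a lookup table. A minimal sketch, assuming one entry per line exactly as in the excerpt above; `parseSymbolMap` is a hypothetical helper name.

```javascript
// Parses "index:name" lines into a Map from function index to name.
// Lines without a numeric index before the colon are skipped.
function parseSymbolMap(text) {
  const names = new Map();
  for (const line of text.split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // no colon, or nothing before it
    const index = Number(line.slice(0, sep));
    if (!Number.isInteger(index)) continue;
    names.set(index, line.slice(sep + 1));
  }
  return names;
}
```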
LINQPad script to turn symbols + wat instructions into something usable:
we want both the function offset + number of instructions so we can sort them later
var watFile = @"C:\Users\Drew\Desktop\Blazor\src\mono\dist\optimized\wasm\mono.txt";
var lines = File.ReadAllLines(watFile).ToList();
var insts = new Dictionary<string, int>();
var paranCount = 0;
var lastFunc = "";
for(var i = 0; i < lines.Count; i++) {
var line = lines[i];
// look for the start of methods
if (line.TrimStart(' ').StartsWith("(func (;") && !line.Contains(";instr;")) {
paranCount = (line.Split('(').Length - 1) - (line.Split(')').Length - 1);
lastFunc = line.Split(';')[1];
insts.Add(lastFunc , 0);
} else if(paranCount != 0) {
paranCount += (line.Split('(').Length - 1) - (line.Split(')').Length - 1);
insts[lastFunc]++;
}
}
var syms = @"C:\Users\Drew\Desktop\Blazor\src\mono\dist\optimized\wasm\mono.js.symbols";
var lines2 = File.ReadAllLines(syms);
File.WriteAllLines(syms, lines2
.Select((l, idx) => /* see foot note */ insts.ContainsKey(idx.ToString()) ? "{ \"name\": \"" + l.Split(':')[1] + "\", \"insts\": " + insts[idx.ToString()] + " }," : null)
.Where(m => m != null)
.ToList());
// foot note: import funcs also count as functions which means the first 140 or so functions don't have an implementation, so skip them as they aren't instrumented anyway
Updated mono.js to easily sort unused functions by their weight:
Configure to show only the top 300 as that's still probably more than we can comfortably work with
function dumpUsage() {
var arr = new Uint8Array(Module.wasmMemory.buffer);
for(var i = 0; i < arr.length; i++) {
if(arr[i] == 0xde && arr[i - 1] == 0xad && arr[i - 2] == 0xbe && arr[i - 3] == 0xef) {
console.log("found deadbeef at: " + i);
break;
}
}
console.log("done looking...");
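var whereDeadbeefWasFound = 134211727; // index of the breadcrumb's last byte (0xde)
arr = arr.subarray(whereDeadbeefWasFound + 1, whereDeadbeefWasFound + 1 + 5568); // +1 because instr writes the flags starting at 134211728, the byte right after the breadcrumb; 5568 is the number of functions to potentially cover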
var funcsCoveredCnt = arr.filter(function(e) { return e == 255 }).length;
var badData = arr.filter(function(e) { return e != 255 && e != 0 }).length;
console.log("funcs covered: " + funcsCoveredCnt);
console.log("percent covered: " + funcsCoveredCnt / arr.length * 100);
console.log("bad data: " + badData);
var uncoveredFuncs = [];
var j = 0;
while (j < dumpUsageLookup.length - 1) {
if (arr[j] == 0) {
uncoveredFuncs.push(dumpUsageLookup[j]);
}
j++;
}
var sortedFuncs = uncoveredFuncs
.sort((a, b) => a.insts == b.insts ? 0 : a.insts > b.insts ? -1 : 1)
.map(m => m.name + " (" + m.insts + ")")
.slice(0, 300);
console.log("Uncovered funcs:");
console.log(sortedFuncs.join("\n"));
}
var dumpUsageLookup = [
{ "name": "__growWasmMemory", "insts": 4 },
{ "name": "stackAlloc", "insts": 16 },
...
];
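One thing to watch for with a lookup like this: imported functions have indices but no bodies, so an entry's position in the array and its wasm function index can differ. A minimal sketch that sidesteps the issue by keying everything on the function index. The `index` field on each entry is an assumption and is not produced by the LINQPad script above.

```javascript
// Ranks functions whose coverage flag is 0 by instruction count, heaviest
// first. Each lookup entry is assumed to look like
// { index: <wasm function index>, name: <symbol>, insts: <count> }.
function rankUncovered(flags, lookup, limit) {
  return lookup
    .filter((entry) => flags[entry.index] === 0) // flag byte never set
    .sort((a, b) => b.insts - a.insts)           // descending by weight
    .slice(0, limit)
    .map((entry) => entry.name + " (" + entry.insts + ")");
}
```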
Top 300 results weighted by instruction count:
Still need to chase these down in Mono and find out why they aren't being called
_interp_exec_method_full (29846)
_generate (27682)
_malloc (3420)
_SHA1Transform (3303)
_mono_profiler_cleanup (3195)
_major_scan_object_no_evacuation (3153)
_major_scan_object_with_evacuation (2886)
_mono_w32process_ver_language_name (2361)
_mono_class_setup_vtable_general (1973)
_mono_metadata_compute_size (1856)
_mono_class_layout_fields (1835)
_emit_marshal_array_ilgen (1727)
_mono_handle_exception_internal (1725)
_sgen_gc_init (1679)
_fmt_fp (1625)
_iconv (1563)
_load_aot_module (1477)
_mono_interp_transform_method (1425)
_decfloat (1348)
_vfscanf (1348)
_process_create (1274)
_printf_core (1259)
_decode_method_ref_with_target (1236)
_emit_marshal_object_ilgen (1206)
_decode_exception_debug_info (1181)
_instantiate_info (1147)
_mono_dlfree (1122)
___rem_pio2_large (1099)
_decode_patch (1077)
_mono_lookup_pinvoke_call (1074)
_simple_nursery_serial_scan_object (1017)
_mono_class_create_bounded_array (983)
_mono_class_create_runtime_vtable (947)
_inflate_info (937)
_free (894)
_tmalloc_large (856)
_emit_marshal_vtype_ilgen (845)
_mono_debug_symfile_get_seq_points (843)
_emit_native_icall_wrapper_ilgen (838)
_handle_exception_first_pass (831)
_simple_nursery_serial_scan_vtype (831)
_dispose_chunk (814)
_mono_class_setup_methods (807)
___intscan (788)
_emit_managed_allocater_ilgen (780)
_create_custom_attr (776)
_emit_managed_wrapper_ilgen (761)
_decode_llvm_mono_eh_frame (741)
_mono_class_init (729)
_emit_native_wrapper_ilgen (728)
_do_jit_call (720)
_mono_marshal_get_native_wrapper (716)
_prepend_alloc (698)
__mono_reflection_parse_type (686)
_create_object_handle_from_sockaddr (649)
_collect_nursery (649)
_mono_image_close_except_pools (636)
_mono_domain_free (631)
_mono_class_setup_events (624)
_mono_w32handle_wait_multiple (618)
_major_describe_pointer (606)
_register_icalls (603)
_mono_marshal_get_delegate_invoke_internal (603)
_monoeg_g_markup_parse_context_parse (601)
_mono_marshal_load_type_info (595)
_eg_utf8_to_utf16_general (586)
_hexfloat (577)
_emit_invoke_call (572)
_class_type_info (570)
_emit_object_to_ptr_conv (568)
_mono_error_prepare_exception (568)
_mono_runtime_class_init_full (548)
_emit_ptr_to_object_conv (546)
_mono_runtime_try_invoke_array (536)
_mono_de_process_breakpoint (530)
_release_unused_segments (524)
_emit_delegate_invoke_internal_ilgen (521)
___floatscan (521)
_sgen_unified_suspend_stop_world (512)
_load_function_full (511)
_twoway_strstr (509)
_convert_sockopt_level_and_name (505)
_load_arg (504)
_decode_type (504)
_create_domain_objects (500)
_pin_objects_from_nursery_pin_queue (497)
_mono_dlmalloc (494)
_try_realloc_chunk (493)
_mono_ppdb_get_seq_points (490)
_decode_klass_ref (479)
_load_metadata_ptrs (479)
_build_args_from_sig (473)
___rem_pio2 (471)
_process_wait (468)
_mono_class_is_assignable_from_checked (462)
_emit_marshal_string_ilgen (462)
_major_copy_or_mark_from_roots (462)
_mono_marshal_get_runtime_invoke_full (453)
_tmalloc_small (450)
_sgen_marksweep_init_internal (447)
_inet_pton (444)
_mono_w32file_create (439)
_mono_class_setup_fields (436)
_mono_reflection_get_custom_attrs_info_checked (425)
_load_local (424)
_major_scan_ptr_field_with_evacuation (423)
_init_method (420)
_do_icall (419)
_mono_thread_detach_internal (416)
_mono_w32socket_ioctl (414)
_mono_de_process_single_step (413)
_mono_interp_dis_mintop (411)
_major_dump_heap (409)
_mono_w32process_get_modules (404)
_mono_class_from_typeref_checked (403)
_mono_os_event_wait_multiple (400)
_add_segment (399)
_create_sockaddr_from_handle (397)
_ves_icall_System_Globalization_CultureData_fill_number_data (383)
_mono_dump_jit_offsets (380)
_CopyFile (379)
_mini_resolve_imt_method (376)
_scan_card_table_for_block (375)
_get_entropy_from_egd (373)
_major_copy_or_mark_object_canonical (372)
_compile_special (369)
_ves_icall_get_trace (368)
_mono_threads_transition_request_resume (368)
_mono_method_get_name_full (368)
_mono_walk_stack_full (365)
_mono_debug_symfile_lookup_location (364)
_sgen_alloc_obj_nolock (363)
_mono_w32process_module_get_name (357)
_dump_threads (356)
_do_load_header (355)
_insert_breakpoint (352)
_handle_branch (351)
_sgen_memgov_calculate_minor_collection_allowance (350)
_do_mono_metadata_type_equal (350)
_save_seq_points (349)
_jit_info_table_realloc (344)
_mono_w32file_find_next (344)
_mono_jit_runtime_invoke (339)
_find_method_4138 (338)
_mono_array_new_full_checked (338)
_mono_get_method_from_token (335)
_mono_marshal_get_synchronized_wrapper (334)
_print_jit_stats (331)
_get_manifest_resource_info_internal (329)
_mono_metadata_parse_method_signature_full (328)
_load_image (327)
_mono_lookup_internal_call_full (326)
_mono_debug_add_method (325)
_emit_marshal_safehandle_ilgen (324)
_mono_class_from_name_checked_aux (323)
_method_body_object_construct (323)
_interp_generate_mae_throw (322)
_decode_lsda (321)
_sgen_perform_collection_inner (319)
_store_arg (319)
_mini_parse_debug_option (314)
_resolve_vcall (314)
_mini_get_interp_in_wrapper (313)
_wait_callback (313)
_mono_aot_get_class_from_name (311)
_monoeg_g_convert (310)
_is_managed_binary (308)
_method_from_memberref (308)
_mono_thread_small_id_alloc (307)
_get_wrapper_shared_type (307)
_mono_threads_get_thread_dump (306)
_expm1 (306)
_ves_icall_System_Array_CreateInstanceImpl (303)
_mono_llvmonly_runtime_invoke (303)
_qsort (302)
_mono_w32socket_connect (301)
_sgen_los_sweep (301)
_mono_w32process_ver_query_value (301)
_sgen_qsort_rec (297)
_emit_synchronized_wrapper_ilgen (297)
_mono_gc_run_finalize (296)
_mono_llvm_match_exception (295)
_create_runtime_invoke_info (292)
_ves_icall_System_Reflection_Assembly_InternalGetType (289)
_mono_aot_get_unbox_trampoline (288)
_expm1f (287)
_mono_reflection_get_token_checked (286)
_sgen_output_log_entry (285)
_do_mono_image_load (283)
_mono_domain_try_unload (283)
_ves_icall_type_GetTypeCodeInternal (282)
_mono_error_box (282)
_mono_thread_attach_internal (282)
_mono_marshal_get_array_address (281)
_ves_icall_get_frame_info (281)
_mono_make_shadow_copy (281)
_mono_lock_free_free (280)
_do_mono_metadata_parse_type (278)
_get_basic_blocks (277)
_mono_ftnptr_to_delegate_handle (275)
_probe_for_partial_name (275)
_fmodf (275)
_sgen_pin_stats_report (275)
_invoke_array_extract_argument (275)
_emit_thunk_invoke_wrapper_ilgen (274)
_major_finish_collection (274)
_two_arg_branch (274)
_split_cmdline (272)
_ves_icall_System_Globalization_CalendarData_fill_calendar_data (272)
_mono_get_local_interfaces (271)
_mono_network_get_data (270)
_check_usable (269)
_wrap_non_exception_throws (267)
_ves_icall_MonoEventInfo_get_event_info (267)
_finish_gray_stack (264)
_mono_w32handle_signal_and_wait (263)
_mono_type_get_object_checked (262)
_handle_dim_conflicts (262)
_mono_ldtoken_checked (261)
_mono_w32handle_wait_one (260)
_mini_add_method_wrappers_llvmonly (260)
_sys_alloc (259)
_mono_conc_hashtable_remove (258)
_MoveFile (258)
_emit_array_address_ilgen (257)
_mono_get_address_info (257)
_mono_threads_transition_request_suspension (256)
_wait_or_register_method_to_compile (256)
_store_inarg (254)
_get_generic_context_from_stack_frame (254)
_ves_icall_RuntimeType_GetPropertiesByName_native (253)
_monoeg_g_iconv (252)
_create_thread (251)
_register_jit_stats (250)
_mono_assembly_load_with_partial_name_internal (250)
_monoeg_g_shell_unquote (249)
_scanexp (249)
_mono_llvmonly_get_imt_trampoline (249)
_mono_w32handle_new_internal (248)
_mono_llmult_ovf (246)
_mono_image_load_module_checked (246)
_load_tables (245)
_file_write (245)
_mono_resolve_generic_virtual_call (245)
_mono_try_assembly_resolve_handle (244)
_mono_lock_free_queue_dequeue (244)
_call_unhandled_exception_delegate (244)
_mono_w32mutex_abandon (243)
_sgen_build_nursery_fragments (243)
_field_from_memberref (243)
_ves_icall_System_Array_GetValue (242)
_ves_icall_System_Enum_compare_value_to (242)
_file_getfilesize (242)
_mono_unwind_decode_llvm_mono_fde (240)
_ves_icall_RuntimeType_GetInterfaces (240)
_mono_thread_suspend_all_other_threads (240)
_mono_w32socket_transmit_file (240)
_mono_w32socket_convert_error (240)
_file_setendoffile (239)
_mono_callspec_eval (238)
_ves_icall_System_ValueType_InternalGetHashCode (238)
_acos (238)
_pop_arg (237)
_fill_runtime_generic_context (237)
_decode_cached_class_info (236)
_sgen_finalize_in_range (236)
_major_start_major_collection (236)
_mono_code_manager_reserve_align (235)
_internal_memalign_13532 (235)
_ves_icall_System_Threading_ThreadPool_RequestWorkerThread (234)
_msort_method_addresses_internal (234)
_mono_print_method_from_ip (233)
_mini_add_method_trampoline (230)
_sgen_client_cardtable_scan_object (230)
_mono_gc_memmove_aligned (230)
_mono_w32file_get_attributes_ex (229)
_mono_assembly_load_friends (229)
_ves_icall_MonoPropertyInfo_get_property_info (229)
_mono_get_delegate_virtual_invoke_impl (228)
_bb_insert (228)
_mono_class_name_from_token (227)
_mini_get_gsharedvt_in_sig_wrapper (227)
_ves_icall_RuntimeTypeHandle_type_is_assignable_from (227)
_mint_type (227)
_mono_native_state_add_version (226)
_predef_writable_update (226)
_mono_ppdb_lookup_method_async_debug_info (226)
_inflate_generic_signature_checked (226)
_ves_icall_System_Threading_Mutex_ReleaseMutex_internal (225)
_mini_get_shared_gparam (225)
_atan2f (225)
_mono_ppdb_load_file (224)
_ves_icall_System_Reflection_Assembly_GetFilesInternal (224)
_mono_image_load_file_for_image_checked (224)
_mono_marshal_get_delegate_end_invoke (223)
_memcpy (223)
_parse_attributes (222)
_mono_get_exception_type_initialization_handle (222)
_mono_ppdb_lookup_location (221)
_mini_get_gsharedvt_out_sig_wrapper (221)
So far it looks like a random bag of stuff that we probably don't use: threading, file system stuff, sockets, some things I'm surprised aren't called (_mono_jit_runtime_invoke, _interp_exec_method_full) and some things that might be called if the sample code changed (atan2f, _memcpy, _ves_icall_System_Globalization_CalendarData_fill_calendar_data).
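To see how the weight splits across areas like these, the list can be bucketed by name prefix. A rough sketch: the area names and prefixes below are guesses based on the function names in the list, not an official breakdown of the Mono runtime, and anything unmatched lands in "other".

```javascript
// Hypothetical grouping of function-name prefixes into functional areas.
const AREAS = [
  { area: "sockets", prefixes: ["_mono_w32socket_"] },
  { area: "process", prefixes: ["_mono_w32process_", "_process_"] },
  { area: "gc", prefixes: ["_sgen_", "_major_"] },
];

// Sums instruction counts per area for entries shaped { name, insts }.
function weightByArea(entries, areas) {
  const totals = {};
  for (const entry of entries) {
    const match = areas.find((a) =>
      a.prefixes.some((prefix) => entry.name.startsWith(prefix)));
    const key = match ? match.area : "other";
    totals[key] = (totals[key] || 0) + entry.insts;
  }
  return totals;
}
```

Instruction counts are only a proxy for bytes in the final binary, so totals like these are best read as a rough ordering of where to look first.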
@SteveSandersonMS Does anything else stand out to you? Think I should drill into _why_ those exist in Mono or just try stubbing them and seeing what happens? Or do you think I should try something else?
@drewdelano the simple way to remove these functions is by changing how mono is built.
This is done by introducing an "enable-minimal" feature and using it throughout the code to disable the codepaths we don't want.
I'll cook up a sample PR that shows how to do this sort of work on mono.
Hmm, process support is not a good case for enable-minimal; the better fix there is to replace the unix port with a wasm-specific one.
Here's a skeleton commit that sets the stage for removing a lot of the nonsense platform stuff like process/socket support: https://github.com/kumpera/mono/commit/5a4fcf5e27a5456790b2febab1f0c6fe2c5b8e80
It still requires some work.
I was going to start by testing this concept locally by just removing the instructions for those functions in the wat file and then rebuilding it. Just to test that all of this is actually working.
But I agree changing it in mono is probably the best long term goal.
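The local experiment described above (emptying out a function in the wat file) could be sketched as follows. It replaces the body with a single `unreachable` instruction instead of deleting the function, so indices and call sites stay valid and a call to a stubbed function traps loudly. The sketch assumes the formatting seen in this thread, with the `(func (;N;) ...` signature on its own line and the body on the lines after it; `stubFunction` is a hypothetical helper.

```javascript
// Counts occurrences of a single character in a string.
function countChar(text, ch) {
  return text.split(ch).length - 1;
}

// Returns a copy of `lines` where function `funcNum` keeps its header line
// but has its body replaced by "unreachable".
function stubFunction(lines, funcNum) {
  const header = "(func (;" + funcNum + ";)";
  const start = lines.findIndex((l) => l.trimStart().startsWith(header));
  if (start === -1) return lines.slice();

  // Walk forward until the parentheses opened by the header are closed.
  let depth = 0;
  let end = start;
  for (; end < lines.length; end++) {
    depth += countChar(lines[end], "(") - countChar(lines[end], ")");
    if (depth <= 0) break;
  }
  // Leave one-line functions and unbalanced input untouched.
  if (end === start || end === lines.length) return lines.slice();

  // A negative depth means the last line also closed an enclosing form
  // (such as the module), so those closers have to be preserved.
  const closers = ")".repeat(1 - depth);
  return [
    ...lines.slice(0, start),
    lines[start],
    "unreachable" + closers,
    ...lines.slice(end + 1),
  ];
}
```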
I got distracted with some other stuff. I should be returning to this next weekend.
This should be covered as part of #22432.