Some time ago I migrated my crate from rustc-serialize to serde, and I was somewhat disappointed that my binary size roughly doubled (and so did compilation times).
If I run nm --size-sort -S, I get a lot of symbols like this:
0000000100120c40 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h0683deb22b9931ffE
0000000100123bb0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h2fd21583584d6bf1E
00000001001471c0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h539ca1b6dbf1cb43E
000000010014e650 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5806957261f0bc9bE
0000000100096cd0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb74320b766f864cbE
00000001001a59e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcbf6c07be4ec0d3bE
0000000100109790 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcc2221ebbabcbce1E
000000010015c1e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcccb666beff69b03E
0000000100191100 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hd99ca0f77ada84c1E
00000001001117e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hec08322395fba03eE
000000010010c440 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf3862946ff0b8b3cE
...
00000001000b5140 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h21580c3267c18d93E
0000000100138910 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5c55d12e7448b4a3E
0000000100193c30 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5d14cb7b1aee2532E
00000001000ab9f0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5df1c90391bf8f60E
000000010014b7e0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h6b721180eb040b2dE
00000001000d6210 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h85b05d12103203d6E
00000001000b00f0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h9385aa3d135608f9E
0000000100106900 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb58e2b12ab99f16dE
00000001000f46a0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb9a69783daa5ea47E
00000001000bf7c0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hbca6642924436262E
0000000100144350 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17he7129a26a2066f08E
00000001001355e0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf7e87397027fcc4aE
00000001000ffbb0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf8b195b1ed175901E
With rustc-serialize, there seems to be no such symbol multiplication.
OS X rustc 1.9.0-nightly (a43eb4e77 2016-04-12), serde 0.7.0 (I also tried on Windows and results are the same)
Are you compiling with optimizations? Have you tried enabling link time optimizations?
I run with -O3. I'll try with -C lto soon.
But it's still suspicious. Both rustc-serialize and serde are compiled with same compiler settings and I get different results.
Well... Rust's monomorphisation doesn't merge identical implementations yet. We could probably improve on this by adjusting the macros that generate all the code.
-C lto doesn't help much. ~3% difference in binary size.
From IRC on how eddyb creates a count of monomorphized functions in order to debug compiler memory usage (and also executable size):
eddyb: unoptimized LLVM IR
eddyb: first used grep '^define' to get only the lines defining function bodies
eddyb: then regex replace in my editor to remove everything before @ and everything after (
eddyb: then sort | uniq -c
I will try this and see if there is any low-hanging fruit.
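The steps eddyb describes can be sketched as a small program. This is only an illustration, not the script he used: the function names `defined_symbol`, `strip_hash` and `count_definitions` are made up here, and stripping the trailing `17h<hash>E` suffix is an added assumption (based on the legacy-mangled names in the nm output above, where each instantiation differs only in that hash) so that instantiations of the same function are counted together.

```rust
use std::collections::HashMap;
use std::io::Read;

/// Extracts the symbol name from an LLVM IR `define` line: the text after the
/// first `@` and before the next `(`, without surrounding quotes.
/// Returns `None` for any line that does not start with `define`.
fn defined_symbol(line: &str) -> Option<&str> {
    if !line.starts_with("define") {
        return None;
    }
    let after_at = &line[line.find('@')? + 1..];
    let end = after_at.find('(')?;
    Some(after_at[..end].trim_matches('"'))
}

/// Strips a trailing legacy-mangling hash (`17h` + 16 hex digits + `E`), if
/// present. Assumption: names use the legacy mangling shown in the nm output.
fn strip_hash(symbol: &str) -> &str {
    let bytes = symbol.as_bytes();
    let n = bytes.len();
    // Suffix length: "17h" (3) + 16 hex digits + "E" (1) = 20 bytes.
    if n >= 20 && bytes[n - 1] == b'E' {
        let start = n - 20;
        if bytes[start..].starts_with(b"17h")
            && bytes[start + 3..n - 1].iter().all(|b| b.is_ascii_hexdigit())
        {
            return &symbol[..start];
        }
    }
    symbol
}

/// Counts definitions per hash-stripped symbol name, most frequent first
/// (ties broken alphabetically), similar in spirit to `sort | uniq -c`.
fn count_definitions(ir: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in ir.lines().filter_map(defined_symbol) {
        *counts.entry(strip_hash(name)).or_insert(0) += 1;
    }
    let mut sorted: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(name, count)| (name.to_string(), count))
        .collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    // Reads unoptimized LLVM IR from standard input.
    let mut ir = String::new();
    std::io::stdin()
        .read_to_string(&mut ir)
        .expect("failed to read LLVM IR from stdin");
    for (name, count) in count_definitions(&ir) {
        println!("{:6} {}", count, name);
    }
}
```

The names printed are still mangled; piping the output through a demangler such as rustfilt may make them easier to read.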
Command to show which symbols contribute the most to binary size:
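nm -S target/debug/json-benchmark \
  | awk '{
      # Demangle the symbol name via a rustfilt coprocess (|& is gawk-specific).
      print $4 |& "rustfilt";
      "rustfilt" |& getline id;
      # Drop the trailing hash so all instantiations share one key.
      sub(/::h[0-9a-f]{16}$/, "", id);
      # Column 2 of nm -S output is the symbol size in hex.
      sums[id] += strtonum("0x"$2);
      counts[id] += 1
    }
    END {
      for (id in sums) {
        printf "%8s %4s %s\n", sums[id], counts[id], id
      }
    }' \
  | sort -nr \
  | head -16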
The first column is the total number of bytes associated with that function across all of its instantiations; the second column is the number of distinct instantiations of the function (with different generic types etc). I ran it on json-benchmark for lack of a better thing to run it on. Debug mode:
594744 85 <serde_json::de::Deserializer<R>>::parse_value
187045 85 <serde_json::de::Deserializer<R>>::parse_integer
183834 85 <serde_json::de::Deserializer<R>>::parse_exponent
180242 85 <serde_json::de::Deserializer<R>>::parse_decimal
129856 64 <serde_json::de::SeqVisitor<'a, R> as serde::de::SeqVisitor>::visit_seed
76946 26 <serde_json::de::MapVisitor<'a, R> as serde::de::MapVisitor>::visit_key_seed
73037 85 <serde_json::de::Deserializer<R>>::visit_f64_from_parts
72937 85 <serde_json::de::Deserializer<R>>::parse_number
69700 85 <serde_json::de::Deserializer<R>>::parse_exponent_overflow
68943 85 <serde_json::de::Deserializer<R>>::parse_long_integer
65195 85 serde_json::error::Error::fix_position
59979 1 <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
44385 55 <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeMap>::serialize_value
36773 1 <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize for json_benchmark::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
35042 1 <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
26925 1 <json_benchmark::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize for json_benchmark::citm_catalog::CitmCatalog>::deserialize::__Visitor as serde::de::Visitor>::visit_map
Release mode:
174550 42 <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
53537 4 <serde_json::de::Deserializer<R>>::parse_value
29883 61 <serde_json::de::Deserializer<R>>::parse_integer
19929 61 <serde_json::de::Deserializer<R>>::parse_exponent
18492 61 <serde_json::de::Deserializer<R>>::visit_f64_from_parts
16438 61 <serde_json::de::Deserializer<R>>::parse_decimal
15927 1 <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
12567 1 <json_benchmark::serde_json::Library as json_benchmark::Library>::parse_struct
12425 1 <json_benchmark::serde_json::Library as json_benchmark::Library>::stringify_struct
12240 1 <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize for json_benchmark::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
11186 1 json_benchmark::twitter::_IMPL_SERIALIZE_FOR_Status::<impl serde::ser::Serialize for json_benchmark::twitter::Status>::serialize
10693 1 json_benchmark::num_trials
10084 61 <serde_json::de::Deserializer<R>>::parse_exponent_overflow
10076 52 <serde_json::de::Deserializer<R>>::parse_number
9570 1 stats_arena_print
8628 21 <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeStruct>::serialize_field
The number parsing is definitely something we can improve. Those functions are generic on the Visitor type but they do a lot of work that does not touch the visitor and could be factored out.
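A minimal sketch of that kind of factoring, under loud assumptions: the `Visitor` trait below is a hypothetical stand-in rather than serde's real trait, and `parse_digits` / `parse_integer` are illustrative names, not serde_json's actual code. The point is only that the digit-handling work lives in a non-generic function that is compiled once, while the generic function instantiated per visitor type shrinks to a thin wrapper.

```rust
/// Hypothetical stand-in for a visitor trait; not serde's real `Visitor`.
trait Visitor {
    type Value;
    fn visit_u64(self, v: u64) -> Self::Value;
}

/// Non-generic: compiled once, no matter how many visitor types exist.
fn parse_digits(input: &[u8]) -> Result<u64, String> {
    if input.is_empty() {
        return Err("expected digits".to_string());
    }
    let mut value: u64 = 0;
    for &b in input {
        if !b.is_ascii_digit() {
            return Err(format!("unexpected byte {:?}", b as char));
        }
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(u64::from(b - b'0')))
            .ok_or_else(|| "integer overflow".to_string())?;
    }
    Ok(value)
}

/// Generic: still one copy per visitor type, but each copy only forwards
/// the already-parsed number to the visitor.
fn parse_integer<V: Visitor>(input: &[u8], visitor: V) -> Result<V::Value, String> {
    let value = parse_digits(input)?;
    Ok(visitor.visit_u64(value))
}

/// Two example visitors, so `parse_integer` is instantiated twice.
struct AsU64;
impl Visitor for AsU64 {
    type Value = u64;
    fn visit_u64(self, v: u64) -> u64 {
        v
    }
}

struct AsString;
impl Visitor for AsString {
    type Value = String;
    fn visit_u64(self, v: u64) -> String {
        v.to_string()
    }
}

fn main() {
    println!("{:?}", parse_integer(b"1234", AsU64));
    println!("{:?}", parse_integer(b"1234", AsString));
}
```

With 85 instantiations, as in the debug numbers above, the saving would come from the body of `parse_digits` existing once instead of 85 times; how much the optimizer inlines it back is a separate question.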
I don't see any other obvious culprits. We just need to work through these and reduce the number of redundant copies of code being compiled.
If anyone has large binary projects using Serde where you care about compile time / executable size, please comment with a link to the repo. I would like to make sure that the things we optimize will actually improve your situation.
https://github.com/serde-rs/json/issues/313 has a good minimal case to reproduce this.
extern crate pandoc_ast;

fn main() {
    pandoc_ast::filter(String::new(), |p| p);
}
Instantiations are listed in https://github.com/serde-rs/json/issues/313#issuecomment-298806623. Again I think we should tackle the number parsing area first.
I'm seeing a pretty big impact on xi-editor.
Debug:
399029 88 <serde::private::de::content::ContentDeserializer<'de, E> as serde::de::Deserializer<'de>>::deserialize_any
198961 3289 core::ptr::drop_in_place
181294 502 <core::option::Option<T>>::map
173009 68 serde_json::value::de::<impl serde::de::Deserializer<'de> for serde_json::value::Value>::deserialize_any
165551 401 <core::result::Result<T, E>>::map
76056 86 <alloc::raw_vec::RawVec<T, A>>::double
64426 20 <serde::private::de::content::ContentRefDeserializer<'a, 'de, E> as serde::de::Deserializer<'de>>::deserialize_any
62602 116 <core::result::Result<T, E>>::map_err
59192 28 <alloc::btree::map::IntoIter<K, V> as core::iter::iterator::Iterator>::next
54314 134 <alloc::btree::node::NodeRef<BorrowType, K, V, Type>>::ascend
51567 63 <alloc::raw_vec::RawVec<T, A>>::reserve
51300 76 <alloc::raw_vec::RawVec<T, A>>::allocate_in
50644 108 <serde::de::value::MapDeserializer<'de, I, E>>::end
43181 29 std::collections::hash::map::search_hashed
42266 67 <alloc::vec::Vec<T>>::extend_desugared
41073 67 <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::spec_extend
Release:
204186 1732 core::ptr::drop_in_place
168838 54 <core::marker::PhantomData<T> as serde::de::DeserializeSeed<'de>>::deserialize
126183 11 <serde::private::de::content::ContentDeserializer<'de, E> as serde::de::Deserializer<'de>>::deserialize_any
72758 1 <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
55573 1 <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_EditNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::EditNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
48550 7 serde_json::value::de::<impl serde::de::Deserializer<'de> for serde_json::value::Value>::deserialize_any
46759 1 <xi_core_lib::internal::plugins::rpc::_IMPL_DESERIALIZE_FOR_PluginNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::rpc::PluginNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
41345 1 xi_core_lib::internal::tabs::Documents::handle_notification
40838 1 <xi_core_lib::internal::plugins::rpc::_IMPL_DESERIALIZE_FOR_PluginNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::rpc::PluginNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
35555 1 <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
32647 1 <xi_core_lib::internal::plugins::manifest::_IMPL_DESERIALIZE_FOR_PluginDescription::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::manifest::PluginDescription>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
32604 111 <alloc::arc::Arc<T>>::drop_slow
31838 1 <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_EditNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::EditNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
24323 1 <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreRequest::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreRequest>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
22424 3 ref.1B
21805 8 serde::de::SeqAccess::next_element
That's measured on Linux. Happy to do more digging, and try low-hanging fruit that might improve things.
Thanks Raph! Those ContentDeserializer::deserialize_any, ContentRefDeserializer::deserialize_any, and Value::deserialize_any numbers definitely should not be that high. I will see what I can do. So far there has not been any effort to optimize any of those.
ContentRefDeserializer shows up when dealing with untagged enums and ContentDeserializer shows up in internally tagged and adjacently tagged enums. They are the mechanism Serde uses for buffering data until we are able to determine which variant of the enum we are looking at.
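To illustrate the buffering idea only, here is a heavily simplified sketch. It is not how Serde's ContentDeserializer is actually implemented: the `Content` and `Notification` types, the `method` tag name, and all helper functions are hypothetical. It shows why buffering is needed: the tag may arrive after the other fields, so they have to be held in a generic form until the variant is known.

```rust
/// Hypothetical, much-simplified stand-in for a buffered value.
#[derive(Debug, Clone, PartialEq)]
enum Content {
    Str(String),
    U64(u64),
}

/// Hypothetical internally tagged enum; the variant is named by a `method` field.
#[derive(Debug, PartialEq)]
enum Notification {
    Insert { chars: String },
    Scroll { first: u64, last: u64 },
}

/// Looks up a buffered string field by key.
fn take_str(fields: &[(String, Content)], key: &str) -> Result<String, String> {
    match fields.iter().find(|(k, _)| k.as_str() == key) {
        Some((_, Content::Str(s))) => Ok(s.clone()),
        _ => Err(format!("missing or invalid string field `{}`", key)),
    }
}

/// Looks up a buffered integer field by key.
fn take_u64(fields: &[(String, Content)], key: &str) -> Result<u64, String> {
    match fields.iter().find(|(k, _)| k.as_str() == key) {
        Some((_, Content::U64(n))) => Ok(*n),
        _ => Err(format!("missing or invalid integer field `{}`", key)),
    }
}

/// Buffers every non-tag entry until the whole map has been seen, then uses
/// the tag to decide which variant to build from the buffered entries.
fn deserialize_tagged(entries: Vec<(String, Content)>) -> Result<Notification, String> {
    let mut tag: Option<String> = None;
    let mut buffered: Vec<(String, Content)> = Vec::new();
    for (key, value) in entries {
        if key == "method" {
            match value {
                Content::Str(s) => tag = Some(s),
                other => return Err(format!("tag must be a string, got {:?}", other)),
            }
        } else {
            buffered.push((key, value));
        }
    }
    let tag = tag.ok_or_else(|| "missing tag field `method`".to_string())?;
    match tag.as_str() {
        "insert" => Ok(Notification::Insert {
            chars: take_str(&buffered, "chars")?,
        }),
        "scroll" => Ok(Notification::Scroll {
            first: take_u64(&buffered, "first")?,
            last: take_u64(&buffered, "last")?,
        }),
        other => Err(format!("unknown variant `{}`", other)),
    }
}

fn main() {
    // The tag comes last here, so `chars` must be buffered before it is usable.
    let entries = vec![
        ("chars".to_string(), Content::Str("a".to_string())),
        ("method".to_string(), Content::Str("insert".to_string())),
    ];
    println!("{:?}", deserialize_tagged(entries));
}
```

In real Serde the second step is generic over the target type, which is one plausible reason the deserialize_any functions above are instantiated so many times.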
@dtolnay We're heavily using the tag mechanism on enums (see https://github.com/google/xi-editor/blob/master/rust/core-lib/src/rpc.rs#L89), so not surprising if that's the code path affected.
Thanks for any help in improving this.
As a start, I opened https://github.com/google/xi-editor/pull/617 with a 22% improvement from upgrading to serde 1.0.39 and serde_json 1.0.15.
I opened https://github.com/google/xi-editor/pull/628 with a 12% improvement this time focusing on the release binary.