Examples of pairs of fields we can replace:
https://github.com/rust-lang/rust/blob/c6ac57564852cb6e2d0db60f7b46d9eb98d4b449/src/librustc/ty/mod.rs#L905-L909
https://github.com/rust-lang/rust/blob/c6ac57564852cb6e2d0db60f7b46d9eb98d4b449/src/libsyntax_pos/symbol.rs#L447-L448
https://github.com/rust-lang/rust/blob/c6ac57564852cb6e2d0db60f7b46d9eb98d4b449/src/libsyntax_pos/span_encoding.rs#L114-L115
(any interner that relies on indices can be switched to IndexSet, really)
We could also wrap Index{Set,Map}, like IndexedVec wraps Vec, to use non-usize indices.
(Or we could make our own that stores indices smaller than usize, not sure how much work that is, or if we want to do it at all, cc @bluss)
FWIW, indexmap already does a little bit of size hacking with its Pos type:
https://github.com/bluss/indexmap/blob/0a06966af88c0f48f2d69d20dacfc89cebfbbf3f/src/map.rs#L76-L91
The API always presents usize though.
Huh, so this is quite different from the builtin HashMap.
cc @Amanieu I wonder how the novel parts of SwissTable/hashbrown would play with this.
Also, we never need more than 32-bit indices so we could skip the capacity-based logic.
I wonder how the novel parts of
SwissTable/hashbrownwould play with this.
I actually played with that too:
https://github.com/bluss/indexmap/compare/master...cuviper:hashbrown
cargo benchcmp
$ cargo benchcmp master hashbrown
name master ns/iter hashbrown ns/iter diff ns/iter diff % speedup
entry_hashmap_150 2,873 2,927 54 1.88% x 0.98
entry_orderedmap_150 4,329 5,393 1,064 24.58% x 0.80
few_retain_hashmap_100_000 951,637 963,837 12,200 1.28% x 0.99
few_retain_ordermap_100_000 1,520,027 2,130,924 610,897 40.19% x 0.71
grow_fnv_hashmap_100_000 1,857,807 1,849,671 -8,136 -0.44% x 1.00
grow_fnv_ordermap_100_000 4,671,162 4,274,375 -396,787 -8.49% x 1.09
half_retain_hashmap_100_000 1,033,810 1,033,141 -669 -0.06% x 1.00
half_retain_ordermap_100_000 1,590,546 2,125,692 535,146 33.65% x 0.75
hashmap_merge_shuffle 589,718 593,282 3,564 0.60% x 0.99
hashmap_merge_simple 438,375 484,048 45,673 10.42% x 0.91
insert_hashmap_100_000 2,358,045 2,356,133 -1,912 -0.08% x 1.00
insert_hashmap_10_000 204,200 203,430 -770 -0.38% x 1.00
insert_hashmap_150 3,048 3,026 -22 -0.72% x 1.01
insert_hashmap_int_bigvalue_10_000 244,731 242,986 -1,745 -0.71% x 1.01
insert_hashmap_str_10_000 229,241 227,691 -1,550 -0.68% x 1.01
insert_hashmap_string_10_000 1,089,635 1,083,683 -5,952 -0.55% x 1.01
insert_hashmap_string_10_000 1,170,007 1,068,880 -101,127 -8.64% x 1.09
insert_hashmap_string_oneshot_10_000 1,120,157 1,017,050 -103,107 -9.20% x 1.10
insert_orderedmap_100_000 2,555,141 4,359,721 1,804,580 70.63% x 0.59
insert_orderedmap_10_000 259,896 370,496 110,600 42.56% x 0.70
insert_orderedmap_150 3,788 5,477 1,689 44.59% x 0.69
insert_orderedmap_int_bigvalue_10_000 339,440 353,765 14,325 4.22% x 0.96
insert_orderedmap_str_10_000 295,646 276,151 -19,495 -6.59% x 1.07
insert_orderedmap_string_10_000 1,000,420 1,064,138 63,718 6.37% x 0.94
insert_orderedmap_string_10_000 1,019,860 1,073,530 53,670 5.26% x 0.95
iter_black_box_hashmap_10_000 26,777 26,172 -605 -2.26% x 1.02
iter_black_box_orderedmap_10_000 3,258 3,539 281 8.62% x 0.92
iter_sum_hashmap_10_000 16,425 16,179 -246 -1.50% x 1.02
iter_sum_orderedmap_10_000 2,912 2,982 70 2.40% x 0.98
lookup_hashmap_100_000_inorder_multi 100,554 99,595 -959 -0.95% x 1.01
lookup_hashmap_100_000_multi 99,856 99,167 -689 -0.69% x 1.01
lookup_hashmap_100_000_single 21 21 0 0.00% x 1.00
lookup_hashmap_10_000_exist 92,344 92,134 -210 -0.23% x 1.00
lookup_hashmap_10_000_exist_string 158,699 161,272 2,573 1.62% x 0.98
lookup_hashmap_10_000_exist_string_oneshot 128,610 128,177 -433 -0.34% x 1.00
lookup_hashmap_10_000_noexist 83,615 83,064 -551 -0.66% x 1.01
lookup_orderedmap_10_000_exist 143,015 124,320 -18,695 -13.07% x 1.15
lookup_orderedmap_10_000_noexist 143,917 98,109 -45,808 -31.83% x 1.47
lookup_ordermap_100_000_inorder_multi 110,905 130,411 19,506 17.59% x 0.85
lookup_ordermap_100_000_multi 154,486 182,357 27,871 18.04% x 0.85
lookup_ordermap_100_000_single 33 37 4 12.12% x 0.89
lookup_ordermap_10_000_exist_string 210,793 222,767 11,974 5.68% x 0.95
lookup_ordermap_10_000_exist_string_oneshot 192,294 190,735 -1,559 -0.81% x 1.01
many_retain_hashmap_100_000 448,060 449,876 1,816 0.41% x 1.00
many_retain_ordermap_100_000 1,217,638 1,569,468 351,830 28.89% x 0.78
new_hashmap 6 6 0 0.00% x 1.00
new_orderedmap 5 27 22 440.00% x 0.19
ordermap_clone_for_sort_s 446,804 474,899 28,095 6.29% x 0.94
ordermap_clone_for_sort_u32 14,368 36,350 21,982 152.99% x 0.40
ordermap_merge_shuffle 642,188 638,453 -3,735 -0.58% x 1.01
ordermap_merge_simple 443,371 438,121 -5,250 -1.18% x 1.01
ordermap_simple_sort_s 2,840,960 2,953,374 112,414 3.96% x 0.96
ordermap_simple_sort_u32 798,633 830,050 31,417 3.93% x 0.96
ordermap_sort_s 2,292,873 2,403,795 110,922 4.84% x 0.95
ordermap_sort_u32 631,663 646,202 14,539 2.30% x 0.98
pop_ordermap_100_000 1,752,485 2,500,007 747,522 42.65% x 0.70
remove_ordermap_100_000 4,484,051 6,166,361 1,682,310 37.52% x 0.73
with_capacity_10e5_hashmap 933 894 -39 -4.18% x 1.04
with_capacity_10e5_orderedmap 4,103 404 -3,699 -90.15% x 10.16
That was hacked together, so I'm sure there's room for improvement.
I wonder if the slowdown might be due to the use of usize indicies instead of u32. Could you try changing the index type to u32 to see if this improves performance?
Another thing to watch out for is to ensure that all methods are being properly inlined, since the implementation is more complex now. You can try running one of the lookup benchmarks under perf to see if something isn't getting inlined.
@cuviper Nice work on that experiment!
@eddyb I'm delighted you didn't know IndexMap is just a simple bespoke hashmap with zero unsafe code (everything on top of Vec). :blush: Of course, if there is are big enough wins, it can use unsafe.
Examples of pairs of fields we can replace:
I tried doing this one and it hurt performance on several benchmarks. Here are the check-clean instruction counts for benchmarks that saw a change.
avg: 21.4% min: 21.4% max: 21.4%
unicode_normalization-check
avg: 4.6% min: 4.6% max: 4.6%
helloworld-check
avg: 3.1% min: 3.1% max: 3.1%
issue-46449-check
avg: 1.3% min: 1.3% max: 1.3%
token-stream-stress-check
avg: 1.0% min: 1.0% max: 1.0%
unify-linearly-check
avg: 0.8% min: 0.8% max: 0.8%
await-call-tree-check
avg: 0.8% min: 0.8% max: 0.8%
deeply-nested-check
avg: 0.7% min: 0.7% max: 0.7%
tokio-webpush-simple-check
avg: 0.6% min: 0.6% max: 0.6%
regression-31157-check
avg: 0.3% min: 0.3% max: 0.3%
issue-58319-check
avg: 0.3% min: 0.3% max: 0.3%
ripgrep-check
avg: 0.3% min: 0.3% max: 0.3%
many-assoc-items-check
avg: 0.3% min: 0.3% max: 0.3%
encoding-check
avg: 0.3% min: 0.3% max: 0.3%
webrender-wrench-check
avg: 0.2% min: 0.2% max: 0.2%
coercions-check
avg: 0.2%? min: 0.2%? max: 0.2%?
tuple-stress-check
avg: 0.2% min: 0.2% max: 0.2%
syn-check
avg: 0.2% min: 0.2% max: 0.2%
regex-check
avg: 0.2% min: 0.2% max: 0.2%
deep-vector-check
avg: 0.1% min: 0.1% max: 0.1%
piston-image-check
avg: 0.1% min: 0.1% max: 0.1%
ucd-check
avg: 0.1% min: 0.1% max: 0.1%
webrender-check
avg: 0.1% min: 0.1% max: 0.1%
hyper-2-check
avg: 0.1% min: 0.1% max: 0.1%
cranelift-codegen-check
avg: 0.1% min: 0.1% max: 0.1%
futures-check
avg: 0.1% min: 0.1% max: 0.1%
cargo-check
avg: 0.1% min: 0.1% max: 0.1%
serde-serde_derive-check
avg: 0.1% min: 0.1% max: 0.1%
clap-rs-check
avg: 0.1% min: 0.1% max: 0.1%
Here is the top part of the Cachegrind diff for unused-warnings:
Ir
--------------------------------------------------------------------------------
438,295,034 (100.0%) PROGRAM TOTALS
--------------------------------------------------------------------------------
Ir file:function
--------------------------------------------------------------------------------
185,518,839 (42.33%) /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/indexmap-1.0.2/src/map.rs:indexmap::map::IndexMap<K,V,S>::get_full
74,094,065 (16.91%) /build/glibc-t7JzpG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:__memcmp_avx2_movbe
64,470,873 (14.71%) /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/indexmap-1.0.2/src/map.rs:indexmap::map::IndexMap<K,V,S>::entry
35,332,677 ( 8.06%) /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/indexmap-1.0.2/src/map.rs:indexmap::map::VacantEntry<K,V>::insert
28,585,267 ( 6.52%) /home/njn/moz/rustN/src/libcore/slice/mod.rs:indexmap::map::IndexMap<K,V,S>::get_full
26,516,813 ( 6.05%) /home/njn/moz/rustN/src/libcore/slice/mod.rs:indexmap::map::IndexMap<K,V,S>::entry
19,993,527 ( 4.56%) /home/njn/.cargo/registry/src/github.com-1ecc6299db9ec823/indexmap-1.0.2/src/map.rs:indexmap::map::OrderMapCore<K,V>::double_capacity
18,393,298 ( 4.20%) /home/njn/moz/rustN/src/libcore/num/mod.rs:indexmap::map::IndexMap<K,V,S>::get_full
5,567,548 ( 1.27%) /home/njn/moz/rustN/src/libcore/num/mod.rs:indexmap::map::IndexMap<K,V,S>::entry
So that one currently isn't viable.
FYI, I've ported indexmap to hashbrown's RawTable in bluss/indexmap#131, and so far it does appear to mitigate the loss @nnethercote saw. If we land that, and maybe the custom index support I'm playing with too, then I'll open a rust PR making use of it.
pub params: Vec<GenericParamDef>, pub param_def_id_to_index: FxHashMap<DefId, u32>,
This one isn't actually mapping to Vec indexes, but rather param.index, which isn't always the same.
The other examples and more are converted in #75278.
Most helpful comment
@cuviper Nice work on that experiment!
@eddyb I'm delighted you didn't know IndexMap is just a simple bespoke hashmap with zero unsafe code (everything on top of
Vec). :blush: Of course, if there is are big enough wins, it can use unsafe.