So I ran a simple benchmark, and got some interesting results. Here is the benchmark:
describe "Dict"
[ Benchmark.compare "Dict Get"
(benchmark2 "First" Dict.get 1 dict)
(benchmark2 "Last" Dict.get (n - 1) dict)
]
And here are the results:

Retrieving the last/highest/greatest element in a dictionary is slower than retrieving the first/lowest/least. This seemed odd, so I did some investigating. I'm not going to list up everything I tried, but those that made a difference.
Every change I did was made to the compiled javascript of Dict.get. It originally looks like this:
var _elm_lang$core$Dict$get = F2(
function (targetKey, dict) {
get:
while (true) {
var _p15 = dict;
if (_p15.ctor === 'RBEmpty_elm_builtin') {
return _elm_lang$core$Maybe$Nothing;
} else {
var _p16 = A2(_elm_lang$core$Basics$compare, targetKey, _p15._1);
switch (_p16.ctor) {
case 'LT':
var _v20 = targetKey,
_v21 = _p15._3;
targetKey = _v20;
dict = _v21;
continue get;
case 'EQ':
return _elm_lang$core$Maybe$Just(_p15._2);
default:
var _v22 = targetKey,
_v23 = _p15._4;
targetKey = _v22;
dict = _v23;
continue get;
}
}
}
});
If one removes the continue get; statements of that code, as well as the get label above the while-statement, this is the result:

That's a hefty performance improvement for the "first/lowest/least" case. To improve performance for "last/greatest/highest", one needs to change the order of the switch statements, so that the function now looks like this:
var _elm_lang$core$Dict$get = F2(
function (targetKey, dict) {
while (true) {
var _p15 = dict;
if (_p15.ctor === 'RBEmpty_elm_builtin') {
return _elm_lang$core$Maybe$Nothing;
} else {
var _p16 = A2(_elm_lang$core$Basics$compare, targetKey, _p15._1);
switch (_p16.ctor) {
case 'LT':
var _v20 = targetKey,
_v21 = _p15._3;
targetKey = _v20;
dict = _v21;
case 'GT':
var _v22 = targetKey,
_v23 = _p15._4;
targetKey = _v22;
dict = _v23;
default:
return _elm_lang$core$Maybe$Just(_p15._2);
}
}
}
});
Then you get the following performance:

I also tried removing the label and continue-statements from other recursive functions like List.foldl and (my implementation of) Array.get, but saw no performance difference.
break-statement from the initializeFromList function in the new Array implementation confirms this, as performance increased 2-3 times. In other cases, it would simply reduce code size.Dict.get it matters a lot as the EQ case is only triggered once, if at all, while LT and GT cases are triggered "all the time".GT comes right after LT, without removing labels and continue-statements, have no effect. Only when removing the label and continue statements is performance similar.One interesting aspect of this, is that Dict.get after these changes is almost twice as fast as Array.get on my machine. When editing the code so that function calls are called directly (without going through A2, A3...) and all comparisons are done inline (<, >=, >...) then Array.get is faster again.
I know that inline comparisons and inline function calls are optimizations that are not easy to add right now, and is something that might come down the line. But I thought the results were interesting and worth sharing.
I don't know how difficult this is to implement, but it seems worthwhile :)
I originally posted this in the wrong repo. There are some comments worth reading here: https://github.com/elm-lang/core/issues/879
Also, it would be nice if someone could confirm my results. Here is a repo which makes it easy to get started: https://github.com/Skinney/elm-dict-exploration
Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!
Here is what to expect next, and if anyone wants to comment, keep these things in mind.
First off, thank you for exploring this. This kind of stuff is super interesting! :D
I think there is a bug in the reformulated code. In your switch the first two case statements do not end with a break so they fall through. I believe that means that you are immediately going to fall through to the EQ case and return. That means this code would never actually loop.
This suggests that the new function will not crash, but it will be producing the wrong values. This also would explain the performance improvement. It is just doing a lot less work.
Am I seeing it wrong?
You are entirely right. I've apparently been working to much with languages where break in switch-statements is implicit. Sorry for wasting peoples time.
Most helpful comment
You are entirely right. I've apparently been working to much with languages where
breakin switch-statements is implicit. Sorry for wasting peoples time.