Compiler: Faster dict implementation by improving code generation

Created on 26 Jun 2017 · 3Comments · Source: elm/compiler

So I ran a simple benchmark, and got some interesting results. Here is the benchmark:

        describe "Dict"
            [ Benchmark.compare "Dict Get"
                (benchmark2 "First" Dict.get 1 dict)
                (benchmark2 "Last" Dict.get (n - 1) dict)
            ]

And here are the results:

bench_orig

Retrieving the last/highest/greatest element in a dictionary is slower than retrieving the first/lowest/least. This seemed odd, so I did some investigating. I'm not going to list up everything I tried, but those that made a difference.

Every change I did was made to the compiled javascript of Dict.get. It originally looks like this:

var _elm_lang$core$Dict$get = F2(
    function (targetKey, dict) {
        get:
        while (true) {
            var _p15 = dict;
            if (_p15.ctor === 'RBEmpty_elm_builtin') {
                return _elm_lang$core$Maybe$Nothing;
            } else {
                var _p16 = A2(_elm_lang$core$Basics$compare, targetKey, _p15._1);
                switch (_p16.ctor) {
                    case 'LT':
                        var _v20 = targetKey,
                            _v21 = _p15._3;
                        targetKey = _v20;
                        dict = _v21;
                        continue get;
                    case 'EQ':
                        return _elm_lang$core$Maybe$Just(_p15._2);
                    default:
                        var _v22 = targetKey,
                            _v23 = _p15._4;
                        targetKey = _v22;
                        dict = _v23;
                        continue get;
                }
            }
        }
    });

If one removes the continue get; statements of that code, as well as the get label above the while-statement, this is the result:

bench_no_label

That's a hefty performance improvement for the "first/lowest/least" case. To improve performance for "last/greatest/highest", one needs to change the order of the switch statements, so that the function now looks like this:

var _elm_lang$core$Dict$get = F2(
    function (targetKey, dict) {
        while (true) {
            var _p15 = dict;
            if (_p15.ctor === 'RBEmpty_elm_builtin') {
                return _elm_lang$core$Maybe$Nothing;
            } else {
                var _p16 = A2(_elm_lang$core$Basics$compare, targetKey, _p15._1);
                switch (_p16.ctor) {
                    case 'LT':
                        var _v20 = targetKey,
                            _v21 = _p15._3;
                        targetKey = _v20;
                        dict = _v21;
                    case 'GT':
                        var _v22 = targetKey,
                            _v23 = _p15._4;
                        targetKey = _v22;
                        dict = _v23;
                    default:
                        return _elm_lang$core$Maybe$Just(_p15._2);
                }
            }
        }
    });

Then you get the following performance:

bench_final

I also tried removing the label and continue-statements from other recursive functions like List.foldl and (my implementation of) Array.get, but saw no performance difference.

Takeways

Removing labels and continue statements were they are not needed can significantly improve performance for complex recursive functions (were the compiler would generate more than two continue statements). Based on what I've read, JIT compilers have trouble optimizing such loops. Removing a conditional break-statement from the initializeFromList function in the new Array implementation confirms this, as performance increased 2-3 times. In other cases, it would simply reduce code size.
Switch ordering really matters, which surprised me. In the case of Dict.get it matters a lot as the EQ case is only triggered once, if at all, while LT and GT cases are triggered "all the time".
Loops not optimized properly, for instance because of complex loop-logic, can hinder other optimizations. Changing the switch ordering so GT comes right after LT, without removing labels and continue-statements, have no effect. Only when removing the label and continue statements is performance similar.

Array performance

One interesting aspect of this, is that Dict.get after these changes is almost twice as fast as Array.get on my machine. When editing the code so that function calls are called directly (without going through A2, A3...) and all comparisons are done inline (<, >=, >...) then Array.get is faster again.

I know that inline comparisons and inline function calls are optimizations that are not easy to add right now, and is something that might come down the line. But I thought the results were interesting and worth sharing.

Conclusion

I don't know how difficult this is to implement, but it seems worthwhile :)

I originally posted this in the wrong repo. There are some comments worth reading here: https://github.com/elm-lang/core/issues/879

Also, it would be nice if someone could confirm my results. Here is a repo which makes it easy to get started: https://github.com/Skinney/elm-dict-exploration

Source

robinheghan

❤3

Most helpful comment

You are entirely right. I've apparently been working to much with languages where break in switch-statements is implicit. Sorry for wasting peoples time.

robinheghan on 28 Jun 2017

👍2

All 3 comments

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

process-bot on 26 Jun 2017

First off, thank you for exploring this. This kind of stuff is super interesting! :D

I think there is a bug in the reformulated code. In your switch the first two case statements do not end with a break so they fall through. I believe that means that you are immediately going to fall through to the EQ case and return. That means this code would never actually loop.

This suggests that the new function will not crash, but it will be producing the wrong values. This also would explain the performance improvement. It is just doing a lot less work.

Am I seeing it wrong?

evancz on 26 Jun 2017

👍2

You are entirely right. I've apparently been working to much with languages where break in switch-statements is implicit. Sorry for wasting peoples time.

robinheghan on 28 Jun 2017

👍2

Was this page helpful?

0 / 5 - 0 ratings