Dhall-haskell: Reviving fast type-checking branch

Created on 28 Jun 2019 · 7Comments · Source: dhall-lang/dhall-haskell

The purpose of this issue is to document my attempt to revive @AndrasKovacs's unfinished nbe-elaboration branch

Up until now the two main approaches that I've tried were:

Merge the nbe-elaboration branch into master one commit at a time
Merge the master into the nbe-elaboration branch one commit at a time

However, both of these approaches do not work because the initial commit of the nbe-elaboration branch deleted a large amount of code. That means that the only remaining approach left is to reverse engineer what the nbe-elaboration did to improve performance and manually reimplement those changes on top of master.

I'm using this issue to document anything on that branch that appears to be related to the performance improvement and track my progress on forwarding porting these changes to master:

[ ] Use normalization-by-evaluation for the type-checking code
[ ] Interleaving type-checking and normalization
[x] Change the Var constructor to use Int instead of Integer and UNPACK the Text field
[ ] Use pointer-equality fast path for judgmental equality
[ ] Use unsafeCoerce instead of denote
[ ] Use strict fields for abstract syntax tree
[ ] Switch to a monomorphic Val syntax tree
[ ] Use INLINE instead of INLINABLE for Dhall.Map.{toMap,keys}
[ ] Speed up Dhall.Map.insert, which is currently O(N) in the number of existing keys
[ ] Use "linear" programming style where every variable is consumed exactly once

This will make more sense if you study https://github.com/dhall-lang/dhall-haskell/commit/627a6cdea0170336ff08de34851d8bdf5180571d in depth

I'm not sure how much each of the following changes contribute to performance until I attempt to manually forward-port them.

Source

Gabriel439

❤1

Most helpful comment

@f-f link to my minimal implementation: https://github.com/AndrasKovacs/elaboration-zoo/blob/master/02-typecheck/Main.hs

In regards to NbE + type errors: bidirectional TC is orthogonal to NbE, you can have bidirectional-style TC together with any implementation of evaluation and conversion checking. Bidirectional TC is also "just" an implementation choice, which can be used to implement whatever type system is specified in the standard.

The current TC, being not bidirectional, discards some type information which could be used to give more specific error messages. My existing and planned bidirectional elaborators throw the more specific errors when possible, but otherwise try to emulate the old errors; this was a bit tedious to implement but not very difficult.

AndrasKovacs on 13 Jul 2019

❤2

All 7 comments

I intend to do the merging into master this summer; I've been busy since my last bout of activity in the repository, and I'm going to have significant spare time again roughly two weeks from now. I indeed deleted and shuffled lot of code on the nbe-elaboration branch, but in the next code sprint I plan to be more judicious and ideally retain all functionality in the dhall-haskell repository. I also think that the most efficient way to do this would be to reimplement features on the latest master. It doesn't seem too be bad because I'm much more familiar with the sources now than last time.

In any case, I'm happy to answer questions. Significant discussion (and also solicitation of testing, etc.) will be definitely necessary before the merge. I think in terms of work efficiency, it's probably best if I do the merging of elaboration, but it's OK for me if someone groks my implementation and does the same.

Commenting on features contributing to performance:

By far the most important are:

Interleaving elaboration and evaluation in the correct way. Not doing this incurs massive overheads in lots of places, including shifting the entire typing context whenever going under binders, or when infering let-expressions and applications, and also being much more strict when computing types than is necessary; with eval we only compute to head constructors in many cases.
Interleaving elaboration and import resolution, avoiding re-checking imported expressions. This is likely the single biggest issue in current dhall, because it inlines and re-checks the normal forms of every imported expression. This means that in a linear chain of linearly used imports, we typecheck the normal forms of every expression a quadratic number of times. Checking expressions which are known to be well-typed should be avoided, and checking normal forms should be absolutely avoided. On the latest nbe-elaboration commit, I also do some limited re-checking but I will definitely get rid of that.

Other features are far less important.

I strictified data fields as a general preventive measure against sneaky leaks, and it may or may not provide large boosts. If there are no expensive leaks, then it's likely only a modest improvement.
Monomorphic Val is not really an optimization, just cleaning up the unnecessary type parameter.
Pointer equality shortcut is cheap and easy, and I think it's a good improvement if we have large-ish commonly used record or sum type synonyms (which is a prevalent usage pattern from what I have seen). It's not nearly as important as the two big points though.
Inlining, unpacking, and Int usage should all have minor impact, some of them are perhaps irrelevant.
unsafeCoerce instead of denote is fairly important, but I think a significant part of coercion usage are not-so-pretty fixes when we want to do re-checking or other violation of elaboration staging. I tried to make this staging more obvious and enforced, and tried to use these type synonyms everywhere, but I did not fully succeed. I think this is quite important, and ideally I would have a type for raw syntax which is wholly different from core syntax.

I also note that the initial commit on nbe-elaboration has many bugs, and it's a better to look at the latest commit instead.

AndrasKovacs on 28 Jun 2019

❤1

@AndrasKovacs: Alright, that sounds good. I'll probably still try to forward-port the simpler self-contained optimizations (like switching Var to Int, etc.) and leave the larger changes of normalization-by-evaluation and interleaving imports/type-checking to you.

Gabriel439 on 28 Jun 2019

@jneira: Will using GHC.Prim.reallyUnsafePtrEquality cause issues for Eta?

Gabriel439 on 2 Jul 2019

@Gabriel439 it seems it is defined in ghc-prim for eta so i think it should work

jneira on 2 Jul 2019

Happy to see this happening! 🎉

We had a lovely session at ZuriHac where Andras walked me and @ocharles through the NbE algorithm and all the perf improvements happening here, so hopefully I'll be able to help more with an additional pair of eyes 🙂

(@AndrasKovacs in Zurich you mentioned some materials/links with a barebones implementation of NbE like the one that we had at the blackboard; I tried looking at Pie's implementation, but that's possibly even bigger than Dhall's. Could you share them here?)

Another thing that we discussed there but I don't see mentioned here (perhaps because it's not strictly related to perf) is the fact that NbE provides bidirectional type inference (IIRC), but as Dhall is implemented right now we don't have that, and in order to fit the NbE implementation into the current one we're discarding some useful inferred information that might make error messages nicer. IIRC not discarding them would not require changes to the standard (please correct any inaccuracies in my recalling)