The purpose of this issue is to document my attempt to revive @AndrasKovacs's unfinished nbe-elaboration branch
Up until now the two main approaches that I've tried were:
nbe-elaboration branch into master one commit at a timemaster into the nbe-elaboration branch one commit at a timeHowever, both of these approaches do not work because the initial commit of the nbe-elaboration branch deleted a large amount of code. That means that the only remaining approach left is to reverse engineer what the nbe-elaboration did to improve performance and manually reimplement those changes on top of master.
I'm using this issue to document anything on that branch that appears to be related to the performance improvement and track my progress on forwarding porting these changes to master:
Var constructor to use Int instead of Integer and UNPACK the Text fieldunsafeCoerce instead of denoteVal syntax treeINLINE instead of INLINABLE for Dhall.Map.{toMap,keys}Dhall.Map.insert, which is currently O(N) in the number of existing keys[ ] Use "linear" programming style where every variable is consumed exactly once
This will make more sense if you study https://github.com/dhall-lang/dhall-haskell/commit/627a6cdea0170336ff08de34851d8bdf5180571d in depth
I'm not sure how much each of the following changes contribute to performance until I attempt to manually forward-port them.
I intend to do the merging into master this summer; I've been busy since my last bout of activity in the repository, and I'm going to have significant spare time again roughly two weeks from now. I indeed deleted and shuffled lot of code on the nbe-elaboration branch, but in the next code sprint I plan to be more judicious and ideally retain all functionality in the dhall-haskell repository. I also think that the most efficient way to do this would be to reimplement features on the latest master. It doesn't seem too be bad because I'm much more familiar with the sources now than last time.
In any case, I'm happy to answer questions. Significant discussion (and also solicitation of testing, etc.) will be definitely necessary before the merge. I think in terms of work efficiency, it's probably best if I do the merging of elaboration, but it's OK for me if someone groks my implementation and does the same.
Commenting on features contributing to performance:
By far the most important are:
let-expressions and applications, and also being much more strict when computing types than is necessary; with eval we only compute to head constructors in many cases.dhall, because it inlines and re-checks the normal forms of every imported expression. This means that in a linear chain of linearly used imports, we typecheck the normal forms of every expression a quadratic number of times. Checking expressions which are known to be well-typed should be avoided, and checking normal forms should be absolutely avoided. On the latest nbe-elaboration commit, I also do some limited re-checking but I will definitely get rid of that.Other features are far less important.
Val is not really an optimization, just cleaning up the unnecessary type parameter.Int usage should all have minor impact, some of them are perhaps irrelevant. unsafeCoerce instead of denote is fairly important, but I think a significant part of coercion usage are not-so-pretty fixes when we want to do re-checking or other violation of elaboration staging. I tried to make this staging more obvious and enforced, and tried to use these type synonyms everywhere, but I did not fully succeed. I think this is quite important, and ideally I would have a type for raw syntax which is wholly different from core syntax. I also note that the initial commit on nbe-elaboration has many bugs, and it's a better to look at the latest commit instead.
@AndrasKovacs: Alright, that sounds good. I'll probably still try to forward-port the simpler self-contained optimizations (like switching Var to Int, etc.) and leave the larger changes of normalization-by-evaluation and interleaving imports/type-checking to you.
@jneira: Will using GHC.Prim.reallyUnsafePtrEquality cause issues for Eta?
@Gabriel439 it seems it is defined in ghc-prim for eta so i think it should work
Happy to see this happening! 馃帀
We had a lovely session at ZuriHac where Andras walked me and @ocharles through the NbE algorithm and all the perf improvements happening here, so hopefully I'll be able to help more with an additional pair of eyes 馃檪
(@AndrasKovacs in Zurich you mentioned some materials/links with a barebones implementation of NbE like the one that we had at the blackboard; I tried looking at Pie's implementation, but that's possibly even bigger than Dhall's. Could you share them here?)
Another thing that we discussed there but I don't see mentioned here (perhaps because it's not strictly related to perf) is the fact that NbE provides bidirectional type inference (IIRC), but as Dhall is implemented right now we don't have that, and in order to fit the NbE implementation into the current one we're discarding some useful inferred information that might make error messages nicer. IIRC not discarding them would not require changes to the standard (please correct any inaccuracies in my recalling)
@f-f link to my minimal implementation: https://github.com/AndrasKovacs/elaboration-zoo/blob/master/02-typecheck/Main.hs
In regards to NbE + type errors: bidirectional TC is orthogonal to NbE, you can have bidirectional-style TC together with any implementation of evaluation and conversion checking. Bidirectional TC is also "just" an implementation choice, which can be used to implement whatever type system is specified in the standard.
The current TC, being not bidirectional, discards some type information which could be used to give more specific error messages. My existing and planned bidirectional elaborators throw the more specific errors when possible, but otherwise try to emulate the old errors; this was a bit tedious to implement but not very difficult.
I'll close this as a duplicate of #1129, which is more comprehensive
Most helpful comment
@f-f link to my minimal implementation: https://github.com/AndrasKovacs/elaboration-zoo/blob/master/02-typecheck/Main.hs
In regards to NbE + type errors: bidirectional TC is orthogonal to NbE, you can have bidirectional-style TC together with any implementation of evaluation and conversion checking. Bidirectional TC is also "just" an implementation choice, which can be used to implement whatever type system is specified in the standard.
The current TC, being not bidirectional, discards some type information which could be used to give more specific error messages. My existing and planned bidirectional elaborators throw the more specific errors when possible, but otherwise try to emulate the old errors; this was a bit tedious to implement but not very difficult.