Forked from https://functionalprogramming.slack.com/archives/C82N1L0MB/p1592337474256100
My team is investigating Dhall to see if it'd be a good way to ship + customize our complex Kubernetes application on our internal clusters and for our customers to use. sourcegraph/deploy-sourcegraph-dhall-archived is my PoC implementation of our Kubernetes configuration. I like what I see so far, but I'm wondering how I can improve the render times that I'm seeing.
I've posted benchmark's in sourcegraph/deploy-sourcegraph-dhall-archived's README: https://github.com/sourcegraph/deploy-sourcegraph-dhall-archived/blob/a3f32062b10e08435c6ad73085a5df8c4c30c00d/README.md#benchmarks
Going off of https://discourse.dhall-lang.org/t/figuring-out-performance-bottlenecks/251/5, I'd like to call out that my normal form is apparently ~300x the size of the Prelude.
I know that dhall-kubernetes has a lot of types, but is this size expected?
Note: I've vendored https://github.com/dhall-lang/dhall-kubernetes for sourcegraph/deploy-sourcegraph-dhall-archived following @Gabriel439 's advice here: https://functionalprogramming.slack.com/archives/C82N1L0MB/p1592340512256900?thread_ts=1592337474.256100&cid=C82N1L0MB
@ggilmore: I'm still investigating, but I'll report what I know so far and various avenues that I'm exploring.
First, make sure that you are using a newer version of dhall / dhall-to-yaml since they contain caching performance improvements.
For reference, here is the timing I get when I dhall freeze --all --inplace ./pipeline.dhall on my laptop:
$ cat pipeline.dhall
let Sourcegraph =
./package.dhall sha256:e47072cfa23584530258b5c950cbb7921486ff2fe36a49e8828eb256fca31ef7
let ToList = Sourcegraph.ToList
let c = Sourcegraph.Configuration::{=}
in ToList c
$ time dhall-to-yaml --file ./pipeline.dhall
…
real 0m10.717s
user 0m10.043s
sys 0m0.659s
For future reference, the only expression that you need to freeze for benchmarking purposes is the top-most one since no transitive expressions are resolved in the event of a cache hit.
I suspect (but have not confirmed) that most of the time spent is on decoding the cached expression, which is 50 MB:
$ du -hs ~/.cache/dhall/1220e47072cfa23584530258b5c950cbb7921486ff2fe36a49e8828eb256fca31ef7
50M /Users/gabriel/.cache/dhall/1220e47072cfa23584530258b5c950cbb7921486ff2fe36a49e8828eb256fca31ef7
... which means a decoding speed of ~4 MB/s.
There are two main avenues that I'm exploring:
To answer your two initial questions:
You can inspect the normal form by replacing dhall-to-yaml with the dhall command, but there aren't tools for identifying common patterns, yet
From a quick glance at the normal form the main reason the encoded form is large appears to be due to using typesUnion
One of the ways I'm exploring to reduce the size of the cached expression is to see if we can encode things more compactly by specifying the union type only once
Hey @Gabriel439 thanks for looking into this! A few responses:
1.33.1 for the benchmark https://github.com/sourcegraph/deploy-sourcegraph-dhall-archived/blob/285b1515ade18682d1fac33f8e70bb32100dd67c/.tool-versions#L1. 🤦 I realize now that I was freezing everything besides pipeline.dhall for the benchmark. Here are my benchmark numbers after freezing that file:
~/dev/go/src/github.com/sourcegraph/ds-dhall master
❯ ./scripts/post-order-freeze.sh
...
~/dev/go/src/github.com/sourcegraph/ds-dhall master*
❯ dhall freeze --all --inplace ./pipeline.dhall
~/dev/go/src/github.com/sourcegraph/ds-dhall master* 28s
❯ bench 'dhall-to-yaml --file pipeline.dhall'
benchmarking dhall-to-yaml --file pipeline.dhall
time 8.996 s (8.203 s .. NaN s)
0.995 R² (0.991 R² .. 1.000 R²)
mean 8.468 s (8.285 s .. 8.781 s)
std dev 299.8 ms (45.39 ms .. 371.0 ms)
variance introduced by outliers: 19% (moderately inflated)
pipeline.dhall file alone. Can you explain why this is the case?@ggilmore I am so happy not to be an isolated soul on this planet ;-)
I have been so confused around the topic of performance/freeze that I have opened this discourse discussion.
IIRC at the time I did realized it was pointless to freeze everything. It has been quite a surprise mainly because all the generated dhall-packages repositories that I know about (dhall-kubernetes, dhall-openshift, dhall-argocd ) seems to be sending the wrong message on that regard.
At the end of the day when you need to integrate these 3 different generated packages, you do realized the current state is not a straightforward journey (see https://github.com/EarnestResearch/dhall-packages/issues/62). While using a fork as a workaround this kind of code will be on your way.
To contribute in a positive way, I guess a generic api generator is the way to go but you might not have the time/opportunities to do so.
@Gabriel439 Thank you so much for looking at this.
I plan to document more about freezing idioms and when to use them in the Dhall manual but I need to finish fixing a few warts in the import UX that I ran into while documenting things. I can give a quick summary here, though:
The only purpose of freezing transitive dependencies is if you expect users to import them directly instead of the top-level dependency
For example, the Prelude assumes that users might import any file, so it freezes all of them as a precaution.
Most users don't need --cache (yet)
The --cache flag was created back when there were more frequent backwards-incompatible changes to our hashing algorithm. Specifically, the original intent behind the --cache flag was to preserve the caching features of integrity checks without the security features so that older interpreters wouldn't reject imports due to a hash mismatch.
The long-term solution is a feature I plan to propose to add language support for marking an integrity check as optimized for caching purposes rather than security purposes. When I do add such a feature then the --cache flag will transparently begin to use that feature and gain some new features (like not renaming variables to _).
The following code takes ~2.5-3 seconds to evaluate with dhall --file. This snippet omits dhall-k8s imports and the actual data behind bindings like secretVolumeMounts (but I promise it's only two or three small records).
This function gets imported and used in a few other modules, massively increasing the time for those modules to evaluate.
let Container/withSecrets
: kubernetes.Container.Type -> kubernetes.Container.Type
= \(container : kubernetes.Container.Type) ->
container
// { volumeMounts = Some
( merge
{ Some =
\(volumeMounts : List kubernetes.VolumeMount.Type) ->
volumeMounts # secretVolumeMounts
, None = secretVolumeMounts
}
container.volumeMounts
)
}
let PodSpec/withSecrets
: kubernetes.PodSpec.Type -> kubernetes.PodSpec.Type
= \(podSpec : kubernetes.PodSpec.Type) ->
podSpec
// { volumes = Some
( merge
{ Some =
\(volumes : List kubernetes.Volume.Type) ->
volumes # secretVolumes
, None = secretVolumes
}
podSpec.volumes
)
, initContainers = Some
( merge
{ Some =
\(containers : List kubernetes.Container.Type) ->
containers # [ secretsContainer ]
, None = [ secretsContainer ]
}
podSpec.initContainers
)
, containers =
Prelude.List.map
kubernetes.Container.Type
kubernetes.Container.Type
Container/withSecrets
podSpec.containers
}
let DeploymentSpec/withSecrets
: kubernetes.DeploymentSpec.Type -> kubernetes.DeploymentSpec.Type
= \(spec : kubernetes.DeploymentSpec.Type) ->
spec
with template.spec = Some
( merge
{ Some = PodSpec/withSecrets
, None =
PodSpec/withSecrets
kubernetes.PodSpec::{
, containers = [] : List kubernetes.Container.Type
}
}
spec.template.spec
)
let withSecrets
: kubernetes.Deployment.Type -> kubernetes.Deployment.Type
= \(deployment : kubernetes.Deployment.Type) ->
deployment
// { spec = Some
( merge
{ Some = DeploymentSpec/withSecrets
, None =
DeploymentSpec/withSecrets
kubernetes.DeploymentSpec::{
, selector = kubernetes.LabelSelector::{=}
, template = kubernetes.PodTemplateSpec::{
, metadata = kubernetes.ObjectMeta::{
, name = "deployment-spec-with-secrets"
}
}
}
}
deployment.spec
)
}
in { withSecrets }
We have about 4 or 5 similarly complex functions that do deeply nested appends in this module. Including those bring the module's time to evaluate up to ~5 seconds. It does drop to 3 seconds when the dhall-k8s import is frozen, though.
Running dhall --file on another module that effectively calls each of these functions once takes around 30 seconds w/o local import freezes, 22 seconds with local import freezes.
Since dhall to-directory-tree would require an explicit mapping to Prelude.JSON types, generating k8s manifests has been the one spot we're not yet using it. In the meantime, we adopted dhall-render to skip that bit of type coercion. The bash script that would call dhall-to-yaml 11 times (for 11 different configuration environments) would take several minutes to complete, but the dhall-render solution about 20 seconds. We believe this is due to these factors:
parallel resulted in too many context switches for dhall processes (probably due to lock contention when decoding large and cached k8s files?)parallel meant each dhall-to-yaml process had to pay the full cost of decoding k8s-related code in serialimport dhall-k8s and running dhall-to-yaml multiple times is very fastlet a = dhall-k8s.SomeResource::{} with x = y, import a, and then running dhall-to-yaml multiple times started getting very slow. 5 seconds -> 1 minute after introducing one function like the snippet above. Add more functions to a module that works with something common like dhall-k8s.Deployment and we saw 1 minute grow to 5.dhall to-directory-tree could be used here, we'd probably see performance like dhall-render (20 seconds).Here is another measure that might help:
~ → time dhall hash <<< https://raw.githubusercontent.com/PierreR/dhall-packages/master/openshift/package.dhall
sha256:5c0bd30ac3c1d0914770754fe82256720a04471f38a26bbc6b63bd7ebd9eea94
real 1m42.402s
user 1m34.308s
sys 0m1.727s
Not sure why it takes almost 2min to get this hash
I appreciate the additional examples, but they're no longer necessary. At this point I have enough information to reproduce the problem. The thing that will help move this issue forward is one of the two approaches I outlined previously:
@Gabriel439 Thanks for your input. Just a little question. Is dhall hash also affected by the decoder performance or is it just the size of the cached expressions ? It is not quite clear (for me) why this command would decode anything.
@PierreR: Any dhall command that interprets a cached dhall expression will decode an expression from the cache. Essentially, the cache is a content-addressable store of binary-encoded Dhall expressions, so look-up from the cache entails binary decoding.
I wanted to give an update on the approach I I believe will ultimately address this problem.
I'm primarily focusing on reducing the size of cached expressions because I believe that improving decoding performance is a dead end. My reasoning is that even if we could improve decoding performance by 10x (and I don't think we can any time soon) that would still not be the right order of magnitude for where Dhall needs to be in order to build out to the Dhall analog of Helm.
I believe the most promising approach to reducing the size of cached expressions is to preserve let bindings for types in normal forms and inferred types instead of inlining them. In other words, an expression like:
let Age = Natural
in λ(x : Age) → x + 0
... would normalize to:
let Age = Natural
in λ(x : Age) → x
... and have an inferred type of:
let Age = Natural
in ∀(x : Age) → Age
... instead of the current behavior, which is to normalize to:
λ(x : Natural) → x
... and have an inferred type of:
∀(x : Natural) → Natural
There are a few reasons I prefer this solution:
I believe it would lead to a dramatic space reduction, since every type would now only appear once in normal forms
... assuming users let-bind them, which they are highly likely to do.
This would lead to the biggest space savings for code using union literals but even other simpler code would see significant compression (since the types would be no larger than those the user authored by hand, and human-generated code is small).
It would improve the readability of normal forms and inferred types
In fact, something similar has been requested at least once before in https://github.com/dhall-lang/dhall-to-cabal/issues/22 (cc: @ocharles)
I'm still thinking through the implications and details of this change and this would require proposing a change to the language standard which might not necessarily be accepted. However, I believe the improved readability of normal forms will persuade other implementations to agree to the change.
If this change were completed I'm fairly confident it would fix the performance issues that people are reporting when creating both Kubernetes-related utilities and lists of Kubernetes resources. However, I would verify this performance improvement first before proposing the change.
Just another update on this. The original tack of preserving let-bound names in types proved to be more difficult to standardize than I anticipated, mainly because:
I haven't given up on the first approach entirely, but I'm now exploring a second less invasive approach.
The next approach I'm taking is to try adding a new type of integrity check (say, cache:sha256:…) which hashes the import that has been retrieved and parsed but not further interpreted. In other words, it encodes, hashes, and caches the expression exactly the way the user wrote the expression (including unresolved imports), before the expression has been transitively resolved, type-checked, and normalized. The only restriction would be that all transitive imports of the expression would need to be protected by integrity checks of their own (so that the final result of interpreting the expression is still immutable). In other words, this would be effectively storing a Merkle tree in the Dhall binary cache (just like how the Nix store works).
This would give a performance improvement for the same reason as the first approach I suggested: human-authored code tends to be small, so there won't be as much for the interpreter to decode.
I made some progress on debugging the performance problems with sourcegraph-dhall and it turns out that the issue might not be what I thought it was. I originally thought that the problem was storing β-normalized expressions in our cache, but that was only partly true. When I performed an experiment to store and cached non-normalized expressions that did lead to a speed-up (~2x faster), but there was still a larger bottleneck that the optimization revealed.
The problem appears to be that the same expression (i.e. the entirety dhall-kubernetes in this case) appears multiple times in the transitive closure and cache, even when the closure is not β-normalized. To see why, imagine that we have the following chain of imports:
-- ./largeImport.dhall
{- Imagine that this file represents some large expression,
like dhall-kubernetes or some other large package
-}
…
-- ./A.dhall
let largeImport = ./largeImport.dhall
in largeImport.foo
-- ./B.dhall
let largeImport = ./largeImport.dhall
in largeImport.bar
-- ./C.dhall
let A = ./A.dhall
let B = ./B.dhall
in { A, B }
If we attempt to cache ./C.dhall then the largeImport expression will show up twice in the fully-resolved expression, even if we do not β-normalize anything. This duplication is the reason why we end up with abnormally large cache sizes.
If my diagnosis is correct, then the solution is to actually change how we cache things (at least for the Haskell implementation's semi-semantic cache, and possibly also for the standard semantic cache, too), so that we permit caching unresolved expressions, so long as the imports are protected by integrity checks. Moreover, we can add integrity checks on the fly to the cached representation if their transitive dependencies are "pure".
I'll use the above example to make things more concrete. Suppose that I want to cache ./C.dhall. In order to do that I'd need to first cache all of its transitive dependencies in depth-first order, so the algorithm would be:
Hash and cache ./largeImport.dhall
… assume for this example that largeImport.dhall has no transitive imports of its own and that its hash is sha256:fff…fff
Hash and cache ./A.dhall
Specifically, cache this expression:
let largeImport = ./largeImport.dhall sha256:fff…fff
in largeImport.foo
… and assume that its hash is sha256:aaa…aaa
Hash and cache ./B.dhall
Specifically, cache this expression:
let largeImport = ./largeImport.dhall sha256:fff…fff
in largeImport.bar
Hash and cache ./C.dhall
Specifically, cache this expression:
let A = ./A.dhall sha256:aaa…aaa
let B = ./B.dhall sha256:bbb…bbb
in { A, B }
Then when we need to resolve ./C.dhall we would fetch it from the cache and interpret it like a normal Dhall expression (including import resolution). Storing the unresolved representation for each cached file would ensure that we only store large expressions at most once within the cache, which will make things much more compact, which will in turn greatly accelerate decoding speed (because there will be far less to decode).
@Gabriel439 is there another implementation of Dhall (Rust or Go ?) that scales better (to be used with dhall-kubernetes (dhall-to-yaml) ?
@PierreR: I'm not sure. A different implementation might get a constant factor improvement from a faster CBOR decoding logic, but other than that the problem I'm describing would affect all implementations because it's an issue with the language standard that needs to be fixed.
@Gabriel439 is there any trick that we might apply to the cache itself ?
We have been transitioning toward something more "monorepo" like. Unfortunately a full rebuild is now taking up to 30min.
Here are the workarounds we have been applied so far:
dhall-openapi) to include only the k8s resources we needshake to rebuild only what is requiredpackage.dhall at the top of every file (instead of using a general import)Is there something else we could try before the issue gets resolved ?
@PierreR: Not that I know of. The main issue (which is the one I'm trying to fix) is that dhall interpreters do not permit cached expressions to contain unresolved imports, so even if you were to do cache surgery to factor things in the way I described the interpreters would reject the cached expression.
@PierreR: Sorry, I missed your other suggestions when responding. There definitely is the option of trimming dhall-kubernetes down to only the resources you need. This can be done by wrapping the dhall-kubernetes import in a file that only projects out what you need, like this:
-- ./package-small.dhall
let kubernetes =
https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/master/package.dhall sha256:d541487f153cee9890ebe4145bae8899e91cd81e2f4a5b65b06dfc325fb1ae7e
in
kubernetes.{ DaemonSet, Deployment, … }
… and then if you freeze and import ./package-small.dhall it should make everything across the board smaller.
However, when creating ./package-small.dhall, take care NOT to include Resource, since that's where most of the size comes from. In fact, anything you can do to eliminate the use of the Resource union will greatly improve performance since that's what's bloating the cache.
Most helpful comment
I wanted to give an update on the approach I I believe will ultimately address this problem.
I'm primarily focusing on reducing the size of cached expressions because I believe that improving decoding performance is a dead end. My reasoning is that even if we could improve decoding performance by 10x (and I don't think we can any time soon) that would still not be the right order of magnitude for where Dhall needs to be in order to build out to the Dhall analog of Helm.
I believe the most promising approach to reducing the size of cached expressions is to preserve
letbindings for types in normal forms and inferred types instead of inlining them. In other words, an expression like:... would normalize to:
... and have an inferred type of:
... instead of the current behavior, which is to normalize to:
... and have an inferred type of:
There are a few reasons I prefer this solution:
I believe it would lead to a dramatic space reduction, since every type would now only appear once in normal forms
... assuming users
let-bind them, which they are highly likely to do.This would lead to the biggest space savings for code using union literals but even other simpler code would see significant compression (since the types would be no larger than those the user authored by hand, and human-generated code is small).
It would improve the readability of normal forms and inferred types
In fact, something similar has been requested at least once before in https://github.com/dhall-lang/dhall-to-cabal/issues/22 (cc: @ocharles)
I'm still thinking through the implications and details of this change and this would require proposing a change to the language standard which might not necessarily be accepted. However, I believe the improved readability of normal forms will persuade other implementations to agree to the change.
If this change were completed I'm fairly confident it would fix the performance issues that people are reporting when creating both Kubernetes-related utilities and lists of Kubernetes resources. However, I would verify this performance improvement first before proposing the change.