Dhall-haskell: Using env in import causing a huge performance hit

Created on 23 Jun 2020  Â·  16Comments  Â·  Source: dhall-lang/dhall-haskell

As discussed in discourse , benchmarking some examples from https://github.com/PierreR/dhall-packages reveals that replacing an env variable used to represents an import by its value causes a performance boost of nearly 50% (from ~35s to ~18s) :

- let oc = env:OC -- OC='../package.dhall sha256:088....'
+ let oc = ../package.dhall sha256:088.....

This has been tested with

bench 'dhall-to-yaml --file ./openshift/examples/application1.dhall --output ./openshift/examples/application1.yaml'

Related to dhall-lang/dhall-lang#975

performance

Most helpful comment

The tentative solution I have in mind for this is to add a special case to the import resolution logic to skip type-checking if the import is just a trivial re-export. I need to actually try it, though, before I can say for sure if that solution will work

All 16 comments

In discourse @lisael provides a nice explanation together with a suggestion to mitigate the problem.

Thanks for making a proper report, @PierreR! :)

Since this seems to be a perf issue with the Haskell implementation, I'll move it to dhall-haskell.

Just to clarify, it's not using an environment variable import per se that is the problem. The issue is that if you wrap an import with an integrity check inside of another import without an integrity check then performance slows down. The reason why is because any import without an integrity check needs to be type-checked.

@sjakobi Thanks for moving the issue (I didn't think twice before submitting it).

Would like to say this is a big deal for us because the overall performance that we face while working with our attempt to provide a more user-friendly high-level openshift dhall package is well problematic.

Because this repo will be use in many other dhall files, forcing us to provide the whole import with its integrity hash in each of these files represents a major operational burden.

I also believe this change of performance behaviour is quite difficult to understand from the point of view of the end user (at least for me it is ;-)

The tentative solution I have in mind for this is to add a special case to the import resolution logic to skip type-checking if the import is just a trivial re-export. I need to actually try it, though, before I can say for sure if that solution will work

While trying to find out workarounds around our dhall-kubernetes performance problems, I have realized that this issue is not fixed for me (dhall v1.34 or the latest v1.35.0)

As a reminder, given:

→ echo $OC
http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

Here is the benchmark for a file that uses such an import:

  1. Using let oc = env:OC
dhall type --file its/application/prd/prometheus/application.dhall --quiet  33.22s user 1.13s system 99% cpu 34.540 total
  1. Using let oc = http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695
dhall type --file its/application/prd/prometheus/application.dhall --quiet  18.22s user 0.77s system 99% cpu 19.066 total

or

Using let oc = ../../../../../../bric/cicd/dhall/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

dhall type --file its/application/prd/prometheus/application.dhall --quiet  17.14s user 0.62s system 99% cpu 17.793 total

The difference between the 2 is consistently significant.

Is there anything I could try to avoid repeating http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695 in every files apart from writing a script that would update such information anytime we release a new version of our package.dhall ?

It looks like my best option is to do:

let oc =
      env:OC sha256:9121f94c10754651cef3558bb607cff9821341d23360b65a2616ca392e195695

and use dhall freeze to update all the sha256 when an updates is pushed.

@Gabriel439 I don't quite understand why but is this correct ?

@PierreR: I think it will be better for me to focus on the other import-related optimizations rather than try to solve this specific issue, since they will have a larger impact on performance

@Gabriel439 thanks ! Actually doing a dhall freeze across all files explodes my 8G Ram very quickly ;-) (In the meanwhile I can always do a sed )

@Gabriel439 thanks ! Actually doing a dhall freeze across all files explodes my 8G Ram very quickly ;-) (In the meanwhile I can always do a sed )

That sounds like a separate perf issue. Could you open an issue, ideally with instructions how to reproduce the problem?

@sjakobi I am pretty sure the explosion of RAM is a consequence of the performance issue discussed in #1890. I was using fd to do a dhall freeze across multiple files which means it was happening in parallel. I have similar issues with shake if I use a shakeThreads above 1.

I can fill another issue but I think it is known that given the current performance profile trying to put parallelism in the mix is not the best idea.

Unfortunately sometimes parallelism is so natural that you don't realize immediately that it is the source of the explosion. Sorry for the confusion. I should have been more precise.

@Gabriel439 Could we re-open this issue ? Even if it is going to be fix later on by the new import-related optimizations ?

Exposing sha256 in all files is far from ideal for us because:

  • We are exposing internal optimizations to our non technical clients. Purely on a image/selling point of view that's not good because we can't build a non technical abstraction for our user.
  • Updating the sha256 is now a PR of hundred files instead of just in one single clean place

@PierreR: I'd need to know more details, but I believe you do not need sha256 hashes on all internal imports. I'm guessing you probably only need to protect top-level imports with integrity checks.

@Gabriel439 Thanks for the suggestion. For some reason I haven't thought about it ...

I have just tried it and I have got surprising figures, performance wise.

Situation 1

use an env variable

let oc =
      env:OC sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693

OC is an env variable with this value: http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184

Situation 2

use a normal import indirection

let oc = ../../../oc.dhall

_oc.dhall_ has this content:

http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184 sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693

I have a constant factor of 2 between both setup :

# situation 1
→ time dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhall --output ./its/application/cicd-docs/application.yaml
dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhal  10.79s user 0.41s system 99% cpu 11.296 total

# situation 2
→ time dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhall --output ./its/application/cicd-docs/application.yaml
dhall-to-yaml --documents --file ./its/application/cicd-docs/application.dhal  20.03s user 0.63s system 99% cpu 20.695 total

Situation 2 (that solves my issue) is twice slower than situation 1. Given the current performance issue, I can't afford to double the generation time ;-)

@PierreR: Yeah, this is a case where I'm pretty sure I understand what is going on. The issue here is being caused by the "semi-semantic" cache, which caches intermediate imports automatically (even ones without integrity checks). So what's happening is that the large expression stored in ./oc.dhall is being decoded from the cache twice:

  • Once when importing http://stash.cirb.lan/projects/CICD/repos/dhall/raw/openshift/package.dhall?at=0.9.184 sha256:1afc2ce9652ef7d281dff47594dea75b32009a7149db88cfc6927e87696be693
  • Another time when importing ./oc.dhall (which is decoded from the local cache)

This is still a case where I believe it would be fixed by the import-related optimizations that I'm working on. In particular, one of the optimizations I'm testing is to not store imports inline within the cache (to avoid them being decoded multiple times) and instead storing references to separate cache entries to deduplicate decoding work.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

chris-martin picture chris-martin  Â·  5Comments

DrSensor picture DrSensor  Â·  4Comments

DrSensor picture DrSensor  Â·  5Comments

SiriusStarr picture SiriusStarr  Â·  5Comments

vmchale picture vmchale  Â·  5Comments