Dhall-haskell: Discuss safety options for Dhall

Created on 13 Oct 2017  Â·  8Comments  Â·  Source: dhall-lang/dhall-haskell

The purpose of this issue is to discuss what the command-line API for the dhall/dhall-* executables should be when users want to disable certain Dhall features for additional safety

The most common requests I've gotten for how to improve safety are:

  • Import-related safety:

    • Option 1: Disable all imports except ipfs imports



      • Disable all URLs except https://ipfs.io/*


      • Disable all paths except /ipfs/* (for use in conjunction with ipfs mount)


      • Disable all environment variable imports


      • This permits ease of distribution of Dhall libraries (like the Prelude)


      • This makes Dhall expressions mostly pure





        • https://ipfs.io still vulnerable to network-related interference (i.e. DNS changes)



        • Nothing enforces that /ipfs is used by ipfs mount






    • Option 2: Disable all remote imports



      • this prevents Dhall as a denial-of-service vector


      • this prevents Dhall from leaking information through GET requests


      • this prevents Dhall from leaking information through support for custom headers



    • Option 3: Disable all imports



      • this makes Dhall 100% pure


      • this prevents Dhall from being tricked into an infinite import loop by malicious hosts





        • i.e. http://example.com/1 → http://example.com/2 → http://example.com/3 → ...






  • Normalization-related safety:

    • Option 4: Enforce that the input expression must already be in normal form



      • This gives a linear upper bound on how long it takes to interpret a configuration file


      • People who prefer this option either:





        • Don't want to use Dhall as a programming language at all, or:



        • Want to resolve/normalize Dhall expressions ahead-of-time






Mitigating factors:

  • Dhall has one built-in safety feature, which is that remote imports cannot reference local imports (i.e paths and environment variables)
  • The only IO that Dhall permits is imports

The Dhall API already supports all of the above features and they can all be implemented in linear time. The main question is what should the command-line API support and what should the granularity of the API be

For example, one extreme is that the compiler just adds one new --safe flag which enforces the maximum possible safety by banning all imports and enforcing that the expression is in normal form. This is simple but some users might want a solution with an intermediate amount of safety.

Another extreme is that we add four new flags (for each of the above options), which bloats the command-line API but satisfies everybody's use cases (that I'm aware of)

Most helpful comment

@Profpatsch: --import-local-only is fairly straightforward to implement and standardize. I would only suggest renaming it to --local for conciseness.

I would also suggest renaming --import-normalized-only to --normalized, and clarifying that this changes the normalization phase of interpretation to only check that the expression (which has already been fully-resolved and type-checked) is in normal form. The reason for this clarification is:

  • to ensure that integrity checks (which also require a normalization phase) will fail if the imported expression is not normalized
  • to ensure that imported expressions can still import other expressions as long as the fully resolved expression has no β-reducible expressions

So I think the combination of those two flags plus the existing support for semantic hashes covers almost all of the needs addressed in the original post:

  • IPFS-based imports are superseded by semantic hashing. IPFS is still a convenient distribution mechanism, but semantic hashing becomes the authoritative way to guarantee expression integrity. In the future, Dhall might even use the semantic hashes to provide a Dhall-native distribution system where the semantic hashes are the addresses of imports (i.e. you could have an import that was just sha256:... and Dhall could resolve it if somebody else had evaluated it before)
  • Disabling all remote imports is covered by --local
  • Normalization safety is covered by --normalized

The only thing that's not covered is:

  • Disabling all imports (including local imports)

However, I think it's fine to always support local imports since Dhall is still fundamentally a configuration language and it wouldn't make much sense to have a configuration language without any configuration files. Also, local imports are the only mechanism for defining "type synonyms" in Dhall.

All 8 comments

So the first conclusion I've come to is that there is a better approach for improving the safety of imports. We can add to the language an optional integrity check for imports to ensure that they match the expected contents. For example, you could write:

./path/to/import sha256 fe42006425ceda5423226ffd5829e713a1ba1011b69b7fe231794d16ecb0be27

... and that would ensure that the imported expression has the expected hash

The benefit of this approach is that it ensures that the import is protected regardless of how insecure the import channel is. It's similar in spirit to an IPFS import except the integrity check is built-in to Dhall and can't be spoofed by impersonating the IPFS url through networking-related attacks.

Note that this still does not protect against Dhall being used as a denial-of-service vector, nor does it protect against Dhall leaking information, but I believe this is the best solution for resolving people's concerns about code injection from a malicious user

The main things to decide are:

  • What message digest algorithm to use for the integrity check?

    • I'm inclined to pick sha256 since it's the message digest algorithm with the shortest key for which there are no known attacks

  • What are the contents that the hash algorithm hashes?

    • Does it hash the text of the import or a serialized representation of the abstract syntax tree?

My inclination is for the hash algorithm to hash the text of the import, for two main reasons:

  • Dhall does not need to parse the expression in order to verify the import's integrity

    • This eliminates the possibility of an attacker exploiting vulnerabilities or inefficiencies in Dhall's parser

  • It's simpler to implement and standardize
  • You can compute the hash without using Dhall (i.e. using normal command-line tools)

If there are no objections then I'll add this as a new language feature

So after thinking about this more I believe that Dhall actually should hash a serialized representation of the normal form and not the raw file contents

The main reason for my reversal was realizing that hashing the raw contents was pretty useless since the imported file can still have impure non-hashed imports of its own. Hashing the fully resolved and normalized representation gives you a much stronger guarantee that the meaning of the imported program has not changed

I’d vote for two (orthogonal) “acceptance” features: --import-local-only and --import-normalized-only.

The “meaning” part is already solved by hashes, as hashing the normalized form is basically the ultimate form of a fixed input.
Then there’s only two kinds of attacks remaining, a CPU-/time-based attack by non-normalized inputs and a network-/time-based attack by slowly replying remote hosts. The hashes already mitigate the “malicious remote content” vector.

The functionality should also be required by the standard in my opinion, every client library must be able to turn these features off by request of the user.

@Profpatsch: --import-local-only is fairly straightforward to implement and standardize. I would only suggest renaming it to --local for conciseness.

I would also suggest renaming --import-normalized-only to --normalized, and clarifying that this changes the normalization phase of interpretation to only check that the expression (which has already been fully-resolved and type-checked) is in normal form. The reason for this clarification is:

  • to ensure that integrity checks (which also require a normalization phase) will fail if the imported expression is not normalized
  • to ensure that imported expressions can still import other expressions as long as the fully resolved expression has no β-reducible expressions

So I think the combination of those two flags plus the existing support for semantic hashes covers almost all of the needs addressed in the original post:

  • IPFS-based imports are superseded by semantic hashing. IPFS is still a convenient distribution mechanism, but semantic hashing becomes the authoritative way to guarantee expression integrity. In the future, Dhall might even use the semantic hashes to provide a Dhall-native distribution system where the semantic hashes are the addresses of imports (i.e. you could have an import that was just sha256:... and Dhall could resolve it if somebody else had evaluated it before)
  • Disabling all remote imports is covered by --local
  • Normalization safety is covered by --normalized

The only thing that's not covered is:

  • Disabling all imports (including local imports)

However, I think it's fine to always support local imports since Dhall is still fundamentally a configuration language and it wouldn't make much sense to have a configuration language without any configuration files. Also, local imports are the only mechanism for defining "type synonyms" in Dhall.

So I took a stab at implementing --normalized and ran into one issue, which I can summarize with this tweet:

I'm going to hold off on implementing that flag until I know for sure the answer to that question

Also, I forgot that Dhall already provides protection against information leaks via custom headers. Specifically, if you assume that local files are trusted and remote files are untrusted, then an untrusted import can't trick you into leaking local secrets, thanks to the referential opacity check here:

https://github.com/dhall-lang/dhall-haskell/blob/f1c28e65c911a7f7cee922fb87a1a3fd5e9920c5/src/Dhall/Import.hs#L782-L784

For example, consider the following header configuration from the tutorial:

[ { header = "Authorization", value = "token ${env:GITHUB_TOKEN as Text}" } ]

This is an example of a custom header which contains a secret. In some cases, this is desired behavior (i.e. you want to authorize access to a private GitHub repository as in the above example), but in other cases this could be a secret leak if the above set of headers were computed from an untrusted source.

However, Dhall prevents a remote import from computing a value that contains local imports (thanks to the referential opacity check). Interestingly, this is a Dhall feature that originates from Morte (since Dhall originated as a fork of Morte), and the original motivation for this feature wasn't security but rather sanity: local references like environment variables don't even make sense in the context of a remote import. The fact that this doubles as a security feature is just a bonus.

You can see an example of this in action by saving the above Dhall expression to a remote URL such as this one:

http://lpaste.net/raw/360855

... and then try to use that remote import to customize another request:

$ GITHUB_TOKEN=SECRET dhall <<< 'https://httpbin.org/headers using http://lpaste.net/raw/360855 as Text'

↳ https://httpbin.org/headers using http://lpaste.net/raw/360855  as Text

↳ http://lpaste.net/raw/360855 

Referentially opaque import: env:GITHUB_TOKEN as Text

... and dhall rejects the request for the GITHUB_TOKEN because the request originated in the context of a remote import. However, if you instead save the same header configuration to a local file, such as ./headers.dhall, then Dhall fulfills the request:

$ GITHUB_TOKEN=SECRET dhall <<< 'https://httpbin.org/headers using ./headers.dhall as Text'
Text

"{\n  \"headers\": {\n    \"Accept-Encoding\": \"gzip\", \n    \"Authorization\": \"token SECRET\", \n    \"Connection\": \"close\", \n    \"Host\": \"httpbin.org\"\n  }\n}\n"

I should mention that since this issue was created the dhall executable now provides dhall type and dhall normalize subcommands so that you can completely disable import resolution if you want

I will go ahead and close this since I think we've implemented everything we intended regarding import safety. The key features are:

  • The dhall package can compile out support for remote imports using the -f -with-http cabal configure flag
  • The dhall command now provides dhall type and dhall normalize subcommands which disallow import resolution
  • Dhall's import resolution system now supports semantic integrity checks which protect remote imports from tampering
Was this page helpful?
0 / 5 - 0 ratings

Related issues

mgajda picture mgajda  Â·  3Comments

Gabriel439 picture Gabriel439  Â·  7Comments

akshaymankar picture akshaymankar  Â·  3Comments

SiriusStarr picture SiriusStarr  Â·  5Comments

vmchale picture vmchale  Â·  5Comments