The purpose of this issue is to discuss what the command-line API for the dhall/dhall-* executables should be when users want to disable certain Dhall features for additional safety
The most common requests I've gotten for how to improve safety are:
ipfs importshttps://ipfs.io/*/ipfs/* (for use in conjunction with ipfs mount)https://ipfs.io still vulnerable to network-related interference (i.e. DNS changes)/ipfs is used by ipfs mountGET requestshttp://example.com/1 → http://example.com/2 → http://example.com/3 → ...Mitigating factors:
The Dhall API already supports all of the above features and they can all be implemented in linear time. The main question is what should the command-line API support and what should the granularity of the API be
For example, one extreme is that the compiler just adds one new --safe flag which enforces the maximum possible safety by banning all imports and enforcing that the expression is in normal form. This is simple but some users might want a solution with an intermediate amount of safety.
Another extreme is that we add four new flags (for each of the above options), which bloats the command-line API but satisfies everybody's use cases (that I'm aware of)
So the first conclusion I've come to is that there is a better approach for improving the safety of imports. We can add to the language an optional integrity check for imports to ensure that they match the expected contents. For example, you could write:
./path/to/import sha256 fe42006425ceda5423226ffd5829e713a1ba1011b69b7fe231794d16ecb0be27
... and that would ensure that the imported expression has the expected hash
The benefit of this approach is that it ensures that the import is protected regardless of how insecure the import channel is. It's similar in spirit to an IPFS import except the integrity check is built-in to Dhall and can't be spoofed by impersonating the IPFS url through networking-related attacks.
Note that this still does not protect against Dhall being used as a denial-of-service vector, nor does it protect against Dhall leaking information, but I believe this is the best solution for resolving people's concerns about code injection from a malicious user
The main things to decide are:
sha256 since it's the message digest algorithm with the shortest key for which there are no known attacksMy inclination is for the hash algorithm to hash the text of the import, for two main reasons:
If there are no objections then I'll add this as a new language feature
So after thinking about this more I believe that Dhall actually should hash a serialized representation of the normal form and not the raw file contents
The main reason for my reversal was realizing that hashing the raw contents was pretty useless since the imported file can still have impure non-hashed imports of its own. Hashing the fully resolved and normalized representation gives you a much stronger guarantee that the meaning of the imported program has not changed
I’d vote for two (orthogonal) “acceptance” features: --import-local-only and --import-normalized-only.
The “meaning” part is already solved by hashes, as hashing the normalized form is basically the ultimate form of a fixed input.
Then there’s only two kinds of attacks remaining, a CPU-/time-based attack by non-normalized inputs and a network-/time-based attack by slowly replying remote hosts. The hashes already mitigate the “malicious remote content” vector.
The functionality should also be required by the standard in my opinion, every client library must be able to turn these features off by request of the user.
@Profpatsch: --import-local-only is fairly straightforward to implement and standardize. I would only suggest renaming it to --local for conciseness.
I would also suggest renaming --import-normalized-only to --normalized, and clarifying that this changes the normalization phase of interpretation to only check that the expression (which has already been fully-resolved and type-checked) is in normal form. The reason for this clarification is:
So I think the combination of those two flags plus the existing support for semantic hashes covers almost all of the needs addressed in the original post:
sha256:... and Dhall could resolve it if somebody else had evaluated it before)--local--normalizedThe only thing that's not covered is:
However, I think it's fine to always support local imports since Dhall is still fundamentally a configuration language and it wouldn't make much sense to have a configuration language without any configuration files. Also, local imports are the only mechanism for defining "type synonyms" in Dhall.
So I took a stab at implementing --normalized and ran into one issue, which I can summarize with this tweet:
Question for PLT people: if a System Fω expression is in normal form, can you type-check that expression in linear time (relative to the size of the expression, including types of bound variables)
— Gabriel Gonzalez (@GabrielG439) December 16, 2017
I'm going to hold off on implementing that flag until I know for sure the answer to that question
Also, I forgot that Dhall already provides protection against information leaks via custom headers. Specifically, if you assume that local files are trusted and remote files are untrusted, then an untrusted import can't trick you into leaking local secrets, thanks to the referential opacity check here:
For example, consider the following header configuration from the tutorial:
[ { header = "Authorization", value = "token ${env:GITHUB_TOKEN as Text}" } ]
This is an example of a custom header which contains a secret. In some cases, this is desired behavior (i.e. you want to authorize access to a private GitHub repository as in the above example), but in other cases this could be a secret leak if the above set of headers were computed from an untrusted source.
However, Dhall prevents a remote import from computing a value that contains local imports (thanks to the referential opacity check). Interestingly, this is a Dhall feature that originates from Morte (since Dhall originated as a fork of Morte), and the original motivation for this feature wasn't security but rather sanity: local references like environment variables don't even make sense in the context of a remote import. The fact that this doubles as a security feature is just a bonus.
You can see an example of this in action by saving the above Dhall expression to a remote URL such as this one:
... and then try to use that remote import to customize another request:
$ GITHUB_TOKEN=SECRET dhall <<< 'https://httpbin.org/headers using http://lpaste.net/raw/360855 as Text'
↳ https://httpbin.org/headers using http://lpaste.net/raw/360855 as Text
↳ http://lpaste.net/raw/360855
Referentially opaque import: env:GITHUB_TOKEN as Text
... and dhall rejects the request for the GITHUB_TOKEN because the request originated in the context of a remote import. However, if you instead save the same header configuration to a local file, such as ./headers.dhall, then Dhall fulfills the request:
$ GITHUB_TOKEN=SECRET dhall <<< 'https://httpbin.org/headers using ./headers.dhall as Text'
Text
"{\n \"headers\": {\n \"Accept-Encoding\": \"gzip\", \n \"Authorization\": \"token SECRET\", \n \"Connection\": \"close\", \n \"Host\": \"httpbin.org\"\n }\n}\n"
I should mention that since this issue was created the dhall executable now provides dhall type and dhall normalize subcommands so that you can completely disable import resolution if you want
I will go ahead and close this since I think we've implemented everything we intended regarding import safety. The key features are:
dhall package can compile out support for remote imports using the -f -with-http cabal configure flagdhall command now provides dhall type and dhall normalize subcommands which disallow import resolution
Most helpful comment
@Profpatsch:
--import-local-onlyis fairly straightforward to implement and standardize. I would only suggest renaming it to--localfor conciseness.I would also suggest renaming
--import-normalized-onlyto--normalized, and clarifying that this changes the normalization phase of interpretation to only check that the expression (which has already been fully-resolved and type-checked) is in normal form. The reason for this clarification is:So I think the combination of those two flags plus the existing support for semantic hashes covers almost all of the needs addressed in the original post:
sha256:...and Dhall could resolve it if somebody else had evaluated it before)--local--normalizedThe only thing that's not covered is:
However, I think it's fine to always support local imports since Dhall is still fundamentally a configuration language and it wouldn't make much sense to have a configuration language without any configuration files. Also, local imports are the only mechanism for defining "type synonyms" in Dhall.