Terraform: Embed lua interpreter in terraform

Created on 23 Feb 2016 · 20 Comments · Source: hashicorp/terraform

Hey,

I'd like to discuss embedding a Lua interpreter into Terraform. The same has been done by Redis, HAProxy, and nginx, to give a few examples. This feature has good use cases and would avoid transforming Terraform syntax into an ugly Turing-complete language.

Use case 1

I'd like a variable's default to be computed dynamically instead of being a static string. For now this isn't possible and can only be simulated with a "template_file" resource, as demonstrated by "etcd_discovery_url" in the following config: https://github.com/Capgemini/Apollo/blob/devel/terraform/digitalocean/main.tf

Could we instead allow dynamic custom variables, so that the following is possible?

variable "etcd_discovery_url_file" {
  default = "#{
    return http.request('https://discovery.etcd.io/new?size=' .. (${var.masters} + ${var.slaves}))
  }"
}

Use case 2

I'd like to provide custom logic for resource configuration. This is an already known use case, as shown in: #1604 #4548 #3358

Instead of trying to hack Terraform into a new language, let's just allow embedding Lua:

resource "aws_instance" "machine" {
  ami = "#{
    if ${var.atlas_enabled} then
      return '${atlas_artifact.machine.id}'
    else
      return '${var.ami_string}'
    end
  }"
}

Code blocks are executed only once, to compute defaults for configuration values and variables.

As per the Redis documentation:

All Redis commands must be analyzed before execution to determine which keys the command will operate on. (...) Lua scripts can return a value that is converted from the Lua type to the Redis protocol using a set of conversion rules.

Proposed behavior

Introduce a #{lua_code} interpolation syntax, similar to variable interpolation ${variable_name}, where the content is the body of a Lua function and the value is that function's "return" value.
Within the Lua code, allow embedding Terraform variables using the usual syntax: ${variable}

This way _Terraform knows which variables the function depends on_, so it knows when to re-execute it (only once; the result is cached as long as the input variables don't change).
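The dependency tracking and caching described above can be sketched in Python. This is a minimal illustration, not Terraform code: `CachedBlock`, `find_dependencies`, and the `run` callback (a stand-in for the embedded Lua interpreter) are hypothetical names, and only the most recent input values are remembered, roughly the way a value saved in tfstate would be.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Matches Terraform-style references such as ${var.masters}.
REF_PATTERN = re.compile(r"\$\{([^}]+)\}")


def find_dependencies(block: str) -> Tuple[str, ...]:
    """Return the distinct ${...} references in a #{...} body, in order of appearance."""
    deps = []
    for name in REF_PATTERN.findall(block):
        name = name.strip()
        if name not in deps:
            deps.append(name)
    return tuple(deps)


class CachedBlock:
    """Re-runs a code block only when the values of its references change."""

    def __init__(self, block: str, run: Callable[[str, Dict[str, str]], str]) -> None:
        self.block = block
        self.deps = find_dependencies(block)
        self.run = run  # hypothetical stand-in for the Lua interpreter
        self.runs = 0  # how many times the block has actually been executed
        self._last_key: Optional[Tuple[str, ...]] = None
        self._last_value = ""

    def value(self, variables: Dict[str, str]) -> str:
        # The cache key is just the current values of the referenced variables.
        key = tuple(variables[name] for name in self.deps)
        if key != self._last_key:
            inputs = {name: variables[name] for name in self.deps}
            self._last_value = self.run(self.block, inputs)
            self._last_key = key
            self.runs += 1
        return self._last_value
```

With a fake `run` that only builds the discovery URL, calling `value()` twice with the same `var.masters` and `var.slaves` executes the block once; changing either value triggers a second run.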

config enhancement thinking


sheerun

Most helpful comment

I fully agree there are things you can't do in Terraform, that we'd like you to be able to do. The necessity of doing those things does not mean we can't be patient in finding the solution that technically solves the challenges while feeling great as a user. We'll get there, and conditional inclusion of attributes and so on is a real problem, but I won't rush something that _works_ at the expense of waiting for something that is _right_.

EDIT: Edit to add that if templating is what you want to do, you can (and others do) template the "tf" files using any language you desire. :) But we will have some official solution at some point.

mitchellh on 12 Mar 2016

👍2 ❤1

All 20 comments

cc @mitchellh, @phinze.

jen20 on 23 Feb 2016

Also note that the cache-invalidation logic is already there, because the ${} interpolation syntax inside #{} blocks is backward-compatible. The only things to handle are:

Implementing one-time evaluation of the #{} block just before saving the variable in tfstate
Providing some kind of standard library for Lua scripts (like http in the example above)
Figuring out the "ABI" between Terraform and Lua, so that scripts can perhaps return lists/integers as well.
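One possible shape for that "ABI" is a fixed set of conversion rules, similar in spirit to the Redis Lua-to-protocol conversion quoted above. The Python sketch below is an assumption, not a settled design: it models Lua values with Python types (a Lua table as a `dict`), maps numbers and booleans to strings, turns tables with keys 1..n into lists, and turns every other table into a map. `to_terraform` and `is_sequence` are hypothetical names.

```python
from typing import Any, Dict, List, Union

TerraformValue = Union[str, List["TerraformValue"], Dict[str, "TerraformValue"]]


def is_sequence(table: Dict[Any, Any]) -> bool:
    """True if a Lua-style table has exactly the integer keys 1..n."""
    keys = list(table)
    if not keys:
        return False
    if not all(isinstance(k, int) and not isinstance(k, bool) for k in keys):
        return False
    return sorted(keys) == list(range(1, len(keys) + 1))


def to_terraform(value: Any) -> TerraformValue:
    """Convert a value returned by a script into a Terraform-style value."""
    if value is None:
        # Lua's nil has no Terraform counterpart, so reject it explicitly.
        raise ValueError("a script returned nil")
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # Lua 5.1 numbers are doubles; render 3.0 as "3"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        if is_sequence(value):
            return [to_terraform(value[i]) for i in range(1, len(value) + 1)]
        return {str(k): to_terraform(v) for k, v in value.items()}
    raise TypeError("unsupported value type: " + type(value).__name__)
```

Under these rules `{1: "a", 2: "b"}` becomes the list `["a", "b"]`, `{"region": "us-east-1"}` stays a map, and `3.0` becomes `"3"`. Whether booleans should map to "true"/"false" or to "1"/"0" is one of the decisions such an ABI would have to pin down.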

sheerun on 23 Feb 2016

@sheerun this is pretty interesting. Thanks for writing it up.

Are you imagining that the interpolation of terraform variables via ${} is done as a pre-processing step before passing the program to the Lua compiler? That certainly simplifies the problem a bunch, but raises some questions:

Do you think it will be confusing to debug error messages resulting from incorrect use of interpolations? That is, errors in the use of interpolations would actually appear as Lua parse errors, and it may be difficult for the user to understand how that error maps back to the original string they provided?
Would the user be required to use interpolation functions to escape quote characters in their strings? Weird behavior would presumably result if someone were to get this wrong, since their Lua program would be parsed in a very different way than intended.
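The quoting concern in the second question is easy to reproduce. The Python sketch below is illustrative only, and its helper names are hypothetical: `naive_substitute` pastes raw values into the Lua source, so a value containing a quote changes how the program parses, while `quoted_substitute` emits each value as an escaped Lua string literal instead. Escaping fixes quoting, but it does not address the error-message question, since errors would still be reported against the generated source.

```python
import re
from typing import Dict

# Matches Terraform-style references such as ${var.ami_string}.
REF = re.compile(r"\$\{([^}]+)\}")


def naive_substitute(source: str, values: Dict[str, str]) -> str:
    """Pre-process ${...} references by pasting raw values into the Lua source."""
    return REF.sub(lambda m: values[m.group(1).strip()], source)


def lua_string_literal(value: str) -> str:
    """Render a string as a single-quoted Lua string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return "'" + escaped + "'"


def quoted_substitute(source: str, values: Dict[str, str]) -> str:
    """Pre-process ${...} references by emitting escaped Lua string literals."""
    # A function replacement is inserted verbatim, so re.sub does not
    # reinterpret the backslashes added by lua_string_literal.
    return REF.sub(lambda m: lua_string_literal(values[m.group(1).strip()]), source)
```

With `var.ami_string` set to `it's`, the naive version turns `return '${var.ami_string}'` into `return 'it's'`, which Lua fails to parse, whereas the quoted version turns `return ${var.ami_string}` into `return 'it\'s'`.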

Do you think it would be too expensive to just unconditionally execute the Lua code each time the string is evaluated, similar to how we deal with regular interpolations? That would simplify matters since the Terraform values could be passed in as a Lua table structure, but I expect it would mean we'd need to prohibit its use to do I/O like your http example and limit it just to simple logic.

Honestly my first reaction to this is that it's a lot of extra complexity and it'd be hard to get the level of abstraction "right" here so that this is intuitive to users and doesn't hurt the maintainability and predictability of Terraform configurations. But it would be interesting to think through the use-cases a little more.

I'd actually started looking at a different take on this a while ago, where I'd wrap the Terraform API in a scripting language API (I was looking at JavaScript, but Lua could work just the same way) as a way to allow Terraform's providers and create/update/destroy actions to be re-used when building custom tools and admin scripts. The use-cases I had in mind were things like cleaning up old AMIs (with custom logic to decide what is "old") or pushing around instance/database snapshots. Definitely a different sort of thing than you're proposing here, but it has some similar problems around how to build programmatic actions around Terraform's flow. (I got distracted by other things before I got very far with this.)

apparentlymart on 23 Feb 2016

I am not a huge fan of adding a Lua interpreter to Terraform.
One of the great things about Terraform is that it unifies many technologies and providers into one language. This seems like it would detract from that simplicity.

BSick7 on 23 Feb 2016

👎1

Thank you for writing such a detailed document. It makes the use cases clear. So please understand I mean no disrespect for anything in this comment. I'd like to address your use cases.

At a high level, I'll say that I'm not a fan of embedding any real language interpreter in Terraform. Instead of substantiating that opinion, let me take a step back and attempt to offer up an alternate solution, or how I see the future playing out to fit your use cases in.

I think we can do this in a much more flexible way without embedding Lua, perhaps at the expense of a slightly worse UX. But for a more complicated feature, I think this is acceptable. And, as you'll see, Terraform's unique failure handling capabilities paper over this poor UX anyways... Moving on...

Case 1: Generating Data

_Use case:_ From above, downloading data and using that for values. Or using if statements to determine a value. etc.

I think with the integration of something along the lines of #4169, what we want to get to is the ability to _set variables_ within the runtime of a Terraform apply. Currently, variables are set once at the beginning of a run, and then never again. Another way this might manifest is being able to control outputs from a null_resource (but in a more elegant way). The more abstract feature: custom generated data at runtime.

With the above as a precursor, I think we can make this ultimately flexible by utilizing the local-exec provisioner: you can local-exec anything and read the output in a variable set at runtime. If this is a Lua script, fine. Ruby script? Sure. Python? Of course! You can use _any language_ to process data and set it as a variable.

So, here is an example (_completely not working, vaporware_):

resource "null_resource" "foo" {
  provisioner "local-exec" "bar" {
    command = "./ami.py ${maybe.an.input}"
  }
}

variable "ami" {
  default = "${null_resource.foo.provisioner.bar.stdout}"
}

resource "aws_instance" "foo" {
  ami = "${var.ami}"
}

A bit verbose, yes. I think we can work on syntactic sugar, and I don't think it's perfect. But I think it drives the design point home. This avoids any opinion from Terraform core and avoids the complexity of a new runtime in the core.
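For a sense of what could sit on the other side of `local-exec` in the example above, here is a minimal sketch of an `ami.py`. Only the file name comes from the example; the three positional arguments (`atlas_enabled`, `atlas_ami`, `fallback_ami`) and the selection logic are assumptions. The entry point is a plain function so it can be exercised directly; an actual script would end with `sys.exit(run(sys.argv[1:]))`.

```python
import sys
from typing import List, TextIO


def choose_ami(atlas_enabled: str, atlas_ami: str, fallback_ami: str) -> str:
    """Return the Atlas-built AMI when the flag is set, otherwise the fallback."""
    # Treating "1" and "true" as enabled is an assumption about how the flag arrives.
    if atlas_enabled.strip().lower() in ("1", "true"):
        return atlas_ami
    return fallback_ami


def run(argv: List[str], stdout: TextIO = sys.stdout) -> int:
    """Handle `./ami.py <atlas_enabled> <atlas_ami> <fallback_ami>`; return an exit code."""
    if len(argv) != 3:
        sys.stderr.write("usage: ami.py <atlas_enabled> <atlas_ami> <fallback_ami>\n")
        return 2
    # No trailing newline, so whatever captures stdout gets exactly the AMI ID.
    stdout.write(choose_ami(argv[0], argv[1], argv[2]))
    return 0
```

Any language would do here, which is the point of the proposal; Python is used only for illustration.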

Case 2: Conditionals

Use case: Turn on/off modules or resources, set or don't set certain attributes in a resource.

This of course deserves its own entire design document, which we're doing in #4548 at some point.

However, it is important to note that I don't think we need an embedded lang like Lua for this either. We can use our simple interpreted language for basic conditionals, and the _values_ that those conditionals use can come from dynamic variable setting from Case 1 above.

So, we'd need both features, but neither requires an embedded language.

I'd love to hear feedback on this, but I do want to be honest that I will fight very hard to avoid a Lua interpreter being embedded within Terraform since I do strongly believe the complexity can be avoided. I hope you don't consider this overly hostile, as I'm willing to have a reasoned argument about it. But, I do want to make my side clear.

What do you think?

mitchellh on 24 Feb 2016

👍3

Ah, one more added comment from me. I forgot to touch on UX.

Based on the context of my above comment, I think this is a slightly worse UX than having Lua available directly in Terraform. At worst, the point at which Terraform runs your script may error because you're missing a runtime (such as Python not being installed if you're using Python). However, since Terraform handles partial error cases quite well, it doesn't practically matter that much.

The recourse becomes: install Python, run Terraform again, and it'll continue where it left off and converge successfully.

Given that this is an advanced feature, I think this is a fair tradeoff.

The other point is just syntactic sugar. In my example above I do agree it is fairly ugly. I think we can work on sugar as we refine dynamically setting variables, so please don't bike shed on that. I was just trying to use something as familiar as possible to drive a point.

mitchellh on 24 Feb 2016

@apparentlymart You're right. The "interpolation" syntax inside eval blocks could output variable names instead of their values. That would solve the debugging issues. To give an example:

resource "aws_instance" "machine" {
  ami = "#{
    if ${var.atlas_enabled} then
      return ${atlas_artifact.machine.id}
    else
      return ${var.ami_string}
    end
  }"
}

is transformed to something like:

knownDependencies := []string{"var.atlas_enabled", "atlas_artifact.machine.id", "var.ami_string"}

luar.Register(L, "", luar.Map{
  "var": getStateVariableForLua, // lazily returns the Lua value for one of knownDependencies
})

res := L.DoString(`
  if var('var.atlas_enabled') then
    return var('atlas_artifact.machine.id')
  else
    return var('var.ami_string')
  end
`)

return convertLuaVariableToStateValue(res)
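The rewriting step described above (turning each `${...}` into a `var('...')` call and recording `knownDependencies`) can be sketched in Python. `rewrite_block` and `make_lookup` are hypothetical helpers, and `resolve` stands in for the state lookup that `getStateVariableForLua` would perform; the lookup only accepts declared dependencies and resolves each one lazily, on first use.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches Terraform-style references such as ${var.ami_string}.
REF = re.compile(r"\$\{([^}]+)\}")


def rewrite_block(source: str) -> Tuple[str, List[str]]:
    """Replace each ${name} with var('name') and collect the distinct names."""
    deps: List[str] = []

    def replace(match) -> str:
        name = match.group(1).strip()
        if name not in deps:
            deps.append(name)
        return "var('" + name + "')"

    return REF.sub(replace, source), deps


def make_lookup(deps: List[str], resolve: Callable[[str], object]) -> Callable[[str], object]:
    """Build the `var` function exposed to the script; resolves lazily and caches."""
    allowed = set(deps)
    cache: Dict[str, object] = {}

    def var(name: str) -> object:
        if name not in allowed:
            raise KeyError(name + " is not a declared dependency of this block")
        if name not in cache:
            cache[name] = resolve(name)  # only called the first time a name is used
        return cache[name]

    return var
```

One caveat the conversion layer would have to handle: in Lua only `nil` and `false` are falsy, so if `var.atlas_enabled` reached the script as the string `"0"`, `if var('var.atlas_enabled') then` would still take the first branch.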

@mitchellh Allowing execution of external scripts is compelling as well, but I see a few issues:

As you mentioned, you need to take care of the scripts' dependencies, i.e. a Python/Ruby/other interpreter in the correct version, as well as the dependencies of the scripts themselves.
There is less control over script execution, so, for example, lazy evaluation of variables as shown above is not possible with external languages. All variables need to be evaluated before being passed to external scripts (I'm not sure lazy evaluation is useful in Terraform, though).
Recipes become less readable, because each logic block needs its own script file.
When overriding commands from other modules, it's not clear what the CWD of the executed scripts is.
It doesn't fit the Go mindset of one unified scripting environment instead of a heterogeneous one. Lua as a baseline scripting language would result in more reusable Terraform modules.

As for my solution, I'm not sure allowing Lua to be embedded in _any_ value is the best option. On the other hand, your solution would solve most use cases as well, but could result in less readable Terraform files. Maybe allowing Lua only in a variable context would be the best of both worlds (no external scripts, arbitrary custom logic, named execution blocks instead of anonymous ones, and a known execution environment).

variable "ami" {
  command = "
    if ${var.atlas_enabled} then
      return ${atlas_artifact.machine.id}
    else
      return ${var.ami_string}
    end
  "
}

resource "aws_instance" "foo" {
  ami = "${var.ami}"
}

A helper like io.popen could run external scripts and capture their output if necessary anyway.

variable "author" {
  command = "return io.popen('git config --global user.email'):read('*l')"
}

sheerun on 24 Feb 2016

I thought about it, and maybe the best solution is to introduce the concept of "controllers", i.e. scripts that accept some inputs in JSON format and return the desired modifications in .hcl or .json format.

For example, one could write the following controller script:

#!/usr/bin/env ruby
# Reads the declared inputs as JSON on stdin and prints HCL on stdout.
require 'json'
state = JSON.parse(ARGF.read)

if state['atlas_enabled'] == '1'
  puts 'variable "ami" { default = "${atlas_artifact.machine.id}" }'
else
  puts 'variable "ami" { default = "${var.ami_string}" }'
end

that we could use in Terraform in the following way:

controller "${path.module}/controller" {
  atlas_enabled = "${var.atlas_enabled}"
}

The script is executed each time one of the declared variables changes (in this case ${var.atlas_enabled}).

The script's stdout is incorporated into the current Terraform plan.
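The "re-run only when a declared input changes" rule could be sketched like this in Python. `Controller` and `run_script` are hypothetical names; the stored SHA-256 fingerprint of the JSON-encoded inputs stands in for whatever Terraform would persist in tfstate, and the runner is injectable so the logic can be tried without spawning a process.

```python
import hashlib
import json
import subprocess
from typing import Callable, Dict, Optional


def run_script(path: str, payload: str) -> str:
    """Run a controller with the JSON payload on stdin and return its stdout."""
    result = subprocess.run(
        [path], input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout


class Controller:
    """Caches a controller's output until its declared inputs change."""

    def __init__(self, path: str, runner: Callable[[str, str], str] = run_script) -> None:
        self.path = path
        self.runner = runner
        self.fingerprint: Optional[str] = None
        self.output = ""

    def render(self, inputs: Dict[str, str]) -> str:
        # sort_keys makes the fingerprint independent of key order.
        payload = json.dumps(inputs, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest != self.fingerprint:
            self.output = self.runner(self.path, payload)
            self.fingerprint = digest
        return self.output
```

`check=True` makes a failing controller raise `CalledProcessError` instead of silently contributing empty HCL to the plan.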

sheerun on 2 Mar 2016

Note that this would probably require some kind of merging logic so that controllers can change the configuration of existing resources... unless we allow controllers at any level, like so:

resource "aws" "something" {
  static = "property"
  controller = "bash ${path.module}/controller ${var.servers_count}"
}

echo "dynamic_property = \"$(curl -s "https://discovery.etcd.io/new?size=$1")\""

sheerun on 2 Mar 2016

@mitchellh I may be wrong (very new to Terraform) but I don't think either of your use cases handles something like:

resource "aws_autoscaling_group" "my_asg" {
<% unless ['t2.nano', 't2.micro'].include? var.servers.size -%>
  placement_group = "my_placement_group"
  # Other stuff
<% end -%>
}

Sorry for the ERB, I don't know any Lua. :wink:

Here, a placement group is only valid if the server size is larger than t2.micro. Specifying placement_group _at all_ in either of those cases (t2.micro or t2.nano) throws an AWS API error.

thegranddesign on 12 Mar 2016

I feel like _allowing_ but not _requiring_ the .tf files to be passed through a templating engine prior to being processed is ultimately flexible, allows most users to easily understand what's going on (they're probably used to it from working on their favorite web framework), minimally impacts the Terraform codebase (it's just a build pipeline step (I could be very, very wrong about this :wink:)), and still allows the Terraform team to add declarative syntax where it makes sense.
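As a concrete example of templating outside Terraform, a small Python pre-processor could emit the `aws_autoscaling_group` block from the ERB example above and leave out `placement_group` for the instance types said to reject it. `render_asg` and `NO_PLACEMENT_GROUP` are hypothetical names, and the type list is taken from the comment above, not from AWS documentation.

```python
# Instance types that, per the comment above, reject a placement_group.
NO_PLACEMENT_GROUP = ("t2.nano", "t2.micro")


def render_asg(name: str, instance_type: str, placement_group: str) -> str:
    """Return HCL for an aws_autoscaling_group, omitting placement_group when unsupported."""
    lines = ['resource "aws_autoscaling_group" "' + name + '" {']
    if instance_type not in NO_PLACEMENT_GROUP:
        lines.append('  placement_group = "' + placement_group + '"')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The output would be written to a `.tf` file before running `terraform plan`. The values are not escaped, so a name containing a double quote would produce invalid HCL.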

thegranddesign on 12 Mar 2016

I've worked with systems that use templating for their input configs in this way. I won't name them. I found the experience to be awful. I think we can build something better than that. Plus, if you're templating, why not just write a pre-processor on your own?

I think we can find a way to solve all these problems without that as a solution. Thanks for the input, though.

mitchellh on 12 Mar 2016

@mitchellh Ok, cool. Let me give you a general feeling then, comparing Terraform with a templating-based tool I've recently started learning.

I recently started using SaltStack. In my experience thus far, there isn't a single thing that I haven't been able to do. Some of those things took a bit of research, but they were doable. What I was left with was a modular and easy to read configuration where it was clear what was going on.

Like anything, if you do it poorly, you're going to get an unreadable mess, but that's not the fault of using a templating engine.

And although I'm enjoying myself and feel there's no contest between Terraform and CloudFormation in terms of enjoyability, I can't say that I've been able to do everything I have wanted to do in Terraform. There are things that I simply cannot do, especially in terms of readability and modularization.

I think that by not using a templating engine, the Terraform team is going to be required to add logic to Terraform to handle use cases that would be taken care of out of the box with a templating engine.

I guess what I'm saying is: maybe the problem of conditional resources is solved by some new syntax, and then conditional attributes is handled by some new syntax, and then another use case pops up that no one thought of that requires some new syntax, etc, etc.

A templating engine gives the power to your users so that they can get done what they need to get done, even if Terraform doesn't natively support it. A solution built into Terraform requires official support.

Sorry for the long-winded answer. :)

thegranddesign on 12 Mar 2016


@mitchellh I definitely see what you're saying. I hope you can see from my perspective that, while Terraform core is looking for the "right" solution, the users of Terraform have stuff to get done. You may have the liberty of being more patient than someone who vouched for using Terraform at their company (or to a client) and then hit a wall where they're not able to do what they need to.

Adding support for a templating engine allows your users a lot of flexibility to get stuff done, while waiting for Terraform core to decide if and when they want to support an official feature (like declarative resource inclusion). At which point, once it's added, they can choose whether to rip out their templated solution and replace it with the new shiny.

thegranddesign on 12 Mar 2016

Adding something to the core means we can't remove it without severely hurting backwards compatibility. All the while, for users that _need_ this to get things done, what you're proposing is a simple Ruby script that anyone can write, and doesn't need to burden core with backwards compat concerns.

mitchellh on 12 Mar 2016

@mitchellh this is your show man. You just asked for some perspective from the other side so I wanted to give you mine. :grinning:

thegranddesign on 12 Mar 2016

Yes, and I thanked you above (and still thank you). I just want to make sure that not only you but anyone else who reads this thread understands the motivation. It is important!

mitchellh on 12 Mar 2016

👍2

Hi all! Sorry for the long silence here.

As you may have seen, we recently released Terraform v0.12, which is the culmination of a big set of foundational changes to the configuration language to improve its usability and composability while retaining its declarative essence.

After lots of thought and prototyping, we concluded that investing in Terraform's own language is a better answer than embedding another language interpreter. The tight integration of the Terraform language and the Terraform Core engine help to ensure that all parts of Terraform are using the same concepts, terminology, and type system, and allows Terraform to produce helpful contextual error messages in more cases as opposed to simply exposing opaque runtime errors from another language runtime.

The Terraform language is designed around the principles of declarative programming and intended to read like a manifest of what exists (or should exist) rather than like a sequence of instructions. A consequence of that design foundation is that some familiar constructs from imperative programming languages are not available, and in particular computation is pushed to the "edges" of the configuration (attribute values) rather than supporting control structures as wrappers as we see in imperative languages.

With that said, we do intend to continue to invest in the Terraform language to support additional use-cases, while retaining the language's declarative essence. We don't plan to integrate any other languages -- particularly imperative scripting languages -- directly into the Terraform language, because that is likely to hurt readability and predictability, and introduce a number of incongruent or overlapping concepts. (For example, Lua has an idea of local which is distinct from Terraform's idea of local.)

Regarding the two use-cases described in the original issue:

Use-case 1 is expressed in a pretty generic way so it's tough to identify a specific Terraform feature that it would relate to. However, the idea of data sources was introduced in Terraform 0.7 as a way to incorporate external data into a Terraform configuration without bringing objects under full Terraform management, so the given example can now be handled using the http data source:

data "http" "example" {
  url = "https://discovery.etcd.io/new?size=${var.leaders + var.followers}"
}

We also introduced the external data source as a way to include results from outside programs in a Terraform configuration without writing a full Terraform provider. As I write this, its interface is currently restricted by the limitations of the Terraform 0.11-compatible provider SDK, but we plan to generalize it to support arbitrary JSON-serializable data in a future release once the Terraform v0.11 provider protocol has been phased out.
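For reference, a program used with the external data source reads a JSON object (the data source's `query` argument) on stdin and must print a JSON object on stdout; with the SDK limitation mentioned above, every value in that object has to be a string. The Python sketch below assumes a hypothetical query with `leaders` and `followers` counts and returns a `size` string. `handle` and `run` are illustrative names; a real script would call `run(sys.stdin, sys.stdout)`.

```python
import json
from typing import Dict, TextIO


def handle(query: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical logic: add two counts that arrive as strings."""
    size = int(query["leaders"]) + int(query["followers"])
    return {"size": str(size)}


def run(stdin: TextIO, stdout: TextIO) -> None:
    """Read the query object from stdin and write the result object to stdout."""
    query = json.load(stdin)
    result = handle(query)
    # The result must be a flat JSON object whose values are all strings.
    if not all(isinstance(value, str) for value in result.values()):
        raise ValueError("every result value must be a string")
    json.dump(result, stdout)
```

In the configuration, a `data "external"` block would list the interpreter and script path in its `program` argument, and the value would be read from its `result` attribute. An uncaught exception makes the script exit with a nonzero status, which the data source reports as an error.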

Use-case 2 can be addressed using the operators built in to the Terraform language, whose behavior has been generalized and improved in Terraform 0.12. For the specific example shown, it could be expressed like this:

resource "aws_instance" "machine" {
  ami = var.atlas_enabled ? atlas_artifact.machine.id : var.ami_string
}

Because we do not intend to integrate a Lua interpreter into Terraform as proposed here, I'm going to close this out.

We hesitated to close this for a long time because we were loath to close it without being able to show both of the use-cases being addressed, but because use-case 1 is so broad it's tough to define when it's "done". If you have a use-case that cannot be met with the current language features and available data sources across providers, please do feel free to open a new Feature Request issue!

Thanks for the great discussion on this before, and sorry again that this sat in silence for so long.

apparentlymart on 6 Jun 2019

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.