Terraform: Allow terraform resources to be data driven from CSV files

Created on 22 Sep 2017 · 5 Comments · Source: hashicorp/terraform

Terraform all versions

Terraform is great (kudos, praise etc.), but if you have to work with a large number of similarly typed resources, the Terraform approach involves a lot of boilerplate and copy-and-paste, and is quite suboptimal. Take aws_security_group_rule as an example: it takes 19 specific firewall rules just to get a Windows server cooperating with a domain controller. Having to write that many resource declarations to accomplish a simple firewall set got me thinking that, for some things, there is far more code to write than with the way we used to manage firewalls.

What would really help here is the ability to specify a CSV file and data-drive a number of Terraform resources from it. I'm imagining something like this:

resource "aws_security_group_rule" "windows_server_rules" {
    security_group_id = "${aws_security_group.windows_server.id}"
    data_set = "${file("./windows.fw.csv")}"
}

and then a nice CSV like:

type,protocol,from,to,cidr_blocks,comment
ingress,icmp,8,0,10.0.0.0/8,ICMP echo request
egress,icmp,8,0,10.0.0.0/8,ICMP echo request
ingress,tcp,135,135,10.1.1.0/24,RPC domain controllers
ingress,tcp,1024,65535,10.1.1.0/24,RPC dynamic domain controllers
egress,tcp,389,389,10.1.1.0/24,LDAP domain controllers
... etc.

This makes auditing your firewall rules a lot easier (in CSV format) and much more understandable, as you don't have to repeat the boilerplate of a Terraform resource block for each one. There is no reason that ${var.foo} and other ${} references to resources could not be included in the CSV.

This then got me thinking that the same approach would work for instances, EBS volumes, anything really, and would allow for a substantially reduced codebase, which is ++++good.

Can this already be achieved in Terraform using lists and maps? Not really. Although you could define lists or maps to cover all of the above and use count to iterate through them, it would be kind of clunky, and there is a significant problem: if you decide to delete a resource from the middle of the list (e.g. number 10 in your list of 19 firewall rules), terraform plan/apply will want to destroy or change every rule after the one you removed, because it tracks these resources by index rather than by key.

I'm not sure whether anybody has proposed this before; I couldn't find anything. Anything that reduces the vast amount of Terraform boilerplate code would be great.
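To make the index problem concrete, here is a minimal sketch of a count-driven rule list. It is written in the later v0.12+ expression syntax, and the variable name `rule_ports`, the resource name `dc` and the security group ID are placeholders rather than anything from a real configuration:

```hcl
# Hypothetical input list; each element becomes one rule.
variable "rule_ports" {
  type    = list(number)
  default = [135, 389, 445]
}

resource "aws_security_group_rule" "dc" {
  count = length(var.rule_ports)

  security_group_id = "sg-1234" # placeholder ID
  type              = "ingress"
  protocol          = "tcp"
  from_port         = var.rule_ports[count.index]
  to_port           = var.rule_ports[count.index]
  cidr_blocks       = ["10.1.1.0/24"]
}

# Resulting addresses: dc[0] -> 135, dc[1] -> 389, dc[2] -> 445.
# Removing 389 from the list leaves [135, 445], so dc[1] now describes
# port 445 and dc[2] no longer exists. Terraform plans a change to dc[1]
# and the destruction of dc[2], even though only one rule was removed.
```

With 19 rules, removing rule 10 would touch every rule from index 9 onwards in the same way.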

config enhancement

Most helpful comment

Hi all! Sorry for the long silence here.

For Terraform v0.12 we've added a csvdecode function to parse a CSV string and return a list of maps of strings. I tried the following test configuration on Terraform v0.12.0-alpha1, based on the example in the opening comment on this issue:

provider "aws" {
  region = "us-west-2"
}

locals {
  security_group_rules = csvdecode(file("${path.module}/security-group-rules.csv"))
}

resource "aws_security_group_rule" "windows_server_rules" {
  count = length(local.security_group_rules)

  security_group_id = "sg-1234"
  type              = local.security_group_rules[count.index].type
  protocol          = local.security_group_rules[count.index].protocol
  from_port         = local.security_group_rules[count.index].from
  to_port           = local.security_group_rules[count.index].to
  cidr_blocks       = [local.security_group_rules[count.index].cidr_blocks]
  description       = local.security_group_rules[count.index].comment
}

I tried that with the CSV file given in the opening comment:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_security_group_rule.windows_server_rules[0] will be created
  + resource "aws_security_group_rule" "windows_server_rules" {
      + cidr_blocks              = [
          + "10.0.0.0/8",
        ]
      + description              = "ICMP echo request"
      + from_port                = 8
      + id                       = (known after apply)
      + protocol                 = "icmp"
      + security_group_id        = "sg-1234"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 0
      + type                     = "ingress"
    }

  # aws_security_group_rule.windows_server_rules[1] will be created
  + resource "aws_security_group_rule" "windows_server_rules" {
      + cidr_blocks              = [
          + "10.0.0.0/8",
        ]
      + description              = "ICMP echo request"
      + from_port                = 8
      + id                       = (known after apply)
      + protocol                 = "icmp"
      + security_group_id        = "sg-1234"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 0
      + type                     = "egress"
    }

  # aws_security_group_rule.windows_server_rules[2] will be created
  + resource "aws_security_group_rule" "windows_server_rules" {
      + cidr_blocks              = [
          + "10.1.1.0/24",
        ]
      + description              = "RPC domain controllers"
      + from_port                = 135
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-1234"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 135
      + type                     = "ingress"
    }

  # aws_security_group_rule.windows_server_rules[3] will be created
  + resource "aws_security_group_rule" "windows_server_rules" {
      + cidr_blocks              = [
          + "10.1.1.0/24",
        ]
      + description              = "RPC dynamic domain controllers"
      + from_port                = 1024
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-1234"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 65535
      + type                     = "ingress"
    }

  # aws_security_group_rule.windows_server_rules[4] will be created
  + resource "aws_security_group_rule" "windows_server_rules" {
      + cidr_blocks              = [
          + "10.1.1.0/24",
        ]
      + description              = "LDAP domain controllers"
      + from_port                = 389
      + id                       = (known after apply)
      + protocol                 = "tcp"
      + security_group_id        = "sg-1234"
      + self                     = false
      + source_security_group_id = (known after apply)
      + to_port                  = 389
      + type                     = "egress"
    }

Plan: 5 to add, 0 to change, 0 to destroy.
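One detail worth noting in that plan: csvdecode returns every field as a string, yet from_port and to_port appear as numbers because Terraform converts numeric strings automatically when the argument expects a number. If you would rather make the conversion explicit, a sketch along these lines should work; the local name `typed_rules` is made up for illustration:

```hcl
locals {
  security_group_rules = csvdecode(file("${path.module}/security-group-rules.csv"))

  # Normalize each CSV row into the shape the resource arguments expect.
  # tonumber() fails with an error if a port column is not numeric.
  typed_rules = [
    for rule in local.security_group_rules : {
      type        = rule.type
      protocol    = rule.protocol
      from_port   = tonumber(rule.from)
      to_port     = tonumber(rule.to)
      cidr_blocks = [rule.cidr_blocks]
      description = rule.comment
    }
  ]
}
```

The resource block can then read `local.typed_rules[count.index].from_port` and so on, which keeps the column-name mapping in one place.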

Since for_each on resources (#17179) is not yet included in v0.12.0 we need to do this with count for now, which has the usual consequences if a new record is introduced into the CSV anywhere except the end. #17179 will come in a later release, which will allow use-cases like this to be addressed more robustly.
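For readers on a later release where resource for_each is available (it shipped in v0.12.6), the same CSV can be keyed by something other than its row position. The following is only a sketch: the local name `rules_by_key` and the choice of key are assumptions, and the derived key is only suitable if no two rows share the same type, protocol, ports and CIDR block:

```hcl
locals {
  security_group_rules = csvdecode(file("${path.module}/security-group-rules.csv"))

  # Build a map keyed by a string derived from the rule's own fields.
  # Assumption: this combination is unique per row. Terraform reports an
  # error if a for expression that builds a map produces duplicate keys.
  rules_by_key = {
    for rule in local.security_group_rules :
    "${rule.type}-${rule.protocol}-${rule.from}-${rule.to}-${rule.cidr_blocks}" => rule
  }
}

resource "aws_security_group_rule" "windows_server_rules" {
  for_each = local.rules_by_key

  security_group_id = "sg-1234"
  type              = each.value.type
  protocol          = each.value.protocol
  from_port         = each.value.from
  to_port           = each.value.to
  cidr_blocks       = [each.value.cidr_blocks]
  description       = each.value.comment
}
```

Because each instance is addressed by its key rather than by an index, inserting or deleting a row in the middle of the CSV should only affect the instance for that row.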

Since that for_each functionality is already covered by another issue, and we've addressed the core request here of using CSV data in Terraform configuration, I'm going to close this out. Thanks again for this feature request, and thanks for your patience while we laid the groundwork to make this possible.

All 5 comments

You could create a module whose sole purpose is to have an output of the lists/maps you need to drive your SG settings. Then have another module use that one to actually create the basic firewall settings that you need which can be included with any new instance provisioning.
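A rough sketch of that layout, assuming v0.12+ syntax (which handles lists of objects in outputs) and a hypothetical module path `./modules/fw-data`; the output name and rule values are illustrative only:

```hcl
# modules/fw-data/outputs.tf (hypothetical data-only module)
output "windows_server_rules" {
  value = [
    { type = "ingress", protocol = "tcp", from = 135, to = 135, cidr = "10.1.1.0/24" },
    { type = "egress", protocol = "tcp", from = 389, to = 389, cidr = "10.1.1.0/24" },
  ]
}

# main.tf of the consuming module
module "fw_data" {
  source = "./modules/fw-data"
}

resource "aws_security_group_rule" "windows_server_rules" {
  count = length(module.fw_data.windows_server_rules)

  security_group_id = "sg-1234" # placeholder ID
  type              = module.fw_data.windows_server_rules[count.index].type
  protocol          = module.fw_data.windows_server_rules[count.index].protocol
  from_port         = module.fw_data.windows_server_rules[count.index].from
  to_port           = module.fw_data.windows_server_rules[count.index].to
  cidr_blocks       = [module.fw_data.windows_server_rules[count.index].cidr]
}
```

This centralizes the rule data, but since it still relies on count it has the same index-shifting behaviour described in the opening comment.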

Hi @gtmtech! Thanks for this suggestion.

We're currently working on improvements to the configuration language that make it easier to support data structures and parsing functions, such as a jsondecode function. My first instinct here would be to also add a csvdecode function that returns a list of objects based on the rows in the CSV, and then couple that with either today's support for count on a resource, or the later planned foreach meta-argument that will make that easier to use.

With all of that in place, this could look like the following:

# DESIGN SKETCH: not currently implemented

resource "aws_security_group_rule" "windows_server_rules" {
  foreach = "${csvdecode(file("${path.module}/windows.fw.csv"))}"

  security_group_id = "${aws_security_group.windows_server.id}"

  type     = "${foreach.value.type}"
  protocol = "${foreach.value.protocol}"
  # (etc)
}

This foreach feature is also intended to help address the problem you mentioned where adding and removing items in the middle of the list causes everything else to "shift". In my above example that would still be true, but this could be avoided by putting a unique key field in the CSV file to use as the identifier for each resource, rather than using the list index as the key.
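As a rough illustration of the key-field idea, assume the CSV gains a column named `key` holding a unique, stable name for each row (a hypothetical addition, for example a row starting `rpc-in,ingress,tcp,135,135,...`). In the syntax that was eventually released, the meta-argument is spelled for_each and the current element is each.value, rather than foreach and foreach.value as in the design sketch:

```hcl
locals {
  # Assumes a "key" column holding a unique, stable name for each row.
  rules = {
    for row in csvdecode(file("${path.module}/windows.fw.csv")) :
    row.key => row
  }
}

resource "aws_security_group_rule" "windows_server_rules" {
  for_each = local.rules

  security_group_id = aws_security_group.windows_server.id

  type     = each.value.type
  protocol = each.value.protocol
  # (etc)
}
```

Each rule is then tracked under its own key, so reordering or removing rows in the CSV does not shift the other rules.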

We have some work to do before we can implement this function, but that work is already in progress and I think the incremental work to implement this additional csvdecode function is pretty small once we've reached the point that jsondecode can also be implemented.

We would probably not directly allow interpolation within a CSV file, since that creates some tricky problems with the dependency graph, but it would in principle be possible to use the template_file data source to _template_ a CSV file, particularly if we were to also add a helper function to create safely-quoted/escaped values to ensure that interpolations don't produce invalid CSV.
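A minimal sketch of the templating idea, using the templatefile function that later became available in v0.12 rather than the template_file data source mentioned above. The template file name `windows.fw.csv.tpl` and the variable `dc_cidr` are hypothetical:

```hcl
# windows.fw.csv.tpl (hypothetical template) might contain:
#   type,protocol,from,to,cidr_blocks,comment
#   ingress,tcp,135,135,${dc_cidr},RPC domain controllers

variable "dc_cidr" {
  type    = string
  default = "10.1.1.0/24"
}

locals {
  # Render the template first, then parse the rendered text as CSV.
  security_group_rules = csvdecode(
    templatefile("${path.module}/windows.fw.csv.tpl", {
      dc_cidr = var.dc_cidr
    })
  )
}
```

This sketch does no quoting or escaping, so a substituted value containing a comma or a double quote would produce invalid or misaligned CSV; that is the gap the helper function mentioned above would need to fill.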

I came up with a "not quite perfect" way to consume CSV files and create resources from them using either an external data source, or a null data source with heavy interpolation. The writeup is here along with working sample code if anyone is interested in trying something that could work until this is more properly resolved.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
