Terraform: support for data resources returning multiple entities?

Created on 15 Nov 2016 · 11 Comments · Source: hashicorp/terraform

I would think this would be handy / more succinct. Was this considered?

data "aws_subnet" "public" {
  tag {
    status = "public"
  }
}

resource "aws_alb" "test" {
  subnets = ["${data.aws_subnet.public.*.id}"]
}

enhancement provider/aws

chadgrant

👍30

All 11 comments

Hey @chadgrant – I can't speak to if this suggestion was considered or not, but it does seem useful 😄

It looks like our other data sources have taken a different route and have separate data sources when dealing with multiple, e.g.:

aws_availability_zone
aws_availability_zones

I think the difficulty with the way you've shown is that it becomes hard to work with a specific subnet, in the event you only want to interact with one.

That said, aws_subnets sounds like a logical addition! I can't promise anything, but I'll add it to our list
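
As a rough sketch of the plural/singular split described above: the plural data source returns a list of zone names, and one of those names can be passed to the singular data source to get the details of a single zone. The labels `available` and `first` are made up for this illustration, and the syntax assumes the interpolation style used elsewhere in this thread.

```hcl
# Plural data source: returns the names of the zones in the current region.
data "aws_availability_zones" "available" {}

# Singular data source: returns details for exactly one zone.
# "first" is a hypothetical label chosen for this example.
data "aws_availability_zone" "first" {
  name = "${data.aws_availability_zones.available.names[0]}"
}

output "first_zone_suffix" {
  value = "${data.aws_availability_zone.first.name_suffix}"
}
```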

catsby on 15 Nov 2016

👍3

As a followup: to reply to the general question of data sources returning multiple entities, and not just specifically aws_subnet, I think the granularity we have now is by design. I believe it is unlikely that we would modify existing data sources or encourage future data sources to be both multiple and singular in the same resource.

catsby on 15 Nov 2016

I can speak a little to the motivation here, as the implementer of aws_subnet and other similar data sources:

Terraform currently lacks good support for lists of complex types, so a data source that returned multiple objects (such as subnets, in this case) would end up being hard to use. Thus to solve the "80% case" we went with singleton data sources for now. My intent was that we'd revisit this decision later should Terraform grow better support for working with complex lists.

The specific formulation given in the original comment here is for the data source to act as if an implicit count is present, so that there's a separate resource instance for each object found. This is an interesting idea, and this specific design didn't occur to me, but I think it conflicts with the existing support for count and so it would require some careful thought to implement. Perhaps that is what future support for this would look like, but the future is too cloudy for me to commit to that right now. 😀

The compromise of having both aws_availability_zone (for the details of one) and aws_availability_zones (for the list of names that can be passed into the former) is a bit of an experiment, and it's honestly not incredibly useful yet, until #7762 is resolved to allow the use of the latter to instantiate multiple instances of the former, to get a result similar to what you were proposing.
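
To make that combination concrete, here is a sketch of the shape being described: the plural data source drives `count` on the singular one. Per the comment above, this depended on #7762 being resolved, so treat it as the intended pattern and not as something that worked at the time of writing. The labels `available` and `each` are hypothetical.

```hcl
# List every zone name in the current region.
data "aws_availability_zones" "available" {}

# One instance of the singular data source per zone name.
# Assumption: count may be derived from another data source's result,
# which the comment above says was not yet possible (#7762).
data "aws_availability_zone" "each" {
  count = "${length(data.aws_availability_zones.available.names)}"
  name  = "${data.aws_availability_zones.available.names[count.index]}"
}

output "zone_suffixes" {
  value = ["${data.aws_availability_zone.each.*.name_suffix}"]
}
```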

I do agree that being able to say "for each subnet matching this query..." would be useful, but I think we must get there by implementing some more fundamental core Terraform features first and then seeing how this fits in with those features in place.

apparentlymart on 15 Nov 2016

👍5

I was able to work around this sorta if anyone is interested ...

data "aws_subnet" "privates" {
  availability_zone = "${element(split(",", var.aws_availability_zones), count.index)}"
  vpc_id            = "${data.aws_vpc.current.id}"
  state             = "available"

  tags {
    "Name" = "${var.environment}-private-${element(split(",", var.aws_availability_zones), count.index)}"
  }

  count = "${length(split(",", var.aws_availability_zones))}"
}

resource "aws_autoscaling_group" "service" {
  name                  = "${var.environment_short_name}-${var.application}"

  availability_zones    = ["${split(",", var.aws_availability_zones)}"]
  vpc_zone_identifier   = ["${data.aws_subnet.privates.*.id}"]
}
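
The workaround above relies on a few declarations it does not show. A minimal sketch of what they might look like follows. The zone names, the `staging` value, and the VPC tag are all assumptions for illustration. What the workaround does require is that `aws_availability_zones` be a single comma-separated string, since it is passed to `split(",", ...)`.

```hcl
# Comma-separated string, because the workaround calls split(",", ...) on it.
# The zone names here are placeholder values.
variable "aws_availability_zones" {
  default = "us-east-1a,us-east-1b,us-east-1c"
}

# Used to build the subnet Name tag, e.g. "staging-private-us-east-1a".
variable "environment" {
  default = "staging"
}

# The VPC that the subnets are looked up in. Selecting it by a Name tag is
# an assumption. Any arguments that match exactly one VPC would do.
data "aws_vpc" "current" {
  tags {
    Name = "${var.environment}"
  }
}
```

With three zones in the variable, `count` evaluates to 3 and each instance of the data source looks up the one subnet whose Name tag matches its zone.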

chadgrant on 3 Dec 2016

👍25 🎉9 ❤8

+1 to new resources returning a list.
We have a few dozen application services which we manage with the same TF config.
In order to achieve that, we maintain a lookup table which lists all security groups that should be assigned to a specific service. This lookup table is stored in Consul and accessed through the "consul_keys" data source.
We would love to replace it with something like data "aws_security_groups" filtered by tags.
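
A sketch of what that request might look like, for illustration only. The `aws_security_groups` data source is hypothetical in the context of this comment, and both the `tags` argument and the `ids` attribute are assumptions about how such a data source could behave. The `Service` tag key is also made up.

```hcl
variable "application" {}

# Hypothetical plural data source: every security group carrying this tag.
data "aws_security_groups" "service" {
  tags {
    Service = "${var.application}"
  }
}

# Assumed attribute: a flat list of matching security group IDs, which
# could replace the lookup table kept in Consul.
output "service_security_group_ids" {
  value = ["${data.aws_security_groups.service.ids}"]
}
```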

gerilya on 19 Dec 2016

👍3

Thanks @chadgrant, that idea with iterating over the datasource just saved my posterior.

jangrewe on 21 Feb 2017

I've run into a problem where this would help as well. In my case I'm trying to select EBS volumes from a pool of available volumes. @apparentlymart this doesn't need to be some complex rethinking of how we handle data resources. I think this would be fairly simple to address just by adding something like an index parameter, and would solve the use cases above. It wouldn't require the resources to act any differently, and when combined with a count parameter, it would be a powerful way to select multiple individual objects returned from the search.

Imagine this use case. Currently if we use count on a data source and don't change the filter, then all of the data objects end up containing the same data:

Example:

data "aws_ebs_volume" "pool" {
  most_recent = true
  count = "3"
  filter {
    name = "tag:Name"
    values = ["data-volume"]
  }
  filter {
    name = "attachment.status"
    values = ["detached"]
  }
}
output "volumes" { value = ["${data.aws_ebs_volume.pool.*.id}"] }

Gives output like:

Outputs:
volumes = [
    vol-abcdef05,  <--- most recent volume
    vol-abcdef05,  <--- most recent volume, again
    vol-abcdef05   <--- and again
]

But if terraform simply supported an index meta-parameter for data sources, we could select a different entry for each iteration of the data resource.

Example:

data "aws_ebs_volume" "pool" {
  count = "3"
  index = "${count.index}"   # proposed meta-parameter: which entry to return
  filter {
    name = "tag:Name"
    values = ["data-volume"]
  }
  filter {
    name = "attachment.status"
    values = ["detached"]
  }
}
output "volumes" { value = ["${data.aws_ebs_volume.pool.*.id}"] }

Would return outputs like this, (assuming most recent to least recent sorting)

Outputs:
volumes = [
    vol-abcdef05,  <--- most recent volume
    vol-abcdef04,  <--- next most recent volume
    vol-abcdef03   <--- third most recent volume
]

Which makes a lot more sense than returning the same resource over and over again.

I feel like the most_recent = true parameter is already just a shortcut for index = 0 with ordering by most recent. While an index is simple and elegant, there are probably ways to make it even more powerful, like adding an option to wrap similar to the element() function (instead of erroring out when no more results are available), or returning items from the other end of the list by using negative indexes. An option to specify how the list is ordered would be a nice-to-have improvement too, but none of them are essential.

The only thing we're really missing is the ability to specify the index. I'm kind of surprised that something like index isn't already supported for data sources.

Moeser on 16 Mar 2017

@apparentlymart Just following up with my post above. Maybe this is a simple change like the one I linked here: https://github.com/Moeser/terraform/pull/1

Moeser on 8 Apr 2017

This seems to also be useful when gathering resources that were created by a cloudformation stack.

cirocosta on 24 Apr 2017

> This seems to also be useful when gathering resources that were created by a cloudformation stack.

Or as in my case I have to select resources created by kops inside my AWS setup.

soupdiver on 17 Sep 2018

👍2

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.