Terraform: aws_rds_cluster does not honor availability_zones parameter

Created on 4 Nov 2015  ·  15 Comments  ·  Source: hashicorp/terraform

Hi,

It seems that the resource 'aws_rds_cluster' does not honor the availability_zones specified when creating.

See this example. First I execute the plan and as you can see it will create 'aws_rds_cluster.database' with 2 availability_zones configured ("eu-west-1b" and "eu-west-1a"):

$ terraform plan -module-depth=-1
[...]
+ aws_db_subnet_group.db_subnet_group
    description:           "" => "Subnet group for DB test98"
    name:                  "" => "test98_subnet_group"
    subnet_ids.#:          "" => "2"
    subnet_ids.2422430863: "" => "subnet-5bd0843e"
    subnet_ids.3500990608: "" => "subnet-5983ff2e"
    tags.#:                "" => "2"
    tags.Name:             "" => "test98_subnet_group"

+ aws_rds_cluster.database
    apply_immediately:                 "" => "<computed>"
    availability_zones.#:              "" => "2"
    availability_zones.1924028850:     "" => "eu-west-1b"
    availability_zones.3953592328:     "" => "eu-west-1a"
    cluster_identifier:                "" => "test98"
    cluster_members.#:                 "" => "<computed>"
    database_name:                     "" => "mydb"
    db_subnet_group_name:              "" => "${aws_db_subnet_group.db_subnet_group.id}"
    endpoint:                          "" => "<computed>"
    engine:                            "" => "<computed>"
    master_password:                   "" => "XXXXXXXX"
    master_username:                   "" => "root"
    port:                              "" => "<computed>"
    vpc_security_group_ids.#:          "" => "1"
    vpc_security_group_ids.1287702605: "" => "sg-12345678"


Plan: 2 to add, 0 to change, 0 to destroy.
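
For reference, the configuration behind this plan looks roughly like the following (values taken from the plan output above; the master password is redacted and the second tag is omitted):

  resource "aws_db_subnet_group" "db_subnet_group" {
    name        = "test98_subnet_group"
    description = "Subnet group for DB test98"
    subnet_ids  = ["subnet-5bd0843e", "subnet-5983ff2e"]

    tags = {
      Name = "test98_subnet_group"
    }
  }

  resource "aws_rds_cluster" "database" {
    cluster_identifier     = "test98"
    availability_zones     = ["eu-west-1a", "eu-west-1b"]
    database_name          = "mydb"
    master_username        = "root"
    master_password        = "XXXXXXXX"
    db_subnet_group_name   = "${aws_db_subnet_group.db_subnet_group.id}"
    vpc_security_group_ids = ["sg-12345678"]
  }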

Then I apply, and it seems it honored the availability_zones setting:

$ terraform apply
aws_db_subnet_group.db_subnet_group: Creating...
  description:           "" => "Subnet group for DB test98"
  name:                  "" => "test98_subnet_group"
  subnet_ids.#:          "" => "2"
  subnet_ids.2422430863: "" => "subnet-5bd0843e"
  subnet_ids.3500990608: "" => "subnet-5983ff2e"
  tags.#:                "" => "2"
  tags.Name:             "" => "test98_subnet_group"
aws_db_subnet_group.db_subnet_group: Creation complete
aws_rds_cluster.database: Creating...
  apply_immediately:                 "" => "<computed>"
  availability_zones.#:              "" => "2"
  availability_zones.1924028850:     "" => "eu-west-1b"
  availability_zones.3953592328:     "" => "eu-west-1a"
  cluster_identifier:                "" => "test98"
  cluster_members.#:                 "" => "<computed>"
  database_name:                     "" => "mydb"
  db_subnet_group_name:              "" => "test98_test_subnet_group"
  endpoint:                          "" => "<computed>"
  engine:                            "" => "<computed>"
  master_password:                   "" => "XXXXXXXX"
  master_username:                   "" => "root"
  port:                              "" => "<computed>"
  vpc_security_group_ids.#:          "" => "1"
  vpc_security_group_ids.1287702605: "" => "sg-12345678"
aws_rds_cluster.database: Creation complete

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

But after the resources are created, if I execute plan again, this is what it says:

$ terraform plan -module-depth=-1
[...]
-/+ aws_rds_cluster.database
    apply_immediately:                 "" => "<computed>"
    availability_zones.#:              "3" => "2" (forces new resource)
    availability_zones.1924028850:     "eu-west-1b" => "eu-west-1b" (forces new resource)
    availability_zones.3953592328:     "eu-west-1a" => "eu-west-1a" (forces new resource)
    availability_zones.94988580:       "eu-west-1c" => ""
    cluster_identifier:                "test98" => "test98"
    cluster_members.#:                 "0" => "<computed>"
    database_name:                     "mydb" => "mydb"
    db_subnet_group_name:              "test98_subnet_group" => "test98_subnet_group"
    endpoint:                          "test98-test.cluster-123456789012.eu-west-1.rds.amazonaws.com" => "<computed>"
    engine:                            "aurora" => "<computed>"
    master_password:                   "XXXXXXXX" => "XXXXXXXX"
    master_username:                   "root" => "root"
    port:                              "3306" => "<computed>"
    vpc_security_group_ids.#:          "1" => "1"
    vpc_security_group_ids.1287702605: "sg-12345678" => "sg-12345678"


Plan: 1 to add, 0 to change, 1 to destroy.

And what is worse, if I execute plan again it tries to destroy the 'aws_rds_cluster.database' resource and create it again, with the same result as before. So it enters a loop.

bug provider/aws

All 15 comments

@jordiclariana hey, I'm just beginning with terraform myself, but I noticed in your configuration that you pointed the cluster's db_subnet_group_name to what seems like the Name tag of the db_subnet_group. I'm not sure if this can work, but in my case I pointed it to "${aws_db_subnet_group.my_subnet_group.id}" and don't seem to have this issue.

edit: nevermind, I think I misread it

Hi @maxim,

What you saw was Terraform's resolution of my_subnet_group.id, and it should be alright. I still think this is a bug, only in the availability_zones parameter.

Ok I just switched down from 3 to 2 AZs for my whole RDS setup, and I'm seeing the same issue. Destroying and recreating the entire thing from scratch (including the VPC) doesn't help solve it.

As far as I understand, availability_zones are determined by the combined availability_zone values of the subnets in your subnet_group. That is, if you're using a VPC-style setup.

See relevant RDS docs
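
For illustration, in a VPC-style setup those AZs effectively come from the subnets themselves; a minimal sketch (the VPC reference and CIDR blocks here are hypothetical):

  resource "aws_subnet" "db_a" {
    vpc_id            = "${aws_vpc.main.id}"   # hypothetical VPC
    cidr_block        = "10.0.1.0/24"
    availability_zone = "eu-west-1a"
  }

  resource "aws_subnet" "db_b" {
    vpc_id            = "${aws_vpc.main.id}"
    cidr_block        = "10.0.2.0/24"
    availability_zone = "eu-west-1b"
  }

  resource "aws_db_subnet_group" "db_subnet_group" {
    name        = "test98_subnet_group"
    description = "Subnet group for DB test98"
    # The group's effective AZs are eu-west-1a and eu-west-1b, taken from its subnets.
    subnet_ids  = ["${aws_subnet.db_a.id}", "${aws_subnet.db_b.id}"]
  }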

@pixeleet if that's so, then availability_zones shouldn't be parameterizable, right? All in all it is a little bit confusing, but from my point of view, if you define a set of availability_zones they should always be honored.

As far as I understand, availability_zones are for EC2-Classic-type setups and subnets are for VPC-type setups.

See:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-vpc.html#differences-ec2-classic-vpc

Oh hey just noticed this issue after working on #5418 today.

This does seem to be an AWS API problem - check out the behavior here:

https://gist.github.com/phinze/34c87dc974e70681a41a

I've reached out to AWS Support about this and will report back with what I hear from them. :+1:

In the meantime, it's best to omit availability_zones from aws_rds_cluster and let AWS populate them with the full set of AZs from your region, which it always seems to do.

Still going back and forth with AWS Support on a thread - after some talking past each other, I believe we're understanding each other now and nearly have a conclusion. Will report back with my findings soon.

@phinze any updates on the upstream API issue?

Edit: I was being bitten by this too, but it appears that a reliable workaround is to eliminate the availability_zones parameter when in a VPC and just rely on the subnet group.

Hi folks,

Apologies that I got pulled away from this one! Here's what I learned from the support conversation w/ AWS last March:

First, DB Subnet Groups _imply_ AZs and when you specify a DB Subnet Group you should not specify AZs explicitly.

Second, DB Subnet Groups w/ subnets in only 2 AZs trigger some AWS-side behavior w/ Aurora clusters:

DB Subnet Groups require at least two subnets for creating DB instances, but clusters require three AZs, because an Aurora cluster's backend storage has to be distributed among three subnets, as my colleague mentioned.

Now, with that said, when you create a cluster and your DB Subnet Group has only the minimum of two subnets while a cluster requires three, it will randomly select a third AZ from that region. To avoid that, you can specify three subnets in three different AZs in your DB Subnet Group; that way, when you create a cluster (specifying the DB Subnet Group that spans three AZs), it will end up using the three AZs you specified in your DB Subnet Group.

Here's a further clarifying Q/A on the availability_zones parameter:

Q: Is the AvailabilityZones parameter on CreateDBCluster ever useful to set explicitly, or is it best to always leave it blank and let the AWS API either (a) populate it using the AZs of the specified DB subnet group or (b) select randomly from AZs in the current region?

Answer: Both (a) and (b) are correct descriptions of the AWS API, based on what we have discussed so far. As for the AvailabilityZones option on CreateDBCluster, it is an alternative to using a DB Subnet Group to specify the AZs where you would like your DB instances deployed. Meaning, the AvailabilityZones option lets you specify all the Availability Zones in which you want the DB instances inside this cluster created. But in your case, I would recommend specifying the DB Subnet Group option when creating a cluster so you can also use the same DB Subnet Group when creating instances. That way, you will know you have consistency in your workflow when creating both clusters and instances.

This command below will create a db cluster using the db subnet group specified which has three AZs: "us-east-1e", "us-east-1d" and "us-east-1c"
$ aws rds create-db-cluster --db-cluster-identifier foobar --engine aurora --master-username user --master-user-password eightcharacters --db-subnet-group-name case1689207481

and it is similar in functionality to this:

$ aws rds create-db-cluster --db-cluster-identifier foobar --engine aurora --master-username user --master-user-password eightcharacters --availability-zones "us-east-1e" "us-east-1d" "us-east-1c"

but the cluster created with the second command would be associated with the "default" DB subnet group (since one wasn't specified), thus I recommend using the first command for consistent results.

So - in summary - I think _use db_subnet_group_name and let availability_zones be computed_ is going to generally result in correct behavior.
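
A sketch of that recommendation (the three subnet IDs here are hypothetical placeholders, one per AZ):

  resource "aws_db_subnet_group" "db_subnet_group" {
    name        = "test98_subnet_group"
    description = "Subnet group for DB test98"
    subnet_ids  = ["subnet-aaaaaaaa", "subnet-bbbbbbbb", "subnet-cccccccc"]
  }

  resource "aws_rds_cluster" "database" {
    cluster_identifier   = "test98"
    database_name        = "mydb"
    master_username      = "root"
    master_password      = "XXXXXXXX"
    db_subnet_group_name = "${aws_db_subnet_group.db_subnet_group.id}"
    # availability_zones is intentionally omitted; it is computed from the
    # subnet group's AZs, so repeated plans stay clean.
    vpc_security_group_ids = ["sg-12345678"]
  }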

Given this conclusion, I'm going to tentatively close the issue. But anybody who is still experiencing problems is welcome to follow up and we can reopen and investigate further. 👍 ❤️

@phinze I was experiencing the same issue, and your explanation resolved the problem:

  resource "aws_rds_cluster" "rdscluster" {
    cluster_identifier = "aurora-cluster"
-   availability_zones = ["us-west-2a","us-west-2c"]
+   # Don't specify AZs (below) as this causes the cluster to be recreated
+   # every time. It will be computed from the AZs in the subnet group automatically.
+   #availability_zones = ["us-west-2a","us-west-2c"]
    # ... (remaining arguments unchanged)
  }

Thanks!

Second, DB Subnet Groups w/ subnets in only 2 AZs trigger some AWS-side behavior w/ Aurora clusters

I disagree...

For reference, while using a subnet group with 3 AZs is recommended, it is still perfectly acceptable to use just two.

The underlying cluster storage volumes will still be placed across 3 AZs (and this is not changeable), but the instances that the cluster launches will be restricted to the AZs defined in the subnet group.

Don't confuse the underlying volumes with the instances that expose access to them!

Is there any way to tell AWS/terraform to provision both the main node and all replicas in the same availability zone? I don't want to pay for data transfer charges in between availability zones but right now terraform provisions the nodes in any of the availability zones that are inferred by the subnet group.
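
One possible way to do that (a sketch with hypothetical identifiers) is to set availability_zone on each aws_rds_cluster_instance so the writer and replicas all land in the same AZ; the cluster's storage volume is still replicated across three AZs regardless:

  resource "aws_rds_cluster_instance" "nodes" {
    count              = 3
    identifier         = "aurora-node-${count.index}"
    cluster_identifier = "${aws_rds_cluster.database.id}"
    instance_class     = "db.r3.large"
    # Pin every instance (writer and replicas) to a single AZ.
    availability_zone  = "eu-west-1a"
  }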

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
