Terraform-provider-aws: Aurora RDS Global cluster cannot be destroyed without manually removing both primary and secondary cluster from Global cluster

Created on 22 Mar 2020  ·  6 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.24

Affected Resource(s)

  • aws_rds_global_cluster

Terraform Configuration Files

resource "aws_rds_global_cluster" "global" {
  provider                  = aws.pri
  global_cluster_identifier = "test-cluster"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  storage_encrypted         = false
}

resource "aws_rds_cluster" "primary_cluster" {
  provider                  = aws.pri
  availability_zones              = var.primary_availability_zones
  cluster_identifier              = var.cluster_name
  database_name                   = "test"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  engine_mode     = "global"
  global_cluster_identifier       = aws_rds_global_cluster.global.id
  master_password                 = "test"
  master_username                 = "test"
  skip_final_snapshot             = true
  storage_encrypted               = false
  vpc_security_group_ids          = [var.primary_vpc]
}

resource "aws_rds_cluster_instance" "primary" {
  provider                  = aws.pri
  cluster_identifier           = aws_rds_cluster.primary_cluster.id
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  identifier                   = "${var.db_instance_name}"
  instance_class               = "db.r4.large"
}

resource "aws_rds_cluster" "secondary_cluster" {
  provider                        = aws.sec
  apply_immediately               = var.cluster_apply_change_immediately
  availability_zones              = var.sec_az
  cluster_identifier              = "${var.cluster_name}-sec"
  db_subnet_group_name            = "default-vpc-test"
  depends_on                      = [aws_rds_cluster_instance.primary]
  engine                    = "aurora-mysql"
  engine_mode     = "global"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  global_cluster_identifier       = aws_rds_global_cluster.global.id
  skip_final_snapshot             = true
  storage_encrypted               = false
  vpc_security_group_ids          = [var.test_sec_groups]
}

resource "aws_rds_cluster_instance" "secondary" {
  provider                     = aws.sec
  cluster_identifier           = aws_rds_cluster.secondary_cluster.id
  db_subnet_group_name         = "default-vpc-test123"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  identifier                   = "test_sec-sec"
  instance_class               = "db.r4.large"
  publicly_accessible          = false
}

Expected Behavior

Global cluster is destroyed without user intervention.

Actual Behavior

Terraform complains that the cluster is part of a global cluster:

Error: error deleting RDS Cluster (####): InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
    status code: 400, request id: 5f4f348f-15ea-4f82-9229-586497e8dd9c

After that, both the primary and secondary clusters have to be removed manually from the global cluster, and terraform destroy has to be re-run.

Steps to Reproduce

  1. Run terraform destroy.
  2. Wait for Terraform to complain.
  3. Go to the AWS Console and remove the secondary cluster from the global cluster, then wait for it to be removed (the SDK sketch after this list shows the equivalent API call).
  4. Remove the primary cluster from the global cluster.
  5. Run terraform destroy again.
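
For reference, the manual detachment in steps 3 and 4 corresponds to the RDS RemoveFromGlobalCluster API call. A minimal sketch using aws-sdk-go v1 (the region and cluster ARN below are placeholders, not values from this issue):

package main

import (
  "log"

  "github.com/aws/aws-sdk-go/aws"
  "github.com/aws/aws-sdk-go/aws/session"
  "github.com/aws/aws-sdk-go/service/rds"
)

func main() {
  // Detach the secondary cluster first, then repeat for the primary,
  // mirroring the console steps above. Identifiers are placeholders.
  sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-west-2")}))
  conn := rds.New(sess)

  _, err := conn.RemoveFromGlobalCluster(&rds.RemoveFromGlobalClusterInput{
    GlobalClusterIdentifier: aws.String("test-cluster"),
    // DbClusterIdentifier expects the ARN of the member cluster.
    DbClusterIdentifier: aws.String("arn:aws:rds:us-west-2:123456789012:cluster:test-cluster-sec"),
  })
  if err != nil {
    log.Fatalf("removing cluster from global cluster: %v", err)
  }
}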

Some insight

From the debug output I can see that engine_mode is set to provisioned even though I declared it as global. I also noticed that global_cluster_identifier is empty even though it is explicitly set.

            "engine_mode": "provisioned",
            "engine_version": "5.7.mysql_aurora.2.07.1",
            "final_snapshot_identifier": null,
            "global_cluster_identifier": "",

I believe the following code, which is supposed to fetch and save the global cluster when engine_mode is global, is not executed, because engine_mode is never updated from its default value of provisioned. As a result, global_cluster_identifier stays an empty string:

https://github.com/terraform-providers/terraform-provider-aws/blob/737b2cf2c46a763f2b071d5bb84c2fb885388207/aws/resource_aws_rds_cluster.go#L1043
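
For context, here is a condensed sketch of that gating, based on the linked source at that commit (exact details may differ slightly):

// In resourceAwsRdsClusterRead (sketch): the global cluster lookup is
// gated on EngineMode, so a cluster that AWS now reports as "provisioned"
// never gets global_cluster_identifier populated in state.
d.Set("global_cluster_identifier", "")

if aws.StringValue(dbc.EngineMode) == "global" {
  globalCluster, err := rdsDescribeGlobalClusterFromDbClusterARN(conn, aws.StringValue(dbc.DBClusterArn))
  if err != nil {
    return fmt.Errorf("error reading RDS Global Cluster information for DB Cluster (%s): %s", d.Id(), err)
  }
  if globalCluster != nil {
    d.Set("global_cluster_identifier", globalCluster.GlobalClusterIdentifier)
  }
}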

The following line of code checks that global_cluster_identifier is non-empty and only then builds a RemoveFromGlobalClusterInput; because the attribute is an empty string here, the removal never runs:

https://github.com/terraform-providers/terraform-provider-aws/blob/737b2cf2c46a763f2b071d5bb84c2fb885388207/aws/resource_aws_rds_cluster.go#L1242
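
Condensed, the delete path looks roughly like this (sketch based on the linked source), which is why an empty identifier silently skips the removal:

// In resourceAwsRdsClusterDelete (sketch): the RemoveFromGlobalCluster call
// only happens when global_cluster_identifier is non-empty in state.
if globalClusterID := d.Get("global_cluster_identifier").(string); globalClusterID != "" {
  input := &rds.RemoveFromGlobalClusterInput{
    DbClusterIdentifier:     aws.String(d.Get("arn").(string)),
    GlobalClusterIdentifier: aws.String(globalClusterID),
  }
  if _, err := conn.RemoveFromGlobalCluster(input); err != nil {
    return fmt.Errorf("error removing RDS Cluster (%s) from RDS Global Cluster: %s", d.Id(), err)
  }
}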

bug service/rds

All 6 comments

I was about to open an issue for the same problem. Here is some more information about it, and a potential fix.

Removing RDS Aurora database clusters from RDS Aurora global clusters fails for Aurora versions 1.22 and later and 2.07 and later. Because of this, deleting a database cluster that is part of a global cluster fails, since deletion first requires removing the cluster from the global cluster.

The root cause of the error is a change on the AWS side: the global engine mode has been deprecated. The engine mode of database clusters running versions 1.22 and later or 2.07 and later is reported as provisioned even when the cluster is part of a global cluster. Other engine modes (serverless, parallelquery, and multimaster) might still be reported as before.

Trying to delete a database cluster that is still part of a global cluster fails with the following error:

InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first

To get around this error, the Terraform provider automatically removes a database cluster from the global cluster before deleting it:
https://github.com/terraform-providers/terraform-provider-aws/blob/acce77b1887ce80ea1b0f1e291ee2356aa605cd8/aws/resource_aws_rds_cluster.go#L1257-L1271

However, for efficiency this code is only invoked when the database cluster is part of a global cluster, and the provider infers that membership from the value of EngineMode. That relation is indirect, and it no longer holds.

Proposed Fix

Populate global_cluster_identifier for database clusters with EngineMode == "global" or EngineMode == "provisioned", and let the rdsDescribeGlobalClusterFromDbClusterARN function return a blank value for database clusters that are not part of a global cluster, if it doesn't already do that.

<   if aws.StringValue(dbc.EngineMode) == "global" {
---
>   if aws.StringValue(dbc.EngineMode) == "global" || aws.StringValue(dbc.EngineMode) == "provisioned" {

Got AWS to update their documentation to reflect the change.
https://docs.aws.amazon.com/cli/latest/reference/rds/describe-db-clusters.html

Note
The global engine mode only applies for global database clusters created with Aurora MySQL version 5.6.10a. For higher Aurora MySQL versions, the clusters in a global database use provisioned engine mode. To check if a DB cluster is part of a global database, use DescribeGlobalClusters instead of checking the EngineMode return value from DescribeDBClusters.
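
A sketch of that recommended approach with aws-sdk-go v1: page through DescribeGlobalClusters and look for a member whose ARN matches, instead of trusting EngineMode. The function name here is hypothetical; this mirrors what the provider's rdsDescribeGlobalClusterFromDbClusterARN helper is meant to do:

func globalClusterForDBClusterARN(conn *rds.RDS, dbClusterARN string) (*rds.GlobalCluster, error) {
  var match *rds.GlobalCluster

  // Page through every global cluster in the account/region and look for
  // a member whose DBClusterArn matches the cluster we care about.
  err := conn.DescribeGlobalClustersPages(&rds.DescribeGlobalClustersInput{},
    func(page *rds.DescribeGlobalClustersOutput, lastPage bool) bool {
      for _, gc := range page.GlobalClusters {
        for _, member := range gc.GlobalClusterMembers {
          if aws.StringValue(member.DBClusterArn) == dbClusterARN {
            match = gc
            return false // stop paging once found
          }
        }
      }
      return !lastPage
    })

  // match stays nil when the cluster is not part of any global cluster.
  return match, err
}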

I get the same problem with Aurora PostgreSQL.

The fix for this has been merged and will release with version 2.59.0 of the Terraform AWS Provider, later today. 👍

This has been released in version 2.59.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
