Terraform-provider-aws: Aurora RDS Global cluster cannot be destroyed without manually removing both primary and secondary cluster from Global cluster

Created on 22 Mar 2020  ·  6 Comments  ·  Source: hashicorp/terraform-provider-aws

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.24

Affected Resource(s)

  • aws_rds_global_cluster

Terraform Configuration Files

resource "aws_rds_global_cluster" "global" {
  provider                  = aws.pri
  global_cluster_identifier = "test-cluster"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  storage_encrypted         = false
}

resource "aws_rds_cluster" "primary_cluster" {
  provider                  = aws.pri
  availability_zones              = var.primary_availability_zones
  cluster_identifier              = var.cluster_name
  database_name                   = "test"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  engine_mode     = "global"
  global_cluster_identifier       = aws_rds_global_cluster.global.id
  master_password                 = "test"
  master_username                 = "test"
  skip_final_snapshot             = true
  storage_encrypted               = false
  vpc_security_group_ids          = [var.primary_vpc]
}

resource "aws_rds_cluster_instance" "primary" {
  provider                  = aws.pri
  cluster_identifier           = aws_rds_cluster.primary_cluster.id
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  identifier                   = "${var.db_instance_name}"
  instance_class               = "db.r4.large"
}

resource "aws_rds_cluster" "secondary_cluster" {
  provider                        = aws.sec
  apply_immediately               = var.cluster_apply_change_immediately
  availability_zones              = var.sec_az
  cluster_identifier              = "${var.cluster_name}-sec"
  db_subnet_group_name            = "default-vpc-test"
  depends_on                      = [aws_rds_cluster_instance.primary]
  engine                    = "aurora-mysql"
  engine_mode     = "global"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  global_cluster_identifier       = aws_rds_global_cluster.global.id
  skip_final_snapshot             = true
  storage_encrypted               = false
  vpc_security_group_ids          = [var.test_sec_groups]
}

resource "aws_rds_cluster_instance" "secondary" {
  provider                     = aws.sec
  cluster_identifier           = aws_rds_cluster.secondary_cluster.id
  db_subnet_group_name         = "default-vpc-test123"
  engine                    = "aurora-mysql"
  engine_version            = "5.7.mysql_aurora.2.07.1"
  identifier                   = "test_sec-sec"
  instance_class               = "db.r4.large"
  publicly_accessible          = false
}

Expected Behavior

Global cluster is destroyed without user intervention.

Actual Behavior

Terraform complains that the cluster is part of a global cluster:

Error: error deleting RDS Cluster (####): InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first
    status code: 400, request id: 5f4f348f-15ea-4f82-9229-586497e8dd9c

After that, both the primary and secondary clusters have to be removed manually from the global cluster, and terraform destroy has to be re-run.

Steps to Reproduce

  1. Run terraform destroy.
  2. Wait for Terraform to complain.
  3. Go to the AWS Console and remove the secondary cluster from the global cluster, then wait for it to be removed (the SDK sketch after this list shows the equivalent API call).
  4. Remove the primary cluster from the global cluster.
  5. Run terraform destroy again.
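
For reference, the manual detachment in steps 3 and 4 corresponds to the RDS RemoveFromGlobalCluster API call. A minimal sketch using aws-sdk-go v1 (the region and cluster ARN below are placeholders, not values from this issue):

package main

import (
  "log"

  "github.com/aws/aws-sdk-go/aws"
  "github.com/aws/aws-sdk-go/aws/session"
  "github.com/aws/aws-sdk-go/service/rds"
)

func main() {
  // Detach the secondary cluster first, then repeat for the primary,
  // mirroring the console steps above. Identifiers are placeholders.
  sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-west-2")}))
  conn := rds.New(sess)

  _, err := conn.RemoveFromGlobalCluster(&rds.RemoveFromGlobalClusterInput{
    GlobalClusterIdentifier: aws.String("test-cluster"),
    // DbClusterIdentifier expects the ARN of the member cluster.
    DbClusterIdentifier: aws.String("arn:aws:rds:us-west-2:123456789012:cluster:test-cluster-sec"),
  })
  if err != nil {
    log.Fatalf("removing cluster from global cluster: %v", err)
  }
}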

Some insight

From the debug output I can see that engine_mode is set to provisioned even though I declared it as global. I also noticed that global_cluster_identifier is empty even though it is explicitly set.

            "engine_mode": "provisioned",
            "engine_version": "5.7.mysql_aurora.2.07.1",
            "final_snapshot_identifier": null,
            "global_cluster_identifier": "",

I believe the following code, which is supposed to fetch and save the global cluster when engine_mode is global, is not executed, because engine_mode is never updated from its default value of provisioned. As a result, global_cluster_identifier stays an empty string:

https://github.com/terraform-providers/terraform-provider-aws/blob/737b2cf2c46a763f2b071d5bb84c2fb885388207/aws/resource_aws_rds_cluster.go#L1043
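
For context, here is a condensed sketch of that gating, based on the linked source at that commit (exact details may differ slightly):

// In resourceAwsRdsClusterRead (sketch): the global cluster lookup is
// gated on EngineMode, so a cluster that AWS now reports as "provisioned"
// never gets global_cluster_identifier populated in state.
d.Set("global_cluster_identifier", "")

if aws.StringValue(dbc.EngineMode) == "global" {
  globalCluster, err := rdsDescribeGlobalClusterFromDbClusterARN(conn, aws.StringValue(dbc.DBClusterArn))
  if err != nil {
    return fmt.Errorf("error reading RDS Global Cluster information for DB Cluster (%s): %s", d.Id(), err)
  }
  if globalCluster != nil {
    d.Set("global_cluster_identifier", globalCluster.GlobalClusterIdentifier)
  }
}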

The following line of code checks that global_cluster_identifier is non-empty and only then builds a RemoveFromGlobalClusterInput; because the attribute is an empty string here, the removal never runs:

https://github.com/terraform-providers/terraform-provider-aws/blob/737b2cf2c46a763f2b071d5bb84c2fb885388207/aws/resource_aws_rds_cluster.go#L1242
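
Condensed, the delete path looks roughly like this (sketch based on the linked source), which is why an empty identifier silently skips the removal:

// In resourceAwsRdsClusterDelete (sketch): the RemoveFromGlobalCluster call
// only happens when global_cluster_identifier is non-empty in state.
if globalClusterID := d.Get("global_cluster_identifier").(string); globalClusterID != "" {
  input := &rds.RemoveFromGlobalClusterInput{
    DbClusterIdentifier:     aws.String(d.Get("arn").(string)),
    GlobalClusterIdentifier: aws.String(globalClusterID),
  }
  if _, err := conn.RemoveFromGlobalCluster(input); err != nil {
    return fmt.Errorf("error removing RDS Cluster (%s) from RDS Global Cluster: %s", d.Id(), err)
  }
}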

bug service/rds

All 6 comments

I was about to open an issue for the same problem. Here is some more information about it, and a potential fix.

Removing RDS Aurora database clusters from RDS Aurora global clusters fails for Aurora versions 1.22 and later and 2.07 and later. Because of this, deleting a database cluster that is part of a global cluster fails, since deletion first requires removing the cluster from the global cluster.

The root cause of the error is a change on the AWS side: the global engine mode has been deprecated. The engine mode of database clusters running versions 1.22 and later or 2.07 and later is reported as provisioned even when the cluster is part of a global cluster. Other engine modes (serverless, parallelquery, and multimaster) might still be reported as before.

Trying to delete a database cluster that is still part of a global cluster fails with the following error:

InvalidDBClusterStateFault: This cluster is a part of a global cluster, please remove it from globalcluster first

To get around this error, the Terraform provider automatically removes a database cluster from the global cluster before deleting it:
https://github.com/terraform-providers/terraform-provider-aws/blob/acce77b1887ce80ea1b0f1e291ee2356aa605cd8/aws/resource_aws_rds_cluster.go#L1257-L1271

However, for efficiency this code is only invoked when the database cluster is part of a global cluster, and the provider infers that membership from the value of EngineMode. That relation is indirect, and it no longer holds.

Proposed Fix

Populate global_cluster_identifier for database clusters with EngineMode == "global" or EngineMode == "provisioned", and let the rdsDescribeGlobalClusterFromDbClusterARN function return a blank value for database clusters that are not part of a global cluster, if it doesn't already do that.

<   if aws.StringValue(dbc.EngineMode) == "global" {
---
>   if aws.StringValue(dbc.EngineMode) == "global" || aws.StringValue(dbc.EngineMode) == "provisioned" {

Got AWS to update their documentation to reflect the change.
https://docs.aws.amazon.com/cli/latest/reference/rds/describe-db-clusters.html

Note
The global engine mode only applies for global database clusters created with Aurora MySQL version 5.6.10a. For higher Aurora MySQL versions, the clusters in a global database use provisioned engine mode. To check if a DB cluster is part of a global database, use DescribeGlobalClusters instead of checking the EngineMode return value from DescribeDBClusters.
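
A sketch of that recommended approach with aws-sdk-go v1: page through DescribeGlobalClusters and look for a member whose ARN matches, instead of trusting EngineMode. The function name here is hypothetical; this mirrors what the provider's rdsDescribeGlobalClusterFromDbClusterARN helper is meant to do:

func globalClusterForDBClusterARN(conn *rds.RDS, dbClusterARN string) (*rds.GlobalCluster, error) {
  var match *rds.GlobalCluster

  // Page through every global cluster in the account/region and look for
  // a member whose DBClusterArn matches the cluster we care about.
  err := conn.DescribeGlobalClustersPages(&rds.DescribeGlobalClustersInput{},
    func(page *rds.DescribeGlobalClustersOutput, lastPage bool) bool {
      for _, gc := range page.GlobalClusters {
        for _, member := range gc.GlobalClusterMembers {
          if aws.StringValue(member.DBClusterArn) == dbClusterARN {
            match = gc
            return false // stop paging once found
          }
        }
      }
      return !lastPage
    })

  // match stays nil when the cluster is not part of any global cluster.
  return match, err
}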

I get the same problem with Aurora PostgreSQL.

The fix for this has been merged and will release with version 2.59.0 of the Terraform AWS Provider, later today. 👍

This has been released in version 2.59.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
