Terraform-provider-aws: Terraform seems to ignore "skip_final_snapshot" for rds cluster

Created on 8 Dec 2017 · 26 comments · Source: hashicorp/terraform-provider-aws

_This issue was originally opened by @andrei-dascalu as hashicorp/terraform#16861. It was migrated here as a result of the provider split. The original body of the issue is below._


Hello,

I'm not sure whether the following is a Terraform or AWS provider issue, but it certainly seems like a bug: the request behaves as if "skip_final_snapshot" were never set (or always false), since it expects a "final_snapshot_identifier" (and even when one is provided, the error claims it is missing). I've linked a similar issue as well.

Thanks!

Terraform Version

Terraform v0.11.1
+ provider.aws v1.5.0

Terraform Configuration Files

variable "db_password" {
  description = "Database password"
}

variable "db_name" {
  description = "Database name"
}

resource "aws_rds_cluster_instance" "example" {
  count              = 2
  engine             = "aurora"
  identifier         = "aurora-cluster-demo-${count.index}"
  cluster_identifier = "${aws_rds_cluster.example.id}"
  instance_class     = "db.t2.small"
}

resource "aws_rds_cluster" "example" {
  cluster_identifier = "aurora-cluster-demo"
  availability_zones = ["eu-west-1a", "eu-west-1b"]
  database_name      = "${var.db_name}"
  master_username    = "admin"
  master_password    = "${var.db_password}"
  skip_final_snapshot = true
  final_snapshot_identifier = "whatever"
}

Debug Output


https://gist.github.com/andrei-dascalu/9eaaa3491b5ad77f7e4f8c4f11dc960a

Expected Behavior


RDS Cluster with 2 instances created

Actual Behavior


Error:
1 error(s) occurred:

  • aws_rds_cluster.example (destroy): 1 error(s) occurred:

  • aws_rds_cluster.example: RDS Cluster FinalSnapshotIdentifier is required when a final snapshot is required

Steps to Reproduce


run: terraform apply

References

  • hashicorp/terraform#5417
bug service/rds upstream-terraform

All 26 comments

From the debug output, skip_final_snapshot was set to false in the beginning, and you wanted to change it to true and add new instances to the cluster at the same time?

2017/12/06 20:29:29 [TRACE] DiffTransformer: Module: DESTROY/CREATE: aws_rds_cluster.example
  apply_immediately:               "" => "<computed>"
  availability_zones.#:            "3" => "2" (forces new resource)
  availability_zones.1924028850:   "eu-west-1b" => "eu-west-1b"
  availability_zones.3953592328:   "eu-west-1a" => "eu-west-1a"
  availability_zones.94988580:     "eu-west-1c" => "" (forces new resource)
  backup_retention_period:         "1" => "1"
  cluster_identifier:              "aurora-cluster-demo" => "aurora-cluster-demo"
  cluster_identifier_prefix:       "" => "<computed>"
  cluster_members.#:               "0" => "<computed>"
  cluster_resource_id:             "cluster-CEPPSOIGTTLSDYOFWR33MK4NAM" => "<computed>"
  database_name:                   "mydb" => "test" (forces new resource)
  db_cluster_parameter_group_name: "default.aurora5.6" => "<computed>"
  db_subnet_group_name:            "default" => "<computed>"
  endpoint:                        "aurora-cluster-demo.cluster-czvdbs8joxz1.eu-west-1.rds.amazonaws.com" => "<computed>"
  engine:                          "aurora" => "aurora"
  engine_version:                  "5.6.10a" => "<computed>"
  final_snapshot_identifier:       "" => "whatever"
  kms_key_id:                      "" => "<computed>"
  master_password:                 "<sensitive>" => "<sensitive>" (attribute changed)
  master_username:                 "admin" => "admin"
  port:                            "3306" => "<computed>"
  preferred_backup_window:         "23:03-23:33" => "<computed>"
  preferred_maintenance_window:    "mon:04:40-mon:05:10" => "<computed>"
  reader_endpoint:                 "aurora-cluster-demo.cluster-ro-czvdbs8joxz1.eu-west-1.rds.amazonaws.com" => "<computed>"
  skip_final_snapshot:             "false" => "true"
  storage_encrypted:               "false" => "false"
  vpc_security_group_ids.#:        "1" => "<computed>"

@loivis that actually seems to capture the problem. Even though the .tf code sets skip_final_snapshot = true, the state still holds false, so the plan shows "false" => "true"; the destroy step, however, appears to honor the stale state value rather than the user's specification.

I just took a quick look at the code and the only place I could find where that variable was being set to true was here.

I have also encountered this bug with RDS read replicas, for which final snapshots do not even apply.

This bug still exists (trying to destroy a Postgres RDS instance, Terraform v0.11.7).

It seems to manifest if the 'skip_final_snapshot' parameter was missing when the instance was created and was only added later. If 'skip_final_snapshot' was specified from the beginning, everything works correctly.
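Concretely, a configuration that carries the attribute from the very first apply destroys cleanly. A minimal sketch (the identifier, engine, and credentials are illustrative, not taken from the original report):

resource "aws_db_instance" "example" {
  identifier          = "example-postgres"
  engine              = "postgres"
  instance_class      = "db.t2.micro"
  allocated_storage   = 20
  username            = "exampleuser"
  password            = "change-me-please"   # illustrative only
  skip_final_snapshot = true                 # present from the first apply onward
}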

Alright, so on a normal RDS instance I've found a way to easily replicate this bug and a proposed fix.

Replication:

  1. Spin up an RDS instance with Terraform (with skip_final_snapshot = false and no name set for snapshot)
  2. Either set or change the identifier on the database.
  3. Attempt to apply.
  4. It will complain that it cannot do this since the instance requires a final snapshot and there is no name set.
  5. Change the identifier back to the original value.
  6. Set skip_final_snapshot = true.
  7. Apply. This will set that variable in the state file.
  8. Change the identifier.
  9. Apply. It will now properly destroy and recreate the instance.

Suggested fix:

If possible, it seems like a good idea to check if skip_final_snapshot = true via the config and NOT just the state. This would prevent the song and dance seen above.

Right now, it is going solely by the state, so if you end up with skip_final_snapshot recorded as false, the only way out is to set it to true and apply before making the destructive change (short of modifying the state file manually, which is bad practice). The two applies are sketched below.
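A sketch of that two-apply sequence, with illustrative names (this is the reporter's workaround, not an official fix):

# Apply 1: keep the original identifier and flip only skip_final_snapshot,
# so that "true" is recorded in the state file.
resource "aws_db_instance" "example" {
  identifier          = "original-name"
  engine              = "mysql"
  instance_class      = "db.t2.micro"
  allocated_storage   = 20
  username            = "exampleuser"
  password            = "change-me-please"   # illustrative only
  skip_final_snapshot = true
}

# Apply 2: only now change the identifier (forces a new resource); the destroy
# half of the replacement passes because the state already says
# skip_final_snapshot = true.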

Hit this also, please fix :)

Even though I have set skip_final_snapshot = "true", I am getting the error below:

RDS Cluster FinalSnapshotIdentifier is required when a final snapshot is required

Terraform v0.11.3

  • provider.aws v1.13.0

@rupsray See my comment here

https://github.com/terraform-providers/terraform-provider-aws/issues/2588#issuecomment-396299734

This should allow you to work around the issue.

If you examine your state file, you'll likely notice that skip_final_snapshot is not set to true there, and Terraform attempts the destroy before the corrected value ever reaches the state file.

This appears to still be broken over a year later.

Not only is it still broken, there is another issue with using skip_final_snapshot = false.

If I set final_snapshot_identifier to a constant name (e.g., mydb-final-snapshot), terraform destroy fails if a snapshot with that name already exists. While I understand why, there doesn't seem to be a good way around this limitation other than manually deleting the previous final snapshot or changing the snapshot name every time you run terraform destroy. Sigh...
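One way around the name collision (an untested sketch, not an official workaround; it assumes the hashicorp/random provider) is to derive the snapshot name from a random_id resource. Since random_id is destroyed together with the instance and recreated on the next apply, each destroy gets a fresh snapshot name:

resource "random_id" "final_snapshot" {
  byte_length = 4
}

resource "aws_db_instance" "example" {
  identifier                = "mydb"
  engine                    = "mysql"
  instance_class            = "db.t2.micro"
  allocated_storage         = 20
  username                  = "exampleuser"
  password                  = "change-me-please"   # illustrative only
  skip_final_snapshot       = false
  final_snapshot_identifier = "mydb-final-${random_id.final_snapshot.hex}"
}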

I hate to say this, but I keep running into issues that have been open for 2, 3, 4, or 5 years and aren't fixed.

They really need to do something about prioritizing issues based on how long they've been open.

Is it they or us? Do they accept patches? Anyone willing to take these on?

Since this only affects RDS instances that were created without the final snapshot policy enforced, I don't know if this is a bug so much as a good place for a notice that you can't change this option with TF. I would think that if someone wanted it protected from the start, they wouldn't want an accidental change to some TF configuration to make it unprotected.

I just tried to reproduce the original issue with the latest AWS provider and couldn't. @rpatrick00 I'm going to take a look at your issue tonight.

Okay, I can reproduce it and see what's going on here. For starters, skip_final_snapshot defaults to false, which should also require final_snapshot_identifier to be set, but it doesn't; so the create/update is applied and the state records skip_final_snapshot = false with a null final_snapshot_identifier. This causes the destroy operation to fail its validation stage.

This can be fixed, but I don't really have a great story for those who already have preexisting state. One possibility would be for the delete operation to ignore skip_final_snapshot if the identifier is null. Another would be to default final_snapshot_identifier to something random whenever skip_final_snapshot is false, whether set explicitly or by default. For data-safety reasons, I think ignoring skip_final_snapshot when final_snapshot_identifier is null is a bad idea; it would be better to randomize an identifier.

Thanks very much for taking the time to look into this.

A randomized ID seems safest to me, if it is a UUID or base32'ed UUID or something guaranteed to be random.

Is this issue resolved or not? I am using Terraform v0.12.23 and still facing it. I tried some workarounds, but they are not working.

I opened a PR last year, and it looks like there are now conflicting changes. I'll update the PR, but I can only hope this gets more notice.

still an issue? lol

I'm using:

Terraform v0.12.24
+ provider.aws v2.57.0

And this is still an issue! Any plans to fix this bug?

Same here:

Terraform v0.12.24
+ provider.aws v2.58.0

I believe I have reproduced the same issue using Terraform 0.13-beta2:

Terraform v0.13.0-beta2
+ provider registry.terraform.io/hashicorp/aws v2.67.0

After adding skip_final_snapshot = true to the configuration, I still had skip_final_snapshot = false in the state file. I edited that to true as well, and then I could delete the DB instance.
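The supported alternative to hand-editing the state file is to apply the attribute change on its own and only then destroy. A sketch using the cluster from the original report (the password is illustrative):

resource "aws_rds_cluster" "example" {
  cluster_identifier  = "aurora-cluster-demo"
  master_username     = "admin"
  master_password     = "change-me-please"   # illustrative only
  skip_final_snapshot = true                 # apply this change first
}

# After the apply succeeds, the state carries skip_final_snapshot = true,
# so a subsequent terraform destroy passes the pre-destroy validation.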

I ran into the same issue here with Terraform 0.12.26. Any solutions for this?

In my case I encountered this issue when trying to destroy my RDS resources.
Below are the steps I took to solve it (a combined configuration sketch follows the list):

  1. Change skip_final_snapshot to true and remove final_snapshot_identifier if exists.

  2. Remove backup_window (under AWS Aurora it's probably called preferred_backup_window).

  3. Change backup_retention_period to 0.

  4. Make sure that apply_immediately is set to true.

(*) A more detailed answer is given here.

(**) Running under: Terraform v0.12.28, provider.aws v2.70.0
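Putting the four steps together for a plain RDS instance gives roughly the following (a sketch; names and credentials are illustrative, and for an Aurora cluster the argument is preferred_backup_window instead of backup_window):

resource "aws_db_instance" "example" {
  identifier              = "example-db"
  engine                  = "mysql"
  instance_class          = "db.t2.micro"
  allocated_storage       = 20
  username                = "exampleuser"
  password                = "change-me-please"   # illustrative only
  skip_final_snapshot     = true   # step 1: and no final_snapshot_identifier set
  backup_retention_period = 0      # step 3: disables automated backups
  apply_immediately       = true   # step 4
  # step 2: backup_window removed entirely
}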

Terraform 0.13.5, and RDS instance deletion is still not as clean and graceful as it should be.
