Terraform-provider-aws: aws_elasticache_replication_group always recreates when cluster_mode replicas_per_node_group is zero

Created on 12 Jun 2018 · 7 comments · Source: hashicorp/terraform-provider-aws

When an aws_elasticache_replication_group uses cluster mode and replicas_per_node_group is 0, the resource wants to recreate every time.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

$ terraform version
Terraform v0.11.7
+ provider.aws v1.22.0

Affected Resource(s)

  • aws_elasticache_replication_group

Terraform Configuration Files

variable "size" {
  default = "1"
}

resource "aws_elasticache_replication_group" "foo" {
  replication_group_id          = "foo"
  replication_group_description = "foo"
  node_type                     = "cache.r3.large"
  automatic_failover_enabled    = "${var.size > 1}"

  cluster_mode {
    num_node_groups         = 1
    replicas_per_node_group = "${var.size - 1}"
  }
}

Debug Output

https://gist.github.com/b-dean/2746b1e47ff03544e3c6e92fcf877af6

Expected Behavior

A second terraform plan should show no changes needed.

Actual Behavior

The plan shows:

-/+ aws_elasticache_replication_group.foo (new resource required)
      id:                                     "foo" => <computed> (forces new resource)
      apply_immediately:                      "" => <computed>
      at_rest_encryption_enabled:             "false" => "false"
      auto_minor_version_upgrade:             "true" => "true"
      automatic_failover_enabled:             "false" => "false"
      cluster_mode.#:                         "0" => "1"
      cluster_mode.0.num_node_groups:         "" => "1"
      cluster_mode.0.replicas_per_node_group: "" => "0" (forces new resource)
      configuration_endpoint_address:         "" => <computed>
      engine:                                 "redis" => "redis"
      engine_version:                         "3.2.10" => <computed>
      maintenance_window:                     "fri:08:30-fri:09:30" => <computed>
      node_type:                              "cache.r3.large" => "cache.r3.large"
      number_cache_clusters:                  "1" => <computed>
      parameter_group_name:                   "default.redis3.2" => <computed>
      primary_endpoint_address:               "foo.27yrhh.ng.0001.use1.cache.amazonaws.com" => <computed>
      replication_group_description:          "foo" => "foo"
      replication_group_id:                   "foo" => "foo"
      security_group_ids.#:                   "0" => <computed>
      security_group_names.#:                 "0" => <computed>
      snapshot_window:                        "06:00-07:00" => <computed>
      subnet_group_name:                      "default" => <computed>
      transit_encryption_enabled:             "false" => "false"


Plan: 1 to add, 0 to change, 1 to destroy.

Steps to Reproduce

  1. terraform apply
  2. terraform plan

Important Factoids

We have the size variable because in our development environments we don't want a bunch of costly replicas, whereas in production we might add -var size=5. I thought that when size is greater than 1 we didn't see this behavior of cluster_mode showing the wrong values from the refresh, but I just ran it with -var size=2 and it wanted to recreate on the second apply, with the same cluster_mode information missing.

bug service/elasticache

All 7 comments

Also seeing this when not set to zero. cluster_mode.# always seems to change to 0.

-/+ aws_elasticache_replication_group.iavs (new resource required)
      id:                                     "REDACTED" => <computed> (forces new resource)
      apply_immediately:                      "" => <computed>
      at_rest_encryption_enabled:             "false" => "false"
      auto_minor_version_upgrade:             "true" => "true"
      automatic_failover_enabled:             "true" => "true"
      cluster_mode.#:                         "0" => "1"
      cluster_mode.0.num_node_groups:         "" => "1"
      cluster_mode.0.replicas_per_node_group: "" => "2" (forces new resource)
      configuration_endpoint_address:         "" => <computed>
      engine:                                 "redis" => "redis"
      engine_version:                         "2.8.24" => "2.8.24"
      maintenance_window:                     "wed:05:00-wed:06:00" => <computed>
      node_type:                              "cache.m4.large" => "cache.m4.large"
      number_cache_clusters:                  "3" => <computed>
      parameter_group_name:                   "default.redis2.8" => "default.redis2.8"
      primary_endpoint_address:               "REDACTED use1.cache.amazonaws.com" => <computed>
      replication_group_description:          "REDACTED" => "REDACTED"
      replication_group_id:                   "REDACTED" => "REDACTED"
      security_group_ids.#:                   "0" => <computed>
      security_group_names.#:                 "0" => <computed>
      snapshot_retention_limit:               "3" => "3"
      snapshot_window:                        "00:00-05:00" => "00:00-05:00"
      subnet_group_name:                      "REDACTED" => "REDACTED"
      transit_encryption_enabled:             "false" => "false"

Same. This is currently a blocking issue for something we want to set up in production.
terraform: 0.10.8
provider: 1.26.0

-/+ module.env.module.elasticache-ha.aws_elasticache_replication_group.redis (new resource required)
      id:                                     "REDACTED" => <computed> (forces new resource)
      apply_immediately:                      "" => <computed>
      at_rest_encryption_enabled:             "true" => "true"
      auth_token:                             <sensitive> => <sensitive> (attribute changed)
      auto_minor_version_upgrade:             "true" => "true"
      automatic_failover_enabled:             "true" => "true"
      cluster_mode.#:                         "0" => "1"
      cluster_mode.0.num_node_groups:         "" => "1"
      cluster_mode.0.replicas_per_node_group: "" => "1" (forces new resource)
      configuration_endpoint_address:         "" => <computed>
      engine:                                 "redis" => "redis"
      engine_version:                         "4.0.10" => "4.0.10"
      maintenance_window:                     "fri:22:30-fri:23:30" => "sun:00:00-sun:03:00"
      member_clusters.#:                      "2" => <computed>
      node_type:                              "cache.m3.large" => "cache.m3.large"
      number_cache_clusters:                  "2" => <computed>
      parameter_group_name:                   "default.redis4.0" => <computed>
      port:                                   "6379" => "6379"
      primary_endpoint_address:               "REDACTED.euw1.cache.amazonaws.com" => <computed>
      replication_group_description:          "Replication group for redis elasticache" => "Replication group for redis elasticache"
      replication_group_id:                   "REDACTED" => "REDACTED"
      security_group_ids.#:                   "1" => "1"
      security_group_ids.1286157895:          "REDACTED" => "REDACTED"
      security_group_names.#:                 "0" => <computed>
      snapshot_window:                        "04:30-05:30" => <computed>
      subnet_group_name:                      "REDACTED" => "REDACTED"
      transit_encryption_enabled:             "true" => "true"

This was working earlier with 0 replicas_per_node_group and 2 num_node_groups.

Inspection of the tfstate shows that "cluster_mode.#": "0" is what ends up in the state file. This is likely the cause.

Edit: I was wrong, statefile says the following:

                            "cluster_mode.#": "1",
                            "cluster_mode.4199676665.num_node_groups": "1",
                            "cluster_mode.4199676665.replicas_per_node_group": "1"

It's really strange to me that Terraform incorrectly goes looking for this information at index 0.

Did some more digging. The problem went away after I reverted the AWS provider to 1.12.0.

@kiwivogel this worked for me as well, thank you very much.

provider "aws" {
  version = "1.12.0"
  }

Can we confirm that this is being worked on?
I would prefer not to pin an older version of the AWS provider when this will be running in production.

@casalewag @jeremygaither @b-dean I figured it out! It's not a bug; it works as it should. The issue here is that Amazon returns "ClusterEnabled": "false" when Terraform compares the state with whatever is running on AWS. This is caused by using the default parameter group, which has ClusterEnabled set to false. It is solvable by passing a parameter group that has cluster mode enabled (either a default one or a custom one). This sadly will force you to recreate the resource, as it's not something you can change after creation. Hope this helps. I'll make a PR to update the documentation to warn users about this.
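
For reference, a minimal sketch of what that fix looks like applied to the original configuration. The parameter group name is an assumption based on the Redis 3.2 engine version shown in the plan output; AWS publishes a cluster-enabled managed default named default.redis3.2.cluster.on:

variable "size" {
  default = "1"
}

resource "aws_elasticache_replication_group" "foo" {
  replication_group_id          = "foo"
  replication_group_description = "foo"
  node_type                     = "cache.r3.large"
  automatic_failover_enabled    = "${var.size > 1}"

  # Cluster-enabled default group (assumes Redis 3.2); without this,
  # the default group reports ClusterEnabled = false on refresh.
  parameter_group_name = "default.redis3.2.cluster.on"

  cluster_mode {
    num_node_groups         = 1
    replicas_per_node_group = "${var.size - 1}"
  }
}

As noted above, switching an existing replication group from a cluster-disabled to a cluster-enabled parameter group forces recreation.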

@casalewag @jeremygaither @b-dean Additionally, if you feel safe doctoring tfstate files (disclaimer: this is generally a very bad idea), you can actually "fix" this by removing the cluster_mode block from the state file and your resource definition, and replacing it with number_cache_clusters = "${var.size}" (or the total number of nodes) in the resource definition; the state file already has the correct number. Additionally, @radeksimko, this issue should probably be closed because it's not a bug but a configuration issue.
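
To illustrate that workaround, a sketch of the reporter's resource with the cluster_mode block dropped in favor of number_cache_clusters (assuming the var.size setup from the original configuration; edit state files at your own risk):

resource "aws_elasticache_replication_group" "foo" {
  replication_group_id          = "foo"
  replication_group_description = "foo"
  node_type                     = "cache.r3.large"
  automatic_failover_enabled    = "${var.size > 1}"

  # Replaces the cluster_mode block: total node count (primary + replicas),
  # which the state file already records correctly.
  number_cache_clusters = "${var.size}"
}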

Thanks @kiwivogel this fixed it for me!

For those looking for a quick fix: if you're using Redis cluster mode, use default.redis5.0.cluster.on as the parameter group name.
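
For example (a hypothetical fragment; the resource name and remaining arguments are placeholders for your existing configuration):

resource "aws_elasticache_replication_group" "example" {
  # ... other arguments as in your existing configuration ...

  # AWS-managed cluster-enabled default parameter group for Redis 5.0
  parameter_group_name = "default.redis5.0.cluster.on"
}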
