Terraform: Update/replace resource when a dependency is changed

Created on 10 Aug 2016 · 45 comments · Source: hashicorp/terraform

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    depends_on = ["foo.bar"]
}

bar.foo is not modified when the file 'foobar' changes, even though foo.bar (the resource that interpolates the file, and which bar.foo depends on) is updated.

Labels: config, enhancement, thinking


All 45 comments

Hi @OJFord,
would you mind providing a more concrete example with real resources that would help us reproduce the unexpected behaviour you described?

Thanks.

@radeksimko please see the referenced issue #6613. This is pretty important and can be hit in other places as well. From my experiments, I observed that depends_on only affects ordering; it does not trigger a change.

Hi @OJFord and @cemo,

In Terraform's design, a dependency edge (which is what depends_on creates explicitly) is used only for _ordering_ operations. So in the very theoretical example given in the issue summary, Terraform knows that when it's doing any operation that affects _both_ foo.bar and bar.foo it will always do the operation to foo.bar first.

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

So in practice this means that the bar.foo in the original example will only get an "update" diff if any of its own attributes are changed. To @radeksimko's point it's hard to give a good example without a real use-case, but the way this would be done is to interpolate some attribute of foo.bar into bar.foo such that an update diff will be created whenever _that attribute_ changes. Note that it's always attribute-oriented... you need to interpolate the specific value that will be changing.
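For illustration, here is that attribute-oriented approach using the placeholder resources from the issue summary (the some_argument name on bar.foo is hypothetical; in practice you would pick a real argument of the dependent resource, ideally one that forces replacement):

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    # Instead of depends_on, reference the value directly. Whenever the file
    # (and therefore foo.bar.foobar) changes, this argument changes too, so
    # bar.foo gets an update diff - or a replacement, if the provider marks
    # the argument as ForceNew.
    some_argument = "${foo.bar.foobar}"
}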

In practice this behavior does cause some trouble on edge cases, and those edge cases are what #4846 and #8769 are about: allowing Terraform to detect the _side effects_ of a given update, such as the version_id on an Amazon S3 object implicitly changing each time its content is updated.

Regarding your connection to that other issue @cemo, you are right that the given issue is another one of these edge cases, though a slightly different one: taking an action (deploying) directly in response to another action (updating some other resource), rather than using attribute-based diffing... though for this API gateway case in particular, since API gateway encourages you to create _a lot_ of resources, the specific syntax proposed there would likely be inconvenient/noisy.

Again as @radeksimko said a specific example from @OJFord might allow us to suggest a workaround for a specific case today, in spite of the core mechanisms I've described above. In several cases we have made special allowances in the design of a resource such that a use-case can be met, and we may be able to either suggest an already-existing one of these to use or design a new "allowance" if we have a specific example to work with. (@cemo's API gateway example is already noted, and there were already discussions about that which I will describe in more detail over there.)

I'm sorry that I never came back with an example; I'm afraid I can't remember exactly what I was doing - but:

I think you are expecting an additional behavior: if there is an update to foo.bar then there will always be an automatic update to bar.foo. But that is not actually how Terraform works, by design: the dependency edges are used for ordering, but the direct attribute values are used for diffing.

is exactly right, that was what I misunderstood.

Perhaps something like taint_on_dependency_change = true is possible? That is, if such a variable is true, change the semantics of "ordering" above from "do this after, if it needs to be done" to "do this after".
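For illustration, a sketch of how that hypothetical flag might read in configuration (nothing like it exists today):

resource "bar" "foo" {
    depends_on = ["foo.bar"]

    lifecycle {
        # Hypothetical flag: replace this resource whenever any of its
        # dependencies is updated or replaced, rather than only ordering
        # operations after them.
        taint_on_dependency_change = true
    }
}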

@OJFord the issue you don't remember might be #6613.

I second @OJFord's proposal and would welcome something simple like taint_on_dependency_change. However, I can't be considered an expert on Terraform, and since this is my first experiment with it my opinion may not carry much weight.

This taint_on_dependency_change idea is an interesting one. I'm not sure I would actually implement it using the tainting mechanism, since that's more of a workflow management thing and indicates that the resource is "broken" in some way, but we could potentially think of it more like replace_on_dependency_change: artificially produce a "force new" diff any time a dependency changes.

I think this sort of thing would likely require some of the machinery from #6810 around detecting the presence of whole-resource diffs and correctly handling errors with them. There are some edge cases around what happens if B depends on A and A is changed but B encounters an error while replacing... since the intended change is not explicitly visible in the attributes, Terraform needs to make sure to do enough book-keeping that it knows it has more work to do when run again after the error is resolved.

It might work out conceptually simpler to generalize the triggers idea from null_resource or keepers from the random provider, so that it can be used on any resource:

resource "foo" "bar" {
    foobar = "${file("foobar")}"
}

resource "bar" "foo" {
    lifecycle {
        replace_on_change {
            foo_bar_foobar = "${foo.bar.foobar}"
        }
    }
}

In the above example, the lifecycle.replace_on_change attribute acts as if it were a resource attribute with "forces new resource" set on it: the arbitrary members of this map are stored in the state, and on each run Terraform will diff what's in the state with what's in the config and generate a "replace" diff if any of them have changed.

This effectively gives you an extra place to represent explicit _value_ dependencies that don't have an obvious home in the resource's own attributes.

This is conceptually simpler because it can build on existing mechanisms and UX to some extent. For example, it might look like this in a diff:

-/+ bar.foo
    lifecycle.replace_on_change.foo_bar_foobar: "old_value" => "new value" (forces new resource)
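For comparison, the triggers mechanism that already exists on null_resource - which this proposal would generalize - looks like this today (a sketch using the placeholder resources from the issue summary):

resource "null_resource" "foobar_watcher" {
    # Any change to a value in this map forces the null_resource to be
    # replaced; its id can then be interpolated into another resource to
    # propagate the change further.
    triggers = {
        foobar = "${foo.bar.foobar}"
    }
}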

In the short term we're likely to continue addressing this by adding special extra ForceNew attributes to resources where such behavior is useful, so that this technique can be used in a less-generic way where it's most valuable. This was what I'd proposed over in #6613, and has the advantage that it can be implemented entirely within a provider without requiring any core changes, and so there's much less friction to get it done. Thus having additional concrete use-cases would be helpful, either to motivate the implementation of a generic feature like above or to prompt the implementation of resource-specific solutions where appropriate.


For the moment I'm going to re-tag this one as "thinking" to indicate that it's an interesting idea but we need to gather more data (real use-cases) in order to design it well. I'd encourage other folks to share concrete use-cases they have in this area as separate issues, similar to what's seen in #6613, and mention this issue by number so that it can become a collection of links to relevant use-cases that can inform further design.

@mitchellh This issue might be considered for the 0.8 release, since you improved depends_on and this might be a quick win.

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = "${var.max_capacity}"
  min_capacity       = "${var.min_capacity}"
  role_arn           = "${var.global_vars["ecs_as_arn"]}"

  resource_id        = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_cpu_scale_in" {
  adjustment_type         = "${var.adjustment_type}"
  cooldown                = "${var.cooldown}"
  metric_aggregation_type = "${var.metric_aggregation_type}"

  name                    = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in"
  resource_id             = "service/${var.global_vars["ecs_cluster_name"]}/${var.ecs_service_name}"
  scalable_dimension      = "ecs:service:DesiredCount"
  service_namespace       = "ecs"

  step_adjustment {
    metric_interval_upper_bound = "${var.scale_in_cpu_upper_bound}"
    scaling_adjustment          = "${var.scale_in_adjustment}"
  }

  depends_on = ["aws_appautoscaling_target.ecs_target"]
}

Hi @apparentlymart,

Here is another real use-case of my own.

The resource aws_appautoscaling_policy.ecs_cpu_scale_in (call it the autoscaling policy) depends on the resource aws_appautoscaling_target.ecs_target (call it the autoscaling target).

When I change the value of max_capacity and then run terraform plan, it shows that the autoscaling target is forced to new (it is going to be destroyed and re-created), but nothing will happen to the autoscaling policy, which is supposed to be destroyed and re-created as well.

Why is it supposed to be? Because in my experience, after terraform apply succeeds (destroying and re-creating the autoscaling target), the autoscaling policy is gone automatically (if you log in to the AWS console, you can see it's gone), so I have to run terraform apply a second time, and this time it will add the autoscaling policy back.

(BTW, both resources are actually defined in a module; maybe that matters, maybe not, I'm not sure.)

Hi @ckyoog! Thanks for sharing that.

What you described there sounds like what's captured in terraform-providers/terraform-provider-aws#240. If you think it's the same thing, it would be cool if you could post the same details in that issue since having a full reproduction case is very useful. I think in your particular case this is a bug that we ought to fix in the AWS provider, though you're right that if the feature I described in my earlier comment were implemented it could in principle be used as a _workaround_.

In the meantime, you might already be able to work around this by including an additional interpolation in your policy name to force it to get recreated when the target is recreated:

  name = "${var.global_vars["ecs_cluster_name"]}-${var.ecs_service_name}-cpu-scale-in-${aws_appautoscaling_target.ecs_target.id}"

Since the name attribute _forces new resource_, this should cause the policy to get recreated each time the target is recreated.

Thank you @apparentlymart for the workaround. Sure, I will post my case to issue terraform-providers/terraform-provider-aws#240.

Hey, I just got an idea of how this might be solved. The approach is inspired by Google Cloud and I don't know if it will apply to all use cases.
Basically, in Google Cloud you have the notion of "used by" and "uses" on resources. For example, the link between a boot_disk and an instance: the boot_disk can exist alone as a simple disk, but the instance cannot exist without a boot disk. Therefore, in the data model, you could have a generic system that states used_by.

Example:

resource "google_compute_disk" "bastion_boot" = {
  image = "centos-7"
  size    = "10"
  used_by = ["${google_compute_instance.bastion.name}"]
}

resource "google_compute_instance" "bastion" = {
  boot_disk = {
    source = "${google_compute_disk.bastion_boot.name}"
  }
  uses = ["${google_compute_disk.bastion_boot.name}"]
}

The uses and used_by could be set implicitly in well-known cases, and explicitly in user-specific and/or corner cases. It would become the provider's responsibility to know about the implicit uses, and as a workaround it would be possible to use the explicit form.

It would work like the implicit and explicit depends_on, except in the reverse direction.

Now, I understand that there are some subtle differences among the problems that have been mentioned - for example, "I don't want to destroy, I want to update a resource". I don't know how my case would fit into this.

Also, I think it would be best to stick with the cloud provider's semantics, and in my case it really reflects what I'm doing and how everything works. This system would be a reverse depends_on, creating a possible destruction pass that would be triggered before the create pass. That would be fine in most cases, and if you cannot tolerate a destruction you usually apply a blue-green model anyway, which avoids most of the pain. In my case, during my maintenance windows, I can be destructive on most of my resources.

Just some related issues:

#16065 #16200

I have run into the need for this issue myself.

The use case is the following:

I have a resource for a database instance (In this case an AWS RDS instance) which performs a snapshot of its disk upon destruction. If I destroy this resource and recreate it and destroy it again, AWS returns an error because it will attempt to create a snapshot with the same identifier as before.

This can be mitigated by using something like the "random_id" resource as a suffix/prefix to that identifier. The issue is that if I taint the database resource, I need to manually remember to taint the "random_id" resource as well otherwise the new instance will have the same "random_id" as before.

Attempting to use a "keepers" pointing to the database resource id does not work because it causes a cyclic dependency.
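For illustration, the cyclic shape described above looks roughly like this (a sketch with assumed resource names; attribute names follow the AWS and random provider docs):

resource "random_id" "snapshot_suffix" {
  byte_length = 4

  keepers = {
    # We want a new suffix whenever the database is replaced...
    db_id = "${aws_db_instance.main.id}"
  }
}

resource "aws_db_instance" "main" {
  # (engine, instance_class, etc. omitted)

  # ...but the database also needs the suffix for its final snapshot name,
  # so each resource references the other and Terraform reports a cycle.
  final_snapshot_identifier = "main-final-${random_id.snapshot_suffix.hex}"
}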

Any ideas on how one handles that?

I've run into this same issue with trying to get an EMR cluster to rebuild when the contents of a bootstrap operation change. See https://stackoverflow.com/questions/53887061/in-terraform-how-to-recreate-emr-resource-when-dependency-changes for details.

My concrete case is similar to the ones already discussed. I want to destroy and recreate a disk resource when a VM startup script changes. I really like the described lifecycle.replace_on_change solution, but I wonder if it would work for me. My VM already has a reference to the disk, and the disk would grow a replace-on-change reference to the VM's startup script. Would that cycle be a problem? I can represent the startup script as a separate template-file resource pretty easily, but cycles caused by replace-on-change should either work well or be an error.

The other solution that I thought of looks like so:

resource my-vm { startup-script }
resource my-disk {}

resource null-resource _ {
  triggers {
    startup-script = my-vm.startup-script
  }
  provisioner "taint" {
    resources = ["my-disk"]
  }
}

But I don't know how this would look in tf-plan. You're right that the lifecycle.replace-on-change design leverages some existing patterns nicely.

I frequently run into this problem when I'm using kubernetes provider resources that depend on a module that creates the GKE or EKS cluster. If a configuration change is made that causes the k8s cluster to be destroyed/recreated, obviously all kubernetes resources are lost.

I am running into this problem as well. I have two resources, one of which depends on the other. If I delete the dependent resource outside of terraform, I need BOTH resources to be recreated, but terraform does not know that; it only offers to create the resource that I manually deleted.

It's a chicken-and-egg issue when an outside force modifies the infrastructure.

Another example: rotating an EC2 keypair that is configured on an Elastic Beanstalk environment should trigger a rebuild of the environment.

resource "aws_elastic_beanstalk_environment" "test"
  ...
  setting {
    namespace = "aws:autoscaling:launchconfiguration"
    name      = "EC2KeyName"
    value     = "${aws_key_pair.test.key_name}"
  }
}

Here's another example with Amazon Lightsail: if you recreate an aws_lightsail_instance you will need to recreate the aws_lightsail_static_ip_attachment between the aws_lightsail_instance and the aws_lightsail_static_ip.

resource "aws_lightsail_instance" "instance_1" {
    name = "Instance 1"
    # ...
}

# yes, the below is really all that is needed for the aws_lightsail_static_ip resource
resource "aws_lightsail_static_ip" "instance_1_static_ip" {
    name = "Instance 1 Static IP"
}

resource "aws_lightsail_static_ip_attachment" "instance_1_static_ip_attachment" {
    static_ip_name = "${aws_lightsail_static_ip.instance_1_static_ip.name}"
    instance_name  = "${aws_lightsail_instance.instance_1.name}"
}

In this example, if you run terraform taint aws_lightsail_instance.instance_1, then terraform apply will recreate the aws_lightsail_instance resource, but the aws_lightsail_static_ip_attachment will be silently detached in the process. You'll have to run terraform apply again before Terraform realizes the attachment has changed and recreates it.

Adding another use case related to this request.

I have a custom Provider which defines a "workflow_execution" resource. When created, it triggers an application deployment.
I would like to have the "workflow_execution" created:

  • when there is a change in the resources describing the different components of the application deployment or
  • when "workflow_execution" attribute(s) has changed.

For the first point to be achieved, the creation of the "workflow_execution" resource has to be triggered by a change in another resource, which is currently not supported by Terraform.

Adding another use case related to this request.

I use AVI LB and create a GSLB for all the services that we use. Right now the connection between the AVI GSLB and the web apps is done through UUID only. When any attribute on a web app changes, the UUID gets regenerated, leaving it out of sync with the AVI GSLB.

I need a solution to recreate the GSLB every time a change is made to the web app.

If one needs to recreate an aws_lb_target_group that is currently the target of an aws_lb_listener_rule, the aws_lb_listener_rule needs to first be destroyed before the aws_lb_target_group can be recreated.

Piling on, this would be extremely useful for redeploying APIs via the AWS provider.

e.g., the aws_api_gateway_deployment resource handles the deployment of an AWS API Gateway instance. However, it must be manually redeployed if _any_ API methods, resources, or integrations change.

A workaround might be setting the stage name of the deployment to the hash of the directory containing the volatile configurations, but the end result would be many stages.
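For reference, more recent versions of the AWS provider added a triggers argument to aws_api_gateway_deployment for exactly this purpose; a sketch along the lines of the provider documentation (resource names assumed):

resource "aws_api_gateway_deployment" "example" {
  rest_api_id = aws_api_gateway_rest_api.example.id

  triggers = {
    # Any change to the API definition changes this hash and forces a
    # redeployment, without accumulating extra stages.
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.example.body))
  }

  lifecycle {
    create_before_destroy = true
  }
}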

_edit_ - Naturally, it looks like there's already been a few issues created regarding this.

I was running into the same issue with kubernetes as you, @jhoblitt. I managed to find a workaround in the fact that (it seems) all kubernetes resources require that the name doesn't change: if you change the name, the resource will be recreated.

So I created a random id that is based on the cluster endpoint and I append that to the name of all my kubernetes resources.

// Generate a random id we can use to recreate k8s resources
resource "random_id" "cluster" {
    keepers = {
        // Normally a new cluster will generate a new endpoint
        endpoint = google_container_cluster.cluster.endpoint
    }
    byte_length = 4
}

resource "kubernetes_deployment" "tool" {
    metadata {
        name = "tool-deployment-${random_id.cluster.hex}"
        labels = {
            App = "tool"
        }
    }

    spec {
    ...
    }
}

It's not ideal (especially for naming services) but it works for me. The only issue I still have is with helm, which I use to install traefik. If I add the id to those names, creation works fine, but on update of the id I get a cyclic dependency problem. Also, the change in the name of the service account roles makes helm/tiller stop working properly, so I'll probably forgo helm completely and configure traefik manually.

@radeksimko

would you mind providing more concrete example with real resources that would help us reproduce the unexpected behaviour you described?

resource "kubernetes_config_map" "config" {

  data = {
    FOO = "bar"
  }

  metadata {
    name = "config"
  }
}

resource "kubernetes_deployment" "deployment" {

  depends_on = [ kubernetes_config_map.config ]

  metadata {
    name = "deployment"
  }

  spec {
    env_from {
      config_map_ref {
         name = kubernetes_config_map.config.metadata[0].name
      }
    }
  }
}

I want my k8s deployment to get patched every time I terraform apply a config change - for example, changing the env var FOO to baz. That's my use case.
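One workaround that is sometimes used here is to roll a hash of the config map data into the pod template, so that any change to the data patches the deployment and rolls the pods. A sketch building on the example above (the annotation key is arbitrary, and the rest of the spec is elided):

resource "kubernetes_deployment" "deployment" {
  metadata {
    name = "deployment"
  }

  spec {
    template {
      metadata {
        annotations = {
          # Changes whenever the config map data changes, forcing a rollout.
          "config-checksum" = sha1(jsonencode(kubernetes_config_map.config.data))
        }
      }

      # (container spec with env_from, as in the example above)
    }
  }
}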

If one needs to recreate an aws_lb_target_group that is currently the target of an aws_lb_listener_rule, the aws_lb_listener_rule needs to first be destroyed before the aws_lb_target_group can be recreated.

That's similar to what I'm bumping into and trying to work around right now ... trying to evaluate a solution and a "force_recreate/taint" in lifecycle, or similar, would be incredibly useful right now ...

In my case I have a target group that needs to be recreated, but the listener (no rule involved here) is only getting an "update in place" change ... but then the target group cannot be destroyed because the listener isn't being destroyed ...

For reference, for others searching: this issue in the AWS provider is being tracked in terraform-providers/terraform-provider-aws#10233.

I was running into the same issue with the Google provider and the resources google_compute_resource_policy & google_compute_disk_resource_policy_attachment.

When you create a policy for scheduling the snapshots of a GCE disk you must attach the policy to the disk. That policy isn't editable, so if you change anything Terraform has to recreate the resource, but it doesn't recreate the attachment resource, even if the two are "linked" with Terraform's _depends_on_ directive.

Example of the resources:

resource "google_compute_resource_policy" "snapshot_schedule_wds" {
  name    = "snapshot-weekly-schedule-wds"
  region  = var.subnetwork_region
  project = google_project.mm-sap-prod.name

  snapshot_schedule_policy {
    schedule {
      weekly_schedule {
        day_of_weeks {
          day        = "SATURDAY"
          start_time = "20:00"
        }
      }
    }
    retention_policy {
      max_retention_days    = 366
      on_source_disk_delete = "KEEP_AUTO_SNAPSHOTS"
    }
    snapshot_properties {
      labels = {
        app     = "xxx"
      }
      storage_locations = ["europe-west6"]
      guest_flush       = false
    }
  }
}

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.name
  disk = google_compute_disk.web-dispatch-boot.name
  zone = var.zone
  project = google_project.mm-sap-prod.name

  depends_on = ["google_compute_resource_policy.snapshot_schedule_wds"]
}

Terraform version

Terraform v0.12.13
+ provider.external v1.2.0
+ provider.google v2.20.0
+ provider.google-beta v2.20.0

Any solution for this use case?

@psanzm in this very specific use case, using the google_compute_resource_policy's id field, instead of name, in the google_compute_disk_resource_policy_attachment's name field allows it to work:

resource "google_compute_disk_resource_policy_attachment" "gcp_wds_snap_schedule_pd_boot" {
  name = google_compute_resource_policy.snapshot_schedule_wds.id
...

Note: it works because the actual values of name and id are the same, but the id is unknown upon recreation.

To add another example, here is a use case I recently ran into with Azure PostgreSQL. I wanted to upgrade the version of the PostgreSQL engine on the server, which requires replacement. The dependent resources, such as firewall rules and Postgres configurations, were not re-created, so I had to run through two applies. This is a common occurrence in Azure, where most IDs are based on the name of the resource, so if it is re-created the ID stays the same and dependent resources don't register the change.

resource "azurerm_postgresql_server" "pgsql_server" {
  name                = "examplepgsql"
  resource_group_name = "my-rg"
  location            = "eastus"

  sku {
    name     = "GP_Gen5_2"
    capacity = "2"
    tier     = "GeneralPurpose"
    family   = "Gen5"
  }

  storage_profile {
    storage_mb            = "51200"
    backup_retention_days = 35
    geo_redundant_backup  = "Enabled"
  }

  administrator_login          = var.admin_username
  administrator_login_password = var.admin_password
  version                      = "11"
  ssl_enforcement              = "Enabled"
}

resource "azurerm_postgresql_firewall_rule" "azure_services_firewall_rule" {
  name                = "AzureServices"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  start_ip_address    = "0.0.0.0"
  end_ip_address      = "0.0.0.0"
}

resource "azurerm_postgresql_configuration" "log_checkpoints_pgsql_config" {
  name                = "log_checkpoints"
  resource_group_name = azurerm_postgresql_server.pgsql_server.resource_group_name
  server_name         = azurerm_postgresql_server.pgsql_server.name
  value               = "on"
}

Another use case :

I wanted to update an SSM parameter with the value of an AMI data block, but only when it changes.

This is for use with an Automation workflow like the example posted in the AWS docs.

My thought was: put in a null_resource that triggers when the AMI ID changes, and make the SSM parameter depend on this, but all a null_resource emits is an ID.

Aha, I thought, I'll do this:

data "aws_ami" "windows" {
  most_recent = true
  owners      = ["amazon"]
  filter {
    name   = "name"
    values = ["Windows_Server-2012-R2_RTM-English-64Bit-Base-*"]
  }
}

resource "null_resource" "new_windows_ami" {
  triggers = {
    base_ami_date = data.aws_ami.windows.creation_date
    force_update  = 1
  }
}

resource "aws_ssm_parameter" "current_windows_ami" {
  name  = "/ami/windows/2k12/current"
  value = data.aws_ami.windows.image_id
  type  = "String"

  tags = {
    BaseAmiTriggerId = null_resource.new_windows_ami.id
  }

  depends_on = [
    null_resource.new_windows_ami,
  ]
  # We only want the initial value from the data, we're going to replace this
  # parameter with the current "patched" release until there's a new base AMI
  overwrite = true
  lifecycle {
    ignore_changes = [
      value,
    ]
  }
}

... sadly, ignore_changes also blocks the change itself. What I was hoping was that the change to the tag would be enough to trigger an update of the whole resource, but ignore_changes means that changes to the inputs of those attributes are ignored for _all_ purposes, not just for deciding whether to trigger a lifecycle update.

This seems a shame because otherwise you could implement quite sophisticated lifecycle management with the null resource, concocting triggers with interpolations and such and only triggering an update to a dependent resource when the ID changed as a result.

I came to this thread from https://github.com/terraform-providers/terraform-provider-azurerm/issues/763. I do not know how the connection was made, but that issue was closed in favour of https://github.com/terraform-providers/terraform-provider-azurerm/issues/326, which in turn was closed in favour of this one.

So, if you understand how the connection was made, here is another scenario, and a very real one: we modify the probing path on an Azure Traffic Manager and boom - its endpoints are gone. This is very frustrating. Is there an ETA on a fix for this issue?

@MarkKharitonov This issue is essentially a feature request; what you're describing with Azure sounds like a bug, though (but I haven't used Azure or read through those issues) - so perhaps the link is 'sorry, nothing we can do without [this issue resolved], closing'.

I phrased it as a bug in the OP (and I should perhaps edit that) out of misunderstanding, but it's really a request for a form of dependency control that isn't possible (solely) with terraform today.

I do not understand. I have a traffic manager resource. The change does not recreate the resource - it is reported as an in-place update. Yet it blows away the endpoints. How come it is a feature request?

@MarkKharitonov As I said, "what you're describing with Azure sounds like a bug", but _this_ issue is a feature request, for something that does not exist in terraform core today.

Possibly the Azure resolution was 'nothing we can do without a way of doing [what is described here]' - I have no idea - but this issue itself isn't a bug, and is labelled 'thinking'. There's no guarantee there'll ever be a way of doing this, nevermind an ETA.

(I don't work for Hashicorp, I just opened this issue, there could be firmer internal plans for all I know, just trying to help.)

I do not know what to do. There are real issues in the provider that are being closed with the claim that they are because of this one. But this one is apparently huge in scope, so I do not understand what I am supposed to do. Should I open yet another issue in terraform-providers, referencing the already closed ones and this one? How do we attract attention to the real bug without it being closed for nothing, which has already happened twice?

@MarkKharitonov I'm no expert on Terraform or Terraform provider development, so someone else please correct me if I'm wrong, but I don't think there's anything that can be done in the provider. The issues in the Azure provider are caused by a limitation of Terraform, not a bug in the AzureRM provider that can be fixed. Based on the comments in this issue, there is a fundamental challenge with how the Azure API works and how Terraform handles dependencies. Azure's API does not generate new unique IDs when resources are re-created. So if you have a child resource that references a parent resource by ID, even if that parent resource is re-created the ID doesn't change. From Terraform's perspective, that means that no attribute was changed on the child resource, since the ID it's referencing is the same, even though in actuality the child resource was also destroyed together with the parent resource. The feature request here, as I understand it, is to add additional intelligence to Terraform dependencies: use them not just for ordering resource creation, but also to detect that a dependency (e.g. a parent resource) was destroyed/re-created and trigger a destroy/re-create on the dependent resource (e.g. the child resource), irrespective of whether any attributes on the child resource have changed.

This issue appears really critical, and not a feature request at all. The fundamental job of Terraform is to make sure any required changes are applied. In this case, Terraform not acting on a dependency when the parent resource is re-created is fundamentally an issue.

Could someone clarify whether this behaviour - authorization rules not being re-created when the event hub they are associated with is re-created - has been present for a long time? Is there any previous version of azureRM or Terraform that would mitigate the issue until this gets resolved?

Because the only approach that I can see to work around this issue is to invoke the terraform deployment twice, which to me is nonsense.

Hey!
I have another example of this behaviour. Changes to modules that force recreation of resources inside the module, which are used by a dashboard, won't update the dashboard, and it will end up referencing the configuration from before the apply. Another apply will actually pick up those changes and alter the dashboard_json template. The weird thing is that changes to aws_instance.cron will be picked up at the time of the first apply, but changes to the modules will not.

data "template_file" "dashboard_json" {
  template = file("${path.module}/templates/cloudwatch_dashboard/dashboard.tpl")
  vars = {
    rds_instance_id                      = module.database.rds_instance_id
    region                               = var.aws_region
    asg_normal_name                      = module.autoscaling_group.aws_autoscaling_group_name-normal
    cron_instance_id                     = aws_instance.cron.id
    lb_arn_suffix                        = module.load_balancer.aws_lb_arn_suffix
    lb_target_group_arn_suffix           = module.load_balancer.aws_lb_target_group_target_group_arn_suffix
    lb_blackhole_target_group_arn_suffix = module.load_balancer.aws_lb_target_group_target_group_blackhole_arn_suffix
    lb_redash_target_group_arn_suffix    = aws_lb_target_group.redash.arn_suffix
    procstats_cpu                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_cpu.rendered : ""
    procstats_mem                        = (length(var.cron_procstats[local.environment]) > 0) ? data.template_file.dashboard_procstats_mem.rendered : ""
    # force recreation of the dashboard due to weird behaviour when changes to modules above
    # are not picked up by terraform and dashboard is not being updated
    force_recreation = var.force_dashboard_recreation[local.environment] ? "${timestamp()}" : ""
  }
}

resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "${var.project_name}-${local.environment}-dashboard"
  dashboard_body = data.template_file.dashboard_json.rendered
}

I tried using depends_on - maybe the ordering would help with it - but it didn't help, so I ended up using timestamp() to force recreation.

We have the exact same problem on GCP, which is described in detail in this issue https://github.com/terraform-providers/terraform-provider-google/issues/6376.

Here is part of the relevant config:

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

I'm not sure if this is a general TF problem or a Google provider problem, but here it goes.
Currently it's not possible to lower the number of google_compute_instance_group resources that are used in a google_compute_region_backend_service. In the code above, if we lower the number of google_compute_instance_group resources and try to apply the configuration, TF will first try to delete the no-longer-needed instance groups and then update the backend configuration, but that order doesn't work because you cannot delete an instance group that is used by the backend service; the order should be the other way around.

So to sum it up, when I lower the number of the instance group resources TF does this:

  1. delete surplus google_compute_instance_group -> this fails
  2. update google_compute_region_backend_service

It should do this the other way around:

  1. update google_compute_region_backend_service
  2. delete surplus google_compute_instance_group

What I don't understand is why TF doesn't know that it should do the update first, then remove the instance groups. When I run destroy, TF does it correctly: first it destroys the backend service, then the instance groups.

Also, this is very hard to work around, because you need to make a temporary config change, apply, then set the final config you want and apply again.

@kustodian Can you use create_before_destroy in google_compute_instance_group?

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link

  lifecycle {
    create_before_destroy = true
  }
}

@lorengordon I can, but it doesn't help. TF works exactly the same in my example with or without create_before_destroy = true.

To be honest I'm not entirely sure that my issue is the same thing as what the issue reporter is describing.

@apparentlymart May I suggest locking this issue? I suspect you and the team probably have enough examples and use cases to consider this feature now?

I could 'unsubscribe' of course, it's just that I _would_ like to be notified if/when there's a decision, some progress, or something to help test. Cheers. :slightly_smiling_face:

Edit: It turns out this is really a function of kubernetes, and not really a terraform concern.

Just adding my 0.02. This is also an issue with the kubernetes provider and secrets/config maps. A service using an updated config map or secret doesn't detect the change because the underlying pods of the service need to be restarted or recreated to detect the changes.

resource "kubernetes_secret" "value" {
  metadata {
    name      = "k8s-secret-value"
    namespace = "private"
  }

  data {
    secret = var.secret_value
  }
}

resource "kubernetes_deployment" "service" {
  metadata {
    name      =  "internal-service"
    namespace = "private"
  }
  spec {


    template {


      spec {
        container {


          env {
            name = "SECRET_VALUE"

            value_from {
              secret_key_ref {
                name = kubernetes_secret.value.metadata.0.name
                key  = "secret"
              }
            }
          }
        }
      }
    }
  }
}

If the value for the secret key is updated, nothing seems to happen with the deployment.

I'm going to lock this issue for the time being, because the remaining discussion seems largely to be people supporting each other in workarounds.

I'm happy to see people are helping each other work around this, and I've created a thread for this on the community forum so that people can continue these discussions without creating excess noise for people who just want succinct updates in GitHub.
