Terraform-provider-google: Cannot delete instance group because it's being used by a backend service

Created on 14 May 2020  ·  11 Comments  ·  Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave _+1_ or _me too_ comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v0.12.24

  • provider.google v3.21.0
  • provider.google-beta v3.21.0

Affected Resource(s)

  • google_compute_region_backend_service
  • google_compute_instance_group

Terraform Configuration Files

locals {
  project         = "<project-id>"
  network         = "<vpc-name>"
  network_project = "<vpc-project>"
  zones           = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
  s1_count        = 3
}

provider "google" {
  project = local.project
  version = "~> 3.0"
}

data "google_compute_network" "network" {
  name    = local.network
  project = local.network_project
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

I'm not sure if this is a general TF problem or a Google provider problem, but here it goes.
Currently it's not possible to lower the number of google_compute_instance_group resources that are used in a google_compute_region_backend_service. In the code above, if we lower the number of google_compute_instance_group resources and try to apply the configuration, TF will first try to delete the no-longer-needed instance groups and only then update the backend configuration. That order doesn't work, because you cannot delete an instance group that is still in use by the backend service; the order should be the other way around.

So to sum it up, when I lower the number of the instance group resources TF does this:

  1. delete surplus google_compute_instance_group -> this fails
  2. update google_compute_region_backend_service

It should do this the other way around:

  1. update google_compute_region_backend_service
  2. delete surplus google_compute_instance_group -> this succeeds

Here is the output it generates:

google_compute_instance_group.s1[2]: Destroying... [id=projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03]

Error: Error deleting InstanceGroup: googleapi: Error 400: The instance_group resource 'projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03' is already being used by 'projects/<project-id>/regions/europe-west1/backendServices/s1', resourceInUseByAnotherResource

Expected Behavior

TF should first update the google_compute_region_backend_service, then delete the instance group.

Actual Behavior

TF tried to delete the instance group first, which resulted in an error.

Steps to Reproduce

  1. terraform apply
  2. Set s1_count = 2
  3. terraform apply

Important Factoids

It's not a simple task to fix this. One "workaround" is to change the dynamic for_each to use the slice() function, like this:

  dynamic "backend" {
    for_each = slice(google_compute_instance_group.s1, 0, 2)
    content {
      group = backend.value.self_link
    }
  }

So you first set the end index of slice() to the new number of instance groups and run apply, then lower s1_count to that same number and run apply again (sketched below), but that's just too complicated for a simple task like this.
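A minimal sketch of the two applies, assuming the target count is 2 (as in the reproduction steps above):

# Apply 1: change only the slice() end index so the backend service stops
# referencing the third instance group; all three groups still exist.
for_each = slice(google_compute_instance_group.s1, 0, 2)

# Apply 2: lower the count to match and apply again; s1-03 is no longer
# referenced by the backend service, so its deletion now succeeds.
s1_count = 2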

All 11 comments

Unfortunately, this is an upstream Terraform issue. The provider doesn't have access to the update/destroy order. This is similar to the scenario outlined here: https://github.com/terraform-providers/terraform-provider-google/issues/3008
I believe multiple applies are the only way to go for this case.

Multiple applies don't fix the issue here. You have to change the config, apply, then change again, apply.

Sorry, that's what I meant. We don't have access to enable a solution for just one apply.

Hi, here's somewhat of a workaround for this specific use case using an intermediate data source (needs two applies):

provider "google" {
  version = "3.22.0"
  region  = "europe-west1"
  project = "myproject"
}

locals {
  #zones = []
  zones = ["europe-west1-b"]
}

data "google_compute_network" "network" {
  name = "default"
}

data "google_compute_instance_group" "s1" {
  for_each = toset(local.zones)
  name     = format("s1-%s", each.key)
  zone     = each.key
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = [for group in data.google_compute_instance_group.s1 : group.self_link if group.self_link != null]
    content {
      group = backend.value
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  for_each = toset(local.zones)
  name     = format("s1-%s", each.key)
  zone     = each.key
  network  = data.google_compute_network.network.self_link
}
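For example, growing the set later follows the same two-apply pattern. Assuming, as the null filter above implies, that the data source returns a null self_link for a group that does not exist yet, the first apply only creates the group and the second apply attaches it (the added zone name is illustrative):

locals {
  # Apply 1: creates instance group s1-europe-west1-c; the data source is read
  # before the group exists, so no backend block is added yet.
  # Apply 2: the data source now resolves, and the backend is attached.
  zones = ["europe-west1-b", "europe-west1-c"]
}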

@pdecat your suggestion removes the dependency between google_compute_region_backend_service and google_compute_instance_group, so this will probably always require two applies, even when starting from scratch.

> so this will probably always require two applies, even when starting from scratch.

I can confirm it does.

But at least it does not need manual intervention out of band to fix the situation.

Maybe something the google provider could do to fix this situation would be to manage backends of a google_compute_region_backend_service as a separate resource:

# NOT A WORKING EXAMPLE
locals {
  project         = "<project-id>"
  network         = "<vpc-name>"
  network_project = "<vpc-project>"
  zones           = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
  s1_count        = 3
}

provider "google" {
  project = local.project
  version = "~> 3.0"
}

data "google_compute_network" "network" {
  name    = local.network
  project = local.network_project
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

# WARNING: this resource type does not exist
resource "google_compute_region_backend_service_backend" "s1" {
  for_each = google_compute_instance_group.s1

  backend_service = google_compute_region_backend_service.s1.self_link
  group = each.value.self_link
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

As a side note, I feel like https://github.com/hashicorp/terraform/issues/8099 is not really about the same issue. It is about replacing or updating a resource when another resource it depends on changes, not about one being destroyed.

I added a comment on the Terraform core issue (https://github.com/hashicorp/terraform/issues/25010#issuecomment-634228336)

Based on that comment (terraform taint up the dependency chain until a single-pass apply works), I _think_ there's a provider-specific fix.

If ForceNew were part of the schema here ...

https://github.com/terraform-providers/terraform-provider-google/blob/c87e414b028becc33f64183a9bd52c92c9b49737/google/resource_compute_region_backend_service.go#L173-L179

... wouldn't that have the same effect as my manual terraform taint?

@pdecat that should work; it would require implementing a new fine-grained resource, google_compute_region_backend_service_backend.

Reopening the issue since a solution is possible, and this will be tracked similarly to other feature-requests.

@StephenWithPH ForceNew would have the same effect, but it would make every change (addition as well as removal) to the backend set destructive. Providing a new fine-grained resource is the cleaner option here.
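For context, the provider already ships fine-grained companions for list-like settings elsewhere; google_project_iam_member, for example, manages a single (role, member) pair within a project's IAM policy, so entries can be added or removed without rewriting the whole policy. A minimal sketch (project ID and member are placeholders):

# Fine-grained: owns exactly one membership, nothing else in the policy.
resource "google_project_iam_member" "viewer" {
  project = "<project-id>"
  role    = "roles/viewer"
  member  = "user:jane@example.com"
}

A google_compute_region_backend_service_backend resource would follow the same pattern, attaching and detaching one backend at a time.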

The lack of pretty essential features, and bugs like this, make me very disappointed with Terraform and GCP as a whole.
