Terraform-provider-google: Replica Cloud SQL instances fail to be created

Created on 2 Mar 2018  ·  12 Comments  ·  Source: hashicorp/terraform-provider-google

I am running into this issue reliably. We are provisioning multiple replicas and a failover replica for a master database. At least one (usually more) of the replicas or the failover fails to be created. The failed instances end up in a failed state in the GCP console and cannot be removed:

[image: screenshot of the failed instances in the GCP console]

It seems to be a timing issue, or Terraform not waiting for the master to actually be ready, because we can work around it by creating the master, waiting a couple of minutes, and then creating the replicas. For this example the workaround looks like:

terraform apply -target=google_sql_database_instance.master
# wait a couple minutes
terraform apply -target=google_sql_database_instance.failover
terraform apply -target=google_sql_database_instance.replica
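
The manual steps above could be wrapped in a small script. This is only a sketch of the same workaround, not a fix: the function name `staged_apply` and the `TERRAFORM` and `WAIT_SECONDS` environment variables are hypothetical names introduced for illustration, the 120-second default simply mirrors "a couple of minutes", and `-auto-approve` is assumed to be acceptable for your workflow.

```shell
#!/usr/bin/env bash
# Sketch of the staged-apply workaround described above.
# TERRAFORM and WAIT_SECONDS are hypothetical knobs for this example only.
staged_apply() {
  local tf="${TERRAFORM:-terraform}"          # binary to invoke
  local wait_seconds="${WAIT_SECONDS:-120}"   # pause after the master is created

  # 1. Create only the master instance.
  "$tf" apply -auto-approve -target=google_sql_database_instance.master || return 1

  # 2. Give the master time to become ready (duration is a guess).
  sleep "$wait_seconds"

  # 3. Create the failover and the read replicas in separate runs.
  "$tf" apply -auto-approve -target=google_sql_database_instance.failover || return 1
  "$tf" apply -auto-approve -target=google_sql_database_instance.replica || return 1
}
```

After sourcing the file, running `WAIT_SECONDS=180 staged_apply` performs the three applies in order and stops at the first one that fails.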

Terraform Version

$ terraform -v
Terraform v0.11.3
+ provider.google v1.6.0
+ provider.random v1.1.0
+ provider.zerotier (unversioned)

Affected Resource(s)

  • google_sql_database_instance

Terraform Configuration Files

locals {
  region  = "us-east1"
  project = "staging-af5b7922"
}

provider "google" {
  version = "~> 1.6.0"
  region  = "us-east1"
  project = "${local.project}"
}

provider "random" {
  version = "~> 1.1.0"
}

resource "random_id" "database" {
  byte_length = 4
  prefix      = "database-"
}

resource "google_sql_database_instance" "master" {
  name             = "${random_id.database.hex}"
  database_version = "MYSQL_5_7"
  region           = "${local.region}"

  settings {
    tier             = "db-n1-standard-2"
    disk_size        = 20
    replication_type = "SYNCHRONOUS"

    backup_configuration {
      binary_log_enabled = true
      enabled            = true
    }
  }
}

resource "google_sql_database_instance" "failover" {
  name                 = "${random_id.database.hex}-failover"
  database_version     = "MYSQL_5_7"
  master_instance_name = "${google_sql_database_instance.master.name}"
  region               = "${local.region}"

  settings {
    tier                   = "db-n1-standard-2"
    replication_type       = "SYNCHRONOUS"
    crash_safe_replication = true
    disk_size              = 20
  }

  replica_configuration {
    failover_target = true
  }
}

resource "google_sql_database_instance" "replica" {
  name                 = "${random_id.database.hex}-replica-${count.index}"
  database_version     = "MYSQL_5_7"
  master_instance_name = "${google_sql_database_instance.master.name}"
  region               = "${local.region}"
  count                = 1

  settings {
    tier                   = "db-n1-standard-2"
    replication_type       = "SYNCHRONOUS"
    crash_safe_replication = true
    disk_size              = 100
  }
}

Debug Output

https://gist.github.com/andyshinn/93bd82100be2a77c080e94a64a111bf6

Panic Output

No panic.

Expected Behavior

Replica databases created without error.

Actual Behavior

When multiple replica Cloud SQL instances are created, at least one usually fails and cannot be re-created.

Steps to Reproduce

  1. terraform apply

Important Factoids

Nothing out of the ordinary. Standard GCP project created in Terraform.

References

This is possibly the same as #1069 and #1083, but both are closed and I'm not sure, so I am opening a new issue. If it is the same, I'm happy to continue the conversation in one of them and close this one.

All 12 comments

I've seen something like this before, yeah. I think this will be a tricky one, I'll try to dig in.

👍 Thanks for the quick response. Let me know if there is any way I can assist.

Thanks. I've got a consistent minimal repro and I hope that'll help ... but it may take a little while.

Good news! An internal bug is open and there are people who work on the Cloud SQL systems working on root-causing it. :) I'll tag this "upstream", and keep it updated.

Wanted to let you all know they're still working it internally.

Any updates on this, @ndmckinley?

There's movement and people are working on it, but it's still going to be a while. This problem is not unique to Terraform; it's a general "creating multiple replicas at once" issue, which requires a general fix.

Been a while, just wanted to reach out and see if there were any updates.

Unfortunately, the upstream bug is not fixed yet. We did submit a fix a while ago which should cause Terraform to retry the operation when this happens. I'll mark this as closed; if people see the issue again, they can comment here or open a new issue referring to this one.

Do you recall which versions of Terraform and the Google provider have the fix?

Yeah, this shipped in provider release 1.19.0 (October 08, 2018), listed under bugfixes: https://github.com/terraform-providers/terraform-provider-google/blob/master/CHANGELOG.md#1190-october-08-2018. So if you're still seeing the issue in 1.19, let us know!
