Terraform-provider-google: google_sql_database randomly errors with failure waiting for insertion

Created on 14 Sep 2018 · 25 Comments · Source: hashicorp/terraform-provider-google

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform version

0.11.7

Terraform resources affected

google_sql_database

Terraform Configuration Files

resource "google_sql_database" "some_db" {
  name      = "some_db"
  instance  = "${google_sql_database_instance.master.name}"
  charset   = "UTF8"
  collation = "en_US.UTF8"
  project   = "${var.gcp_project}"
}

Debug Output

This error is completely random and very difficult to get logs for.

Panic Output

Expected Behavior

The db should have been created

Actual Behavior

apply fails

Error: Error applying plan:

1 error(s) occurred:

google_sql_database.some_db: 1 error(s) occurred:
google_sql_database.some_db: Error, failure waiting for insertion of some_db into some_db_instance:

Steps to Reproduce

terraform apply

Important Factoids

References

#0000

bug

Source

sereeth

👍23

All 25 comments

I've tried a couple of techniques to get this to work consistently.

1) Make each db dependent on the previous one to ensure only 1 runs at a time
2) Set -parallelism=1 on the apply
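
The first technique above could be sketched roughly as follows, in Terraform 0.11 syntax. This is only an illustration: first_db and second_db are hypothetical names, and the instance and project references are borrowed from the configuration in the original report.

```hcl
# Sketch only: first_db and second_db are hypothetical database names.
resource "google_sql_database" "first_db" {
  name     = "first_db"
  instance = "${google_sql_database_instance.master.name}"
  project  = "${var.gcp_project}"
}

# depends_on makes Terraform create second_db only after first_db exists,
# so the two insert operations are never issued concurrently.
resource "google_sql_database" "second_db" {
  name       = "second_db"
  instance   = "${google_sql_database_instance.master.name}"
  project    = "${var.gcp_project}"
  depends_on = ["google_sql_database.first_db"]
}
```

Chaining resources this way only orders creation; it would not be expected to help with errors returned while refreshing existing databases.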

sereeth on 14 Sep 2018

Hey @sereeth, sorry to hear about this annoying issue. Without logs, though, there's nothing we can do on our side to know what's going on. Is that the full error message, with nothing after the colon?

danawillow on 14 Sep 2018

Hey @danawillow, I'll try to reproduce today with debugging on. I'm guessing this is a Google API issue, icky..

sereeth on 14 Sep 2018

@danawillow here is the debug output

https://gist.github.com/sereeth/d53480a98c34c936d1d75ceb53ddc555

sereeth on 20 Sep 2018

Oh ok, that's a different error message than the original one but sounds like we need to just add some retry logic to the database resource.

danawillow on 20 Sep 2018

We're getting similar issues with the database instance too, fyi. https://github.com/terraform-providers/terraform-provider-google/issues/2083

sereeth on 20 Sep 2018

Hi, I'm getting the same error with:

Terraform version 0.11.8
provider.google version 1.18.0

pdemagny on 26 Sep 2018

moving from #1283:

provider.google version: 1.18

We have 5 GCP Postgres DBs, and the likelihood of failure increases with that number, because a different one randomly returns a 503 each time.

* module.jinx-production.google_sql_database.jinx: 1 error(s) occurred:

* module.jinx-production.google_sql_database.jinx: google_sql_database.jinx: Error reading SQL Database "jinx" in instance "jinx": googleapi: Error 503: Service temporarily unavailable., serverException

* module.julius-production.google_sql_database.julius: 1 error(s) occurred:

* module.julius-production.google_sql_database.julius: google_sql_database.julius: Error reading SQL Database "julius" in instance "julius": googleapi: Error 503: Service temporarily unavailable., serverException

Could the provider retry with exponential backoff? The failures always look intermittent and hit different DB instances. We'd rather wait a few extra minutes than wait for an entire terraform plan to run again. Thanks

francisd on 26 Sep 2018

Hello, I managed to work around this issue when creating the instance/database/user by adding a local-exec provisioner that sleeps for 60 seconds after the instance is created and before the database/user are created.

resource "google_sql_database_instance" "master_db_instance" {
  project          = "${var.general["project"]}"
  ...

  settings {
    ...
  }

  provisioner "local-exec" {
    command = "sleep 60"
  }
}

However, and it's probably unrelated to the workaround above, I also encounter errors when destroying resources. The errors won't go away even after multiple runs of terraform destroy, which leaves me with a crippled tfstate:

1 error(s) occurred:

* module.webapp-cloudsql.google_sql_user.user (destroy): 1 error(s) occurred:

* google_sql_user.user: Error, failure waiting for deletion of <database> in testing-webapp-cloudsql-341d7c:

Does anyone have an idea for working around this? Otherwise, using cloudsql in terraform is pretty much impossible for us atm :(

pdemagny on 30 Sep 2018

We're suffering the same error, googleapi: Error 503: Service temporarily unavailable., serverException, on almost every plan run. There are no changes to our infra setup; in fact, the database hasn't been touched in a while. The problem used to be very infrequent, but now it is almost blocking deployments (if we insist on running the plan multiple times, we may get lucky once in a while).

Just for the record, we tried using v1.17.1 and 1.18 of the provider with very similar results.

prodriguezdefino on 3 Oct 2018

👍3

We have been experiencing the same issue since about 2018-10-02 14:00 UTC-7. Google Cloud SQL has consistently responded with googleapi: Error 503: Service temporarily unavailable., serverException across random, different instances. Sometimes it is 1 instance, other times 5 instances.

We have not been able to get a successful terraform plan in the last 20 hours, despite retrying consistently at different times of the day and night.

francisd on 3 Oct 2018

👍3

We've also been running into this issue non-stop for the past 2 days on existing/old google_sql_database resources.

The only workaround is to add --parallelism=1, or to use -target to plan/apply only the non-SQL resources.

This seems to be a major issue for many of us.

bluemalkin on 4 Oct 2018

👍2

Hey all, if you're coming here to report that this is happening to you too, please provide debug logs. This will help us know which requests to GCP are returning this error.

danawillow on 4 Oct 2018

Here is the extract from the debug logs, this is during the state refresh phase:

Request

2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 2018/10/04 17:37:18 [DEBUG] Google API Request Details:
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: ---[ REQUEST ]---------------------------------------
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: GET /sql/v1beta4/projects/XXXXXXX/instances/YYYYYYYYY/databases/ZZZZZZ_backend?alt=json HTTP/1.1
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Host: www.googleapis.com
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: User-Agent: google-api-go-client/0.5 Terraform/0.11.7 (+https://www.terraform.io)
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Accept-Encoding: gzip
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:

Response

2018-10-04T17:37:21.820Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 2018/10/04 17:37:21 [DEBUG] Google API Response Details:
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: ---[ RESPONSE ]--------------------------------------
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: HTTP/2.0 503 Service Unavailable
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Cache-Control: private, max-age=0
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Content-Type: application/json; charset=UTF-8
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Date: Thu, 04 Oct 2018 17:37:21 GMT
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Expires: Thu, 04 Oct 2018 17:37:21 GMT
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Server: GSE
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Vary: Origin
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Vary: X-Origin
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Content-Type-Options: nosniff
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Frame-Options: SAMEORIGIN
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Xss-Protection: 1; mode=block
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:  "error": {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:   "errors": [
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:    {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:     "domain": "global",
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:     "reason": "serverException",
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:     "message": "Service temporarily unavailable."
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:    }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:   ],
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:   "code": 503,
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:   "message": "Service temporarily unavailable."
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:  }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018/10/04 17:37:21 [ERROR] root: eval: *terraform.EvalRefresh, err: google_sql_database.ZZZZZ_backend: Error reading SQL Database "ZZZZZ_backend" in instance "YYYYYYYY": googleapi: Error 503: Service temporarily unavailable., serverException
2018/10/04 17:37:21 [ERROR] root: eval: *terraform.EvalSequence, err: google_sql_database.ZZZZZ_backend: Error reading SQL Database "ZZZZZ_backend" in instance "YYYYYYYY": googleapi: Error 503: Service temporarily unavailable., serverException
2018/10/04 17:37:21 [TRACE] [walkRefresh] Exiting eval tree: google_sql_database.ZZZZZ_backend

This is just one occurrence of at least 3 that happened during this plan run.

prodriguezdefino on 4 Oct 2018

Thanks- I reached out to the team internally and they're going to look into it. In the meantime, I'm preparing a PR that'll add retries in more places.

danawillow on 4 Oct 2018

👍3

@danawillow thanks for tackling this! I would expect this change to be included in a minor release. Is there any ETA for it?

On a separate note, the root cause seems to be some instability/flakiness in the API resource that the TF resource GETs in order to refresh its state, although this has also happened for other TF resources related to the CloudSQL service. Are there any updates on this? Maybe it is hitting some sort of per-IP quota limit or something else, but in any case the message could be a bit more descriptive than Service temporarily unavailable.

prodriguezdefino on 5 Oct 2018

👍1

Yeah just wondering what release this will be in as we hit this multiple times per day during plan phases even when we're not changing sql resources.

Stono on 30 Oct 2018

This was released in 1.19.0. If you're still seeing the problem, I'd love to see debug logs to see how long it ends up actually retrying for.

danawillow on 30 Oct 2018

Ahh, we were a version behind, and presumed this wasn't released as this issue is still open!

will bump the version now and capture debug logs if it reappears, cheers @danawillow

Stono on 30 Oct 2018

Ah ok, yeah. I think my plan was to wait to close the issue until I either heard that the retries fixed it or got word back from the SQL team that they fixed the underlying cause. The PR itself is merged; that's what actually makes it into the release.

danawillow on 30 Oct 2018

Hi, before closing, please let me do some more tests tomorrow. I did the upgrade a few days ago and tested very quickly, as I was busy with something else. I can't remember exactly, but I think I had fewer issues, although the problem was still present.
So let me check again tomorrow and come back to you with the results.

pdemagny on 30 Oct 2018

Yip makes total sense. Will report back in a day or two and let you know if it's resolved for us with that PR

Stono on 30 Oct 2018

Evening, so I had time to do some more tests this week and, after all, wasn't able to reproduce the problem, so it seems to be fixed. Thanks ;)

pdemagny on 3 Nov 2018

Great! Closing.

danawillow on 5 Nov 2018

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

hashibot[bot] on 29 Mar 2020
