Terraform version: 0.11.7
Affected resource: google_sql_database
resource "google_sql_database" "some_db" {
  name      = "some_db"
  instance  = "${google_sql_database_instance.master.name}"
  charset   = "UTF8"
  collation = "en_US.UTF8"
  project   = "${var.gcp_project}"
}
This error is completely random, and it is very difficult to get logs for it.
Expected behavior: the db should have been created.
Actual behavior: apply fails with:
Error: Error applying plan:
1 error(s) occurred:
google_sql_database.some_db: 1 error(s) occurred:
google_sql_database.some_db: Error, failure waiting for insertion of some_db into some_db_instance:
Steps to reproduce: terraform apply
I've tried a couple of techniques to get this to work consistently:
1) Make each db dependent on the previous one, so that only one is created at a time
2) Set -parallelism=1 on the apply
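A minimal sketch of the first technique, assuming Terraform 0.11 syntax (where depends_on takes a list of quoted resource addresses). The resource names first_db and second_db are hypothetical and not taken from the reporter's configuration:

```hcl
# Hypothetical example: two databases on the same Cloud SQL instance.
resource "google_sql_database" "first_db" {
  name     = "first_db"
  instance = "${google_sql_database_instance.master.name}"
  project  = "${var.gcp_project}"
}

resource "google_sql_database" "second_db" {
  name     = "second_db"
  instance = "${google_sql_database_instance.master.name}"
  project  = "${var.gcp_project}"

  # Explicit dependency: Terraform creates second_db only after first_db,
  # so the two insert operations never run concurrently.
  depends_on = ["google_sql_database.first_db"]
}
```

The second technique needs no configuration change: running terraform apply -parallelism=1 limits Terraform to one concurrent operation across the whole graph, at the cost of a slower apply.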
Hey @sereeth, sorry to hear about this annoying issue. Without logs, though, there's nothing we can do on our side to know what's going on. Is that the full error message, with nothing after the colon?
Hey @danawillow, I'll try to reproduce today with debugging on. I'm guessing this is a Google API issue, icky..
@danawillow here is the debug output
https://gist.github.com/sereeth/d53480a98c34c936d1d75ceb53ddc555
Oh ok, that's a different error message than the original one but sounds like we need to just add some retry logic to the database resource.
We're getting similar issues with the database instance too, fyi. https://github.com/terraform-providers/terraform-provider-google/issues/2083
Hi, i'm getting the same error with:
moving from #1283:
provider.google version: 1.18
We have 5 GCP Postgres DBs, and the likelihood of failure increases because a different one randomly returns a 503 each time.
* module.jinx-production.google_sql_database.jinx: 1 error(s) occurred:
* module.jinx-production.google_sql_database.jinx: google_sql_database.jinx: Error reading SQL Database "jinx" in instance "jinx": googleapi: Error 503: Service temporarily unavailable., serverException
* module.julius-production.google_sql_database.julius: 1 error(s) occurred:
* module.julius-production.google_sql_database.julius: google_sql_database.julius: Error reading SQL Database "julius" in instance "julius": googleapi: Error 503: Service temporarily unavailable., serverException
Could the provider retry with an exponential backoff, since the failure always looks intermittent and hits different DB instances? We'd rather wait a few extra minutes than wait for an entire terraform plan to run again. Thanks
Hello, I managed to work around this issue when creating instance/database/user by adding a local-exec provisioner that sleeps for 60 seconds after the instance is created and before the database/user is created.
resource "google_sql_database_instance" "master_db_instance" {
  project = "${var.general["project"]}"
  ...
  settings {
    ...
  }
  provisioner "local-exec" {
    command = "sleep 60"
  }
}
However, and it's probably unrelated to the workaround above, I encounter errors when destroying resources too. The errors won't go away even after multiple runs of terraform destroy, and they leave me with a crippled tfstate:
1 error(s) occurred:
* module.webapp-cloudsql.google_sql_user.user (destroy): 1 error(s) occurred:
* google_sql_user.user: Error, failure waiting for deletion of <database> in testing-webapp-cloudsql-341d7c:
Does anyone have an idea for working around this? Otherwise using Cloud SQL in Terraform is pretty much impossible for us atm :(
We're hitting the same error, googleapi: Error 503: Service temporarily unavailable., serverException, on almost every plan run. There are no changes to our infra setup; in fact the database hasn't been touched in a while. The problem used to be very infrequent, but now it is almost blocking deployments (if we insist on running the plan multiple times we may get lucky once in a while).
Just for the record, we tried using v1.17.1 and 1.18 of the provider with very similar results.
We have been experiencing the same issue since about 2018-10-02 14:00 UTC-7. Google Cloud SQL has consistently responded with googleapi: Error 503: Service temporarily unavailable., serverException across random, different instances. Sometimes it is 1 instance, other times 5 instances.
We have not been able to get a successful terraform plan in the last 20hrs with consistent retrying at different times of the day/night.
We've also been running into this issue non-stop for the past 2 days on existing/old google_sql_database resources.
The only workaround is to add --parallelism=1, or to use -target so that plan/apply only touches non-SQL resources.
This seems to be a major issue for many of us.
Hey all, if you're coming here to report that this is happening to you too, please provide debug logs. This will help us know which requests to GCP are returning this error.
Here is the extract from the debug logs, this is during the state refresh phase:
Request
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 2018/10/04 17:37:18 [DEBUG] Google API Request Details:
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: ---[ REQUEST ]---------------------------------------
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: GET /sql/v1beta4/projects/XXXXXXX/instances/YYYYYYYYY/databases/ZZZZZZ_backend?alt=json HTTP/1.1
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Host: www.googleapis.com
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: User-Agent: google-api-go-client/0.5 Terraform/0.11.7 (+https://www.terraform.io)
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Accept-Encoding: gzip
2018-10-04T17:37:18.447Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:
Response
2018-10-04T17:37:21.820Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: 2018/10/04 17:37:21 [DEBUG] Google API Response Details:
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: ---[ RESPONSE ]--------------------------------------
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: HTTP/2.0 503 Service Unavailable
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Cache-Control: private, max-age=0
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Content-Type: application/json; charset=UTF-8
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Date: Thu, 04 Oct 2018 17:37:21 GMT
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Expires: Thu, 04 Oct 2018 17:37:21 GMT
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Server: GSE
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Vary: Origin
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: Vary: X-Origin
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Content-Type-Options: nosniff
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Frame-Options: SAMEORIGIN
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: X-Xss-Protection: 1; mode=block
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "error": {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "errors": [
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: {
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "domain": "global",
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "reason": "serverException",
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "message": "Service temporarily unavailable."
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: ],
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "code": 503,
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: "message": "Service temporarily unavailable."
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: }
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4:
2018-10-04T17:37:21.924Z [DEBUG] plugin.terraform-provider-google_v1.18.0_x4: -----------------------------------------------------
2018/10/04 17:37:21 [ERROR] root: eval: *terraform.EvalRefresh, err: google_sql_database.ZZZZZ_backend: Error reading SQL Database "ZZZZZ_backend" in instance "YYYYYYYY": googleapi: Error 503: Service temporarily unavailable., serverException
2018/10/04 17:37:21 [ERROR] root: eval: *terraform.EvalSequence, err: google_sql_database.ZZZZZ_backend: Error reading SQL Database "ZZZZZ_backend" in instance "YYYYYYYY": googleapi: Error 503: Service temporarily unavailable., serverException
2018/10/04 17:37:21 [TRACE] [walkRefresh] Exiting eval tree: google_sql_database.ZZZZZ_backend
This is just one occurrence of at least 3 that happened during this plan run.
Thanks- I reached out to the team internally and they're going to look into it. In the meantime, I'm preparing a PR that'll add retries in more places.
@danawillow thanks for tackling this! I would expect this change to be included in a minor release; is there an ETA for it?
On a separate note, the root cause seems to be instability/flakiness in the API resource that the TF resource GETs in order to refresh the state, although this has also happened for other TF resources related to the Cloud SQL service. Are there any updates on this? Maybe it is hitting some sort of per-IP quota limit or something else, but in any case the message could be a little more descriptive than "Service temporarily unavailable."
Yeah just wondering what release this will be in as we hit this multiple times per day during plan phases even when we're not changing sql resources.
This was released in 1.19.0. If you're still seeing the problem, I'd love to see debug logs to see how long it ends up actually retrying for.
Ahh we were a version behind, and presumed this wasn't released as this issue is still open!
will bump the version now and capture debug logs if it reappears, cheers @danawillow
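For anyone else bumping the provider, a minimal sketch of a version constraint that picks up the retry change, assuming the Terraform 0.11-style version argument in the provider block. The constraint string is illustrative and not taken from this thread:

```hcl
provider "google" {
  # Assumption: any 1.x release from 1.19 onward is acceptable.
  # "~> 1.19" allows 1.19 and later 1.x versions, but not 2.0.
  version = "~> 1.19"
}
```

After changing the constraint, terraform init -upgrade fetches the newest provider version that satisfies it.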
Ah ok, yeah. I think my plan was to wait to close the issue until I either heard that the retries fixed it or got word back from the SQL team that they fixed the underlying cause. The PR itself is merged; that's what actually makes it into the release.
Hi, before closing let me do some more tests tomorrow please. I did the upgrade a few days ago and tested very quickly as I was busy with something else. I can't remember exactly, but I think I had fewer issues, though the problem was still present.
So let me check again tomorrow and come back to you with the results.
Yip makes total sense. Will report back in a day or two and let you know if it's resolved for us with that PR
Evening, so I had time to do some more tests this week and wasn't able to reproduce it after all, so it seems to be fixed, thanks ;)
Great! Closing.