Terraform-provider-google: Potential Race Condition When Apply Google Project IAM Policy

Created on 13 Jun 2017  ·  15Comments  ·  Source: hashicorp/terraform-provider-google

_This issue was originally opened by @zachgersh as hashicorp/terraform#12291. It was migrated here as part of the provider split. The original body of the issue is below._


Terraform Version

0.85+

Affected Resource(s)

  • google_projec_iam_policy

Terraform Configuration Files

None provided. Totally can work up an example if needed.

Expected Behavior

I should be able to remove/add members to the policy

Actual Behavior

I get random permission denied errors due to the fact that the policy has become out of date (a member was added or removed before the policy was destroyed).

Spoke to @evandbrown a bit about this and I am unsure if restoring a saved copy of the previous policy that is stored in your tfstate file is the right way to go. It would seem that anyone with multiple environments / policy changes in the same project would end up hitting this problem.

Steps to Reproduce

  1. create two different policies in the same GCP project
  2. start to teardown one and then begin tearing down the other
  3. watch it fail

cc @paddyforan as well whose been looking over a bunch of my GCP changes.

bug

Most helpful comment

The three google_project_iam_* resources all control different levels of the same thing.

  • google_project_iam_policy controls the entire project. It ensures that no accounts are set on any roles except the ones it defines.
  • google_project_iam_binding controls a specific role within the policy. It ensures that no accounts except the ones it defines are set on _only the one role it controls_. Two or more google_project_iam_bindings can be used for the same project, but only if they're controlling different roles.
  • google_project_iam_member controls a specific member on a specific role in the policy. It ensures that the member is part of the role, but doesn't care if other members are on that role or not.

If you use google_project_iam_member or google_project_iam_binding, you shouldn't use google_project_iam_policy.

If you use google_project_iam_member, you shouldn't use a google_project_iam_binding for the same role, though other google_project_iam_members on the same role are fine. And google_project_iam_bindings on _different_ roles are fine.

If you use google_project_iam_binding, you shouldn't use a google_project_iam_binding or a google_project_iam_member on the same role.

All 15 comments

Here is more info from the original issue (https://github.com/hashicorp/terraform/issues/12291#issuecomment-292035883):

@kkallday and I have been able to reproduce this issue:

template 1: https://gist.github.com/kkallday/5d9127f37b33ab660869a3f4011492ee#file-template-1-tf
template 2: https://gist.github.com/kkallday/5d9127f37b33ab660869a3f4011492ee#file-template-2-tf

  1. terraform apply -state=some-dir with template 1
  2. terraform apply -state=some-OTHER-dir with template 2
  3. terraform destroy -state=some-dir
  4. terraform destroy -state=some-OTHER-dir and you will receive the following error:
google_service_account.opsman_service_account: Refreshing state... (ID: projects/...ount.com)
data.google_iam_policy.opsman_policy: Refreshing state...
google_project_iam_policy.opsman_iam_policy: Refreshing state... (ID: YOUR-PROJECT-ID)
google_project_iam_policy.opsman_iam_policy: Destroying... (ID: YOUR-PROJECT-ID)
Error applying plan:

1 error(s) occurred:

* google_project_iam_policy.opsman_iam_policy (destroy): 1 error(s) occurred:

* google_project_iam_policy.opsman_iam_policy: Error applying IAM policy to project: Error applying IAM policy for project "YOUR-PROJECT-ID". Policy is &cloudresourcemanager.Policy{Bindings:[]*cloudresourcemanager.Binding{(*cloudresourcemanager.Binding)(0xc42097b080), (*cloudresourcemanager.Binding)(0xc42097b0e0), (*cloudresourcemanager.Binding)(0xc42097b140), (*cloudresourcemanager.Binding)(0xc42097b1a0), (*cloudresourcemanager.Binding)(0xc42097b260), (*cloudresourcemanager.Binding)(0xc42097b2c0), (*cloudresourcemanager.Binding)(0xc42097b380)}, Etag:"BwVMctIFq+g=", Version:1, ServerResponse:googleapi.ServerResponse{HTTPStatusCode:200, Header:http.Header{"X-Content-Type-Options":[]string{"nosniff"}, "Alt-Svc":[]string{"quic=\":443\"; ma=2592000; v=\"37,36,35\""}, "Server":[]string{"ESF"}, "X-Xss-Protection":[]string{"1; mode=block"}, "Cache-Control":[]string{"private"}, "X-Frame-Options":[]string{"SAMEORIGIN"}, "Content-Type":[]string{"application/json; charset=UTF-8"}, "Vary":[]string{"Origin", "X-Origin", "Referer"}, "Date":[]string{"SOME-TIME-STAMP"}}}, ForceSendFields:[]string(nil), NullFields:[]string(nil)}, error is googleapi: Error 400: Request contains an invalid argument., badRequest

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

This error is happening because when terraform apply runs in step 1 it takes a "snapshot" of the IAM policies for the project. When terraform destroy runs in step 3 it "restores" the snapshot of the IAM policies that were applied before step 1. This destroys the IAM policies that step 4 attempts to restore (which would be the ones it saw at step 2). We believe that Google's API rejects this with a 400 invalid argument because the policy that step 4 sent has "outdated" data.

We suspect this line [1] is causing the issue. It makes a request to the SetIamPolicies endpoint with bindings from the TF state. Instead it should (1) make a request to GetIamPolicies, subtract the policy bindings that are in the first run's TF state, then send this payload to SetIamPolicies.

  1. https://github.com/hashicorp/terraform/blob/4f235c870d4db2e88e7732174aef790c8c836ba8/builtin/providers/google/resource_google_project_iam_policy.go#L225

cc @davewalter @rainmaker @utako @rizwanreza @acrmp @joshzarrabi @ryanmoran

This should be addressed by #171, which was just merged and should be in the next release of the GCP provider.

Sounds good @paddycarver. We really appreciate your help on this issue - our team has been working off a fork with a temporary fix so it will be great to get off that.

Awesome! Really looking forward to getting people moved off forks and back onto the upstream. Sorry for the delay in implementing this.

@paddycarver Hi Paddy.

We tried to move the fork off onto the latest terraform and to use the resources you mention in your PR (the binding and member). However, this actually fails absolutely as oppose to in a race condition.

When we create an environment A, we create an IAM user A.
When we create an environment B, we create an IAM user B.
This B user replaces A, such that environment A no longer has any permissions.

For instance, a role storage admin can only belong to one member at a time. If we want more than one IAM user with that role at a given time, we cannot.

Could we discuss with you an alternate solution to our above issue?

The following configuration should support that:

Environment A:

resource "google_service_account" "account" {
  account_id   = "account-a"
  display_name = "Account A"
}

resource "google_project_iam_member" "project" {
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.account.email}"
}

Environment B:

resource "google_service_account" "account" {
  account_id   = "account-b"
  display_name = "Account B"
}

resource "google_project_iam_member" "project" {
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.account.email}"
}

I just ran that and it worked as expected, with both account-a and account-b ending up with the storage.admin role.

If you're experiencing different results, I'd love an issue with a minimal reproduction.

Environment A:

resource "google_service_account" "service_account" {
  account_id   = "account-a"
  display_name = "A Service Account"
}

resource "google_project_iam_binding" "storage_admin" {
  project = "${var.project}"
  role    = "roles/storage.admin"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

resource "google_project_iam_binding" "compute_network_admin" {
  project = "${var.project}"
  role    = "roles/compute.networkAdmin"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

Environment B:

resource "google_service_account" "service_account" {
  account_id   = "account-b"
  display_name = "B Service Account"
}

resource "google_project_iam_binding" "storage_admin" {
  project = "${var.project}"
  role    = "roles/storage.admin"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

resource "google_project_iam_binding" "compute_network_admin" {
  project = "${var.project}"
  role    = "roles/compute.networkAdmin"

  members = [
    "serviceAccount:${google_service_account.service_account.email}",
  ]
}

google_project_iam_binding controls the entire role in your project; only one Terraform state should have google_project_iam_binding in it for a given role, as documented here. If you want to have two separate statefiles both assign a service account to the same role, you must use the google_project_iam_member, as I did in my example.

With the google_project_iam_member, you can only have one role?

How does a member get more than one role as I have in my example?

You can have as many google_project_iam_member resources as you like, each binding the same or different service accounts to the same or different roles.

This is perfectly valid:

resource "google_project_iam_member" "project" {
  role    = "roles/compute.networkAdmin"
  member  = "serviceAccount:${google_service_account.account.email}"
}

resource "google_project_iam_member" "project2" {
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.account.email}"
}

as is this:

resource "google_project_iam_member" "project" {
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account.account.email}"
}

resource "google_project_iam_member" "project2" {
  role    = "roles/storage.admin"
  member  = "serviceAccount:${google_service_account2.account.email}"
}

You can have multiple google_project_iam_member resources that are really just adding roles to the same member? As opposed to a single iam member resource that could have roles?

The three google_project_iam_* resources all control different levels of the same thing.

  • google_project_iam_policy controls the entire project. It ensures that no accounts are set on any roles except the ones it defines.
  • google_project_iam_binding controls a specific role within the policy. It ensures that no accounts except the ones it defines are set on _only the one role it controls_. Two or more google_project_iam_bindings can be used for the same project, but only if they're controlling different roles.
  • google_project_iam_member controls a specific member on a specific role in the policy. It ensures that the member is part of the role, but doesn't care if other members are on that role or not.

If you use google_project_iam_member or google_project_iam_binding, you shouldn't use google_project_iam_policy.

If you use google_project_iam_member, you shouldn't use a google_project_iam_binding for the same role, though other google_project_iam_members on the same role are fine. And google_project_iam_bindings on _different_ roles are fine.

If you use google_project_iam_binding, you shouldn't use a google_project_iam_binding or a google_project_iam_member on the same role.

Awesome. Thank you for the detailed explanation.

We'll make that change and try it again.

Thank you!

Not sure where it would be best to explain what you wrote above, but I don't find the description in the docs for google_project_iam_member as clear as what you wrote above:

Allows creation and management of a single member for a single binding within the IAM policy for an existing Google Cloud Platform project.

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

Was this page helpful?
0 / 5 - 0 ratings