Terraform-provider-google: Adding logging sink service accounts as writers on a BigQuery dataset

Created on 7 Feb 2019  ·  7 Comments  ·  Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
  • If an issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to "hashibot", a community member has claimed the issue already.

Terraform Version

Terraform v0.11.11

  • provider.google v1.19.1
  • provider.random v2.0.0

Affected Resource(s)

  • google_bigquery_dataset
  • google_logging_project_sink

Terraform Configuration Files

resource "google_bigquery_dataset" "ProjectLogging" {
    dataset_id = "ProjectLogging"
    friendly_name = "ProjectLogging"
    location = "US"
    project = "${google_project.project.project_id}" 
    labels {
        env = "${var.environment}"
    }
}

resource "google_logging_project_sink" "ProjectSink" {
    name = "ProjectLogging"
    destination = "bigquery.googleapis.com/projects/${google_project.project.project_id}/datasets/ProjectLogging"
    project = "${google_project.project.project_id}"
    filter = "resource.type = project"
    depends_on = ["google_bigquery_dataset.ProjectLogging"]
    unique_writer_identity = true
}

Expected Behavior

We use BigQuery datasets to store our logging data, so in Terraform I need to be able to:

  • create the dataset
  • create the sink
  • add the service account created by the sink as a BigQuery WRITER on the dataset.

We're unable to use IAM (google_project_iam_binding) due to our security requirements.
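For reference, the IAM route we can't take would be something along these lines (a rough sketch only; the google_project_iam_member resource and the roles/bigquery.dataEditor role are assumptions based on the provider's logging sink documentation, not config from this project):

resource "google_project_iam_member" "sink_writer" {
    # Grants the sink's unique writer identity write access to BigQuery
    # project-wide; ruled out for us by the security requirements above.
    project = "${google_project.project.project_id}"
    role    = "roles/bigquery.dataEditor"
    member  = "${google_logging_project_sink.ProjectSink.writer_identity}"
}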

I tried the configuration below just to see what would happen, and Terraform complained about a dependency cycle: the sink depends_on the dataset, while the dataset's access block references the sink's writer_identity, so neither resource can be created first.

resource "google_bigquery_dataset" "ProjectLogging" {
    dataset_id = "ProjectLogging"
    friendly_name = "ProjectLogging"
    location = "US"
    project = "${google_project.project.project_id}" 
    labels {
        env = "${var.environment}"
    }

    access = {
        role = "WRITER"
        user_by_email = "${google_logging_project_sink.ProjectSink.writer_identity}"
    }
}

resource "google_logging_project_sink" "ProjectSink" {
    name = "ProjectLogging"
    destination = "bigquery.googleapis.com/projects/${google_project.project.project_id}/datasets/ProjectLogging"
    project = "${google_project.project.project_id}"
    filter = "resource.type = project"
    depends_on = ["google_bigquery_dataset.ProjectLogging"]
    unique_writer_identity = true
}

Actual Behavior

N/A

Important Factoids

We're using the standard service-account authentication method and not doing anything else I'd consider to be "odd".

References

bug


All 7 comments

Related: #2051 for allowing IAM policies on a particular BigQuery dataset

I ran into this today and I'm curious what the recommended approach is.

#2051 was closed (at least from a BigQuery perspective), I think because of a misunderstanding of how IAM is assigned to datasets. I'll admit BigQuery is pretty new to me, and maybe the functionality is too, but based on my reading of https://cloud.google.com/bigquery/docs/access-control it seems like IAM-style permissions can be assigned to individual datasets, and this is what the GCP console does when you create a sink destined for BigQuery. The dataset's access list is what carries these grants, so I guess supporting this would require separate resources and datasets.patch calls?
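For reference, the entry a datasets.patch call would need to append to the dataset's access list looks roughly like this (field names as used by the BigQuery API; the value is a placeholder for the sink's writer service account email):

"access": [
  {
    "role": "WRITER",
    "userByEmail": "SINK_WRITER_SERVICE_ACCOUNT_EMAIL"
  }
]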

I'm thinking my workaround right now might be to use a null_resource with a local-exec provisioner to call the gcloud CLI and get the job done, but ideally this could be 100% Terraform.

In case it's helpful for anyone who needs a workaround for now, I ended up doing the following. It's less than ideal, but it gets the job done (Terraform config plus the accompanying set-flow-log-writer-access.sh invoked by the provisioner).

resource "google_logging_project_sink" "flow_log_sink" {
  name                   = "flow-logs"
  destination            = "bigquery.googleapis.com/projects/${data.google_project.project.project_id}/datasets/${google_bigquery_dataset.flow_logs.dataset_id}"
  filter                 = "resource.type = gce_subnetwork AND logName=projects/${data.google_project.project.project_id}/logs/compute.googleapis.com%2Fvpc_flows"
  unique_writer_identity = "true"
}

resource "google_bigquery_dataset" "flow_logs" {
  ..

  # all of the workarounds below (lifecycle + null_resource with local-exec)
  # are related to not being able to attach the sink writer to the BigQuery dataset directly.
  # see: https://github.com/terraform-providers/terraform-provider-google/issues/3012
  #
  # access {
  #   role          = "WRITER"
  #   user_by_email = "${google_logging_project_sink.flow_log_sink.writer_identity}"
  # }

  lifecycle {
    ignore_changes = ["access"]
  }

  access {
    role          = "OWNER"
    special_group = "projectOwners"
  }

  access {
    role          = "READER"
    special_group = "projectReaders"
  }
}

resource "null_resource" "flow_log_access" {
  count = "${var.collect_flow_logs}"
  triggers = {
    writer_identity = "${google_logging_project_sink.flow_log_sink.writer_identity}"
  }

  provisioner "local-exec" {
    command = "${path.module}/set-flow-log-writer-access.sh ${data.google_project.project.project_id} ${google_bigquery_dataset.flow_logs.dataset_id} ${google_logging_project_sink.flow_log_sink.writer_identity}"
  }
}

set-flow-log-writer-access.sh:

#!/bin/bash
set -e

if [[ -z "$1" || -z "$2" || -z "$3" ]]; then
  echo "usage: $0 [project id] [dataset id] [writer identity]"
  exit 1
fi

if ! [ -x "$(command -v bq)" ]; then
  echo "bq cli (bigquery cli, from gcloud cli) is not available in path"
  exit 1
fi

project=$1
dataset=$2
writer=${3#"serviceAccount:"}
temp_file=$(mktemp)
trap "rm -f ${temp_file}" EXIT

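# Read the dataset's current access list, append a WRITER entry for the
# sink's service account, and write the updated definition back via `bq update`.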
bq show --format=prettyjson --project ${project} ${dataset} \
  | jq --arg writer ${writer} '.access | . += [{"role": "WRITER", "userByEmail": $writer }] | {"access": .}' > ${temp_file}
bq update --source ${temp_file} ${project}:${dataset}

We just ran into the same issue. The only way around it that I can think of is a new resource named something like "google_bigquery_dataset_access" in which you specify each access entry for a specific dataset. Otherwise there is a circular dependency you cannot get around except with @chrisboulton's hack. A sketch of what that could look like is below.
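For illustration only, a minimal sketch of how such a hypothetical resource could break the cycle; the resource name and its arguments (dataset_id, role, user_by_email) are assumptions, not something the provider offers today:

resource "google_bigquery_dataset_access" "sink_writer" {
    # Hypothetical standalone resource: it references both the dataset and the
    # sink, but neither of those two resources references the other, so there
    # is no dependency cycle.
    dataset_id = "${google_bigquery_dataset.ProjectLogging.dataset_id}"
    role       = "WRITER"

    # writer_identity has the form "serviceAccount:<email>"; strip the prefix
    # to get the bare email a userByEmail-style argument would expect.
    user_by_email = "${replace(google_logging_project_sink.ProjectSink.writer_identity, "serviceAccount:", "")}"
}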

#3990 should be able to address this if it gets added

@gosseljl I'm closing this issue as the fix (#3012) was applied. Please feel free to reopen it if you still encounter the issue. Thanks

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!
