Terraform-provider-google: Google Cloud Dataflow - execution parameters are not configurable in Terraform (diskSizeGb, workerDiskType, workerMachineType)

Created on 18 May 2018 · 5 comments · Source: hashicorp/terraform-provider-google

_This issue was originally opened by @karthik-papajohns as hashicorp/terraform#18073. It was migrated here as a result of the provider split. The original body of the issue is below._


Terraform Version

Terraform v0.11.5

Terraform Configuration Files

...

Debug Output

Crash Output

Expected Behavior

Expected Terraform to make the Google Cloud Dataflow execution parameters (diskSizeGb, workerDiskType, workerMachineType) configurable.

https://cloud.google.com/dataflow/pipelines/specifying-exec-params
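For illustration, the request amounts to being able to write something like the following in a google_dataflow_job resource. This is only a sketch of the ask: disk_size_gb and worker_disk_type are hypothetical argument names mirroring the Dataflow execution parameters, not arguments the provider actually exposes, and all bucket, template, and job names are made up.

resource "google_dataflow_job" "example" {
  name              = "example-job"
  template_gcs_path = "gs://dataflow-templates/latest/PubSub_to_BigQuery"  # illustrative template path
  temp_gcs_location = "gs://my-dataflow-bucket/temp"                       # hypothetical bucket

  # Requested execution parameters (hypothetical HCL names for
  # workerMachineType, diskSizeGb and workerDiskType):
  machine_type     = "n1-standard-1"
  disk_size_gb     = 50
  worker_disk_type = "pd-ssd"
}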

Actual Behavior

No references to execution parameters for Google Cloud Dataflow can be found in the official Terraform documentation.

https://www.terraform.io/docs/providers/google/r/dataflow_job.html

Additional Context


Labels: enhancement, upstream


All 5 comments

I have found a dirty work-around:

  1. I downloaded the Google template I wanted to deploy to Dataflow from Google's template bucket with the gcloud CLI tool.
  2. For a custom pipeline we wrote in Apache Beam (Java), I set the workerMachineType parameter to the value I wanted and then wrote the pipeline to a template file instead of deploying it to GCP.
  3. Then I looked through the template I had just created and manually copied everything relating to "machineType" over to the Google template downloaded earlier. There were three places in total:

At the top in "options":
"zone" : null,
"workerMachineType" : "n1-standard-1",
"gcpTempLocation" : "gs://dataflow-staging-us-central1-473832897378/temp/",

Again, at the bottom of "sdkPipelineOptions":
}, {
"namespace" : "org.apache.beam.runners.dataflow.options.DataflowPipelineOptions",
"key" : "templateLocation",
"type" : "STRING",
"value" : "gs://dataflow-templates-staging/2018-10-08-00_RC00/PubSub_to_BigQuery"
}, {
"namespace" : "org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions",
"key" : "workerMachineType",
"type" : "STRING",
"value" : "n1-standard-1"
}
]
},

And finally in "workerPools":
"dataDisks" : [ { } ],
"machineType" : "n1-standard-1",
"numWorkers" : 0,

  4. I used the Terraform resource "google_storage_bucket_object" to upload this file into a bucket in my GCP project.
  5. Finally, I pointed "template_gcs_path" in "google_dataflow_job" at the newly uploaded object in my bucket instead of the standard Google template (see the sketch after this list).
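A minimal Terraform sketch of steps 4 and 5, assuming the hand-edited template was saved locally as modified_template.json and that a bucket named my-dataflow-templates already exists (file, bucket, object, and job names here are all hypothetical):

resource "google_storage_bucket_object" "custom_template" {
  name   = "templates/PubSub_to_BigQuery_custom"  # hypothetical object path in the bucket
  bucket = "my-dataflow-templates"                # hypothetical pre-existing bucket
  source = "modified_template.json"               # the hand-edited template file
}

resource "google_dataflow_job" "pubsub_to_bq" {
  name              = "pubsub-to-bq"
  template_gcs_path = "gs://my-dataflow-templates/${google_storage_bucket_object.custom_template.name}"
  temp_gcs_location = "gs://my-dataflow-templates/temp"
}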

I realise it is a bit hacky, but it works. The pipeline gets successfully deployed on an n1-standard-1 Compute Engine instance instead of the default n1-standard-4.

machine_type is now configurable. The others aren't yet.
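For reference, a minimal sketch of a job using the machine_type argument (template path, bucket, and job name are illustrative):

resource "google_dataflow_job" "example" {
  name              = "example-job"
  template_gcs_path = "gs://dataflow-templates/latest/PubSub_to_BigQuery"  # Google-provided template, shown for illustration
  temp_gcs_location = "gs://my-dataflow-bucket/temp"                       # hypothetical bucket
  machine_type      = "n1-standard-1"                                      # worker machine type
}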

Is there any planned timeline for making diskSizeGb configurable as well? As documented in Google Dataflow's common error guidance, we'd like to be able to manage the workers' disk size when managing Dataflow jobs with Terraform.

Another nice-to-have feature would be the ability to set the number of workers.

+1, would like to be able to set disk_size_gb and worker_disk_type
