Terraform-provider-google: Allow creation of UDFs in a dataset

Created on 18 Sep 2019  ·  7 Comments  ·  Source: hashicorp/terraform-provider-google


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

I was looking for something that would allow me to define a UDF inside a Google Cloud BigQuery dataset, and wasn't able to find anything in the provider at present. It seems the Go client library allows creating these by creating a Routine, defined here.

It might be worthwhile to add this functionality to the provider, considering UDFs can now be defined in a persistent manner (see this).

New or Affected Resource(s)

  • google_bigquery_udf
  • google_bigquery_dataset

Potential Terraform Configuration

resource "google_bigquery_dataset" "fn" {
  dataset_id = "fn"
  location = "US"
  project = google_project.project_here.project_id
}

data "udf_parameters" "udf_args" {
  named_parameter {
    param_name = "my_arg1"
    param_type = "FLOAT64"
  }

  named_parameter {
    param_name = "my_arg2"
    param_type = "FLOAT64"
  }
}

resource "google_bigquery_udf" "my_udf" {
  project = google_project.project_here.project_id
  dataset_id = google_bigquery_dataset.fn.dataset_id
  persistent_function_id = "MY_UDF"
  language = "js"
  arguments = data.udf_parameters.udf_args
  returns = "FLOAT64"
  body = <<EOF
  function body here, or import a file
EOF
}

I'm not confident this covers all the potential knobs that you can turn, but I think it covers the ones mentioned here.

References

enhancement priority/2 size/m


All 7 comments

Thanks for filing this.

Essentially, Routines (at the API level) are a meta-collection for different kinds of function-like resources. As you've noted, user-defined functions are one aspect, but other things map into this concept as well, such as stored procedures and table-valued functions.

Is the general preference of the Terraform community to expose different kinds of resources (e.g. google_bigquery_udf, google_bigquery_procedure, etc.), or to adhere more closely to the API and expose a more generic google_bigquery_routine resource with optional fields that aren't relevant to all kinds of routines? The main precedent among BigQuery resources is that google_bigquery_table can currently be used to create managed tables, logical views, and federated tables that reference external resources.

The other interesting question is how to express the data types of inputs and outputs. The common case is something like the scalar syntax here, but BQ's support for structs and arrays means typing is more complex, and it may make more sense to use the same "just use a JSON string" technique the table resource uses for expressing schemas.
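For reference, the JSON-string precedent on google_bigquery_table looks roughly like the sketch below; a routine resource could take argument and return types the same way. The dataset and table names are illustrative only.

resource "google_bigquery_table" "example" {
  dataset_id = google_bigquery_dataset.fn.dataset_id
  table_id   = "example"

  # Nested and repeated fields are expressed directly as the API's JSON schema,
  # rather than as dedicated HCL blocks.
  schema = <<EOF
[
  {"name": "value", "type": "FLOAT64", "mode": "NULLABLE"},
  {"name": "points", "type": "RECORD", "mode": "REPEATED", "fields": [
    {"name": "lat", "type": "FLOAT64"},
    {"name": "lng", "type": "FLOAT64"}
  ]}
]
EOF
}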

So, it seems like I'm seeing a few questions here:

  1. Routines represent multiple "things" (stored procedures, table-valued functions, and UDFs). Should we have more than one resource or just a routine resource that conforms to the API?

So I think there's a trade-off here. If you have a separate resource for each case, the Terraform API diverges further from the corresponding Google API and the maintenance burden grows, since there are more resources to maintain.

The other side of the coin is that a single resource for all of these, with parameters that are required or not depending on the circumstance, makes it less clear to users what is actually needed to define any particular one of the three. It also makes it more likely that problems won't be caught during the Terraform plan phase and will instead surface when applies run and fail, which is not a great user experience.

The main thing that would make the decision clearer is how different each of these is when you make requests against the Routine API defined here. If they require very different configurations in practice, I'd argue it makes more sense to split them into separate resources; the opposite if they mostly take the same set of parameters. Unfortunately, after searching the BigQuery docs, I wasn't able to find documentation like the following for stored procedures or table-valued functions beyond a _very_ cursory mention of stored procedures here. If you have clearer documentation somewhere that I missed on how these differ in practice, it would make this call easier.

  2. BQ's ability to use structs and arrays means that typing is more complex, so it'd probably be better to just use JSON, right?

Yeah, I'd agree. I wasn't really thinking about that when creating this issue. We could use JSON that hews closer to the Argument definition instead of the named_parameter objects I constructed in the initial example.
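As a rough sketch of what that could look like: the google_bigquery_routine resource and its arguments/return_type attributes below are hypothetical, and the JSON shapes follow the Argument and StandardSqlDataType definitions from the routines API.

resource "google_bigquery_routine" "my_udf" {
  project         = google_project.project_here.project_id
  dataset_id      = google_bigquery_dataset.fn.dataset_id
  routine_id      = "MY_UDF"
  routine_type    = "SCALAR_FUNCTION"
  language        = "SQL"
  definition_body = "my_arg1 + my_arg2[OFFSET(0)]"

  # Arguments as a JSON string mirroring the API's Argument objects, so
  # nested ARRAY/STRUCT types can be expressed without bespoke HCL blocks.
  arguments = <<EOF
[
  {"name": "my_arg1", "dataType": {"typeKind": "FLOAT64"}},
  {"name": "my_arg2", "dataType": {"typeKind": "ARRAY",
                                   "arrayElementType": {"typeKind": "FLOAT64"}}}
]
EOF

  # Return type as JSON following StandardSqlDataType.
  return_type = "{\"typeKind\": \"FLOAT64\"}"
}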

Bringing over the points I made on #4619, which @paddycarver has (rightly) closed as a duplicate. This may add discussion to the points that @chetaldrich raised above.

Description

On 2019-10-03 the BigQuery service gained the ability to use stored procedures and UDFs; see the release notes here: https://cloud.google.com/bigquery/docs/release-notes#October_03_2019

The routines resource on the BigQuery API supports management of stored procedures and UDFs: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines

This issue is to request support in the Terraform GCP provider for creating, modifying, and destroying BigQuery stored procedures and UDFs.

New or Affected Resource(s)

  • google_bigquery_routine

Potential Terraform Configuration

resource "google_bigquery_routine" "sproc" {
  project_id  = "myproject"
  dataset_id = "${google_bigquery_dataset.default.dataset_id}"
  routine_id     = "mysproc"
  routine_type = "PROCEDURE"
  language = "SQL"
  definition_body = "CREATE PROCEDURE ${google_bigquery_dataset.default.dataset_id}.mysproc () BEGIN SET x = x + delta; END"
  # see full list of attributes [here](https://cloud.google.com/bigquery/docs/reference/rest/v2/routines)
}
resource "google_bigquery_routine" "udf" {
  project_id  = "myproject"
  dataset_id = "${google_bigquery_dataset.default.dataset_id}"
  routine_id     = "myudf"
  routine_type = "SCALAR_FUNCTION"
  language = "SQL"
  definition_body = "${data.template.function_body}"
  # see full list of attributes [here](https://cloud.google.com/bigquery/docs/reference/rest/v2/routines)
}

Regarding whether to have a single Terraform resource or multiple resources for stored procedures and functions: the existing google_bigquery_table resource can be used to implement both tables _and views_ in a dataset, even though these are conceptually different, because that is how the table API is implemented. For consistency, I would prefer a single "routine" resource that maps to the routines API, rather than abstracting this into multiple resources. There is little difference between the API definitions of functions and stored procedures, since most of the logic lives in the arbitrary code of the "definition" property.

I was curious about this feature and looked at generating the resource via api.yaml, but the API includes a recursive data type: https://cloud.google.com/bigquery/docs/reference/rest/v2/StandardSqlDataType. I'm not very familiar with the api.yaml format -- is it possible to represent recursive data types?

Hi all, does Terraform give us a release date, or do we have to get enough 👍?
Thanks!

I'm going to lock this issue because it has been closed for _30 days_ ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉, please reach out to my human friends 👉 [email protected]. Thanks!
