dbt should support the creation of UDFs in BigQuery. BQ requires that UDFs be created as temporary functions in the same statement as the eventual query which uses it. Example:
CREATE TEMP FUNCTION greeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
return "Hello, " + a + "!";
""";
create table dbt_dbanin.my_udf_model as (
SELECT greeting(name) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS name
);
via docs.
dbt's pre-hooks would seem like a good candidate for this feature, but unfortunately, the pre-hooks run in separate statements, which won't work here. The UDF definition will need to occur deep within the materialization code itself.
BQ users who want to use UDFs in their models.
PLEASE NOTE THAT THE WORKAROUND IS NO LONGER REQUIRED as per the last comment of this thread.
@TristanBoudreault ran into this issue, and we figured out a workaround.
macros/custom_create_table_as.sql (or similar, the name really doesn't matter). Full credit to Tristan for this macro code:{% macro bigquery__create_table_as(temporary, relation, sql) -%}
{%- set raw_partition_by = config.get('partition_by', none) -%}
{%- set raw_cluster_by = config.get('cluster_by', none) -%}
{%- set udf = config.get('udf', none) -%} # I added this part
{{ udf if udf is not none }} # I added this part
create or replace table {{ relation }}
{{ partition_by(raw_partition_by) }}
{{ cluster_by(raw_cluster_by) }}
as (
{{ sql }}
);
{% endmacro %}
{{ config(
materialized='table',
udf='''
CREATE TEMP FUNCTION greeting(a STRING)
RETURNS STRING
LANGUAGE js AS """
return "Hello, " + a + "!";
""";
'''
)-}
SELECT greeting(name) as everyone
FROM UNNEST(["Hannah", "Max", "Jakob"]) AS name
bigquery__create_table_as over the default version when using the table materialization!Thank you for the work-around, @clrcr. It works as a charm.
@drewbanin It will be nice to add this to the future release.
However, it doesn't work when using UDF in models with materialized='incremental'
I am investigating the work-around. If anyone knows about the fix, I greatly appreciated.
Hey @hui-zheng - on BigQuery, incremental models use merge statements, so they don't go through the bigquery__create_table_as code path. I think you'd need to implement an approach similar to the one described above for the bigquery__get_merge_sql macro.
I suppose this is something we could support natively in dbt, but really, I wish BigQuery would ship with a more sane implementation of UDFs!
@drewbanin I would be happy to contribute to the fix of bigQuery UDF for the next release, until BigQuery ships a better UDFs. Could you point me to a doc explaining how to setup development env and contribute?
Hey @hui-zheng - check out the contributing guide here.
You'll basically want to clone the repo, check out a new branch (like feature/bq-udfs), then install dbt from source. You can do that with
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
As long as your virtualenv is activated, the dbt command will point to the code in this repo! That means that you can change the materialization code, do a dbt run and inspect the results.
Drop us a line in #development in Slack if you have any questions about development in general, and feel free to follow up here if you want to talk about the implementation of this specific feature.
@drewbanin thank you. I will look into that.
Meanwhile, I come up with a work-around for UDF in incremental mode.
Add the following macro to your project in macros/bq_custom_merge.sql (or similar sql file name).
{% macro bigquery__get_merge_sql(target, source, unique_key, dest_columns) %}
{%- set udf = config.get('udf', none) -%}
{{ udf if udf is not none }}
{{ common_get_merge_sql(target, source, unique_key, dest_columns) }}
{% endmacro %}
BigQuery supports persistent UDFs now! https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_function_statement
We should close this issue once we've assessed that BQ UDFs work similarly to UDFs on Snowflake/Redshift. We can roll it into https://github.com/fishtown-analytics/dbt/issues/136 if it's beneficial to give dbt first-class knowledge of UDFs, or, users can continue to create UDFs using hooks/run-operations/etc
closing this - persistent UDFs work great. We may revisit this in #136 in the future
Most helpful comment
PLEASE NOTE THAT THE WORKAROUND IS NO LONGER REQUIRED as per the last comment of this thread.
@TristanBoudreault ran into this issue, and we figured out a workaround.
macros/custom_create_table_as.sql(or similar, the name really doesn't matter). Full credit to Tristan for this macro code:bigquery__create_table_asover the default version when using the table materialization!