Dbt: Applying BigQuery labels through DBT

Created on 20 Nov 2019 · 7 Comments · Source: fishtown-analytics/dbt

Describe the feature

I would like to be able to apply labels to BigQuery tables that are created during the DBT process.
BQ tables have the labels feature (shown in a screenshot attached to the original issue).
The Label feature is very useful for version control and other desired attributes.
It would be extremely valuable to be able to control this feature by using Config in my DBT SQL files.

Who will this benefit?

Anyone using the BigQuery Label feature will benefit from the ability to control it using DBT.

Labels: bigquery, enhancement, good first issue

All 7 comments

I want this too!

I'm also interested in having config tags be the labels.

Example 1
dbt tag of 'nightly'
BigQuery label = 'nightly'

Example 2
dbt tag of 'schedule=nightly'
BigQuery label = 'schedule:nightly'

Additional notes and thoughts:

  • Duplicate label keys are not allowed on either Redshift or BigQuery. What happens if a table or view config has duplicate keys with conflicting values?
  • When updating existing tables and views, would we want dbt to clear out existing labels first? What if there are labels not applied by dbt? Something to think about.
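To make the two examples and the open questions above concrete, here is a minimal Python sketch. The helper names `tags_to_labels` and `merge_labels` are hypothetical and not part of dbt. Treating a bare tag as a key with an empty value, raising an error on conflicting values, and keeping labels that dbt did not set are all assumptions about one possible behavior, not decisions made in this thread.

```python
from typing import Dict, Iterable


def tags_to_labels(tags: Iterable[str]) -> Dict[str, str]:
    """Convert dbt-style tags into a label dict (hypothetical helper)."""
    labels: Dict[str, str] = {}
    for tag in tags:
        # 'schedule=nightly' -> key 'schedule', value 'nightly'.
        # A bare tag such as 'nightly' becomes a key with an empty value.
        key, _, value = tag.partition("=")
        if key in labels and labels[key] != value:
            # Assumption: conflicting values for one key are treated as an error.
            raise ValueError(
                f"conflicting values for label key {key!r}: "
                f"{labels[key]!r} vs {value!r}"
            )
        labels[key] = value
    return labels


def merge_labels(existing: Dict[str, str], from_dbt: Dict[str, str]) -> Dict[str, str]:
    """Keep labels that dbt did not set; dbt's values win on shared keys.

    This is only one possible policy (assumption); clearing existing
    labels first would be the alternative raised above.
    """
    merged = dict(existing)
    merged.update(from_dbt)
    return merged
```

With this sketch, `tags_to_labels(['nightly', 'schedule=nightly'])` yields one valueless key and one key-value pair, while `tags_to_labels(['schedule=nightly', 'schedule=hourly'])` raises a `ValueError`.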

Here are the BigQuery label requirements: https://cloud.google.com/bigquery/docs/labels-intro#requirements
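As a rough illustration of what checking against those requirements could look like, here is a minimal Python sketch. It reflects my reading of the linked page (at most 64 labels per resource, keys of 1 to 63 characters that start with a lowercase letter, values of 0 to 63 characters, and only lowercase letters, digits, underscores, and dashes), so please verify the limits against the page itself. It is deliberately ASCII-only, although the requirements also permit international characters, and `validate_labels` is a hypothetical name, not a dbt or BigQuery API.

```python
import re
from typing import Dict, List

# Limits as read from the linked requirements page (assumption: still current).
MAX_LABELS = 64
# ASCII-only simplification: international characters are not handled here.
KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def validate_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: List[str] = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        # fullmatch ensures the whole string fits the pattern.
        if not KEY_RE.fullmatch(key):
            problems.append(f"invalid label key: {key!r}")
        if not VALUE_RE.fullmatch(value):
            problems.append(f"invalid value for key {key!r}: {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one is a design choice for the sketch: it lets a caller report every bad label in a single run.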

I also included the RedShift label requirements to preserve future compatibility: https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-tagging.html

See also https://github.com/fishtown-analytics/dbt/issues/1947

cool idea @talcherrehub! Let's do it :)

@ciscodebs I think the Redshift tags are different -- those tags apply to _Redshift clusters_, i.e. they're something you'd set on the cluster itself, not on tables in the cluster. So, this is BQ-only functionality, which actually makes this easier to implement :)

I think the idea of persisting dbt's tags as BQ tags is pretty elegant. You buy that @talcherrehub and @kconvey?

I tend to think using dbt tags for BigQuery labels is overly restrictive, and overloads the tag/label config option. BigQuery labels seem like they'd naturally support further processing post-dbt, which may not have anything to do with dbt labels. Putting BQ labels into dbt tags probably requires that all models wind up with their dbt tag as a label, which may have only existed as a tag for model selection in dbt, adding noise to BQ labels simply to avoid another config option in your dbt_project.yml. It makes more sense to me to be explicit in dbt_project.yml for what are two related, but ultimately different concepts. Where there is strong overlap, I would hope that yaml anchors and the like could potentially save you from repeating yourself.

There is also the issue that BQ labels are ultimately key-value pairs, where dbt tags currently are not. It would be a bit odd to add empty values for dbt tags just to meet a BQ syntax requirement, especially when dbt tags are used across adapters.

I am all in favor of a more elegant solution than this, but it still seems like a useful feature worth supporting even if there is some overlap between tag/label. Eager to hear some other thoughts!

Yeah, I buy that, thanks for the cogent argument @kconvey :)

Let's go ahead and support a BQ-specific config, labels, which should accept a dictionary.
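For illustration only, a model would presumably pass something like `labels={'schedule': 'nightly'}` in its config, and the adapter would need to turn that dictionary into something BigQuery understands. The Python sketch below renders a dictionary into a BigQuery DDL `OPTIONS(labels=[...])` clause. The function name `labels_options_clause` is hypothetical, and this is not how dbt implements the feature; it only shows that a dictionary maps directly onto BigQuery's key-value pairs.

```python
import json
from typing import Dict


def labels_options_clause(labels: Dict[str, str]) -> str:
    """Render a label dict as a BigQuery DDL OPTIONS clause (hypothetical helper)."""
    # json.dumps produces a double-quoted, escaped string literal for each part.
    # Keys are sorted so the generated SQL is stable between runs.
    pairs = ", ".join(
        f"({json.dumps(key)}, {json.dumps(value)})"
        for key, value in sorted(labels.items())
    )
    return f"OPTIONS(labels=[{pairs}])"


if __name__ == "__main__":
    print(labels_options_clause({"team": "analytics", "schedule": "nightly"}))
    # prints: OPTIONS(labels=[("schedule", "nightly"), ("team", "analytics")])
```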

Sounds good to me. I also wanted to point out that BigQuery accepts label keys without values. These are referred to as tags in their docs, which isn't confusing at all. I thought this might be worth bringing up just in case...

https://cloud.google.com/bigquery/docs/adding-labels#adding_a_tag

good to know @ciscodebs - thanks for the additional info!

closed by #1964
