Dbt: Persist source descriptions to database

Created on 12 Jun 2020 · 3 Comments · Source: fishtown-analytics/dbt

Describe the feature

Currently description fields are persisted to the appropriate relation (table/column) in the target database. Would it be a good feature to extend that to sources?

The current use case I have is that data is streamed from Kafka into BigQuery. We then create a source in dbt pointing at that streamed table, with dbt staging models that refine the data and marts models that consume the staging models.

It would be nice if the description added to the source YAML file could both feed the dbt documentation and be persisted to BigQuery.

Describe alternatives you've considered

An option I have considered would be to create a script that parses the manifest.json to extract nodes with resource_type = source and calls the BigQuery API to update the description.
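A rough sketch of what such a script might look like, in Python. This is not from the thread: the function names are made up for illustration, and it assumes the manifest lists sources under a top-level "sources" key with database, schema, identifier, description and columns fields, and that the google-cloud-bigquery package is installed for the write step. Only the table-level description is written here; column descriptions are collected but left unwritten to keep the example short.

```python
import json
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Read the manifest that dbt writes to the target directory."""
    return json.loads(Path(path).read_text())


def extract_source_descriptions(manifest):
    """Collect table and column descriptions for every source in the manifest."""
    results = []
    for node in manifest.get("sources", {}).values():
        if node.get("resource_type") != "source":
            continue
        # Keep only columns that actually have a description.
        columns = {
            name: col["description"]
            for name, col in node.get("columns", {}).items()
            if col.get("description")
        }
        description = node.get("description", "")
        if not description and not columns:
            continue  # nothing to persist for this source
        table_name = node.get("identifier") or node["name"]
        results.append({
            "table_id": f"{node['database']}.{node['schema']}.{table_name}",
            "description": description,
            "columns": columns,
        })
    return results


def persist_to_bigquery(entries):
    """Write table descriptions to BigQuery (hypothetical helper)."""
    # Imported lazily so the parsing step works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    for entry in entries:
        if not entry["description"]:
            continue  # avoid blanking out an existing description
        table = client.get_table(entry["table_id"])
        table.description = entry["description"]
        client.update_table(table, ["description"])
```

Something like persist_to_bigquery(extract_source_descriptions(load_manifest())) could then run after dbt has compiled the project and written the manifest.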

Additional context

This particular request is for BigQuery, but I imagine it would be applicable to other target databases that allow documentation to be added.

Who will this benefit?

This feature will be used from a data discovery point of view. It will allow our analytics team to understand information about the table without inferring it from the table name/content.

enhancement

All 3 comments

hey @whittid4 - this is a neat idea, and I _totally_ buy the reasoning, but I don't think this is something we're going to want dbt to do. dbt doesn't really touch source data, and I have concerns about baking something like this directly into dbt. For one: when would this code run? I don't think it should happen in a dbt run, and I don't know that there is really an appropriate subcommand for something like this today.

Instead, what do you think about making a macro that sets these descriptions? This macro could inspect the sources defined in a project, then execute some queries to add table and column-level comments to source tables.

I can picture that working with:

dbt run-operation dbt.persist_source_descriptions

The persist_source_descriptions macro might look something like:

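-- pseudo-code: the two persist_* helper macros are hypothetical and not defined here
{% macro persist_source_descriptions() %}
  {# graph is only fully populated at execution time #}
  {% if execute %}
    {# graph.sources is a dict keyed by unique id, so loop over its values #}
    {% for source in graph.sources.values() %}
      {% do persist_relation_description(source) %}
      {% do persist_column_descriptions(source) %}
    {% endfor %}
  {% endif %}
{% endmacro %}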

We could either ship it with dbt, or we could include it in a package like dbt-utils.

@jtcohen6 curious to hear what you think too

Thanks for opening this issue @whittid4!

@drewbanin thanks for the response, your comments make sense.

I didn't realise how powerful the dbt graph context was. I will take a look and see if I can get a macro going. If I get something working, maybe I could contribute it to dbt-utils.

Totally agree with the reasoning here. I see this as a run-operation, rather than an embedded action within dbt run. I think it's powerful, compelling, and we want people to _really_ know what they're doing when they're running any statements against source data tables.
