Dbt: model blocks

Created on 19 Oct 2016 · 7Comments · Source: fishtown-analytics/dbt

This issue presents a suggested alternative to the way that dbt code is currently organized.

Currently, every .sql file in the models directory is its own model. This proposal is to break that 1:1 relationship between files and models, and instead allow models to be defined similarly to how functions are defined in a programming language. For example:

def [model_1_name]

SELECT a, b
from table_c
where d is not null

def [model_2_name]

...

By itself, this change doesn't accomplish anything except for adding some light organization to a models directory. But the relaxation of the constraint of 1 file : 1 model allows users to create models very differently.

Currently, the 1:1 constraint encourages users to write models that are fairly long and complex, frequently 50-100 lines long or more. These models often contain several CTEs, each of which represents a separate data transformation. These CTEs, because they are wrapped within a larger model, cannot be tested and are not a part of the dependency graph. Plus, long code with multiple CTEs is harder to read, write, and maintain.

This file structure would allow model authors to decompose their models into multiple components, and then control each component as a part of the DAG. Each def is a model, but the default for all models is to be ephemeral, unless instructed otherwise. Ephemeral models can still be tested but are not materialized directly into the database. We still need to consider how this default interacts with model configuration scoping.

Additionally, authors would be able to provide common aliases for queries within that file using an import directive. The import directive is a convenient way for model authors to create a single alias for a table that is then used throughout the entire file. Imports would happen at the beginning of a file:

import {{ref('a_long_model_name')}} as table_c

def [model_1_name]

SELECT a, b
from table_c
where d is not null

def [model_2_name]

...

This import functionality will increase code quality and reduce lines of code, as currently model authors are writing verbose CTEs at the beginning of every model to provide this aliasing functionality.

enhancement

Source

jthandy

Most helpful comment

A use case that came up today that I think is a good example of where the one-to-one paradigm breaks down:
A dbt user wants to create sanitized views over their fct_ tables, that excludes PII information (assume we have a list of known PII column names).
At the moment this would be achieved one of two ways:

creating an extra model for each fct_ table, leading to lots of redundant code
creating the view via a post-hook, which means that the objects don't show up in the DAG.

Just thought it worth adding this as a use-case to think through :)

clrcrl on 16 Jan 2019

👍3

All 7 comments

/cc @drewbanin

jthandy on 19 Oct 2016

here are some notes on syntax after discussing this extensively over the past few days.

--how users create things:

create_test(args)
create_model(args)
create_macro(args)


--where args is:

args = [
  sql = "",
  materialized = "",
  ...
]

--syntax options follow


--HCL-like example

model "quickbooks_bill_payments" {
  name = "something_else"
  config {
  materialized = true
  }
  sql = """
    select *
    from ${model.quickbooks_billpayment_lines.table_name}
    where whatever
  """
}


--python-like example

def quickbooks_bill_payments (config=(args)) {

  with bill_payments as (

    select * from {{ref('quickbooks_bill_payments')}}

  ), billpayment_lines as (

    select * from {{ref('quickbooks_billpayment_lines')}}

  ), d1 as (

    select
      bp.id, bp.txn_date, bpl.amount, payment_account_id, ap.id as ap_id
    from bill_payments bp
      inner join billpayment_lines bpl on bp.id = bpl.bill_payment_id
      join (select id from {{ref('quickbooks_accounts')}} where type = 'Accounts Payable') ap
        on 1 = 1

  )

  select
    id, txn_date, amount, payment_account_id as account_id, 'credit' as transaction_type,
    'bill payment' as source
    {% if var('uses_classes') == "true" %}
      , null::bigint as class_id
    {% endif %}
  from
    d1

  union all

  select
    id, txn_date, amount, ap_id, 'debit', 'bill payment'

    {% if var('uses_classes') == "true" %}
      , null::bigint
    {% endif %}
  from
    d1

}

jthandy on 24 Oct 2016

Sweet writeup -- thanks @jthandy

drewbanin on 24 Oct 2016

Reasons we might want this:

type safety when passing data between packages
specific functions (like import)
managing scope
managing file count
more idiomatic override

jthandy on 2 Dec 2016

A quick update here: this is still very relevant. We've doubled down much harder on jinja since this issue was originally raised, so my guess is the most likely syntax would look something closer to this:

--jinja-like example

{% model quickbooks_bill_payments() %}

  with bill_payments as (

    select * from {{ref('quickbooks_bill_payments')}}

  ), billpayment_lines as (

    select * from {{ref('quickbooks_billpayment_lines')}}

  ), d1 as (

    select
      bp.id, bp.txn_date, bpl.amount, payment_account_id, ap.id as ap_id
    from bill_payments bp
      inner join billpayment_lines bpl on bp.id = bpl.bill_payment_id
      join (select id from {{ref('quickbooks_accounts')}} where type = 'Accounts Payable') ap
        on 1 = 1

  )

  select
    id, txn_date, amount, payment_account_id as account_id, 'credit' as transaction_type,
    'bill payment' as source
    {% if var('uses_classes') == "true" %}
      , null::bigint as class_id
    {% endif %}
  from
    d1

  union all

  select
    id, txn_date, amount, ap_id, 'debit', 'bill payment'

    {% if var('uses_classes') == "true" %}
      , null::bigint
    {% endif %}
  from
    d1

{% endmodel %}

jthandy on 12 Oct 2017

👍1

creating an extra model for each fct_ table, leading to lots of redundant code
creating the view via a post-hook, which means that the objects don't show up in the DAG.

Just thought it worth adding this as a use-case to think through :)

clrcrl on 16 Jan 2019

👍3

cooked this up to explore what sub-tables (like @clrcrl is describing) could look like as custom materializations: https://gist.github.com/drewbanin/ea256260dcb8474414660d1c90d39e3f

drewbanin on 25 Jul 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

KeyError while using deferred flag and model filters

nave91 · 3Comments

alias models to change their names in the database

drewbanin · 4Comments

ref'ing an ephemeral model inside of a statement does not work

drewbanin · 3Comments

Persist source descriptions to database

whittid4 · 3Comments

Threads cannot be set from an environment variable

keiththompson · 3Comments