Dbt: Support for postgres index creation?

Created on 20 Jun 2018 · 5Comments · Source: fishtown-analytics/dbt

Is there a way, using dbt, to create an index in postgres as part of a run? Given that everything is a select, it's not clear where this functionality would live, and I couldn't find anything in the docs or issues about it.

Or, is the answer to just get out of postgres and move to BigQuery etc :) ?

good first issue help wanted redshifpg

Source

ryantuck

Most helpful comment

Hey @ryantuck - you can accomplish this with a post-hook. You'll probably want to do something like

{{
  config({
    "materialized" : "table",
    "post-hook": 'create index if not exists {{ this.name }}__index_on_FIELD on {{ this }} ("FIELD")'
    })
}}

select ...

^ assuming the index you want to add is on a column called FIELD. If you find yourself doing this a lot, you can create a macro to add an index to an parameterized column. We also have a nascent postgres package that does exactly this :)

I think in general, dbt works best with Redshift/Snowflake/BigQuery, but sounds like there are at least a few folks using it on postgres with some success! Hope this helps :)

drewbanin on 20 Jun 2018

👍6

All 5 comments

Hey @ryantuck - you can accomplish this with a post-hook. You'll probably want to do something like

{{
  config({
    "materialized" : "table",
    "post-hook": 'create index if not exists {{ this.name }}__index_on_FIELD on {{ this }} ("FIELD")'
    })
}}

select ...

I think in general, dbt works best with Redshift/Snowflake/BigQuery, but sounds like there are at least a few folks using it on postgres with some success! Hope this helps :)

drewbanin on 20 Jun 2018

👍6

An issue I've found with the suggested hook is it seems to only add an index every second run. But if I remove the if not exists, then it fails on it already existing (after the first run).

Is this something to do with how dbt handles the cleverness of the table creation and renames? Is there a fix that will solve the immediate problem?

Thank you!

max-sixty on 12 Feb 2020

Discussion for that at https://github.com/fishtown-analytics/dbt/issues/1945

max-sixty on 13 Feb 2020

Let's add first-class support for an index config to models running on Postgres. We support the analogues for Redshift/Snowflake/BigQuery -- we might as well support these natively on Postgres too.

We don't intend to prioritize this directly (we always recommend using a proper data warehouse with dbt like Redshift, Snowflake, or BigQuery), but I would be _very_ happy to work with a community contributor to implement this functionality in dbt!

drewbanin on 22 Feb 2020

👍1

Hi all. Just wanted to update this issue with the workaround (Proposed by drewbanin in https://github.com/fishtown-analytics/dbt/issues/1945#issuecomment-576826021) we are going to use @ DoSomething.org. For context: our former data lead created this related issue https://github.com/fishtown-analytics/dbt/issues/1945#issue-526684234.

Macro

{% macro get_index_name(model, idx_name) %}
    {{ model.schema ~ '_' ~ model.table ~ '_' ~ idx_name ~ '_' ~ run_started_at.strftime('%Y%m%d_%H%M%S') ~ ' ON ' ~ model }}
{% endmacro %}

Usage

Creates indexes consistently on every run for the given table in the pattern <schema>_<table_name>_<idx_name>_<yyyymmdd_hhmmss>, e.g. "public_post_actions_unique_20200804_235704"

models:
    ds_dbt:
        ...
        model_name:
            table_mame:
                alias: post_actions
                materialized: table
                 post-hook:
                    - "CREATE UNIQUE INDEX {{ get_index_name(this, 'unique') }} (created_at, id)"