Dbt: Support for postgres index creation?

Created on 20 Jun 2018  路  5Comments  路  Source: fishtown-analytics/dbt

Is there a way, using dbt, to create an index in postgres as part of a run? Given that everything is a select, it's not clear where this functionality would live, and I couldn't find anything in the docs or issues about it.

Or, is the answer to just get out of postgres and move to BigQuery etc :) ?

good first issue help wanted redshifpg

Most helpful comment

Hey @ryantuck - you can accomplish this with a post-hook. You'll probably want to do something like

{{
  config({
    "materialized" : "table",
    "post-hook": 'create index if not exists {{ this.name }}__index_on_FIELD on {{ this }} ("FIELD")'
    })
}}

select ...

^ assuming the index you want to add is on a column called FIELD. If you find yourself doing this a lot, you can create a macro to add an index to an parameterized column. We also have a nascent postgres package that does exactly this :)

I think in general, dbt works best with Redshift/Snowflake/BigQuery, but sounds like there are at least a few folks using it on postgres with some success! Hope this helps :)

All 5 comments

Hey @ryantuck - you can accomplish this with a post-hook. You'll probably want to do something like

{{
  config({
    "materialized" : "table",
    "post-hook": 'create index if not exists {{ this.name }}__index_on_FIELD on {{ this }} ("FIELD")'
    })
}}

select ...

^ assuming the index you want to add is on a column called FIELD. If you find yourself doing this a lot, you can create a macro to add an index to an parameterized column. We also have a nascent postgres package that does exactly this :)

I think in general, dbt works best with Redshift/Snowflake/BigQuery, but sounds like there are at least a few folks using it on postgres with some success! Hope this helps :)

An issue I've found with the suggested hook is it seems to only add an index every second run. But if I remove the if not exists, then it fails on it already existing (after the first run).

Is this something to do with how dbt handles the cleverness of the table creation and renames? Is there a fix that will solve the immediate problem?

Thank you!

Let's add first-class support for an index config to models running on Postgres. We support the analogues for Redshift/Snowflake/BigQuery -- we might as well support these natively on Postgres too.

We don't intend to prioritize this directly (we always recommend using a proper data warehouse with dbt like Redshift, Snowflake, or BigQuery), but I would be _very_ happy to work with a community contributor to implement this functionality in dbt!

Hi all. Just wanted to update this issue with the workaround (Proposed by drewbanin in https://github.com/fishtown-analytics/dbt/issues/1945#issuecomment-576826021) we are going to use @ DoSomething.org. For context: our former data lead created this related issue https://github.com/fishtown-analytics/dbt/issues/1945#issue-526684234.

Macro

{% macro get_index_name(model, idx_name) %}
    {{ model.schema ~ '_' ~ model.table ~ '_' ~ idx_name ~ '_' ~ run_started_at.strftime('%Y%m%d_%H%M%S') ~ ' ON ' ~ model }}
{% endmacro %}

Usage

Creates indexes consistently on every run for the given table in the pattern <schema>_<table_name>_<idx_name>_<yyyymmdd_hhmmss>, e.g. "public_post_actions_unique_20200804_235704"

models:
    ds_dbt:
        ...
        model_name:
            table_mame:
                alias: post_actions
                materialized: table
                 post-hook:
                    - "CREATE UNIQUE INDEX {{ get_index_name(this, 'unique') }} (created_at, id)"
Was this page helpful?
0 / 5 - 0 ratings