Redshift does not support explicitly setting a diststyle to auto. Instead, if a user configures a diststyle as "auto", dbt should ignore the config. This compels Redshift to use the "auto" diststyle.
See this thread for more info: https://github.com/fishtown-analytics/dbt/issues/1882
-- models/my_model.sql
{{ config(materialized='table', dist='auto') }}
select 1 as id
The model will fail to build indicating that auto is not a valid diststyle.
Which database are you using dbt with?
The output of dbt --version:
0.16.0
Hi @drewbanin
Thanks for keeping the burning flame of this topic! I like the idea of explicit is better than implicit.
I revisited the Redshift Docs and found it explaining how Redshift chooses the distribution key and the sort key for CTAs, referral documentation that brought me into this page here.
In a nutshell, if it is a simple query the CTA table will inherit the distribution key from the source table. In another hand, if it is a complex query with joins and group by, Redshift will choose the distribution key and the sort key based on the Query Plan.
If you don't specify DISTKEY or DISTSTYLE, CTAS determines the distribution style for the new table based on the query plan for the SELECT clause. For more information, see Inheritance of Column and Table Attributes.
My point is, it is still an automatic mechanism since the user can choose to not define the distribution key, and Redshift will choose one. However, it will not mark the distribution key as AUTO on the metadata tables definition. Probably, Redshift will not choose a new distribution key based on table usage for example.
I took the opportunity to execute your suggested example and check the distsyle _(I've omitted the materialization in the config block)_, here it is:
select "table", diststyle from SVV_TABLE_INFO
where "table" like 'my_test%';
-- result:
-- table diststyle
-- my_test EVEN
Anyway, I will be very happy contributing with this PR.
Awesome, thanks @rodrigodelmonte! A PR would be wonderful and very much welcomed!
Most helpful comment
Hi @drewbanin
Thanks for keeping the burning flame of this topic! I like the idea of explicit is better than implicit.
I revisited the Redshift Docs and found it explaining how Redshift chooses the distribution key and the sort key for CTAs, referral documentation that brought me into this page here.
In a nutshell, if it is a simple query the CTA table will inherit the distribution key from the source table. In another hand, if it is a complex query with joins and group by, Redshift will choose the distribution key and the sort key based on the Query Plan.
My point is, it is still an automatic mechanism since the user can choose to not define the distribution key, and Redshift will choose one. However, it will not mark the distribution key as AUTO on the metadata tables definition. Probably, Redshift will not choose a new distribution key based on table usage for example.
I took the opportunity to execute your suggested example and check the distsyle _(I've omitted the materialization in the config block)_, here it is:
Anyway, I will be very happy contributing with this PR.