Docs: https://docs.snowflake.net/manuals/sql-reference/sql/create-materialized-view.html
Ideally this would also support the cluster by keyword:
create materialized view "schema"."my_mview"
cluster by ( id )
as
select 1 as id,
2 as name
Looks like this was added in 0.13.1
(PR #1432 )
@clausherther not so! #1432 fixed a problem where dbt couldn't run _if a materialized view lived in the dbt schema_. dbt still does not support the _creation_ of materialized views on Snowflake, though it is something I've been experimenting with recently.
In your mind, what's the advantage of using a materialized view over a dbt table model that's refreshed with some cadence? I seem to recall some pre-release Snowflake functionality that would auto-update materialized views, but I'm struggling to find that info presently
Ah, I misread this in the release notes: "Add "MaterializedView" relation type to the Snowflake adapter"
Personally, I don't think I have a good use case for MVs in Snowflake, that need has really been largely replaced by dbt. I think I opened the issue alongside the one for _transient_ tables, to keep track of new Snowflake features you potentially want to support.
alright, thanks for the additional context! Happy to leave this open to track the Snowflake feature, but also welcome input from anyone who 👍'd the issue to chime in with some use cases!
I'm trying to think how we would deal with MViews in dbt. You don't really want to drop and recreate them every time because then there would be a lot of extra time and cost by Snowflake bringing them up-to-speed with the underlying data. They do act like views when they are "stale" or not completely refreshed so at least there wouldn't be any problems with what happens if they are queried if not completely rebuilt:
Data accessed through materialized views is always current, regardless of the amount of DML that has been performed on the base table. If a query is run before the materialized view is up-to-date, Snowflake either updates the materialized view or uses the up-to-date portions of the materialized view and retrieves any required newer data from the base table.
https://docs.snowflake.net/manuals/user-guide/views-materialized.html
Perhaps they would be like incremental tables where they are only re-created when a) they don't exist or b ) when forcing a full refresh?
There is also the potential for a lot of run-time error with MViews in Snowflake because they are limited in what they support from a SQL perspective. My concern is that it would be pretty difficult for dbt to parse and do a compile-time validation of what is not legitimate for an MView which would lead in many cases to dbt runs failing if the MView gets compiled but then fails creation or update at run-time. The list of things is too long for me to write here but see link above.
@drewbanin materialized views are useful for:
CREATE MATERIALIZED VIEW cdc_highwater
AS
(
SELECT primary_key, column_name, MAX(wal_ts) AS wal_ts
FROM cdc_table
)
to accelerate consolidation into current-state tables like:
SELECT primary_key
, DECODE(column_name, 'col1', value) AS col1
, DECODE(column_name, 'col2', value) AS col2
, ...
, DECODE(column_name, 'colN', value) AS colN
FROM cdc_table JOIN cdc_highwater USING (primary_key, column_name, wal_ts)
Frankly I think only use case 1 is value-generating, since the performance improvements are (allegedly) nontrivial for end users.
Thanks @mike-weinberg! I don't think it would be too challenging to create a materialized_view materialization in dbt. I think the logic would look like:
full-refresh mode:I still think that the list of caveats are too restrictive for most modeling use cases (no window functions, no unions, limited aggregates, can't query views, etc etc etc). Still, happy to support these as I do think the first use-case you've pointed out here @mike-weinberg is good and legitimate. Plus, I imagine that these materialized views will only become more capable over time.
If someone is interested in creating this materialization, it all should be pretty doable in user-space! Would love to discuss the implementation further if anyone picks it up :)
Redshift is also planning on supporting materialized views (in preview):
https://docs.aws.amazon.com/redshift/latest/dg/mv.html
It appears that many of the same limitations apply as with Snowflake materialized views.
https://docs.aws.amazon.com/redshift/latest/dg/mv-usage-notes.html
In addition, Redshift appears to keep the underlying materialized data static until a refresh command is issued at which time it either incrementally updates the data if certain conditions are met, or does a full rebuild.
https://docs.aws.amazon.com/redshift/latest/dg/mv-refresh.html#mv_REFRESH_MARTERIALIZED_VIEW_usage
While I agree that there will be limited cases where this makes sense in modeling, surely there are valid and interesting use cases.
Thanks @automagic! I used to be not-so-bullish on Materialized views, but the idea of these incremental refreshes is really appealing. One thing I particularly like about them: refreshing a table can happen concurrently with external queries. When dbt replaces tables, it currently blocks if there are long-running select statements hitting the table.
The one challenge here is going to be knowing when dbt should rebuild the MV, vs. just refresh it. We can leverage the --full-refresh flag, but that feels a little heavy handed. It would certainly be pretty cool if dbt could check if the columns in the view have changed, a la #320. Even better would be some sort of logical comparison (eg. check the hash of the SQL that created the model against the hash of the new model).... but that could be something for a separate issue :)
BigQuery has a similar feature https://cloud.google.com/bigquery/docs/materialized-views-intro
Would definitely be useful
Even with the limitations native Snowflake MV's have, supporting them in DBT will become more useful once the Transparent MV usage functionality has been released _(in private preview at the moment of writing this)_. Basically this is functionality akin to Oracle's QUERY_REWRITE.
See my write up on this new feature (and others) and the SF new features video that mentions it:
https://www.linkedin.com/pulse/my-pov-new-snowflake-features-announced-june-2020-werschkull-/
Hey @bashyroger, check out:
materialized_view dbt materialization in this repoQUERY_REWRITE, as it's the biggest feature currently supported for BigQuery's MVs (docs). I don't know as much about Snowflake's support for this, it sounds like it's TK.Since this thread is warming up, let's make it spicy 🌶️ !
Snowflake recently announced a clever feature they're calling "transparent materialized views".
In the future, if a plan node of a query is equivalent to or is a subset of the plan node of a materialized view, then the materialized view will be used, even if it isn't referenced, because it is provably equal.
So, if you identify common aggregations, and build materialized views for those aggregations, then queries which have steps which are equivalent to those materialized views will be rewritten by the planner to use those materialized views. Alternatively, you could maintain multiple copies of the same table, but with different clustering keys, and snowflake will automatically pick the best one. This is analogous to Vertica's concept of projections.
Personally, I think this is a big deal, and if snowflake expands support to include INNER JOIN and UNION ALL clauses, then chains of MV's might become the best way to express many dags. Even in the absence of that, the ability to maintain copies of tables with various clustering keys means that MVs could accelerate regular views enough to enable decent-performing low latency reporting when combined with a CDC ELT vendor.
EDIT: When I wrote this, I either didn't see the prior two comments, or my page needed to be refreshed. Embarrassing! I'll leave this here though =)
What is the latest state on this one ? Is this issue still open ?
I also want to know this as I am going to source control snowflake and views also need to source control like tables.
Hey @nareshkumaryadav, this issue is definitely still open.
view, table, ephemeral, and incremental—which, with some small exceptions, have the same functionality across all four databases—a materialized_view necessarily means something quite different on each of Postgres, Redshift, Snowflake, and BigQuery.All of which is to say: I'd find it really valuable to hear more about what you're thinking of doing with MVs in your own project.
@kamleshsinhchauhan It sounds like what you're after may be slightly different: we have long supported view and table as materialization types on Snowflake. Check out the docs on materializations. If you are thinking about Snowflake materialized views, then this is the right thread!
Most helpful comment
BigQuery has a similar feature https://cloud.google.com/bigquery/docs/materialized-views-intro
Would definitely be useful