Dbt: Database and schema names with new lines work on Snowflake... until they don't

Created on 18 May 2020  Â·  2Comments  Â·  Source: fishtown-analytics/dbt

Not really a bug, more just an observation of some odd behavior.

Describe the bug

  1. Create a snapshot where the schema includes a new line (note: this is more likely to happen because of logic in generate_schema_name etc)
{% snapshot my_snapshot %}

{{
    config(
      target_schema='
      snapshots_test',
      unique_key="id",

      strategy='timestamp',
      updated_at='run_at',
    )
}}


select 1 as id, current_timestamp() as run_at

{% endsnapshot %}

  1. Do an initial run of the snapshot
  2. Do a subsequent run of a snapshot

Expected behavior

There should be two records in the snapshot. OR the snapshot should fail because of the new lines in the schema name

Actual behavior

The snapshot actually gets recreated.

Screenshots and log output

Here's the query that dbt runs to find the existing relation:

(on v0.16.1)

select count(*)
        from analytics.INFORMATION_SCHEMA.schemata
        where upper(schema_name) = upper('
      snapshots_test')
            and upper(catalog_name) = upper('analytics')

Of course, it doesn't return anything.

Meanwhile, the SQL to create the table ... works ¯_(ツ)_/¯

create schema if not exists analytics.
      snapshots_test

      create or replace transient table analytics.
      snapshots_test.my_snapshot  as
      ( ... )

Note, on v0.17.0, this fails, but with a message I'm surprised by:

Completed with 1 error and 0 warnings:

Compilation Error in macro create_schema (macros/adapters/common.sql)
  macro 'dbt_macro__create_schema' takes not more than 1 argument(s)

  > in macro materialization_snapshot_default (macros/materializations/snapshot/snapshot.sql)
  > called by macro create_schema (macros/adapters/common.sql)

I thought this would work since this query returns results:

show terse objects in analytics.
      snapshots_test

System information

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [ ] bigquery
  • [x] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:
Unexpected behavior on 0.16.1
Odd failure on 0.17.0

Additional context

Add any other context about the problem here.

Super happy to get a wontfix on here. But basically, I was just surprised that the create transient table statement worked in the first place with the new lines, so wanted to jot this down in case it makes someone think something else might be broken.

Other solutions:

  1. Update the default generate_<thing>_name macros to use {{ return() }}, and update the docs too, to prevent people making this mistake in the first place.
  2. Update the adapter SQL to use whitespace control in lots of places:
-- before
 create schema if not exists {{ database }}.{{ schema }}

-- after
 create schema if not exists {{- database -}}.{{- schema -}}

(OK that's not the actual SQL we have, more an illustrative point)

  1. Do validation in dbt somewhere to check that schema / database names don't have whitespace in them
bug

All 2 comments

thanks for recording this one @clrcrl. I'd definitely be in favor of changing our docs to use return instead of implicitly returning! We do trim whitespace from these macro return values in python-land, but you're definitely right that we _don't_ do that when you call these macros from jinja-land directly

For those arriving here from a google search, I also encountered the same error message, in snowflake, from DBT when the role that DBT uses does not have sufficient permissions (usage, select, etc) on the schema/tables used in the snapshot process.

Was this page helpful?
0 / 5 - 0 ratings