Dbt: DBT relationship test does not correctly support column quoting

Created on 19 May 2020  Â·  13Comments  Â·  Source: fishtown-analytics/dbt

Describe the bug

When defining a relationship test (In this case between Person and Invitation) the generated DBT test does not link the relationship back to the destination table column definition. Hence, it does not recognise that a column in a relationship test _may_ require quoting.

Steps To Reproduce

With a Source as follows:
image

The following test is generated, incorrectly leaving the quotes off the column named Id in the generated sql test.
image

The following error occurs:
image

Expected behavior

I would expect DBT to link the column with the definition and respect the quoting setting chosen.
For convenience, it may be useful to also add column quoting configuration at the model _and_ source level.

Screenshots and log output

As above

System information

Which database are you using dbt with?

  • [ x] postgres
  • [ ] redshift
  • [ ] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.16.1
   latest version: 0.16.1

The operating system you're using:
Mac
The output of python --version:
Python 3.7.3

Additional context

A valid work-around (via Drew) is to quote the Id column manually in the relationships test. You’ll need to use two sets of quotes — one to denote that the value is a string (for yaml) and the next to be passed into the SQL query.
image

bug good first issue tests

All 13 comments

I followed the thread to here before losing where the relationship test may be held:
https://github.com/fishtown-analytics/dbt/blob/dev/0.16.1/core/dbt/parser/schemas.py#L236-L273

Thanks for the detailed write-up @MarcelvanVliet!

I would expect DBT to link the column with the definition and respect the quoting setting chosen.
For convenience, it may be useful to also add column quoting configuration at the model and source level.

Currently, dbt supports project-level quoting configurations for relation namespaces: database, schema, and identifier. (There are specially scoped overrides for sources and seeds.)

There isn't a broad-purpose quoting configuration for columns; we tend to accomplish this in utility macros via adapter.quote, or by accessing the quoted attribute of get_columns_in_relation. We could add one; it would be a nontrivial amount of work to synchronize behavior across those macros, and quoting is always trickier than expected and especially liable to database idiosyncrasies.

We'd need to establish what exactly such a lift would get us, since (as you mention) there are workarounds available, albeit a bit funny looking.

@jtcohen6 I caught up with @MarcelvanVliet about this on Slack. I think it would be really great if we could leverage the quoting configs from these .yml files directly in the relationships test. I bet it would be a pretty big change two save exactly 4 keystrokes, but dbt _should_ have all the info it needs to do the right thing here automatically!

So, this probably isn't something we'd hit in the immediate future, but I'd like to keep this issue open to track the change :)

Ah ok! That makes sense. I missed it in my first read-through.

I just tagged this a good first issue, as I think it'd be a fairly straightforward change. Hint: there's some prior work in #2733 that may be helpful :)

Noting that this doesn't just fail with relationships requiring quoting, but fails in general with any sources in Postgres.

@jordan8037310 Could you share the specific code snippet + error you're seeing? Not sure I follow fully

When running unique or not_null tests, the tests fail if there are column names e.g. "ipAddress".

Noting that the workaround _does_ work, '"ipAddress"', but I think the scope of the testing for this issue should be expanded to include unique, not_null, and all of the other tests (not just relationship test as indicated in the issue title).

Error

Database Error in test source_unique_public_source_ip_whois_ipAddress (models/sources.schema.yml)
  column "ipaddress" does not exist
  LINE 10:         ipAddress
                   ^
  HINT:  Perhaps you meant to reference the column "source_ip_whois.ipAddress".

Copy of generated sql for the test:

select count(*) as validation_errors
from (

    select
        ipAddress

    from "facetlgp"."public"."source_ip_whois"
    where ipAddress is not null
    group by ipAddress
    having count(*) > 1

) validation_errors

@jordan8037310 Did you specify quote: true for ipAddress in your YAML file like:

sources:
  - name: public
    table: source_ip_whois
    columns:
      - name: ipAddress
        quote: true
        tests:
          - unique

?

@jtcohen6 - no, althought I just tested that and it works! Thanks.

Here are some things I tried. I mainly assumed that if I did applied the settings at the dbt_project.yml level it persisted down.

I tried putting it in at the dbt_project.yml for quoting with the following and it quoted everything in my project... except the sources.

quoting:
      schema: true
      identifier: true

After happening upon this thread, I saw that sources were handled differently so I tried:

# sources.schema.yml

version: 2

sources:
  - name: public
    schema: public
    loaded_at_field: '"updatedAt"'
    loader: serverless_framework
    quoting:
      schema: true
      identifier: true

To be honest I never once saw the quote parameter specified once in documentation when searching google "dbt sources quoting":

So perhaps this is more an issue of a hidden setting for me and documentation about quoting should be updated?

EDIT: Added another link where quote is not mentioned as opposed to quoting

Really fair point. I guess I could've hoped that you would've magically stumbled upon https://docs.getdbt.com/reference/resource-properties/quote/.

Why do we call it quoting for database/schema/identifier names, but then quote for column names? Adding this to my list of nomenclature we might make more consistent in advance of dbt v1.0.

On page 1 of google this links to a 404 and should probably be redirected: https://dbt.readme.io/v0.10/docs/configuring-quoting

Oof, this is _old_.

@jtcohen6 +1 normalization of the property names would have made this easier to search and find.

There's also quote_columns, which is a seed-only config item that lives on its own level but surely belongs inside the quoting config item (or will it be quote?!) as columns 😬

Was this page helpful?
0 / 5 - 0 ratings