Dbt: Storing test results in the database

Created on 25 Jun 2020 · 7Comments · Source: fishtown-analytics/dbt

Describe the feature

dbt testing is powerful; being notified when a test fails can be life-saving; debugging those failed tests could be easier!

There are two long-lived issues related to this subject (#517 and #903), and dozens of valuable comments on both. After discussing with several colleagues and a few community members, we've arrived at an approach I feel good about.

The goal here isn't to build a dimensional model of historical data surrounding dbt invocations. I think we can and should build out capacity for those longitudinal analyses, but it should be powered by dbt artifacts (manifest and run results) rather than adding records to database tables (a la logging package).

Rather, the goal is to add tooling that can be of immediate benefit to dbt developers, for whom debugging tests is a slog that's harder than it needs to be. We want to provide the post-test investigator with a high-level overview of dbt test statuses, and the errant records from any failing tests, within a query or two.

A lot of this work is going to build on top of v2 schema tests (#1173) and data tests (#2274), both of which are on the 1.0.0 docket. So while this feature is not coming in the next month or two, it is firmly on the roadmap.

Spec: v2 schema tests

To get where we want to go, we need to talk a little about our planned changes to schema tests.

Let's imagine a modified version of the current not_null schema test. Several small things are different, the most significant of which is that the test returns a set of records instead of just count(*), similar to how data tests work today:

{% test not_null(model, column_name) %}

    {% set name = 'not_null__' ~ model.name ~ '__' ~ column_name %}

    {% set description %} Assert that {{ column_name }} is never null in {{ model }} {% endset %}

    {% set fail_msg %} Found {{ result }} null values of {{ model }}.{{ column_name }} {% endset %}

    {{ config(name=name, description=description, fail_msg=fail_msg) }}

    select * from {{ model }}
    where {{ column_name }} is null

{% endmacro %}

Let's say we have two not_null tests defined in our project, in a file called resources.yml:

version: 2

models:
  - name: customers
    colums:
      - name: id
        tests:
          - not_null

  - name: orders
    columns:
      - name: order_status
        tests:
          - not_null

The standard way of executing these tests looks like compiling and running some queries, and storing the compiled version of those queries in the target directory:

-- compiled SQL stored in target/compiled/models/resources.yml/schema_test/not_null__customers__id.sql`

select count(*) from (
    select * from dev_jcohen.customers
    where id is null
) validation_errors
;

-- compiled SQL stored in target/compiled/models/resources.yml/schema_test/not_null__orders__order_status.sql`

select count(*) from (
    select * from dev_jcohen.orders
    where order_status is null
) validation_errors
;

After running, dbt returns any failures as log output to the CLI:

12:00:01 | 1 of 2 START test not_null__customers__id.........................[RUN]
12:00:02 | 1 of 2 PASS not_null__customers__id.............................. [PASS in 1.00s]
12:00:03 | 2 of 2 START test not_null__orders__order_status................. [RUN]
12:00:04 | 2 of 2 FAIL not_null__orders__order_status....................... [FAIL 7 in 1.00s]

Failure in test not_null__orders__order_status (models/resources.yml)
  Found 7 null values of orders.order_status

  compiled SQL at target/compiled/models/resources.yml/schema_test/not_null__orders__order_status.sql

There are many features we imagine unlocking with v2 testing:

In addition to warn severity, more configurable warn_after and error_after thresholds based on a scalar count or %
Tagging tests to customize failure notifications

For now, let's focus on storing the results of these tests in the database for easier debugging.

Spec: storing results

Now let's imagine a new test flag:

$ dbt test --store-results

It's unlikely that dbt developers would want to include this flag when iterating on net-new models, but it's quite likely that production or CI test runs would benefit from including it.

When the --store-results flag is included, dbt will instead execute tests like so:

drop schema if exists dev_jcohen__dbt_test_results;
create schema dev_jcohen__dbt_test_results;

create table dev_jcohen__dbt_test_results.not_null__customers__id as (

    -- compiled SQL stored in target/compiled/models/resources.yml/schema_test/not_null__customers__id.sql`

    select * from dev_jcohen.customers
    where id is null

);

create table dev_jcohen__dbt_test_results.not_null__orders__order_status as (

    -- compiled SQL stored in target/compiled/models/resources.yml/schema_test/not_null__orders__order_status.sql`

    select * from dev_jcohen.orders
    where order_status is null

);

Based on the row count in the query status returned by the database, or by running a quick count(*) on each table after creation, dbt will determine whether the test passed or failed and create a summary table of all tests in the invocation. (We'll use a mechanism similar to dbt seed to batch inserts and avoid any limits on inserts or query length.)

create table dev_jcohen__dbt_test_results.summary (
    invocation_id string,
    test_name string,
    status string,
    execution_time timestamp
);

insert into dev_jcohen__dbt_test_results.summary values (
    ('xxxxxxxx-xxxx-0000-0000-xxxxxxxxxxxx', 'not_null__customers__id', 'PASS', '2020-06-25 12:00:01'),
    ('xxxxxxxx-xxxx-0000-0000-xxxxxxxxxxxx', 'not_null__orders__order_status', 'FAIL 7', '2020-06-25 12:00:03')
);

And finally log to the CLI:

12:00:01 | 1 of 2 START test not_null__customers__id.........................[RUN]
12:00:02 | 1 of 2 PASS not_null__customers__id.............................. [PASS in 1.00s]
12:00:03 | 2 of 2 START test not_null__orders__order_status................. [RUN]
12:00:04 | 2 of 2 FAIL not_null__orders__order_status....................... [FAIL 7 in 1.00s]

Failure in test not_null__orders__order_status (models/resources.yml)
  Found 7 null values of orders.order_status
    -------------------------------------------------------------------------
    select * from dev_jcohen__dbt_test_results.not_null__orders__order_status
    -------------------------------------------------------------------------

  compiled SQL at target/compiled/models/resources.yml/schema_test/not_null__orders__order_status.sql

Questions

Test definition

dbt will depend entirely on the test block to circumscribe its failures. Each test block query will be responsible for defining:

Which columns to include in the table of test failures. E.g. *, an explicit column list, dbt_utils.star or * (except) (on BigQuery).
How many rows to include in the table of test failures, e.g. by adding limit 500.

Audit schema

dbt will store all results in one audit schema per dbt test --store-results invocation. It will name the schema {{ target.schema }}__dbt_test_results.

Can I name the schema differently? What about tests on models with custom schemas?

We're going to keep this simple for the first version by storing all tests in one schema with a set name. We envision adding configuration for this later, especially if there is user demand.

Why does dbt drop and recreate the dbt test audit schema every time?

dbt test is a stateless, idempotent operation; dbt test --store-results will be as well. If you run it several times, you will end up with the same artifacts.

How can I preserve history of past test failures?

The same way you preserve historical data from sources, models, or anything else: snapshots on top of tables in the audit schema, or your own custom run-operation macros to copy or unload that data to external storage.

Granular control

The --store-results flag flips dbt test from running a set of queries that performs only introspection to a set of queries that create assets in the database.

Will it be possible to store results for some tests, and not for others?

In the first version of this, the --store-results flag will change the behavior of all tests run. Eventually, we envision adding a test config flag (store_results: false) that takes precedence over the CLI flag, similar to changes to --full-refresh in 0.18.0.

Until then, you can manage this with multiple commands and node selection:

dbt test --store-failures --exclude my_massive_model
dbt test --models my_massive_model

discussion enhancement tests

Source

jtcohen6

👍12 ❤3

Most helpful comment

While I like what is written in the FR, I even more agree with @brunomurino 's argument: '_If you want to write stuff to the database, why not make it a model?'_

I do get that DBT is stateless now, but IMO this should not apply for tests. After all, you also don't delete the default logging metadata after each DBT run, but simply append to the existing log.

if the hassle of managing a database for storing test results is too much work:
Why not write out the results of tests like @jtcohen6 and @brunomurino are proposing here to a testing log, structured as JSON?
We could then simply use that as a data source, for a Snowflake / BigQuery external table _(or other analytical database supporting JSON as source)_

Taking @brunomurino 's input as a starter, I would then write the results of a test to 5 JSON keys per test:

DBT run_id
model name
test name
tested column(s), as an array
test started datetime
test finished datetime
test status, _(with vals like [OK,WARNING,FAIL])_ with the status FAIL based on a treshhold that is 0 by default, but can be overwritten. Like this example: https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966
test count, _see below for more info_
test details , in a nested structure that @brunomurino mentioned, _see below for more info_

The important detail for me would then be _what_ DBT will write to the test_count and test_details fields.
These are some ideas for the standard tests:

UNIQUE:
-Logging the count of violations in test_count is good.
-For better debugging and following DQ over time, logging the actual keys that fail the uniqueness check in test_details is _better_
NOT NULL:
-Logging the count of violations in test_count is good.
-For better debugging and following DQ over time, logging the violation count per unique key that fails the NOT NULL check in test_details is _better_. The latter REQUIRES an unique key to be defined or UNIQUE test to be present on a model.
ACCEPTED VALUES:
-Logging the count of violations in test_count is good.
-Logging the count of violations per _OTHER than accepted_ in test_details better
-Knowing the exact rows that fail this test plus _OTHER than accepted_ value in test_details even better. The last option again REQUIRES an unique key to be defined or UNIQUE test to be present on a model.
RELATIONSHIPS:
-Logging the count of violations in test_count is good.
-Logging the parent keys not being present in test_details better.

If you want more input @jtcohen6 , I can mock up some of my proposed test logging JSON records for you...

bashyroger on 30 Jun 2020

👍3

All 7 comments

Heya, just giving my 2 cents here.

If you want to write stuff to the database, why not make it a model?

A few weeks ago people asked me this exact same thing about storing test failure results in the db, in order for them to be able to fully deprecate a much more expensive tooling they had in place. The solution I came up with was to have tests as models that output a single line, with exactly 1 row and at least the following 3 columns: test_name, test_result and test_details. On the test_details column I aggregate whatever information is relevant to the test, such that when posted on slack (by reading the target compiled file), it has like breaks and etc. After the model runs, there's also a custom schema test to check the column test_results for the value 'FAILED', pretty straightforward. Finally, I used a post-hook to, if the test failed, insert the results in a 'test_history' table, shared between all tests following this patter.

I must say it has worked pretty well for now, but of course, I don't think this would solve every use case.

Here's one of the examples:

with
{{import('base__account_management__transaction')}},
{{import('base__account_management__lender_account')}},

issues as (
    select

        account_id,
        sum(amount) as current_balance

    from base__account_management__transaction t

    join base__account_management__lender_account la
        on t.account_id = la.id

    where t.status in ('CONFIRMED', 'CONFIRMED_AWAITING_APPROVAL', 'CORRECTED')
        and la.account_status = 'CLOSED'

    group by account_id

    having current_balance >= 1

)

select

    'AllClosedAccountsHaveAZeroBalance' as test_name, -- could be {{ model.name }}

    case
        when count(*) != 0 then 'FAILED'
        else 'PASSED'
    end as test_result,

    'Number of accounts = ' || count(*) || '\n\n' ||
    listagg('Account: ' || account_id || '\nBalance: ' || current_balance, '\n\n') as test_details

from issues

Here's the macro I used:

{% macro insert_into_tech_rec(tablename) %}

-- you can check if `select test_results from {{ this }}` is FAILED or not to proceed

INSERT INTO {{ target.schema }}.{{tablename}} (run_date, test_name, test_result, test_details)

with
test_table as (
    select * from {{ this }}
)
SELECT

    getdate() as run_date,
    '{{ this.table }}' as test_name,

    test_table.test_result,
    test_table.test_details

FROM test_table
;
{% endmacro %}

And here's the sample slack alert I got from it.

Screenshot 2020-06-26 at 00 29 35

We're also able to surface that test_history table in our viz tool to try and detect patters in test failures etc.

Maybe this could be translated into a custom materialization, no?

Anyway, just thought it was worth sharing..

brunomurino on 26 Jun 2020

👍3

While I like what is written in the FR, I even more agree with @brunomurino 's argument: '_If you want to write stuff to the database, why not make it a model?'_

I do get that DBT is stateless now, but IMO this should not apply for tests. After all, you also don't delete the default logging metadata after each DBT run, but simply append to the existing log.

Taking @brunomurino 's input as a starter, I would then write the results of a test to 5 JSON keys per test:

DBT run_id
model name
test name
tested column(s), as an array
test started datetime
test finished datetime
test status, _(with vals like [OK,WARNING,FAIL])_ with the status FAIL based on a treshhold that is 0 by default, but can be overwritten. Like this example: https://discourse.getdbt.com/t/creating-an-error-threshold-for-schema-tests/966
test count, _see below for more info_
test details , in a nested structure that @brunomurino mentioned, _see below for more info_

The important detail for me would then be _what_ DBT will write to the test_count and test_details fields.
These are some ideas for the standard tests:

UNIQUE:
-Logging the count of violations in test_count is good.
-For better debugging and following DQ over time, logging the actual keys that fail the uniqueness check in test_details is _better_
NOT NULL:
-Logging the count of violations in test_count is good.
-For better debugging and following DQ over time, logging the violation count per unique key that fails the NOT NULL check in test_details is _better_. The latter REQUIRES an unique key to be defined or UNIQUE test to be present on a model.
ACCEPTED VALUES:
-Logging the count of violations in test_count is good.
-Logging the count of violations per _OTHER than accepted_ in test_details better
-Knowing the exact rows that fail this test plus _OTHER than accepted_ value in test_details even better. The last option again REQUIRES an unique key to be defined or UNIQUE test to be present on a model.
RELATIONSHIPS:
-Logging the count of violations in test_count is good.
-Logging the parent keys not being present in test_details better.

If you want more input @jtcohen6 , I can mock up some of my proposed test logging JSON records for you...

bashyroger on 30 Jun 2020

👍3

Hi @jtcohen6, I can see from the status of this ticket I see that implementing this is not planned yet. Can you give me any indication if / when this might happen?
Imo the most basic way DBT could start implementing this is already shared in ticket #903 , which comes down to adding in functionality for hooking into when a test starts or ends, like on-run-start / end:

on-test-start:
- "create table if not exists {{ target.schema }}.audit_dbt_test_results (node text, status text, execution_time decimal(10,2), time timestamp)"
on-test-end:
- "insert into {{ target.schema }}.audit_dbt_test_results (node, status, execution_time, time) values {{ results_values(results) }}"

bashyroger on 15 Jul 2020

@joshtemple, you also might want to contribute to this FR as it greatly overlaps with https://github.com/fishtown-analytics/dbt/issues/517

bashyroger on 16 Jul 2020

@brunomurino @bashyroger Thank you both for sharing your thoughts, and for engaging seriously with this proposal!

This feature is not currently prioritized, insofar as it depends on—and is meant to inform—work we want to do around v2 test nodes (#1173, #2274) for v1.0.0. My goal in this issue is to shape the feature we want to eventually support, so that we can scope out the required changes to tests that would support it.

You're absolutely right that there are other approaches possible today, which leverage the Results object or dbt logs/artifacts. Bruno laid out a compelling one above. With that multiplicity in mind, I believe the built-in approach needs to be:

Easy to use

This approach needs to make sense for the most basic, straightforward dbt projects. It should be accessible to users who can write and run SQL against their data warehouse. It shouldn't depend on an ability to write Jinja or parse JSON (trickier in some databases than in others). It should feel like a natural outgrowth of what dbt test does today.

My goal is not one of consolidation. It's not to create the be-all and end-all mechanism for storing test results in the database or preserving test metadata. Instead, I want a starting point that works well for most projects, while also opening the door for custom behavior as dbt users become more comfortable writing Jinja.

Idempotent

I believe that we guarantee sensible results and intuitive process by keeping this approach idempotent. I take this point:

I do get that DBT is stateless now, but IMO this should not apply for tests. After all, you also don't delete the default logging metadata after each DBT run, but simply append to the existing log.

It's true that dbt wants to be stateful when it comes to its project files (via git / version control) and debug logs. Every database operation it performs, however, is ultimately idempotent—with snapshots being the exception that prove the rule. It's important to me that dbt test remains a stateless and straightforward operation. If the user wishes to extend that operation and preserve state, they have levers to do so.

To achieve in-database statefulness, you'll be able to create snapshots that select from the stored test result tables. (Can those snapshots ref the test results? I think so—and it would establish a DAG dependency of model --> test --> snapshot.)

You could also put a post-hook on those snapshots, or a run-operation after dbt test --store-results, to offload into external file storage. As far as out-of-database statefulness, though, I think the best answer is saving dbt artifacts (especially run_results.json) to external file storage and performing separate longitudinal analysis. I think dbt Cloud could have compelling answers here, too.

Extensible

The --store-results mechanism needs to rely on the schema test definition as much as possible, since that's the piece that users can most readily copy, customize, and override. I don't believe this feature should be the place to encode special logic about desired behavior for unique that differs from not_null, accepted_values, and the universe of other custom tests.

In order to achieve what Rogier proposed:

For better debugging and following DQ over time, logging the violation count per unique key that fails the NOT NULL check in test_details is better. The latter REQUIRES an unique key to be defined or UNIQUE test to be present on a model.

I think it would reasonable to write a not_null adaptation, not_null_by_key, that looks somethng like the code below. In thinking through this, I realize that schema tests should have the option of defining a result aggregate calc _other_ than count(*).

{% test not_null_by_key(model, column_name, unique_key) %}

    {% set name = 'not_null_by_key__' ~ model.name ~ '__' ~ column_name ~ '_by_' ~ unique_key  %}

    {% set description %} Assert that {{ column_name }} is never null in {{ model }} {% endset %}

    {% set result_calc = 'sum(null_values)' %}

    {% set fail_msg %} Found {{ result }} null values of {{ model }}.{{ column_name }}. See the breakdown by {{ unique_key }}: {% endset %}

    {{ config(name=name, description=description, result_calc=result_calc, fail_msg=fail_msg) }}

    select {{ unique_key }}, count(*) as null_values
    from {{ model }}
    where {{ column_name }} is null
    group by 1

{% endmacro %}

jtcohen6 on 22 Jul 2020

Hi @jtcohen6, I stubled on this article that shares some good ideas on how testing could be improved in DBT:
https://www.infinitelambda.com/post/dbt-testing-tools-gap

What I specifically liked about the article is the suggesting of defining tests as a Gherkin document:
https://cucumber.io/docs/gherkin/reference/

bashyroger on 16 Sep 2020

I want to store results of dbt source snapshot-freshness too in addition to results of tests.