It would be excellent if it were possible to provide vars to models in schema.yml files like so:
version: 2
models:
- name: <model name>
description: <description>
docs:
show: <true|false>
meta:
key1: <value1>
key2: <value2>
vars: // Like this
key1: <value1>
key2: <value2>
tests:
- <test>
- ...
columns:
- name: <column name>
quote: <true|false>
tags: <tags>
description: <column description>
tests:
- <test>
- ...
- ...
I'm suggesting this feature because the dbt_project.yml file gets considerably large and hard to manage at scale, when needing to provide variables to multiple models. It would be a lot easier to manage if there were a way to split the variables into sub-files, and I thought the schema.yml files would be a perfect candidate, as they already exist and are designed to hold other information related to models.
This would mainly benefit larger companies who have created libraries of macros/packages with arguments whose values differ between models, and would prefer not to fill their model files with the setting of variables, but find it too difficult to manage if it's all in the dbt_project.yml
hey @DVAlexHiggs - cool idea! Unfortunately, I don't think this is something we're going to add support for.
If you check out #2312 , you can see that we very recently made a _ton_ of changes around variable scoping. Historically, variables were scoped at the folder-level. This meant that you could do something like:
models:
my_project:
path1:
vars: {myvar: abc123}
path2:
vars: {myvar: def456}
In this setup, a model in models/path1 would get a variable called myvar with the value abc123 whereas a model in models/path2 would get a value of def456. This made it _really_ tricky to correctly build compilation contexts for different node types. Imagine you have a schema.yml file in models/path2 which describes a model in models/path1 -- what do you think the myvar value should evaluate to in that context?
To resolve inconsistencies like this, we've added a vars: block to the dbt_project.yml file which cannot be scoped to a particular path. It looks like:
vars:
myvar: abc123
models:
...
Note though that variables _can_ still be scoped to packages:
vars:
package1:
myvar: abc123
package2:
myvar: abc123
I just wanted to give some background on how we're thinking about variables, and some of the recent changes that impact variable usage. Given these changes, I don't think we'll be able to add model-scoped variables like you've described here. I _do_ however think your use case is a good one, and I can imagine two paths forward:
schema.yml filesThis would look like:
models:
- name: my_model
config:
key1: <value1>
key2: <value1>
We could then provide a way to tap into these configuration values in the model .sql file context, eg:
select {{ config.get('key1') }} from ....
dbt_project.yml fileSee #1790 for more info
Curious to hear what you think about these changes and options
Hey @drewbanin thanks for your response. I think both of these solutions have their merits. Solution 1 is more dbt-esque and provides more functionality to the current schema.yml, and keeps all metadata/information about models in one place.
I think solution 2 is a powerful alternative with a lot of versatility, but perhaps more care than usual would need to be taken with directory structures and organisation of the YAML files.
I think personally I would prefer solution 1.
HI, any news on this? Really excited if this is being considered. I'm happy to contribute it myself as well, though I would need a point in the right direction to start with. Perhaps we could talk about this?
Ok - thanks for the additional context for this one on Slack @DVAlexHiggs.
To recap: You're leveraging the datavault package which relies on vars to configure models pks, natural keys, load timestamps, etc. I do think the dbt_project.yml file was a poor place for configs like that historically, so in light of the vars changes in 0.17.0, we should figure out a better path forwards.
I really like the idea of allowing model configs in .yml files. We have a dedicated issue for this over here: https://github.com/fishtown-analytics/dbt/issues/2401. I think #2401 was inspired by this very issue :)
config, not vars:version: 2
models:
- name: <model name>
description: <description>
docs:
show: <true|false>
meta:
key1: <value1>
key2: <value2>
config: // Like this
key1: <value1>
key2: <value2>
We'll need to think about the order of parsing. I think a sensible order of operations is:
dbt_project.yml configsschema.yml file configs.sql file configsIf we make the configs from the dbt_project.yml file and schema.yml files available in the .sql file context, then you could simply use config.get() instead of var() to capture these config values. This should address your use case with the added benefit of deflating the size of your dbt_project.yml file which is, i bet, huge for large dbt projects on datavault :)
@drewbanin I think the order you've specified makes sense. This will be a huge help for uber-templated models
Yes the order makes sense, and I agree that config is a better way forward
Closing in favor of #2401
Most helpful comment
@drewbanin I think the order you've specified makes sense. This will be a huge help for uber-templated models