via @cabouffard (originally opened here)
Hey,
I鈥檓 hitting some kind of unexpected errors. I want the thread value of my profiles.yml file to be configurable via ENV variables, such as I can do the following:
default:
outputs:
dev:
type: redshift
threads: "{{ env_var('DEV_DBT_THREADS') }}" # [1-8]
host: "{{ env_var('DEV_WAREHOUSE_DB_HOST') }}"
port: "{{ env_var('DEV_WAREHOUSE_DB_PORT') }}"
user: "{{ env_var('DEV_WAREHOUSE_DB_USER') }}"
pass: "{{ env_var('DEV_WAREHOUSE_DB_PASSWORD') }}"
dbname: "{{ env_var('DEV_WAREHOUSE_DB_NAME') }}"
schema: "{{ env_var('DEV_DBT_SCHEMA_OUTPUT') }}"
staging:
type: redshift
threads: "{{ env_var('STAGING_DBT_THREADS') }}" # [1-8]
host: "{{ env_var('STAGING_WAREHOUSE_DB_HOST') }}"
port: "{{ env_var('STAGING_WAREHOUSE_DB_PORT') }}"
user: "{{ env_var('STAGING_WAREHOUSE_DB_USER') }}"
pass: "{{ env_var('STAGING_WAREHOUSE_DB_PASSWORD') }}"
dbname: "{{ env_var('STAGING_WAREHOUSE_DB_NAME') }}"
schema: "{{ env_var('STAGING_DBT_SCHEMA_OUTPUT') }}"
production:
type: redshift
threads: "{{ env_var('PRODUCTION_DBT_THREADS') }}" # [1-8]
host: "{{ env_var('PRODUCTION_WAREHOUSE_DB_HOST') }}"
port: "{{ env_var('PRODUCTION_WAREHOUSE_DB_PORT') }}"
user: "{{ env_var('PRODUCTION_WAREHOUSE_DB_USER') }}"
pass: "{{ env_var('PRODUCTION_WAREHOUSE_DB_PASSWORD') }}"
dbname: "{{ env_var('PRODUCTION_WAREHOUSE_DB_NAME') }}"
schema: "{{ env_var('PRODUCTION_DBT_SCHEMA_OUTPUT') }}"
target: "{{ env_var('DBT_TARGET') }}"
Unfortunately, when building the docs I鈥檓 raised with the following error:
pipeline-dbt-docs | Encountered an error while reading profiles:
pipeline-dbt-docs | ERROR Runtime Error
pipeline-dbt-docs | Runtime Error
pipeline-dbt-docs | Invalid arguments passed to "ProfileConfig" instance: threads.'4' is not of type 'number'
pipeline-dbt-docs | Defined profiles:
pipeline-dbt-docs | - default
I dug a little bit, and saw that env_var was a macro defined by this that uses os.environ. In this context, os.environ will always return a string.
I then dug into how the validation was being done and found this which expects the value to be a number.
I'm unsure how to properly resolve that other than currently hard-coding a value to something.
I wanted to flag and document this error as this seems to be something that should be fixed.
Thanks!
I think the big problem here is that Jinja will _always_ render out values as strings. While environment variables will be stringy in nature, there's a world in which something like this _could_ work:
default:
outputs:
dev:
type: redshift
threads: "{{ env_var('DEV_DBT_THREADS')| int }}"
....
^ note the pipe through the int filter. This will take a stringy env var (DEV_DBT_THREADS), coerce it to an int, then coerce is _back_ to a string when it's being rendered in the Jinja environment. One option we may have here is to use a jinja native environment. I'm not positive I love this feature, but it seems like it exists for this usecase:
With NativeEnvironment, rendering a template produces a native Python type
Outside of threads and port, this also impacts folks that want to set a dbt var via an env_var. Eg:
models:
vars:
lookback_days: "{{ var('lookback_days', env_var("DEFAULT_LOOKBACK_DAYS", 7)) }}"
^ contrived example, but this will _always_ return a string type, and the user of the var will need to coerce it appropriately.
That's a great point! What about providing a flag to env_var and injecting some parameter candidates into the rendering context wherever we use env_var?
default:
outputs:
dev:
type: redshift
threads: "{{ env_var('DEV_DBT_THREADS', 3, cast=cast.INTEGER) }}"
....
^ I don't think that could actually work! env_var could return _any_ type, and Jinja would still coerce it into a string if I'm understanding things correctly. This is why we made the return() context function -- you need to bypass Jinja's render functionality to actually get a non-string type when using the default Jinja Environment.
I'm running into this as well, because I want to be able to run by DBT models in a CI process, but the database server which is started is on a random port (since multiple builds could be running at once).
Any thoughts on how to handle this? (a command line switch I'm not seeing, etc)
Ah, I see that port has actually been sorted:
My specific failure ended up being related to threads because it also has to be an integer.
closing this via the native env jinja rendering in 0.17.0
@drewbanin this problem has hit me after upgrading to 0.17.2 (from 0.17.0).
I had to cast the port to int like this:
port: "{{ env_var('PGPORT') | int }}"
Is it expected that the port from env_var still has to be casted to int? If yes, I think it would be great if it was mentioned in the documentation.
That's correct, based on changes we made in 0.17.1 to native-env rendering. If you're setting port to an environment variable, you need to filter it | int or | as_number.
I agree that it would be good to add a note about this to the env_var docs. Would you be up to propose that change?
Most helpful comment
I think the big problem here is that Jinja will _always_ render out values as strings. While environment variables will be stringy in nature, there's a world in which something like this _could_ work:
^ note the pipe through the
intfilter. This will take a stringy env var (DEV_DBT_THREADS), coerce it to an int, then coerce is _back_ to a string when it's being rendered in the Jinja environment. One option we may have here is to use a jinja native environment. I'm not positive I love this feature, but it seems like it exists for this usecase:Outside of
threadsandport, this also impacts folks that want to set a dbtvarvia anenv_var. Eg:^ contrived example, but this will _always_ return a string type, and the user of the var will need to coerce it appropriately.