Dbt: Type coercion for profiles.yml fields

Created on 24 Jan 2019  路  8Comments  路  Source: fishtown-analytics/dbt

via @cabouffard (originally opened here)

Hey,

I鈥檓 hitting some kind of unexpected errors. I want the thread value of my profiles.yml file to be configurable via ENV variables, such as I can do the following:

default:
  outputs:
    dev:
      type: redshift
      threads: "{{ env_var('DEV_DBT_THREADS') }}" # [1-8]
      host: "{{ env_var('DEV_WAREHOUSE_DB_HOST') }}"
      port: "{{ env_var('DEV_WAREHOUSE_DB_PORT') }}"
      user: "{{ env_var('DEV_WAREHOUSE_DB_USER') }}"
      pass: "{{ env_var('DEV_WAREHOUSE_DB_PASSWORD') }}"
      dbname: "{{ env_var('DEV_WAREHOUSE_DB_NAME') }}"
      schema: "{{ env_var('DEV_DBT_SCHEMA_OUTPUT') }}"
    staging:
      type: redshift
      threads: "{{ env_var('STAGING_DBT_THREADS') }}" # [1-8]
      host: "{{ env_var('STAGING_WAREHOUSE_DB_HOST') }}"
      port: "{{ env_var('STAGING_WAREHOUSE_DB_PORT') }}"
      user: "{{ env_var('STAGING_WAREHOUSE_DB_USER') }}"
      pass: "{{ env_var('STAGING_WAREHOUSE_DB_PASSWORD') }}"
      dbname: "{{ env_var('STAGING_WAREHOUSE_DB_NAME') }}"
      schema: "{{ env_var('STAGING_DBT_SCHEMA_OUTPUT') }}"
    production:
      type: redshift
      threads: "{{ env_var('PRODUCTION_DBT_THREADS') }}" # [1-8]
      host: "{{ env_var('PRODUCTION_WAREHOUSE_DB_HOST') }}"
      port: "{{ env_var('PRODUCTION_WAREHOUSE_DB_PORT') }}"
      user: "{{ env_var('PRODUCTION_WAREHOUSE_DB_USER') }}"
      pass: "{{ env_var('PRODUCTION_WAREHOUSE_DB_PASSWORD') }}"
      dbname: "{{ env_var('PRODUCTION_WAREHOUSE_DB_NAME') }}"
      schema: "{{ env_var('PRODUCTION_DBT_SCHEMA_OUTPUT') }}"
  target: "{{ env_var('DBT_TARGET') }}"

Unfortunately, when building the docs I鈥檓 raised with the following error:

pipeline-dbt-docs | Encountered an error while reading profiles:
pipeline-dbt-docs |   ERROR Runtime Error
pipeline-dbt-docs |   Runtime Error
pipeline-dbt-docs |     Invalid arguments passed to "ProfileConfig" instance: threads.'4' is not of type 'number'
pipeline-dbt-docs | Defined profiles:
pipeline-dbt-docs |  - default

I dug a little bit, and saw that env_var was a macro defined by this that uses os.environ. In this context, os.environ will always return a string.

I then dug into how the validation was being done and found this which expects the value to be a number.

I'm unsure how to properly resolve that other than currently hard-coding a value to something.

I wanted to flag and document this error as this seems to be something that should be fixed.
Thanks!

bug

Most helpful comment

I think the big problem here is that Jinja will _always_ render out values as strings. While environment variables will be stringy in nature, there's a world in which something like this _could_ work:

default:
  outputs:
    dev:
      type: redshift
      threads: "{{ env_var('DEV_DBT_THREADS')| int }}"
      ....

^ note the pipe through the int filter. This will take a stringy env var (DEV_DBT_THREADS), coerce it to an int, then coerce is _back_ to a string when it's being rendered in the Jinja environment. One option we may have here is to use a jinja native environment. I'm not positive I love this feature, but it seems like it exists for this usecase:

With NativeEnvironment, rendering a template produces a native Python type

Outside of threads and port, this also impacts folks that want to set a dbt var via an env_var. Eg:

models:
  vars:
    lookback_days: "{{ var('lookback_days', env_var("DEFAULT_LOOKBACK_DAYS", 7)) }}"

^ contrived example, but this will _always_ return a string type, and the user of the var will need to coerce it appropriately.

All 8 comments

I think the big problem here is that Jinja will _always_ render out values as strings. While environment variables will be stringy in nature, there's a world in which something like this _could_ work:

default:
  outputs:
    dev:
      type: redshift
      threads: "{{ env_var('DEV_DBT_THREADS')| int }}"
      ....

^ note the pipe through the int filter. This will take a stringy env var (DEV_DBT_THREADS), coerce it to an int, then coerce is _back_ to a string when it's being rendered in the Jinja environment. One option we may have here is to use a jinja native environment. I'm not positive I love this feature, but it seems like it exists for this usecase:

With NativeEnvironment, rendering a template produces a native Python type

Outside of threads and port, this also impacts folks that want to set a dbt var via an env_var. Eg:

models:
  vars:
    lookback_days: "{{ var('lookback_days', env_var("DEFAULT_LOOKBACK_DAYS", 7)) }}"

^ contrived example, but this will _always_ return a string type, and the user of the var will need to coerce it appropriately.

That's a great point! What about providing a flag to env_var and injecting some parameter candidates into the rendering context wherever we use env_var?

default:
  outputs:
    dev:
      type: redshift
      threads: "{{ env_var('DEV_DBT_THREADS', 3, cast=cast.INTEGER) }}"
      ....

^ I don't think that could actually work! env_var could return _any_ type, and Jinja would still coerce it into a string if I'm understanding things correctly. This is why we made the return() context function -- you need to bypass Jinja's render functionality to actually get a non-string type when using the default Jinja Environment.

I'm running into this as well, because I want to be able to run by DBT models in a CI process, but the database server which is started is on a random port (since multiple builds could be running at once).

Any thoughts on how to handle this? (a command line switch I'm not seeing, etc)

Ah, I see that port has actually been sorted:

https://github.com/fishtown-analytics/dbt/blob/aa0e58ef3bbc863c880b7f5b0a316433e65e87ac/core/dbt/config/renderer.py#L65-L73

My specific failure ended up being related to threads because it also has to be an integer.

closing this via the native env jinja rendering in 0.17.0

@drewbanin this problem has hit me after upgrading to 0.17.2 (from 0.17.0).

I had to cast the port to int like this:

      port: "{{ env_var('PGPORT') | int }}"

Is it expected that the port from env_var still has to be casted to int? If yes, I think it would be great if it was mentioned in the documentation.

That's correct, based on changes we made in 0.17.1 to native-env rendering. If you're setting port to an environment variable, you need to filter it | int or | as_number.

I agree that it would be good to add a note about this to the env_var docs. Would you be up to propose that change?

Was this page helpful?
0 / 5 - 0 ratings