Dbt: log-path (and other output paths) cannot be set to null, because of aggressive defaults

Created on 18 Sep 2020  路  3Comments  路  Source: fishtown-analytics/dbt

Describe the bug

If in your dbt_project.yml you set log-path: null to disable logging, it will default to 'logs' because, as far as I can tell, at this point an unset log-path and a log-path of null/None are treated the same (the dataclass defaults to None here, but even if it didn't , I believe hologram would assign it a value of None here).

Steps To Reproduce

Set log-path: null in dbt_project.yml.

Expected behavior

Logging should be disabled, per this code-path.

System information

Which database are you using dbt with?
Any

The output of dbt --version:

0.18.0

The operating system you're using:
Linux

The output of python --version:
3.8.5

Additional context

Don't know when this got introduced, but I tend to think it is worthwhile to allow people to disable some of the output-producing features if they want to. It's also a little misleading to set log-path: null, but have it default to something else.

For further context, we like to invoke dbt as a python module (similar to the suggestion from @drewbanin on #2013 to use handle_and_check()), which is admittedly unsupported (although it would be awesome if it got traction). Switching to this aggressive default of log_path='logs' which prevents disabling logging has the effect of making it impossible (as far as I can tell) to call handle_and_check() more than once (i.e. seed, then run, maybe test after too) because of this codepath here.

There are a lot of directions this issue could go to address some or all concerns, but I'm wondering:

  • Is a consistent python API to invoking dbt on the radar?
  • Should disabling logging be a feature that is supported?
  • Does the logger really need to fail if handle_and_check() is invoked more than once?
bug

Most helpful comment

Is a consistent python API to invoking dbt on the radar?

It's not on my radar, but it's always possible. The RPC server is kind of intended to fulfill that need. dbt has a little too much global mutable state at the moment for it to be easy to safely run multiple invocations in sequence.

I think you can do this if you want to use the python modules and suppress file logging, and it should be somewhat reliably safe (though I haven't tested it):

from dbt.logger import log_manager
from dbt.main import handle_and_check

with log_manager.applicationbound():
  log_manager.set_path(None)  # this disables logging and bypasses future set_path calls
  handle_and_check(args)

Should disabling logging be a feature that is supported?

Yes.

We can distinguish between "not set" and None in hologram by creating a new object type that represents the not-set case, registering an encoder for it, and setting an instance of it to be the default. The codepath you linked checks if field.default is MISSING, so any default will bypass that behavior if we have a default value set.

Edit: actually, we can probably skip all the field encoder stuff and just do log_path: Optional[str] = 'logs'

Does the logger really need to fail if handle_and_check() is invoked more than once?

It doesn't need to, but I think it is a good protective measure. I don't have any reason to believe that dbt is process-safe across multiple invocations right now and I'd rather fail there than do the wrong thing and drop tables that shouldn't get dropped, or whatever.

All 3 comments

Is a consistent python API to invoking dbt on the radar?

It's not on my radar, but it's always possible. The RPC server is kind of intended to fulfill that need. dbt has a little too much global mutable state at the moment for it to be easy to safely run multiple invocations in sequence.

I think you can do this if you want to use the python modules and suppress file logging, and it should be somewhat reliably safe (though I haven't tested it):

from dbt.logger import log_manager
from dbt.main import handle_and_check

with log_manager.applicationbound():
  log_manager.set_path(None)  # this disables logging and bypasses future set_path calls
  handle_and_check(args)

Should disabling logging be a feature that is supported?

Yes.

We can distinguish between "not set" and None in hologram by creating a new object type that represents the not-set case, registering an encoder for it, and setting an instance of it to be the default. The codepath you linked checks if field.default is MISSING, so any default will bypass that behavior if we have a default value set.

Edit: actually, we can probably skip all the field encoder stuff and just do log_path: Optional[str] = 'logs'

Does the logger really need to fail if handle_and_check() is invoked more than once?

It doesn't need to, but I think it is a good protective measure. I don't have any reason to believe that dbt is process-safe across multiple invocations right now and I'd rather fail there than do the wrong thing and drop tables that shouldn't get dropped, or whatever.

We can distinguish between "not set" and None in hologram by creating a new object type that represents the not-set case, registering an encoder for it, and setting an instance of it to be the default.

This is exactly what I had in mind, said much better than I could have.

Thanks for chiming in! I did something to the effect of what Jake suggested and it seems to work well for the combination of run and seed. Also had similar thoughts about setting the default on the dataclass directly, so None will be respected :)

Was this page helpful?
0 / 5 - 0 ratings