When developing a Kedro plugin, I regularly need to access the project configs and potentially some plugin-specific config files. Since the plugin uses the hook mechanism, I can no longer pass arbitrary context attributes to my hook implementation (only the parameters defined in the hook specs are available).
Here in the kedro-mlflow plugin we were forced to redefine a ConfigLoader instance inside the plugin.
That leads to inconsistencies between the context's ConfigLoader property and the new ConfigLoader created inside the hook.
Other plugins will need this functionality. I can imagine a kedro-spark plugin that uses the hook mechanism and reads a Spark config file from the project folder (spark.yml), or a kedro-sas plugin that does the same thing (getting configs in order to create a parametrized session).
A possible implementation is to pass the context's config_loader to the hook.
Hook specs:
@hook_spec
def before_pipeline_run(
    self, run_params: Dict[str, Any], pipeline: Pipeline, catalog: DataCatalog, config_loader: ConfigLoader
) -> None:
    pass
Context:
hook_manager = get_hook_manager()
hook_manager.hook.before_pipeline_run(  # pylint: disable=no-member
    run_params=record_data, pipeline=filtered_pipeline, catalog=catalog, config_loader=self.config_loader
)
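A minimal sketch of why such a change could stay backward compatible, assuming the hook manager only passes each implementation the arguments it declares (pluggy, which Kedro hooks are built on, supports this kind of opt-in argument). `ToyHookManager`, `ToyConfigLoader`, `OldHook` and `MlflowLikeHook` are hypothetical names for illustration; this is not Kedro's or pluggy's actual code.

```python
import inspect
from typing import Any, Dict, List


class ToyHookManager:
    """Toy stand-in for a hook manager (NOT pluggy or Kedro's real one)."""

    def __init__(self) -> None:
        self._plugins: List[Any] = []

    def register(self, plugin: Any) -> None:
        self._plugins.append(plugin)

    def call(self, hook_name: str, **kwargs: Any) -> List[Any]:
        results = []
        for plugin in self._plugins:
            impl = getattr(plugin, hook_name, None)
            if impl is None:
                continue
            # Pass only the arguments this implementation declares, so
            # hooks that do not ask for `config_loader` keep working.
            accepted = inspect.signature(impl).parameters
            call_kwargs = {k: v for k, v in kwargs.items() if k in accepted}
            results.append(impl(**call_kwargs))
        return results


class ToyConfigLoader:
    """Hypothetical loader that returns canned config for a pattern."""

    def __init__(self, conf: Dict[str, Dict[str, Any]]) -> None:
        self._conf = conf

    def get(self, pattern: str) -> Dict[str, Any]:
        return self._conf.get(pattern, {})


class OldHook:
    # Written before `config_loader` was added to the spec.
    def before_pipeline_run(self, run_params):
        return f"old hook, env={run_params['env']}"


class MlflowLikeHook:
    # Receives the very same loader instance that the context owns.
    def before_pipeline_run(self, run_params, config_loader):
        return config_loader.get("mlflow*")["tracking_uri"]


manager = ToyHookManager()
manager.register(OldHook())
manager.register(MlflowLikeHook())
loader = ToyConfigLoader({"mlflow*": {"tracking_uri": "file:./mlruns"}})
results = manager.call(
    "before_pipeline_run", run_params={"env": "local"}, config_loader=loader
)
print(results)
```

In this sketch the older hook is untouched while the newer one reads its plugin config from the shared loader, so no second loader has to be built inside the plugin.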
Hi @takikadiri, you've highlighted a very good point. We thought about this and we've actually added a set of hooks to register library components, such as pipelines, data catalog, and config loader, with a Kedro project. I think this might solve your use case.
This functionality will be made available in 0.16.5, which is going to be released very soon. :)
Thank you @lorenabalan for the quick reply! It's really great to be able to register library components such as the config loader, and I will certainly use it.
But my point here is that there is no way to pass the config loader instance (created with register_config_loader) to another hook, let's say the before_pipeline_run hook.
There may be something that escapes me about the hook mechanisms :)
Hello @lorenabalan, I am not sure whether I am missing the point, but I think this is not what is at stake here. Correct me if I'm wrong.
I don't know if this is the best place to write this or if it should be in another issue, but here is a more detailed description of the problem and a discussion of different design decisions and potential solutions.
Since hooks were released in kedro==0.16.0, they have become a popular tool among developers who create Kedro plugins (to be honest the community is small but quite active 😉).
It is a common pattern for a hook to need access to configuration files (for instance to create a session with an external tool using credentials, to use parameters inside the hook and, more likely in the case of kedro-mlflow, to use a custom config file for the plugin).
I personally feel that this configuration file access must be template-independent. The hook is not supposed to assume anything about the template (which may be changed by the user), since the ProjectContext already has all the necessary information (mainly the instantiated ConfigLoader, but potentially other attributes of the ProjectContext). If the hook needs to recreate any attribute of the ProjectContext (for instance the ConfigLoader), there is a high risk that the hook behaves differently from the ProjectContext, which is something we absolutely want to avoid.
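To make the risk concrete, here is a toy illustration (not Kedro code) of how a loader recreated inside a hook can disagree with the one the context uses. `PlainLoader` and `TemplatedLoader` are hypothetical stand-ins, and the `${root}` placeholder and config values are made up for the example.

```python
from string import Template
from typing import Dict


class PlainLoader:
    """Toy loader that returns raw config values unchanged."""

    def __init__(self, raw: Dict[str, Dict[str, str]]) -> None:
        self._raw = raw

    def get(self, key: str) -> Dict[str, str]:
        return dict(self._raw[key])


class TemplatedLoader(PlainLoader):
    """Toy loader that substitutes ${name} placeholders from a globals dict."""

    def __init__(self, raw: Dict[str, Dict[str, str]], globals_dict: Dict[str, str]) -> None:
        super().__init__(raw)
        self._globals = globals_dict

    def get(self, key: str) -> Dict[str, str]:
        return {
            name: Template(value).substitute(self._globals)
            for name, value in self._raw[key].items()
        }


# The same "file content" is seen by both loaders.
raw_conf = {"mlflow": {"tracking_uri": "${root}/mlruns"}}

# The project registered a templated loader on the context...
context_loader = TemplatedLoader(raw_conf, {"root": "/data"})
# ...but the hook, unaware of that, rebuilt a default one.
hook_loader = PlainLoader(raw_conf)

context_value = context_loader.get("mlflow")["tracking_uri"]
hook_value = hook_loader.get("mlflow")["tracking_uri"]
print(context_value)
print(hook_value)
```

The two loaders read identical configuration yet return different values, which is the kind of silent divergence described above.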
register_config_loader (for instance to use the TemplatedConfigLoader in your documentation):
from typing import Iterable

from kedro.config import ConfigLoader, TemplatedConfigLoader
from kedro.framework.hooks import hook_impl

class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths: Iterable[str]) -> ConfigLoader:
        # Register a templated loader instead of the default ConfigLoader
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
            globals_dict={"param1": "pandas.CSVDataSet"},
        )
mlflow.yml) inside hook calls. For instance:
class MlflowNodeHook:
    @hook_impl
    def before_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: Dict[str, Any],
        is_async: bool,
        run_id: str,
    ) -> None:
        # get the config loader of the current context
        # actually, config_loader is not available here, this magic function does not exist!
        # I need to eventually get the one registered in the project
        config_loader = get_config_loader()
        # do whatever I want using the conf and implementing my own logic
        conf_mlflow = config_loader.get("mlflow*", "mlflow*/**")
        do_my_own_logic(conf_mlflow)
Let's say that I want to create a global connection to a remote server (say SAS) in order to interact with it before/after each node, and possibly inside nodes:
class MlflowPipelineHook:
    @hook_impl
    def before_pipeline_run(
        self, run_params: Dict[str, Any], pipeline: Pipeline, catalog: DataCatalog
    ) -> None:
        # get the config loader of the current context
        # actually, config_loader is not available here, this magic function does not exist!
        config_loader = get_config_loader()
        # do whatever I want using the conf and implementing my own logic
        credentials = config_loader.get("credentials*", "credentials*/**")
        saspy.SASsession(credentials)
@WaylonWalker @deepyaman You seem to develop a lot of hooks. Are you hitting these use cases too? I see you sometimes use environment variables to configure your hooks, and I guess this is somehow related.
For instance, example 1 would become:
class MlflowNodeHook:
    @hook_impl
    def before_node_run(
        self,
        node: Node,
        catalog: DataCatalog,
        inputs: Dict[str, Any],
        is_async: bool,
        run_id: str,
    ) -> None:
        # recreate the config loader manually
        conf_paths = [
            str(self.project_path / self.CONF_ROOT / "base"),  # these attributes are not accessible outside the context, they must be hardcoded actually
            str(self.project_path / self.CONF_ROOT / self.env),  # suppressed
        ]
        hook_manager = get_hook_manager()
        config_loader = hook_manager.hook.register_config_loader(  # pylint: disable=no-member
            conf_paths=conf_paths
        ) or ConfigLoader(conf_paths)
        # do whatever I want using the conf and implementing my own logic
        conf_mlflow = config_loader.get("mlflow*", "mlflow*/**")
        do_my_own_logic(conf_mlflow)
Pros:
Cons:
project_path is hardcoded
env is not accessible
Some hook methods have access to some of the project context attributes: for instance, after_catalog_created can access credentials, and before_pipeline_run and after_pipeline_run can access project_path. In these methods, we can call load_context(project_path) to access all of the context attributes.
Pros:
Cons:
before_node_run and after_node_run do not have access to the project_path for instance)
For the hooks without access to the project_path, call load_context() without the project_path argument.
Pros:
Cons:
As the title of this issue states, a solution would be to add the config loader to the parameters of each @hook_spec so that it is accessible within hooks.
KedroSession ?
By digging in the code, I noticed a merged yet undocumented feature called KedroSession. It creates a global variable which is accessible, without any assumption about the template, just by calling get_current_session(), and it contains the context, hence the ConfigLoader. It should be accessible in the hooks.
Pros:
Cons:
kedro run command, i.e.
KedroSession.create(project_path)
KedroSession.create(project_path)
@DmitriiDeriabinQB it seems you are the one developing KedroSession. Is this how it is intended to be used in the future?
@Galileo-Galilei Steel Toes utilizes the project's context by defining your hooks as a property on the ProjectContext rather than a list.
from steel_toes import SteelToes

class ProjectContext(KedroContext):
    project_name = "kedro0160"
    project_version = "0.16.1"
    package_name = "kedro0160"

    @property
    def hooks(self):
        self._hooks = [SteelToes(self), ]
        return self._hooks
You can see where the context is used inside the hook here. I do feel like this is a bit of a hack and asks users to implement hooks on their project in a non-traditional way. The next upcoming change will make the context argument not required. Note that context contains a config_loader method that might be useful for you.
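The pattern can be reduced to a toy example that runs without Kedro or Steel Toes. `ToyContext` and `ContextAwareHook` are hypothetical names, the config loader is replaced by a plain dict, and the host value is made up; unlike the snippet above, this sketch also caches the hook list instead of rebuilding it on every access.

```python
from typing import Any, Dict, List, Optional


class ContextAwareHook:
    """Hypothetical hook that keeps a reference to the project context."""

    def __init__(self, context: "ToyContext") -> None:
        self._context = context

    def before_pipeline_run(self) -> str:
        # Read config through the context instead of rebuilding a loader.
        conf = self._context.config_loader.get("sas", {})
        return conf.get("host", "<no host configured>")


class ToyContext:
    """Toy stand-in for a project context (NOT Kedro's KedroContext)."""

    def __init__(self, conf: Dict[str, Dict[str, Any]]) -> None:
        self._conf = conf
        self._hooks: Optional[List[ContextAwareHook]] = None

    @property
    def config_loader(self) -> Dict[str, Dict[str, Any]]:
        # A plain dict plays the role of the config loader here.
        return self._conf

    @property
    def hooks(self) -> List[ContextAwareHook]:
        # Build the hooks lazily and hand them the context itself.
        if self._hooks is None:
            self._hooks = [ContextAwareHook(self)]
        return self._hooks


context = ToyContext({"sas": {"host": "sas.example.org"}})
outputs = [hook.before_pipeline_run() for hook in context.hooks]
print(outputs)
```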
I would really like to get access to the project's context inside of a hook, especially if we could configure hook behavior inside of .kedro.yml. I think this would align with how plugins work in pytest. I do not know how it works internally, but I know that when using a plugin like pytest-cov you can pass in command-line arguments, or add entries to a config file, to configure how it is run. https://pytest-cov.readthedocs.io/en/latest/config.html.
Hello @WaylonWalker and thanks for the reply. This is a clever hack and works like a charm, but it breaks auto-discovery and configuration in kedro.yml as you mention (not to mention that it is a user-facing change, even if it is easy to set up). I feel that it can be a temporary way to make the hook more stable, but it is definitely not a long-term solution and should be integrated into Kedro core IMHO. Aligning with pytest sounds reasonable indeed.
By the way, it seems @tamsanh is hitting the same problem and needs to access the context inside his KedroWings hook to be able to use it interactively (which is the same issue some users mention here and here in kedro-mlflow).
Some tests on KedroSession look promising (initialise a session in before_pipeline_run and retrieve it anywhere you need it), but I don't want to rely on it since it is explicitly mentioned in the script that it is not stable and may change even between releases.
@Galileo-Galilei you are right in assuming that KedroSession has been designed to eventually become responsible for carrying KedroContext (and project data in general), which would make the use case that you've described much less painful. Hence, as you have already noticed, it has been made a singleton to ensure its accessibility from hooks, for example.
However, this is still a work in progress and currently it's not at the stage where we can officially announce it and freeze the design. The general idea is that KedroSession will gradually take over the responsibility for the lifecycle events, while KedroContext will be treated as a "gatekeeper to the library components" (definition by @limdauto) in a new model.
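As a rough sketch of the singleton idea only (the real KedroSession design was not frozen at this point, so none of this is its API): a session registers itself as the active one on entry, and any hook can look it up without receiving it as an argument. `ToySession`, `ToySession.current()` and `SasLikeHook` are hypothetical, and a dict stands in for the config loader.

```python
from typing import Any, Dict, Optional


class ToySession:
    """Hypothetical stand-in for a session object that carries project data."""

    # The single active session, if any (class-level, shared by all callers).
    _current: Optional["ToySession"] = None

    def __init__(self, config_loader: Dict[str, Any]) -> None:
        self.config_loader = config_loader

    def __enter__(self) -> "ToySession":
        if ToySession._current is not None:
            raise RuntimeError("A session is already active")
        ToySession._current = self
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # Always clear the active session, even if the run failed.
        ToySession._current = None

    @classmethod
    def current(cls) -> "ToySession":
        if cls._current is None:
            raise RuntimeError("No active session")
        return cls._current


class SasLikeHook:
    def before_node_run(self, node_name: str) -> str:
        # No config_loader argument needed: fetch it from the active session.
        credentials = ToySession.current().config_loader["credentials"]
        return f"{node_name} runs as {credentials['user']}"


with ToySession(config_loader={"credentials": {"user": "alice"}}):
    message = SasLikeHook().before_node_run("train_model")
print(message)
```

The trade-off is the usual one for global state: hooks become simpler to write, but they only work while a session is active.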
Thanks for the insightful answer. I will use one of the above solutions as a better-than-nothing way to achieve what I want, and wait for KedroSession to become more official :smile: If you need beta-testers, feel free to ask!