Kedro: [KED-1799] Adding more than one decorator makes a pipeline run more than once

Created on 17 Jun 2020 · 2 Comments · Source: quantumblacklabs/kedro

Description

Adding two decorators to a pipeline seems to make it run again after it has finished. On the second run, the data being processed are no longer in their original state, which raises an error.

Context

While following the Spaceflights tutorial, I started having issues once decorators were involved.
I wanted to add both a log_time decorator and a mem_profile decorator to the data_engineering pipeline, as mentioned in the tutorial, and this caused the error. Removing the second decorator resolves the issue.

Steps to Reproduce

  1. Load data and create nodes and pipelines from Kedro's Spaceflights tutorial
  2. In the main pipeline.py, add the decorators:

# other imports
from kedro.pipeline.decorators import log_time
from kedro.extras.decorators.memory_profiler import mem_profile

def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    """Create the project's pipeline.

    Args:
        kwargs: Ignore any additional arguments added in the future.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.

    """

    de_pipeline = de.create_pipeline().decorate(log_time).decorate(mem_profile)
    # Note that the following line gives the same issue:
    # de_pipeline = de.create_pipeline().decorate(log_time, mem_profile)
    ds_pipeline = ds.create_pipeline()

    return {
        "de": de_pipeline,
        "ds": ds_pipeline,
        "__default__": de_pipeline + ds_pipeline,
    }
  3. Run the pipeline using kedro run --pipeline de

Expected Result

The pipeline finishes, and its runtime and maximum memory usage are logged.

Actual Result

The pipeline is run again after it finishes (see the "Arrived here !" message printed from inside one of its nodes). This second run fails because the data have changed.

```
(kedro-tutorial-env) C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial>kedro run --pipeline de
2020-06-17 16:30:33,542 - root - INFO - ** Kedro project kedro-tutorial
c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\fsspec\implementations\local.py:33: FutureWarning: The default value of auto_mkdir=True has been deprecated and will be changed to auto_mkdir=False by default in a future release.
  FutureWarning,
2020-06-17 16:30:34,018 - kedro.io.data_catalog - INFO - Loading data from `shuttles` (ExcelDataSet)...
2020-06-17 16:30:42,942 - kedro.pipeline.node - INFO - Running node: preprocessing_shuttles: preprocess_shuttles([shuttles]) -> [preprocessed_shuttles]
Arrived here !
2020-06-17 16:30:43,800 - kedro.pipeline.decorators - INFO - Running 'kedro_tutorial.pipelines.data_engineering.nodes.preprocess_shuttles' took 79ms [0.079s]
2020-06-17 16:30:43,800 - kedro.pipeline.decorators - INFO - Running 'kedro_tutorial.pipelines.data_engineering.nodes.preprocess_shuttles' took 79ms [0.079s]
Arrived here !
2020-06-17 16:30:44,638 - kedro.pipeline.node - ERROR - Node `preprocessing_shuttles: preprocess_shuttles([shuttles]) -> [preprocessed_shuttles]` failed with error:
'float' object has no attribute 'replace'
2020-06-17 16:30:44,638 - kedro.runner.sequential_runner - WARNING - There are 3 nodes that have not run.
You can resume the pipeline run by adding the following argument to your previous command:

Traceback (most recent call last):
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\cgaydon\AppData\Local\Continuum\anaconda3\envs\kedro-tutorial-env\Scripts\kedro.exe\__main__.py", line 7, in <module>
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\framework\cli\cli.py", line 633, in main
    cli_collection()
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial\kedro_cli.py", line 230, in run
    pipeline_name=pipeline,
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\framework\context\context.py", line 699, in run
    raise error
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\framework\context\context.py", line 691, in run
    run_result = runner.run(filtered_pipeline, catalog, run_id)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\runner\runner.py", line 101, in run
    self._run(pipeline, catalog, run_id)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\runner\sequential_runner.py", line 90, in _run
    run_node(node, catalog, self._is_async, run_id)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\runner\runner.py", line 213, in run_node
    node = _run_node_sequential(node, catalog, run_id)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\runner\runner.py", line 238, in _run_node_sequential
    raise error
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\runner\runner.py", line 228, in _run_node_sequential
    outputs = node.run(inputs)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\pipeline\node.py", line 439, in run
    raise exc
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\pipeline\node.py", line 428, in run
    outputs = self._run_with_one_input(inputs, self._inputs)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\pipeline\node.py", line 461, in _run_with_one_input
    return self._decorated_func(inputs[node_input])
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\extras\decorators\memory_profiler.py", line 75, in with_memory
    include_children=True,
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\memory_profiler.py", line 343, in memory_usage
    returned = f(*args, **kw)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\pipeline\decorators.py", line 75, in with_time
    result = func(*args, **kwargs)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\kedro\pipeline\decorators.py", line 75, in with_time
    result = func(*args, **kwargs)
  File "C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial\src\kedro_tutorial\pipelines\data_engineering\nodes.py", line 50, in preprocess_shuttles
    shuttles["price"] = shuttles["price"].apply(_parse_money)
  File "c:\users\cgaydon\appdata\local\continuum\anaconda3\envs\kedro-tutorial-env\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
  File "C:\Users\cgaydon\Documents\Working Materials\Kedro Tutorials\kedro-tutorial\src\kedro_tutorial\pipelines\data_engineering\nodes.py", line 16, in _parse_money
    return float(x.replace("$", "").replace(",", ""))
AttributeError: 'float' object has no attribute 'replace'
```
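
The traceback suggests why the second invocation fails: the node overwrites the "price" column of its input in place, so a repeated call receives floats instead of strings. Below is a minimal, stdlib-only sketch of that effect. It assumes a plain dict as a stand-in for the pandas DataFrame, and `preprocess_in_place` and `preprocess_on_copy` are hypothetical names that are not part of the tutorial.

```python
from typing import Dict, List


def _parse_money(x: str) -> float:
    # Same parsing logic as the line shown in the traceback.
    return float(x.replace("$", "").replace(",", ""))


def preprocess_in_place(shuttles: Dict[str, List]) -> Dict[str, List]:
    # Hypothetical stand-in for the node: overwrites the caller's data.
    shuttles["price"] = [_parse_money(p) for p in shuttles["price"]]
    return shuttles


def preprocess_on_copy(shuttles: Dict[str, List]) -> Dict[str, List]:
    # Hypothetical variant: leaves the input untouched, so it can be
    # called any number of times with the same result.
    cleaned = dict(shuttles)
    cleaned["price"] = [_parse_money(p) for p in shuttles["price"]]
    return cleaned


data = {"price": ["$1,200.50", "$300"]}
preprocess_in_place(data)  # first call succeeds and mutates `data`
try:
    preprocess_in_place(data)  # second call now sees floats
    second_call_error = None
except AttributeError as exc:
    second_call_error = str(exc)

print(second_call_error)
```

Making the node work on a copy would only hide the symptom for this particular node; the function would still be executed more than once.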

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.16.2
  • Python version used (python -V): Python 3.7.7
  • Operating system and version: latest Windows
Bug Report

All 2 comments

@CharlesGaydon thanks for reporting this. This is a known issue: the library we rely on for the mem_profile decorator has a bug that causes it. The decorator's documentation points this out. You should only add this decorator to nodes that take at least 0.5s to execute, otherwise the bug will manifest itself.

If you would like to read more about the cause of the bug, you can go to https://github.com/pythonprofilers/memory_profiler/issues/216 and upvote the issue so the library maintainers can try to fix it.

Meanwhile, we can look at the fix suggested in that thread and limit the number of iterations the library performs for short-running functions.
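
To illustrate the idea of limiting iterations, here is a rough, stdlib-only toy model. It is not memory_profiler's actual code: it assumes the profiler calls the function again with a shorter sampling interval whenever too few samples were collected, and `sample_while_running`, its parameters, and the thresholds are hypothetical.

```python
import time
from typing import Any, Callable, Optional, Tuple


def sample_while_running(
    func: Callable[[], Any],
    interval: float = 0.1,
    max_iterations: Optional[int] = None,
) -> Tuple[Any, int]:
    """Toy sampling loop; returns (result, number of times func was called)."""
    calls = 0
    while True:
        calls += 1
        start = time.perf_counter()
        result = func()
        elapsed = time.perf_counter() - start
        # Estimate how many samples would have fit into this run.
        n_samples = int(elapsed / interval)
        # Stop once there are enough samples, the iteration cap is hit,
        # or the interval cannot sensibly shrink any further.
        if n_samples > 4 or calls == max_iterations or interval < 1e-6:
            return result, calls
        # Too few samples: shrink the interval and call func again.
        interval /= 10


def fast_node() -> int:
    # A short-running function, far below the 0.5s mentioned above.
    return sum(range(10))


_, calls_unlimited = sample_while_running(fast_node)
_, calls_limited = sample_while_running(fast_node, max_iterations=1)
print(calls_unlimited, calls_limited)
```

In this model a function running for at least 0.5s yields more than four samples at the 0.1s interval and is called only once, which matches the threshold mentioned above. Capping the iterations at one trades measurement quality for the guarantee of a single call.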

Thank you for your answer, I'll stay away from this particular decorator then, until a fix exists. Good luck with that!
