Kedro: 'kedro run' command does not seem to run run.py script

Created on 21 Oct 2019 · 8 Comments · Source: quantumblacklabs/kedro

Description

I have specified the tags and nodes that have to be run in run.py when the 'kedro run' command is issued. However, kedro run does not seem to execute run.py. I could not find a solution in the Kedro documentation either.

Context

Due to this bug, I am not able to run the nodes selectively.

my run.py script

```
"""Application entry point."""

from pathlib import Path
from typing import Iterable, Type, Dict

from kedro.context import KedroContext, load_context, KedroContextError
from kedro.runner import AbstractRunner
from kedro.pipeline import Pipeline

from part_count_v1.pipeline import create_pipelines


class ProjectContext(KedroContext):

    project_name = "Part Count v1"
    project_version = "0.15.3"

    def _get_pipelines(self) -> Dict[str, Pipeline]:
        return create_pipelines()


def main(
    tags: Iterable[str] = ["not_once"],
    env: str = None,
    runner: Type[AbstractRunner] = None,
    node_names: Iterable[str] = None,
    from_nodes: Iterable[str] = None,
    to_nodes: Iterable[str] = None,
    from_inputs: Iterable[str] = None,
):

    project_context = ProjectContext(Path.cwd(), env=env)
    print('********* I am inside the runner, the node_names are {}'.format(node_names))
    project_context.run(
        tags=tags,
        runner=runner,
        node_names=node_names,
        from_nodes=from_nodes,
        to_nodes=to_nodes,
        from_inputs=from_inputs,
    )


if __name__ == "__main__":
    main()
```

Expected Result

Only the nodes tagged "not_once" are run.

Actual Result

Nodes without tags and nodes with other tags in the pipeline are run as well.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.15.3
  • Python version used (python -V): 3.7.3
  • Operating system and version: Ubuntu 18.04.03 LTS

Reaching out for assistance

Is there a Slack channel or support service that could help my team get started with Kedro for our production-ready products?

Thanks in advance!

All 8 comments

@pryanga-wk Thank you for reporting this. This is a known issue: the documentation is unclear about how to use KedroContext. main() in run.py is now executed only when you run your Kedro project as a package. We have an open internal ticket to improve the documentation on KedroContext.

Regarding your question about specifying tags and nodes to be run:

  1. The recommended way is to specify those from the command line (e.g., kedro run --tag not_once)
  2. If you would like the change to be permanent (which we do not recommend unless you have a strong need for it), you can change the definition of the run() function in kedro_cli.py and, say, change how the tag argument is passed to context.run(). Please let us know whether this solves your issue.
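
For the second option, the logic amounts to a fallback: use the tags given on the command line, otherwise a project default. Here is a minimal sketch of that in plain Python. `resolve_tags` and `DEFAULT_TAGS` are hypothetical names, not part of Kedro, and where this would be wired into kedro_cli.py depends on your project template.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical project-wide default; this is not a Kedro setting.
DEFAULT_TAGS: Tuple[str, ...] = ("not_once",)


def resolve_tags(cli_tags: Optional[Iterable[str]]) -> Tuple[str, ...]:
    """Return the tags given on the command line, or the default if none were given."""
    tags = tuple(cli_tags or ())
    return tags if tags else DEFAULT_TAGS


print(resolve_tags(None))      # no CLI tags: falls back to the default
print(resolve_tags(["test"]))  # explicit CLI tags take precedence
```

The result of such a helper would then be what gets passed as `tags` to context.run().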

@DmitriiDeriabinQB Thank you for looking into the documentation issue. I am hoping to see a growing community for Kedro soon.

So I believe that, in order to run nodes in the pipeline conditionally, I need to pass the tags to the command-line API: kedro run.

What do you mean by running a Kedro project as a package? Isn't running the command-line API the same as running the Kedro project as a package? Please share some details on this. Note that when I run kedro run, it does not print the string I have written in the main() function. :( Why is that so?

I tried kedro run --tag=not_once

Actual Output:
kedro.context.context.KedroContextError: Pipeline contains no nodes with tags: ('not_once',)

My pipeline.py code:
```
from typing import Dict

from kedro.pipeline import Pipeline, node, decorators
from .nodes.detect_track_functions import *


def create_pipelines(**kwargs) -> Dict[str, Pipeline]:
    pipeline = Pipeline([
        node(
            check_true_detection,
            "parameters",
            None,
            name="check_true",
            tags="test"
        ),
        node(
            search_objects,
            "parameters",
            None,
            name="search_and_update",
            tags="not_once"
        )
    ])

    return {
        "__default__": pipeline
    }
```

> Note that when I run kedro run, it does not print the string I have written in the main() function. :( Why is that so?

It happens because, as mentioned above, main() is not being called when kedro run is executed.

> What do you mean by running a Kedro project as a package?

It happens when you install your package globally by running pip install ./src/ and then run python -m <your-package-name>.run. This feature has a very specific and niche use case, so I wouldn't worry about it for your project since you are able to run it from the CLI.
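
To illustrate the general Python mechanism behind this (not Kedro code): anything behind an `if __name__ == "__main__":` guard runs only when the module is executed as the entry point, not when something else loads it. The sketch below writes a stand-in run.py to a temporary directory and loads it both ways with the standard-library `runpy` module. The `CALLED` flag is made up for the demonstration.

```python
import runpy
import tempfile
from pathlib import Path

# Stand-in for run.py: main() is only called behind the __main__ guard.
SOURCE = '''
CALLED = False

def main():
    global CALLED
    CALLED = True

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "run.py"
    script.write_text(SOURCE)

    # Loaded under its module name, as an import would do: the guard is false.
    imported = runpy.run_path(str(script), run_name="run")
    # Executed as the entry point, as `python -m <package>.run` would do.
    executed = runpy.run_path(str(script), run_name="__main__")

print(imported["CALLED"], executed["CALLED"])  # False True
```

So a print statement placed inside main() stays silent whenever run.py is loaded by other code instead of being run as the entry point.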

> kedro.context.context.KedroContextError: Pipeline contains no nodes with tags: ('not_once',)

The node constructor expects an iterable of tags, so passing a bare string doesn't work as expected. You need to wrap each tag in a list, for example: tags=["test"] and tags=["not_once"].
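
The reason a bare string misbehaves is plain Python: a str is itself an iterable of single characters. The sketch below assumes the tags are consumed as an iterable (for example via `set(tags)`), which I have not checked against the 0.15.3 source. `as_tag_set` is a hypothetical helper, not a Kedro function.

```python
from typing import Iterable, Optional, Set, Union


def as_tag_set(tags: Optional[Union[str, Iterable[str]]]) -> Set[str]:
    """Hypothetical helper: accept a single tag or several and return a set of tags."""
    if tags is None:
        return set()
    if isinstance(tags, str):
        # Treat a bare string as one tag, not as a sequence of characters.
        return {tags}
    return set(tags)


# A bare string is split into its characters when consumed as an iterable.
print(set("test"))             # single letters, not the tag "test"
print(set(["test"]))           # {'test'}
print(as_tag_set("not_once"))  # {'not_once'}
```

Under that assumption, a node declared with tags="not_once" would carry single-letter tags, which would explain why filtering on not_once finds nothing.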

Resolved in 6cd082e256475f1b0746d0cecaab61b7b1dff7fd

I have configured my kedro_cli.py to use the following run.py script.

```
from pathlib import Path
from typing import Iterable, Type, Dict

from kedro.context import KedroContext, load_context, KedroContextError
from kedro.runner import AbstractRunner, SequentialRunner
from kedro.pipeline import Pipeline

from .pipeline import create_pipelines


class ProjectContext(KedroContext):
    """Users can override the remaining methods from the parent class here, or create new ones
    (e.g. as required by plugins)

    """

    project_name = "Part Count v1"
    project_version = "0.15.3"

    def _get_pipelines(self) -> Dict[str, Pipeline]:
        return create_pipelines()


def main(
    tags: Iterable[str] = None,
    env: str = None,
    runner: Type[AbstractRunner] = None,
    node_names: Iterable[str] = None,
    from_nodes: Iterable[str] = None,
    to_nodes: Iterable[str] = None,
    from_inputs: Iterable[str] = None,
):

    project_context = ProjectContext(Path.cwd(), env=env)
    project_context.run(
        tags=tags,
        runner=runner,
        node_names=node_names,
        from_nodes=from_nodes,
        to_nodes=to_nodes,
        from_inputs=from_inputs,
    )


if __name__ == "__main__":
    main()
```

The script above is taken from the Kedro example project. However, kedro run on the command line does not run the nodes in the same sequence as they are declared in pipeline.py. How can I enable a sequential run via my run.py script?

Thank you in advance:)

@pryanga-wk Pipeline determines the node execution order exclusively based on dataset dependencies (node inputs and outputs) at the moment. So the only option to dictate that the node A should run before node B is to put a dummy dataset as an output of node A and an input of node B.
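
As a toy model of this (not Kedro's implementation), the sketch below derives an execution order purely from declared inputs and outputs, using the standard-library `graphlib` (Python 3.9+). The node names come from the pipeline posted earlier in this thread. The dummy dataset name `check_done` and the function `execution_order` are made up for the illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(nodes):
    """nodes maps name -> (inputs, outputs); return names in dependency order."""
    # Which node produces each dataset.
    producer = {out: name for name, (_, outs) in nodes.items() for out in outs}
    # Each node depends on the producers of its inputs, if any.
    graph = {
        name: {producer[i] for i in inputs if i in producer}
        for name, (inputs, _) in nodes.items()
    }
    return list(TopologicalSorter(graph).static_order())


# search_and_update is declared first, but consumes the dummy dataset
# "check_done" that check_true produces, so check_true must run before it.
nodes = {
    "search_and_update": (["parameters", "check_done"], []),
    "check_true": (["parameters"], ["check_done"]),
}
print(execution_order(nodes))  # ['check_true', 'search_and_update']
```

Without the shared `check_done` dataset the two nodes have no dependency on each other, and the order in which they are declared says nothing about which runs first.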

Also, may I suggest posting "how to" questions like this on Stack Overflow (tagged kedro)? Other users may also benefit from the answers and, arguably, SO is more googleable. Thank you.

Thanks @DmitriiDeriabinQB .

I initially tried posting these questions on Stack Overflow, but I couldn't create a 'kedro' tag, so I resorted to GitHub.
Now there is a kedro tag on Stack Overflow and I have posted this question there!

Thank you for your assistance and I hope to see a growing community using Kedro :)
