Kedro: [KED-2300] CLI --from-nodes argument breaks on nodes with multiple inputs / outputs

Created on 14 Nov 2020 · 3 Comments · Source: quantumblacklabs/kedro

Description

The --from-nodes argument for kedro run breaks on nodes with multiple inputs / outputs.

Context

I had a bug in the middle of my pipeline. After fixing the bug, I followed the hint for resuming from where I left off:

You can resume the pipeline run by adding the following argument to your previous command:
  --from-nodes "see_onnx_model([some_onnx_dataset]) -> None,train_model([example_train_x,example_train_y,parameters]) -> [example_model],predict([example_model,example_test_x]) -> [example_predictions],report_accuracy([example_predictions,example_test_y]) -> None"

I copy-pasted this argument to the end of my kedro run command.

Steps to Reproduce

  1. Create the example Iris project
  2. Run kedro run --from-nodes "train_model([example_train_x,example_train_y,parameters]) -> [example_model]"

Expected Result

The pipeline should resume running from the train_model node.

Actual Result

The run fails with the following error:

ValueError: Pipeline does not contain nodes named ['parameters]) -> [example_model]', 'example_train_y', 'train_model([example_train_x'].

Your Environment

  • Kedro: 0.16.6
  • Python: 3.8.2
  • Operating system and version: Ubuntu 18.04.5 LTS
Bug Report

All 3 comments

Hi @torazem, thanks for raising this. The error occurs because --from-nodes expects a comma-separated list of node names, so a node explicitly defined with name="with,comma" would trigger the same error. The verbose names you see, like train([arg1,arg2]) -> this, are the default, auto-generated node names, which are not guaranteed to be machine-friendly. We strongly encourage users to define their own node names: they're more readable, more descriptive, and prettier in Viz. _And_ you don't run into issues like the above. :)
To your point, would adding some reasonable node names to the iris example make this clearer for future reference?
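
To illustrate the explanation above, here is a minimal sketch of what happens when a value is split on commas. The helper split_node_names is hypothetical and is not Kedro's actual parsing code; it only assumes that the CLI treats every comma as a separator between node names.

```python
from typing import List


def split_node_names(arg: str) -> List[str]:
    """Hypothetical stand-in for splitting a --from-nodes value on commas."""
    return [part.strip() for part in arg.split(",") if part.strip()]


# Auto-generated node name taken from the resume hint in the report.
auto_name = (
    "train_model([example_train_x,example_train_y,parameters])"
    " -> [example_model]"
)

fragments = split_node_names(auto_name)
print(fragments)
# ['train_model([example_train_x', 'example_train_y',
#  'parameters]) -> [example_model]']

# Explicit, comma-free names survive the split unchanged.
named = split_node_names("train_model,predict")
print(named)
# ['train_model', 'predict']
```

The three fragments are the same three strings listed in the ValueError from the report, which is consistent with the comma-splitting explanation.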

Hi @lorenabalan!
That is completely fair, and I will name the nodes in my pipelines going forward :+1:

Setting explicit node names in the iris example will be very helpful, but there may still be user-defined pipelines for which the suggested argument is invalid.

Is it possible to only display that hint if it is valid and will be parsed correctly?
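
One way such a check could look is sketched below. The function resume_hint is hypothetical and not part of Kedro; it assumes the hint is built by joining the remaining node names with commas and that the CLI later splits the value on commas. The hint is returned only when that round trip reproduces the original names.

```python
from typing import Iterable, Optional


def resume_hint(remaining_node_names: Iterable[str]) -> Optional[str]:
    """Return a --from-nodes hint only if it would be parsed back correctly.

    Hypothetical helper for illustration; not Kedro's implementation.
    """
    names = list(remaining_node_names)
    joined = ",".join(names)
    # Round-trip check: splitting on commas must give back the same names.
    if joined.split(",") != names:
        return None
    return f'--from-nodes "{joined}"'


good = resume_hint(["train_model", "predict"])
bad = resume_hint(
    ["train_model([example_train_x,example_train_y,parameters]) -> [example_model]"]
)
print(good)  # --from-nodes "train_model,predict"
print(bad)   # None
```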

Hi @torazem, we've merged a commit that restricts the node names a user can set on a node (auto-generated names are not affected), so a name containing special characters, e.g. a comma, now raises a more helpful error message. The commit is https://github.com/quantumblacklabs/kedro/commit/e500f999e8ae479631233ac032c2b9d7a267c258 in develop and will be available in the upcoming 0.17.0 release.
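
A rough sketch of this kind of name validation follows. The function validate_node_name, the allowed character set (letters, digits, underscores, hyphens and full stops) and the error wording are assumptions made for illustration; the comment above only states that special characters such as commas are rejected, so check the linked commit for the actual rule.

```python
import re

# Assumed whitelist: word characters, hyphens and full stops.
_VALID_NODE_NAME = re.compile(r"[\w.-]+")


def validate_node_name(name: str) -> str:
    """Hypothetical validator for user-supplied node names."""
    if not _VALID_NODE_NAME.fullmatch(name):
        raise ValueError(
            f"'{name}' is not a valid node name. It must contain only "
            "letters, digits, hyphens, underscores and/or full stops."
        )
    return name


validate_node_name("train_model")  # accepted

try:
    validate_node_name("with,comma")
except ValueError as err:
    print(err)  # explains which characters are allowed
```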
