Kedro: String as an argument to the node in pipeline.

Created on 5 Feb 2020 · 8 Comments · Source: quantumblacklabs/kedro

I want to add feature engineering to the pipeline for every feature. To do that, I want to pass a string as an argument to the function inside the node, but Kedro does not allow that yet, and I get the following error:
ValueError: Pipeline input(s) {"xxxx","yyyy"} not found in the DataCatalog
Is there a way to use a string as an argument inside a node?

Thanks in advance.

Feature Request

Most helpful comment

Currently Kedro only supports passing parameters or catalog entries to nodes. To pass other options in, use a partial or a lambda. I most often use lambdas; you can look up how to do it with functools.partial if you would rather use that.

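from kedro.pipeline import node

def my_func(data, param):
    # "feature" is a placeholder column name; the original snippet left it undefined
    data["feature"] = param
    return data

# Each lambda fixes the string argument, so the node only receives `data`
node(
    func=lambda data: my_func(data, 'xxx'),
    ...
)

node(
    func=lambda data: my_func(data, 'yyy'),
    ...
)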
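For comparison, the same binding can be written with functools.partial, which the comment mentions as the alternative. This is a minimal sketch in plain Python so that it runs without Kedro installed; the dict standing in for the dataset and the column name "feature" are assumptions made for illustration. The resulting callables are what would be passed as func= to node(...).

```python
from functools import partial


def my_func(data, param):
    # "feature" is a hypothetical column name chosen for this sketch.
    data = dict(data)  # shallow copy so the caller's input is not mutated
    data["feature"] = param
    return data


# Bind the string argument up front; each resulting callable takes only
# `data`, the one argument that would come from the catalog.
add_xxx = partial(my_func, param="xxx")
add_yyy = partial(my_func, param="yyy")

sample = {"id": [1, 2, 3]}  # stand-in for a real dataset
result_xxx = add_xxx(sample)
result_yyy = add_yyy(sample)
```

A partial object has no __name__ of its own, so giving the node an explicit name, as the pipeline snippet further down the thread does, is probably a good idea.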

I often generate nodes dynamically. Be careful to bind your inputs to your lambdas if you create them in a loop. More on that in this article: https://waylonwalker.com/blog/bind-dynamic-lambdas.
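The loop pitfall can be reproduced in a few lines of plain Python. The sketch below is illustrative only: my_func and the feature names are hypothetical stand-ins, and no Kedro import is needed to see the effect. It compares a lambda that reads the loop variable late with two ways of binding the value when the callable is created.

```python
from functools import partial


def my_func(data, param):
    # Hypothetical stand-in for a real node function.
    return f"{data}:{param}"


features = ["xxx", "yyy"]

# Late binding: each lambda looks up `name` only when it is called,
# by which time the loop has finished and `name` holds the last value.
late = [lambda data: my_func(data, name) for name in features]

# Default-argument binding: `name=name` captures the current value
# at the moment each lambda is created.
bound = [lambda data, name=name: my_func(data, name) for name in features]

# functools.partial also binds the value eagerly.
partials = [partial(my_func, param=name) for name in features]

late_results = [f("d") for f in late]          # ['d:yyy', 'd:yyy']
bound_results = [f("d") for f in bound]        # ['d:xxx', 'd:yyy']
partial_results = [f("d") for f in partials]   # ['d:xxx', 'd:yyy']
```

Both the default-argument form and partial avoid the surprise; which one reads better is mostly a matter of taste.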

All 8 comments

Hi @parulML, you might find this answer helpful: https://stackoverflow.com/questions/58875820/how-to-pass-a-literal-value-to-a-node
If you replace the integer 1 in the given example with a string, you have exactly your use case.

Thanks, @lorenabalan. I did what you suggested, but I get the error below:
AttributeError: 'str' object has no attribute 'items'

This is the code snippet:
create_feature_engineering_pipeline = Pipeline(
    [
        node(
            partial(FeatureEngineering.date_features, column_name='DateKey'),
            inputs=['merged_files'],
            outputs=None,
            name="date_features_for_datekey",
        ),
    ],
    name='create_feature_engineering_pipeline',
)

I'm afraid we can't debug without knowing what the actual node function looks like, and a full stack trace. Your pipeline definition looks fine to me, my feeling is the problem is within the function, not with Kedro itself.

Hey @parulML, were you able to get some help with this? 😄 Do let us know, otherwise we'll be closing this ticket.

Hi @parulML, I hope that you managed to get assistance on this. Let us know if you need anything else. I'll be closing this issue. If you are still stuck then just comment on it and we'll reopen it, or you can create a new issue if you have any more feedback or queries.

I think Kedro should add a way to pass a literal to a node, just like they did with params:.
Maybe something like literal:, or string: and int:.

That's a nice suggestion, I must say.
