Kedro: How to pass the parameters of a function in Pipeline

Created on 7 Jul 2020 · 3 comments · Source: quantumblacklabs/kedro

What are you trying to do?

I defined a function called "split_train_test" that splits a dataset into training and test datasets. "split_train_test" takes two parameters: an input DataFrame (defined in catalog.yml) and a specific date used to split the dataset.

I got an error: "ValueError: Pipeline input(s) {'201801'} not found in the DataCatalog". It seems that a node only lets us pass dataset names as inputs to our function.
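pipeline.py

```python
from kedro.pipeline import node
from .nodes import split_train_test
node(
    func=split_train_test,
    # "201801" is treated as a dataset name, hence the ValueError
    inputs=dict(df="preprocessed_transactions", test_date="201801"),
    outputs=["preprocessed_training", "preprocessed_test"],
)
```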

nodes.py
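```python
import logging
from typing import Tuple
import pandas as pd
from . import train_test_split  # the asker's own helper module (location assumed)
log = logging.getLogger(__name__)

def split_train_test(df: pd.DataFrame, test_date: int) -> Tuple[pd.DataFrame, pd.DataFrame]:
    log.info("Start to split dataset into training and test datasets")
    # The node declares two outputs, so split_data must return (train, test)
    return train_test_split.split_data(df, test_date=int(test_date))
```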
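`train_test_split.split_data` is the asker's own helper and is not shown in the issue. As a rough illustration only, here is a minimal pandas sketch of a date-based split. It assumes a hypothetical integer column `yyyymm` holding values such as `201801`, and a hypothetical function name `split_by_month`; rows before the test date go to training and the rest go to test. Neither the column nor the cut-off rule comes from the original code.

```python
from typing import Tuple

import pandas as pd


def split_by_month(df: pd.DataFrame, test_date: int) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical split on an assumed integer `yyyymm` column (e.g. 201801)."""
    is_test = df["yyyymm"] >= int(test_date)  # True for rows on or after the test month
    train = df.loc[~is_test].reset_index(drop=True)
    test = df.loc[is_test].reset_index(drop=True)
    return train, test


transactions = pd.DataFrame(
    {"yyyymm": [201711, 201712, 201801, 201802], "amount": [10, 20, 30, 40]}
)
train, test = split_by_month(transactions, test_date=201801)
print(len(train), len(test))  # 2 2
```

Returning a tuple of two DataFrames is what lets a node map the result onto two outputs such as `["preprocessed_training", "preprocessed_test"]`.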

adslwang4601

Most helpful comment

Actually, on second thought, your use case is a perfect fit for parameters: https://kedro.readthedocs.io/en/latest/04_user_guide/03_configuration.html#using-parameters. You can specify the test date in parameters.yml and refer to it in the node as params:test_date.
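Kedro exposes each parameter to nodes under a `params:` prefix, which is why `params:test_date` resolves while a bare `"201801"` does not. Below is a simplified, pure-Python model of that lookup, not Kedro's actual code: `build_feed` and `resolve` are hypothetical helpers, and the dict `parameters` stands in for what a parameters.yml containing `test_date: 201801` would load.

```python
from typing import Any, Dict, List


def build_feed(catalog: Dict[str, Any], parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: expose each parameter as a 'params:<key>' entry next to the datasets."""
    feed = dict(catalog)
    feed["parameters"] = parameters
    for key, value in parameters.items():
        feed[f"params:{key}"] = value
    return feed


def resolve(names: List[str], feed: Dict[str, Any]) -> List[Any]:
    """Hypothetical: look up every node input by name, failing like the error in the issue."""
    missing = {name for name in names if name not in feed}
    if missing:
        raise ValueError(f"Pipeline input(s) {missing} not found in the DataCatalog")
    return [feed[name] for name in names]


parameters = {"test_date": 201801}  # stands in for parameters.yml
feed = build_feed({"preprocessed_transactions": "<DataFrame>"}, parameters)

print(resolve(["preprocessed_transactions", "params:test_date"], feed))
try:
    resolve(["preprocessed_transactions", "201801"], feed)
except ValueError as err:
    print(err)  # Pipeline input(s) {'201801'} not found in the DataCatalog
```

In the actual project, the node would then declare `inputs=dict(df="preprocessed_transactions", test_date="params:test_date")`.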

limdauto on 7 Jul 2020

All 3 comments

Hi @adslwang4601, node inputs and outputs do need to be dataset names rather than plain Python values. So in your case, you can add this to your catalog.yml:

```yaml
test_date:
    type: MemoryDataSet
    data: 201801 # or whatever test date you have in mind
```

And define your inputs as

```python
inputs=dict(df="preprocessed_transactions", test_date="test_date")
```
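When `inputs` is a dict, its keys are the function's argument names and its values are the dataset names to load. A rough pure-Python model of that mapping follows, assuming a plain dict stands in for the catalog (its `"test_date": 201801` entry plays the role of the `MemoryDataSet` above). `call_with_named_inputs` is a hypothetical helper, not Kedro's API, and the stand-in `split_train_test` simply echoes its arguments.

```python
from typing import Any, Callable, Dict


def call_with_named_inputs(
    func: Callable[..., Any], inputs: Dict[str, str], catalog: Dict[str, Any]
) -> Any:
    """Hypothetical: load each dataset by name and pass it as the matching keyword argument."""
    kwargs = {arg_name: catalog[dataset_name] for arg_name, dataset_name in inputs.items()}
    return func(**kwargs)


def split_train_test(df, test_date):
    # Stand-in body: return what the function received.
    return df, test_date


catalog = {"preprocessed_transactions": [1, 2, 3], "test_date": 201801}
result = call_with_named_inputs(
    split_train_test,
    inputs=dict(df="preprocessed_transactions", test_date="test_date"),
    catalog=catalog,
)
print(result)  # ([1, 2, 3], 201801)
```

Because the lookup is by name, the value `"test_date"` must match a catalog entry exactly; the literal `"201801"` has no matching entry.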

limdauto on 7 Jul 2020

Hi @adslwang4601 I'm going to close this issue. If you still need help, please feel free to reopen it.

limdauto on 15 Jul 2020
