Kedro: How can I save a Python list/dict as JSON with the catalog.save function from a Jupyter notebook?

Created on 6 May 2020 · 4 comments · Source: quantumblacklabs/kedro

In my Jupyter notebook I want to save the following Python list into a json file:

my_list = [
    {
        'a_string': 'World!',
        'a_list': [1, 2, 3]
    },
    {
        'a_string': 'World!',
        'a_list': [4, 5, 6]
    }
]

Here is what I have in the catalog.yml:

my_json_data:
  type: kedro.io.JSONDataSet
  filepath: data/02_intermediate/extended_dataset.json

And this is what I'm running in my Jupyter notebook:

catalog.save("my_json_data", my_list )

I got the following error message:
'list' object has no attribute 'to_json'

I tried converting my list to a pandas.DataFrame, since it has a to_json method, but the resulting extended_dataset.json contains pandas' index-keyed JSON instead of a plain JSON array:

{"a_string":{"0":"World!","1":"World!"},"a_list":{"0":[1,2,3],"1":[4,5,6]}}

I want this as a result in the extended_dataset.json:

[{
    "a_string": "World!",
    "a_list": [1, 2, 3]
}, {
    "a_string": "World!",
    "a_list": [4, 5, 6]
}]

Could you please help me achieve this?

f-istvan

All 4 comments

Thank you for the question!

You can pass any pandas arguments with save_args in catalog.yml. It should look something like this:

my_json_data:
  type: pandas.JSONDataSet
  filepath: data/02_intermediate/extended_dataset.json
  save_args:
    orient: "records"
    lines: True

Note that catalog type should be pandas.JSONDataSet, not kedro.io.JSONDataSet.
Hope this helps :)
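
To see what these save_args change, here is a minimal sketch that calls pandas directly. It assumes, as stated above, that save_args are forwarded to DataFrame.to_json; the variable names are illustrative only. Note that with lines: True pandas writes one JSON object per line (JSON Lines), whereas orient: "records" on its own produces a single JSON array like the one asked for in the question.

```python
import json

import pandas as pd

my_list = [
    {"a_string": "World!", "a_list": [1, 2, 3]},
    {"a_string": "World!", "a_list": [4, 5, 6]},
]
df = pd.DataFrame(my_list)

# Default orient for a DataFrame ("columns"): values keyed by column, then by row index.
columns_json = df.to_json()

# orient="records": a single JSON array with one object per row.
records_json = df.to_json(orient="records")

# orient="records" with lines=True: one JSON object per line, no enclosing array.
lines_json = df.to_json(orient="records", lines=True)

print(columns_json)
print(records_json)
print(lines_json)

# Both record-based forms parse back to the original list.
assert json.loads(records_json) == my_list
assert [json.loads(line) for line in lines_json.splitlines() if line] == my_list
```

If the goal is the array form shown in the question, dropping lines: True from save_args and keeping orient: "records" is likely the closer match.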

Btw, we have StackOverflow tags where you can ask questions as well :) https://stackoverflow.com/questions/tagged/kedro

921kiyo on 7 May 2020

Hi @921kiyo,

Right, the

save_args:
    orient: "records"
    lines: True

options are passed to the kedro.extras.datasets.pandas.JSONDataSet.__init__ method. This is what I was looking for.

Thank you so much.

Question: why is pandas.JSONDataSet preferred over kedro.io.JSONDataSet?

Can I use kedro.io.JSONLocalDataSet? With kedro.io.JSONLocalDataSet I do not need to convert my data to a pd.DataFrame. I can write out the JSON directly using this catalog entry:

my_json_data:
  type: kedro.io.JSONLocalDataSet
  filepath: data/02_intermediate/extended_dataset.json

Is that correct?

Thanks again!

f-istvan on 13 May 2020

You are welcome :) You could use JSONLocalDataSet, but it will be removed in Kedro 0.16.0 and replaced by pandas.JSONDataSet. The reason we are moving toward pandas.JSONDataSet is that it is platform agnostic (e.g. there is no Local or S3 keyword in the class name). This means it works both locally and on cloud platforms such as S3.

If you want to use JSONLocalDataSet in Kedro 0.16.*, you could copy the existing implementation by following the "creating custom datasets" section of the documentation: https://kedro.readthedocs.io/en/latest/03_tutorial/03_set_up_data.html?20dataset#creating-custom-datasets
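
For a plain list or dict, such a custom dataset can be quite small. The sketch below is only an illustration under stated assumptions, not Kedro's own implementation: PlainJSONDataSet is a hypothetical name, it uses only the standard json module, and it mirrors the _load, _save and _describe methods that a subclass of kedro.io.AbstractDataSet is expected to provide. In a real project it would inherit from AbstractDataSet so that the catalog can use it.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class PlainJSONDataSet:
    """Hypothetical dataset: stores any JSON-serialisable object via the json module.

    Assumption: inside a Kedro project this class would subclass
    kedro.io.AbstractDataSet; it is kept standalone here so the sketch
    runs without Kedro installed.
    """

    def __init__(self, filepath: str, indent: int = 4) -> None:
        self._filepath = Path(filepath)
        self._indent = indent

    def _load(self) -> Any:
        with self._filepath.open("r", encoding="utf-8") as f:
            return json.load(f)

    def _save(self, data: Any) -> None:
        # Create the parent folder if needed, then write the object as JSON.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("w", encoding="utf-8") as f:
            json.dump(data, f, indent=self._indent)

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath), "indent": self._indent}


my_list = [
    {"a_string": "World!", "a_list": [1, 2, 3]},
    {"a_string": "World!", "a_list": [4, 5, 6]},
]

# Round trip through a temporary directory to check that the list survives unchanged.
with tempfile.TemporaryDirectory() as tmp:
    dataset = PlainJSONDataSet(str(Path(tmp) / "02_intermediate" / "extended_dataset.json"))
    dataset._save(my_list)
    reloaded = dataset._load()

print(reloaded == my_list)
```

Because json.dump writes the list as it is, no conversion to a DataFrame is needed and the file contains an ordinary JSON array.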

Hope this helps :)

921kiyo on 13 May 2020

Thank you for the help.

f-istvan on 14 May 2020
