In my Jupyter notebook I want to save the following Python list to a JSON file:
my_list = [
    {
        'a_string': 'World!',
        'a_list': [1, 2, 3]
    },
    {
        'a_string': 'World!',
        'a_list': [4, 5, 6]
    }
]
Here is what I have in the catalog.yml:
my_json_data:
  type: kedro.io.JSONDataSet
  filepath: data/02_intermediate/extended_dataset.json
And this is what I'm running in my Jupyter notebook:
catalog.save("my_json_data", my_list)
I got the following error message:
'list' object has no attribute 'to_json'
I tried converting my list to a pandas.DataFrame, since it has a to_json method, but the result in extended_dataset.json is pandas' index-keyed JSON rather than a plain JSON array of objects:
{"a_string":{"0":"World!","1":"World!"},"a_list":{"0":[1,2,3],"1":[4,5,6]}}
I want this as a result in the extended_dataset.json:
[{
"a_string": "World!",
"a_list": [1, 2, 3]
}, {
"a_string": "World!",
"a_list": [4, 5, 6]
}]
Could you please help me achieve this?
Thank you for the question!
You can pass any pandas arguments through save_args in catalog.yml. It should look something like this:
my_json_data:
  type: pandas.JSONDataSet
  filepath: data/02_intermediate/extended_dataset.json
  save_args:
    orient: "records"
    lines: True
Note that the catalog type should be pandas.JSONDataSet, not kedro.io.JSONDataSet.
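For reference, the JSON layouts discussed in this thread can be compared directly in pandas. This is only a minimal sketch: it assumes the save_args are forwarded to DataFrame.to_json, and the variable names are illustrative. As far as I can tell, lines: True writes one JSON object per line (JSON Lines), so keeping only orient: "records" may be closer to the single JSON array asked for in the question.

```python
import json

import pandas as pd

my_list = [
    {"a_string": "World!", "a_list": [1, 2, 3]},
    {"a_string": "World!", "a_list": [4, 5, 6]},
]

df = pd.DataFrame(my_list)

# Default orient ("columns"): values are keyed by column, then by row index.
as_columns = df.to_json()

# orient="records": a single JSON array with one object per row.
as_records = df.to_json(orient="records")

# orient="records" plus lines=True: one JSON object per line, no enclosing array.
as_lines = df.to_json(orient="records", lines=True)

print(as_columns)
print(as_records)
print(as_lines)

# Only the plain "records" output parses back into the original list in one call.
parsed_records = json.loads(as_records)
parsed_lines = [json.loads(line) for line in as_lines.splitlines() if line]
```

Both parsed_records and parsed_lines hold the same objects; the difference is only in how the file is laid out on disk.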
Hope this helps :)
Btw, we have StackOverflow tags where you can ask questions as well :) https://stackoverflow.com/questions/tagged/kedro
Hi @921kiyo,
Right, the
save_args:
  orient: "records"
  lines: True
options are passed to the kedro.extras.datasets.pandas.JSONDataSet.__init__ method. This is what I was looking for.
Thank you so much.
Question: why is pandas.JSONDataSet preferred over kedro.io.JSONDataSet?
Can I use kedro.io.JSONLocalDataSet? With kedro.io.JSONLocalDataSet I do not need to convert my data to a pd.DataFrame. I can write out the JSON directly using this catalog entry:
my_json_data:
  type: kedro.io.JSONLocalDataSet
  filepath: data/02_intermediate/extended_dataset.json
Is that correct?
Thanks again!
You are welcome :) You could use JSONLocalDataSet, but it will be removed in Kedro 0.16.0 and replaced by pandas.JSONDataSet. The reason we are moving toward pandas.JSONDataSet is that it is platform agnostic (e.g. there is no Local or S3 keyword in the class name). This means it works both locally and on cloud platforms like S3.
If you want to use JSONLocalDataSet in Kedro 0.16.*, you could copy the existing implementation by following the "create a custom dataset" section in the documentation: https://kedro.readthedocs.io/en/latest/03_tutorial/03_set_up_data.html?20dataset#creating-custom-datasets
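As a rough illustration of what such a custom dataset might look like, here is a minimal sketch that writes a Python list straight to disk with the standard json module, without going through a DataFrame. The class name PlainJSONDataSet and its save_args handling are hypothetical, not the JSONLocalDataSet implementation. It assumes that in a real Kedro project the class would subclass kedro.io.AbstractDataSet, which expects _load, _save and _describe; the base class is left out here so the snippet runs without Kedro installed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


class PlainJSONDataSet:
    """Hypothetical dataset that saves and loads objects with the json module.

    Assumption: inside Kedro this would inherit from kedro.io.AbstractDataSet.
    """

    def __init__(self, filepath: str, save_args: Optional[Dict[str, Any]] = None):
        self._filepath = Path(filepath)
        # Extra keyword arguments forwarded to json.dump, e.g. {"indent": 2}.
        self._save_args = save_args or {}

    def _load(self) -> Any:
        with self._filepath.open("r", encoding="utf-8") as f:
            return json.load(f)

    def _save(self, data: Any) -> None:
        # Create the target folder if it does not exist yet.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        with self._filepath.open("w", encoding="utf-8") as f:
            json.dump(data, f, **self._save_args)

    def _describe(self) -> Dict[str, Any]:
        return {"filepath": str(self._filepath), "save_args": self._save_args}


my_list = [
    {"a_string": "World!", "a_list": [1, 2, 3]},
    {"a_string": "World!", "a_list": [4, 5, 6]},
]

# Write to a temporary folder so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "02_intermediate" / "extended_dataset.json"
    dataset = PlainJSONDataSet(str(target), save_args={"indent": 2})
    dataset._save(my_list)
    raw_text = target.read_text(encoding="utf-8")
    round_trip = dataset._load()

print(raw_text)
```

Because json.dump serialises the list as-is, the file contains a JSON array of objects and loads back into the same list.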
Hope this helps :)
Thank you for the help.