kedro ipython fails with
DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.
if dataset referenced as pandas.CSVDataSet.
I follow the tutorial until the "Setting up the data"-"Reference all datasets" step. I reference the two datasets as pandas.CSVDataSet and check that the datasets are correctly referenced by running context.catalog.load("companies").head() in a kedro ipython session.
`companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv
reviews:
  type: pandas.CSVDataSet
  filepath: data/01_raw/reviews.csv`
kedro ipython session should start and context.catalog.load("companies").head() should display the first rows of the dataset.
When I run kedro ipython, I get:
DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.
If the dataset is referenced as CSVLocalDataSet, the kedro ipython session starts correctly and context.catalog.load("companies").head() displays the first rows of the dataset. However, if I then run from kedro.extras.datasets.pandas import CSVDataSet, it fails with:
ModuleNotFoundError: No module named 'kedro.extras'
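The two errors above are consistent with the `type` string being resolved to a class by import lookup. The following is only a rough sketch of that idea, not Kedro's actual implementation: the helper name `resolve_class` and the prefix list are assumptions made for illustration. When none of the candidate modules can be imported (for example, because `kedro.extras` does not exist in the installed version), the lookup returns nothing, which would surface as "Class not found".

```python
import importlib

# Assumed search prefixes, for illustration only; Kedro's real list may differ.
ASSUMED_PREFIXES = ("kedro.io.", "kedro.extras.datasets.", "")


def resolve_class(type_name, prefixes=ASSUMED_PREFIXES):
    """Return the first class found for `type_name` under any prefix, else None."""
    for prefix in prefixes:
        module_path, _, class_name = (prefix + type_name).rpartition(".")
        if not module_path:
            continue  # a bare class name with no module cannot be imported
        try:
            module = importlib.import_module(module_path)
        except ImportError:
            continue  # e.g. the package is absent in this environment
        if hasattr(module, class_name):
            return getattr(module, class_name)
    return None
```

In an environment where `kedro.extras.datasets.pandas` is missing, `resolve_class("pandas.CSVDataSet")` would return `None` under these assumptions.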
Include as many relevant details about the environment in which you experienced the bug:
Kedro version (pip show kedro): 0.15.5
Python version (python -V): Python 3.7.6
Hello @Mshindi777! 👋
Thank you for raising this issue. I’ve explained why this is happening in this PR comment here: https://github.com/quantumblacklabs/kedro/pull/222#issuecomment-586580697 and intend to fix it first thing Monday morning.
To view the correct documentation, use the version switcher at the bottom right of the left-hand sidebar in the documentation to change the documentation version from latest to stable.
Hope that helps and sorry for the confusion re: docs!
Thank you, I'm closing it.
This should be fixed now! Thank you for raising it. :)
@ZainPatelQB
I just did a fresh install of Kedro and am still getting this error. The stable docs (and 0.15.6/.7) say to use pandas.CSVDataSet. Upon using that in the catalog.yml and attempting a kedro run, I get:
Class `pandas.CSVDataSet` not found.
Hi @pmbaumgartner,
We’re aware of this and it’s due to the way we handle our dependencies. As a stopgap for now, pip install "kedro[all]" should get you up and running.
@ZainPatelQB Great, will do! Is this listed in the docs somewhere and I missed it? I was migrating from an old version of Kedro and reviewing the data catalog, so it's possible I could have missed this earlier in the docs.
Thanks for your help and awesome work with Kedro.
@pmbaumgartner Nope, it's a bug on our end. :( We're trying to release a patch today or tomorrow to get it fixed and we're implementing steps to make sure it doesn't slip through again. :)
Thank you for the report and being engaged, it's super helpful!
@pmbaumgartner got the same error but fixed it by replacing pandas.CSVDataSet with CSVLocalDataSet, as mentioned by @ZainPatelQB in the PR comment above
Hi @saccodd,
Using CSVLocalDataSet will work, but it’s currently deprecated and will be removed in 0.16 onwards. The solution here is to run pip install "kedro[all]" and continue using the (recommended) pandas.CSVDataSet.
We’ll be releasing a patch tomorrow or soon to make it work out of the box. Thanks for raising this. :)
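To confirm whether the stopgap took effect, one option is to check that the module named in the failing import earlier in this thread can now be imported. This is a minimal sketch; the helper name `can_import` is hypothetical and not part of Kedro.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` imports cleanly in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        return False
    return True
```

Calling `can_import("kedro.extras.datasets.pandas")` in the same environment that runs kedro ipython shows whether pandas.CSVDataSet has a chance of being found.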
perfect, thanks!
I'm testing kedro 0.15.5, and pip install "kedro[all]" didn't solve my problem (it seems this version does not provide the extra 'all').
Ah, this applies to 0.15.6 and 0.15.7. On 0.15.5, we only have CSVLocalDataSet, not pandas.CSVDataSet.
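The version split described in this thread can be summarised in a small helper. This is a sketch only: the function name `csv_dataset_type` is hypothetical, and it assumes a plain numeric version string such as the one reported by pip show kedro.

```python
def csv_dataset_type(kedro_version):
    """Pick the catalog `type` string for a CSV dataset given a Kedro version."""
    # Assumes a purely numeric version like "0.15.5"; suffixes such as "rc1"
    # are not handled in this sketch.
    parts = tuple(int(p) for p in kedro_version.split(".")[:3])
    # Per the thread: pandas.CSVDataSet applies to 0.15.6 and 0.15.7,
    # while 0.15.5 only has CSVLocalDataSet.
    return "pandas.CSVDataSet" if parts >= (0, 15, 6) else "CSVLocalDataSet"
```

For the environment in the original report, `csv_dataset_type("0.15.5")` gives `CSVLocalDataSet`, which matches the workaround that started the session correctly.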