Kedro: Kedro Tutorial pandas.CSVDataSet not found and kedro.extras ModuleNotFoundError

Created on 16 Feb 2020  ·  11 Comments  ·  Source: quantumblacklabs/kedro

Description

kedro ipython fails with

DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.

when a dataset is referenced as pandas.CSVDataSet.

Context

I followed the tutorial up to the "Reference all datasets" step under "Setting up the data". I referenced the two datasets as pandas.CSVDataSet and tried to check that they were correctly referenced by running context.catalog.load("companies").head() in a kedro ipython session.

Steps to Reproduce

  1. Create a new project and install dependencies as shown in the Kedro Spaceflights tutorial.
  2. Download the data files and reference the files as shown in the Setting up the data part:

`companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataSet
  filepath: data/01_raw/reviews.csv`

  3. Run kedro ipython

Expected Result

kedro ipython session should start and context.catalog.load("companies").head() should display the first rows of the dataset.

Actual Result

When I run kedro ipython I get:

DataSetError: An exception occurred when parsing config for DataSet `companies`:
Class `pandas.CSVDataSet` not found.

If the dataset is referenced as CSVLocalDataSet, the kedro ipython session starts correctly and context.catalog.load("companies").head() displays the first rows of the dataset. However, if I then run from kedro.extras.datasets.pandas import CSVDataSet, it fails with:

ModuleNotFoundError: No module named 'kedro.extras'
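
A quick way to tell whether the installed Kedro ships the kedro.extras package at all is to probe for it before touching the catalog. The snippet below is only a sketch: the helper name module_available is made up for illustration, and it relies on nothing but the standard library's importlib.util.find_spec.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    Hypothetical helper for illustration. `find_spec` imports the parent
    packages of a dotted name, but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when a parent package such as
        # `kedro` or `kedro.extras` does not exist in this environment.
        return False


if __name__ == "__main__":
    # Module names taken from the error messages in this report.
    for name in ("kedro", "kedro.extras", "kedro.extras.datasets.pandas"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

If kedro is reported as found but kedro.extras as missing, the installed release most likely predates the pandas.CSVDataSet naming used in the "latest" docs.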

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (using pip show kedro): 0.15.5
  • Python version used (python -V): Python 3.7.6
  • Operating system and version: macOS High Sierra 10.13.6

All 11 comments

Hello @Mshindi777! 👋

Thank you for raising this issue. I’ve explained why this is happening in this PR comment here: https://github.com/quantumblacklabs/kedro/pull/222#issuecomment-586580697 and intend to fix it first thing Monday morning.

To view the correct documentation, at the bottom right of the sidebar on the left in the documentation, you should be able to switch the documentation version from latest to stable.

Hope that helps and sorry for the confusion re: docs!

Thank you, I'm closing it.

This should be fixed now! Thank you for raising it. :)

@ZainPatelQB

I just did a fresh install of Kedro and am still getting this error. The stable docs (and 0.15.6/.7) say to use pandas.CSVDataSet. Upon using that in the catalog.yml and attempting a kedro run, I get:

Class `pandas.CSVDataSet` not found.

Hi @pmbaumgartner,

We’re aware of this and it’s due to the way we handle our dependencies. As a stopgap for now, pip install "kedro[all]" should get you up and running.

@ZainPatelQB Great, will do! Is this listed in the docs somewhere and I missed it? I was migrating from an old version of Kedro and reviewing the data catalog, so it's possible I could have missed this earlier in the docs.

Thanks for your help and awesome work with Kedro.

@pmbaumgartner Nope, it's a bug on our end. :( We're trying to release a patch today or tomorrow to get it fixed and we're implementing steps to make sure it doesn't slip through again. :)

Thank you for the report and being engaged, it's super helpful!

@pmbaumgartner got the same error but fixed it replacing pandas.CSVDataSet with CSVLocalDataSet, as mentioned by @ZainPatelQB in the PR comment above

Hi @saccodd,

Using CSVLocalDataSet will work, but it’s currently deprecated and will be removed in 0.16 onwards. The solution here is to run pip install "kedro[all]" and continue using the (recommended) pandas.CSVDataSet.

We’ll be releasing a patch tomorrow or soon to make it work out of the box. Thanks for raising this. :)

Perfect, thanks!

I'm testing Kedro 0.15.5, and pip install "kedro[all]" didn't solve my problem (it seems this version does not provide the extra 'all').

Ah, this applies to 0.15.6 and 0.15.7. On 0.15.5, we only have CSVLocalDataSet and not pandas.CSVDataSet.
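
To summarise the version split described in this thread, here is a rough sketch of picking the catalog type string from the installed Kedro version. The function name recommended_csv_type is hypothetical, it assumes a plain numeric version string such as the one printed by pip show kedro, and the cut-off at 0.15.6 is taken only from the comments above.

```python
def recommended_csv_type(kedro_version: str) -> str:
    """Return the catalog `type` for a local CSV file, per this thread.

    Hypothetical helper: assumes a purely numeric "X.Y.Z" version string.
    """
    # Compare versions as integer tuples so that "0.15.10" sorts after "0.15.6".
    parts = tuple(int(p) for p in kedro_version.split(".")[:3])
    if parts < (0, 15, 6):
        # Per the thread, 0.15.5 only ships CSVLocalDataSet.
        return "CSVLocalDataSet"
    # 0.15.6 and 0.15.7 accept pandas.CSVDataSet (with the "kedro[all]"
    # stopgap); CSVLocalDataSet is deprecated and due for removal in 0.16.
    return "pandas.CSVDataSet"


if __name__ == "__main__":
    for version in ("0.15.5", "0.15.6", "0.15.7"):
        print(version, "->", recommended_csv_type(version))
```

On 0.15.6 and 0.15.7 the pandas.CSVDataSet type still needs the extra dependencies installed, as noted earlier in the thread.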
