Cannot read a CSV in chunks with the Kedro data catalog.
df = pd.read_csv(csv, chunksize=1000)
df.get_chunk()
How has this bug affected you? What were you trying to accomplish?
train_dataset:
  type: pandas.CSVDataSet
  filepath: 'mycsv.csv'
  load_args:
    chunksize: 50000
df = catalog.load("train_dataset")
df.get_chunk()
ValueError: I/O operation on closed file.
df
I should be able to loop over the reader.
ValueError: I/O operation on closed file.
```yaml
train_dataset:
  type: pandas.CSVDataSet
  filepath: 'mycsv.csv'
  load_args:
    chunksize: 50000
```
It's been a while since I have used chunksize. If I remember correctly, it returns a generator.
chunks = catalog.load("train_dataset")
for chunk in chunks:
    # chunk is a DataFrame; do what you need with it
    process(chunk)
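For reference, here is a minimal sketch of what chunked reading looks like with plain pandas, outside of Kedro. The in-memory CSV and the names `csv_text` and `chunk_sizes` are made up for illustration. With `chunksize` set, `pd.read_csv` returns an iterable reader object rather than a single DataFrame.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for 'mycsv.csv'.
csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# With chunksize set, read_csv returns a reader object, not a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Each iteration yields a DataFrame with at most `chunksize` rows.
chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)
```

The five data rows come back as chunks of two, two, and one row.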
@WaylonWalker Thanks for jumping in. I have read your blog about Kedro before; it helps me understand some concepts better.
When I iterate over it, it throws an error saying the file is already closed.
I was able to replicate this. I set up a pipeline with a CSV and a catalog entry just as you did. I run into the same error if I try to `kedro run` or `catalog.load` it. I am not able to replicate the issue when loading with pandas alone, even if I use fsspec like the pandas.CSVDataSet does. Someone with a deeper understanding of the internals may need to take a look.
I posted my replica of the issue here https://github.com/WaylonWalker/kedro_chunked.
> I have read your blog about Kedro before; it helps me understand some concepts better.
That is awesome!!! and potentially motivating to keep making more content.
@WaylonWalker I did the same thing to check whether fsspec is the problem; it seems not to be.
catalog.load() first calls fsspec, then it also calls the transformer. I suspect the transformer tries to read that generator and closes it.
But I haven't dug into the transformer yet; it would be great if someone with more knowledge could jump in.
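To illustrate the general kind of failure being discussed, here is a standard-library-only sketch of how a lazy reader can raise this exact error when the underlying file is closed before the reader is consumed. This is an analogy, not Kedro's code: the helper `lazy_rows` and the temporary file are hypothetical, and whether `catalog.load()` does something equivalent is not confirmed in this thread.

```python
import csv
import os
import tempfile


def lazy_rows(path):
    """Return a lazy csv reader whose file is closed before it is consumed."""
    with open(path, newline="") as handle:
        # The reader has not read anything yet; leaving the `with`
        # block on return closes `handle` underneath it.
        return csv.reader(handle)


# Hypothetical throwaway CSV file, used only for this illustration.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    path = tmp.name

rows = lazy_rows(path)
try:
    next(rows)  # first actual read happens here, after the file is closed
    error_message = None
except ValueError as exc:
    error_message = str(exc)
finally:
    os.remove(path)

print(error_message)
```

If something similar happens between opening the file and handing the chunked reader back to the caller, iterating the reader would fail in the same way, which would match the symptom reported above.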
I'm facing the same issue. Does anyone have updates on this problem?