Pandas: Cannot unpickle data frame made with 0.19.2 after upgrade to 0.20.1

Created on 24 May 2017 · 14 Comments · Source: pandas-dev/pandas

Hello,

Problem description

When we create a data frame with pandas ≤ 0.19.2 and pickle it (using pickle.dump), it is not possible to unpickle it using pandas 0.20.1.

# Using pandas 0.19.2
import pandas as pd
import pickle as pkl
data = pd.DataFrame({'x': [1, 2]})
pkl.dump(data, open("data_pd_0.19.2.pkl", "wb"))
# After upgrade to pandas 0.20.1
import pandas as pd
import pickle as pkl
pkl.load(open("data_pd_0.19.2.pkl", "rb"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas.indexes'

First analysis

  • It seems that pandas.indexes has been refactored to pandas.core.indexes.
  • I don't know if there are other such incompatible changes

Proposal

It would be great to have:

  • A deprecation warning when unpickling an old data frame
  • Support for loading old data frames, with automatic conversion to the new format, so that we can upgrade by re-pickling the unpickled data frames

Thanks a lot for your help,
Best regards.

IO Data Usage Question

All 14 comments

The big red box makes it clear that pd.read_pickle is the pickle reader and keeps things backward compatible. Further, the whatsnew notes have quite a large section on what changed here.

Sure, a direct call to pickle.loads will work, but this is not guaranteed across versions.

Going from pandas 0.18.1 to 0.20.1, I encountered the same problem when loading with joblib. joblib.load fails with exactly the same error:
ImportError: No module named 'pandas.indexes'

When you fix this (see the first workaround), there is an error
AttributeError: module 'pandas.core.base' has no attribute 'FrozenNDArray'

After workaround 2, files load. It seems that in my case this is more of a question for joblib devs.

Two (ugly) workarounds:

import sys
import pandas.core.base
import pandas.core.indexes
import pandas.core.indexes.frozen

# 1: alias the old module path to its new location
sys.modules['pandas.indexes'] = pandas.core.indexes
# 2: re-expose FrozenNDArray where the old pickles expect to find it
pandas.core.base.FrozenNDArray = pandas.core.indexes.frozen.FrozenNDArray
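
For plain pickle files, a less invasive variant of workaround 1 is to remap module names only while unpickling, by subclassing pickle.Unpickler and overriding find_class, instead of patching sys.modules globally. The sketch below is only an illustration under stated assumptions: the names MODULE_RENAMES, rename_module, RenamingUnpickler and load_with_renames are hypothetical, the table lists only the one rename reported in this thread, and moved attributes such as FrozenNDArray are not handled. joblib archives go through joblib's own loader, so this would not apply there as written.

```python
import io
import pickle

# Old module prefix -> new module prefix. Only the rename reported in this
# thread is listed (assumption: extend the table if other renames show up).
MODULE_RENAMES = {
    "pandas.indexes": "pandas.core.indexes",
}


def rename_module(module, renames=MODULE_RENAMES):
    """Return the new module path for an old one, or the input unchanged."""
    for old, new in renames.items():
        # Match the package itself or any of its submodules, nothing else.
        if module == old or module.startswith(old + "."):
            return new + module[len(old):]
    return module


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites module paths before importing them."""

    def find_class(self, module, name):
        return super().find_class(rename_module(module), name)


def load_with_renames(data):
    """Unpickle a bytes object, applying MODULE_RENAMES on the way."""
    return RenamingUnpickler(io.BytesIO(data)).load()
```

Because the remapping lives inside the unpickler, nothing is left behind in sys.modules once loading finishes.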

see the above and simply use pd.read_pickle

I would if I could. But... I have a complex class (consisting of numpy objects, pandas series and dataframes, dictionaries ...), stored in a compressed joblib archive, so pd.read_pickle is of no use to me. As I said, this might be useful for joblib developers as for now it is impossible to load any joblib archive created when pandas < 0.20. I first had to downgrade pandas and now I'm using the above workarounds.

@matjazk Would you like to open an issue at joblib for this?

Already did, and passed on @jreback's suggestion.

Thanks. pd.read_pickle works but, for your information, it is extremely slow - see benchmark.
I've made a script that calls pd.read_pickle and then pd.to_pickle on every file.
[Image: benchmark]
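
A conversion script of the kind described could look like the sketch below. It is not the script behind the benchmark: the function name resave_pickles, the "*.pkl" pattern and the load/dump parameters are assumptions made for illustration. Reading and writing are timed separately so the two costs cannot be mixed up. By default it uses pd.read_pickle and pd.to_pickle, imported lazily.

```python
import time
from pathlib import Path


def resave_pickles(folder, load=None, dump=None, pattern="*.pkl"):
    """Re-write every matching pickle in place; return per-file timings.

    load(path) -> object and dump(obj, path) default to pandas'
    read_pickle / to_pickle (imported lazily so they can be swapped out).
    """
    if load is None or dump is None:
        import pandas as pd
        load = load or pd.read_pickle
        dump = dump or pd.to_pickle

    timings = {}
    for path in sorted(Path(folder).glob(pattern)):
        start = time.perf_counter()
        obj = load(str(path))      # reads the old-format file
        read_s = time.perf_counter() - start

        start = time.perf_counter()
        dump(obj, str(path))       # overwrites it in the current format
        write_s = time.perf_counter() - start

        timings[path.name] = (read_s, write_s)
    return timings
```

Since the files are overwritten in place, it is safer to run this on a copy of the data first.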

@mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?

I need to double check, but I'm sure about the performance difference while reading: I got performance issues and I converted the files to fix them.

On 8 June 2017 at 5:50 PM, "Joris Van den Bossche" notifications@github.com wrote:

@mhooreman https://github.com/mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?

@mhooreman Of course it's slower. It's falling back to the Python-based unpickler, which is much more flexible. So you can have either speed or correctness; you get to choose.

I got the same problem when unpickling the data in pandas 0.20.2. I have used df.to_pickle() to pickle my dataframe in pandas 0.19.2 but failed to unpickle it using pandas.read_pickle() in pandas 0.20.2. I got the error message

ImportError: No module named 'pandas.indexes'

pandas.read_pickle() and pickle.load() both generate this error message.

@TheodoreZhao If you have this error with read_pickle as well, please open a new issue with a reproducible example.

@jreback similar to @matjazk, pd.read_pickle doesn't work if you're using pickle.loads to load a string (retrieved from some store other than the filesystem). Can pd.read_pickle be updated to handle a file-like object rather than just a path?

It's an open issue: https://github.com/pandas-dev/pandas/issues/5924

If you want to submit a PR to do this, it's not difficult.
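
Where a reader only accepts a path, one way to route in-memory bytes through it is to spill them to a temporary file first. A minimal sketch, assuming the payload is a plain (uncompressed) pickle byte string; read_pickle_bytes and its reader parameter are hypothetical names, and the default reader is pd.read_pickle, imported lazily.

```python
import os
import tempfile


def read_pickle_bytes(data, reader=None):
    """Write `data` to a temporary file and load it with a path-based reader."""
    if reader is None:
        import pandas as pd
        reader = pd.read_pickle

    fd, path = tempfile.mkstemp(suffix=".pkl")
    try:
        # Close the file before reading so the reader can reopen it by path.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return reader(path)
    finally:
        os.remove(path)
```

This costs one extra disk round trip per object, so it is a stopgap and not a substitute for file-like support in the reader itself.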
