Pandas: Cannot unpickle data frame made with 0.19.2 after upgrade to 0.20.1

Created on 24 May 2017 · 14 Comments · Source: pandas-dev/pandas

Hello,

Problem description

When we create a data frame with pandas ≤ 0.19.2 and pickle it (using pickle.dump), it is not possible to unpickle it using pandas 0.20.1.

# Using pandas 0.19.2
import pandas as pd
import pickle as pkl
data = pd.DataFrame({'x': [1, 2]})
pkl.dump(data, open("data_pd_0.19.2.pkl", "wb"))

# After upgrade to pandas 0.20.1
import pandas as pd
import pickle as pkl
pkl.load(open("data_pd_0.19.2.pkl", "rb"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas.indexes'

First analysis

It seems that pandas.indexes has been refactored to pandas.core.indexes.
I don't know if there are other such incompatible changes
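
One way to check which module paths a given pickle depends on is to scan its opcode stream without importing anything. The following is a minimal sketch using the standard-library pickletools module. The function name referenced_globals is hypothetical, and the handling of STACK_GLOBAL is a heuristic that assumes the module and attribute names are the two values pushed immediately before that opcode.

```python
import pickletools

# Opcodes that push a text string onto the pickle stack.
_STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}
# Opcodes that read from / write to the pickle memo.
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def referenced_globals(data):
    """Return the set of (module, name) pairs a pickle refers to.

    Hypothetical helper: it only reads opcodes and never imports or
    executes anything from the pickle.
    """
    found = set()
    memo = {}    # memo index -> string value (None for non-strings)
    recent = []  # values pushed since the last unrelated opcode
    last = None  # most recently pushed value, if it was a string
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name == "GLOBAL":
            # Protocols < 4 store "module name" inline in one opcode.
            module, attr = arg.split(" ", 1)
            found.add((module, attr))
            recent, last = [], None
        elif name in _STRING_OPS:
            recent.append(arg)
            last = arg
        elif name in _GET_OPS:
            last = memo.get(arg)
            recent.append(last)
        elif name == "MEMOIZE":
            memo[len(memo)] = last
        elif name in _PUT_OPS:
            memo[arg] = last
        elif name == "FRAME":
            pass  # framing does not change the stack
        else:
            # Protocol 4+ takes module and name from the stack.
            if name == "STACK_GLOBAL" and len(recent) >= 2:
                module, attr = recent[-2], recent[-1]
                if module is not None and attr is not None:
                    found.add((module, attr))
            recent, last = [], None
    return found
```

Run against the bytes of an old pickle file, any pair whose module starts with pandas.indexes would point at the moved package.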

Proposal

It would be great to have:

A deprecation warning when unpickling an old data frame
Support for loading old data frames, automatically converted to the new format, so that we can upgrade by re-pickling the unpickled data frames

Thanks a lot for your help,
Best regards.

IO Data Usage Question

mhooreman

All 14 comments

The big red box is clear that pd.read_pickle is the pickle reader and is what makes things backward compatible. Furthermore, the whatsnew notes have quite a large section on what changed here.

Sure, a direct call to pickle.loads will work, but this is not guaranteed across versions.

jreback on 24 May 2017

👍3

Going from pandas 0.18.1 to 0.20.1, I encountered the same problem when loading with joblib. joblib.load fails with exactly the same error:
ImportError: No module named 'pandas.indexes'

When you fix this (see the first workaround), there is an error
AttributeError: module 'pandas.core.base' has no attribute 'FrozenNDArray'

After workaround 2, files load. It seems that in my case this is more of a question for joblib devs.

Two (ugly) workarounds:

import sys
# 1: alias the old module path to its new location
import pandas.core.indexes
sys.modules['pandas.indexes'] = pandas.core.indexes
# 2: re-expose FrozenNDArray where the old pickles expect to find it
import pandas.core.base, pandas.core.indexes.frozen
setattr(sys.modules['pandas.core.base'], 'FrozenNDArray', pandas.core.indexes.frozen.FrozenNDArray)

matjazk on 1 Jun 2017

👍6

See the above and simply use pd.read_pickle.

jreback on 1 Jun 2017

👍2

I would if I could. But... I have a complex class (consisting of numpy objects, pandas series and dataframes, dictionaries ...), stored in a compressed joblib archive, so pd.read_pickle is of no use to me. As I said, this might be useful for joblib developers as for now it is impossible to load any joblib archive created when pandas < 0.20. I first had to downgrade pandas and now I'm using the above workarounds.
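
Where the loader cannot be swapped out (as with a compressed joblib archive), the sys.modules alias from the first workaround can be scoped so that it is removed once loading finishes. This is a minimal sketch using only the standard library. module_alias is a hypothetical helper name, and the commented usage lines assume joblib is installed and that archive_path (also hypothetical) points to an existing archive.

```python
import contextlib
import importlib
import sys

_MISSING = object()  # sentinel: the old name was not registered before


@contextlib.contextmanager
def module_alias(old_name, new_name):
    """Temporarily make `old_name` resolve to the module `new_name`."""
    target = importlib.import_module(new_name)
    previous = sys.modules.get(old_name, _MISSING)
    sys.modules[old_name] = target
    try:
        yield target
    finally:
        # Restore whatever was registered under the old name before.
        if previous is _MISSING:
            sys.modules.pop(old_name, None)
        else:
            sys.modules[old_name] = previous


# Usage (not run here):
# with module_alias("pandas.indexes", "pandas.core.indexes"):
#     obj = joblib.load(archive_path)
```

Scoping the alias keeps the patch from leaking into unrelated code, though it does not cover moved classes such as FrozenNDArray, which still need the second workaround.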

matjazk on 1 Jun 2017

@matjazk Would you like to open an issue at joblib for this?

jorisvandenbossche on 1 Jun 2017

Already did, and passed along @jreback's suggestion.

matjazk on 1 Jun 2017

Thanks. pd.read_pickle works, but, for your information, it is extremely slow - see benchmark.
I've made a script to pd.read_pickle and then pd.to_pickle every file.
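
A conversion script along these lines might look like the sketch below. It is not the script referred to above: the function name repickle_directory, its parameters, and the choice to write into a separate output directory (so the originals stay untouched) are all assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def repickle_directory(src_dir, dst_dir, pattern="*.pkl"):
    """Read every pickle in src_dir and re-save it with the installed pandas.

    Hypothetical helper; returns the list of files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(pattern)):
        obj = pd.read_pickle(str(path))  # backward-compatible reader
        target = dst / path.name
        pd.to_pickle(obj, str(target))   # rewrite in the current format
        written.append(target)
    return written
```

Since the slow path is only hit when reading old files, a one-off conversion like this pays the cost once.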

mhooreman on 8 Jun 2017

@mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?

jorisvandenbossche on 8 Jun 2017

I need to double-check, but I'm sure about the performance difference while reading: I ran into performance issues and converted the files to fix them.

mhooreman on 8 Jun 2017

@mhooreman Of course it's slower. It's falling back to the Python-based unpickler, which is much more flexible. So you can have either speed or correctness. You get to choose.

jreback on 8 Jun 2017

I got the same problem when unpickling data in pandas 0.20.2. I used df.to_pickle() to pickle my dataframe in pandas 0.19.2, but failed to unpickle it using pandas.read_pickle() in pandas 0.20.2. I got the error message

ImportError: No module named 'pandas.indexes'

pandas.read_pickle() and pickle.load() both generate this error message.

TheodoreZhao on 2 Jul 2017

👍1

@TheodoreZhao If you have this error with read_pickle as well, please open a new issue with a reproducible example.

jorisvandenbossche on 22 Aug 2017

@jreback similar to @matjazk, pd.read_pickle doesn't work if you're using pickle.loads to load a string (retrieved from some store other than the filesystem). Can pd.read_pickle be updated to handle a file-like object rather than just a path?
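
For pickled bytes fetched from a store other than the filesystem, one option is to wrap them in io.BytesIO and load them with an unpickler that rewrites old module paths on the fly. The sketch below uses only the standard library. RenamingUnpickler, loads_with_renames and the renames mapping are hypothetical names, not pandas API, and the pandas.indexes entry in the usage comment is taken from the error reported in this thread. Classes that moved between modules (such as FrozenNDArray) would need their own handling.

```python
import io
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that maps old module paths to new ones (hypothetical helper)."""

    def __init__(self, file, renames):
        super().__init__(file)
        self._renames = dict(renames)  # old module prefix -> new prefix

    def find_class(self, module, name):
        # Rewrite the module path if it is, or lives under, an old prefix.
        for old, new in self._renames.items():
            if module == old or module.startswith(old + "."):
                module = new + module[len(old):]
                break
        return super().find_class(module, name)


def loads_with_renames(data, renames):
    """Like pickle.loads, but applying the module renames while loading."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()


# Usage (not run here; `blob` stands for bytes read from the store):
# obj = loads_with_renames(blob, {"pandas.indexes": "pandas.core.indexes"})
```

Because the unpickler takes any binary file-like object, the same class works for open files, sockets or in-memory buffers.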