Hello,
When we create a data frame with pandas ≤ 0.19.2 and pickle it (using pickle.dump), it is not possible to unpickle it using pandas 0.20.1.
# Using pandas 0.19.2
import pandas as pd
import pickle as pkl
data = pd.DataFrame({'x': [1, 2]})
with open("data_pd_0.19.2.pkl", "wb") as fh:
    pkl.dump(data, fh)
# After upgrading to pandas 0.20.1
import pandas as pd
import pickle as pkl
with open("data_pd_0.19.2.pkl", "rb") as fh:
    pkl.load(fh)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas.indexes'
pandas.indexes has been refactored to pandas.core.indexes. It would be great to have:
Thanks a lot for your help,
Best regards.
Going from pandas 0.18.1 to 0.20.1, I ran into the same problem when loading with joblib. joblib.load fails with exactly the same error:
ImportError: No module named 'pandas.indexes'
Once this is fixed (see the first workaround), a second error appears:
AttributeError: module 'pandas.core.base' has no attribute 'FrozenNDArray'
After applying workaround 2, the files load. In my case this seems to be more of a question for the joblib devs.
Two (ugly) workarounds:
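import sys

# Run both before calling joblib.load / pickle.load.
# 1: make the old module path "pandas.indexes" resolve to "pandas.core.indexes"
import pandas.core.indexes
sys.modules['pandas.indexes'] = pandas.core.indexes

# 2: expose FrozenNDArray under pandas.core.base, where the older pickles look for it
import pandas.core.base
import pandas.core.indexes.frozen
pandas.core.base.FrozenNDArray = pandas.core.indexes.frozen.FrozenNDArray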
See the above, and simply use pd.read_pickle.
I would if I could. But I have a complex class (consisting of numpy objects, pandas Series and DataFrames, dictionaries, ...) stored in a compressed joblib archive, so pd.read_pickle is of no use to me. As I said, this might be useful for the joblib developers, since for now it is impossible to load any joblib archive created with pandas < 0.20. I first had to downgrade pandas, and now I'm using the workarounds above.
@matjazk Would you like to open an issue at joblib for this?
Already did, and passed on @jreback's suggestion.
Thanks. pd.read_pickle works, but, for your information, it is extremely slow - see benchmark.
I've written a script that loads every file with pd.read_pickle and saves it again with pd.to_pickle.
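A minimal sketch of such a one-off conversion script, assuming the pickles are uncompressed *.pkl files in a single directory and that the converted copies are written to a separate output directory so the originals are kept. The function name repickle_directory is hypothetical; paths are passed as strings so the calls do not depend on pathlib support in older pandas versions.

```python
from pathlib import Path

import pandas as pd


def repickle_directory(src_dir, dst_dir, pattern="*.pkl"):
    """Load each pickle in src_dir with pd.read_pickle and re-save it to dst_dir.

    Hypothetical helper. pd.read_pickle can fall back to pandas' compatibility
    unpickler for files written by older pandas; pd.to_pickle then writes the
    object with the currently installed pandas, so later loads should no longer
    need that fallback. Returns the names of the converted files.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob(pattern)):
        obj = pd.read_pickle(str(path))              # may take the slower compat path
        pd.to_pickle(obj, str(dst / path.name))      # written by the installed pandas
        converted.append(path.name)
    return converted
```

Keeping the output separate from the input means a failed or partial run never leaves a half-written file in place of an original.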
@mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?
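One way to be sure only the read is measured is to time the load on its own, after the file already exists, and keep the best of several runs. A minimal sketch; best_read_time is a hypothetical name, and the comparison between an old file and its re-saved copy is left to the caller.

```python
import time

import pandas as pd


def best_read_time(path, repeat=5):
    """Best wall-clock time, in seconds, of pd.read_pickle(path) over `repeat` runs.

    Hypothetical helper. Only the load sits inside the timed region, so the
    cost of writing the file cannot leak into the "reading" figure.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        pd.read_pickle(path)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling it once on an old file (e.g. data_pd_0.19.2.pkl) and once on its re-saved copy gives the two reading numbers to compare.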
I need to double-check, but I'm sure about the performance difference when reading: I had performance issues, and I converted the files to fix them.
On 8 June 2017 at 5:50 PM, "Joris Van den Bossche" <notifications@github.com> wrote:
@mhooreman the timings of "reading old" look suspiciously consistent with "writing". Are you sure you timed the correct thing?
@mhooreman Of course it's slower: it's falling back to the Python-based unpickler, which is much more flexible. So you can have either speed or correctness; you get to choose.
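The flexibility of a custom unpickler comes from controlling how class references in the pickle are resolved. The sketch below is not pandas' actual compatibility code and makes no claim about its speed; it only illustrates the idea with the standard library: a subclass of pickle.Unpickler whose find_class rewrites renamed module paths before the normal lookup. RenamingUnpickler and loads_with_renames are hypothetical names, and the demo remaps a made-up module (old_collections) so it runs without any old pandas files.

```python
import collections
import io
import pickle


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects class lookups from renamed modules.

    Hypothetical helper for illustration; not part of pandas or joblib.
    """

    def __init__(self, file, renames):
        super().__init__(file)
        # Old module prefix -> new module prefix,
        # e.g. {"pandas.indexes": "pandas.core.indexes"} (assumed mapping).
        self.renames = dict(renames)

    def find_class(self, module, name):
        # Rewrite the module path, then fall back to the normal import + getattr.
        for old, new in self.renames.items():
            if module == old or module.startswith(old + "."):
                module = new + module[len(old):]
                break
        return super().find_class(module, name)


def loads_with_renames(data, renames):
    """Unpickle `data` (bytes), applying the module renames in `renames`."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()


# Simulate an "old" pickle: protocol 0 stores the class path as plain text,
# so the module name can be swapped for one that no longer exists.
ORIGINAL = collections.OrderedDict([("x", 1), ("y", 2)])
OLD_BYTES = pickle.dumps(ORIGINAL, protocol=0, fix_imports=False).replace(
    b"ccollections\n", b"cold_collections\n"
)

try:
    pickle.loads(OLD_BYTES)
    PLAIN_ERROR = None
except ModuleNotFoundError as exc:
    # Same kind of failure as "No module named 'pandas.indexes'".
    PLAIN_ERROR = exc

RESTORED = loads_with_renames(OLD_BYTES, {"old_collections": "collections"})
print(type(PLAIN_ERROR).__name__, RESTORED == ORIGINAL)
```

A module-prefix mapping only covers renames like pandas.indexes → pandas.core.indexes; a class that moved between modules, such as FrozenNDArray in workaround 2, would need a (module, name) mapping instead.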
I got the same problem when unpickling data in pandas 0.20.2. I used df.to_pickle() to pickle my DataFrame in pandas 0.19.2, but failed to unpickle it with pandas.read_pickle() in pandas 0.20.2. I got the error message
ImportError: No module named 'pandas.indexes'
Both pandas.read_pickle() and pickle.load() produce this error.
@TheodoreZhao If you have this error with read_pickle as well, please open a new issue with a reproducible example.
@jreback Similar to @matjazk's case, pd.read_pickle doesn't help if you're using pickle.loads to load a string (retrieved from some store other than the filesystem). Could pd.read_pickle be updated to handle a file-like object rather than just a path?
It's an open issue: https://github.com/pandas-dev/pandas/issues/5924
If you want to submit a PR for this, it's not difficult.
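Where read_pickle only takes a path, one workaround for bytes fetched from a non-filesystem store is to spill them to a temporary file and pass that path to pd.read_pickle, so its compatibility fallback still applies. A minimal sketch; read_pickle_bytes is a hypothetical name, and whether a given pandas version already accepts a buffer directly should be checked against its read_pickle documentation.

```python
import os
import tempfile

import pandas as pd


def read_pickle_bytes(blob):
    """Load a pickled pandas object held in memory as bytes.

    Hypothetical helper: the bytes are written to a temporary file so that
    pd.read_pickle, called here with a path, can apply its compatibility
    fallback for pickles written by older pandas versions.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "blob.pkl")
        with open(path, "wb") as fh:
            fh.write(blob)
        # The file is closed before read_pickle opens it, which also keeps
        # this working on Windows; the directory is removed afterwards.
        return pd.read_pickle(path)
```

Using a temporary directory rather than a named temporary file avoids reopening a file that is still held open, which fails on some platforms.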