Pickling a RandomForestClassifier pulled from an sklearn Pipeline appears to result in a ModuleNotFoundError
when the pickle is loaded in another notebook. The module named in the error, sklearn.ensemble._forest, does exist but cannot be found.
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
import pickle
import numpy as np
pipeline = make_pipeline(
    # Other steps in pipeline as well
    RandomForestClassifier(),
)
# Create some fake data
X_train = np.array([[2,8,5],[4,7,2],[1,9,4]])
y_train = np.array([26, 29, 18])
# Train the model
pipeline.fit(X_train, y_train)
# Pickle the model
model = pipeline.named_steps['randomforestclassifier']
outfile = open("model.pkl", "wb")
pickle.dump(model, outfile)
outfile.close()
In another notebook:
from sklearn.ensemble import RandomForestClassifier
import pickle
# Attempt to load the pickled model in another file / notebook:
infile = open("model.pkl", "rb")
model = pickle.load(infile)
infile.close()
# It's lonely over here
model
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=None, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=100,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-20-c8d1783e8b58> in <module>
5 # Attempt to load the pickled model in another file / notebook:
6 infile = open("model.pkl", "rb")
----> 7 model = pickle.load(infile)
8 infile.close()
ModuleNotFoundError: No module named 'sklearn.ensemble._forest'
System:
python: 3.7.4 (default, Oct 9 2019, 16:55:50) [GCC 7.4.0]
executable: /usr/local/bin/python3.7
machine: Linux-4.15.0-88-generic-x86_64-with-debian-buster-sid
Python deps:
pip: 19.3
setuptools: 40.8.0
sklearn: 0.21.3
numpy: 1.17.2
scipy: 1.3.1
Cython: None
pandas: 0.25.1
EDIT: Fix description
I don't appear to be able to import from sklearn.ensemble._forest. Is this intended?
Could you tell us which version of scikit-learn you used to pickle the pipeline and which version you are using to unpickle it? I assume that you are trying to unpickle in 0.21.3 while pickling in 0.22.1.
You can then just update. However, be aware that we don't consider this a bug, because we don't support pickling/unpickling across different scikit-learn versions.
That was exactly the issue. I didn't realize that I had started the new notebook in a different kernel. Thank you very much for the prompt response.
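For anyone who lands here with the same error: one way to catch a version mismatch early is to record the library version next to the pickled model and compare it before unpickling. The sketch below is only an illustration, not something scikit-learn provides. The helper names dump_with_version and load_with_version are hypothetical, and the model is stored as nested bytes so that the version can be read even when the model itself could not be unpickled.

```python
import pickle


def dump_with_version(obj, path, version):
    """Pickle obj together with the library version that produced it.

    Hypothetical helper: the object is serialized to bytes first, so the
    outer record contains only a str and a bytes value.
    """
    record = {"version": version, "payload": pickle.dumps(obj)}
    with open(path, "wb") as f:
        pickle.dump(record, f)


def load_with_version(path, current_version):
    """Load an object saved by dump_with_version.

    Raises RuntimeError before touching the payload if the recorded
    version differs from current_version.
    """
    with open(path, "rb") as f:
        record = pickle.load(f)
    if record["version"] != current_version:
        raise RuntimeError(
            "pickled with version %s, but running version %s"
            % (record["version"], current_version)
        )
    return pickle.loads(record["payload"])
```

In the notebooks above, this would be called as dump_with_version(model, "model.pkl", sklearn.__version__) when saving and load_with_version("model.pkl", sklearn.__version__) when loading, after import sklearn in each kernel. With the mismatch from this thread, the load would stop with a message naming both versions instead of a ModuleNotFoundError.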