When performing a grid search on a pipeline that has None for a transformer step, an AttributeError is raised. The snippet below previously ran successfully with scikit-learn==0.23.2 but no longer works with 0.24.dev0.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
iris = load_iris()
X, y = iris.data, iris.target
pipe = Pipeline([("setup", None), ("svc", SVC(kernel="linear", random_state=0))])
param_grid = [
    {"svc__C": [0.1, 0.1]},
    {"setup": [StandardScaler()]},
]
gs = GridSearchCV(pipe, param_grid=param_grid, return_train_score=True, cv=3)
gs.fit(X, y)
Expected result: the GridSearchCV.fit call completes successfully.
Actual result: the following error is raised (the full traceback is included further down):
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 863, in _is_pairwise
pairwise_tag = estimator._get_tags().get('pairwise', False)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 348, in _get_tags
more_tags = base_class._more_tags(self)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/pipeline.py", line 626, in _more_tags
estimator_tags = self.steps[0][1]._get_tags()
AttributeError: 'NoneType' object has no attribute '_get_tags'
It appears that the _is_pairwise check doesn't work as expected when applied to a pipeline with None for a transformer step.
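To illustrate the kind of guard that would avoid this failure, here is a minimal sketch in plain Python. It is not scikit-learn's actual fix: safe_get_tags, pipeline_is_pairwise, DEFAULT_TAGS, and FakeKernelStep are hypothetical names made up for this example, and the only assumption carried over from the traceback is that the pairwise check reads the tags of the first step.

```python
# Hypothetical sketch, not scikit-learn code: fall back to default tags
# when a step is None or has no _get_tags method.
DEFAULT_TAGS = {"pairwise": False}


def safe_get_tags(step):
    """Return the step's tags, or the defaults if it cannot report any."""
    get_tags = getattr(step, "_get_tags", None)
    if callable(get_tags):
        # Merge so that tags reported by the step override the defaults.
        return {**DEFAULT_TAGS, **get_tags()}
    return dict(DEFAULT_TAGS)


def pipeline_is_pairwise(steps):
    """Mimic the pairwise check on the first (name, step) pair."""
    first_step = steps[0][1]
    return safe_get_tags(first_step)["pairwise"]


class FakeKernelStep:
    """Stand-in for a step that declares itself pairwise."""

    def _get_tags(self):
        return {"pairwise": True}
```

With this guard, a pipeline whose first step is None is simply treated as not pairwise instead of raising, while a step that does report tags is still honoured.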
Full traceback:
Traceback (most recent call last):
File "test-pipeline.py", line 18, in <module>
gs.fit(X, y)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/validation.py", line 60, in inner_f
return f(*args, **kwargs)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 841, in fit
self._run_search(evaluate_candidates)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 1288, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 795, in evaluate_candidates
out = parallel(delayed(_fit_and_score)(clone(base_estimator),
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 1048, in __call__
if self.dispatch_one_batch(iterator):
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
self._dispatch(tasks)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 784, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
result = ImmediateResult(func)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 572, in __init__
self.results = batch()
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 585, in _fit_and_score
X_train, y_train = _safe_split(estimator, X, y, train)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 198, in _safe_split
if _is_pairwise(estimator):
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 863, in _is_pairwise
pairwise_tag = estimator._get_tags().get('pairwise', False)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/base.py", line 348, in _get_tags
more_tags = base_class._more_tags(self)
File "/Users/james/miniforge3/envs/dask-ml/lib/python3.8/site-packages/sklearn/pipeline.py", line 626, in _more_tags
estimator_tags = self.steps[0][1]._get_tags()
AttributeError: 'NoneType' object has no attribute '_get_tags'
System:
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56) [Clang 10.0.1 ]
executable: /Users/james/miniforge3/envs/dask-ml/bin/python3.8
machine: macOS-10.15.5-x86_64-i386-64bit
Python dependencies:
pip: 20.2.4
setuptools: 49.6.0.post20201009
sklearn: 0.24.dev0
numpy: 1.19.4
scipy: 1.5.3
Cython: None
pandas: 1.1.4
matplotlib: None
joblib: 0.17.0
threadpoolctl: 2.1.0
Built with OpenMP: True
Thanks for the report @jrbourbeau, we can reproduce. We're investigating the best solution in the different issues linked above, if you're interested.
I'll mark it as a blocker because the error will not just appear when using None, but when using any step that doesn't have a _get_tags attribute (likely because it doesn't inherit from BaseEstimator).
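A small plain-Python sketch of why this goes beyond None: the failing line from the traceback calls _get_tags() on the first step unconditionally, so any object without that method fails the same way. DuckTypedScaler, first_step_tags, and fails_with_attribute_error are hypothetical names used only for this illustration, not part of scikit-learn.

```python
class DuckTypedScaler:
    """Hypothetical transformer with fit/transform but no BaseEstimator
    parent, so it has no _get_tags method."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def first_step_tags(steps):
    # Mirrors the failing line in the traceback:
    #     estimator_tags = self.steps[0][1]._get_tags()
    return steps[0][1]._get_tags()


def fails_with_attribute_error(steps):
    """Return True if the unguarded tag lookup raises AttributeError."""
    try:
        first_step_tags(steps)
    except AttributeError:
        return True
    return False
```

Both [("setup", None)] and [("setup", DuckTypedScaler())] make the unguarded lookup raise AttributeError, which is why the problem is not specific to None.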
Fixed by #18797. Thanks for the timely bug report @jrbourbeau.