Python package version: 2.3.1
LightGBM version or commit hash: 2.3.1
When I print an estimator after passing an aliased parameter at instantiation, I get a KeyError. Looking into it, I found that this happens for most parameters that have aliases:
Python 3.7.7 (default, Mar 26 2020, 15:48:22)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lightgbm import LGBMClassifier
>>> from lightgbm.basic import _ConfigAliases
>>>
>>> error_key_values = []
>>>
>>> for k, values in _ConfigAliases.aliases.items():
...     for v in list(values):
...         clf = LGBMClassifier(**{k: v})
...         try:
...             print(clf)
...         except KeyError:
...             print(f'{k}\t{v}')
...
bin_construct_sample_cnt subsample_for_bin
bin_construct_sample_cnt bin_construct_sample_cnt
boosting boost
boosting boosting_type
boosting boosting
categorical_feature categorical_feature
categorical_feature cat_column
categorical_feature categorical_column
categorical_feature cat_feature
data_random_seed data_seed
data_random_seed data_random_seed
early_stopping_round early_stopping
early_stopping_round n_iter_no_change
early_stopping_round early_stopping_rounds
early_stopping_round early_stopping_round
enable_bundle bundle
enable_bundle is_enable_bundle
enable_bundle enable_bundle
eval_at eval_at
eval_at ndcg_eval_at
eval_at map_eval_at
eval_at ndcg_at
eval_at map_at
group_column group_id
group_column group
group_column query_column
group_column query
group_column group_column
group_column query_id
header has_header
header header
ignore_column ignore_column
ignore_column blacklist
ignore_column ignore_feature
is_enable_sparse enable_sparse
is_enable_sparse is_enable_sparse
is_enable_sparse is_sparse
is_enable_sparse sparse
label_column label
label_column label_column
machines machines
machines workers
machines nodes
metric metric
metric metrics
metric metric_types
num_class num_classes
num_class num_class
num_iterations num_boost_round
num_iterations num_round
num_iterations num_rounds
num_iterations n_iter
num_iterations num_iterations
num_iterations n_estimators
num_iterations num_iteration
num_iterations num_trees
num_iterations num_tree
LGBMClassifier(objective='objective')
LGBMClassifier(objective='application')
LGBMClassifier(objective='app')
LGBMClassifier(objective='objective_type')
pre_partition pre_partition
pre_partition is_pre_partition
two_round use_two_round_loading
two_round two_round
two_round two_round_loading
verbosity verbose
verbosity verbosity
weight_column weight_column
weight_column weight
The details of the error (with verbose output enabled) are as follows.
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj, include, exclude)
968
969 if method is not None:
--> 970 return method(include=include, exclude=exclude)
971 return None
972 else:
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in _repr_mimebundle_(self, **kwargs)
461 def _repr_mimebundle_(self, **kwargs):
462 """Mime bundle used by jupyter kernels to display estimator"""
--> 463 output = {"text/plain": repr(self)}
464 if get_config()["display"] == 'diagram':
465 output["text/html"] = estimator_html_repr(self)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in __repr__(self, N_CHAR_MAX)
277 n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW)
278
--> 279 repr_ = pp.pformat(self)
280
281 # Use bruteforce ellipsis when there are a lot of non-blank characters
/opt/conda/lib/python3.7/pprint.py in pformat(self, object)
142 def pformat(self, object):
143 sio = _StringIO()
--> 144 self._format(object, sio, 0, 0, {}, 0)
145 return sio.getvalue()
146
/opt/conda/lib/python3.7/pprint.py in _format(self, object, stream, indent, allowance, context, level)
159 self._readable = False
160 return
--> 161 rep = self._repr(object, context, level)
162 max_width = self._width - indent - allowance
163 if len(rep) > max_width:
/opt/conda/lib/python3.7/pprint.py in _repr(self, object, context, level)
391 def _repr(self, object, context, level):
392 repr, readable, recursive = self.format(object, context.copy(),
--> 393 self._depth, level)
394 if not readable:
395 self._readable = False
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in format(self, object, context, maxlevels, level)
168 def format(self, object, context, maxlevels, level):
169 return _safe_repr(object, context, maxlevels, level,
--> 170 changed_only=self._changed_only)
171
172 def _pprint_estimator(self, object, stream, indent, allowance, context,
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _safe_repr(object, context, maxlevels, level, changed_only)
412 recursive = False
413 if changed_only:
--> 414 params = _changed_params(object)
415 else:
416 params = object.get_params(deep=False)
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _changed_params(estimator)
96 init_params = {name: param.default for name, param in init_params.items()}
97 for k, v in params.items():
---> 98 if (repr(v) != repr(init_params[k]) and
99 not (is_scalar_nan(init_params[k]) and is_scalar_nan(v))):
100 filtered_params[k] = v
KeyError: 'verbosity'
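The traceback points at the mechanism: scikit-learn 0.23.0's `_changed_params` compares every key returned by `get_params()` against the defaults in the estimator's `__init__` signature, while LightGBM's sklearn wrappers accept arbitrary `**kwargs` (the aliases) and echo them back from `get_params()`, so those keys have no matching default. A minimal sketch of that interaction (the `ToyEstimator` and `changed_params` below are illustrative simplifications, not the real library code):

```python
import inspect


class ToyEstimator:
    """Toy estimator mimicking LightGBM's sklearn wrapper: unknown
    keyword arguments are accepted and later surface in get_params()."""

    def __init__(self, boosting_type='gbdt', **kwargs):
        self.boosting_type = boosting_type
        self._other_params = dict(kwargs)

    def get_params(self, deep=False):
        params = {'boosting_type': self.boosting_type}
        params.update(self._other_params)  # alias keys leak in here
        return params


def changed_params(estimator):
    """Simplified stand-in for sklearn 0.23.0's
    sklearn.utils._pprint._changed_params."""
    sig = inspect.signature(type(estimator).__init__)
    init_params = {
        name: p.default
        for name, p in sig.parameters.items()
        if name != 'self' and p.kind is not inspect.Parameter.VAR_KEYWORD
    }
    filtered = {}
    for k, v in estimator.get_params(deep=False).items():
        # An alias key like 'verbosity' is absent from init_params -> KeyError
        if repr(v) != repr(init_params[k]):
            filtered[k] = v
    return filtered


try:
    changed_params(ToyEstimator(verbosity=1))
except KeyError as e:
    print('KeyError:', e)  # -> KeyError: 'verbosity'
```

Since `repr()` of an estimator goes through this comparison in 0.23.0, any estimator whose `get_params()` returns keys not named in `__init__` fails to print.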
Other functionality, such as training and prediction, works just fine.
This is probably due to the scikit-learn version: the version that produced the error above was 0.23.0, and downgrading to 0.22.1 makes it work fine.
I'm getting the same KeyError (raised in _changed_params in sklearn/utils/_pprint.py), but for a different LightGBM hyperparameter, num_iterations, when printing a model object (an LGBM regressor or classifier, freshly trained or loaded from a pickle). Printing the same binary object worked many times before the latest scikit-learn upgrade.
We train many LGBM model versions each day (thank you for their stability!) and preserve both the model objects and the output of pip freeze, so I can pinpoint fairly confidently the dependency versions where the bug was introduced.
It now looks like the compatibility issue was introduced in scikit-learn==0.23.0 (up to version 0.22.2.post1 it was not present):
First sklearn version with the error:
(modeling pipeline run from 2020-05-18 19:28:06)
scikit-learn==0.23.0
Last scikit-learn version without the error:
(modeling pipeline run from 2020-05-14 18:55:47)
scikit-learn==0.22.2.post1
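For reference, the affected range reported above (exactly the 0.23.0 release, fixed in 0.23.1) can be encoded in a small, purely illustrative version check:

```python
def is_affected(version: str) -> bool:
    """Return True for the scikit-learn release reported broken in this
    thread: exactly 0.23.0 (0.22.2.post1 and 0.23.1 are fine)."""
    # Strip any post-release suffix like '0.22.2.post1' -> '0.22.2'.
    core = version.split('.post')[0]
    major, minor, micro = (int(x) for x in core.split('.')[:3])
    return (major, minor, micro) == (0, 23, 0)


print(is_affected('0.23.0'))        # True
print(is_affected('0.22.2.post1'))  # False
print(is_affected('0.23.1'))        # False
```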
Detailed error message - classifier:
1353 # compute single model evaluation metrics:
1354 # - on the test set
-> 1355 print("A single deployment LightGBM model (%s) had the following metrics on the test set:" % model)
1356 if objective == "regression":
1357 (mse, rmse, mae, r2) = get_eval_metrics(actual=yTest, pred=y_pred_test_set, objective=objective)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in __repr__(self, N_CHAR_MAX)
277 n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW)
278
--> 279 repr_ = pp.pformat(self)
280
281 # Use bruteforce ellipsis when there are a lot of non-blank characters
/opt/conda/lib/python3.7/pprint.py in pformat(self, object)
142 def pformat(self, object):
143 sio = _StringIO()
--> 144 self._format(object, sio, 0, 0, {}, 0)
145 return sio.getvalue()
146
/opt/conda/lib/python3.7/pprint.py in _format(self, object, stream, indent, allowance, context, level)
159 self._readable = False
160 return
--> 161 rep = self._repr(object, context, level)
162 max_width = self._width - indent - allowance
163 if len(rep) > max_width:
/opt/conda/lib/python3.7/pprint.py in _repr(self, object, context, level)
391 def _repr(self, object, context, level):
392 repr, readable, recursive = self.format(object, context.copy(),
--> 393 self._depth, level)
394 if not readable:
395 self._readable = False
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in format(self, object, context, maxlevels, level)
168 def format(self, object, context, maxlevels, level):
169 return _safe_repr(object, context, maxlevels, level,
--> 170 changed_only=self._changed_only)
171
172 def _pprint_estimator(self, object, stream, indent, allowance, context,
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _safe_repr(object, context, maxlevels, level, changed_only)
412 recursive = False
413 if changed_only:
--> 414 params = _changed_params(object)
415 else:
416 params = object.get_params(deep=False)
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _changed_params(estimator)
96 init_params = {name: param.default for name, param in init_params.items()}
97 for k, v in params.items():
---> 98 if (repr(v) != repr(init_params[k]) and
99 not (is_scalar_nan(init_params[k]) and is_scalar_nan(v))):
100 filtered_params[k] = v
KeyError: 'num_iterations'
Detailed error message - regressor:
KeyError Traceback (most recent call last)
<ipython-input-296-12ed8596d301> in <module>
2
3 saved_premium_model = joblib.load(saved_prem_model_path+saved_prem_model_file, mmap_mode=None)
----> 4 print(saved_premium_model)
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in __repr__(self, N_CHAR_MAX)
277 n_max_elements_to_show=N_MAX_ELEMENTS_TO_SHOW)
278
--> 279 repr_ = pp.pformat(self)
280
281 # Use bruteforce ellipsis when there are a lot of non-blank characters
/opt/conda/lib/python3.7/pprint.py in pformat(self, object)
142 def pformat(self, object):
143 sio = _StringIO()
--> 144 self._format(object, sio, 0, 0, {}, 0)
145 return sio.getvalue()
146
/opt/conda/lib/python3.7/pprint.py in _format(self, object, stream, indent, allowance, context, level)
159 self._readable = False
160 return
--> 161 rep = self._repr(object, context, level)
162 max_width = self._width - indent - allowance
163 if len(rep) > max_width:
/opt/conda/lib/python3.7/pprint.py in _repr(self, object, context, level)
391 def _repr(self, object, context, level):
392 repr, readable, recursive = self.format(object, context.copy(),
--> 393 self._depth, level)
394 if not readable:
395 self._readable = False
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in format(self, object, context, maxlevels, level)
168 def format(self, object, context, maxlevels, level):
169 return _safe_repr(object, context, maxlevels, level,
--> 170 changed_only=self._changed_only)
171
172 def _pprint_estimator(self, object, stream, indent, allowance, context,
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _safe_repr(object, context, maxlevels, level, changed_only)
412 recursive = False
413 if changed_only:
--> 414 params = _changed_params(object)
415 else:
416 params = object.get_params(deep=False)
/opt/conda/lib/python3.7/site-packages/sklearn/utils/_pprint.py in _changed_params(estimator)
96 init_params = {name: param.default for name, param in init_params.items()}
97 for k, v in params.items():
---> 98 if (repr(v) != repr(init_params[k]) and
99 not (is_scalar_nan(init_params[k]) and is_scalar_nan(v))):
100 filtered_params[k] = v
KeyError: 'num_iterations'
I can now confirm that the problem was indeed introduced in scikit-learn==0.23.0, and that downgrading the package to scikit-learn==0.22.2.post1 eliminates it.
@nyk510, I think this should be an issue for the scikit-learn devs too - will you report it there, or shall I?
@mirekphd
That's certainly true.
I've created an issue here :D It would be great if you could add any missing parts to it.
This will be fixed in 0.23.1 which should be out soon
In fact 0.23.1 is already out, so this issue can probably be closed (it's a scikit-learn issue anyway).
@NicolasHug Thank you for your polite comment! It might be better to note in the requirements that this scikit-learn version may not work; there may be others who run into the same trouble as me.
This will be fixed in 0.23.1 which should be out soon
@nyk510, the issue negatively impacting lightgbm was indeed fixed in scikit-learn==0.23.1, so the issue here can probably be closed.
On the other hand, the fixed scikit-learn 0.23.1 does not get installed by default by pip in my containers; it has to be pinned, or else the buggy version 0.23.0 gets installed (probably because other packages that depend on scikit-learn have not been upgraded yet).
So I suppose it will take time for the buggy version to be rooted out of all dependencies, and it cannot simply be taken down from PyPI (it will probably need to stay there for a long time).
the fixed version of scikit-learn 0.23.1 does not get installed by default by pip
This is surprising, what are you using? On a clean env, pip install scikit-learn will install 0.23.1. If you already have a scikit-learn version installed you'd need to use the --upgrade flag.
Right, but I thought that the need for a forced upgrade does not apply to Docker builds... apparently Docker's layer cache replicates the situation of a previous package version already being "installed" (with all ">=" dependency requirements already satisfied).
When I look into the Jenkins log for a build triggered by my unpinning of scikit-learn (from the previous working version 0.22.2.post1), I see that the package installation layers for both pip and conda were retrieved from the Docker cache, restoring the buggy version 0.23.0:
```
Step 30/50 : RUN pip install -r /tmp/python-packages/pypi-packages.txt --no-cache-dir
 ---> Using cache
 ---> 0975530a5deb
Step 31/50 : RUN conda install --channel conda-forge --yes --file /tmp/python-packages/conda-forge-packages.txt
 ---> Using cache
 ---> 59b63c286bb4
```
But wait, there's more :) The next automatic build of this Python container 15 minutes later (this time a time-triggered one intended to check for available package and security upgrades) did correctly identify that a newer version of scikit-learn was available, but since only the build had changed, the script ignored the change and left the buggy version 0.23.0 in place. So I suppose we had the well-known scenario of "it's not a bug, it's a feature" :)
scikit-learn :
- version installed: 0.23.0
- latest available: 0.23.1
Note: ignoring build differences
- package upgradeable: False
The caching bug is pretty persistent: you cannot safely unpin scikit-learn on my build server now, because it would revert to the buggy version (I tried first pinning 0.23.1 and then unpinning, but it reverted to the cached layer with 0.23.0). Considering this, I decided to stop ignoring build differences during version checks in my build pipeline, as these seemingly less-than-minor changes are sometimes essential bug fixes resolving issues that break ML modeling pipelines for several people.
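For what it's worth, one common way to avoid this kind of stale-layer problem is to pin the exact fixed version in the requirements file (so the layer content changes and the cache is invalidated), or to introduce a cache-busting build argument before the install step. A hypothetical Dockerfile fragment (the requirements path is taken from the build log above; adjust to your layout):

```dockerfile
# Changing CACHE_BUST invalidates this layer and every layer after it,
# e.g.:  docker build --build-arg CACHE_BUST=$(date +%s) ...
ARG CACHE_BUST=1
RUN pip install -r /tmp/python-packages/pypi-packages.txt --no-cache-dir
```

Pinning `scikit-learn==0.23.1` in the requirements file achieves the same invalidation without any build-argument machinery, since the file's content (and thus the layer's cache key) changes.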
https://github.com/scikit-learn/scikit-learn/issues/17206