Optuna: LightGBMTunerCV seems to not handle user specified CV-folds

Created on 24 Aug 2020 · 3 Comments · Source: optuna/optuna

Expected behavior

When users specify training and validation folds in the way that the basic lightgbm.cv function accepts, LightGBMTunerCV should (from what I understand) work.

Environment

Optuna version: 2.0.0
Python version: 3.6.9
OS: Google Colab/Linux
(Optional) Other libraries and their versions: LightGBM 2.3.1

Error messages, stack traces, or logs

0%|          | 0/7 [00:00<?, ?it/s]
feature_fraction, val_score: inf:   0%|          | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:41:09,973] Trial 0 failed because of the following error: ValueError('For early stopping, at least one dataset and eval metric is required for evaluation',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
    result = func(trial)
  File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 302, in __call__
    cv_results = lgb.cv(self.lgbm_params, self.train_set, **self.lgbm_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/lightgbm/engine.py", line 576, in cv
    evaluation_result_list=res))
  File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 221, in _callback
    _init(env)
  File "/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py", line 191, in _init
    raise ValueError('For early stopping, '
ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-0ec8edbe946a> in <module>()
      2                      label = np.array( data['target'] ).flatten())
      3 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
----> 4 tuner.run()

10 frames
/usr/local/lib/python3.6/dist-packages/lightgbm/callback.py in _init(env)
    189             return
    190         if not env.evaluation_result_list:
--> 191             raise ValueError('For early stopping, '
    192                              'at least one dataset and eval metric is required for evaluation')
    193 

ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

As well as (on the second version without early stopping, which I think is an issue that's already reported in another issue?):

0%|          | 0/7 [00:00<?, ?it/s]


feature_fraction, val_score: inf:   0%|          | 0/7 [00:00<?, ?it/s][W 2020-08-24 15:42:03,262] Trial 0 failed because of the following error: KeyError('l1-mean',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/optuna/study.py", line 709, in _run_trial
    result = func(trial)
  File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 304, in __call__
    val_scores = self._get_cv_scores(cv_results)
  File "/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py", line 294, in _get_cv_scores
    val_scores = cv_results["{}-mean".format(metric)]
KeyError: 'l1-mean'
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-21-942a30076787> in <module>()
      1 tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
----> 2 tuner.run()

8 frames
/usr/local/lib/python3.6/dist-packages/optuna/integration/_lightgbm_tuner/optimize.py in _get_cv_scores(self, cv_results)
    292 
    293         metric = self._get_metric_for_objective()
--> 294         val_scores = cv_results["{}-mean".format(metric)]
    295         return val_scores
    296 

KeyError: 'l1-mean'

Steps to reproduce

Open a Google Colab notebook, then run the code below (packages beyond the Colab defaults are installed explicitly via !pip).

Reproducible examples (optional)

!pip install lightgbm==2.3.1
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupKFold

import lightgbm as lgb
lgb.__version__

# Simulate 100 rows with three features and a fold label (5 groups of 20 rows each)
np.random.seed(123)
data = pd.DataFrame({'var1': np.random.normal(loc=0, scale=1, size=100),
                     'var2': np.random.normal(loc=0, scale=1, size=100),
                     'var3': np.random.normal(loc=0, scale=1, size=100),
                     'testfold': np.random.choice(a=np.repeat([x for x in range(5)], 20), size=100, replace=False)})
data['target'] = 7 + 0.1*data['var1'] + 1.0*data['var2'] + 5.0*data['var3'] - 2.0*data['var1']*data['var2'] + np.random.normal(loc=0, scale=0.5, size=100)
data.head()

params = {
    'objective': 'l1',
    'metric': 'l1',
    'verbosity': -1,
    'boosting_type': 'gbdt',
    'seed': 1979,
}

dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())

# User-specified folds: split() returns a generator of (train_idx, test_idx) pairs, grouped by 'testfold'
folds = GroupKFold().split(np.array(data[['var1', 'var2', 'var3']]),
                           np.array(data['target']).flatten(),
                           np.array(data['testfold']).flatten())
lgb.cv(params, dtrain, folds=folds, verbose_eval=100)  # This is how base lightgbm does this, and it works fine


!pip install optuna
import optuna.integration.lightgbm as lgb

dtrain = lgb.Dataset(data=np.array(data[['var1', 'var2', 'var3']]),
                     label=np.array(data['target']).flatten())

# First version (with early stopping): fails with the ValueError shown above
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, early_stopping_rounds=100, folds=folds)
tuner.run()

# Second version (without early stopping): fails with the KeyError shown above
tuner = lgb.LightGBMTunerCV(params, dtrain, verbose_eval=100, folds=folds)
tuner.run()

Additional context (optional)

Same issue in Kaggle kernels, but I thought it would be easier to share a simplified Colab example.
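One thing that may be worth ruling out (an assumption, not something confirmed in this thread): `GroupKFold().split(...)` returns a generator, and a generator can only be iterated once. In the reproducible example the same `folds` object is passed first to `lgb.cv` and then to both tuners, so a later consumer might receive no folds at all. The sketch below illustrates only that general Python behavior. `group_folds` and `count_folds` are hypothetical stand-ins written for this illustration, not LightGBM, scikit-learn, or Optuna functions.

```python
def group_folds(groups):
    """Hypothetical stand-in for GroupKFold().split(...).

    Lazily yields (train_idx, test_idx) pairs, one per distinct group.
    """
    for g in sorted(set(groups)):
        test_idx = [i for i, x in enumerate(groups) if x == g]
        train_idx = [i for i, x in enumerate(groups) if x != g]
        yield train_idx, test_idx


def count_folds(folds):
    """Hypothetical consumer that iterates over the folds once, as a CV routine would."""
    return sum(1 for _ in folds)


groups = [0, 0, 1, 1, 2, 2]

# A generator is exhausted after the first full pass.
lazy_folds = group_folds(groups)
first_pass = count_folds(lazy_folds)
second_pass = count_folds(lazy_folds)

# A list materialized once can be iterated any number of times.
fold_list = list(group_folds(groups))
list_first = count_folds(fold_list)
list_second = count_folds(fold_list)

print(first_pass, second_pass, list_first, list_second)
```

If this turns out to be relevant here, wrapping the split in `list(...)` before passing it as `folds` would be one way to check, since a list of index pairs can be reused across repeated cross-validation calls.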

Labels: bug, stale

Source

bjoernholzhauer

All 3 comments

Thank you for your bug report. I'm not aware of the first issue. I'll investigate it.

> As well as (on the second version without early stopping, which I think is an issue that's already reported in another issue?):

I think it is the same issue as #1602. The cause is the lack of the metric mapping in LightGBMTunerCV and @thigm85 is working on it.

toshihikoyanase on 25 Aug 2020
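To illustrate what a "metric mapping" could look like, here is a minimal sketch under stated assumptions. It is not Optuna's implementation and not the fix tracked in #1602. The alias table is a small subset assumed from the LightGBM parameter documentation, and `cv_mean_key` and `get_cv_scores` are hypothetical helpers. The idea is to normalize a metric alias to the canonical name used in the `lgb.cv` result keys before building the `"<metric>-mean"` lookup, and to list the available keys when the lookup fails.

```python
# Assumed subset of LightGBM metric aliases mapped to canonical names.
# Illustration only; the real list should be checked against the LightGBM docs.
METRIC_ALIASES = {
    "mean_absolute_error": "l1",
    "mae": "l1",
    "regression_l1": "l1",
    "mean_squared_error": "l2",
    "mse": "l2",
    "regression_l2": "l2",
    "regression": "l2",
    "root_mean_squared_error": "rmse",
    "l2_root": "rmse",
}


def cv_mean_key(metric):
    """Build the '<metric>-mean' key after normalizing a known alias."""
    canonical = METRIC_ALIASES.get(metric, metric)
    return "{}-mean".format(canonical)


def get_cv_scores(cv_results, metric):
    """Look up the mean validation scores, with a descriptive error on a miss."""
    key = cv_mean_key(metric)
    if key not in cv_results:
        raise KeyError(
            "{!r} not found in CV results; available keys: {}".format(
                key, sorted(cv_results)
            )
        )
    return cv_results[key]


# Example with a hand-written results dict shaped like the output of lgb.cv.
example_results = {"l1-mean": [1.5, 1.2], "l1-stdv": [0.1, 0.1]}
scores = get_cv_scores(example_results, "mae")
```

With an empty results dict the helper raises a `KeyError` that reports no available keys, which would help distinguish a missing alias from a cross-validation run that produced no evaluation results at all.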

Yes, you are right, #1602 was indeed what I had seen before (but could not find again).

bjoernholzhauer on 25 Aug 2020

This issue has not seen any recent activity.

github-actions[bot] on 9 Sep 2020
