In scikit-learn https://github.com/scikit-learn/scikit-learn/issues/14303 we're considering enabling early stopping by default.
We're curious: why did you choose not to enable it by default in LightGBM, considering that early stopping is almost always useful in practice?
Thanks!
Hi @NicolasHug , interesting question! I'm the newest of the LightGBM team members, so I don't have the historical context for that decision. I think it would be best to have @guolinke or @StrikerRUS comment.
One thing I do know is that we require you to explicitly pass in your validation set if you want to take advantage of early stopping, so it's possible that we don't have it enabled by default because we wanted to give users finer control over the dataset they validate on than just saying "x% of rows, randomly held out".
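A minimal sketch of that explicit workflow, assuming a recent LightGBM where early stopping is configured through the `lgb.early_stopping` callback (older releases used an `early_stopping_rounds` argument instead); the data and split below are made up purely for illustration:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
X = np.random.rand(1000, 10)
y = (X[:, 0] + np.random.rand(1000) > 1.0).astype(int)

# The user decides how to split -- LightGBM never partitions the data itself.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

train_set = lgb.Dataset(X_train, label=y_train)
valid_set = lgb.Dataset(X_val, label=y_val, reference=train_set)

booster = lgb.train(
    {"objective": "binary", "metric": "binary_logloss"},
    train_set,
    num_boost_round=500,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=20)],
)
print("best iteration:", booster.best_iteration)
```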
I agree with @jameslamb .
As an ML model tool, we focus on training and prediction.
Enabling ES by default would require data partitioning on our side, and I believe there are many partitioning methods for different tasks; implementing all of them would be duplicated effort.
Another point is consistency with ML domain knowledge and other tools.
The concepts of training set, validation set, and test set are basic knowledge in the ML domain, and most tools distinguish them.
So I don't think silently converting a user-passed training set into a training set + validation set is a good idea.
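To illustrate the "many partitioning methods" point, here is a hedged sketch of two ways a user might carve out their own validation set before handing data to LightGBM; the helper names are illustrative and not part of LightGBM's API:

```python
# Illustrative helpers (not LightGBM API): two of the many possible splits,
# depending on the task.
from sklearn.model_selection import train_test_split

def random_holdout(X, y, frac=0.2, seed=0):
    # For i.i.d. data a random holdout is usually reasonable.
    return train_test_split(X, y, test_size=frac, random_state=seed)

def time_based_holdout(X, y, frac=0.2):
    # For time-ordered data, validate on the most recent rows so no
    # "future" information leaks into the training set.
    cut = int(len(X) * (1 - frac))
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Grouped data (e.g. multiple rows per user) would need yet another strategy,
# such as sklearn's GroupShuffleSplit, which is why the choice of split is
# left to the user rather than built into the library.
```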
Thanks a lot for your answers!