Are you planning support for monotonic constraints? See e.g. here https://github.com/dmlc/xgboost/issues/1514
I'm pasting the snippets for the monotonic constraints here
IF (split is a continuous variable and monotonic)
THEN take average of left and right child nodes if current split is used
IF monotonic increasing THEN CHECK left average <= right average
IF monotonic decreasing THEN CHECK left average >= right average
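The per-split check above can be sketched in C++ as follows (a minimal sketch with illustrative names — `SplitRespectsMonotonicity` and its parameters are not LightGBM's actual API):

```cpp
#include <cassert>

// Per-split monotonicity check from the pseudo code above.
// left_output / right_output are the would-be child leaf values;
// direction: +1 = increasing, -1 = decreasing, 0 = unconstrained.
// (Names are illustrative, not from the LightGBM codebase.)
bool SplitRespectsMonotonicity(double left_output, double right_output,
                               int direction) {
  if (direction > 0) return left_output <= right_output;  // increasing
  if (direction < 0) return left_output >= right_output;  // decreasing
  return true;  // unconstrained feature
}
```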
@alexvorobiev , do you have any papers we could reference for this feature?
@chivee I only have the reference to the R GBM package https://cran.r-project.org/package=gbm
@alexvorobiev , thanks for sharing. I'm trying to understand the idea behind this method.
Note that the given pseudo code only ensures that a single split is in the correct order, not the whole model, since a later split could make the model non-monotonic.
Any thoughts on this?
From a practical perspective (outside the Kaggle world!), this feature would be extremely helpful in many applications where reasonable model behavior matters.
@guolinke Would you be able to advise how to approach this and whether it's feasible? I.e., where should it belong — would it be sufficient to implement it just somewhere in feature_histogram.hpp? I guess FeatureMetainfo could then just contain the -1/0/1 constraint.
Here's the meat of the implementation in XGBoost, for reference: https://github.com/dmlc/xgboost/blob/master/src/tree/param.h#L422 -- all of it pretty much contained in CalcSplitGain(), plus CalcWeight(). Where would stuff like this go in LightGBM?
@aldanor
I don't know the details about the monotonic constraints.
What is the idea? And why it is needed?
The following may be useful:
The split gain calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L291-L297
The leaf-output calculation:
https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L305-L308
@guolinke I may add some links here about the implementation in XGBoost:
https://xgboost.readthedocs.io/en/latest//tutorials/monotonic.html
https://github.com/dmlc/xgboost/issues/1514
https://github.com/dmlc/xgboost/pull/1516
@guolinke Monotonic constraints may be a very important requirement for the resulting models, for many reasons: as noted above, there can be domain knowledge that must be respected, e.g. in insurance and risk-management problems.
How about we all cooperate and make this work?
@aldanor very cool, would be glad to work together on it.
It seems MC (monotonic constraints) are cumulative, that is, if models A and B are both MC, then A+B is MC.
So we only need to enable MC in decision tree learning.
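The additivity argument can be checked numerically with a toy example: if each "tree" is a non-decreasing step function, their sum is too (the step functions below are illustrative stand-ins for monotonic trees, not real LightGBM trees):

```cpp
#include <vector>
#include <cstddef>

// Two non-decreasing step functions, standing in for monotonic trees.
double tree_a(double x) { return x < 0.5 ? -1.0 : 2.0; }
double tree_b(double x) { return x < 0.2 ? 0.0 : 1.0; }

// Their sum, standing in for the boosted ensemble A+B.
double ensemble(double x) { return tree_a(x) + tree_b(x); }

// True if f is non-decreasing over the sorted sample points xs.
bool IsNonDecreasing(const std::vector<double>& xs, double (*f)(double)) {
  for (std::size_t i = 1; i < xs.size(); ++i) {
    if (f(xs[i]) < f(xs[i - 1])) return false;
  }
  return true;
}
```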
Combining @chivee's pseudo code and @AbdealiJK's suggestion, I think the algorithm is:
min_value = node.min_value
max_value = node.max_value
check(min_value <= split.left_output)
check(min_value <= split.right_output)
check(max_value >= split.left_output)
check(max_value >= split.right_output)
mid = (split.left_output + split.right_output) / 2;
if (split.feature is monotonic increasing) {
check(split.left_output <= split.right_output)
node.left_child.set_max_value(mid)
node.right_child.set_min_value(mid)
}
if (split.feature is monotonic decreasing ) {
check(split.left_output >= split.right_output)
node.left_child.set_min_value(mid)
node.right_child.set_max_value(mid)
}
@aldanor would you like to create a PR first? I can help in the PR.
@guolinke I will give it a try, yep. Your suggested algorithm in the snippet above looks fine — that's kind of what XGBoost does (in exact mode though, not histogram; do you think there would be any complications here because of binning?)
Where would this code belong then, treelearner/feature_histogram.hpp? (I still have to read through most of the code).
Edit: what do you mean by check(...) here? E.g., if (!(...)) { return; }?
@aldanor
The check means returning -inf for the gain if the condition isn't met; as a result, that split will never be chosen.
I don't think the MC behaves any differently in the binned algorithm.
We need to update the calculation of gain: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L354-L357 and https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L415-L418 .
We may need to wrap these to a new function, and implement both non-constraint and MC for them.
@aldanor any updates?
@guolinke @chivee
I would also be very interested in seeing this feature implemented in LightGBM. As @aldanor stated above, the pseudo-code suggested earlier is correct and is how XGBoost implements monotonic constraints.
As such this feature should be fairly trivial to implement for someone with an intimate knowledge of the codebase.
< removed due to irrelevance>
@j-mark-hou
there is one bug in your code; refer to @AbdealiJK's comment and my algorithm below.
got it, I'll wait for someone with a better understanding of the codebase to implement this then.
you can try #1314