Are you planning support for monotonic constraints? See e.g. here https://github.com/dmlc/xgboost/issues/1514
I'm pasting the snippets for the monotonic constraints here
IF (split is a continuous variable and monotonic)
THEN take average of left and right child nodes if current split is used
IF monotonic increasing THEN CHECK left average <= right average
IF monotonic decreasing THEN CHECK left average >= right average
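The per-split check above can be sketched in C++ as follows (a minimal sketch with illustrative names — `SplitRespectsMonotonicity` and its parameters are not LightGBM's actual API):

```cpp
#include <cassert>

// Per-split monotonicity check from the pseudo code above.
// left_output / right_output are the would-be child leaf values;
// direction: +1 = increasing, -1 = decreasing, 0 = unconstrained.
// (Names are illustrative, not from the LightGBM codebase.)
bool SplitRespectsMonotonicity(double left_output, double right_output,
                               int direction) {
  if (direction > 0) return left_output <= right_output;  // increasing
  if (direction < 0) return left_output >= right_output;  // decreasing
  return true;  // unconstrained feature
}
```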
@alexvorobiev , do you have any papers we could reference for this feature?
@chivee I only have the reference to the R GBM package https://cran.r-project.org/package=gbm
@alexvorobiev , thanks for sharing. I'm trying to understand the idea behind this method.
Note that the given pseudo code only ensures that a single split is in the correct order, not the whole model, since a later split could make the model non-monotonic.
Any thoughts on this?
From a practical perspective (outside the Kaggle world!), this feature would be extremely helpful in many applications where reasonable model behavior matters.
@guolinke Would you be able to advise how to approach this and whether it's feasible? I.e., where should it belong — would it be sufficient to implement it just somewhere in feature_histogram.hpp? I guess FeatureMetainfo could then just contain the -1/0/1 constraint.
Here's the meat of the implementation in XGBoost, for reference: https://github.com/dmlc/xgboost/blob/master/src/tree/param.h#L422 -- all of it pretty much contained in CalcSplitGain(), plus CalcWeight(). Where would stuff like this go in LightGBM?
@aldanor
I don't know the details about the monotonic constraints.
What is the idea? And why it is needed?
The following may be useful:
The split gain calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L291-L297
The leaf-output calculation:
https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L305-L308
@guolinke I may add some links here about the implementation in XGBoost:
https://xgboost.readthedocs.io/en/latest//tutorials/monotonic.html
https://github.com/dmlc/xgboost/issues/1514
https://github.com/dmlc/xgboost/pull/1516
@guolinke Monotonic constraints may be a very important requirement for the resulting models, for many reasons: as noted above, there can be domain knowledge that must be respected, e.g. in insurance and risk-management problems.
How about we all cooperate and make this work?
@aldanor very cool, would be glad to work together on it.
It seems MC (monotonic constraints) are cumulative, that is, if models A and B are both MC, then A+B is MC.
So we only need to enable MC in decision tree learning.
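The additivity argument can be checked numerically with a toy example: if each "tree" is a non-decreasing step function, their sum is too (the step functions below are illustrative stand-ins for monotonic trees, not real LightGBM trees):

```cpp
#include <vector>
#include <cstddef>

// Two non-decreasing step functions, standing in for monotonic trees.
double tree_a(double x) { return x < 0.5 ? -1.0 : 2.0; }
double tree_b(double x) { return x < 0.2 ? 0.0 : 1.0; }

// Their sum, standing in for the boosted ensemble A+B.
double ensemble(double x) { return tree_a(x) + tree_b(x); }

// True if f is non-decreasing over the sorted sample points xs.
bool IsNonDecreasing(const std::vector<double>& xs, double (*f)(double)) {
  for (std::size_t i = 1; i < xs.size(); ++i) {
    if (f(xs[i]) < f(xs[i - 1])) return false;
  }
  return true;
}
```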
Combining @chivee's pseudo code and @AbdealiJK's suggestion, I think the algorithm is:
min_value = node.min_value
max_value = node.max_value
check(min_value <= split.left_output)
check(min_value <= split.right_output)
check(max_value >= split.left_output)
check(max_value >= split.right_output)
mid = (split.left_output + split.right_output) / 2;
if (split.feature is monotonic increasing) {
check(split.left_output <= split.right_output)
node.left_child.set_max_value(mid)
node.right_child.set_min_value(mid)
}
if (split.feature is monotonic decreasing ) {
check(split.left_output >= split.right_output)
node.left_child.set_min_value(mid)
node.right_child.set_max_value(mid)
}
@aldanor would you like to create a PR first? I can help in the PR.
@guolinke I will give it a try, yep. Your suggested algorithm in the snippet above looks fine — that's kind of what XGBoost does (in exact mode though, not histogram; do you think there would be any complications here because of binning?)
Where would this code belong then, treelearner/feature_histogram.hpp? (I still have to read through most of the code).
Edit: what do you mean by check(...) here? E.g., if (!(...)) { return; }?
@aldanor
The check means returning -inf for the gain if the condition isn't met; as a result, that split will never be chosen.
I don't think the MC behaves any differently in the binned algorithm.
We need to update the calculation of gain: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L354-L357 and https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L415-L418 .
We may need to wrap these to a new function, and implement both non-constraint and MC for them.
@aldanor any updates?
@guolinke @chivee
I would also be very interested in seeing this feature implemented in LightGBM. As @aldanor stated above, the pseudo-code suggested earlier is correct and is how XGBoost implements monotonic constraints.
As such this feature should be fairly trivial to implement for someone with an intimate knowledge of the codebase.
< removed due to irrelevance>
@j-mark-hou
there is one bug in your code; refer to @AbdealiJK's comment and my algorithm below.
got it, I'll wait for someone with a better understanding of the codebase to implement this then.
you can try #1314