I agree this would be a useful addition to ML.NET. We provide a wrapper for LightGBM but this parameter is not exposed. I will file this for our triage team to review and prioritize.
Here are links from LightGBM for adding monotone constraints:
Issue filed here: https://github.com/Microsoft/LightGBM/issues/14
Committed here: https://github.com/Microsoft/LightGBM/pull/1314
The version of LightGBM we are using in ML.NET is 2.2.1.1 -- we need to confirm whether this version contains support for monotone_constraints.
@daholste: Assuming I'm understanding this parameter correctly, this can also help with model stacking. Currently, LightGBM is allowed to map the output of the sub-models in the stack without the constraint that "as the sub-models score increases/decreases, so should the final score". Hence it could map as f(x) = ( x < 3 ? 1.0 : (x < 4 ? 0.0 : 2.0 )). When the sub-models correlate well with the label, we likely would benefit from a monotonically increasing meta-model for stacking.
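The piecewise mapping above can be made concrete. A minimal Python sketch (with hypothetical threshold values taken from the comment) showing why such a mapping is a problem -- it is not monotone, so a higher sub-model score can yield a lower final score:

```python
def f(x):
    # Piecewise mapping an unconstrained meta-model could learn:
    # the final score drops from 1.0 to 0.0 even as the sub-model score rises.
    return 1.0 if x < 3 else (0.0 if x < 4 else 2.0)

# Non-monotone: raising the sub-model score from 2.0 to 3.5 lowers the final score.
print(f(2.0), f(3.5), f(5.0))  # 1.0 0.0 2.0
```

A monotone-increasing constraint on the meta-model would rule out mappings like this one.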
Update LightGBM and expose the arg. Note whether this fixes #1625 as well.
@glebuk: No need to update LightGBM for monotone_constraints.
Our current LightGBM version is from 3 months ago; monotone_constraints was added 9 months ago.
So our current version of LightGBM should work without updating.
Work item should be: _Expose the monotone_constraints parameter of LightGBM_
Hi @justinormont, @glebuk and @petterton,
LightGBM allows setting a monotone constraint for each feature. For example, if you have a column called Features that is made up of 10 features, you can specify the constraint for each feature, in feature order, with a value of 1 for increasing, 0 for no constraint, or -1 for decreasing. So `1,-1,0,0,0,...,0` would apply an increasing constraint to the first feature, a decreasing constraint to the second feature, and no constraint to the remaining features.
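As a sketch of what this looks like on the LightGBM side (not ML.NET API -- just assembling the comma-separated parameter value LightGBM's config accepts):

```python
# Build LightGBM's monotone_constraints value for a 10-feature column,
# in feature order: 1 = increasing, -1 = decreasing, 0 = no constraint.
num_features = 10
constraints = [0] * num_features
constraints[0] = 1    # increasing constraint on the first feature
constraints[1] = -1   # decreasing constraint on the second feature

# LightGBM's config format takes the per-feature list as a comma-separated string.
param = ",".join(str(c) for c in constraints)
print(param)  # 1,-1,0,0,0,0,0,0,0,0
```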
While having per-feature control is probably very powerful, I've started to think about how the user would specify this when they have a large number of features -- even with 20 or 30 features, no one wants to type in an array that long, as that is not only tedious but error prone.
Do we need to control the constraint at a per feature level? Or would applying the same constraint to all features suffice?
I think all or nothing is ok.
I'd like to have control at the per column level, but I'm not sure how to make it user friendly.
My specific use case is a stacked model. The sub-models are rational, therefore I'd like them to be positive-monotone when combined to give the final score; I also want to feed some raw features to the final learner, as this stacking method shows promise. To get the same results, I currently duplicate the raw features as `NegRawFeat = RawFeat * -1`, then `xf=Concat{Features:SubModelScores,NegRawFeat,RawFeat} tr=LogisticRegression{nn=+}`. This accomplishes the goal (for LR, though not for LightGBM), though it loses the slot names, making feature importance difficult.
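The workaround above can be sketched in a few lines of Python (the names NegRawFeat/RawFeat come from the comment, not from any API; the values are made up). Negating the raw features means a learner restricted to non-negative weights (`nn=+`) can still express decreasing relationships through the negated copies:

```python
# Duplicate raw features with flipped sign, then concatenate everything
# into one feature vector, mirroring Concat{Features:SubModelScores,NegRawFeat,RawFeat}.
raw_feat = [0.5, 2.0, -1.3]
neg_raw_feat = [-x for x in raw_feat]       # NegRawFeat = RawFeat * -1
sub_model_scores = [0.7, 0.1]

features = sub_model_scores + neg_raw_feat + raw_feat
print(features)  # [0.7, 0.1, -0.5, -2.0, 1.3, 0.5, 2.0, -1.3]
```

The slot-name loss mentioned above corresponds to `features` being a bare list: after concatenation, nothing records which index came from which original column.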
Two ways that are slightly user friendly:

- Take in new input columns `{ Features, PosFeatures, NegFeatures }`, where `Features` is the normal column as current. Then users can concatenate the columns they want to be positive-monotone into `PosFeatures`. I think inventing new input columns is confusing to users, so this is bad.
- Take in a single `Features` column, as current, then the user specifies the names of the columns (within the `Features` column) that they want as positive/negative. I don't think it's possible to locate the slots within the `Features` column corresponding to the user's specified columns. Perhaps match on slot names?

After talking with Tom, this needs more thought on how it can be made user friendly. The PR that I currently have here (#2330) has the user specifying the constraints based upon indices. This was primarily to handle the way LightGBM works, but ML.NET does not work this way, and using indices after concatenating columns into a `Features` column does not give a clear way to know which indices map to which features.
Tom recommended doing something similar to how Categorical Features works: we manage the mapping of feature names to indices (such as in the metadata). This would allow the user to specify the constraints by name (which would map to indices) rather than by specific indices.
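That name-to-index idea can be sketched as follows (a hypothetical helper, not ML.NET API; the slot names are invented). Given the slot names recorded in metadata after concatenation, per-column constraints by name expand into the per-index array LightGBM needs:

```python
# Slot-name order as it would appear after concatenating columns into Features.
slot_names = ["SubScoreA", "SubScoreB", "Age", "Price"]

def build_constraints(slot_names, positive=(), negative=()):
    """Expand name-based constraints into LightGBM's per-index 1/0/-1 array."""
    return [1 if n in positive else (-1 if n in negative else 0)
            for n in slot_names]

print(build_constraints(slot_names,
                        positive={"SubScoreA", "SubScoreB"},
                        negative={"Price"}))  # [1, 1, 0, -1]
```

The user-facing surface stays name-based, and the index bookkeeping happens where the slot-name metadata already lives.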
Also from talking with Tom, this work can be done post v1.0, as it would not require any API changes.
My vote is to pause on this for now and reinvestigate post 1.0.
@shauheen and @TomFinley feel free to comment if you have anything additional.
TL;DR: For the moment, I'd be quite happy to have purely positive / purely negative for all slots.
I agree w/ @TomFinley. I have similar concerns about the usability.
I would also like something similar to what I think @TomFinley is proposing, the Categorical Features style. This is similar to the second option on my list; I thought it would be _too hard_, but if Tom thinks it's doable, it seems like a great longer-term solution.
https://github.com/dotnet/machinelearning/issues/1651#issuecomment-456688022
- I don't think it is possible, but another route would be to take in a single `Features` column, as current, then the user specifies the names of the columns (within the `Features` column) that they want as positive/negative. I don't think it's possible to locate the slots within the `Features` column corresponding to the user's specified columns. Perhaps match on slot names?
For the moment, I'd be quite happy to have purely positive / purely negative for all slots. This addresses AutoML team's ability to use it for model stacking:
https://github.com/dotnet/machinelearning/issues/1651#issuecomment-449535672,
@daholste: Assuming I'm understanding this parameter correctly, this can also help with model stacking. Currently, LightGBM is allowed to map the output of the sub-models in the stack without the constraint that "as the sub-models score increases/decreases, so should the final score". Hence it could map as
f(x) = ( x < 3 ? 1.0 : (x < 4 ? 0.0 : 2.0 )). When the sub-models correlate well with the label, we likely would benefit from a monotonically increasing meta-model for stacking.

https://github.com/dotnet/machinelearning/issues/1651#issuecomment-456688022
My specific use case is, I have a stacked model. The sub-models are rational, therefore I'd like them to be positive-monotone when combined to give the final score; I also want to feed some raw features to the final learner as this stacking method shows promise. To get the same results, I currently duplicate the raw features as
`NegRawFeat = RawFeat * -1`, then `xf=Concat{Features:SubModelScores,NegRawFeat,RawFeat} tr=LogisticRegression{nn=+}`. This accomplishes the goal (for LR, though not for LightGBM), though it loses the slot names, making feature importance difficult.
/cc @shauheen