Hi,
I am running LightGBM in R (R version 4.0.0, LightGBM version 2.3.2) with metric = "mape". When I run cross-validation (lgb.cv), the best iteration is always the one with the maximum value of MAPE.
For example, I run cross validation with nrounds = 200. Last value:
best.iter <- lgb.cv$best_iter
Public:
best_iter: 200
best_score: 0.476690009429843
boosters: list
initialize: function (x)
record_evals: list
reset_parameter: function (new_params)
However, when I look at the values from the other iterations, the best iteration should be 151 and not 200.
I do not see this behavior in Python.
Am I doing something wrong?
Thank you.
Thanks for the report @parayamelo, and sorry for the issue!
I will try to reproduce this behavior on a sample dataset. If I can't reproduce it, I'll need some more information from you.
Thank you @jameslamb! Yes, please, let me know if I can help you with something else.
@jameslamb We should change this regex so that it does not catch MAPE:
https://github.com/microsoft/LightGBM/blob/1f3e72c43ca8485eeba988738ecb0e977c7977f1/R-package/R/lgb.Booster.R#L599
:scream: thank you for pointing that out. I won't have time to investigate for a few more hours, that will help!
@parayamelo, @StrikerRUS was right, the issue was the one described in https://github.com/microsoft/LightGBM/issues/3099#issuecomment-630401721
mape conflicted with map (mean average precision, used in ranking tasks), so it was classified as a metric to maximize instead of minimize.
I think I've fixed this in #3101. If you'd like to test, you can build from that pull request:
git clone https://github.com/jameslamb/LightGBM.git
cd LightGBM
git fetch origin fix/r-metrics
git checkout fix/r-metrics
Rscript build_r.R
Thank you very much for reporting this issue and for using LightGBM!
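For anyone reading later, here is a minimal sketch of how a prefix regex for map can also catch mape, and how that flips the choice of best iteration. It is written in Python only for illustration. Both patterns, the helper names, and the score values are assumptions made up for this example; this is not the actual code from the R package, and it is not necessarily how #3101 fixes it.

```python
import re

# Hypothetical prefix check, similar in spirit to the regex linked above.
# Any metric name that merely STARTS with "map" is treated as higher-is-better.
PREFIX_PATTERN = re.compile(r"^(ndcg|map|auc)")

# Hypothetical stricter check: the whole name must match, with an optional "@k"
# suffix for the ranking metrics. Not an exhaustive list of LightGBM metrics.
STRICT_PATTERN = re.compile(r"^(auc|(ndcg|map)(@\d+)?)$")


def higher_is_better(metric_name, pattern):
    """Return True if the pattern classifies the metric as one to maximize."""
    return pattern.search(metric_name) is not None


def best_iteration(scores, higher_better):
    """Return the 1-based iteration holding the best score."""
    pick = max if higher_better else min
    return scores.index(pick(scores)) + 1


# Made-up per-iteration MAPE values, for illustration only.
mape_scores = [0.40, 0.31, 0.29, 0.35, 0.48]

buggy = best_iteration(mape_scores, higher_is_better("mape", PREFIX_PATTERN))
fixed = best_iteration(mape_scores, higher_is_better("mape", STRICT_PATTERN))
print(buggy, fixed)
```

With the prefix pattern, "mape" is classified as a metric to maximize, so the iteration with the largest error is reported as the best one, which matches the behavior reported in this issue. With the stricter pattern, the iteration with the smallest error is chosen.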
Thanks @jameslamb for the fix and @StrikerRUS for pointing out where the error was! I will test it and come back to you.
@jameslamb I tested the fix on 2 different datasets. On the one I trust, it worked. On the other one it also worked, but I think there is a mistake in how I construct that dataset, because it finds the lowest MAPE at the first iteration. I will use this branch for the moment. Will this be fixed in future releases? Thank you once again!
Glad it's working!
> I will use this branch for the moment, will this be fix for future releases
Yes absolutely. #3101 will be merged in the next few days. It just needs to go through our normal review process.
Great! Thanks again, I will close the issue then.
@parayamelo thanks! I am actually going to leave this open. Until that pull request is merged, other users might experience the same problem and then open a new issue if they don't see this one.
@parayamelo this fix has been merged to master, thanks again for using LightGBM and for reporting this issue!
@jameslamb Thank you!