Xgboost: [PYTHON] Make Feature Name Validation Optional

Created on 22 Sep 2016 · 4 Comments · Source: dmlc/xgboost

While feature name validation can be helpful in some cases, it also causes trouble when new data does not contain all of the corresponding features. This can happen frequently with sparse data and is not the desired behavior. Repeatedly verifying feature names also slows down training when the data is sparse.
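To make the failure mode concrete, here is a minimal sketch in plain Python (no xgboost calls). It assumes feature names are generated as `f0`, `f1`, ... from the number of columns, and that the column count of a sparse input is inferred from the largest index present. The helper `infer_feature_names` and the dict-based row layout are hypothetical and only illustrate the idea.

```python
def infer_feature_names(rows):
    """Name columns f0..f{n-1}, where n is inferred from the largest index seen.

    Hypothetical helper: `rows` is a list of {column_index: value} dicts.
    """
    n_cols = 1 + max((idx for row in rows for idx in row), default=-1)
    return [f"f{i}" for i in range(n_cols)]


# Sparse rows; absent entries are implicit zeros.
train_rows = [{0: 1.0, 4: 2.0}, {1: 3.0}]
test_rows = [{0: 1.0, 2: 5.0}]  # no row happens to touch columns 3 or 4

train_names = infer_feature_names(train_rows)
test_names = infer_feature_names(test_rows)

# A strict equality check rejects the test data, even though the missing
# columns are simply all-zero in this batch.
missing = [name for name in train_names if name not in test_names]
print(train_names == test_names, missing)
```

Under these assumptions the two name lists differ only because the trailing columns are empty in the new data, which is the case a mandatory check turns into an error.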

I propose removing the mandatory feature name validation and making it optional. I would like to understand the consequences and see whether anyone wants to make a PR for it. @sinhrks @terrytangyuan @phunterlau

I am hoping we can resolve https://github.com/dmlc/xgboost/issues/1238 completely and add regression tests to prevent this from happening again.


tqchen


Most helpful comment

I agree with this change.

I don't have much spare time to make a PR for this right now, but I think it should be fairly straightforward to make this optional.

I don't think it's necessary to validate on every update and boost call, though. I feel we only need to validate once, during initialization of the DMatrix and Booster. We should definitely notify users explicitly of the consequences and trade-offs of skipping validation.
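A rough sketch of what "validate once, and make it optional" could look like. This is not xgboost code: `ToyBooster`, `bind`, `update`, and the `validate_features` flag are hypothetical names used only to show the shape of the idea.

```python
class FeatureNameMismatch(ValueError):
    """Raised when data feature names differ from the model's."""


def validate_feature_names(expected, actual):
    """Raise if the two name lists differ; report what is missing or extra."""
    if list(expected) != list(actual):
        missing = [n for n in expected if n not in actual]
        unexpected = [n for n in actual if n not in expected]
        raise FeatureNameMismatch(
            f"feature_names mismatch: missing={missing}, unexpected={unexpected}"
        )


class ToyBooster:
    """Hypothetical stand-in for a booster; not xgboost.Booster."""

    def __init__(self, feature_names, validate_features=True):
        self.feature_names = list(feature_names)
        self.validate_features = validate_features
        self.checks_run = 0  # counts how often validation actually ran

    def bind(self, data_names):
        """Attach data; validate here, once, if validation is enabled."""
        if self.validate_features:
            self.checks_run += 1
            validate_feature_names(self.feature_names, data_names)
        return list(data_names)

    def update(self, bound_data):
        """One boosting step; deliberately performs no name validation."""
        return len(bound_data)


# Strict mode: one check at bind time, none during the training loop.
strict = ToyBooster(["f0", "f1", "f2"])
bound = strict.bind(["f0", "f1", "f2"])
for _ in range(10):
    strict.update(bound)

# Lenient mode: the caller opts out and accepts the risk of misaligned columns.
lenient = ToyBooster(["f0", "f1", "f2"], validate_features=False)
lenient_bound = lenient.bind(["f0", "f1"])
```

With validation disabled, mismatched columns are silently accepted, so the trade-off is the caller's responsibility: the flag would need to be documented as trading safety for speed and flexibility.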