While feature name validation can be helpful in some cases, it also causes trouble when new data does not contain a feature the model was trained on. This can happen frequently with sparse data and is not the desired behavior. Verifying feature names on every call also slows down training when the data is sparse.
I propose removing the mandatory feature name validation and making it optional. I would like to understand the consequences and see who wants to make a PR for it. @sinhrks @terrytangyuan @phunterlau
I am hoping we can resolve https://github.com/dmlc/xgboost/issues/1238 completely and add regression tests to prevent this from happening again.
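A minimal sketch of what "optional" validation could look like, written as a standalone helper rather than against XGBoost itself. `check_feature_names` and its `validate` flag are hypothetical names for illustration, not XGBoost's API. With validation on, a mismatch raises and lists which names are missing or unexpected. With it off, the check is skipped and columns are trusted by position.

```python
from typing import Optional, Sequence


def check_feature_names(
    model_names: Optional[Sequence[str]],
    data_names: Optional[Sequence[str]],
    validate: bool = True,
) -> None:
    """Hypothetical helper: raise ValueError if feature names differ.

    When validate=False (or either side has no names), the check is skipped
    and the caller accepts the risk of silently misaligned columns.
    """
    if not validate or model_names is None or data_names is None:
        return
    if list(model_names) != list(data_names):
        missing = [n for n in model_names if n not in data_names]
        extra = [n for n in data_names if n not in model_names]
        raise ValueError(
            f"feature_names mismatch: missing={missing}, unexpected={extra}"
        )


# Hypothetical example: the new (sparse) data never contains feature "f2".
trained = ["f0", "f1", "f2"]
incoming = ["f0", "f1"]
check_feature_names(trained, incoming, validate=False)  # skipped, no error
```

Keeping `validate=True` as the default preserves the current safe behavior. Users with sparse inputs have to opt out explicitly.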
I agree with this change.
I don't have much spare time to make a PR for this right now, but I think it should be fairly straightforward to make this optional.
I don't think it's necessary to validate on every update and boost call, though. I feel we only need to validate once, during initialization of DMatrix and Booster. We should definitely explicitly notify users of the consequences and trade-offs of skipping validation.
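One way to get "validate once" behavior is sketched below, under stated assumptions. The booster remembers which data objects it has already checked, so repeated `update()` calls on the same training data compare names only the first time. Disabling validation emits a warning that spells out the trade-off. `DataSketch`, `BoosterSketch`, and `validate_features` are hypothetical stand-ins, not XGBoost's classes. The sketch also assumes a data object's feature names do not change after it is created.

```python
import warnings
import weakref
from typing import List, Optional


class DataSketch:
    """Hypothetical stand-in for a DMatrix: rows plus optional feature names."""

    def __init__(self, rows: List[List[float]], feature_names: Optional[List[str]] = None):
        self.rows = rows
        self.feature_names = feature_names


class BoosterSketch:
    """Hypothetical booster that checks each data object's names only once."""

    def __init__(self, feature_names: List[str], validate_features: bool = True):
        self.feature_names = list(feature_names)
        self.validate_features = validate_features
        self._validated = weakref.WeakSet()  # data objects already checked
        self.checks = 0  # how many real name comparisons were performed
        self.rounds = 0  # how many boosting rounds ran
        if not validate_features:
            # Tell the user what skipping validation costs them.
            warnings.warn(
                "feature name validation disabled: columns are matched by "
                "position, so missing or reordered features go undetected",
                UserWarning,
            )

    def _check(self, data: DataSketch) -> None:
        if not self.validate_features or data in self._validated:
            return
        self.checks += 1
        if data.feature_names != self.feature_names:
            raise ValueError("feature_names mismatch")
        self._validated.add(data)

    def update(self, dtrain: DataSketch, iteration: int) -> None:
        self._check(dtrain)  # cheap set lookup after the first call
        self.rounds += 1  # a real boosting round would run here
```

Usage: calling `update(dtrain, i)` ten times on the same `DataSketch` performs one name comparison and ten rounds. A weak set avoids keeping data objects alive and avoids the stale-`id()` problem after garbage collection.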
Is this resolved?
Ditto. Waiting for this to be resolved; still running 0.4a30.
...
Closing, since #3323 addresses this issue.