I don't understand what missing = nan means when I call xgboost(data, label, para, missing = nan).
Does missing = nan mean that NA values get filled in with NaN? Or that empty values get filled in with NaN?
This is how you tell xgboost which value stands for _nan_ (missing) in your data. Let's say you did something like data.fillna(-999); then missing = -999 would be correct.
So you mean that if I set missing = 100, NA gets filled in with 100?
No, xgb does not change the data at all. It just treats the specified _missing_ value as a special one.
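To make "treats the value as a special one" concrete, here is a minimal sketch in plain Python. The helper `is_missing` is hypothetical and written only for illustration; it is not part of xgboost and does not claim to mirror its internals. It only shows the idea: the data stays as it is, and entries equal to the marker are flagged as "no value here".

```python
import math


def is_missing(value, missing):
    """Return True if `value` equals the marker chosen for missing data.

    Hypothetical helper for illustration only; not an xgboost function.
    """
    # NaN never compares equal to itself, so it needs an explicit check.
    if isinstance(missing, float) and math.isnan(missing):
        return isinstance(value, float) and math.isnan(value)
    return value == missing


row = [1.0, 2.0, -999.0, 4.0]

# Flag the entries that match the marker; `row` itself is not modified.
mask = [is_missing(v, -999.0) for v in row]
```

Nothing is filled in or replaced: `row` is the same list before and after, and `mask` only records which positions count as missing.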
Hmm... could you please make "the special one" more concrete? I don't understand.
Does it mean that missing = nan treats NA as NaN, without filling anything in?
Let's say you have the following numpy array: data = [1,2, np.nan, 4]. Xgb may complain about the np.nan within that array if you call DMatrix(data). So what you do is replace np.nan with an arbitrary value that does not otherwise occur in the data, let's say "-999". Hence data becomes: data_new = [1,2, -999, 4]. Now you tell xgb that you encoded missing values as "-999": DMatrix(data_new, missing=-999).
Sorry for my ignorance... I don't know np.nan or numpy arrays. I'm an R user.
@tngks66 Say your data contains 99999, which in real life means the value is missing for your problem. If you set missing = 99999, xgb will treat this special value as a missing value. It will treat 99999 as NA.
Good explanation.