Hi,
pandas recently introduced a nullable integer dtype:
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
I would expect NumPy to automatically convert such arrays to a float dtype and fill the missing entries with np.nan. However, for some reason they are converted to object dtype:
pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values.dtype
which leads to errors like this one
np.nanmax(pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values)
raises "TypeError: boolean value of NA is ambiguous"
Is that expected?
> Is that expected?
I would say yes, as nullable integers are not a NumPy type.
I agree, this is basically by design and means things are working as expected. It is not impossible that NumPy might understand this at some point, but it is unlikely to happen soon.
By using the new UInt8, you are in a sense choosing that NaN really is not the same as NA, and pandas does have pd.NA now to make this distinction more clear.
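In the meantime, one possible workaround is to do the float conversion explicitly on the pandas side before handing the data to NumPy. The following is only a sketch, assuming a pandas version in which to_numpy accepts the na_value argument; the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# A nullable unsigned integer column with one missing value (pd.NA).
s = pd.Series([1, None, 3], dtype="UInt8")

# Ask pandas for a float array explicitly and map pd.NA to np.nan,
# instead of relying on .values, which falls back to object dtype.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)

# NumPy's nan-aware reductions now work as usual.
result = np.nanmax(as_float)
```

This keeps the NaN/NA distinction inside pandas and only collapses NA to NaN at the point where a plain NumPy float array is actually needed.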
@Kreol64 I opened pandas-dev/pandas#37460 to discuss this on the pandas side