Numpy: Nullable integers conversion

Created on 28 Oct 2020 · 3 Comments · Source: numpy/numpy

Hi,

pandas recently introduced a nullable integer dtype:
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

I would expect NumPy to automatically convert such arrays to a float dtype and fill the missing values with np.nan. Instead, the resulting array has object dtype:

pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values.dtype

This leads to errors like the following:

np.nanmax(pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values)

raises "TypeError: boolean value of NA is ambiguous"

Is that expected?

All 3 comments

> Is that expected?

I would say yes, as nullable integers are not a NumPy type.

I agree, this is basically by design and means things are working as expected. It is not impossible that NumPy might understand this at some point, but it is unlikely to happen soon.
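In the meantime, one way to get the float array the question asks for is to do the conversion explicitly on the pandas side. This is a minimal sketch, not something proposed in this thread: it assumes a pandas version whose `Series.to_numpy` accepts the `na_value` argument, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: one nullable UInt8 column with a missing value.
df = pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8')

# Request a float array explicitly and map pd.NA to np.nan.
values = df['col'].to_numpy(dtype='float64', na_value=np.nan)

print(values.dtype)       # float64
print(np.nanmax(values))  # 3.0
```

Because the result is an ordinary float64 array, np.nanmax and the other nan-aware functions treat the missing entry as NaN.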

By using the new UInt8, you are in a sense choosing that NaN really is not the same as NA, and pandas does have pd.NA now to make this distinction more clear.
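The difference between the two can be seen directly. The sketch below is only an illustration of the semantics: NaN is a regular float whose comparisons return a plain bool, while pd.NA propagates through comparisons and refuses to be turned into a bool. The latter is presumably what NumPy runs into when it compares the elements of the object array.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary float: comparing it yields a plain bool.
nan_equal = (np.nan == np.nan)   # False

# pd.NA propagates through comparisons instead of yielding a bool.
na_result = (pd.NA == 1)         # pd.NA

# Forcing pd.NA into a boolean context raises a TypeError.
try:
    bool(pd.NA)
    message = None
except TypeError as exc:
    message = str(exc)

print(nan_equal, na_result, message)
```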

@Kreol64 I opened pandas-dev/pandas#37460 to discuss this on the pandas side
