Numpy: Nullable integers conversion

Created on 28 Oct 2020 · 3 Comments · Source: numpy/numpy

Hi,

Recently pandas introduced nullable integer dtype:
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html

I would expect NumPy to automatically convert such arrays to a float dtype and fill the missing entries with np.nan. However, for some reason it converts them to object dtype:

pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values.dtype

which leads to errors like this one

np.nanmax(pd.DataFrame({'col': [1, np.nan, 3]}).astype('UInt8').values)

raises "TypeError: boolean value of NA is ambiguous"

Is that expected?


Kreol64

All 3 comments

Is that expected ?

I would say yes, as nullable integers are not a NumPy type.

charris on 28 Oct 2020
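To illustrate the point above, here is a minimal sketch of what "not a NumPy type" looks like in practice. The variable names are illustrative only, and the behaviour shown assumes a pandas version that ships the nullable integer dtypes.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, None, 3], dtype="UInt8")

# The column dtype is a pandas extension dtype, not a numpy.dtype instance.
is_numpy_dtype = isinstance(s.dtype, np.dtype)
is_extension_dtype = isinstance(s.dtype, pd.api.extensions.ExtensionDtype)

# With a missing value present, no numeric NumPy dtype can hold pd.NA,
# so the default conversion falls back to an object array.
as_object = np.asarray(s)

print(is_numpy_dtype, is_extension_dtype, as_object.dtype)
```

The object array holds Python-level values, including pd.NA itself, which is why NumPy reductions such as np.nanmax cannot treat the missing entry as NaN.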


I agree, this is basically by design and means things are working as expected. It is not impossible that NumPy might understand this at some point, but it is unlikely to happen soon.

By using the new UInt8 dtype, you are in a sense choosing to treat NaN as distinct from NA, and pandas now has pd.NA to make that distinction clearer.
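If NaN semantics are what you actually want, one possible workaround is to request the conversion explicitly. The sketch below is not something proposed in this thread; it assumes a pandas version in which Series.to_numpy accepts the dtype and na_value arguments, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, None, 3], dtype="UInt8")

# pd.NA propagates through comparisons instead of evaluating to False.
na_comparison = pd.NA == 1      # stays pd.NA
nan_comparison = np.nan == 1    # plain False

# Opt in to NaN semantics explicitly: convert to float64, mapping pd.NA to np.nan.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)
max_value = np.nanmax(as_float)

# Alternatively, stay in pandas, whose reductions skip missing values by default.
max_in_pandas = s.max()

print(as_float.dtype, max_value, max_in_pandas)
```

Keeping the conversion explicit means the choice between NA and NaN stays visible in the calling code rather than being made silently by NumPy.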