In [1]: import pandas as pd
In [2]: foo = pd.DataFrame({'a': [1]}).astype('Int64')
In [3]: bar = pd.DataFrame({'a': [2], 'b': [3]}).astype('Int64')
In [4]: pd.concat((foo, bar), sort=False)
Out[4]:
   a    b
0  1  NaN
0  2    3
In [5]: pd.concat((foo, bar), sort=False).dtypes
Out[5]:
a     Int64
b    object
dtype: object
As shown above, pd.concat((foo, bar)) stacks the two frames and fills the missing column 'b' for foo's row with NaN.
I would expect column 'b' to keep the Int64 dtype, since Int64 can hold missing values, but it is currently cast to object. Expected output:
In [5]: pd.concat((foo, bar), sort=False).dtypes
Out[5]:
a    Int64
b    Int64
dtype: object
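In the meantime, one possible workaround (a sketch only, not something proposed in this report) is to give every frame the same columns and cast to Int64 before concatenating, so that concat never has to invent a dtype for the missing column. The names `columns`, `aligned`, and `result` are illustrative, and this assumes every column is meant to be Int64.

```python
import pandas as pd

foo = pd.DataFrame({"a": [1]}).astype("Int64")
bar = pd.DataFrame({"a": [2], "b": [3]}).astype("Int64")

# Collect the union of column labels in first-seen order (mirrors sort=False).
columns = list(dict.fromkeys([*foo.columns, *bar.columns]))

# Add the missing columns to each frame, then cast so they hold Int64 missing values.
aligned = [df.reindex(columns=columns).astype("Int64") for df in (foo, bar)]

result = pd.concat(aligned, sort=False)
print(result.dtypes)
```

Because both frames already carry an Int64 column 'b' when concat runs, there is no absent block to fill, which is the case that triggers the object fallback described above.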
pd.show_versions()
In [2]: pd.show_versions()
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : 3.10.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.9.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
Thanks for the report!
This is related to https://github.com/pandas-dev/pandas/issues/22994, although that issue is about multiple extension dtype blocks with different dtypes, while here the concatenation involves a block that does not exist in the other frame.
I'll give this one a try.
I believe this is a duplicate of #27692; closing to keep things consolidated.