Pandas: Include missing data count in pd.DataFrame.describe method

Created on 30 Jun 2018 · 6 Comments · Source: pandas-dev/pandas

Code Sample

import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.describe()

Problem description

The describe method only includes 8 summary statistics for numeric data (count, mean, std, min, 25%, 50%, 75%, max) but no missing count, which is very important in real-world data analysis.

To include the missing count, I have to use the following code:

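import numpy as np
import pandas as pd

d = {'col1': [1, np.nan], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
des1 = df.describe()
# Count NaN values per column and shape them into a one-row frame labelled 'missing'
des2 = df.isnull().sum().to_frame(name='missing').T
# Append the 'missing' row below the describe() statistics
pd.concat([des1, des2])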

Expected Output

The describe method should include the missing count.
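Until something like this exists in pandas, the workaround above could be wrapped in a small helper. This is only a sketch: describe_with_missing is a hypothetical name, not a pandas API, and it assumes the 'missing' row should cover just the columns that describe() reports on.

```python
import numpy as np
import pandas as pd


def describe_with_missing(df):
    """Hypothetical helper: describe() plus a 'missing' row of NaN counts."""
    summary = df.describe()
    # Count NaN values only for the columns that describe() actually summarised.
    missing = df[summary.columns].isnull().sum().to_frame(name="missing").T
    # Stack the 'missing' row underneath the usual statistics.
    return pd.concat([summary, missing])


df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})
result = describe_with_missing(df)
print(result)
```

The result keeps the usual describe() rows and adds one final row labelled missing.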

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Enhancement Missing-data

All 6 comments

@77QingLiu : Are you proposing that we mix together some of the output of DataFrame.info() (this gives you non-null info) and DataFrame.describe()?
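For context, a rough illustration of the two calls being compared, using the frame from the issue. DataFrame.info() writes a text report with per-column non-null counts, while describe() returns a DataFrame whose count row holds the same non-null numbers; the buffer is used here only to capture the printed report.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})

# info() prints a text report; capture it in a buffer to inspect it.
buf = io.StringIO()
df.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# describe() returns a DataFrame; its 'count' row is the non-null count.
stats = df.describe()
print(stats.loc["count"])
```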

@gfyoung Yes, exactly.

cc @jreback @jorisvandenbossche

Including a missing data count in pd.DataFrame.describe() is definitely necessary.

count is the non-missing length,
so I guess you could add length (or size, which is what we call it).
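A quick sketch of that relationship, assuming the same two-row frame from the issue: the missing count per column is the frame length minus count, which matches what isnull().sum() gives.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, np.nan], "col2": [3, 4]})

non_missing = df.count()        # non-NaN values per column
length = len(df)                # total number of rows
missing = length - non_missing  # NaN values per column

print(missing)
```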

Agreed, this is the default behavior of R's summary(df) function, for obvious reasons. It is more useful than sd anyway.
