Pandas: read_csv() fills empty string with nan

Created on 25 May 2015 · 1 Comment · Source: pandas-dev/pandas

Pandas stores strings (str and unicode) with dtype=object. As a result, some unexpected things happen: for example, empty fields are filled with NaN, which is a float. I would expect them to be filled with the empty string "" or at least None.

>>> import pandas as pd
>>> from io import StringIO  # Python 3; the original report used Python 2's StringIO module
>>> pd.read_csv(StringIO('col1,col2,col3\nfoo,,bar'), dtype=str)
  col1 col2 col3
0  foo  NaN  bar
>>> type(pd.read_csv(StringIO('col1,col2,col3\nfoo,,bar'), dtype=str).iloc[0, 1])
<class 'float'>

Labels: Missing-data, Strings, Usage Question

Most helpful comment

I know it may seem a little strange, but pandas actually uses NaN intentionally in string arrays, as a marker for missing values. The pandas string methods will skip NaN, and likewise we use it for methods like isnull, fillna and dropna.
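
As a rough illustration of the point above, the sketch below builds a small string Series and shows that the `.str` methods pass NaN through, while `isnull` and `dropna` treat it as missing. The variable names and sample values are made up for the example and are not from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: an object-dtype Series with one missing value.
s = pd.Series(['foo', np.nan, 'bar'], dtype=object)

# String methods operate on the actual strings and leave NaN in place.
upper = s.str.upper()

# The missing-data helpers recognise NaN as the "missing" marker.
missing_mask = s.isnull()
present = s.dropna()

print(upper.tolist())
print(missing_mask.tolist())
print(present.tolist())
```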

If you'd like to replace them with another value, try df.fillna('').
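
A minimal sketch of two ways to end up with empty strings instead of NaN, reusing the CSV text from the report. `fillna('')` is the suggestion from the comment. `keep_default_na=False` is a `read_csv` option that was not mentioned in the thread; it is shown here as a possible alternative, on the assumption that you also do not want the other default markers (such as 'NA' or 'NULL') read as missing.

```python
from io import StringIO

import pandas as pd

csv_text = 'col1,col2,col3\nfoo,,bar'

# Option 1: read as usual, then replace the NaN markers afterwards.
df_filled = pd.read_csv(StringIO(csv_text), dtype=str).fillna('')

# Option 2 (assumption, not from the thread): tell read_csv not to apply
# its default list of NaN markers, so the empty field stays an empty string.
df_raw = pd.read_csv(StringIO(csv_text), dtype=str, keep_default_na=False)

print(repr(df_filled.iloc[0, 1]))
print(repr(df_raw.iloc[0, 1]))
```

Option 1 keeps NaN available while the data is being cleaned and converts only at the end. Option 2 changes parsing for every column, so it is best suited to files where no field should ever be read as missing.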
