pandas stores strings (str and unicode) with dtype=object. As a result, some unexpected things happen: empty fields are filled with NaN, which is a float. I would expect them to be filled with an empty string "" or at least None.
>>> import pandas as pd
>>> from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
>>> pd.read_csv(StringIO('col1,col2,col3\nfoo,,bar'), dtype=str)
  col1 col2 col3
0  foo  NaN  bar
>>> type(pd.read_csv(StringIO('col1,col2,col3\nfoo,,bar'), dtype=str).iloc[0, 1])
<class 'float'>
I know it may seem a little strange, but pandas actually uses NaN intentionally in string arrays, as a marker for missing values. The pandas string methods will skip NaN, and likewise we use it for methods like isnull, fillna and dropna.
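A minimal sketch of that behavior, assuming a recent pandas on Python 3. The Series `s` and the variable names are made up for illustration and are not from the original report.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype Series with one missing value.
s = pd.Series(['foo', np.nan, 'bar'], dtype=object)

# String methods pass NaN through instead of raising on it.
upper = s.str.upper()

# The missing-value helpers treat NaN as "missing".
mask = s.isnull()   # boolean mask marking the NaN entry
kept = s.dropna()   # Series without the NaN entry
```

If empty fields were read as "" instead, `isnull` and `dropna` would not see them as missing.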
If you'd like to replace them with another value, try df.fillna('').
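As a sketch, applied to the CSV from the report (Python 3 import shown). The second option, `keep_default_na=False`, is not mentioned in the reply above; it is a `read_csv` parameter that, as far as I know, keeps empty fields as empty strings when reading with `dtype=str`. The names `csv_text`, `df` and `df_raw` are illustrative only.

```python
from io import StringIO

import pandas as pd

csv_text = 'col1,col2,col3\nfoo,,bar'

# Option 1: read as usual, then replace NaN with an empty string.
df = pd.read_csv(StringIO(csv_text), dtype=str).fillna('')

# Option 2 (an assumption beyond the reply above): tell the parser not to
# treat empty fields as NaN in the first place.
df_raw = pd.read_csv(StringIO(csv_text), dtype=str, keep_default_na=False)
```

Either way the empty field ends up as a str rather than a float, at the cost of no longer being recognized by `isnull` or `dropna`.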