Is there a reason that notnull() and isnull() consider an empty string not to be a missing value?

>>> import pandas as pd
>>> pd.isnull('')
False

It seems like with string data, people usually think of the empty string as "missing".
Well, these ARE set to NaN when reading from a text file. Are you suggesting that other formats should do this (maybe by default), e.g. JSON? Or in general?
You can see if changing this breaks anything and report back.
You are correct that most import routines interpret missing strings as NaN. But if one passes an empty string to pd.notnull() or pd.isnull(), one gets back False. Just wondering if that's appropriate.
I'll check if any tests fail if I change it and get back to you.
Any updates on this issue? I've run into this problem personally and had to solve it by making my own not_null function.
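A minimal sketch of such a helper, in case it's useful to others. The name not_null comes from the comment above, but the exact behavior here is an assumption, not the commenter's actual code:

```python
import pandas as pd

def not_null(value):
    """Like pd.notnull(), but also treats empty strings as missing."""
    if isinstance(value, str):
        return value != ""
    return pd.notnull(value)

not_null("")             # False: empty string counts as missing here
not_null("spam")         # True
not_null(None)           # False
not_null(float("nan"))   # False
```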
As I said above, I suppose you could add an option to read_json, but read_csv already does this. Generally, treating 0-length strings as null loses information: a missing value is not the same as a 0-length string.
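For reference, a quick illustration of the read_csv behavior being referred to (the column names are made up for the example):

```python
import io
import pandas as pd

csv = io.StringIO("name,city\nalice,\nbob,london\n")

# Default parsing: the empty field becomes NaN.
df = pd.read_csv(csv)
df["city"].isna().tolist()       # [True, False]

# keep_default_na=False preserves the empty field as '' instead,
# keeping the empty-string / missing distinction intact.
csv.seek(0)
df2 = pd.read_csv(csv, keep_default_na=False)
df2["city"].tolist()             # ['', 'london']
```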
Just to add my two cents, I'm not a fan of the idea of treating empty strings as null values. Personally I wouldn't even do the coercion we do at the moment (and I've used a str converter to avoid it). Since it sounds like any convention is going to annoy _someone_, I'd argue that having the default be as information-preserving as possible is the most forward-friendly.
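The "str converter" workaround mentioned here presumably looks something like the sketch below; the column name is illustrative, not from the comment:

```python
import io
import pandas as pd

csv = io.StringIO("name,city\nalice,\nbob,london\n")

# Fields passed through a converter skip the default NA substitution,
# so the empty field survives as '' rather than becoming NaN.
df = pd.read_csv(csv, converters={"city": str})
df["city"].tolist()              # ['', 'london']
```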
Agreed, I don't think an empty string should be interpreted as NULL, especially since we have a string extension data type with NA support. Going to close this issue.
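For completeness, the string extension dtype mentioned in the closing comment keeps the two concepts separate: pd.NA marks missing values, while '' is just another value.

```python
import pandas as pd

s = pd.array(["", None, "x"], dtype="string")
s
# <StringArray>
# ['', <NA>, 'x']

pd.isna(s)    # array([False,  True, False])
s == ""       # <BooleanArray> [True, <NA>, False]
```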