Is there a reason that notnull() and isnull() consider an empty string not to be a missing value?

>>> import pandas as pd
>>> pd.isnull('')
False

It seems like with string data, people usually think of the empty string as "missing".
Well, these ARE set to NaN when reading from a text file. Are you suggesting that other formats should do this (maybe by default), e.g. JSON? Or in general?
You can see if changing this breaks anything and report back.
You are correct that most import routines interpret missing strings as NaN. But if one passes an empty string to pd.notnull() or pd.isnull(), one gets back False. Just wondering if that's appropriate.
I'll check if any tests fail if I change it and get back to you.
Any updates on this issue? I've run into this problem personally and had to solve it by making my own not_null function.
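A minimal sketch of such a helper, in case it's useful to others. The name not_null comes from the comment above, but the exact behavior here is an assumption, not the commenter's actual code:

```python
import pandas as pd

def not_null(value):
    """Like pd.notnull(), but also treats empty strings as missing."""
    if isinstance(value, str):
        return value != ""
    return pd.notnull(value)

not_null("")             # False: empty string counts as missing here
not_null("spam")         # True
not_null(None)           # False
not_null(float("nan"))   # False
```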
As I said above, I suppose you could add an option to read_json, but read_csv already does this. Generally, treating 0-length strings as null loses information: a missing value is not the same as a 0-length string.
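For reference, a quick illustration of the read_csv behavior being referred to (the column names are made up for the example):

```python
import io
import pandas as pd

csv = io.StringIO("name,city\nalice,\nbob,london\n")

# Default parsing: the empty field becomes NaN.
df = pd.read_csv(csv)
df["city"].isna().tolist()       # [True, False]

# keep_default_na=False preserves the empty field as '' instead,
# keeping the empty-string / missing distinction intact.
csv.seek(0)
df2 = pd.read_csv(csv, keep_default_na=False)
df2["city"].tolist()             # ['', 'london']
```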
Just to add my two cents, I'm not a fan of the idea of treating empty strings as null values. Personally I wouldn't even do the coercion we do at the moment (and I've used a str converter to avoid it). Since it sounds like any convention is going to annoy _someone_, I'd argue that having the default be as information-preserving as possible is the most forward-friendly.
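The "str converter" workaround mentioned here presumably looks something like the sketch below; the column name is illustrative, not from the comment:

```python
import io
import pandas as pd

csv = io.StringIO("name,city\nalice,\nbob,london\n")

# Fields passed through a converter skip the default NA substitution,
# so the empty field survives as '' rather than becoming NaN.
df = pd.read_csv(csv, converters={"city": str})
df["city"].tolist()              # ['', 'london']
```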
Agreed, I don't think an empty string should be interpreted as NULL, especially since we have a string extension data type with NA support. Going to close this issue.
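For completeness, the string extension dtype mentioned in the closing comment keeps the two concepts separate: pd.NA marks missing values, while '' is just another value.

```python
import pandas as pd

s = pd.array(["", None, "x"], dtype="string")
s
# <StringArray>
# ['', <NA>, 'x']

pd.isna(s)    # array([False,  True, False])
s == ""       # <BooleanArray> [True, <NA>, False]
```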