Seaborn: unused column breaks map_dataframe

Created on 7 Apr 2017 · 5 Comments · Source: mwaskom/seaborn

Lifted straight from the docs, this works:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    data=np.random.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"))
df = df.cumsum(axis=0).stack().reset_index(name="val")

def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)

g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")

Adding a new, unused column to df also works:

df['val2'] = df['val']
g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")

However, the second column interferes if it's NaN for a given group:

df.loc[df['walk'] == 'A', 'val2'] = np.nan
g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")

The code above fails because an empty dataframe is passed to dateplot:

TypeError: Empty 'DataFrame': no numeric data to plot

The weird part is that the new column val2 is not used anywhere.

ian-contiamo

Most helpful comment

(If it wasn't obvious from my answer, your example works fine with dropna=False).

mwaskom on 7 Apr 2017

All 5 comments

The error is raised by the pandas plotting method, which doesn't know what to do when asked to plot "empty" data (unlike most matplotlib functions, which add a ghost artist or return without doing anything). You'll have to handle the empty-data case in your dateplot function.
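
One way to handle the empty case is to return early when the facet's subset has no rows. This is a minimal sketch that assumes the same dateplot signature as in the report. The Agg backend line and the hand-built `empty` frame are additions so the snippet runs on its own without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import pandas as pd


def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    # Guard: leave the facet blank instead of asking pandas to plot nothing.
    if data.empty:
        return
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)


# Hypothetical empty subset, shaped like what one facet might receive.
empty = pd.DataFrame({"date": pd.to_datetime([]),
                      "val": pd.Series([], dtype=float)})
dateplot("date", "val", data=empty)  # returns without raising
```

The guard only hides the symptom: the facet is drawn empty even though val itself has data for that group.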

mwaskom on 7 Apr 2017

But my point is: why is the dataframe empty? I have added another column (val2) that is unrelated to what the function does (i.e. plot val). The data in val is not empty, only the data in val2 is. I'm not plotting or doing anything on val2.

In short: the data in an unused column seems to affect what is passed to the plotting function. I suspect there is a dropna applied to both columns, but maybe it should be a dropna(subset=[...]).
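
The difference between the two calls can be checked with plain pandas, outside seaborn. The sketch below rebuilds the frame from the report (the fixed seed is an addition) and compares a blanket dropna() with dropna(subset=[...]) on the rows for walk A. It only illustrates the suspicion and makes no claim about what FacetGrid does internally.

```python
import numpy as np
import pandas as pd

# Rebuild the long-form frame from the report, with a fixed seed.
rng = np.random.RandomState(0)
df = pd.DataFrame(
    data=rng.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"))
df = df.cumsum(axis=0).stack().reset_index(name="val")

# Unused column that is NaN for the whole "A" group.
df["val2"] = df["val"]
df.loc[df["walk"] == "A", "val2"] = np.nan

facet_a = df[df["walk"] == "A"]

# Dropping on every column removes all rows of the group...
all_cols = facet_a.dropna()
# ...while restricting to the plotted columns keeps them.
used_cols = facet_a.dropna(subset=["date", "val"])

print(len(all_cols), len(used_cols))  # prints: 0 5
```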

The workaround is to always drop the columns that are not explicitly needed before passing the dataframe to FacetGrid, but that seems needlessly complicated.
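
That workaround might look like the following sketch, which selects only the facet and plotting columns before building the grid. The name `needed` is made up for illustration, the fixed seed and Agg backend are additions, dropna=True is passed explicitly to match the default described in this thread, and size is omitted so the default facet size applies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(0)
df = pd.DataFrame(
    data=rng.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"))
df = df.cumsum(axis=0).stack().reset_index(name="val")
df["val2"] = df["val"]
df.loc[df["walk"] == "A", "val2"] = np.nan


def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)


# Keep only the columns the grid and the plotting function actually use.
needed = df[["date", "walk", "val"]]
g = sns.FacetGrid(needed, col="walk", col_wrap=2, dropna=True)
g = g.map_dataframe(dateplot, "date", "val")
```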

ian-contiamo on 7 Apr 2017

Oh right, I missed the val/val2 distinction. FacetGrid has a dropna parameter that defaults to True. In FacetGrid.map, missing values are dropped from a reduced version of the dataframe defined by the variables that go into that facet. I believe that can't be the case for map_dataframe, because the idea of that method is that dataframe variables can be identified in the kwargs, not just in the list of args.
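
Passing dropna=False keeps the rows whose only missing value is in the unused column. The sketch below uses the frame from the report; the fixed seed and the Agg backend are additions so it runs headless and reproducibly, and size is omitted so the default facet size applies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, only so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.RandomState(0)
df = pd.DataFrame(
    data=rng.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"))
df = df.cumsum(axis=0).stack().reset_index(name="val")

# Unused column that is NaN for the whole "A" group.
df["val2"] = df["val"]
df.loc[df["walk"] == "A", "val2"] = np.nan


def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)


# dropna=False: rows with NaN in val2 are still passed to dateplot.
g = sns.FacetGrid(df, col="walk", col_wrap=2, dropna=False)
g = g.map_dataframe(dateplot, "date", "val")
```

With dropna=False, any missing values in the columns that are plotted also reach the function, so the plotting function has to cope with them itself.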