Lifted straight from the docs, this works:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    data=np.random.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"))
df = df.cumsum(axis=0).stack().reset_index(name="val")

def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)

g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")
Adding a new, unused column to df also works:
df['val2'] = df['val']
g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")
However, the second column interferes if it's NaN for a given group:
df.loc[df['walk'] == 'A', 'val2'] = np.nan
g = sns.FacetGrid(df, col="walk", col_wrap=2, size=3.5)
g = g.map_dataframe(dateplot, "date", "val")
The code above fails because an empty dataframe is passed to dateplot:
TypeError: Empty 'DataFrame': no numeric data to plot
The weird part is that the new column val2 is not used anywhere.
The error is raised by the pandas plotting method, which doesn't know what to do when asked to plot "empty" data (unlike most matplotlib functions, which add a ghost artist or return without doing anything). You'll have to handle the empty-data case in your dateplot function.
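For reference, here is a minimal sketch of what handling the empty case inside dateplot could look like. It assumes that silently skipping an empty facet is acceptable; the early return is an illustration, not something seaborn or pandas prescribes.

```python
import matplotlib.pyplot as plt
import pandas as pd


def dateplot(x, y, **kwargs):
    """Plot y against x for one facet, skipping facets that have no rows."""
    data = kwargs.pop("data")
    # FacetGrid can hand over an empty frame once missing values are dropped;
    # DataFrame.plot raises on that, so return before plotting anything.
    if data.empty:
        return
    ax = plt.gca()
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)
```

With this guard the facet for walk A would simply stay blank instead of raising, which hides the symptom but does not explain why the frame is empty in the first place.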
But my point is: why is the dataframe empty? I have added another column (val2) that is unrelated to what the function does (i.e. plot val). The data in val is not empty, only the data in val2 is. I'm not plotting or doing anything on val2.
In short: the data in an unused column seems to affect what is passed to the plotting function. I suspect there is a dropna applied to both columns, but maybe it should be a dropna(subset=[...]).
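The suspected difference can be reproduced with pandas alone, without seaborn. The small frame below is a made-up stand-in for the one in this thread, and it only illustrates how dropna() and dropna(subset=[...]) differ; it does not show what seaborn does internally.

```python
import numpy as np
import pandas as pd

# Stand-in data: "val" is complete, "val2" is NaN for every row of walk "A".
df = pd.DataFrame({
    "walk": ["A", "A", "B", "B"],
    "val": [0.1, 0.2, 0.3, 0.4],
})
df["val2"] = df["val"]
df.loc[df["walk"] == "A", "val2"] = np.nan

facet_a = df[df["walk"] == "A"]

# Dropping on every column removes all rows of facet A ...
dropped_all = facet_a.dropna()
# ... while restricting the check to the plotted column keeps them.
dropped_subset = facet_a.dropna(subset=["val"])

print(len(dropped_all), len(dropped_subset))  # prints: 0 2
```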
The workaround is to always drop the columns that are not explicitly needed before passing the dataframe to FacetGrid, but that seems needlessly complicated.
Oh right, I missed the val/val2 distinction. FacetGrid has a dropna parameter that defaults to True. In FacetGrid.map, missing values are dropped from a reduced version of the dataframe defined by the variables that go into that facet. I believe that can't be the case for map_dataframe, because the idea of that method is that dataframe variables can be identified in the kwargs, not just in the list of args.
Got it, thanks!
(If it wasn't obvious from my answer, your example works fine with dropna=False).
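Spelled out, the failing example with dropna=False might look like the sketch below. One assumption to flag: it uses height, the name that newer seaborn releases use for the size argument in the original snippets; on an older release, keep size.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(
    data=np.random.randn(5, 4),
    columns=pd.Series(list("ABCD"), name="walk"),
    index=pd.date_range("2015-01-01", "2015-01-05", name="date"),
)
df = df.cumsum(axis=0).stack().reset_index(name="val")
df["val2"] = df["val"]
df.loc[df["walk"] == "A", "val2"] = np.nan


def dateplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x=x, y=y, ax=ax, grid=False, **kwargs)


# dropna=False keeps rows whose only missing value is in the unused val2.
# Assumption: `height` replaces `size` on newer seaborn releases.
g = sns.FacetGrid(df, col="walk", col_wrap=2, height=3.5, dropna=False)
g = g.map_dataframe(dateplot, "date", "val")
```

Note that with dropna=False, rows with missing values in the plotted columns are passed through as well, so the plotting function has to cope with NaN in val itself.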