Seaborn: What to do with missing values in heatmaps

Created on 20 Nov 2014  ·  17 Comments  ·  Source: mwaskom/seaborn

Running this snippet:

import seaborn as sns
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1, 1, 1],
                        'b': [2, np.nan, 2],
                        'c': [3, 3, np.nan]})

sns.heatmap(df)

We get something like this:

[image: missing-get-small]

Missing values are happily assigned "the color of the minimum" in the color bar. I wonder if this is a good default or if missing values should be treated differently. I would go for "treating them differently by default", and then the question would be how (maybe by having a missing_color parameter and issuing a warning if that color is in the colormap/colorbar).

All 17 comments

+1

As of now, you could manually mask them,

import seaborn as sns
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1, 1, 1],
                        'b': [2, np.nan, 2],
                        'c': [3, 3, np.nan]})

mask = df.isnull()
sns.heatmap(df, mask=mask)

[image]
(edit: added image)

Is this what you suggest should happen by default?

Yes, I think masking out missing values should be the default behavior. It would probably suffice to create the mask with np.isnan in _HeatMapper, or to "or" it with any mask that already exists.

Probably overkill, but would it make sense to also check that the "bad" color in the colormap does not overlap with any of the colors that can be assigned to non-missing data, and to show a warning if it does? That would make it less likely that people misinterpret missing values.
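
A minimal sketch of what such a check could look like, assuming a matplotlib colormap object. The helper name bad_color_overlaps and the tolerance are hypothetical and not part of seaborn or matplotlib; the sketch compares full RGBA values, so a transparent "bad" color does not count as overlapping an opaque data color.

```python
import warnings

import numpy as np
from matplotlib.colors import ListedColormap


def bad_color_overlaps(cmap, tol=1e-6):
    """Return True if the colormap's "bad" color equals one of its data colors.

    Hypothetical helper for illustration; not a seaborn or matplotlib API.
    """
    # Calling a colormap on a masked value returns its "bad" RGBA color.
    bad = cmap(np.ma.array([0.0], mask=[True]))[0]
    # Sample every color the colormap can assign to valid data.
    data_colors = cmap(np.linspace(0, 1, cmap.N))
    return bool(np.any(np.all(np.abs(data_colors - bad) <= tol, axis=1)))


cmap = ListedColormap(["#000000", "#888888", "#ffffff"])
default_overlap = bad_color_overlaps(cmap)  # default "bad" color is transparent

cmap.set_bad("#888888")  # now identical to one of the data colors
explicit_overlap = bad_color_overlaps(cmap)
if explicit_overlap:
    warnings.warn("Missing values share a color with valid data.")
```

Whether a check like this belongs in the library or in user code is a separate question; the sketch only shows that the comparison itself is cheap.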

I agree that masking should happen by default; in fact, I don't really see how the current behaviour would ever be useful as it either doesn't affect things or produces misleading visualisations.

+1

There's a masking argument that works pretty well, but it could be improved in at least two respects. First, it could be useful to add an option that makes masked-out cells the same color as the image background (this would be especially pleasing for heatmaps of upper-triangular or lower-triangular matrices). Second, masks currently prevent the annot argument from being set to True (combining a mask with annot=True throws an error). It's not clear why, but presumably whatever values the mask places in masked-out cells are incompatible with what the heatmap function can display there.

+1
I agree. Masking works fine and should perhaps be default for missings. There is indeed an issue with masking when used in combination with annotations as jm-contreras mentioned.
The for loop in _annotate_heatmap should skip array values (val) that are equal to np.ma.masked.
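
As a rough illustration of that suggestion (not the actual seaborn code), the loop below yields annotation text only for cells that are not masked. The function name annotation_cells and its signature are made up for this sketch.

```python
import numpy as np


def annotation_cells(values, mask, fmt=".2g"):
    """Yield (row, col, text) for every cell that is not masked.

    Hypothetical stand-in for an annotation loop; names are illustrative.
    """
    masked = np.ma.masked_where(np.asarray(mask, dtype=bool),
                                np.asarray(values, dtype=float))
    is_masked = np.ma.getmaskarray(masked)
    for (row, col), val in np.ndenumerate(masked.data):
        if is_masked[row, col]:
            continue  # nothing to annotate for a masked-out cell
        yield row, col, format(val, fmt)


values = np.array([[1.0, np.nan], [2.5, 4.0]])
cells = list(annotation_cells(values, np.isnan(values)))
```

Checking the mask array explicitly avoids comparing individual values against np.ma.masked, which is easy to get wrong.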

Having looked into this a bit, the issue seems to be that plt.pcolormesh doesn't treat np.nan as "missing", and instead assigns those cells the "under" value in the colormap. E.g.:

cmap = sns.cubehelix_palette(as_cmap=True, light=.9)
sns.heatmap(flights_missing, cmap=cmap)

cmap.set_under(".5")
sns.heatmap(flights_missing, cmap=cmap)

Based on this, it's not really clear what the best thing to do in heatmap is. However, note that from a user perspective you can use the mask parameter to be safe (but see #512):

sns.heatmap(flights_missing, cmap=cmap, mask=flights_missing.isnull())
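
For readers working with matplotlib directly, a small sketch of the same idea: converting NaNs to a masked array makes the "missing" status explicit before the data is drawn. The array here is made up for illustration, and this is not a claim about how heatmap applies its mask internally.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [1.0, np.nan, 3.0],
                 [1.0, 2.0, np.nan]])

# masked_invalid masks NaN (and inf) entries; matplotlib draws masked
# cells with the colormap's "bad" color rather than a data color.
masked = np.ma.masked_invalid(data)

n_masked = int(masked.mask.sum())
# Reductions on a masked array skip the masked cells.
vmin, vmax = float(masked.min()), float(masked.max())
```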

> There's a masking argument that works pretty well, but it could be improved in at least two respects. First, it could be useful to add an option that makes masked out cells the same color as the image background (this would be especially pleasing for heat maps of upper-triangular or lower-triangular matrices).

@jm-contreras I think this already happens? Perhaps you mean something different:

with sns.axes_style("white"):
    sns.heatmap(flights_missing, cmap=cmap, mask=flights_missing.isnull())

> Yes, I think masking out missing values should be the default behavior. It would probably suffice to create the mask with np.isnan in _HeatMapper, or to "or" it with any mask that already exists.

@sdvillal This could work but it does raise the issue of what to do when a user supplies a mask. Take the union of the supplied mask and the null mask? Ignore the missings and just use the supplied mask? Either way feels like it will surprise people.
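
To make the first option concrete, here is a sketch of taking the union of a user-supplied mask and the null mask. The helper combined_mask is hypothetical and only illustrates the "union" choice; it is not seaborn's implementation.

```python
import numpy as np
import pandas as pd


def combined_mask(data, mask=None):
    """Union of a user-supplied mask and the mask of missing values.

    Hypothetical helper; it sketches the "take the union" option only.
    """
    null_mask = np.asarray(pd.isnull(data), dtype=bool)
    if mask is None:
        return null_mask
    user_mask = np.asarray(mask, dtype=bool)
    if user_mask.shape != null_mask.shape:
        raise ValueError("mask must have the same shape as data")
    return user_mask | null_mask


df = pd.DataFrame({"a": [1, 1, 1],
                   "b": [2, np.nan, 2],
                   "c": [3, 3, np.nan]})
# Suppose the user wants to hide the strict upper triangle.
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)
mask = combined_mask(df, upper)
```

The result can be passed to sns.heatmap(df, mask=mask). With the union, a cell is hidden if either the user or the data says so, which is the less surprising of the two options when the supplied mask was built without thinking about NaNs.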

A broader question is why pcolormesh behaves this way and if it is intentional. Other "matrix" functions in matplotlib (e.g. imshow) treat nans as missing out of the box.

See #523. I decided to take the option advocated by @sdvillal and automatically include missing values in the mask. Sorry it took so long to get to this.

Closed with #523

Great thread. I learned a lot.

@mwaskom That's exactly what I meant. It just wasn't clear that the heat map code needed to be indented within the statement with sns.axes_style('white'). Thanks for sharing!

Hello, is there a way to display only the non-zero values for a heatmap plot with seaborn? I understand that 'annot=True' will display values for each grid point. However, I want to show only the values that are non-zero.

I am not sure you can suppress the annotation of specific values. Perhaps you can use a mask if you do not want to plot zero values at all?

mask = data == 0
ax = sns.heatmap(data, annot=True, mask=mask)
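
If the zero cells should still be drawn but left unlabelled, one possible approach is to build the labels yourself. This sketch assumes a seaborn version in which annot accepts an array of the same shape as the data; the helper nonzero_labels is made up for illustration.

```python
import numpy as np


def nonzero_labels(data, fmt="g"):
    """Return a string array that is empty wherever data is zero.

    Illustrative helper, not a seaborn function.
    """
    values = np.asarray(data)
    return np.array([[format(v, fmt) if v != 0 else "" for v in row]
                     for row in values.tolist()])


data = np.array([[0, 3], [5, 0]])
labels = nonzero_labels(data)
```

The labels would then be passed as sns.heatmap(data, annot=labels, fmt=""), where the empty fmt tells seaborn to print the strings as they are.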

Try this, which plots the pattern of missing values itself rather than the data:

sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
