Seaborn: Default style for more than 6 data series

Created on 23 Jul 2018 · 8 Comments · Source: mwaskom/seaborn

Hi all,

Just using the lineplot example but extending this to 8 series:

import numpy as np
import pandas as pd
import seaborn as sns

rs = np.random.RandomState(365)
values = rs.randn(365, 8).cumsum(axis=0)
dates = pd.date_range("1 1 2016", periods=365, freq="D")
data = pd.DataFrame(values, dates, columns=["A", "B", "C", "D", "E", "F", "G", "H"])
data = data.rolling(7).mean()

sns.lineplot(data=data, palette="tab10", linewidth=2.5)

I get this exception, which seems to indicate that I need to specify styles for data series beyond the 6th.

ValueError                                Traceback (most recent call last)
<ipython-input-12-272055c27972> in <module>()
----> 1 sns.lineplot(data=data, palette="tab10", linewidth=2.5)

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend, ax, **kwargs)
   1076         dashes=dashes, markers=markers, style_order=style_order,
   1077         units=units, estimator=estimator, ci=ci, n_boot=n_boot,
-> 1078         sort=sort, err_style=err_style, err_kws=err_kws, legend=legend,
   1079     )
   1080

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in __init__(self, x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend)
    670         self.parse_hue(plot_data["hue"], palette, hue_order, hue_norm)
    671         self.parse_size(plot_data["size"], sizes, size_order, size_norm)

--> 672         self.parse_style(plot_data["style"], markers, dashes, style_order)
    673
    674         self.units = units

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in parse_style(self, data, markers, dashes, order)
    492
    493             dashes = self.style_to_attributes(
--> 494                 levels, dashes, self.default_dashes, "dashes"
    495             )
    496

C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in style_to_attributes(self, levels, style, defaults, name)
    303             if any(missing_levels):
    304                 err = "These `style` levels are missing {}: {}"
--> 305                 raise ValueError(err.format(name, missing_levels))
    306
    307         return attrdict

ValueError: These `style` levels are missing dashes: {'H', 'G'}

Is there a way to automatically set style for large datasets?

[Edit] Seems like it's because relational.py:30 defines 6 default dashes. Setting sns.lineplot(data=data, dashes=False) fixes the issue.

stevenwong

Most helpful comment

sns.scatterplot with a dataframe with more than 6 columns of data has the same problem, it runs out of markers.
A workaround for 15 markers is to define
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
and set markers to filled_markers
sns.scatterplot(data=df, markers=filled_markers)
Source: https://matplotlib.org/2.0.2/api/markers_api.html

ventilator on 5 Apr 2019

👍10 🎉2

All 8 comments

So there's a few related but distinguishable questions here:

Should there be more than 6 default dash styles?
What should happen when there are more style levels than dash patterns?
Given the constraints on the style semantic, should it be applied by default to "wide-form" data (where you haven't explicitly asked for it)?

This kind of issue is what I was talking about in the release notes when I said "default behavior may change", because I'm interested in hearing what people find surprising or annoying.

My current thoughts are

I don't think there have to be exactly 6, but that's about the number that I felt could be reliably distinguished in a variety of plots. Users can specify larger dash sets that are tailored to their specific visualizations, but I'm not wild about having defaults that don't really provide useful information.
Things in matplotlib cycle, but things in seaborn generally don't (i.e. with hue you'll always get unique colors). #1511 suggested cycling different numbers of dash/marker sets to get a relatively large number of unique combinations. That's a clever suggestion, but I'm not sure it's the best approach, because for most datasets the dashes and markers go together and it might be confusing that at some point they become independent. So I'm open to better ideas, but it seemed best to start with the most disruptive response (raising an exception) and then possibly scale back, rather than going in the opposite direction.
This is just a balance between "by default make maximally accessible plots" and "by default try not to raise in a confusing way". I'm ambivalent and could be persuaded either way. Unfortunately, the logic of how the functions work makes it a little difficult to defer the decision on whether there should be a style semantic until we know how many style levels are needed.

mwaskom on 23 Jul 2018

Apologies, I didn't notice the other issue. For my purposes I didn't need dashes, so I can just turn them off, but I think at least a more informative error message would be nice.

Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount? Or turn dashes off by default, so that users won't be surprised by an error if they are just trying out a generic plot.

stevenwong on 24 Jul 2018

Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount?

It's possible in principle but I'm not convinced it's a good idea. So e.g. you can programmatically generate markers with an arbitrary number of sides. But it's really hard to discriminate exactly how many sides the markers have above ~5. And once you get above ~7 it's very hard to tell that it's a polygon and not a circle. Similarly, you could programmatically generate dashes with slightly longer and longer solid segments, but for most plots it will be impossible to tell which is which.

These points depend on things like size and density, so it's possible to make a custom plot that works. But I don't think it's a great approach to defaults.
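
To make the dash half of that argument concrete, here is a rough sketch of what "slightly longer and longer solid segments" could look like. The helper graded_dashes and its gap and step parameters are hypothetical names invented for this illustration, not part of seaborn. It uses "" for a solid line and (segment, gap) tuples otherwise, the same form as the dash lists posted further down.

```python
def graded_dashes(n, gap=1.5, step=0.5):
    """Return n dash patterns: solid first, then (segment, gap) pairs
    whose solid segment grows by `step` points each time."""
    if n <= 0:
        return []
    patterns = [""]  # "" stands for a solid line
    for i in range(1, n):
        patterns.append((1.0 + step * (i - 1), gap))
    return patterns


patterns = graded_dashes(10)
for pattern in patterns:
    print(pattern)
```

Neighbouring entries differ by only half a point in the solid segment, which is the kind of difference that is hard to tell apart in most plots.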

mwaskom on 24 Jul 2018

If anybody wants to add more styles beyond the default 6:

dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]

sns.relplot(...,  dashes=dash_styles,...)

Each style tuple must have an even number of elements (segment, gap pairs).
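
A small sketch of that even-length rule as a pre-flight check, in case it helps to catch a bad pattern before plotting. check_dash_styles is a hypothetical helper written for this note, not a seaborn or matplotlib function; it assumes "" is the solid-line entry, as in the list above.

```python
def check_dash_styles(dash_styles):
    """Raise ValueError unless every entry is "" (solid) or a non-empty
    sequence with an even number of elements (segment, gap, ...)."""
    for i, style in enumerate(dash_styles):
        if style == "":
            continue  # solid line, nothing to check
        if len(style) == 0 or len(style) % 2 != 0:
            raise ValueError(
                f"dash_styles[{i}] = {style!r} must have an even number "
                "of elements (segment, gap pairs)"
            )


dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]

check_dash_styles(dash_styles)  # passes silently for the list above
```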

jemshit on 30 Mar 2019

👍10 🎉2

ventilator on 5 Apr 2019

👍10 🎉2

I think the error message is also a little unintuitive. Perhaps it could be changed to something clearer, e.g.

err = "These `style` levels do not have defaults {}: {}"

instead of

err = "These `style` levels are missing {}: {}"

At first, I thought the categorical column values were missing dashes in the strings, which was quite confusing.

Another possible fix for this issue is to wrap around the dash_styles (e.g. index 6 becomes 0, 7 becomes 1, etc.), as long as a UserWarning is given.
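
A user-side sketch of that wrap-around idea, not a patch to seaborn. wrapped_dashes and BASE_DASHES are hypothetical names; the six patterns are the first six from the list earlier in this thread and are only assumed to stand in for the library defaults. The helper returns a dict from level to pattern, which can be passed as the dashes argument.

```python
import itertools
import warnings

# Assumption: stand-in for the six default patterns, taken from the
# dash_styles list posted earlier in this thread.
BASE_DASHES = ["", (4, 1.5), (1, 1), (3, 1, 1.5, 1), (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1)]


def wrapped_dashes(levels, base=BASE_DASHES):
    """Map each level to a dash pattern, reusing patterns from the start
    of `base` (with a UserWarning) when there are more levels than patterns."""
    levels = list(levels)
    if len(levels) > len(base):
        warnings.warn(
            f"{len(levels)} style levels but only {len(base)} dash "
            "patterns; patterns will repeat",
            UserWarning,
        )
    # zip stops at the last level, so the infinite cycle is safe here.
    return dict(zip(levels, itertools.cycle(base)))


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    mapping = wrapped_dashes(["A", "B", "C", "D", "E", "F", "G", "H"])
print(mapping["G"], mapping["H"])
```

With the eight-column frame from the original question, this could be used as sns.lineplot(data=data, dashes=wrapped_dashes(data.columns)); series G and H then reuse the patterns of A and B, so they are only distinguishable by colour.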

craymichael on 28 Jun 2019

sns.scatterplot with a dataframe with more than 6 columns of data has the same problem, it runs out of markers.
A workaround for 15 markers is to define
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
and set markers to filled_markers
sns.scatterplot(data=df, markers=filled_markers)
Source: https://matplotlib.org/2.0.2/api/markers_api.html

I tried this but it didn't work (for lineplot):

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style=True, markers=filled_markers, data=df)

produced:

Even worse,

ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=filled_markers, data=df)
causes:

ValueError                                Traceback (most recent call last)
<ipython-input-80-ee157b4ed38e> in <module>()
      6 ax = sns.lineplot(x="iteration", y="kappa",
      7                   hue="r_state", style="r_state", markers=filled_markers,
----> 8                   data=df)
      9 
     10 plt.legend(labels=['random initialization {}'.format(i) for i in range(n_randstates)])

3 frames
/usr/local/lib/python3.6/dist-packages/seaborn/relational.py in style_to_attributes(self, levels, style, defaults, name)
    307             if any(missing_levels):
    308                 err = "These `style` levels are missing {}: {}"
--> 309                 raise ValueError(err.format(name, missing_levels))
    310 
    311         return attrdict

ValueError: These `style` levels are missing dashes: {8, 9, 6, 7}

Any other ideas?

alemol on 7 Feb 2020

👍2
