Hi all,
Just using the lineplot example but extending this to 8 series:
import numpy as np
import pandas as pd
import seaborn as sns

rs = np.random.RandomState(365)
values = rs.randn(365, 8).cumsum(axis=0)
dates = pd.date_range("1 1 2016", periods=365, freq="D")
data = pd.DataFrame(values, dates, columns=["A", "B", "C", "D", "E", "F", "G", "H"])
data = data.rolling(7).mean()
sns.lineplot(data=data, palette="tab10", linewidth=2.5)
I get this exception, which seems to indicate that I need to specify styles for data series beyond the 6th.
ValueError Traceback (most recent call last)
<ipython-input-12-272055c27972> in <module>()
----> 1 sns.lineplot(data=data, palette="tab10", linewidth=2.5)
C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in lineplot(x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend, ax, **kwargs)
1076 dashes=dashes, markers=markers, style_order=style_order,
1077 units=units, estimator=estimator, ci=ci, n_boot=n_boot,
-> 1078 sort=sort, err_style=err_style, err_kws=err_kws, legend=legend,
1079 )
1080
C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in __init__(self, x, y, hue, size, style, data, palette, hue_order, hue_norm, sizes, size_order, size_norm, dashes, markers, style_order, units, estimator, ci, n_boot, sort, err_style, err_kws, legend)
670 self.parse_hue(plot_data["hue"], palette, hue_order, hue_norm)
671 self.parse_size(plot_data["size"], sizes, size_order, size_norm)
--> 672 self.parse_style(plot_data["style"], markers, dashes, style_order)
673
674 self.units = units
C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in parse_style(self, data, markers, dashes, order)
492
493 dashes = self.style_to_attributes(
--> 494 levels, dashes, self.default_dashes, "dashes"
495 )
496
C:\Program Files\Anaconda3\envs\py35\lib\site-packages\seaborn\relational.py in style_to_attributes(self, levels, style, defaults, name)
303 if any(missing_levels):
304 err = "These `style` levels are missing {}: {}"
--> 305 raise ValueError(err.format(name, missing_levels))
306
307 return attrdict
ValueError: These `style` levels are missing dashes: {'H', 'G'}
Is there a way to automatically set style for large datasets?
[Edit] Seems like it's because relational.py:30 defines 6 default dashes. Setting sns.lineplot(data=data, dashes=False) fixes the issue.
So there's a few related but distinguishable questions here:
style semantic, should it be applied by default to "wide-form" data (where you haven't explicitly asked for it)? This kind of issue is what I was talking about in the release notes when I said "default behavior may change", because I'm interested in hearing what people find surprising or annoying.
My current thoughts are
hue you'll always get unique colors). #1511 suggested cycling different numbers of dash/marker sets to get a relatively large number of unique combinations. That's a clever suggestion, but I'm not sure it's the best approach, because for most datasets the dashes and markers go together and it might be confusing that at some point they become independent. So I'm open to better ideas, but it seemed best to start with the most disruptive response (raising an exception) and then possibly scale back, rather than going in the opposite direction.
Apologies, I didn't notice the other issue. For my purposes I didn't need dashes, so I can just turn them off, but I think at least a more informative error message would be nice.
Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount? Or turn dashes off by default, so that users aren't surprised by an error when they are just trying out a generic plot?
Rather than "randomly" cycling through dash/markers, what if you define a more predictable pattern? Co-vary the two by some amount?
It's possible in principle but I'm not convinced it's a good idea. So e.g. you can programmatically generate markers with an arbitrary number of sides. But it's really hard to discriminate exactly how many sides the markers have above ~5. And once you get above ~7 it's very hard to tell that it's a polygon and not a circle. Similarly, you could programmatically generate dashes with slightly longer and longer solid segments, but for most plots it will be impossible to tell which is which.
These points depend on things like size and density, so it's possible to make a custom plot that works. But I don't think it's a great approach to defaults.
If somebody wants to add more styles to the 6 default styles:
dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]
sns.relplot(..., dashes=dash_styles, ...)
Each style tuple must have an even number of elements (segment, gap pairs).
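The even-length rule follows from how matplotlib reads a dash sequence: alternating "on" and "off" lengths. Here is a rough sketch of a sanity check before handing the list to seaborn. check_dash_styles and the column names A..H are hypothetical, made up for illustration; nothing like it is claimed to exist in seaborn.

```python
dash_styles = ["",
               (4, 1.5),
               (1, 1),
               (3, 1, 1.5, 1),
               (5, 1, 1, 1),
               (5, 1, 2, 1, 2, 1),
               (2, 2, 3, 1.5),
               (1, 2.5, 3, 1.2)]


def check_dash_styles(styles):
    """Hypothetical helper: return the indices of entries with an odd length.

    An empty entry ("") stands for a solid line and counts as valid.
    """
    return [i for i, style in enumerate(styles) if len(style) % 2 != 0]


bad = check_dash_styles(dash_styles)

# Assumed column names; an explicit level -> style mapping avoids
# relying on the order in which levels are matched to list entries
columns = list("ABCDEFGH")
dash_map = dict(zip(columns, dash_styles))
```

If bad comes back empty, either dash_styles or dash_map should be usable as the dashes argument, since dashes accepts a list or a dictionary.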
sns.scatterplot with a dataframe that has more than 6 columns of data has the same problem: it runs out of markers.
A workaround for 15 markers is to define
filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')
and set markers to filled_markers
sns.scatterplot(data=df, markers=filled_markers)
Source: https://matplotlib.org/2.0.2/api/markers_api.html
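A self-contained sketch of that workaround, in case it helps. The random 8-column DataFrame is made up for illustration. Slicing the tuple down to the number of columns is a defensive choice on my part, on the assumption that some seaborn versions complain or warn when the list length differs from the number of levels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use a non-interactive backend
import numpy as np
import pandas as pd
import seaborn as sns

filled_markers = ('o', 'v', '^', '<', '>', '8', 's', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

# Made-up wide-form data: 8 columns, so more than the 6 default markers
rs = np.random.RandomState(0)
df = pd.DataFrame(rs.randn(30, 8), columns=list("ABCDEFGH"))

# Take exactly one marker per column
markers = filled_markers[:df.shape[1]]

ax = sns.scatterplot(data=df, markers=markers)
```

All 15 markers are filled shapes, which matters because mixing filled and unfilled markers in one scatterplot can cause its own problems.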
I think the error message is also a little unintuitive. Perhaps it could be changed to something clearer, e.g.
err = "These `style` levels do not have defaults {}: {}"
instead of
err = "These `style` levels are missing {}: {}"
At first, I thought the categorical column values were missing dashes in the strings, which was quite confusing.
An alternative would be to wrap around the dash_styles (e.g. index 6 becomes 0, 7 becomes 1, etc.), as long as a UserWarning is given.
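To make the wrap-around idea concrete, here is a rough sketch of what it could look like. cycled_dashes is a hypothetical helper, not seaborn code, and DEFAULT_DASHES is assumed to be the first six entries of the dash_styles list posted earlier in this thread.

```python
import itertools
import warnings

# Assumption: the six defaults are the first six entries of the list posted above
DEFAULT_DASHES = ("", (4, 1.5), (1, 1), (3, 1, 1.5, 1), (5, 1, 1, 1), (5, 1, 2, 1, 2, 1))


def cycled_dashes(levels, defaults=DEFAULT_DASHES):
    """Hypothetical helper: map each level to a dash style, reusing styles
    from the start of the list once the defaults run out."""
    levels = list(levels)
    if len(levels) > len(defaults):
        warnings.warn(
            "{} style levels but only {} dash styles; styles will repeat".format(
                len(levels), len(defaults)),
            UserWarning,
        )
    # itertools.cycle restarts at index 0 after the last default (6 -> 0, 7 -> 1, ...)
    return dict(zip(levels, itertools.cycle(defaults)))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dash_map = cycled_dashes("ABCDEFGH")
```

The resulting dict could be passed as dashes=dash_map, at the cost that "G" and "H" look like "A" and "B" unless hue also varies.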
I tried the filled_markers workaround from the scatterplot comment above, but it didn't work for lineplot:
ax = sns.lineplot(x="iteration", y="kappa",
hue="r_state", style=True, markers=filled_markers,
data=df)
produced:
[plot image not included]
Even worse,
ax = sns.lineplot(x="iteration", y="kappa",
hue="r_state", style="r_state", markers=filled_markers,
data=df)
causes:
ValueError Traceback (most recent call last)
<ipython-input-80-ee157b4ed38e> in <module>()
6 ax = sns.lineplot(x="iteration", y="kappa",
7 hue="r_state", style="r_state", markers=filled_markers,
----> 8 data=df)
9
10 plt.legend(labels=['random initialization {}'.format(i) for i in range(n_randstates)])
3 frames
/usr/local/lib/python3.6/dist-packages/seaborn/relational.py in style_to_attributes(self, levels, style, defaults, name)
307 if any(missing_levels):
308 err = "These `style` levels are missing {}: {}"
--> 309 raise ValueError(err.format(name, missing_levels))
310
311 return attrdict
ValueError: These `style` levels are missing dashes: {8, 9, 6, 7}
Any other ideas?
For sns.lineplot, I'm just dynamically disabling markers if there would be more than six; example:
ax = sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", markers=len(df['r_state'].drop_duplicates()) <= 6, data=df)
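Since the error earlier in the thread was raised for dashes, it may be worth switching dashes off under the same condition. A small sketch follows: style_flags is a hypothetical helper, and the limit of 6 is taken from the number of defaults mentioned in this thread.

```python
import pandas as pd


def style_flags(levels, limit=6):
    """Hypothetical helper: enable markers and dashes only when the number
    of distinct levels does not exceed the number of default styles."""
    n_levels = pd.Series(list(levels)).nunique()
    use_style = bool(n_levels <= limit)
    return {"markers": use_style, "dashes": use_style}


many = style_flags(range(10))      # 10 levels: both switched off
few = style_flags(["a", "b", "a"])  # 2 levels: both left on
```

It could then be used as sns.lineplot(x="iteration", y="kappa", hue="r_state", style="r_state", data=df, **style_flags(df["r_state"])).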