Previously discussed (some lists are from @chriddyp):
A groupby transform should split a trace into separate traces, one per unique value or bin of the groupby dimension. Example:
groupby: ['a', 'a', 'b', 'b']
x: [1, 2, 1, 2]
y: [10, 20, 30, 40]
should generate two traces:
trace 1:
x: [1, 2]
y: [10, 20]
trace 2:
x: [1, 2]
y: [30, 40]
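A minimal sketch of that split, for x and y only. `splitByGroup` is a hypothetical helper written for illustration, not a plotly.js function; with the labels `['a', 'a', 'b', 'b']` it reproduces the two traces listed above.

```javascript
// Hypothetical helper (not plotly.js code): split x/y by the unique
// values of `groups`, keeping groups in order of first appearance.
function splitByGroup(trace, groups) {
    var order = [];   // group names, in order of first appearance
    var byName = {};  // group name -> new trace

    groups.forEach(function(name, i) {
        if (!Object.prototype.hasOwnProperty.call(byName, name)) {
            byName[name] = {name: String(name), x: [], y: []};
            order.push(name);
        }
        byName[name].x.push(trace.x[i]);
        byName[name].y.push(trace.y[i]);
    });

    return order.map(function(name) { return byName[name]; });
}

var traces = splitByGroup(
    {x: [1, 2, 1, 2], y: [10, 20, 30, 40]},
    ['a', 'a', 'b', 'b']
);
// traces[0] -> {name: 'a', x: [1, 2], y: [10, 20]}
// traces[1] -> {name: 'b', x: [1, 2], y: [30, 40]}
```

Binning of numeric or date values is left out here; the sketch only covers the unique-value case.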
groupby as a means of splitting spatially and/or aesthetically
datetime strings
datetime strings) in the groupby attribute, reusing logic of the preexisting plotly algorithm for histograms
Functional aspects:
groupby needs to work across numbers, dates, and categories (@chriddyp in the JS context, meaning strings, correct?)
groupby needs to split across all of the arrays or array-like specifications in a trace, not just x and y. For example, marker.color or marker.line.color. Not all array-like specifications in a trace are actual arrays (consider colorscale)
transform:
  groupby: ['a', 'b', 'a', 'b']
  marker:
    color:
      a: 'blue'
      b: 'red'
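To illustrate the "all arrays, not just x and y" requirement, here is a rough sketch that splits every listed attribute path, nested ones included. `getPath`, `setPath`, `splitArrays` and the hand-written `arrayPaths` list are all assumptions made for this example; plotly.js has its own internals for nested properties.

```javascript
// Hypothetical helpers for illustration only.
function getPath(obj, path) {
    return path.split('.').reduce(function(o, key) {
        return (o === undefined || o === null) ? undefined : o[key];
    }, obj);
}

function setPath(obj, path, value) {
    var keys = path.split('.');
    var last = keys.pop();
    var target = keys.reduce(function(o, key) {
        if (typeof o[key] !== 'object' || o[key] === null) o[key] = {};
        return o[key];
    }, obj);
    target[last] = value;
}

// Split every listed attribute that actually holds an array;
// scalar values are copied unchanged into each group.
function splitArrays(trace, groups, arrayPaths) {
    var out = {};
    var order = [];

    groups.forEach(function(name, i) {
        if (!Object.prototype.hasOwnProperty.call(out, name)) {
            out[name] = {};
            order.push(name);
        }
        arrayPaths.forEach(function(path) {
            var value = getPath(trace, path);
            if (Array.isArray(value)) {
                var bucket = getPath(out[name], path);
                if (!bucket) {
                    bucket = [];
                    setPath(out[name], path, bucket);
                }
                bucket.push(value[i]);
            } else if (value !== undefined) {
                setPath(out[name], path, value);
            }
        });
    });

    return order.map(function(name) { return out[name]; });
}

var splitResult = splitArrays(
    {
        x: [1, 2, 3, 4],
        y: [10, 20, 30, 40],
        marker: {color: ['red', 'green', 'blue', 'black'], size: 8}
    },
    ['a', 'b', 'a', 'b'],
    ['x', 'y', 'marker.color', 'marker.size']
);
// splitResult[0].marker -> {color: ['red', 'blue'], size: 8}
// splitResult[1].marker -> {color: ['green', 'black'], size: 8}
```

The list of paths is passed in on purpose: which attributes count as per-datum arrays is exactly the open question about attribute metadata.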
transforms and API. That's OK - transforms was made for groupby
groupby, and the related animation split use (see below), need to be in the JSON format for serializability, fitting in the current declarative structure
groupby must work in the restyle and relayout steps, not just the initial plot step
gd.data is expected to preserve the single trace and the groupby spec as the user supplied them, while _fullData has the individual (split) traces and no longer has the groupby attribute
_fullData back to groups or styles in data. Styling controls will be populated with the defaults from _fullData (e.g. _fullData[4].marker.color) but they'll need to update the attributes in the data object (e.g. data[0].transform.marker.color.d). That's because we serialize and save data, not _fullData.
Related PR, containing the initial, analogous filter work by @timelyportfolio: https://github.com/plotly/plotly.js/pull/859
groupby: https://github.com/plotly/plotly.js/blob/master/test/jasmine/assets/transforms/groupby.js
groupby coverage of the initial sprint
groupby such as x and y but not all at once - HOWEVER the preferred solution aims for generality, because other transforms (e.g. filter) will need a similar approach, and future arraylike attributes should be covered without code coupling to transformations. Consequence: we'll have to check whether there's enough attribute metadata to tell if an attribute is arraylike, or whether we need further metadata; also, whether there's a programmatic way of separating arraylike data (e.g. colorscale) that's not represented as an array at input; otherwise we need to handle them attribute by attribute. We'll have to come back to this topic after a first round of work.
x, y, marker.color, marker.size (scatter, bar, histogram, box)
lat, lon (maps), a, b, c (ternary), 'z' (scatter3d), error_y.array
It is expected that the trace separation (and transformations in general) is performed in the supply defaults step.
Instead of generating n different paths as described above, plotly would arrive at a temporal sequence of n frames
A quick update on progress:
As styling can be hierarchical, such as `{marker: {line: {color: "cyan"}}}`, and users already make a big investment learning these attributes, and since we also want to avoid property-by-property handling of styles (attribute metadata extension or manual additions), we agreed that the styling definitions for groups would look like normal trace attributes. Here's an example:
var mockData02 = [{
    mode: 'markers',
    x: [1, -1, -2, 0, 1, 2, 3],
    y: [0, 1, 2, 3, 4, 5, 6],
    transforms: [{
        type: 'groupby',
        groups: ['a', 'a', 'b', 'a', 'b', 'b', 'a'],
        styles: {
            a: {
                marker: {
                    color: "orange",
                    size: 20,
                    line: {
                        color: "red",
                        width: 1
                    }
                }
            },
            b: {
                // heterogeneous attributes are OK:
                // group "a" needn't define e.g. `mode` if defaults are alright
                mode: "markers+lines",
                marker: {
                    color: "cyan",
                    size: 15,
                    line: {
                        color: "purple",
                        width: 4
                    },
                    opacity: 0.5,
                    symbol: "triangle-up"
                },
                line: {
                    width: 1,
                    color: "purple"
                }
            }
        }
    }]
}];
This is what the result looks like. OK, it's decidedly outré, but it serves the point:

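One way to read the styles convention: after the split, each group's style object is merged recursively onto its trace, so only the attributes a group names are overridden. The sketch below assumes that reading; `applyStyle` and the pre-split `splitTraces` input are hypothetical and not taken from the plotly.js source.

```javascript
// Hypothetical sketch: recursively copy a group's style onto its split trace.
function applyStyle(trace, style) {
    Object.keys(style || {}).forEach(function(key) {
        var value = style[key];
        var isPlainObject = value !== null &&
            typeof value === 'object' &&
            !Array.isArray(value);

        if (isPlainObject) {
            // descend into containers such as `marker` or `marker.line`
            if (typeof trace[key] !== 'object' || trace[key] === null) {
                trace[key] = {};
            }
            applyStyle(trace[key], value);
        } else {
            trace[key] = value;
        }
    });
    return trace;
}

var groupStyles = {
    a: {marker: {color: 'orange', size: 20}},
    b: {mode: 'markers+lines', marker: {color: 'cyan'}}
};

// Assumed output of an earlier split step, one trace per group.
var splitTraces = [
    {name: 'a', mode: 'markers', x: [1, -1], y: [0, 1], marker: {opacity: 0.8}},
    {name: 'b', mode: 'markers', x: [-2], y: [2], marker: {opacity: 0.8}}
];

var styled = splitTraces.map(function(trace) {
    return applyStyle(trace, groupStyles[trace.name]);
});
// styled[0].marker -> {opacity: 0.8, color: 'orange', size: 20}
// styled[1].mode   -> 'markers+lines'
```

Because the merge is generic, heterogeneous style objects need no per-attribute handling: group "a" keeps the trace-level `mode`, while group "b" overrides it.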
The benefit of the solution is that
groupby)
Its drawback stems from the same properties:
As noted in the related PR, scatter traces can now have ids in addition to x and y data arrays, which can be very useful for these sorts of operations.
@etpinard @rreusser Here's another example, for these things:
var mockData03 = [{
    mode: 'markers',
    x: [1, -1, -2, 0, 1, 2, 3],
    y: [0, 1, 2, 3, 5, 4, 6],
    marker: {
        color: "darkred", // general "default" color
        line: {
            width: 8,
            // a general, not overridden array will be interpreted per group
            color: ["orange", "red", "green", "cyan"]
        }
    },
    transforms: [{
        type: 'groupby',
        groups: ['a', 'a', 'b', 'a', 'b', 'b', 'a'],
        styles: {
            a: {marker: {size: 30}, mode: "markers+lines"},
            b: {marker: {size: 15, color: "lightblue"}, mode: "markers+lines"} // override general color
        }
    }]
}];
Result:

I like it. Transforms in general are kinda free-form and extremely flexible, which means it's probably good to develop a set of conventions (like styles the way you've defined it) so that it's clear how to write a new transform that conforms to the conventions used in the rest of the transforms.
@monfera your API looks great.
I'd vote for transforms[i].style instead of transforms[i].styles as we like to keep plurals for Array containers.
One thing that we should attempt to handle better is the findArrayAttributes step. What we need to do is something similar to what Plotly.PlotSchema.get() does here where it looks for data_array and arrayOk attributes (which e.g. correctly skips over colorscale and domain) by looking into the fullData[i]._module.attributes
The more I think about it, the more I think finding the list of all data_array + arrayOk attributes in a given trace will be common to almost all transforms (including possible transforms written by community users). So I suggest we compute that list somewhere in plots.js and pass it as an argument to the transform methods here.
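A rough sketch of what computing that list could look like, assuming attribute definitions shaped like plot schema entries, with `valType` and `arrayOk` fields. The `attributes` object below is a hand-written stand-in rather than the real scatter attributes, and `findArrayAttributes` here is an illustrative function, not the existing step of the same name.

```javascript
// Illustrative only: walk an attribute tree and collect the paths of
// attributes that are data_array or arrayOk.
function findArrayAttributes(attributes, prefix) {
    var paths = [];

    Object.keys(attributes).forEach(function(key) {
        var def = attributes[key];
        if (def === null || typeof def !== 'object') return;

        var path = prefix ? prefix + '.' + key : key;

        if (def.valType === 'data_array' || def.arrayOk === true) {
            paths.push(path);
        } else if (def.valType === undefined) {
            // no valType: treat as a container (e.g. marker) and recurse
            paths = paths.concat(findArrayAttributes(def, path));
        }
    });

    return paths;
}

// Hand-written stand-in for a trace module's attribute definitions.
var attributes = {
    x: {valType: 'data_array'},
    y: {valType: 'data_array'},
    mode: {valType: 'flaglist'},
    marker: {
        color: {valType: 'color', arrayOk: true},
        size: {valType: 'number', arrayOk: true},
        colorscale: {valType: 'colorscale'},
        line: {
            color: {valType: 'color', arrayOk: true},
            width: {valType: 'number', arrayOk: true}
        }
    }
};

var arrayPaths = findArrayAttributes(attributes);
// -> ['x', 'y', 'marker.color', 'marker.size',
//     'marker.line.color', 'marker.line.width']
```

Note that `colorscale` is skipped because it is neither data_array nor arrayOk, which is the behaviour asked for above.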
@etpinard @rreusser Do I understand that anything that's data_array and arrayOk must split by group just like x and y now? I.e. is it the only condition? I'd have thought there are array attributes that represent some value extent [from, to] or whatever in an array such that they must not be split by groupby trace.
Assuming the answer is yes: I can probably write (or plug into) code that crawls the entire set of attributes and distinguishes between splittable and non-splittable arrays. But there's the issue that the attribute tree can differ by plot type, and according to other values. I'm concerned that some attribute locations in a mother-of-all JSON attribute dictionary will be group-splitting arrays under some circumstances and non-splitting arrays under others.
> Do I understand that anything that's data_array and arrayOk must split by group just like x and y now?
Yes. When an arrayOk attribute is set to an array, it should be interpreted as per-datum specifications (e.g. just like ids[i] that @rreusser mentioned earlier).
> But there's the issue that the attribute tree can differ by plot type, and according to other values
That's correct. The list of data_array + arrayOk attributes should be given per plot type.
@etpinard Awesome, thanks! With this answer, @rreusser's answers and your examples I feel there's enough nooks and crannies to continue rock climbing :-)
Climb on!

closed in https://github.com/plotly/plotly.js/pull/936 and https://github.com/plotly/plotly.js/pull/978