I'm making categorical histograms (with and without z and histfunc: sum). I'd like to be able to sort these histograms in descending by value order without using transforms.
Maybe this would fit under categoryorder: 'value ascending' and categoryorder: 'value descending'? https://github.com/plotly/plotly.js/pull/419/files#diff-0e41c2e162564438ff091d0ed6b5b455R472
I suppose similarly with stacked bar charts. in this case, it'd be nice to keep the groups (grouping the bars by color) but sort within each group. the sort could be the sum of all bars _or_ the bars within that trace.

Maybe this would fit under
categoryorder: 'value ascending'andcategoryorder: 'value descending'
It _could_, but maybe we want this sorting-by-value feature for non-category axes also?
What would it mean to sort a non-category (ie continuous) axis?
maybe we want this sorting-by-value feature for non-category axes also
I can't think of any use case for this off the top of my head
In the presence of multiple traces, one should be able to specify how to aggregate the values (sum, mean, max, median, etc...). This will require a new attribute. @nicolaskruchten can you remind me the name you suggested for this attribute.
I don't think I had a good name for it :) categoryorderfunc ?
Please note that this will need to apply not only across traces but across all values of the same category (i.e. you can have multiple data rows at the same category within a trace too!)
In branch sort-by-value (compare to master), I use the values obtained from calc to create new sorted categorical axes that are then used to perform a second calc operation.
Unfortunately, running calc again is not optimal, but it is necessary at the moment since the calc routine is sometimes coupled to the axis object (for example in histogram, it is used for the autobinning logic). Note that this second calc call will only happen if categoryorder is "value (de|a)scending".
cc @plotly/plotly_js, @alexcjohnson
Very nicely done @antoinerg
I don't think we can implement this w/o calling _module.calc twice, so :ok_hand: . We should make sure though that we _only_ call _module.calc twice on subplots with categoryorder: value (de|a)scending" axes (hint: try using gd._fullLayout.xaxis._traceIndices).
Looks like the trickiest part of this PR now becomes this block:
for(var k = 0; k < cd.length; k++) {
if(type === 'scatter') {
categoriesValue[cd[k].x][1] += cd[k].y;
} else if(type === 'histogram') {
categoriesValue[cd[k].p][1] += cd[k].s;
}
}
which will be hard to generalize. Note that categoryorder works also on polar, carpet and gl3d (which doesn't use gd.calcdata) axes. Note also that "key" in questions depends not only on the trace type but also on the axis letter. For example,
if(type === 'scatter') {
categoriesValue[cd[k].x][1] += cd[k].y;
}
works when the x-axis has a set categoryorder, but we should have:
if(type === 'scatter') {
categoriesValue[cd[k].y][1] += cd[k].x;
}
for y-axes with categoryorder.
Quick question: for horizontal bars, when ordering categories by ascending value, should we have the biggest value at the top or the bottom? In the figure below, the biggest value is at the top. I think I prefer the opposite but I'm not completely sure either. What do you think?

should we have the biggest value at the top or the bottom?
You could say the same about category (ascending|descending) - ascending will put a at the bottom and e at the top so they're actually in reverse alphabetical order. In fact you could even make that argument about the order based on data in the trace.
I guess with a categorical Y axis you generally do read the graph from top to bottom, as opposed to numerical axes that you almost always read bottom to top. But making that the default behavior I think has much more extensive consequences than just choosing it for the new feature. So unless we're prepared to alter the rest of that behavior, I think ascending has to put the biggest at the top.
Looks like the trickiest part of this PR now becomes this block:
for(var k = 0; k < cd.length; k++) { if(type === 'scatter') { categoriesValue[cd[k].x][1] += cd[k].y; } else if(type === 'histogram') { categoriesValue[cd[k].p][1] += cd[k].s; } }which will be hard to generalize.
Yes, indeed. What trace types should initially be supported?
@nicolaskruchten What aggregation do we want to support initially? Would sum, min and max be sufficient?
Sum/min/max is a fine start for this release, yes!
Adding avg can be done in one pass also though no?
Re trace types I would say bar and histogram are quite important
What trace types should initially be supported?
ALL 2d cartesian please.
ALL 2d cartesian please.
Ok, then I will need to inspect the output of calc data for each traces.
Looks like the trickiest part of this PR now becomes this block:
for(var k = 0; k < cd.length; k++) { if(type === 'scatter') { categoriesValue[cd[k].x][1] += cd[k].y; } else if(type === 'histogram') { categoriesValue[cd[k].p][1] += cd[k].s; } }which will be hard to generalize.
To support all traces, we could either have a long list of else if in src/plots/plots.js or we could also add a new field with a standardized name to each trace's calcdata.
To be continued
add a new field with a standardized name to each trace's calcdata.
sounds like the winner to me! To be computed & added only when necessary.
sounds like the winner to me! To be computed & added only when necessary.
Good, I also think it's a winner. histogram already has x and y and all traces on cartesian axes should probably have them. I could simply remove the distinction in commit https://github.com/plotly/plotly.js/commit/d53e52cb78078de433ebaa786e8d9a806846a0a1 and it just works.
I now need to write tests to ensure all cartesian traces have x y in their calcdata and compute/add it when necessary in the ones that don't.
Thanks @alexcjohnson for pointing me in the right direction!
Ok, I made good progress in branch sort-by-value (compare to master), all 2d cartesian traces work except for:
scatterglcontourcontourcarpetheatmap which work but is buggy because of issue https://github.com/plotly/plotly.js/issues/1097To be continued
Awesome work @antoinerg !
Would you be interested in trying to fix #1097 at some point this week? Sounds like fixing that bug and adding some custom logic for scattergl should be enough to :white_check_mark: this feature. I'm not sure if contourcarpet traces even work on category axes, I'm not sure if they _should_ work either :smirk:
I'm not sure if
contourcarpettraces even work on category axes, I'm not sure if they _should_ work either
Ok good to know. They are indeed a bit different and I wasn't sure how to tackle them on Friday :) I'll focus on scattergl and fixing #1097!
Note: if we add median as a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)
Does the current design leave room for a future way of saying "sort by the values of trace X" ?
Does the current design leave room for a future way of saying "sort by the values of trace X" ?
How "future"? Does the data/traces split count? 馃槈
less future than that :) ... I guess if we added "first" and "last" and expected people to reorder their traces to match that'd be a workaround. But it would be nice to maybe sort by the nth?
Does the current design leave room for a future way of saying "sort by the values of trace X" ?
Thank you for the comment @nicolaskruchten. It will be easy to include in the code: when collecting values associated with each category, we could loop only over the traces we care about at this line: https://github.com/plotly/plotly.js/compare/sort-by-value#diff-ad4f76ccd6044ed16514297078e13b84R2855.
As for how it should be specified via attributes, I am not sure yet. In the end, we need to deduce the indices of the traces we want to consider. So maybe an array of trace indices? :thinking:
Just leaving a note here: I assume that this sorting will take into account matching axes, right? I.e. if I have two traces on separate subplots where one's axis matches the other, sorting will apply to both and take both into account?
question re sorting within/across traces...
For a chart like this:

what should sort by "max" do? the height of the blues implies "p2, p1" but the maximum individual bars ignoring trace-membership implies "p1,p2"
Note: if we add
medianas a possible sort ordering we get stuff like Ridgeplots almost for free, which is really really nice :)
As discussed in private, adding support for median or say mean raises the following question in the presence of multiple traces: do you want the mean of the means or the mean of all values. We probably will need to support both.
wanted to braindump part of a conversation I had with @antoinerg here. It's not critical to resolve this for 1.48 but I want to make sure we don't lock ourselves out of a more complete API.
It seems that in categoryorder = "value (a|de)scending" mode there are at least 3 extra pieces of information we need to specify the flavours of ordering I can imagine:
The motivation for 1 is that I might want to sort the axis by a single trace, and we'd likely want to specify a way of breaking ties, so in extremis we'd want to allow an exhaustive ordered list, and we'd also want to specify "no among-trace order preference".
The motivation for 2 is that in the case of grouped bars at least, I might want to sort by the height of the maximum stack, rather than the maximum single element.
The motivation for 3 is pretty clear: we need to aggregate somehow.
Taking just the case of 3 and 2 together, we might imagine an attr that takes two functions that interact such that my max-stack idea would be "max sum". In that case, the options that @antoinerg already implemented already work nicely as "max max", "min min" and "sum sum". It would also allow us to do something like "mean median" to order grouped box-plots in a particular way. Not all options make sense, clearly, such as "min max" or "mean mean" or "median median" so we could imagine a world where we enumerate the options perhaps.
So in terms of a half-baked attempt at an API, we could accept two attrs: a composite aggregation function that accepts "sum", "max sum" and friends (default=sum?), and a trace-scope array that defaults to [] meaning "no preference" but accepts an ordered list of ... trace.uids? trace indices?
I'll leave the comment above as-is but upon further reflection, I think that point 1 is maybe interesting but doesn't allow you to sort by 'trace 1 with ties broken by trace 3'. For that we need a modification on point 3.
Elaborating: sort by max of per-trace sum of values of trace 1 is fine, but sort by max of per-trace sum of values of traces 1 and 3 isn't the same as sort by the sum of the values of trace 1, breaking ties with the sum of values of trace 3 so simply adding a trace-scoping operator won't be enough.
Maybe the 'outer' operator needs to be able to say not only sum or max or whatever but also [trace 1, trace 3] so you could say sort by [select trace 1, trace 3] of sum of values.
@antoinerg can you open a new issue about up-coming category* improvements?
Most helpful comment
wanted to braindump part of a conversation I had with @antoinerg here. It's not critical to resolve this for 1.48 but I want to make sure we don't lock ourselves out of a more complete API.
It seems that in
categoryorder = "value (a|de)scending"mode there are at least 3 extra pieces of information we need to specify the flavours of ordering I can imagine:The motivation for 1 is that I might want to sort the axis by a single trace, and we'd likely want to specify a way of breaking ties, so in extremis we'd want to allow an exhaustive ordered list, and we'd also want to specify "no among-trace order preference".
The motivation for 2 is that in the case of grouped bars at least, I might want to sort by the height of the maximum stack, rather than the maximum single element.
The motivation for 3 is pretty clear: we need to aggregate somehow.
Taking just the case of 3 and 2 together, we might imagine an attr that takes two functions that interact such that my max-stack idea would be "max sum". In that case, the options that @antoinerg already implemented already work nicely as "max max", "min min" and "sum sum". It would also allow us to do something like "mean median" to order grouped box-plots in a particular way. Not all options make sense, clearly, such as "min max" or "mean mean" or "median median" so we could imagine a world where we enumerate the options perhaps.
So in terms of a half-baked attempt at an API, we could accept two attrs: a composite aggregation function that accepts
"sum","max sum"and friends (default=sum?), and a trace-scope array that defaults to[]meaning "no preference" but accepts an ordered list of ...trace.uids? trace indices?