It seems this is a regression in the latest version (data.table_1.9.8), as it worked before.
library(data.table)
test <- data.table(V1=factor(rep(c("a","b"), 10), levels=c("a", "b"), ordered=TRUE),
V2=rep(c("c","d", "e", "f"), 5))
test[,min(V1)] # (1)
test[,min(V1),by=V2] # (2)
(1) works correctly, but (2) returns an error:
Error in gmin(V1) : min is not meaningful for factors.
(1) dispatches to base R. (2) dispatches to GForce grouping.
(You can pass verbose=TRUE to queries to get more insight.)
GForce was changed here (a1b1c083ab9e12d9dcb49b05d7053592f0409ae2) but I don't see any notes in NEWS or any tests added.
R's min treats ordered factors and non-ordered factors differently, as you nicely showed.
R --vanilla
> x = factor(letters)
> min(x)
Error in Summary.factor(1:26, na.rm = FALSE) :
‘min’ not meaningful for factors
> x = factor(letters, ordered=TRUE)
> min(x)
[1] a
26 Levels: a < b < c < d < e < f < g < h < i < j < k < l < m < n < o < ... < z
Yes I guess we should be in line with base R in this regard. Thanks for highlighting.
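For anyone who wants to see the dispatch difference for themselves, here is a minimal sketch that reuses the `test` table from the original report and passes `verbose=TRUE` to both queries. The `tryCatch` wrapper is my own addition so the script keeps running on versions where the grouped call still errors; the exact verbose text varies between data.table versions, so none is reproduced here.

```r
library(data.table)

# Same data as in the original report: an ordered factor plus a grouping column.
test <- data.table(
  V1 = factor(rep(c("a", "b"), 10), levels = c("a", "b"), ordered = TRUE),
  V2 = rep(c("c", "d", "e", "f"), 5)
)

# (1) Ungrouped query: there is no grouping for GForce to optimize,
# so base R's min() handles the ordered factor.
test[, min(V1), verbose = TRUE]

# (2) Grouped query: the verbose output reports whether j was GForce-optimized.
# On affected versions this call fails with the gmin error quoted above,
# so the error message is captured instead of stopping the script.
res <- tryCatch(
  test[, min(V1), by = V2, verbose = TRUE],
  error = function(e) conditionMessage(e)
)
res
```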
Is there a status update on data.table's support for summarizing ordered factors? Or will it not be implemented? Thanks.
@mbacou as luck would have it I think the fix was trivial (#2944). Thanks for the impetus!!
In any case, you can always set options(datatable.optimize = 0) to prevent GForce from failing on ordered factors; it will just be slower.
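A rough sketch of that workaround, assuming the `test` table from the original report. Saving and restoring the previous option value is my own habit rather than something the workaround requires; it just keeps the slowdown limited to this one query.

```r
library(data.table)

# Same data as in the original report.
test <- data.table(
  V1 = factor(rep(c("a", "b"), 10), levels = c("a", "b"), ordered = TRUE),
  V2 = rep(c("c", "d", "e", "f"), 5)
)

# Turn off data.table's query optimization (which includes GForce) and keep
# the previous setting so it can be restored afterwards.
old <- options(datatable.optimize = 0)

# With optimization off, min() is evaluated per group by base R,
# which accepts ordered factors.
res <- test[, min(V1), by = V2]

# Restore the previous optimization level.
options(old)

res
```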
Excellent thanks! Was surprised to see that feature work well in e.g. PostgreSQL and not in data.table. Glad that was an easy fix.
If I had to venture a guess, it's only survived so long because ordered factors are almost criminally underused as a data type (myself included) :)