I think data.table should always know how many groups there are up front. I've got a computation where I'm applying a function by keyed groups in a data.table, and it takes a looooong time. It'd be nice to have a progress bar for `by` operations, so I could track the function's progress through the groups.
For example, if I split my data into a list of data.tables, I could use `pbapply::pblapply` to track its progress through each group (but I'd much rather use a single data.table and `by`!).
e.g., with a single keyed data.table:

```r
library(data.table)
x <- data.table(iris, key = 'Species')
x[, list(m = mean(Sepal.Width)), by = 'Species']  # progress bar here would go 0%, 33%, 66%, 100%
```
I imagine this would be easiest to implement as another message printed by `verbose = TRUE`, no?
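For comparison, here is a sketch of the split-then-loop workaround from the first paragraph. It splits with data.table's `split(x, by = ...)` and, to avoid the extra dependency, swaps in a base-R `txtProgressBar` loop as a stand-in for `pbapply::pblapply`:

```r
library(data.table)

x <- data.table(iris, key = "Species")

# One data.table per group; list names are the Species values
parts <- split(x, by = "Species")

# Base-R progress bar sized to the number of groups (stand-in for pblapply)
pb <- txtProgressBar(min = 0, max = length(parts), style = 3)
res <- vector("list", length(parts))
for (i in seq_along(parts)) {
  res[[i]] <- parts[[i]][, list(m = mean(Sepal.Width))]
  setTxtProgressBar(pb, i)  # advance after each group finishes
}
close(pb)

# Reattach the group labels as a column
names(res) <- names(parts)
out <- rbindlist(res, idcol = "Species")
out
```

This works, but it materializes every group as a separate data.table, which is exactly what a single grouped `x[, ..., by = 'Species']` call avoids.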
@MichaelChirico Makes sense.
Yes, that makes sense
I would love to have this, as I'm using data.table on a very large dataset with 7M rows and 1.6M keys.
@zachmayer You can implement your progress bar any way you want (with or without new lines), as answered in this SO question. Please close the issue if it works for you.
That's perfect! For future reference, here is the solution I'm using:
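```r
library(data.table)
dt = data.table(a=1:4, b=c("a","b"))
grpn = uniqueN(dt$b)  # total number of groups, used as the bar's maximum
pb <- txtProgressBar(min = 0, max = grpn, style = 3)
# .GRP is the current group's counter; Sys.sleep() stands in for slow work
dt[, {setTxtProgressBar(pb, .GRP); Sys.sleep(0.5); sum(a)}, b]
close(pb)
```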
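`uniqueN(dt$b)` only counts groups when `by` is a single column. A minimal sketch, assuming a hypothetical table with two grouping columns `g1` and `g2`, counts the distinct combinations with `uniqueN(dt, by = ...)` instead, so the bar's maximum still matches the final `.GRP`:

```r
library(data.table)

# Hypothetical example data with two grouping columns
dt <- data.table(a = 1:8, g1 = rep(c("x", "y"), each = 4), g2 = c("p", "q"))
by_cols <- c("g1", "g2")

# Count distinct (g1, g2) combinations, i.e. the number of groups `by` will produce
grpn <- uniqueN(dt, by = by_cols)

pb <- txtProgressBar(min = 0, max = grpn, style = 3)
res <- dt[, {setTxtProgressBar(pb, .GRP); list(s = sum(a))}, by = by_cols]
close(pb)
res
```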
To use @zachmayer's method with `:=`:

```r
library(data.table)
dt = data.table(a=1:4, b=c("a","b"))
grpn = uniqueN(dt$b)
pb <- txtProgressBar(min = 0, max = grpn, style = 3)
dt[, x:={setTxtProgressBar(pb, .GRP); Sys.sleep(0.5); sum(a)}, b]
close(pb)
dt
```
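With very many groups (e.g. the 1.6M keys mentioned above), redrawing the bar on every group may itself add noticeable overhead. The sketch below wraps the `:=` version in a hypothetical helper `sum_by_with_progress()` with a hypothetical `every` parameter: it only redraws the bar every `every` groups (and on the last one), and uses `on.exit()` so the bar is closed even if `j` throws an error:

```r
library(data.table)

# Hypothetical helper: adds column `s` = sum(a) per group, by reference,
# with a throttled text progress bar.
sum_by_with_progress <- function(dt, by_col, every = 1000L) {
  grpn <- uniqueN(dt, by = by_col)
  pb <- txtProgressBar(min = 0, max = grpn, style = 3)
  on.exit(close(pb))  # runs even if the grouped call below fails
  dt[, s := {
    # only redraw every `every` groups, plus once on the final group
    if (.GRP %% every == 0L || .GRP == grpn) setTxtProgressBar(pb, .GRP)
    sum(a)
  }, by = by_col]
  invisible(dt)
}

dt <- data.table(a = 1:10000, b = rep(1:2500, each = 4))
sum_by_with_progress(dt, "b", every = 500L)
head(dt)
```

Because `:=` modifies `dt` by reference, the caller's table gains the `s` column without reassigning the result.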