I've noticed a significant drop in data.table performance after upgrading to R 3.5.
For example:
library(data.table)
set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
obs <- 4e7
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
                  type = as.factor(sample(types, obs, replace = TRUE)))
microbenchmark::microbenchmark(
  test1 <- dt1[, list(percent_total = sum(percent)), by = type]
, times = 30)
On R 3.5 this gives me:
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 522.53 539.668 576.2352 572.8301 602.0069 677.892 30
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.5.0 tools_3.5.0
But on 3.4.3 it's ~20% faster:
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 415.5938 445.8137 477.6562 470.0536 499.8974 576.9203 30
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.4.3 tools_3.4.3
Thank you for reporting. Confirmed on latest devel.
A smaller example:
library(data.table)
set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
obs <- 4e7
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
type = as.factor(sample(types, obs, replace = TRUE)))
system.time(dt1[, list(percent_total = sum(percent)), by = type])
3.4.4 timings are same as in 3.4.3.
3.5.1 timings are same as in 3.5.0.
I tested the latest devel on 3.5.1 and 3.4.4 and there is no performance difference anymore. It was likely fixed by the recent rework of many C functions. Please re-open if the issue is still valid for you on the latest devel.
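For reference, a minimal sketch of installing the development version to re-test, assuming the remotes package is available (the data.table project may recommend a different devel source):
# Install the devel version from GitHub to re-test
# (assumes 'remotes' is installed; the project may point to another devel repo)
remotes::install_github("Rdatatable/data.table")
packageVersion("data.table")  # confirm which version is installed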
@jangorecki
Unfortunately, the issue is still valid. After rerunning tests (with 4e6 observations) I get:
#Platform: x86_64-pc-linux-gnu (64-bit)
#R3.5.1
#1.11.8
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 50.22021 53.60446 56.84797 54.67532 57.61063 118.4352 100
#R3.4.4
#1.11.8
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 43.01133 44.16637 46.24535 44.63486 44.84428 89.37498 100
And 1.12.0 gives me a ~2x time increase on both R versions (I think this is due to #3395):
#1.12.0
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 90.60795 93.11161 119.436 109.1086 146.69 178.8412 30
It looks like I don't have permissions to re-open this issue. Please re-open it if you can confirm my tests.
@s-Nick-s I think it is related to the number of threads (as in the issue you linked). Below is R 3.5.2 vs R 3.4.4 using only a single thread in both cases, with data.table 1.12.0.
Please confirm that setting threads to 1 makes R 3.5+ faster than R 3.4.
library(data.table)
set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
setDTthreads(1L)
obs <- 4e6
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
type = as.factor(sample(types, obs, replace = TRUE)))
system.time(dt1[, list(percent_total = sum(percent)), by = type])
microbenchmark::microbenchmark(
test1 <- dt1[, list(percent_total = sum(percent)), by = type]
, times = 30)
3.5.2
user system elapsed
0.095 0.012 0.107
> microbenchmark::microbenchmark(
+ test1 <- dt1[, list(percent_total = sum(percent)), by = type]
+ , times = 30)
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 85.36912 87.16382 99.1758 98.38073 105.0603 120.9579 30
3.4.4
user system elapsed
0.103 0.004 0.107
> microbenchmark::microbenchmark(
+ test1 <- dt1[, list(percent_total = sum(percent)), by = type]
+ , times = 30)
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 101.8106 106.1996 115.2929 114.8559 122.1709 136.3099 30
@jangorecki
For me, 1.12.0 with setDTthreads(1L) has about the same speed on both R versions.
I guess that means this issue is fixed in 1.12.0.
Thank you for testing this. I will re-test and confirm once the thread number issue is fixed.
Hi @jangorecki
I worked with @s-Nick-s a while back when he discovered the performance drop. I ran some tests recently, and although I find 1.12.+ to give the same or better times on R 3.5+ vs R 3.4.4, I do find that R 3.4.4 with data.table 1.11.8 is about 2x as fast on this example as any newer combination (any R version from 3.5 up to 4.0.0 with any data.table 1.12.x). I used setDTthreads(1L) throughout.
R3.4.4 + data.table 1.11.8
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 51.30958 55.41111 73.35613 59.93117 87.61779 154.0714 30
R3.6.3 + data.table 1.12.8
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 121.8025 135.3461 178.0072 178.2529 195.0769 347.6791 30
If you have the time, please make a clean install of R 3.4.4 + data.table 1.11.8 and verify this.
Looking forward to your reply,
Cheers.
@RaynorJim Thanks for letting us know. I ran some benchmarks:
#docker run --rm -it r-base:3.4.4
#docker run --rm -it r-base:3.6.3
install.packages("microbenchmark", quiet=TRUE)
#install.packages("https://cloud.r-project.org/src/contrib/Archive/data.table/data.table_1.11.8.tar.gz", repos=NULL)
#install.packages("https://cloud.r-project.org/src/contrib/Archive/data.table/data.table_1.12.0.tar.gz", repos=NULL)
#install.packages("data.table", quiet=TRUE) # 1.12.8
options(width=999)
library(data.table)
set.seed(1)
types = c("A", "B", "C", "D", "E", "F")
#obs = 4e6
#obs = 4e7
dt1 = data.table(
percent = round(runif(obs, min = 0, max = 1), digits = 2),
type = as.factor(sample(types, obs, replace = TRUE))
)
setDTthreads(1L)
th1=microbenchmark::microbenchmark(th01 = dt1[, list(percent_total = sum(percent)), by = type], times = 10)
setDTthreads(40L)
th2=microbenchmark::microbenchmark(th40 = dt1[, list(percent_total = sum(percent)), by = type], times = 10)
rbind(th1,th2)
q("no")
I got the following timings (in milliseconds):
R 3.4.4, obs = 4e6
expr min lq mean median uq max neval
11.8 th01 62.63322 63.61023 69.46025 67.22793 70.69045 92.64278 30
11.8 th40 62.75540 66.81539 71.80807 69.78321 75.12829 87.32864 30
12.0 th01 162.46948 175.45990 192.31498 197.09693 204.19589 227.6725 30
12.0 th40 58.82906 67.18446 81.93788 73.72452 96.58676 120.0546 30
12.8 th01 166.99444 182.70240 193.66958 193.62505 208.8364 221.9188 30
12.8 th40 70.08485 73.37909 87.06192 75.40962 102.3513 129.4137 30
R 3.6.3, obs = 4e6
expr min lq mean median uq max neval
11.8 th01 70.20075 82.61568 84.79386 86.75024 88.96923 93.98895 30
11.8 th40 70.03898 72.06237 83.17097 86.10463 89.22889 106.69598 30
12.0 th01 169.67496 178.29791 187.48729 184.22019 196.46390 224.7092 30
12.0 th40 60.06581 65.27729 73.94895 70.57542 77.37153 106.6533 30
12.8 th01 183.24098 183.93141 196.07175 198.5182 206.24578 211.0278 30
12.8 th40 67.44557 71.16225 83.00839 75.6796 98.07114 110.1393 30
R 3.4.4, obs = 4e7
expr min lq mean median uq max neval
11.8 th01 812.8381 824.5753 835.4540 836.2424 847.0650 870.853 30
11.8 th40 813.1851 818.7180 832.1787 826.0476 848.4874 873.471 30
12.0 th01 2201.2274 2230.5826 2283.4230 2291.6118 2328.9514 2356.1045 30
12.0 th40 582.6014 619.0834 646.6066 647.6043 674.6852 726.8556 30
12.8 th01 2182.8077 2202.8348 2218.1768 2213.1107 2244.6785 2255.4742 10
12.8 th40 724.3372 729.4303 748.9661 736.9562 780.0272 796.9019 10
R 3.6.3, obs = 4e7
expr min lq mean median uq max neval
11.8 th01 920.6722 936.0939 946.9035 943.9847 956.2975 977.2512 10
11.8 th40 935.6252 944.1761 954.3911 950.5467 956.6025 995.5108 10
12.0 th01 2241.5981 2252.7387 2262.9659 2260.5077 2274.1610 2282.4533 10
12.0 th40 625.8483 639.3864 659.6355 661.8368 675.7402 689.2096 10
12.8 th01 2229.2616 2279.0100 2324.2365 2346.7960 2364.8414 2390.7510 10
12.8 th40 664.3308 699.2312 740.8568 747.6503 779.1357 782.5371 10
I will come back soon with a conclusion. I would appreciate it if you could briefly check whether these numbers roughly match yours.
One observation that is quite clear from the above timings is that single-threaded code is slower starting from 1.12.0 compared to 1.11.8; this is covered by #3330.
At the same time we can see that more threads don't speed up 1.11.8; this is because parallelism was not used much yet in that version, as the news entries for v1.12.0 show.
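For anyone reproducing these timings, a minimal sketch of making the thread setting explicit before each run (assuming a data.table version that exposes setDTthreads()/getDTthreads()):
library(data.table)
setDTthreads(1L)                        # force single-threaded grouping
getDTthreads()                          # should report 1
# ... run the single-threaded benchmark here, then raise the thread count:
setDTthreads(parallel::detectCores())   # use all logical CPUs reported by R
getDTthreads()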
Let's then focus on R 3.4.4 vs post-R 3.5, here R 3.6.3.
As we can see (obs = 4e6), there is a difference, but it is not really that big:
expr min lq mean median uq max neval
3.4.4 11.8 th01 62.63322 63.61023 69.46025 67.22793 70.69045 92.64278 30
3.6.3 11.8 th01 70.20075 82.61568 84.79386 86.75024 88.96923 93.98895 30
And with obs = 4e7:
expr min lq mean median uq max neval
3.4.4 11.8 th01 812.8381 824.5753 835.4540 836.2424 847.0650 870.853 30
3.6.3 11.8 th01 920.6722 936.0939 946.9035 943.9847 956.2975 977.2512 10
Now let's see if that is still valid in 1.12.8 (obs = 4e6); it doesn't seem so:
expr min lq mean median uq max neval
3.4.4 12.8 th01 166.99444 182.70240 193.66958 193.62505 208.8364 221.9188 30
3.6.3 12.8 th01 183.24098 183.93141 196.07175 198.5182 206.24578 211.0278 30
And with obs = 4e7:
expr min lq mean median uq max neval
3.4.4 12.8 th01 2182.8077 2202.8348 2218.1768 2213.1107 2244.6785 2255.4742 10
3.6.3 12.8 th01 2229.2616 2279.0100 2324.2365 2346.7960 2364.8414 2390.7510 10
If you are after maximizing single-threaded speed and you cannot use multiple threads, then going back to 1.11.8 is probably the best choice for you. You may also want to upvote and subscribe to #3330.
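For completeness, a minimal sketch of pinning the archived release, using the same CRAN archive URL as in the benchmark script above:
# Install the archived data.table 1.11.8 straight from the CRAN archive
install.packages(
  "https://cloud.r-project.org/src/contrib/Archive/data.table/data.table_1.11.8.tar.gz",
  repos = NULL, type = "source"
)
packageVersion("data.table")  # should report 1.11.8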
Closing this issue for now. @RaynorJim If you believe your case is not covered by #3330 please let us know, we can always re-open.
Apologies, I have been busy all day. I wasn't aware single-threaded performance is covered by #3330; I will watch that issue.
Indeed, I cannot move away from 1.11.8 because we run all our code on a parallel cluster and all our CPUs are already maxed out, so I am looking for as much single-threaded speed from data.table as possible.
Thanks again for checking this out, I appreciate it.