Sparklyr: mutate_each: Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

Created on 16 Aug 2016 · 4 Comments · Source: sparklyr/sparklyr

Does this error mean that the dataset is too big? Is there a way to make it work by setting options(expressions=), and if so, how?

> library(sparklyr)
> spark_disconnect(sc)
> sc <- spark_connect("local", version = "2.0.0")
> 
> mat_1000x1000 <- matrix(runif(1*(10^6)), ncol = 1000, nrow = 1000)
> mat_1000x1001 <- cbind(mat_1000x1000, 1)
> colnames(mat_1000x1001) <- paste0("topic", 1:1001)
> mat_1000x1001 <- as.data.frame(mat_1000x1001)
> mat_1000x1001_tbl <- copy_to(sc, mat_1000x1001, "mat_1000x1001", overwrite = TRUE)
> 
> library(dplyr)
> mat_1000x1001_tbl %>%
+   mutate_each(funs(norm = . / topic1001)) %>%
+   select(contains("norm")) %>%
+   ml_kmeans(centers=3) -> km_model
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?

Session Info

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C               LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8    
 [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=pl_PL.UTF-8    LC_PAPER=pl_PL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0    ggplot2_2.1.0  sparklyr_0.3.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      withr_1.0.2      digest_0.6.9     rprojroot_1.0-2  assertthat_0.1   rappdirs_0.3.1  
 [7] grid_3.3.1       R6_2.1.2         plyr_1.8.4       gtable_0.2.0     DBI_0.4-1        magrittr_1.5    
[13] scales_0.4.0     lazyeval_0.2.0   labeling_0.3     config_0.1.0     tools_3.3.1      munsell_0.4.3   
[19] parallel_3.3.1   yaml_2.1.13      colorspace_1.2-6 tibble_1.1      

All 4 comments

The error here is occurring due to the use of mutate_each; at least, attempting to evaluate this reproduces the error for me:

mat_1000x1001_tbl %>%
  mutate_each(funs(norm = . / topic1001))
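On the options(expressions=) part of the question: the sketch below shows how that option is set and restored in base R. The value 50000 is an arbitrary illustration, not a recommendation from this thread, and nothing here confirms that raising it fixes the mutate_each failure; a larger limit may only move the failure to a C stack error.

```r
# `expressions` caps how deeply R will nest evaluations.
# R documents valid values from 25 to 500000, with a default of 5000.
old <- options(expressions = 50000)  # options() returns the previous settings

current_limit <- getOption("expressions")

# ... rerun the failing pipeline here ...

# Restore the previous limit afterwards.
options(old)
```

Saving the return value of options() and passing it back restores the earlier setting without hard-coding it.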

This makes munging large data sets (the kind one needs Spark for) hard to deal with.

It does seem related to the number of columns: the problem does not arise with a small number of columns.

testdf <- data.frame(a1 = rnorm(1e5), a2 = rnorm(1e5))
testdf_tbl <- copy_to(sc, testdf)
testdf_tbl <- copy_to(sc, testdf, overwrite = TRUE)
testdf_tbl %>% mutate_all(funs(sign(.))) %>% head()
Source:   query [?? x 2]
Database: spark connection master=local[24] app=sparklyr local=TRUE

     a1    a2
  <dbl> <dbl>
1     1     1
2     1    -1
3     1     1
4     1    -1
5     1     1
6     1    -1

This works.

But a table with 10 rows and 1000 columns fails:

testmat<-matrix(runif(10*1000), ncol=1000) 
testdf <- as.data.frame(testmat)
testdf_tbl <- copy_to(sc, testdf, overwrite = TRUE)
testdf_tbl %>% mutate_all(funs(sign(.))) %>% head()
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
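The contrast above (100,000 rows succeed, 1,000 columns fail) fits an error about evaluation depth rather than data volume. The base R sketch below triggers a nesting error without any Spark connection. It assumes the sparklyr failure comes from R-side recursion over many column expressions, which this thread does not confirm, and nest_depth is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: recurses n levels deep and returns n.
nest_depth <- function(n) if (n <= 0) 0 else 1 + nest_depth(n - 1)

# A shallow call evaluates normally.
shallow <- nest_depth(10)

# A very deep call exceeds R's nesting limit and raises an error.
# The exact message can vary (expression nesting or C stack usage),
# so it is captured as text rather than assumed.
deep <- tryCatch(
  nest_depth(1e5),
  error = function(e) conditionMessage(e)
)
print(deep)
```

No data set is involved at all here, which is the point: the limit is about how deeply calls are nested, not about how many rows are processed.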

This is still the case.

This finally seems to be fixed in Spark 2.2.0:

> mat_1000x1001_tbl %>%
+   mutate_each(funs(norm = . / topic1001)) %>%
+   select(contains("norm")) %>%
+   ml_kmeans(centers=3) -> km_model
`mutate_each()` is deprecated.
Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
To map `funs` over all variables, use `mutate_all()`
* No rows dropped by 'na.omit' call
> km_model
K-means clustering with 3 clusters

Cluster centers:
  topic1_norm topic2_norm topic3_norm topic4_norm topic5_norm topic6_norm topic7_norm topic8_norm topic9_norm
  topic10_norm topic11_norm topic12_norm topic13_norm topic14_norm topic15_norm topic16_norm topic17_norm
  topic18_norm topic19_norm topic20_norm topic21_norm topic22_norm topic23_norm topic24_norm topic25_norm
  topic26_norm topic27_norm topic28_norm topic29_norm topic30_norm topic31_norm topic32_norm topic33_norm
  topic34_norm topic35_norm topic36_norm topic37_norm topic38_norm topic39_norm topic40_norm topic41_norm
  topic42_norm topic43_norm topic44_norm topic45_norm topic46_norm topic47_norm topic48_norm topic49_norm
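The deprecation message in the output above points away from mutate_each(). As a minimal sketch on a local data frame, the same per-column normalization can be written with across(), which requires dplyr 1.0.0 or later. The toy data frame df is invented for illustration, and whether this form translates on a Spark table depends on the sparklyr and dbplyr versions in use, so treat it as an assumption there.

```r
library(dplyr)

# Toy local data standing in for the 1001-column table.
df <- data.frame(
  topic1    = c(2, 4),
  topic2    = c(6, 8),
  topic1001 = c(2, 2)
)

# Divide every column by topic1001, writing results to new *_norm columns
# so the original columns (including topic1001) stay unchanged.
out <- df %>%
  mutate(across(everything(), ~ .x / topic1001, .names = "{.col}_norm")) %>%
  select(contains("norm"))

print(out)
```

Using .names keeps the divisor column intact while the other columns are processed, mirroring the norm suffix that funs(norm = ...) produced.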