Sparklyr: OutOfMemoryError

Created on 12 Dec 2016 · 3 comments · Source: sparklyr/sparklyr

Loading a small dataset works fine, e.g. iris_tbl <- copy_to(sc, iris), but loading a slightly larger one throws an OutOfMemoryError, even though the configured memory should be more than enough.

Configuration (1 master + 3 workers):

config <- spark_config()
config$spark.executor.memory <- "16G"
config$spark.driver.memory <- "16G"

sparkmaster <- Sys.getenv("sparkmaster")
sc <- spark_connect(master = sparkmaster, config = config)
> babynames_tbl <- copy_to(sc, babynames, "babynames")
|================================================================================| 100%   65 MB
Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Double.valueOf(Double.java:521)
        at scala.runtime.BoxesRunTime.boxToDouble(BoxesRunTime.java:79)
        at sparklyr.Utils$$anonfun$5$$anonfun$6.apply(utils.scala:224)
        at sparklyr.Utils$$anonfun$5$$anonfun$6.apply(utils.scala:218)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofInt.foreach(ArrayOps.scala:234)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofInt.map(ArrayOps.scala:234)
        at sparklyr.Utils$$anonfun$5.apply(utils.scala:218)
        at sparklyr.Utils$$anonfun$5.apply(utils.scala:216)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at sparklyr.Utils$.createDataFrameFromText(utils.scala:216)
        at sparklyr.Utils.createDataFrameFromText(utils.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sparklyr.Invoke$.invoke(invoke.scala:94)
        at sparklyr.StreamHandler$.handleMethodCall(stream.scala:89)
        at sparklyr.StreamHandler$.read(stream.scala:55)
        at sparklyr.BackendHandler.channelRead0(handler.scala:49)
        at sparklyr.BackendHandler.channelRead0(handler.scala:14)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
> 
(Screenshot of Spark settings attached.)
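
One way to check whether those values actually reached the cluster (an editorial suggestion, not part of the original report; it assumes the sc connection above and sparklyr's spark_context_config() helper):

# Print the configuration the running Spark context actually received.
# If spark.driver.memory is missing or smaller than expected here, the
# setting never reached the driver JVM.
spark_context_config(sc)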

All 3 comments

@eijoac you can try executing the same operation after setting the following additional parameters before creating the Spark context object.

config <- spark_config()
config$`sparklyr.shell.driver-memory` <- "4G"
config$`sparklyr.shell.executor-memory` <- "4G"
config$spark.yarn.executor.memoryOverhead <- "1g"
sc <- spark_connect(master = "local", config = config)

It works for me.

Thanks. Adding sparklyr.shell.driver-memory and sparklyr.shell.executor-memory worked (in my case, a standalone cluster). I wonder why that worked. Does copy_to go through the Spark shell?
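
A plausible explanation (editorial note, not part of the original thread): entries prefixed with sparklyr.shell. are turned into spark-submit command-line flags, so they are in effect when the driver JVM launches, whereas spark.driver.memory set only in the config can arrive after the driver is already running and be ignored. Roughly:

config <- spark_config()
# Becomes the --driver-memory flag on spark-submit, so the limit is in
# place before the driver JVM starts rather than applied afterwards.
config$`sparklyr.shell.driver-memory` <- "4G"
sc <- spark_connect(master = "local", config = config)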

copy_to does use memory proportional to the object being copied. Glad the right settings worked in this case.
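
For datasets of this size and larger, a common alternative is to skip copy_to entirely and let Spark read the file itself. A minimal sketch (editorial, not from the original thread; the /tmp path is hypothetical):

library(sparklyr)
library(babynames)

# Write the data to disk, then have Spark read it directly; this avoids
# serializing the whole object through the R driver as copy_to does.
write.csv(babynames, "/tmp/babynames.csv", row.names = FALSE)
babynames_tbl <- spark_read_csv(sc, name = "babynames",
                                path = "/tmp/babynames.csv")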
