Sparklyr: Large Uploads

Created on 17 Aug 2017 · 4 Comments · Source: sparklyr/sparklyr

Milestone feature to support large file uploads in copy_to, both through the standard connections and through other methods such as Livy.

Related to:

featurerequest wishlist

All 4 comments

I can confirm, copy_to leads to error:

Failed to execute Livy statement with error: java.lang.ClassFormatError: Unknown constant tag 78 in class file $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$

Has anyone encountered this before? I am getting the error:

Error in livy_validate_http_response("Failed to create livy session",  : 
  Failed to create livy session (Client error: (400) Bad Request): "Unrecognized field \"rsc.rpc.max.size\" (class com.cloudera.livy.server.interactive.CreateInteractiveRequest), not marked as ignorable (15 known properties: \"executorCores\", \"conf\", \"driverMemory\", \"name\", \"driverCores\", \"pyFiles\", \"archives\", \"queue\", \"kind\", \"executorMemory\", \"files\", \"jars\", \"proxyUser\", \"numExecutors\", \"heartbeatTimeoutInSecond\" [truncated]])\n at [Source: HttpInputOverHTTP@6b405105; line: 1, column: 102] (through reference chain: com.cloudera.livy.server.interactive.CreateInteractiveRequest[\"rsc.rpc.max.size\"])"

after running the sequence of commands:

library(sparklyr)

config <- spark_config()
config[["livy.rsc.rpc.max.size"]] <- 104857600  # 100 * 1024^2
config[["spark.driver.memory"]] <- "2G"
config[["spark.executor.memory"]] <- "1G"
livy_conf <- livy_config(config = config)

sc <- spark_connect(master = "http://...:8998", method = "livy", config = livy_conf)

(sparklyr version sparklyr_0.6.3)

I am having the same issue as @alexfun when trying to tune livy_config() to see if I can copy_to about 600K rows of data.

Looking at the Livy API docs, it doesn't appear that livy.rsc.rpc.max.size is a valid parameter in the request body as of the current version of Livy.

You can start a connection in this format by passing it as a key-value pair in the conf param:

library(sparklyr)

config <- spark_config()
# Nest the setting under "conf" instead of sending it as a top-level field
config[["conf"]] <- list(livy.rsc.rpc.max.size = 104857600)
config[["spark.driver.memory"]] <- "2G"
config[["spark.executor.memory"]] <- "1G"
livy_conf <- livy_config(config = config)
sc <- spark_connect(master = "http://...:8998", method = "livy", config = livy_conf)

but I'm not sure that's doing anything.
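
One way to see why the first attempt is rejected while the second is accepted: the error message lists the top-level fields that Livy's session-creation request recognizes, and conf is one of them while rsc.rpc.max.size is not. The sketch below is plain R and needs no Livy connection. The two lists are hypothetical stand-ins for the request bodies (the kind = "spark" entry and the exact shape sparklyr sends are assumptions), and the check only compares field names against the 15 quoted in the error. It says nothing about whether Livy honors the setting once it is nested under conf.

```r
# Top-level fields named in the "15 known properties" part of the error above.
known_fields <- c(
  "executorCores", "conf", "driverMemory", "name", "driverCores",
  "pyFiles", "archives", "queue", "kind", "executorMemory",
  "files", "jars", "proxyUser", "numExecutors", "heartbeatTimeoutInSecond"
)

# Hypothetical request bodies (assumed shapes, for illustration only).
# First attempt: the setting ends up as a top-level field.
body_top_level <- list(kind = "spark", rsc.rpc.max.size = 104857600)
# Second attempt: the setting is nested inside the "conf" map.
body_nested <- list(kind = "spark", conf = list(livy.rsc.rpc.max.size = 104857600))

# Names of top-level fields that are not in the known list.
unknown_fields <- function(body) setdiff(names(body), known_fields)

unknown_top_level <- unknown_fields(body_top_level)
unknown_nested <- unknown_fields(body_nested)

print(unknown_top_level)
print(unknown_nested)
```

Only the first body contains a top-level name outside the known list, which is consistent with the 400 Bad Request above. Whether the nested value actually changes the RPC size limit would still need to be confirmed against the Livy server itself.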

Hi there @javierluraschi, I'm also running into this issue while trying to copy larger files over a Livy connection. I've seen that this issue has moved to "Done" in the SparklyrBoard. Has it already been developed? Any improvements on this thread? Is it possible to tweak "livy.rsc.rpc.max.size" in the Livy connection (possibly here: https://github.com/sparklyr/sparklyr/blob/31b0f557d8acf616729f2aa9a7863f177bc1c96c/R/livy_connection.R#L113)?

Thank you!
