I have a dataset uploaded to Spark, and I am trying to call some Dataset methods through `invoke`. I get an error when I use `invoke` with `select` or `groupBy`, but `invoke` on `drop` succeeds:
```
iris_tbl <- copy_to(sc, iris)
x <- spark_dataframe(iris_tbl)
```
Success:
```
x %>% invoke("drop", "Species")
<jobj[63]>
  class org.apache.spark.sql.Dataset
  [Sepal_Length: double, Sepal_Width: double ... 2 more fields]
```
Error:
```
x %>% invoke("select", "Species")
Error: java.lang.Exception: No matched method found for class org.apache.spark.sql.Dataset.select
	at sparklyr.Invoke$.invoke(invoke.scala:91)
	at sparklyr.StreamHandler$.handleMethodCall(stream.scala:89)
	at sparklyr.StreamHandler$.read(stream.scala:55)
	at sparklyr.BackendHandler.channelRead0(handler.scala:49)
	at sparklyr.BackendHandler.channelRead0(handler.scala:14)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	...
```
@VadymBoikov currently, sparklyr does not support variadic parameters, which is the case for `select` (see http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset): it takes a `String` followed by a `String*`.

To work around this, you can pass an empty `list()` for the `String*` argument when calling `select()`, as follows:

```
x %>% invoke("select", "Species", list())
```
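The same trick should apply to `groupBy`, which failed above for the same reason: its Scala signature is likewise a `String` followed by a `String*`. A hedged sketch (column names taken from the `copy_to` output above):

```
x %>% invoke("groupBy", "Species", list())                 # group by one column
x %>% invoke("groupBy", "Species", list("Petal_Length"))   # group by two columns
```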
It works, thank you
@javierluraschi - how could I select multiple columns?

EDIT: Answer for anyone who ends up here. (Note that `copy_to` replaced the dots in the iris column names with underscores, as the `[Sepal_Length: double, ...]` output above shows.)

```
cols <- list("Species", "Petal_Length", "Petal_Width")
x %>% invoke("select", cols[[1]], cols[2:length(cols)])
```
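As a follow-up sketch (not from the original answer): the `jobj` that `invoke` returns can be wrapped back into a regular sparklyr table with `sdf_register`, so dplyr verbs work on the result again. Column names again assume the underscore renaming done by `copy_to`:

```
cols <- list("Species", "Petal_Length", "Petal_Width")
selected <- x %>% invoke("select", cols[[1]], cols[2:length(cols)])

# sdf_register wraps the returned Dataset jobj into a dplyr-compatible tbl
selected_tbl <- sdf_register(selected)
```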