Sparklyr: sdf_bind_rows filling in NaNs

Created on 29 Jun 2017 · 4 Comments · Source: sparklyr/sparklyr

sdf_bind_rows is filling in NaNs in the following example. My guess is that part of the cause is that year is of type int in one table and of type dbl in the other; R users would expect type promotion at this point. But notice that the count column is also NaN-ed out: the input legitimately did have one NaN, but all the other values were lost as well.
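For comparison, here is a minimal local sketch of the type promotion an R user would expect, using dplyr::bind_rows on in-memory tibbles rather than Spark tables. The names a_local, b_local and v_local are introduced here for illustration, and the rows are a small subset of the tables shown in the reprex; it says nothing about how sdf_bind_rows is implemented.

```r
suppressPackageStartupMessages(library("dplyr"))

# Local stand-ins for the Spark tables: year is double in one, integer in the other,
# and the columns appear in a different order.
a_local <- tibble(year = c(2005, 2007, 2010),
                  count = c(6, 1, NaN),
                  name = c("a", "b", "c"))
b_local <- tibble(year = c(2006L, 2007L),
                  name = c("a", "a"),
                  count = c(0, 0))

# bind_rows matches columns by name and promotes integer year to double.
v_local <- bind_rows(a_local, b_local)
print(v_local)
```

Locally, year comes back as a double column with all values intact, and count keeps its single NaN.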

Reprex below.

Also I am seeing a lot of warnings of the form:

Warning message:
Translator is missing window functions:
count, n_distinct

Is there some way to force sparklyr to re-install its local Spark installations? I am running on macOS.

suppressPackageStartupMessages(library("dplyr"))
library("sparklyr")
packageVersion("dplyr")
#> [1] '0.7.1.9000'
packageVersion("sparklyr")
#> [1] '0.5.6.9008'

my_db <- sparklyr::spark_connect(version='2.0.2', 
                                 master = "local")

a <- sparklyr::spark_read_parquet(my_db, 'a', '~/data/a')
b <- sparklyr::spark_read_parquet(my_db, 'b', '~/data/b')

print(a)
#> # Source:   table<a> [?? x 3]
#> # Database: spark_connection
#>    year count  name
#>   <dbl> <dbl> <chr>
#> 1  2005     6     a
#> 2  2007     1     b
#> 3  2010   NaN     c

print(b, n=100)
#> # Source:   table<b> [?? x 3]
#> # Database: spark_connection
#>     year  name count
#>    <int> <chr> <dbl>
#>  1  2006     a     0
#>  2  2007     a     0
#>  3  2008     a     0
#>  4  2009     a     0
#>  5  2010     a     0
#>  6  2005     b     0
#>  7  2006     b     0
#>  8  2008     b     0
#>  9  2009     b     0
#> 10  2010     b     0
#> 11  2005     c     0
#> 12  2006     c     0
#> 13  2007     c     0
#> 14  2008     c     0
#> 15  2009     c     0
#> 16  2005     d     0
#> 17  2006     d     0
#> 18  2007     d     0
#> 19  2008     d     0
#> 20  2009     d     0
#> 21  2010     d     0

v <- sparklyr::sdf_bind_rows(list(a,b))

print(v, n=1000)
#> # Source:   table<sparklyr_tmp_be3445305142> [?? x 3]
#> # Database: spark_connection
#>     year count  name
#>    <dbl> <dbl> <chr>
#>  1   NaN   NaN     a
#>  2   NaN   NaN     b
#>  3   NaN   NaN     c
#>  4   NaN   NaN     a
#>  5   NaN   NaN     a
#>  6   NaN   NaN     a
#>  7   NaN   NaN     a
#>  8   NaN   NaN     a
#>  9   NaN   NaN     b
#> 10   NaN   NaN     b
#> 11   NaN   NaN     b
#> 12   NaN   NaN     b
#> 13   NaN   NaN     b
#> 14   NaN   NaN     c
#> 15   NaN   NaN     c
#> 16   NaN   NaN     c
#> 17   NaN   NaN     c
#> 18   NaN   NaN     c
#> 19   NaN   NaN     d
#> 20   NaN   NaN     d
#> 21   NaN   NaN     d
#> 22   NaN   NaN     d
#> 23   NaN   NaN     d
#> 24   NaN   NaN     d
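
One way to test the type-mismatch guess would be to make the two schemas agree before binding. This is only a sketch: it assumes the a and b tables and the live connection from the reprex above, b_aligned and v_aligned are names introduced here for illustration, and whether it actually avoids the NaNs is untested.

```r
suppressPackageStartupMessages(library("dplyr"))
library("sparklyr")

# Inspect the Spark-side column types of both inputs.
sparklyr::sdf_schema(a)
sparklyr::sdf_schema(b)

# Cast year to double and put the columns in the same order as in a.
b_aligned <- b %>%
  mutate(year = as.numeric(year)) %>%
  select(year, count, name)

# Bind again with matching types and column order.
v_aligned <- sparklyr::sdf_bind_rows(list(a, b_aligned))
print(v_aligned, n = 1000)
```

If v_aligned comes back with the expected values, that would point at the differing types or column order rather than at the data itself.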
