Dplyr: Double dispatch for bind_rows() and joins

Created on 21 Feb 2017 · 15 Comments · Source: tidyverse/dplyr

bind_rows() currently only works with data frames (or things very similar to data frames), but it should work with more data structures:

  • data.tables (#1539)
  • database tables (#2373, #2055)
  • data frames with problems attribute (#1467)

Getting this to work is challenging because bind_rows() takes ...: which argument should we dispatch on to find the right method? One approach would be to use reduce() to reduce the problem to finding the right method for a pair of tables. However, this would lose many of the performance benefits of bind_rows(), because the result would have to be grown one table at a time.

I think the best compromise will be to find the "GCD" of the classes of all elements of ..., i.e. the most specific class they all share. If this is not NULL, call bind_rows_n.common_class(dots). If it is NULL, fall back to reduce(dots, bind_rows_2).
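A minimal base-R sketch of that proposal, under stated assumptions: bind_rows_n() and bind_rows_2() are the hypothetical generics named above (not dplyr exports), the "GCD" is taken to be the set of classes every input shares, and only data-frame-like inputs are handled (vectors are out of scope).

```r
# Hypothetical sketch: bind_rows_n() and bind_rows_2() are the names used in
# this issue, not functions exported by dplyr.
bind_rows_sketch <- function(...) {
  dots <- list(...)
  # The "GCD" of the classes: classes shared by every input, kept in the order
  # of the first input's class vector, so the most specific shared class is first.
  common <- Reduce(intersect, lapply(dots, class))
  if (length(common) > 0) {
    # Tag the list of inputs with the shared classes so that S3 dispatch picks
    # the most specific bind_rows_n() method available for them.
    bind_rows_n(structure(dots, class = common))
  } else {
    # No shared class: fold the inputs pairwise, growing the result each step.
    Reduce(bind_rows_2, dots)
  }
}

bind_rows_n <- function(dots) UseMethod("bind_rows_n")

# Every input is a data frame: bind them all in a single call.
bind_rows_n.data.frame <- function(dots) do.call(rbind, unclass(dots))

# A shared class without an n-ary method: use the pairwise fallback.
bind_rows_n.default <- function(dots) Reduce(bind_rows_2, unclass(dots))

# Pairwise fallback (assumption: both inputs can be coerced to data frames).
bind_rows_2 <- function(x, y) rbind(as.data.frame(x), as.data.frame(y))
```

For example, bind_rows_sketch(data.frame(x = 1:2), data.frame(x = 3L)) takes the single-call data.frame path, while bind_rows_sketch(data.frame(x = 1), list(x = 2)) shares no class and goes through the slower pairwise fold described above.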

All 15 comments

This might be implicit, but bind_rows() also fails with sf objects (https://github.com/hadley/dplyr/issues/2459)

@hadley I'm working on a bind_rows implementation for sparklyr. What needs to happen in dplyr for me to expose bind_rows.tbl_spark?

@kevinykuo nothing, because it's not currently possible :(

I just realised that the problem is now even harder since we support vectors. Some vectors have classes (e.g. factors and dates), and we'll dispatch on more vector classes in the future as we sort out C-level vector dispatch.
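A small base-R illustration of why classed vectors need dispatch: combining Date vectors without consulting their class silently drops it, while c() dispatches to the c.Date() method and keeps it.

```r
# Two Date vectors, e.g. the same column taken from two tables being bound.
dates <- list(as.Date("2017-02-21"), as.Date("2017-03-01"))

# unlist() does not consult the class: the result is the bare day counts.
combined_naive <- unlist(dates)

# c() dispatches to the c.Date() method, so the result is still a Date vector.
combined_dispatched <- do.call(c, dates)

class(combined_naive)       # "numeric": the Date class is lost
class(combined_dispatched)  # "Date"
```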

Though one easy improvement would be to call collect() on objects inheriting from tbl_lazy. And we could still deal with objects inheriting from data frames if we find a reasonable solution for propagating attributes. However I don't think we'll be able to deal with objects like sp.
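A minimal sketch of that collect() step, assuming lazy database tables inherit from "tbl_lazy" (as dbplyr's do); collect_lazy_inputs() is a hypothetical helper name, not part of dplyr.

```r
# Hypothetical helper (not part of dplyr): materialise lazy tables such as
# dbplyr database tables, which inherit from "tbl_lazy", and leave all other
# inputs untouched.
collect_lazy_inputs <- function(dots) {
  lapply(dots, function(x) {
    if (inherits(x, "tbl_lazy")) dplyr::collect(x) else x
  })
}

# Local data frames pass through unchanged, so no database is needed here.
local_inputs <- list(data.frame(a = 1), data.frame(a = 2))
collected <- collect_lazy_inputs(local_inputs)
```

The resulting list could then be handed to dplyr::bind_rows(), which accepts a list of data frames.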

Edit: actually we could consider all S4 objects and S3 lists as potential data frames. Or we could have an is_recursive_vector() generic in rlang or S3.
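A sketch of what such a generic might look like. is_recursive_vector() is the hypothetical name from the comment above and exists in neither rlang nor base R. The default follows the suggestion (S3 lists and S4 objects count as candidates), and the POSIXlt method shows why a generic helps: POSIXlt is stored as a list but behaves like an atomic date-time vector.

```r
# Hypothetical generic: not exported by rlang, dplyr or base R.
is_recursive_vector <- function(x) UseMethod("is_recursive_vector")

# Default from the suggestion above: S3 lists (data frames included) and S4
# objects are treated as potential data frames.
is_recursive_vector.default <- function(x) is.list(x) || isS4(x)

# POSIXlt is stored as a list but behaves like an atomic date-time vector,
# so its class opts out by providing its own method.
is_recursive_vector.POSIXlt <- function(x) FALSE
```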

Would it make sense to have a version of bind_rows() that works just within each backend?

EDIT: OK, I misunderstood before, so my previous question didn't make sense. To clarify: are you thinking bind_rows_2 would attempt some sort of class conversion, @hadley?
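For illustration only, one way a pairwise bind_rows_2 could attempt class conversion is S3 double dispatch: dispatch on the class of x, then re-dispatch on the class of y. All bind_rows_2* names below are hypothetical, and only a data frame on the left is handled.

```r
# Hypothetical pairwise combiner: first dispatch on the class of x.
bind_rows_2 <- function(x, y) UseMethod("bind_rows_2")

# A data frame on the left re-dispatches on the class of y.
bind_rows_2.data.frame <- function(x, y) UseMethod("bind_rows_2.data.frame", y)

# data frame + data frame: plain row binding.
bind_rows_2.data.frame.data.frame <- function(x, y) rbind(x, y)

# data frame + plain list: convert the list to a data frame first.
bind_rows_2.data.frame.list <- function(x, y) rbind(x, as.data.frame(y))

# Any other right-hand class has no known conversion, so fail loudly.
bind_rows_2.data.frame.default <- function(x, y) {
  stop("Don't know how to combine a data frame with <", class(y)[1], ">")
}
```

Each new pair of classes needs its own method, which is why this approach only scales if packages can register methods for their own classes.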

Units get lost in bind_rows() as well.

I'd love to see a fix for this. I've got a bunch of data frames with survival::Surv columns and mismatched columns that need rbind-ing. If there is already a plan for dealing with this, I might have time to put together an implementation.

Still no fix for sf objects?

I'm building a package that has a special column variable and I'm facing this issue. Another thread mentioned that I can write a bind_rows method to fix this on my side? I looked at the code and it's all Rcpp, how could I add this functionality? What method(s) would I create?

I think I'm getting a similar error. If anyone already has a solution, please let me know.
Here is the code snippet.

samp2 %>%
  rowwise() %>%
  mutate(OddsR = oddsRatio(matrix(c(PRE_CORRECT_NUMBER, PRE_INCORRECT,
                                    POST_CORRECT_NUMBER, POST_INCORRECT), 2, 2)))

Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes
2: In mutate_impl(.data, dots) :
Vectorizing 'oddsRatio' elements may not preserve their attributes

The same problem with fs paths:

dplyr::bind_rows(
  dplyr::tibble(fs::path(getwd())),
  dplyr::tibble(fs::path(getwd()))
)
#> Warning in bind_rows_(x, .id): Vectorizing 'fs_path' elements may not
#> preserve their attributes

#> Warning in bind_rows_(x, .id): Vectorizing 'fs_path' elements may not
#> preserve their attributes
#> # A tibble: 2 x 1
#>   `fs::path(getwd())`                                        
#>   <chr>                                                      
#> 1 C:/Users/lw/AppData/Local/Temp/RtmpmC9NBt/reprex8202e935860
#> 2 C:/Users/lw/AppData/Local/Temp/RtmpmC9NBt/reprex8202e935860

Created on 2018-11-09 by the reprex package (v0.2.1)

cc: @jimhester

Any updates here?

Also applies to all join functions.

The same problem occurs with vctrs::list_of() columns:

dplyr::bind_rows(
  list(dplyr::tibble(x = vctrs::list_of(1)))
)
#> Warning in bind_rows_(x, .id): Vectorizing 'vctrs_list_of' elements may not
#> preserve their attributes
#> # A tibble: 1 x 1
#>   x        
#>   <list>   
#> 1 <dbl [1]>

Created on 2019-10-02 by the reprex package (v0.3.0)

bind_rows() is now implemented on top of vctrs::vec_rbind(), which seems to deal with these issues:

library(dplyr, warn.conflicts = FALSE)

bind_rows(
  tibble(fs::path(getwd())),
  tibble(fs::path(getwd()))
)
#> # A tibble: 2 x 1
#>   `fs::path(getwd())`                                                      
#>   <chr>                                                                    
#> 1 /private/var/folders/4b/hn4fq98s6810s4ccv2f9hm2h0000gn/T/RtmpOQ3n1o/repr…
#> 2 /private/var/folders/4b/hn4fq98s6810s4ccv2f9hm2h0000gn/T/RtmpOQ3n1o/repr…
vctrs::vec_rbind(
  tibble(fs::path(getwd())),
  tibble(fs::path(getwd()))
)
#>                                                                        fs::path(getwd())
#> 1 /private/var/folders/4b/hn4fq98s6810s4ccv2f9hm2h0000gn/T/RtmpOQ3n1o/reprex4fd752792fb4
#> 2 /private/var/folders/4b/hn4fq98s6810s4ccv2f9hm2h0000gn/T/RtmpOQ3n1o/reprex4fd752792fb4

bind_rows(
  tibble(x = vctrs::list_of(1))
)
#> # A tibble: 1 x 1
#>             x
#>   <list<dbl>>
#> 1         [1]
vctrs::vec_rbind(
  tibble(x = vctrs::list_of(1))
)
#> # A tibble: 1 x 1
#>             x
#>   <list<dbl>>
#> 1         [1]

Created on 2019-11-27 by the reprex package (v0.3.0.9000)
