Dplyr: bind_rows() creates multiple `.id` columns

Created on 1 Feb 2019 · 5 Comments · Source: tidyverse/dplyr

library(dplyr, warn.conflicts = FALSE)

iris %>%
  tibble::as_tibble() %>%
  dplyr::select(Species, dplyr::everything()) %>%
  dplyr::mutate(Species = as.numeric(factor(Species))) %>%
  split(.$Species) %>%
  purrr::map_df(.f = function(x) x %>% dplyr::slice(1), .id = 'Species')
#> # A tibble: 3 x 6
#>   Species Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <chr>     <dbl>        <dbl>       <dbl>        <dbl>       <dbl>
#> 1 1             1          5.1         3.5          1.4         0.2
#> 2 2             2          7           3.2          4.7         1.4
#> 3 3             3          6.3         3.3          6           2.5

Created on 2019-02-01 by the reprex package (v0.2.1)

Session info

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2019-02-01
#> Packages -----------------------------------------------------------------
#>  package    * version date       source                          
#>  assertthat   0.2.0   2017-04-11 CRAN (R 3.5.0)                  
#>  base       * 3.5.1   2018-07-05 local                           
#>  cli          1.0.1   2018-09-25 CRAN (R 3.5.0)                  
#>  compiler     3.5.1   2018-07-05 local                           
#>  crayon       1.3.4   2017-09-16 CRAN (R 3.5.0)                  
#>  datasets   * 3.5.1   2018-07-05 local                           
#>  devtools     1.13.6  2018-06-27 CRAN (R 3.5.0)                  
#>  digest       0.6.18  2018-10-10 CRAN (R 3.5.0)                  
#>  dplyr      * 0.8.0   2019-01-17 Github (tidyverse/dplyr@9aa5846)
#>  evaluate     0.12    2018-10-09 CRAN (R 3.5.0)                  
#>  fansi        0.4.0   2018-10-05 CRAN (R 3.5.0)                  
#>  glue         1.3.0   2018-11-14 Github (tidyverse/glue@35c61e9) 
#>  graphics   * 3.5.1   2018-07-05 local                           
#>  grDevices  * 3.5.1   2018-07-05 local                           
#>  htmltools    0.3.6   2017-04-28 CRAN (R 3.5.0)                  
#>  knitr        1.20    2018-02-20 CRAN (R 3.5.0)                  
#>  magrittr     1.5     2014-11-22 CRAN (R 3.5.0)                  
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.5.0)                  
#>  methods    * 3.5.1   2018-07-05 local                           
#>  pillar       1.3.1   2018-12-15 CRAN (R 3.5.0)                  
#>  pkgconfig    2.0.2   2018-08-16 CRAN (R 3.5.0)                  
#>  purrr        0.3.0   2019-01-27 CRAN (R 3.5.2)                  
#>  R6           2.3.0   2018-10-04 CRAN (R 3.5.0)                  
#>  Rcpp         1.0.0   2018-11-07 CRAN (R 3.5.0)                  
#>  rlang        0.3.1   2019-01-08 CRAN (R 3.5.2)                  
#>  rmarkdown    1.11    2018-12-08 CRAN (R 3.5.0)                  
#>  stats      * 3.5.1   2018-07-05 local                           
#>  stringi      1.2.4   2018-07-20 CRAN (R 3.5.0)                  
#>  stringr      1.3.1   2018-05-10 CRAN (R 3.5.0)                  
#>  tibble       2.0.1   2019-01-12 CRAN (R 3.5.2)                  
#>  tidyselect   0.2.5   2018-10-11 CRAN (R 3.5.0)                  
#>  tools        3.5.1   2018-07-05 local                           
#>  utf8         1.1.4   2018-05-24 CRAN (R 3.5.0)                  
#>  utils      * 3.5.1   2018-07-05 local                           
#>  withr        2.1.2   2018-03-15 CRAN (R 3.5.0)                  
#>  yaml         2.2.0   2018-07-25 CRAN (R 3.5.0)


I guess this relates more specifically to how dplyr::bind_rows() treats .id. There is no check for whether the .id column already exists in the data being bound, or whether the two columns have conflicting classes (bindable[[.id]] vs. character).
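A minimal sketch of the kind of check described above. The wrapper name `bind_rows_checked()` and the column name `"source"` are hypothetical and not part of dplyr; the sketch only illustrates refusing to bind when the requested `.id` name is already a column in any input.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr): stop early if the requested
# `.id` name is already a column in any of the data frames to be bound.
bind_rows_checked <- function(dfs, .id) {
  clashes <- vapply(dfs, function(df) .id %in% names(df), logical(1))
  if (any(clashes)) {
    stop("`.id` name '", .id, "' already exists in the input data.",
         call. = FALSE)
  }
  bind_rows(dfs, .id = .id)
}

dfs <- list(a = mtcars[1:2, ], b = mtcars[2:3, ])

# A name that does not clash binds as usual.
out <- bind_rows_checked(dfs, .id = "source")

# A clashing name is rejected; capture the error message instead of failing.
res <- tryCatch(
  bind_rows_checked(dfs, .id = "cyl"),
  error = function(e) conditionMessage(e)
)
```

Whether a clash should be an error or something else is exactly what the thread goes on to discuss; this wrapper only shows one possible user-side guard.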

bug

All 5 comments

dplyr reprex:

dfs <- list(a = mtcars[1:2, ], b = mtcars[2:3, ])
dplyr::bind_rows(dfs, .id = "cyl")
#>   cyl  mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1   a 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2   a 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3   b 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 4   b 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

@hadley should we just make this an error?

We should make it consistent with https://principles.tidyverse.org/names-attribute.html, and we should make sure we tackle as many naming issues as possible in a single fix so we can be as consistent as possible.

See analysis in https://github.com/tidyverse/tidyr/issues/547 — this issue is not actually about name repair, it's about what to do when the user explicitly supplies a column name that already exists in the data. Our current principle is to be consistent with mutate() and replace the existing variable, so I think we'd need a strong reason to do differently here.
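To illustrate the mutate() behaviour referred to above, here is a small sketch: assigning to an existing column name in mutate() replaces that column rather than adding a second one. The second half is an assumption about what "replace" semantics could look like for bind_rows(), emulated on the user side by dropping the clashing column first; it is not a description of dplyr's internals.

```r
library(dplyr, warn.conflicts = FALSE)

# mutate() replaces an existing variable in place.
df <- mtcars[1:2, c("mpg", "cyl")]
replaced <- mutate(df, cyl = "a")
names(replaced)        # "mpg" "cyl": still one `cyl` column
class(replaced$cyl)    # "character"

# User-side emulation of the same idea for bind_rows(): drop the
# clashing column from each input, then let `.id` supply it.
dfs <- list(a = mtcars[1:2, ], b = mtcars[2:3, ])
stripped <- lapply(dfs, function(d) d[setdiff(names(d), "cyl")])
bound <- bind_rows(stripped, .id = "cyl")
sum(names(bound) == "cyl")   # 1
```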

Looks fixed now.

library(dplyr, warn.conflicts = FALSE)

iris %>%
  tibble::as_tibble() %>%
  dplyr::select(Species, dplyr::everything()) %>%
  dplyr::mutate(Species = as.numeric(factor(Species))) %>%
  split(.$Species) %>%
  purrr::map_df(.f = function(x) x %>% dplyr::slice(1), .id = 'Species')
#> # A tibble: 3 x 5
#>   Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <chr>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1 1                5.1         3.5          1.4         0.2
#> 2 2                7           3.2          4.7         1.4
#> 3 3                6.3         3.3          6           2.5

Created on 2019-11-27 by the reprex package (v0.3.0.9000)
