Dplyr: wrong column names when using `across` with multiple functions

Created on 11 Mar 2020 · 6 Comments · Source: tidyverse/dplyr

Hello,

I noticed that `across()` mixes up the column names in the output when I use multiple functions at once. Please see the reprex below.

Thanks!


```r
library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

### ok
iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ mean(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa             5.01        3.43         1.46       0.246
#> 2 versicolor         5.94        2.77         4.26       1.33 
#> 3 virginica          6.59        2.97         5.55       2.03

iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ max(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa              5.8         4.4          1.9         0.6
#> 2 versicolor          7           3.4          5.1         1.8
#> 3 virginica           7.9         3.8          6.9         2.5

### also ok
iris %>%
  group_by(Species) %>%
  summarise(
    across(c(Sepal.Length:Petal.Width), ~ mean(.x, na.rm = TRUE), names = "mean_{col}"),
    across(c(Sepal.Length:Petal.Width), ~ max(.x, na.rm = TRUE), names = "max_{col}")
  )
#> # A tibble: 3 x 9
#>   Species    mean_Sepal.Length mean_Sepal.Width mean_Petal.Length
#> * <fct>                  <dbl>            <dbl>             <dbl>
#> 1 setosa                  5.01             3.43              1.46
#> 2 versicolor              5.94             2.77              4.26
#> 3 virginica               6.59             2.97              5.55
#>   mean_Petal.Width max_Sepal.Length max_Sepal.Width max_Petal.Length
#> *            <dbl>            <dbl>           <dbl>            <dbl>
#> 1            0.246              5.8             4.4              1.9
#> 2            1.33               7               3.4              5.1
#> 3            2.03               7.9             3.8              6.9
#>   max_Petal.Width
#> *           <dbl>
#> 1             0.6
#> 2             1.8
#> 3             2.5

### Output has wrong column names with multiple functions

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func, names = "{fn}.{col}"))
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33 
#> 3 virginica               6.59             2.97             5.55           2.03 
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5
```

Created on 2020-03-10 by the reprex package (v0.3.0)

Labels: bug, each-col ↔️


tungmilan
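
The mislabelled `{fn}.{col}` table is consistent with the values being laid out function-first (all four means, then all four maxima) while the labels are generated column-first. A minimal base-R sketch of that reading, using only `iris` and no dplyr, summarises the setosa rows both ways and pairs the column-first labels with the function-first values. The object names (`fn_major`, `col_major`, `mislabelled`) are illustrative assumptions, not anything from dplyr.

```r
# Hypothetical diagnostic, base R only (no dplyr).
# Columns 1:4 of iris are the four numeric measurements.
setosa <- iris[iris$Species == "setosa", 1:4]
fns <- list(mean = mean, max = max)

# Function-major order: all four means first, then all four maxima.
# unlist() names these "mean.Sepal.Length", "mean.Sepal.Width", ...
fn_major <- unlist(lapply(fns, function(f) vapply(setosa, f, numeric(1))))

# Column-major order: mean and max of Sepal.Length, then Sepal.Width, etc.
# unlist() names these "Sepal.Length.mean", "Sepal.Length.max", ...
col_major <- unlist(lapply(setosa, function(v) {
  vapply(fns, function(f) f(v), numeric(1))
}))

# Attach the column-major labels to the function-major values.
mislabelled <- setNames(round(unname(fn_major), 3), names(col_major))
print(mislabelled)
```

If this reading is right, `mislabelled` lines up value for value with the setosa row of the `{fn}.{col}` reprex (5.006 under the first "mean" label, 3.428 under the first "max" label, and so on); only the label spelling differs (`Sepal.Length.mean` here versus `mean.Sepal.Length` there).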

All 6 comments

Looks like the problem only arises when names are set. Here's a more minimal reprex:

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 2:4)
df %>% summarise(across(everything(), list(min = min, max = max)))
#> # A tibble: 1 x 4
#>   x_min x_max y_min y_max
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
df %>% summarise(across(everything(), list(min = min, max = max), names = "{fn}.{col}"))
#> # A tibble: 1 x 4
#>   min.x max.x min.y max.y
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
```

Created on 2020-03-11 by the reprex package (v0.3.0)

Thanks for spotting this!

hadley on 11 Mar 2020


Thanks @hadley! I got the same output w/ or w/o names arg.

```r
library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func))
#> # A tibble: 3 x 9
#>   Species    Sepal.Length_mean Sepal.Length_max Sepal.Width_mean Sepal.Width_max
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33 
#> 3 virginica               6.59             2.97             5.55           2.03 
#>   Petal.Length_mean Petal.Length_max Petal.Width_mean Petal.Width_max
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5
```

Created on 2020-03-11 by the reprex package (v0.3.0)

tungmilan on 11 Mar 2020

> I got the same output w/ or w/o names arg

So did @hadley! 😉

With the example hadley gave, it's returning the values for min.x, min.y, max.x, max.y (in that order) but labelling them min.x, max.x, min.y, max.y.

batpigandme on 11 Mar 2020


@batpigandme you are right. I didn't notice that :)

tungmilan on 11 Mar 2020

Oh yeah, I was confused.

hadley on 11 Mar 2020
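
For reference, the correctly labelled result for hadley's minimal example can be worked out without dplyr. The sketch below uses a hypothetical helper, `summarise_each_col()` (not a dplyr function), that loops over columns on the outside and functions on the inside and names each value `{col}_{fn}`, matching the default naming shown in the thread.

```r
# Hypothetical helper (base R, not part of dplyr): apply every function to
# every column, iterating columns in the outer loop and functions in the
# inner loop, and name each result "{col}_{fn}".
summarise_each_col <- function(df, fns) {
  out <- list()
  for (col in names(df)) {
    for (fn in names(fns)) {
      out[[paste0(col, "_", fn)]] <- fns[[fn]](df[[col]])
    }
  }
  # One-row data frame: each named list element becomes a column.
  as.data.frame(out)
}

df <- data.frame(x = 1:3, y = 2:4)
summarise_each_col(df, list(min = min, max = max))
#>   x_min x_max y_min y_max
#> 1     1     3     2     4
```

Here `x_max` is 3 and `y_min` is 2, whereas both reprexes above print 2 and 3 in those positions, which is the swap described in the comments.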

I was about to post this when I came across this issue. I think it's another example of the same problem, so I'm adding it here. Note that the leads and lags are all mixed up and wrong.

```r
library(dplyr, warn.conflicts = FALSE)

tibble(x1 = 1:10, x2 = 11:20) %>%
  mutate(across(
    starts_with("x"),
    list(lag1 = ~ lag(.x, 1), lag2 = ~ lag(.x, 2), lead1 = ~ lead(.x, 1))
  ))