Dplyr: wrong column names when using `across` with multiple functions

Created on 11 Mar 2020  ·  6 Comments  ·  Source: tidyverse/dplyr

Hello,

I noticed that `across` mixes up the column names in the output when I pass multiple functions at once. Please see the reprex below.

Thanks!


library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

### ok
iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ mean(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa             5.01        3.43         1.46       0.246
#> 2 versicolor         5.94        2.77         4.26       1.33 
#> 3 virginica          6.59        2.97         5.55       2.03

iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ max(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa              5.8         4.4          1.9         0.6
#> 2 versicolor          7           3.4          5.1         1.8
#> 3 virginica           7.9         3.8          6.9         2.5

### also ok
iris %>%
  group_by(Species) %>%
  summarise(
    across(c(Sepal.Length:Petal.Width), ~ mean(.x, na.rm = TRUE), names = "mean_{col}"),
    across(c(Sepal.Length:Petal.Width), ~ max(.x, na.rm = TRUE), names = "max_{col}")
  ) 
#> # A tibble: 3 x 9
#>   Species    mean_Sepal.Length mean_Sepal.Width mean_Petal.Length
#> * <fct>                  <dbl>            <dbl>             <dbl>
#> 1 setosa                  5.01             3.43              1.46
#> 2 versicolor              5.94             2.77              4.26
#> 3 virginica               6.59             2.97              5.55
#>   mean_Petal.Width max_Sepal.Length max_Sepal.Width max_Petal.Length
#> *            <dbl>            <dbl>           <dbl>            <dbl>
#> 1            0.246              5.8             4.4              1.9
#> 2            1.33               7               3.4              5.1
#> 3            2.03               7.9             3.8              6.9
#>   max_Petal.Width
#> *           <dbl>
#> 1             0.6
#> 2             1.8
#> 3             2.5

### Output has wrong column names with multiple functions

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func, names = "{fn}.{col}")) 
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33 
#> 3 virginica               6.59             2.97             5.55           2.03 
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5

Created on 2020-03-10 by the reprex package (v0.3.0)
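
Reading the last output against the two `summarise_if()` results at the top suggests that the values are correct but arrive in function-major order (all means, then all maxima), while the names are generated column-major (mean and max per column). The base R sketch below checks this for the setosa group only; `reported_names` and `actual_names` are illustrative names, not part of dplyr.

```r
# Base R only: the setosa rows of the built-in iris data, numeric columns
setosa <- iris[iris$Species == "setosa", 1:4]

# Values in the order they appear in the mislabelled output:
# all four means first, then all four maxima
values <- c(sapply(setosa, mean), sapply(setosa, max))

# Names in the order shown in the output header: both functions per column
reported_names <- paste(rep(c("mean", "max"), times = 4),
                        rep(names(setosa), each = 2), sep = ".")

# Names that actually describe `values`: every column per function
actual_names <- paste(rep(c("mean", "max"), each = 4),
                      rep(names(setosa), times = 2), sep = ".")

# Side-by-side view of each value, the label it received, and what it really is
data.frame(value = round(unname(values), 3),
           reported_as = reported_names,
           actually = actual_names)
```

For example, the 3.43 printed under `max.Sepal.Length` is the setosa mean of `Sepal.Width`.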

Labels: bug, each-col ↔️

All 6 comments

Looks like the problem only arises when names are set. Here's a more minimal reprex:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 2:4)
df %>% summarise(across(everything(), list(min = min, max = max)))
#> # A tibble: 1 x 4
#>   x_min x_max y_min y_max
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
df %>% summarise(across(everything(), list(min = min, max = max), names = "{fn}.{col}"))
#> # A tibble: 1 x 4
#>   min.x max.x min.y max.y
#>   <int> <int> <int> <int>
#> 1     1     2     3     4

Created on 2020-03-11 by the reprex package (v0.3.0)

Thanks for spotting this!

Thanks @hadley ! I got the same output w/ or w/o names arg

library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max  = ~ max(., na.rm = TRUE)
)

iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func))
#> # A tibble: 3 x 9
#>   Species    Sepal.Length_mean Sepal.Length_max Sepal.Width_mean Sepal.Width_max
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33 
#> 3 virginica               6.59             2.97             5.55           2.03 
#>   Petal.Length_mean Petal.Length_max Petal.Width_mean Petal.Width_max
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5

Created on 2020-03-11 by the reprex package (v0.3.0)

> I got the same output w/ or w/o names arg

So did @hadley! 😉

With the example hadley gave, it's returning (but mislabelling) min.x, min.y, max.x, max.y.
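
A quick base R check makes the mislabelling concrete. It recomputes the four summaries one at a time and compares them with the values printed in the minimal reprex; `expected` and `reported` are names chosen here for illustration.

```r
df <- data.frame(x = 1:3, y = 2:4)

# Reference values, computed one at a time in base R
expected <- c(min.x = min(df$x), max.x = max(df$x),
              min.y = min(df$y), max.y = max(df$y))
expected
#> min.x max.x min.y max.y
#>     1     3     2     4

# Values as printed in the reprex, under the labels they were given
reported <- c(min.x = 1L, max.x = 2L, min.y = 3L, max.y = 4L)

# The printed values match the reference only when the reference is
# reordered function-major: min.x, min.y, max.x, max.y
identical(unname(reported),
          unname(expected[c("min.x", "min.y", "max.x", "max.y")]))
#> [1] TRUE
```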

@batpigandme you are right. I didn't notice that :)

Oh yeah, I was confused.

I was about to open a new issue but came across this one. I think it's another example of the same problem, so I'm adding it here. Note that the leads and lags are all mixed up and wrong.

```
library(dplyr, warn.conflicts = FALSE)

tibble(x1 = 1:10, x2 = 11:20) %>%
  mutate(across(
    starts_with("x"),
    list(lag1 = ~lag(.x, 1), lag2 = ~lag(.x, 2), lead1 = ~lead(.x, 1))
  ))
#> # A tibble: 10 x 8
#>       x1    x2 x1_lag1 x1_lag2 x1_lead1 x2_lag1 x2_lag2 x2_lead1
#>    <int> <int>   <int>   <int>    <int>   <int>   <int>    <int>
#>  1     1    11      NA      NA       NA      NA       2       12
#>  2     2    12       1      11       NA      NA       3       13
#>  3     3    13       2      12        1      11       4       14
#>  4     4    14       3      13        2      12       5       15
#>  5     5    15       4      14        3      13       6       16
#>  6     6    16       5      15        4      14       7       17
#>  7     7    17       6      16        5      15       8       18
#>  8     8    18       7      17        6      16       9       19
#>  9     9    19       8      18        7      17      10       20
#> 10    10    20       9      19        8      18      NA       NA
```
