Hello,
I noticed that `across()` mixed up the column names in the output when I used multiple functions at once. Please see the reprex below:
Thanks!
```r
library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

### ok
iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ mean(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa             5.01        3.43         1.46       0.246
#> 2 versicolor         5.94        2.77         4.26       1.33
#> 3 virginica          6.59        2.97         5.55       2.03
iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, ~ max(.x, na.rm = TRUE))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#> * <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa              5.8         4.4          1.9         0.6
#> 2 versicolor          7           3.4          5.1         1.8
#> 3 virginica           7.9         3.8          6.9         2.5

### also ok
iris %>%
  group_by(Species) %>%
  summarise(
    across(c(Sepal.Length:Petal.Width), ~ mean(.x, na.rm = TRUE), names = "mean_{col}"),
    across(c(Sepal.Length:Petal.Width), ~ max(.x, na.rm = TRUE), names = "max_{col}")
  )
#> # A tibble: 3 x 9
#>   Species    mean_Sepal.Length mean_Sepal.Width mean_Petal.Length
#> * <fct>                  <dbl>            <dbl>             <dbl>
#> 1 setosa                  5.01             3.43              1.46
#> 2 versicolor              5.94             2.77              4.26
#> 3 virginica               6.59             2.97              5.55
#>   mean_Petal.Width max_Sepal.Length max_Sepal.Width max_Petal.Length
#> *            <dbl>            <dbl>           <dbl>            <dbl>
#> 1            0.246              5.8             4.4              1.9
#> 2            1.33               7               3.4              5.1
#> 3            2.03               7.9             3.8              6.9
#>   max_Petal.Width
#> *           <dbl>
#> 1             0.6
#> 2             1.8
#> 3             2.5

### Output has wrong column names with multiple functions
my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max = ~ max(., na.rm = TRUE)
)
iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func, names = "{fn}.{col}"))
#> # A tibble: 3 x 9
#>   Species    mean.Sepal.Length max.Sepal.Length mean.Sepal.Width max.Sepal.Width
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33
#> 3 virginica               6.59             2.97             5.55           2.03
#>   mean.Petal.Length max.Petal.Length mean.Petal.Width max.Petal.Width
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5
```
Created on 2020-03-10 by the reprex package (v0.3.0)
Looks like the problem only arises when names are set. Here's a more minimal reprex:
```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 2:4)
df %>% summarise(across(everything(), list(min = min, max = max)))
#> # A tibble: 1 x 4
#>   x_min x_max y_min y_max
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
df %>% summarise(across(everything(), list(min = min, max = max), names = "{fn}.{col}"))
#> # A tibble: 1 x 4
#>   min.x max.x min.y max.y
#>   <int> <int> <int> <int>
#> 1     1     2     3     4
```
Created on 2020-03-11 by the reprex package (v0.3.0)
Thanks for spotting this!
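For comparison, here is a sketch (not part of the original report) that spells out the same four summaries by hand with no `across()`, so each name is guaranteed to sit next to the value it describes. It shows that `x_max` should be 3 and `y_min` should be 2, whereas the `across()` output above puts 2 under `x_max` and 3 under `y_min`. The object name `expected` is just for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 2:4)

# Write each summary out explicitly (no across()), so every column
# name is attached to the statistic it actually describes.
expected <- df %>%
  summarise(
    x_min = min(x), x_max = max(x),
    y_min = min(y), y_max = max(y)
  )

# One row: x_min = 1, x_max = 3, y_min = 2, y_max = 4
expected
```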
Thanks @hadley! I got the same output w/ or w/o the `names` arg:
```r
library(dplyr, warn.conflicts = FALSE)
options(tibble.width = Inf)

my_func <- list(
  mean = ~ mean(., na.rm = TRUE),
  max = ~ max(., na.rm = TRUE)
)
iris %>%
  group_by(Species) %>%
  summarise(across(is.numeric, my_func))
#> # A tibble: 3 x 9
#>   Species    Sepal.Length_mean Sepal.Length_max Sepal.Width_mean Sepal.Width_max
#> * <fct>                  <dbl>            <dbl>            <dbl>           <dbl>
#> 1 setosa                  5.01             3.43             1.46           0.246
#> 2 versicolor              5.94             2.77             4.26           1.33
#> 3 virginica               6.59             2.97             5.55           2.03
#>   Petal.Length_mean Petal.Length_max Petal.Width_mean Petal.Width_max
#> *             <dbl>            <dbl>            <dbl>           <dbl>
#> 1               5.8              4.4              1.9             0.6
#> 2               7                3.4              5.1             1.8
#> 3               7.9              3.8              6.9             2.5
```
Created on 2020-03-11 by the reprex package (v0.3.0)
> I got the same output w/ or w/o `names` arg

So did @hadley! 😉
With the example hadley gave, it's returning the values in the order min.x, min.y, max.x, max.y, but labelling them as if they were min.x, max.x, min.y, max.y.
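The pattern above is what you get when results are computed function-by-function while names are generated column-by-column. The base-R sketch below is a hypothetical illustration of that mismatch only, assuming this is the cause; it is not dplyr's actual `across()` code, and `cols`, `fns`, `values`, `labels`, `buggy` and `fixed` are made-up names for the example.

```r
# Hypothetical illustration only: this is NOT how dplyr implements across().
cols <- list(x = 1:3, y = 2:4)
fns  <- list(min = min, max = max)

# Values computed with the functions in the outer loop:
# min(x), min(y), max(x), max(y)
values <- unlist(lapply(fns, function(f) lapply(cols, f)), use.names = FALSE)

# Names generated with the columns in the outer loop:
# x_min, x_max, y_min, y_max
labels <- unlist(lapply(names(cols), function(col) paste0(col, "_", names(fns))))

# Pairing the two different orders reproduces the mislabelled result
buggy <- setNames(values, labels)
buggy
#> x_min x_max y_min y_max
#>     1     2     3     4

# Computing the values in the same (column-major) order as the names fixes it
fixed <- setNames(
  unlist(lapply(cols, function(v) lapply(fns, function(f) f(v))), use.names = FALSE),
  labels
)
fixed
#> x_min x_max y_min y_max
#>     1     3     2     4
```

Whichever order is chosen, the point is that values and names have to be produced by the same loop order.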
@batpigandme you are right. I didn't notice that :)
Oh yeah, I was confused.
I was about to open a new issue when I came across this one. I think it's another example of the same problem, so I'm adding it here. Note that the leads and lags are all mixed up and wrong:
```
library(dplyr, warn.conflicts = FALSE)

tibble(x1 = 1:10, x2 = 11:20) %>%
  mutate(across(
    starts_with("x"),
    list(lag1 = ~ lag(.x, 1), lag2 = ~ lag(.x, 2), lead1 = ~ lead(.x, 1))
  ))