When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:
library(tidyverse)
df <- tribble(
~category, ~x, ~y, ~z,
#----------------------
'a', 4, 6, 8,
'a', 7, 3, 0,
'a', 7, 9, 0,
'b', 2, 8, 8,
'b', 5, 1, 8,
'b', 8, 0, 1,
'c', 2, 1, 1,
'c', 3, 8, 0,
'c', 1, 9, 1
)
df %>% group_by(category) %>% summarize(
x=mean(x),
y=median(y),
z=first(z)
)
This results in the following output:
# A tibble: 3 x 4
category x y z
<chr> <dbl> <dbl> <dbl>
1 a 6 6 8
2 b 5 1 8
3 c 2 8 1
My question is: how would I do this with summarise_at? For this example it's unnecessary, but it would be useful if I had lots of variables that I want to take the mean of, lots that I want the median of, and so on.
The same question applies to all the new _all, _at, and _if variants. Perhaps this feature is still in development; if so, I would love to see it released as soon as possible.
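For readers on a later dplyr: version 1.0.0 introduced across(), which lets a single summarise() call apply different functions to different sets of columns. The following is a minimal sketch, assuming dplyr >= 1.0.0; the vectors mean_cols, median_cols and first_cols are illustrative names chosen here, not part of any API.

```r
library(dplyr, warn.conflicts = FALSE)  # assumes dplyr >= 1.0.0 for across()

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

# Illustrative column groups: with many variables these vectors just get longer.
mean_cols   <- c("x")
median_cols <- c("y")
first_cols  <- c("z")

result <- df %>%
  group_by(category) %>%
  summarise(
    # With a single function, across() keeps the original column names.
    across(all_of(mean_cols), mean),
    across(all_of(median_cols), median),
    across(all_of(first_cols), first)
  )
result
```

This should reproduce the table from the hand-written summarize() call above, with one across() per summary function instead of one line per column.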
Hi @profdave, I don't know if this will help, but here are some examples to illustrate what I understand you want.
First, a reminder that summarize_at() is meant for applying one or more functions to a selection of columns:
library(dplyr, warn.conflicts = FALSE)
df <- tribble(
~category, ~x, ~y, ~z,
#----------------------
'a', 4, 6, 8,
'a', 7, 3, 0,
'a', 7, 9, 0,
'b', 2, 8, 8,
'b', 5, 1, 8,
'b', 8, 0, 1,
'c', 2, 1, 1,
'c', 3, 8, 0,
'c', 1, 9, 1
)
df %>%
group_by(category) %>%
summarize_at(vars(x, y), funs(min, max))
#> # A tibble: 3 x 5
#> category x_min y_min x_max y_max
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 a 4 3 7 9
#> 2 b 2 0 8 8
#> 3 c 1 1 3 9
As I understand it, you want to apply different functions to different specific columns.
Using purrr from the tidyverse, we can work around this; for example:
library(purrr)
list(c("x"), c("y")) %>%
map2(lst(min = min, max = max), ~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
reduce(inner_join)
#> Joining, by = "category"
#> # A tibble: 3 x 3
#> category x y
#> <chr> <dbl> <dbl>
#> 1 a 4 9
#> 2 b 2 8
#> 3 c 1 9
In the example above, each column selection goes into a list, and map2() pairs it with the function at the same position in a second list of equal length; each pair is passed to summarize_at() as .x and .y respectively. At the end, reduce() (which applies a function cumulatively over a list) combines the results into a single data frame by joining them.
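To make the intermediate steps visible, here is the same map2()/reduce() pipeline split into named stages. It is a sketch assuming dplyr and purrr are installed; cols, fns and pieces are illustrative names, and the join is written as an explicit lambda with by = "category" so no "Joining" message is printed.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

# Step 1: one column selection per function, in matching positions.
cols <- list("x", "y")
fns  <- list(min = min, max = max)

# Step 2: map2() pairs cols[[i]] with fns[[i]], giving a list of small
# tibbles, each with the grouping column plus one summarised column.
pieces <- map2(cols, fns, ~ df %>% group_by(category) %>% summarise_at(.x, .y))

# Step 3: reduce() folds the join over the list: join(join(p1, p2), p3), ...
result <- reduce(pieces, ~ inner_join(.x, .y, by = "category"))
result
```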
This approach can use every feature of summarize_at(), such as applying several functions to several columns:
list(.vars = lst("x", "y", c("y", "z")),
.funs = lst(min, max, funs(mean = mean, median = median))) %>%
pmap(~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
reduce(inner_join, by = "category")
#> # A tibble: 3 x 7
#> category x y y_mean z_mean y_median z_median
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 4 9 6 2.6666667 6 0
#> 2 b 2 8 3 5.6666667 1 8
#> 3 c 1 9 6 0.6666667 8 1
You can do the same with all the summarise_* functions.
Is this the kind of result you're looking for? If not, I will delete this post.
Ultimately, I don't know whether this could be implemented as a single function or added to the behaviour of summarise_at. In the meantime, the examples above may help clarify the feature request (FR) and help you.
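As a rough idea of what such a single function could look like, here is a hypothetical summarise_spec() helper that wraps the map/reduce pattern above. It is not part of dplyr; the spec format (a list of vars/fun pairs) is an assumption made purely for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

# Hypothetical helper (not a dplyr function). Each element of `spec` is a
# list with `vars` (character vector of column names) and `fun` (a function).
summarise_spec <- function(.tbl, spec) {
  groups <- group_vars(.tbl)  # join key: the grouping columns
  spec %>%
    map(~ summarise_at(.tbl, .x$vars, .x$fun)) %>%
    reduce(~ inner_join(.x, .y, by = groups))
}

spec <- list(
  list(vars = "x", fun = mean),
  list(vars = "y", fun = median),
  list(vars = "z", fun = first)
)

result <- df %>% group_by(category) %>% summarise_spec(spec)
result
```

With many variables, only the vars vectors in spec grow; the helper itself stays the same.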
Thanks very much @cderv, it looks like this is exactly what I was talking about. I'll study it more closely (and get myself 100% up to date on purrr) to understand it better. But would it really be so hard to incorporate this functionality into dplyr? You know better than I do, of course, but I think it would be very helpful to the average user.
library(dplyr, warn.conflicts = FALSE)
df <- tribble(
~category, ~x, ~y, ~z,
#----------------------
'a', 4, 6, 8,
'a', 7, 3, 0,
'a', 7, 9, 0,
'b', 2, 8, 8,
'b', 5, 1, 8,
'b', 8, 0, 1,
'c', 2, 1, 1,
'c', 3, 8, 0,
'c', 1, 9, 1
)
df %>%
group_by(category) %>%
summarise_all(funs(mean, median, first))
#> # A tibble: 3 x 10
#> category x_mean y_mean z_mean x_median y_median z_med… x_fi… y_fi… z_fi…
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 6.00 6.00 2.67 7.00 6.00 0 4.00 6.00 8.00
#> 2 b 5.00 3.00 5.67 5.00 1.00 8.00 2.00 8.00 8.00
#> 3 c 2.00 6.00 0.667 2.00 8.00 1.00 2.00 1.00 1.00
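Since summarise_all() computes every function for every column, one way to get back to the table from the original question is to keep only the needed combinations with select(). A sketch assuming dplyr 0.8.0 or later, where a named list() replaces funs() and yields the same x_mean-style names:

```r
library(dplyr, warn.conflicts = FALSE)

df <- tribble(
  ~category, ~x, ~y, ~z,
  'a', 4, 6, 8,
  'a', 7, 3, 0,
  'a', 7, 9, 0,
  'b', 2, 8, 8,
  'b', 5, 1, 8,
  'b', 8, 0, 1,
  'c', 2, 1, 1,
  'c', 3, 8, 0,
  'c', 1, 9, 1
)

result <- df %>%
  group_by(category) %>%
  # Compute all 3 x 3 combinations, named <column>_<function>.
  summarise_all(list(mean = mean, median = median, first = first)) %>%
  # Keep (and rename) only the combinations the question asked for.
  select(category, x = x_mean, y = y_median, z = z_first)
result
```

The trade-off is extra computation for the discarded columns, which is usually negligible but grows with the number of variables and functions.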