Dplyr: summarise_at using different functions for different variables

Created on 13 Sep 2017 · 3 Comments · Source: tidyverse/dplyr

When I use group_by and summarise in dplyr, I can naturally apply different summary functions to different variables. For instance:

    library(tidyverse)

    df <- tribble(
      ~category,   ~x,  ~y,  ~z,
      #----------------------
          'a',      4,   6,   8,
          'a',      7,   3,   0,
          'a',      7,   9,   0,
          'b',      2,   8,   8,
          'b',      5,   1,   8,
          'b',      8,   0,   1,
          'c',      2,   1,   1,
          'c',      3,   8,   0,
          'c',      1,   9,   1
     )

    df %>% group_by(category) %>% summarize(
      x=mean(x),
      y=median(y),
      z=first(z)
    )

This results in the output:

    # A tibble: 3 x 4
      category     x     y     z
         <chr> <dbl> <dbl> <dbl>
    1        a     6     6     8
    2        b     5     1     8
    3        c     2     8     1

My question is, how would I do this with summarise_at? Obviously for this example it's unnecessary, but it would be useful if I have lots of variables that I want to take the mean of, lots of medians, etc.

Obviously, this issue is the same for all the new _all's, _at's and _if's. Perhaps this is a feature still in development; if so, I would be a fan of seeing it released as soon as possible.

Most helpful comment

Hi @profdave, I don't know if this will help, but here are some examples illustrating what I understand you want.

First, a reminder that summarize_at applies one or more functions to a selection of columns.

    library(dplyr, warn.conflicts = FALSE)
    df <- tribble(
      ~category,   ~x,  ~y,  ~z,
      #----------------------
      'a',      4,   6,   8,
      'a',      7,   3,   0,
      'a',      7,   9,   0,
      'b',      2,   8,   8,
      'b',      5,   1,   8,
      'b',      8,   0,   1,
      'c',      2,   1,   1,
      'c',      3,   8,   0,
      'c',      1,   9,   1
    )
    df %>%
      group_by(category) %>%
      summarize_at(vars(x, y), funs(min, max))
    #> # A tibble: 3 x 5
    #>   category x_min y_min x_max y_max
    #>      <chr> <dbl> <dbl> <dbl> <dbl>
    #> 1        a     4     3     7     9
    #> 2        b     2     0     8     8
    #> 3        c     1     1     3     9

As I understand it, you want to map different functions to different specific columns.
Using purrr from the tidyverse, we can work around this as follows:

    library(purrr)
    list(c("x"), c("y")) %>%
      map2(lst(min = min, max = max), ~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
      reduce(inner_join)
    #> Joining, by = "category"
    #> # A tibble: 3 x 3
    #>   category     x     y
    #>      <chr> <dbl> <dbl>
    #> 1        a     4     9
    #> 2        b     2     8
    #> 3        c     1     9

In the example above, you first put the column selections in a list, then map them against a list of the same length holding the functions you want; each pair is passed to summarize_at as .x and .y respectively. At the end, you combine the results into a single data frame by joining (reduce applies a two-argument function successively across the elements of a list).
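To make the two steps easier to follow, here is a small sketch that keeps the intermediate result instead of piping straight through. The names `pieces` and `joined` are illustrative only, and the join column is passed explicitly, so no "Joining, by" message should appear.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

df <- tribble(
  ~category, ~x, ~y, ~z,
  "a", 4, 6, 8,
  "a", 7, 3, 0,
  "a", 7, 9, 0,
  "b", 2, 8, 8,
  "b", 5, 1, 8,
  "b", 8, 0, 1,
  "c", 2, 1, 1,
  "c", 3, 8, 0,
  "c", 1, 9, 1
)

# Step 1: pair each column selection with one function.
# map2() walks both lists in parallel and returns one summary table per pair.
pieces <- map2(
  list("x", "y"),
  list(min, max),
  ~ df %>% group_by(category) %>% summarise_at(.x, .y)
)
# pieces[[1]] holds category plus min of x; pieces[[2]] holds category plus max of y.

# Step 2: reduce() joins the tables pairwise on the grouping column.
joined <- reduce(pieces, ~ inner_join(.x, .y, by = "category"))
joined
```

Each element of `pieces` is an ordinary grouped summary, so it can be inspected on its own before joining.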

This approach can use every feature of summarize_at, such as applying several functions to several columns.

    list(.vars = lst("x", "y", c("y", "z")),
         .funs = lst(min, max, funs(mean = mean, median = median))) %>%
      pmap(~ df %>% group_by(category) %>% summarise_at(.x, .y)) %>%
      reduce(inner_join, by = "category")
    #> # A tibble: 3 x 7
    #>   category     x     y y_mean    z_mean y_median z_median
    #>      <chr> <dbl> <dbl>  <dbl>     <dbl>    <dbl>    <dbl>
    #> 1        a     4     9      6 2.6666667        6        0
    #> 2        b     2     8      3 5.6666667        1        8
    #> 3        c     1     9      6 0.6666667        8        1

You can do the same with all summarise_* functions.

Is this the kind of result you are looking for? If not, I will delete this post.

Finally, I do not know whether we could implement a single function to do this or include it in summarise_at's behaviour. In the meantime, the examples above may help clarify the feature request and help you.
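As a rough idea of what such a single function might look like, the map2 and reduce pattern can be wrapped in a helper. This is only a sketch: `summarise_at_each`, `cols` and `fns` are hypothetical names, not part of dplyr, and it assumes a grouped input where no column appears in more than one selection (otherwise the join would add `.x`/`.y` suffixes).

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Hypothetical helper, not part of dplyr: applies fns[[i]] to the columns
# named in cols[[i]] within each group, then joins the per-pair summaries
# on the grouping columns.
summarise_at_each <- function(.tbl, cols, fns) {
  keys <- group_vars(.tbl)
  stopifnot(length(cols) == length(fns), length(keys) > 0)
  pieces <- map2(cols, fns, ~ summarise_at(.tbl, .x, .y))
  reduce(pieces, ~ inner_join(.x, .y, by = keys))
}

df <- tribble(
  ~category, ~x, ~y, ~z,
  "a", 4, 6, 8,
  "a", 7, 3, 0,
  "a", 7, 9, 0,
  "b", 2, 8, 8,
  "b", 5, 1, 8,
  "b", 8, 0, 1,
  "c", 2, 1, 1,
  "c", 3, 8, 0,
  "c", 1, 9, 1
)

# Mean of x, median of y, first value of z, as in the original question.
res <- df %>%
  group_by(category) %>%
  summarise_at_each(
    cols = list("x", "y", "z"),
    fns  = list(mean, median, first)
  )
res
```

Each entry of `cols` can also be a character vector of several column names, which covers the "lots of means, lots of medians" case from the question.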

All 3 comments

Thanks very much @cderv, it looks like this is exactly what I was talking about. I'll study it more closely (and get myself 100% up to date on purrr) to understand it better. But would it really be so hard to incorporate this functionality into dplyr? You know better than I do, of course, but I think it would be very helpful to the average user.

    library(dplyr, warn.conflicts = FALSE)

    df <- tribble(
      ~category,   ~x,  ~y,  ~z,
      #----------------------
      'a',      4,   6,   8,
      'a',      7,   3,   0,
      'a',      7,   9,   0,
      'b',      2,   8,   8,
      'b',      5,   1,   8,
      'b',      8,   0,   1,
      'c',      2,   1,   1,
      'c',      3,   8,   0,
      'c',      1,   9,   1
    )

    df %>%
      group_by(category) %>%
      summarise_all(funs(mean, median, first))
    #> # A tibble: 3 x 10
    #>   category x_mean y_mean z_mean x_median y_median z_med… x_fi… y_fi… z_fi…
    #>   <chr>     <dbl>  <dbl>  <dbl>    <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl>
    #> 1 a          6.00   6.00  2.67      7.00     6.00   0     4.00  6.00  8.00
    #> 2 b          5.00   3.00  5.67      5.00     1.00   8.00  2.00  8.00  8.00
    #> 3 c          2.00   6.00  0.667     2.00     8.00   1.00  2.00  1.00  1.00
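
For readers on newer releases: dplyr 1.0.0 and later provide across(), which can be called several times inside a single summarise(), once per group of columns. A minimal sketch, assuming dplyr 1.0.0 or later is installed (across() did not exist when this issue was opened); the name `out` is illustrative only.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tribble(
  ~category, ~x, ~y, ~z,
  "a", 4, 6, 8,
  "a", 7, 3, 0,
  "a", 7, 9, 0,
  "b", 2, 8, 8,
  "b", 5, 1, 8,
  "b", 8, 0, 1,
  "c", 2, 1, 1,
  "c", 3, 8, 0,
  "c", 1, 9, 1
)

# One across() call per function; each call can select many columns,
# e.g. across(c(x, y), mean) or across(starts_with("score"), median).
out <- df %>%
  group_by(category) %>%
  summarise(
    across(x, mean),
    across(y, median),
    across(z, first)
  )
out
```

This should reproduce the table from the original question (mean of x, median of y, first value of z) without the purrr workaround.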