Dplyr: Feed formulas into summarise() as objects

Created on 25 Apr 2018  路  4Comments  路  Source: tidyverse/dplyr

I've spent a decent amount of time reading (and rereading and rereading) various explanations of NSE and dplyr programming, but have not been able to solve what seems like it should be a simple problem. I'd like to define summarization formulas in advance, and then write a function that feeds those formulas into summarise() for evaluation.

The motivation for this is to build a package specific to a set of data that sets up complicated aggregations of that data in a standard way that doesn't need to be remembered by the users, but allows them to quickly and easily pull the data and aggregate it properly, by only remembering the name of the aggregation.

Here's an example of what I mean:

price_per_carat <- ~ sum(price) / sum(carat)
price_depth_ratio <- ~ sum(price) / sum(depth)

summarise_as <- function(.data, ...) {
  # some kind of black magic
}

diamonds %>%
  group_by(cut, clarity)
  summarise_as(price_per_carat, price_depth_ratio)

# A tibble: 40 x 4
# Groups:   cut [?]
     cut clarity price_per_carat price_depth_ratio
   <ord>   <ord>           <dbl>             <dbl>
 1  Fair      I1        2721.185          56.37119
 2  Fair     SI2        4297.840          80.31302
 3  Fair     SI1        4362.573          65.84666
 4  Fair     VS2        4715.875          65.60159
 5  Fair     VS1        4734.064          66.17081
 6  Fair    VVS2        4843.546          53.29967
 7  Fair    VVS1        5824.159          64.07653
 8  Fair      IF        4030.679          31.83685
 9  Good      I1        2989.670          57.94697
10  Good     SI2        4424.404          73.58696
# ... with 30 more rows

In case that wasn't perfectly clear, I'm basically trying to find a way to get the summarise_as(price_per_carat, price_depth_ratio) to behave as I had typed summarise(price_per_carat = sum(price) / sum(carat), price_depth_ratio = sum(price) / sum(depth)).

I've tried all sorts of forms of quo(), enquo(), as.formula, deparse(substitute()), etc. to try to get it to work, and all have failed. What's the proper way to do this?

Most helpful comment

Maybe that's easier:

library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice

exprs <- list( 
  price_per_carat = expr(sum(price) / sum(carat)), 
  price_depth_ratio = expr(sum(price) / sum(depth))
)

diamonds %>%
  group_by(cut, clarity) %>%
  summarise(!!!exprs)
#> # A tibble: 40 x 4
#> # Groups:   cut [?]
#>    cut   clarity price_per_carat price_depth_ratio
#>    <ord> <ord>             <dbl>             <dbl>
#>  1 Fair  I1                2721.              56.4
#>  2 Fair  SI2               4298.              80.3
#>  3 Fair  SI1               4363.              65.8
#>  4 Fair  VS2               4716.              65.6
#>  5 Fair  VS1               4734.              66.2
#>  6 Fair  VVS2              4844.              53.3
#>  7 Fair  VVS1              5824.              64.1
#>  8 Fair  IF                4031.              31.8
#>  9 Good  I1                2990.              57.9
#> 10 Good  SI2               4424.              73.6
#> # ... with 30 more rows

Created on 2018-04-25 by the reprex package (v0.2.0).

All 4 comments

If you want to use formulas, here is one way:

library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice

summarise_as <- function(.data, ...) {
  dots  <- quos(...)
  names <- map_chr(dots, ~as.character(quo_get_expr(.x)))
  names(dots) <- names

  get_expr <- function(q){
    formula <- eval_tidy(quo_get_expr(q), quo_get_env(q))
    expr(!!formula[[2]])
  }

  exprs <- map( dots, get_expr)
  summarise(.data, !!!exprs)
}

price_per_carat <- ~ sum(price) / sum(carat)
price_depth_ratio <- ~ sum(price) / sum(depth)

diamonds %>%
  group_by(cut, clarity) %>%
  summarise_as(price_per_carat, price_depth_ratio)
#> # A tibble: 40 x 4
#> # Groups:   cut [?]
#>    cut   clarity price_per_carat price_depth_ratio
#>    <ord> <ord>             <dbl>             <dbl>
#>  1 Fair  I1                2721.              56.4
#>  2 Fair  SI2               4298.              80.3
#>  3 Fair  SI1               4363.              65.8
#>  4 Fair  VS2               4716.              65.6
#>  5 Fair  VS1               4734.              66.2
#>  6 Fair  VVS2              4844.              53.3
#>  7 Fair  VVS1              5824.              64.1
#>  8 Fair  IF                4031.              31.8
#>  9 Good  I1                2990.              57.9
#> 10 Good  SI2               4424.              73.6
#> # ... with 30 more rows

maybe @lionel- has a better approach

Maybe that's easier:

library(tidyverse)
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice

exprs <- list( 
  price_per_carat = expr(sum(price) / sum(carat)), 
  price_depth_ratio = expr(sum(price) / sum(depth))
)

diamonds %>%
  group_by(cut, clarity) %>%
  summarise(!!!exprs)
#> # A tibble: 40 x 4
#> # Groups:   cut [?]
#>    cut   clarity price_per_carat price_depth_ratio
#>    <ord> <ord>             <dbl>             <dbl>
#>  1 Fair  I1                2721.              56.4
#>  2 Fair  SI2               4298.              80.3
#>  3 Fair  SI1               4363.              65.8
#>  4 Fair  VS2               4716.              65.6
#>  5 Fair  VS1               4734.              66.2
#>  6 Fair  VVS2              4844.              53.3
#>  7 Fair  VVS1              5824.              64.1
#>  8 Fair  IF                4031.              31.8
#>  9 Good  I1                2990.              57.9
#> 10 Good  SI2               4424.              73.6
#> # ... with 30 more rows

Created on 2018-04-25 by the reprex package (v0.2.0).

I'll close this now. Perhaps consider https://community.rstudio.com for things that are questions more than issues.

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

Was this page helpful?
0 / 5 - 0 ratings