Dplyr: Use mutate_at with multiple sets of .vars and multiple .funs

Created on 26 Jun 2019 · 4 Comments · Source: tidyverse/dplyr

Hi,

I think this is a feature request (certainly I don't think there is a neat way to do this at the moment).

Could mutate_at and friends be vectorised over multiple sets of .vars, each transformed by its own set of .funs?

I have a set of transformations to make on my dataset.

Firstly, we have a set of columns containing TeamCodes, which are provided to me as strings in a CSV: each is either a plain number or a C followed by a number. Each of those column names ends in TeamCode. These are easier to manage once tidied into signed integers, with the C converted to a minus sign.

We then have a set of other operations - such as changing a series of columns from a factor to a logical value.

If I was acting upon a single variable, I'd do them all within a single mutate call:

my_data <- dplyr::mutate(my_data,
  TeamCode = as.integer(sub("C", "-", .data[["TeamCode"]], ignore.case = TRUE)),
  S2CoMAtrialFibrillation =
    (.data[["S2CoMAtrialFibrillation"]] == "Y"))

But because we're using sets of variables, as far as I understand it, I can only work on a single set at a time:

my_data <- dplyr::mutate_at(my_data,
  .vars = dplyr::vars(dplyr::ends_with("TeamCode")),
  .funs = teamcode_to_number
)

my_data <- dplyr::mutate_at(my_data,
  .vars = dplyr::vars(optional_boolean_column_names),
  .funs = factor_to_logical_y_na_n
)
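Neither teamcode_to_number nor factor_to_logical_y_na_n is defined in this issue. A minimal sketch of what they might look like, assuming the codes are either plain digits or a C followed by digits, and assuming the factor levels are "Y" and "N" (both are guesses based on the description above, not the author's actual helpers):

```r
# Hypothetical helper: "12" -> 12L, "C12" -> -12L (a leading C becomes a minus sign).
teamcode_to_number <- function(x) {
  as.integer(sub("^C", "-", as.character(x), ignore.case = TRUE))
}

# Hypothetical helper: "Y" -> TRUE, "N" -> FALSE, anything else (including NA) -> NA.
factor_to_logical_y_na_n <- function(x) {
  x <- as.character(x)
  ifelse(x == "Y", TRUE, ifelse(x == "N", FALSE, NA))
}

codes <- teamcode_to_number(c("12", "C12", "c7"))
flags <- factor_to_logical_y_na_n(factor(c("Y", "N", NA)))
```

Both helpers use base R only, so they can be passed directly as .funs to mutate_at().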

Would it be possible to consider modifying mutate_at and the other scoped variants so that .vars could be a list of vars() (indicating multiple sets of variables to modify)? Then .funs would need to be a list of the same length (or a list of lists if you wanted to apply multiple functions to a given variable set).

That would achieve parity with what is possible with the unscoped version of mutate whilst still allowing scoping.

Apologies if there is a much better way to do this that I haven't seen!

md0u80c9

Most helpful comment

Usually, when a task involves lists, there is a good chance it can be done with the purrr package. If I understood you correctly, your goal is to consecutively apply mutate_at() transformations to a data frame/tibble. This type of operation can usually be done with a "reduce" function: it starts with an initial value and repeatedly applies a function while going through a vector of arguments, with each output serving as the input to the next step.
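As a small illustration of that idea (not part of the original comment), purrr's reduce() folds a vector into a single value, and accumulate() exposes the intermediate steps:

```r
library(purrr)

# Start from 10 and add each element in turn: ((10 + 1) + 2) + 3
total <- reduce(c(1, 2, 3), ~ .x + .y, .init = 10)

# accumulate() keeps every intermediate result, starting with the initial value
steps <- accumulate(c(1, 2, 3), ~ .x + .y, .init = 10)
```

Here total is 16 and steps is 10, 11, 13, 16. Replacing the running sum with a running data frame gives the pattern used for mutate_at().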

"Vectorized mutate_at()" can be done with the reduce2() function from purrr: it goes through two vectors (lists) of arguments simultaneously while passing along the progressively modified initial value (the data frame in this case). Here is a toy example:

library(tidyverse)
data <- tibble(a1 = 1, a2 = 2, b1 = 3, b2 = 4)

# Initialize lists for `.vars` and `.funs`
vars_list <- list(vars(starts_with("a")), vars(ends_with("1")))
funs_list <- list(~ . + 1, ~ . + 10)

# Consecutively transform `data` with `mutate_at()`. Here `..1`, `..2`, and `..3` are the first,
# second, and third arguments of the anonymous lambda function: the data frame so far,
# the current element of `.x`, and the current element of `.y`
reduce2(
  .x = vars_list,
  .y = funs_list,
  .f = ~mutate_at(..1, .vars = ..2, .funs = ..3),
  .init = data
)
#> # A tibble: 1 x 4
#>      a1    a2    b1    b2
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    12     3    13     4

# Result is the same as after these consecutive computations
data %>% 
  mutate_at(.vars = vars_list[[1]], .funs = funs_list[[1]]) %>% 
  mutate_at(.vars = vars_list[[2]], .funs = funs_list[[2]])
#> # A tibble: 1 x 4
#>      a1    a2    b1    b2
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    12     3    13     4

Created on 2019-06-27 by the reprex package (v0.3.0)

echasnovski on 27 Jun 2019


All 4 comments

It might be a case where a straight for loop is simpler to grasp:

library(dplyr, warn.conflicts = FALSE)

data <- tibble(a1 = 1, a2 = 2, b1 = 3, b2 = 4)

mutate2 <- function(data, .vars, .funs) {
  stopifnot(length(.vars) == length(.funs))
  for (i in seq_along(.vars)) {
    data <- mutate_at(data, .vars[[i]], .funs[[i]])
  }
  data
}
mutate2(data, 
  list(vars(starts_with("a")), vars(ends_with("1"))), 
  list(~ . + 1, ~ . + 10)
)
#> # A tibble: 1 x 4
#>      a1    a2    b1    b2
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    12     3    13     4

Created on 2019-07-02 by the reprex package (v0.3.0.9000)

romainfrancois on 2 Jul 2019

In any case, this is out of scope for dplyr.

romainfrancois on 2 Jul 2019
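For readers arriving later: dplyr 1.0.0 and newer provide across(), which supersedes the scoped verbs such as mutate_at(). Several across() calls can sit in one mutate(), each with its own column selection and function, which is close to what this issue asks for. A minimal sketch, assuming dplyr >= 1.0.0 and using non-overlapping column sets (a variation on the toy data above, not code from this thread):

```r
library(dplyr, warn.conflicts = FALSE)

data <- tibble(a1 = 1, a2 = 2, b1 = 3, b2 = 4)

# One mutate() call, two column sets, each with its own function
result <- data %>%
  mutate(
    across(starts_with("a"), ~ .x + 1),
    across(starts_with("b"), ~ .x + 10)
  )
```

With these inputs, a1 and a2 become 2 and 3, and b1 and b2 become 13 and 14.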

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/