Dplyr: Matrix-y version of across()

Created on 9 Feb 2020 · 11 Comments · Source: tidyverse/dplyr

Since that will generally be what people want. The major downside would be making it too magical, which is especially risky for across() because it is hard to debug: it must always be embedded in a verb.

Maybe we need an across variant that returns a matrix usually, and a vector when called within rowwise?

each-row ↕️ feature

hadley

All 11 comments

If in the rowwise context, across() is likely to be mainly used to select and not to transform (i.e. without defining fns to do stuff like df %>% rowwise() %>% mutate(foo = bar(across())) where bar needs a vector), perhaps there is room for 2 distinct functions rather than turning across() into something too versatile.

courtiol on 9 Feb 2020

I'm not sure I'm following. Do you have some pretend code @hadley ?

romainfrancois on 10 Feb 2020

@romainfrancois, check discussion in #4840

courtiol on 10 Feb 2020

@romainfrancois Also discussion at https://github.com/tidyverse/dplyr/issues/4837

bwiernik on 12 Feb 2020

👍1

The experimental verb lay (https://github.com/romainfrancois/lay) is probably a better idea:

across() would remain type stable
how to apply a function on the tibble returned by across() would be identical whether one is using rowwise() or not

courtiol on 16 Feb 2020

The basic problem (as nicely described by @bwiernik) is that you might want to compute a "rowwise" summary like so:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(w = runif(3), x = runif(3), y = runif(3), z = runif(3))
df %>% rowwise() %>% mutate(m = mean(c(w, x, y, z)))
#> # A tibble: 3 x 5
#> # Rowwise: 
#>       w     x      y      z     m
#>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.364 0.229 0.850  0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778  0.450
#> 3 0.466 0.459 0.599  0.432  0.489

This is obviously tedious if you have many columns.

Currently there are two ways to use across() here:

# Use rowwise() and coerce to a vector:
df %>% rowwise() %>% mutate(m = mean(unlist(across(w:z))))
#> # A tibble: 3 x 5
#> # Rowwise: 
#>       w     x      y      z     m
#>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.364 0.229 0.850  0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778  0.450
#> 3 0.466 0.459 0.599  0.432  0.489

# Use existing rowwise function:
df %>% mutate(m = rowMeans(across(w:z)))
#> # A tibble: 3 x 5
#>       w     x      y      z     m
#>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.364 0.229 0.850  0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778  0.450
#> 3 0.466 0.459 0.599  0.432  0.489
# Or apply
df %>% mutate(m = apply(across(w:z), 1, mean))
#> # A tibble: 3 x 5
#>       w     x      y      z     m
#>   <dbl> <dbl>  <dbl>  <dbl> <dbl>
#> 1 0.364 0.229 0.850  0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778  0.450
#> 3 0.466 0.459 0.599  0.432  0.489

If we had a variant of across() that returned a matrix instead of a data frame, and where the function was applied across rows, rather than columns, we could write:

# something() is a placeholder name for the proposed function
df %>% rowwise() %>% mutate(m = mean(something(w:z)))
df %>% mutate(m = something(w:z, mean))

It's a bit hard to know what to call this function, but its name should probably echo across(), since the two are so closely related.

hadley on 23 Feb 2020

OTOH if this somehow became an additional feature of across() it would also solve #4770, because you could write (e.g.) across(is.numeric, ~ .x > 0, row_fn = any)

hadley on 23 Feb 2020
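For illustration only, the "transform each column, then reduce each row" idea can already be approximated with existing tools. The sketch below is not the proposed row_fn argument (which does not exist); it assumes dplyr 1.0.0 or later, uses a made-up data frame, and composes across() with base apply():

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: two numeric columns and one character column
df <- tibble(a = c(-1, 2, -3), b = c(-4, -5, -6), g = c("p", "q", "r"))

res <- df %>%
  mutate(
    # across() applies `.x > 0` to each numeric column and returns a tibble
    # of logicals; apply(..., 1, any) then reduces that tibble row by row
    any_pos = apply(across(where(is.numeric), ~ .x > 0), 1, any)
  )

res$any_pos
```

Only the second row contains a positive value, so any_pos is TRUE there and FALSE elsewhere.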

Perhaps vector_across() or similar would be a good name for a separate function returning an nrow × ncol matrix which, following R's subsetting of matrices, becomes a vector if there is only one row or column.

bwiernik on 23 Feb 2020
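The matrix-to-vector behaviour referred to here is base R's default dimension dropping. A minimal sketch with a made-up matrix (the column names only mirror the earlier example):

```r
# Hypothetical data: three rows, four columns named like the earlier example
m <- matrix(1:12, nrow = 3, dimnames = list(NULL, c("w", "x", "y", "z")))

one_row <- m[1, ]                # dimensions dropped: a named vector of length 4
kept    <- m[1, , drop = FALSE]  # drop = FALSE keeps a 1 x 4 matrix

is.matrix(one_row)  # FALSE
is.matrix(kept)     # TRUE
mean(one_row)       # 5.5, i.e. a summary over a single "row"
```

This is why a matrix-returning variant would hand a plain vector to functions like mean() when only one row is in play, as is the case inside rowwise().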

oh ok, so that's essentially this: https://github.com/romainfrancois/lay

Maybe over() as in %>% mutate(m = over(w:z, mean))

I'd argue that having across() handle both a function to apply to each column and another function to apply to each "row" would be "much" for one function.

romainfrancois on 23 Feb 2020

👍2

What about through()?

Here is a dull implementation just to illustrate the syntax:

through <- function(vars, fn) apply(across({{vars}}), 1, rlang::as_function(fn))

library(dplyr)

iris %>% tibble() %>%
  mutate(Petal.Sum = through(starts_with("Petal"), sum))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Sum
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>       <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa       1.60
#>  2          4.9         3            1.4         0.2 setosa       1.60

#> # … with 140 more rows

iris %>% tibble() %>%
  mutate(Petal.Sum = through(starts_with("Petal"), ~ sum(.x)))
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Sum
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>       <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa       1.60
#>  2          4.9         3            1.4         0.2 setosa       1.60

courtiol on 24 Feb 2020

We've decided to implement c_across() for this use case.

hadley on 1 Mar 2020

👍2
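For reference, a minimal sketch of how the rowwise mean from earlier in the thread might look with c_across(), assuming dplyr 1.0.0 or later (where the function is available). The data are made up so the result is easy to check by hand:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data with easily verified row means
df <- tibble(w = c(1, 2, 3), x = c(4, 5, 6), y = c(7, 8, 9), z = c(10, 11, 12))

res <- df %>%
  rowwise() %>%
  # c_across() combines the selected columns of the current row into a vector
  mutate(m = mean(c_across(w:z))) %>%
  ungroup()

res$m
```

The result should match rowMeans() over the same columns; c_across() is the more general option when the row summary has no dedicated vectorised function.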