Since that will generally be what people want. The major downside is that it risks becoming too magical, which is especially problematic for across() because it is hard to debug: it must always be called inside a verb.
Maybe we need an across variant that returns a matrix usually, and a vector when called within rowwise?
If, in the rowwise context, across() is likely to be used mainly to select rather than to transform (i.e. without supplying fns, as in df %>% rowwise() %>% mutate(foo = bar(across())) where bar needs a vector), perhaps there is room for two distinct functions rather than making across() too versatile.
I'm not sure I'm following. Do you have some pretend code @hadley ?
@romainfrancois, check discussion in #4840
@romainfrancois Also discussion at https://github.com/tidyverse/dplyr/issues/4837
The experimental verb lay (https://github.com/romainfrancois/lay) is probably a better idea:
The basic problem (as nicely described by @bwiernik) is that you might want to compute a "rowwise" summary like so:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(w = runif(3), x = runif(3), y = runif(3), z = runif(3))
df %>% rowwise() %>% mutate(m = mean(c(w, x, y, z)))
#> # A tibble: 3 x 5
#> # Rowwise:
#> w x y z m
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.364 0.229 0.850 0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778 0.450
#> 3 0.466 0.459 0.599 0.432 0.489
This is obviously tedious if you have many columns.
Currently there are two ways to use across() here:
# Use rowwise() and coerce to a vector:
df %>% rowwise() %>% mutate(m = mean(unlist(across(w:z))))
#> # A tibble: 3 x 5
#> # Rowwise:
#> w x y z m
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.364 0.229 0.850 0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778 0.450
#> 3 0.466 0.459 0.599 0.432 0.489
# Use existing rowwise function:
df %>% mutate(m = rowMeans(across(w:z)))
#> # A tibble: 3 x 5
#> w x y z m
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.364 0.229 0.850 0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778 0.450
#> 3 0.466 0.459 0.599 0.432 0.489
# Or apply
df %>% mutate(m = apply(across(w:z), 1, mean))
#> # A tibble: 3 x 5
#> w x y z m
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.364 0.229 0.850 0.0777 0.380
#> 2 0.729 0.282 0.0116 0.778 0.450
#> 3 0.466 0.459 0.599 0.432 0.489
If we had a variant of across() that returned a matrix instead of a data frame, and where the function was applied across rows, rather than columns, we could write:
df %>% rowwise() %>% mutate(m = mean(something(w:z)))
df %>% mutate(m = something(w:z, mean))
It's a bit hard to know what to call this function, but its name should probably echo across(), since the two are closely related.
OTOH if this somehow became an additional feature of across() it would also solve #4770, because you could write (e.g.) across(is.numeric, ~ .x > 0, row_fn = any)
Perhaps vector_across() or similar would be a good name for a separate function, returning an nrow × ncol matrix which, following R's subsetting rules for matrices, becomes a vector if there is only one row or column.
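To illustrate the base R subsetting rule referred to here: extracting a single row from a matrix drops the dimensions and gives a plain vector, unless drop = FALSE is set. A minimal sketch with a made-up matrix (the name m and its values are only for illustration and do not come from the thread):

```r
# Made-up 3 x 2 numeric matrix standing in for the selected columns
m <- cbind(w = c(0.1, 0.2, 0.3), x = c(0.4, 0.5, 0.6))

one_row <- m[1, ]                # dimensions dropped: a named numeric vector
kept    <- m[1, , drop = FALSE]  # dimensions kept: still a 1 x 2 matrix

is.null(dim(one_row))  # TRUE, so functions like mean() see a plain vector
dim(kept)              # 1 2
```

This is presumably why a matrix-returning variant would behave like a vector inside rowwise(), where each group is a single row.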
oh ok, so that's essentially this: https://github.com/romainfrancois/lay
Maybe over() as in %>% mutate(m = over(w:z, mean))
I'd argue that having across() handle both a function to apply to each column and another function to apply to each "row" would be "much" for one function.
What about through()?
Here is a dull implementation just to illustrate the syntax:
through <- function(vars, fn) apply(across({{vars}}), 1, rlang::as_function(fn))
library(dplyr)
iris %>% tibble() %>%
mutate(Petal.Sum = through(starts_with("Petal"), sum))
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Sum
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 1.60
#> 2 4.9 3 1.4 0.2 setosa 1.60
#> # … with 140 more rows
iris %>% tibble() %>%
mutate(Petal.Sum = through(starts_with("Petal"), ~ sum(.x)))
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Sum
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 1.60
#> 2 4.9 3 1.4 0.2 setosa 1.60
We've decided to implement c_across() for this use case.
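For reference, a minimal sketch of how c_across() can be used for the rowwise case, assuming dplyr 1.0.0 or later is installed. The fixed values in df are made up so that the result is easy to check by hand:

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up data with fixed values (the thread uses runif(), which is not reproducible)
df <- tibble(w = c(1, 2, 3), x = c(4, 5, 6), y = c(7, 8, 9), z = c(10, 11, 12))

res <- df %>%
  rowwise() %>%
  mutate(m = mean(c_across(w:z))) %>%  # c_across() combines the row's values into one vector
  ungroup()

res$m
#> [1] 5.5 6.5 7.5
```

Because c_across(w:z) already returns a vector for the current row, mean() can be applied directly, without the unlist() step needed with across().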