As a simple example, let's say you have a normalising constant in one dataframe column and you need to normalise many other columns. mutate_at
seems like it would be convenient to do this, but you can't refer to other columns in the dataframe apart from .
cleanly.
Example:
library(tidyverse)
## Works but hardcoding dataset name feels lame.
iris %>%
mutate_at(vars(starts_with("Sepal")),
~ . / iris$Petal.Width) %>%
head()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.50 1.4 0.2 setosa
#> 2 24.5 15.00 1.4 0.2 setosa
#> 3 23.5 16.00 1.3 0.2 setosa
#> 4 23.0 15.50 1.5 0.2 setosa
#> 5 25.0 18.00 1.4 0.2 setosa
#> 6 13.5 9.75 1.7 0.4 setosa
## Fails
iris %>%
mutate_at(vars(starts_with("Sepal")),
~ . / Petal.Width) %>%
head()
#> Error in mutate_impl(.data, dots): Evaluation error: object 'Petal.Width' not found.
## Fails differently
iris %>%
mutate_at(vars(starts_with("Sepal")),
~ . / .data$Petal.Width) %>%
head()
#> Error in mutate_impl(.data, dots): Column `Sepal.Length` must be length 150 (the number of rows) or one, not 0
Created on 2019-02-15 by the reprex package (v0.2.1)
Not sure what's happening here, but this works:
library(dplyr, warn.conflicts = FALSE)
iris %>%
mutate_at(vars(starts_with("Sepal")),
list(~ . / .data$Petal.Width)) %>%
head()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.50 1.4 0.2 setosa
#> 2 24.5 15.00 1.4 0.2 setosa
#> 3 23.5 16.00 1.3 0.2 setosa
#> 4 23.0 15.50 1.5 0.2 setosa
#> 5 25.0 18.00 1.4 0.2 setosa
#> 6 13.5 9.75 1.7 0.4 setosa
Created on 2019-02-15 by the reprex package (v0.2.1.9000)
Oh thanks!
Well that sort of solves MY problem, but there seems to be some inconsistency here worth cleaning up. I also discovered this, which really feels like it should work:
library(tidyverse)
## fails
iris %>%
mutate_at(vars(starts_with("Sepal")),
function(col, normaliser){
col / normaliser
},
.data$Petal.Width) %>%
head()
#> Error in mutate_impl(.data, dots): Column `Sepal.Length` must be length 150 (the number of rows) or one, not 0
## works
iris %>%
mutate_at(vars(starts_with("Sepal")),
function(col, normaliser){
col / normaliser
},
iris$Petal.Width) %>%
head()
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.50 1.4 0.2 setosa
#> 2 24.5 15.00 1.4 0.2 setosa
#> 3 23.5 16.00 1.3 0.2 setosa
#> 4 23.0 15.50 1.5 0.2 setosa
#> 5 25.0 18.00 1.4 0.2 setosa
#> 6 13.5 9.75 1.7 0.4 setosa
Created on 2019-02-15 by the reprex package (v0.2.1)
If .funs
is a formula, it is processed by rlang::as_function()
, but, if .funs
is a list, it is processed by dplyr:::as_fun()
. Probably, the former should also be handled by as_fun()
? (Or, does as_function()
need .env
argument specified?)
https://github.com/tidyverse/dplyr/blob/ce7f2bc2ecfe1c6085126da8cc4d4640ec8f4f1f/R/funs.R#L85-L99
I thought we can just remove this if branch, but things seem a bit more complicated...
https://github.com/tidyverse/dplyr/blob/d2e77ee6a83cd301cb7c1e73383347d69e757a9d/R/funs.R#L91-L92
The change make some tests fail, because as_fun()
is yet different from as_function()
in that it accepts .
only.
library(dplyr, warn.conflicts = FALSE)
library(rlang)
d <- head(mtcars, 2)
# c.f. https://github.com/tidyverse/dplyr/blob/d2e77ee6a83cd301cb7c1e73383347d69e757a9d/tests/testthat/test-colwise-select.R#L104
select_if(d, ~ is_integerish(.x))
#> mpg cyl disp hp vs am gear carb
#> Mazda RX4 21 6 160 110 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 0 1 4 4
select_if(d, list(~ is_integerish(.x)))
#> Error: Can't convert a list to function
Created on 2019-02-15 by the reprex package (v0.2.1)
Should this work? (might be related to #4150)
library(dplyr, warn.conflicts = FALSE)
summarise_at(data.frame(x = c(1:10, NA)), vars(starts_with("x")), ~mean(.), na.rm = TRUE)
#> x
#> 1 NA
Created on 2019-02-17 by the reprex package (v0.2.1)
In summary, we can provide the followings as .funs
:
or
library(dplyr, warn.conflicts = FALSE)
# a function
summarise_at(data.frame(x = c(1:10, NA)), vars(x), mean, na.rm = TRUE)
#> x
#> 1 5.5
summarise_at(data.frame(x = c(1:10, NA)), vars(x), function(x, ...) mean(x, ...), na.rm = TRUE)
#> x
#> 1 5.5
# a name of a function
summarise_at(data.frame(x = c(1:10, NA)), vars(x), "mean", na.rm = TRUE)
#> x
#> 1 5.5
# a purrr-style lambda notation
summarise_at(data.frame(x = c(1:10, NA)), vars(x), ~mean(.), na.rm = TRUE)
#> x
#> 1 NA
# wrapped by list
summarise_at(data.frame(x = c(1:10, NA)), vars(x),
list(mean, function(x, ...) mean(x, ...), "mean", ~mean(.)),
na.rm = TRUE)
#> fn1 fn2 mean..3 mean..4
#> 1 5.5 5.5 5.5 5.5
Created on 2019-02-17 by the reprex package (v0.2.1)
Among those, currently it's only a bare purrr-style lambda function notation that is handled by rlang::as_function()
and ignores ...
. Semantically, I feel this is the right behaviour. I don't think na.rm = TRUE
should be passed to mean()
in the following.
summarise_at(data.frame(x = c(1:10, NA)), vars(starts_with("x")), ~mean(.), na.rm = TRUE)
But... probably this is needed for backward-compatibility for funs()
?
By the way, I'm not sure if this should work. Is ...
inside of the context of mutate()
?
iris %>%
mutate_at(vars(starts_with("Sepal")),
function(col, normaliser){
col / normaliser
},
.data$Petal.Width) %>%
head()
All these should work:
library(dplyr, warn.conflicts = FALSE)
iris <- head(iris, 3)
iris %>%
mutate_if(is.numeric, ~ . / iris$Petal.Width)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, ~ . / Petal.Width)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, ~ . / .data$Petal.Width)
#> Error: Column `Sepal.Length` must be length 3 (the number of rows) or one, not 0
iris %>%
mutate_if(is.numeric, ~ .x / iris$Petal.Width)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, ~ .x / Petal.Width)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, ~ .x / .data$Petal.Width)
#> Error: Column `Sepal.Length` must be length 3 (the number of rows) or one, not 0
iris %>%
mutate_if(is.numeric, list(~ . / iris$Petal.Width))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, list(~ . / Petal.Width))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, list(~ . / .data$Petal.Width))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 25.5 17.5 7.0 1 setosa
#> 2 24.5 15.0 7.0 1 setosa
#> 3 23.5 16.0 6.5 1 setosa
iris %>%
mutate_if(is.numeric, list(~ .x / iris$Petal.Width))
#> Error: object '.x' not found
iris %>%
mutate_if(is.numeric, list(~ .x / Petal.Width))
#> Error: object '.x' not found
iris %>%
mutate_if(is.numeric, list(~ .x / .data$Petal.Width))
#> Error: object '.x' not found
There are I believe two issues:
When formulas are used, e.g. ~ . / Petal.Width
we go through rlang::as_function()
there is a bit of internal handling for this, but apparently .data
is not in scope
List of formulas go through as_fun
instead of as_function
so .x
is not properly recognized as equivalent to .
@MilesMcBain this should not work:
iris %>%
mutate_at(vars(starts_with("Sepal")),
function(col, normaliser){
col / normaliser
},
.data$Petal.Width)
The additional arguments are not quoted or dealt with specially, so .data$Petal.Width
is evaluated normally.
@yutannihilation I guess you'd reason that if you supply a formula, you are in total control of the arguments, however here's one trick:
library(dplyr)
tbl <- data.frame(x = c(1:10, NA))
tbl %>%
summarise_at(vars(starts_with("x")), ~ mean(.) , na.rm = TRUE)
#> x
#> 1 NA
tbl %>%
summarise_at(vars(starts_with("x")), ~ mean(...) , na.rm = TRUE)
#> x
#> 1 5.5
Wow, cool... I didn't notice this.
It might just be an accidental feature 馃
The additional arguments are not quoted or dealt with specially
FYI, I found we can use !!!
and !!
(only for LHS) for additional arguments by the power of rlang::list2()
. Is this intensional feature...?
library(dplyr, warn.conflicts = FALSE)
d <- data.frame(x = 1:4)
mutate_at(d, "x", list(function(x, y, z) x + y + z), y = 10, z = 100)
#> x
#> 1 111
#> 2 112
#> 3 113
#> 4 114
mutate_at(d, "x", list(function(x, y, z) x + y + z), !!!list(y = 10, z = 100))
#> x
#> 1 111
#> 2 112
#> 3 113
#> 4 114
a <- quo(y)
b <- quo(z)
mutate_at(d, "x", list(function(x, y, z) x + y + z), !!a := 10, !!b := 100)
#> x
#> 1 111
#> 2 112
#> 3 113
#> 4 114
Created on 2019-02-22 by the reprex package (v0.2.1)
Sounds legit, the list2()
call is in dplyr:::as_fun_list()
I see.
Is this intensional feature...?
I think it's accidental and I wouldn't rely on it. For instance purrr doesn't support splicing in argument lists and probably will not.
(Unless the mapped function supports splicing itself of course.)
Oh, should I file a new issue for it? I guess here will be soon closed by #4217.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/