What's the best way of doing it?
The following
iris %>% select(which(sapply(., is.numeric)))
works with the latest version of magrittr. However,
iris %>% mutate_each(funs(log), which(sapply(., is.numeric)))
throws "Error in lapply(X = X, FUN = FUN, ...) : object '.' not found", even with the updated chain operator. Is this the expected behaviour?
In general, what is the best way to filter columns by some boolean condition in dplyr? Would it be worth having another selection function (in addition to starts_with, ends_with, etc.) that acts directly on the columns instead of the column names? E.g. something along the lines of
iris %>% select(satisfies(is.numeric))
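For illustration, here is a minimal base R sketch of what selecting by a predicate on the columns themselves could look like. keep_columns is a hypothetical helper name, not a dplyr function, and this only covers plain data frames, not other backends.

```r
# Hypothetical helper (not part of dplyr): keep the columns for which
# `predicate` returns TRUE when applied to the column itself.
keep_columns <- function(df, predicate) {
  # vapply() enforces exactly one logical value per column
  keep <- vapply(df, predicate, logical(1))
  df[, keep, drop = FALSE]
}

# Keep only the numeric columns of iris
numeric_iris <- keep_columns(iris, is.numeric)
names(numeric_iris)
```

The drop = FALSE keeps the result a data frame even when only one column satisfies the predicate.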
Right, just noticed that
iris %>% mutate_each_q(funs(log), which(sapply(., is.numeric)))
does work, which makes sense.
There's no best way currently. I think it's going to be fairly hard to implement across backend types, but maybe I could add some generic methods for determining column types.
The purrr package now offers nice solutions to @steromano's original question, at least for data frames.
To keep all numeric columns:
iris %>%
  purrr::keep(is.numeric) %>%
  head(2)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
To mutate all numeric columns with a single function:
iris %>%
  purrr::map_if(is.numeric, log) %>%
  head(2)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1     1.629241    1.252763    0.3364722   -1.609438  setosa
#> 2     1.589235    1.098612    0.3364722   -1.609438  setosa
Nice! select_if() worked perfectly. Thank you.
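For anyone landing here later, a short sketch of the select_if() approach mentioned above, assuming a dplyr version that ships the scoped _if verbs. The mutate_if() call and the where()/across() spelling (dplyr 1.0.0 and later) are not from this thread; they are added here as the closest equivalents I know of.

```r
library(dplyr)

# Keep only the numeric columns; the predicate sees each column, not its name.
numeric_only <- iris %>% select_if(is.numeric)

# Apply log() to the numeric columns and leave Species untouched.
logged <- iris %>% mutate_if(is.numeric, log)

# The same idea in dplyr 1.0.0 and later, using where() and across().
numeric_only_new <- iris %>% select(where(is.numeric))
logged_new <- iris %>% mutate(across(where(is.numeric), log))
```

Both spellings should give the same result on a plain data frame like iris.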