I wonder whether it would be possible to add an _each variant of filter(), so that one can filter on multiple columns without explicitly naming them (this comes in handy for initial validation of data frames):
df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))
Instead of using:
df %>% filter(X1 >= 2, X2 >= 2, X3 >= 2, X4 >= 2, X5 >= 2)
or
df %>% filter(!rowSums(. < 2))
We could use something like: filter_each(funs(. >= 2))
This would be even more convenient when we want to apply a filter to all columns but one, with a hypothetical: filter_each(funs(. >= 2), -X5)
Right now the best alternative we've found on SO is:
df %>% slice(which(!rowSums(select(., -matches('X5')) < 2L)))
or
df %>% filter(!rowSums(.[, !colnames(.) %in% 'X5', drop = FALSE] < 2))
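For readers unfamiliar with the rowSums() idiom used above, here is a minimal base R sketch of why it works. The intermediate names (below, n_below, keep) and the set.seed() call are only for illustration and are not part of the original question.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

below   <- df < 2           # logical matrix: TRUE where a value fails the check
n_below <- rowSums(below)   # number of failing columns in each row
keep    <- !n_below         # 0 failures coerces to TRUE, anything else to FALSE

# The same condition written out column by column
explicit <- df$X1 >= 2 & df$X2 >= 2 & df$X3 >= 2 & df$X4 >= 2 & df$X5 >= 2
stopifnot(identical(unname(keep), explicit))

# Excluding X5 only changes which columns are compared
keep_no_x5 <- !rowSums(df[, setdiff(names(df), "X5"), drop = FALSE] < 2)
df[keep_no_x5, ]
```

The drop = FALSE argument keeps the subset a data frame even if only one column remains, so rowSums() still receives a two-dimensional object.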
Can you please provide a realistic example of when you'd use this?
Let's say I want to filter out, in every column but X5, the value with the largest difference from the column mean:
df %>% filter(!X1 == outlier(X1), !X2 == outlier(X2), !X3 == outlier(X3), !X4 == outlier(X4))
I would do something like:
df %>% filter_each(funs(!. == outlier(.)), -X5)
This seems sufficiently esoteric that I don't think it needs to be built into dplyr.
Actually, a filter_each() function that handles the above task would be very helpful.
I deal with huge annotation files (Matrix or df) with several columns, and I need to filter the df with "AND" operations on multiple columns.
I would appreciate it if you could reconsider implementing this. It would make life easier.
It would be handy if there was a shorthand in dplyr for filtering several columns with the same criteria.
I have a data frame with about 26,000 rows of employee data. Here is an example of a filter that I often perform on the data frame:
universities <- c(1:100, 110:120)
ccentres <- c("133", "133a", "133b", "133c", "133d", "130", "135")
datfiltered <- dat %>% filter( RESPLEVEL == "3A",
INSTITUTIONID %in% universities,
CCENTRE1 %in% ccentres |
CCENTRE2 %in% ccentres |
CCENTRE3 %in% ccentres |
CCENTRE4 %in% ccentres |
CCENTRE5 %in% ccentres |
CCENTRE6 %in% ccentres )
There is filter_at(), filter_if() and filter_all() in the dev version.
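As a rough sketch of how these scoped verbs cover the requests in this thread, assuming a dplyr version that ships them (0.7.0 or later) together with the all_vars() and any_vars() helpers. The small dat data frame below is made up for illustration and only mirrors the column names from the employee example.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible
df <- data.frame(replicate(5, sample(1:10, 10, replace = TRUE)))

# Every column must satisfy the predicate
all_cols <- df %>% filter_all(all_vars(. >= 2))

# Every column except X5 must satisfy the predicate
all_but_x5 <- df %>% filter_at(vars(-X5), all_vars(. >= 2))

# Only columns matching a type test are checked
numeric_cols <- df %>% filter_if(is.numeric, all_vars(. >= 2))

# "OR" across several columns, as in the CCENTRE example.
# Hypothetical data: two CCENTRE columns instead of six.
ccentres <- c("133", "133a", "133b", "133c", "133d", "130", "135")
dat <- data.frame(
  CCENTRE1 = c("133", "999", "200"),
  CCENTRE2 = c("000", "135", "201"),
  stringsAsFactors = FALSE
)
any_match <- dat %>%
  filter_at(vars(starts_with("CCENTRE")), any_vars(. %in% ccentres))
```

all_vars() combines the per-column results with AND and any_vars() combines them with OR. In later dplyr releases the scoped verbs are marked as superseded in favour of if_all() and if_any() inside a plain filter(), so new code may prefer those.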